|
3108 | 3108 | "\n", |
3109 | 3109 | "---\n", |
3110 | 3110 | "\n", |
| 3111 | + "[](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-finetune-florence-2-on-detection-dataset.ipynb)\n", |
3111 | 3112 | "[](https://blog.roboflow.com/florence-2/)\n", |
3112 | 3113 | "[](https://arxiv.org/abs/2311.06242)\n", |
3113 | 3114 | "\n", |
3114 | 3115 | "Florence-2 is a lightweight vision-language model open-sourced by Microsoft under the MIT license. The model demonstrates strong zero-shot and fine-tuning capabilities across tasks such as captioning, object detection, grounding, and segmentation.\n", |
3115 | 3116 | "\n", |
3116 | | - "\n", |
3117 | | - "\n", |
3118 | | - "*Figure 1. Illustration showing the level of spatial hierarchy and semantic granularity expressed by each task. Source: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks.*\n", |
3119 | | - "\n", |
3120 | 3117 | "The model takes images and task prompts as input, generating the desired results in text format. It uses a DaViT vision encoder to convert images into visual token embeddings. These are then concatenated with BERT-generated text embeddings and processed by a transformer-based multi-modal encoder-decoder to generate the response.\n", |
3121 | 3118 | "\n", |
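The fusion step described above — DaViT visual token embeddings concatenated with the prompt's text embeddings before the multimodal encoder-decoder — can be sketched in a few lines of NumPy. The dimensions below are illustrative placeholders, not Florence-2's actual configuration.

```python
import numpy as np

# Illustrative sizes only; Florence-2's real widths and token counts differ.
d_model = 8            # shared embedding width
num_visual_tokens = 4  # tokens produced by the DaViT vision encoder
num_text_tokens = 3    # tokens produced from the task prompt

rng = np.random.default_rng(0)
visual_embeddings = rng.normal(size=(num_visual_tokens, d_model))  # stand-in for DaViT output
text_embeddings = rng.normal(size=(num_text_tokens, d_model))      # stand-in for prompt embeddings

# The two token sequences are concatenated along the sequence axis and then
# processed together by the transformer encoder-decoder.
fused = np.concatenate([visual_embeddings, text_embeddings], axis=0)
print(fused.shape)  # (7, 8): visual tokens followed by text tokens
```

The only requirement the concatenation imposes is that both sequences share the same embedding width, which is why both encoders project into a common `d_model`.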
3122 | | - "\n", |
3123 | | - "\n", |
3124 | | - "*Figure 2. Overview of Florence-2 architecture. Source: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks.*\n", |
| 3119 | + "\n", |
3125 | 3120 | "\n" |
3126 | 3121 | ], |
3127 | 3122 | "metadata": { |
|
5296 | 5291 | { |
5297 | 5292 | "cell_type": "markdown", |
5298 | 5293 | "source": [ |
5299 | | - "# Congratulations\n", |
| 5294 | + "<div align=\"center\">\n", |
| 5295 | + " <p>\n", |
| 5296 | + " Looking for more tutorials or have questions?\n", |
| 5297 | + " Check out our <a href=\"https://github.com/roboflow/notebooks\">GitHub repo</a> for more notebooks,\n", |
| 5298 | + " or visit our <a href=\"https://discord.gg/GbfgXGJ8Bk\">discord</a>.\n", |
| 5299 | + " </p>\n", |
| 5300 | + " \n", |
| 5301 | + " <p>\n", |
| 5302 | + " <strong>If you found this helpful, please consider giving us a ⭐\n", |
| 5303 | + " <a href=\"https://github.com/roboflow/notebooks\">on GitHub</a>!</strong>\n", |
| 5304 | + " </p>\n", |
5300 | 5305 | "\n", |
5301 | | - "⭐️ If you enjoyed this notebook, [**star the Roboflow Notebooks repo**](https://github.com/roboflow/notebooks) (and [**supervision**](https://github.com/roboflow/supervision) while you're at it) and let us know what tutorials you'd like to see us do next. ⭐️" |
| 5306 | + "</div>" |
5302 | 5307 | ], |
5303 | 5308 | "metadata": { |
5304 | 5309 | "id": "ag0XROk7fcd_" |
|