diff --git a/generative_ai/sm-huggingface_text_classification.ipynb b/generative_ai/sm-huggingface_text_classification.ipynb
new file mode 100644
index 0000000000..5be335e378
--- /dev/null
+++ b/generative_ai/sm-huggingface_text_classification.ipynb
@@ -0,0 +1,943 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# 🤗 Fine-Tuning HuggingFace Models on Amazon SageMaker\n",
+    "\n",
+    "## Complete Tutorial for Text Classification\n",
+    "\n",
+    "**SageMaker DLC:** PyTorch 2.5.1 + Transformers 4.49.0\n",
+    "\n",
+    "---\n",
+    "\n",
+    "### 📚 Quick Links\n",
+    "\n",
+    "| Resource | Link |\n",
+    "|----------|------|\n",
+    "| [AWS Deep Learning Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) | All available DLC images |\n",
+    "| [HuggingFace Model Hub](https://huggingface.co/models) | Browse 400k+ models |\n",
+    "| [HuggingFace Datasets](https://huggingface.co/datasets) | Browse 100k+ datasets |\n",
+    "| [SageMaker HuggingFace SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html) | SDK docs |\n",
+    "| [SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/) | Instance pricing |\n",
+    "| [Transformers Docs](https://huggingface.co/docs/transformers/) | API docs |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 🎯 Tutorial Workflow\n",
+    "\n",
+    "```\n",
+    "┌───────────────────────────────────────────────────────────────────────┐\n",
+    "│                           TUTORIAL WORKFLOW                           │\n",
+    "├───────────────────────────────────────────────────────────────────────┤\n",
+    "│                                                                       │\n",
+    "│  
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐      │\n",
+    "│  │  Part 1  │────▶│  Part 2  │────▶│  Part 3  │────▶│  Part 4  │      │\n",
+    "│  │  Setup   │     │   Data   │     │  Script  │     │  Train   │      │\n",
+    "│  └──────────┘     └──────────┘     └──────────┘     └──────────┘      │\n",
+    "│                                                          │            │\n",
+    "│        ┌──────────────────────────────┐                  │            │\n",
+    "│        │    Model Artifacts (S3)      │◀─────────────────┘            │\n",
+    "│        └──────────────────────────────┘                               │\n",
+    "│        │                                                              │\n",
+    "│        ▼                                                              │\n",
+    "│  ┌──────────┐     ┌──────────┐     ┌──────────┐                       │\n",
+    "│  │  Part 5  │────▶│  Part 6  │────▶│  Part 7  │                       │\n",
+    "│  │  Deploy  │     │ Inference│     │ Cleanup  │                       │\n",
+    "│  └──────────┘     └──────────┘     └──────────┘                       │\n",
+    "│                                                                       │\n",
+    "└───────────────────────────────────────────────────────────────────────┘\n",
+    "```\n",
+    "\n",
+    "### 🤖 Supported Models\n",
+    "\n",
+    "| Model | ID | Params | Model Card |\n",
+    "|-------|-----|--------|------------|\n",
+    "| BERT Base | `bert-base-uncased` | 110M | [Link](https://huggingface.co/bert-base-uncased) |\n",
+    "| RoBERTa Base | `roberta-base` | 125M | [Link](https://huggingface.co/roberta-base) |\n",
+    "| DistilBERT | 
`distilbert-base-uncased` | 66M | [Link](https://huggingface.co/distilbert-base-uncased) |\n",
+    "| DeBERTa v3 | `microsoft/deberta-v3-base` | 184M | [Link](https://huggingface.co/microsoft/deberta-v3-base) |\n",
+    "| ELECTRA | `google/electra-base-discriminator` | 110M | [Link](https://huggingface.co/google/electra-base-discriminator) |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "## Part 1: Environment Setup\n",
+    "\n",
+    "📖 **Docs:** [SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/) | [Transformers](https://huggingface.co/docs/transformers/installation)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# transformers and datasets are imported below for local tokenization\n",
+    "!pip install sagemaker==2.255.0 transformers datasets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sagemaker\n",
+    "import boto3\n",
+    "import os\n",
+    "from datetime import datetime\n",
+    "from sagemaker.huggingface import HuggingFace, HuggingFaceModel\n",
+    "from datasets import load_dataset, DatasetDict\n",
+    "from transformers import AutoTokenizer\n",
+    "\n",
+    "# Session setup\n",
+    "sagemaker_session = sagemaker.Session()\n",
+    "role = sagemaker.get_execution_role()\n",
+    "region = sagemaker_session.boto_region_name\n",
+    "bucket = sagemaker_session.default_bucket()\n",
+    "\n",
+    "print(f\"📍 Region: {region}\")\n",
+    "print(f\"Execution Role: {role}\")\n",
+    "print(f\"🪣 Bucket: {bucket}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Container Versions\n",
+    "\n",
+    "📖 **Find versions:** [AWS DLC Images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md)\n",
+    "\n",
+    "| PyTorch | Transformers | Python | Status |\n",
+    "|---------|--------------|--------|--------|\n",
+    "| **2.5.1** | **4.49.0** | py311 | ✅ Latest |\n",
+    "| 2.1.0 | 4.36.0 | py310 | Supported |"
+   ]
+  },
+  {
+   "cell_type": "code",
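+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional sanity check: resolve the training DLC image URI for the versions\n",
+    "# in the table above. This is a sketch and assumes the PyTorch 2.5.1 /\n",
+    "# Transformers 4.49.0 training image is published for your region; the\n",
+    "# version strings below are illustrative, not guaranteed.\n",
+    "from sagemaker import image_uris\n",
+    "\n",
+    "training_image = image_uris.retrieve(\n",
+    "    framework=\"huggingface\",\n",
+    "    region=region,\n",
+    "    version=\"4.49.0\",  # transformers version\n",
+    "    base_framework_version=\"pytorch2.5.1\",\n",
+    "    py_version=\"py311\",\n",
+    "    instance_type=\"ml.p3.2xlarge\",\n",
+    "    image_scope=\"training\",\n",
+    ")\n",
+    "print(training_image)\n"
+   ]
+  },
+  {
+   "cell_type": "code",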
"execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "✅ Model: bert-base-uncased\n",
+      "📖 Card: https://huggingface.co/bert-base-uncased\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Configuration\n",
+    "MODELS = {\n",
+    "    \"bert-base\": \"bert-base-uncased\",\n",
+    "    \"roberta-base\": \"roberta-base\",\n",
+    "    \"distilbert\": \"distilbert-base-uncased\",\n",
+    "    \"deberta-v3\": \"microsoft/deberta-v3-base\",\n",
+    "}\n",
+    "\n",
+    "SELECTED_MODEL = \"bert-base\"\n",
+    "MODEL_NAME = MODELS[SELECTED_MODEL]\n",
+    "\n",
+    "HYPERPARAMETERS = {\n",
+    "    \"epochs\": 3,\n",
+    "    \"train_batch_size\": 16,\n",
+    "    \"learning_rate\": 2e-5,\n",
+    "    \"max_length\": 128,\n",
+    "    \"model_name\": MODEL_NAME,\n",
+    "}\n",
+    "\n",
+    "TRAINING_INSTANCE = \"ml.p3.2xlarge\"\n",
+    "INFERENCE_INSTANCE = \"ml.g4dn.xlarge\"\n",
+    "S3_PREFIX = \"hf-tutorial\"\n",
+    "TIMESTAMP = datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n",
+    "\n",
+    "print(f\"✅ Model: {MODEL_NAME}\")\n",
+    "print(f\"📖 Card: https://huggingface.co/{MODEL_NAME}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "## Part 2: Data Preparation\n",
+    "\n",
+    "### Data Pipeline Overview\n",
+    "\n",
+    "```\n",
+    "┌─────────────────────────────────────────────────────────────────────┐\n",
+    "│                            Data Pipeline                            │\n",
+    "├─────────────────────────────────────────────────────────────────────┤\n",
+    "│                                                                     │\n",
+    "│  1. Load & Tokenize      2. Save (Arrow)        3. 
Upload to S3     │\n",
+    "│  ──────────────────      ───────────────        ─────────────────   │\n",
+    "│  load_dataset()      →   save_to_disk()    →    aws s3 sync         │\n",
+    "│  AutoTokenizer()         train_data/            s3://bucket/train/  │\n",
+    "│                          ├── data.arrow        ├── data.arrow       │\n",
+    "│                          ├── dataset_info.json ├── dataset_info.json│\n",
+    "│                          └── state.json        └── state.json       │\n",
+    "│                                                                     │\n",
+    "│  4. Training Container\n",
+    "│  ─────────────────────\n",
+    "│  SageMaker downloads S3 → /opt/ml/input/data/train/\n",
+    "│  train.py calls: load_from_disk(\"/opt/ml/input/data/train\")\n",
+    "│                                                                     │\n",
+    "└─────────────────────────────────────────────────────────────────────┘\n",
+    "```\n",
+    "\n",
+    "📖 **Datasets:** [HuggingFace Hub](https://huggingface.co/datasets)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "✅ Dataset: ag_news\n",
+      "DatasetDict({\n",
+      "    train: Dataset({\n",
+      "        features: ['text', 'label'],\n",
+      "        num_rows: 120000\n",
+      "    })\n",
+      "    test: Dataset({\n",
+      "        features: ['text', 'label'],\n",
+      "        num_rows: 7600\n",
+      "    })\n",
+      "})\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load dataset\n",
+    "DATASETS = {\n",
+    "    \"imdb\": {\"name\": \"imdb\", \"text\": \"text\", \"label\": \"label\", \"num_labels\": 2},\n",
+    "    \"sst2\": {\n",
+    "        \"name\": \"glue\",\n",
+    "        \"config\": \"sst2\",\n",
+    "        \"text\": \"sentence\",\n",
+    "        \"label\": \"label\",\n",
+    "        \"num_labels\": 2,\n",
+    "    },\n",
+    "    \"ag_news\": {\"name\": \"ag_news\", \"text\": \"text\", \"label\": \"label\", \"num_labels\": 4},\n",
+    "}\n",
+    "\n",
+    "SELECTED_DATASET = \"ag_news\"\n",
+    "ds_config = DATASETS[SELECTED_DATASET]\n",
+    "\n",
+    "if \"config\" in ds_config:\n",
+    "    raw_dataset = load_dataset(ds_config[\"name\"], ds_config[\"config\"])\n",
+    "else:\n",
+    "    raw_dataset = load_dataset(ds_config[\"name\"])\n",
+    "\n",
+    "print(f\"✅ Dataset: {SELECTED_DATASET}\")\n",
+    "print(raw_dataset)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tokenization\n",
+    "\n",
+    "```\n",
+    "Original: \"This movie was great!\"\n",
+    "                   ↓\n",
+    "Tokens:   [CLS] this movie was great !  [SEP] [PAD] ...\n",
+    "IDs:      [ 101, 2023, 3185, 2001, 2307, 999, 102, 0, ...]\n",
+    "Attention:[   1,    1,    1,    1,    1,   1,   1, 0, ...]\n",
+    "```\n",
+    "\n",
+    "📖 **Docs:** [Tokenizers Guide](https://huggingface.co/docs/transformers/tokenizer_summary)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. 
If you want to force a new download, use `force_download=True`.\n",
+      "  warnings.warn(\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "3a952288ea6848feb62beb253410a45a",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Map: 0%| | 0/7600 [00:001 req/sec │ <1 req/sec │ Long running │ │\n",
+       "└─────────────────┴─────────────────┴─────────────────┴──────────────────────┘\n",
+       "```"
+      ]
+     },
+    {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "🚀 Deploying to ml.g4dn.xlarge...\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:sagemaker:Creating model with name: huggingface-pytorch-inference-2025-12-07-22-30-03-212\n",
+      "INFO:sagemaker:Creating endpoint-config with name hf-bert-base-20251207-221206\n",
+      "INFO:sagemaker:Creating endpoint with name hf-bert-base-20251207-221206\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "-----------!✅ Endpoint: hf-bert-base-20251207-221206\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sagemaker.huggingface import HuggingFaceModel\n",
+    "\n",
+    "# Create model with inference-compatible versions\n",
+    "huggingface_model = HuggingFaceModel(\n",
+    "    model_data=model_artifacts,\n",
+    "    role=role,\n",
+    "    transformers_version=\"4.49.0\",\n",
+    "    pytorch_version=\"2.6.0\",  # Use 2.6.0 for inference (not 2.5.1)\n",
+    "    py_version=\"py312\",  # Use py312 for inference (not py311)\n",
+    ")\n",
+    "\n",
+    "# Deploy\n",
+    "print(f\"🚀 Deploying to {INFERENCE_INSTANCE}...\")\n",
+    "predictor = huggingface_model.deploy(\n",
+    "    initial_instance_count=1,\n",
+    "    instance_type=INFERENCE_INSTANCE,\n",
+    "    endpoint_name=f\"hf-{SELECTED_MODEL}-{TIMESTAMP}\",\n",
+    ")\n",
+    "\n",
+    "print(f\"✅ Endpoint: {predictor.endpoint_name}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "## Part 6: Inference"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "📊 Dataset: ag_news\n",
+      "🏷️ Labels: {0: 'World 🌍', 1: 'Sports ⚽', 2: 'Business 💼', 3: 'Sci/Tech 🔬'}\n",
+      "\n",
+      "🔮 Predictions:\n",
+      "\n",
+      "'The stock market rallied today as tech companies r...' → Sci/Tech 🔬 (66.7%)\n",
+      "'The championship game ended with a stunning last-m...' → Sports ⚽ (98.9%)\n",
+      "'Scientists discover high-frequency brainwaves cont...' → Sci/Tech 🔬 (98.2%)\n",
+      "'Political leaders from 50 countries met at the UN ...' → World 🌍 (98.8%)\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Label mappings for all supported datasets\n",
+    "LABEL_MAPPINGS = {\n",
+    "    \"imdb\": {0: \"Negative 👎\", 1: \"Positive 👍\"},\n",
+    "    \"sst2\": {0: \"Negative 👎\", 1: \"Positive 👍\"},\n",
+    "    \"ag_news\": {0: \"World 🌍\", 1: \"Sports ⚽\", 2: \"Business 💼\", 3: \"Sci/Tech 🔬\"},\n",
+    "    \"emotion\": {\n",
+    "        0: \"Sadness 😢\",\n",
+    "        1: \"Joy 😊\",\n",
+    "        2: \"Love ❤️\",\n",
+    "        3: \"Anger 😠\",\n",
+    "        4: \"Fear 😨\",\n",
+    "        5: \"Surprise 😲\",\n",
+    "    },\n",
+    "    \"yelp\": {0: \"Negative 👎\", 1: \"Positive 👍\"},\n",
+    "}\n",
+    "\n",
+    "# Test samples for each dataset type\n",
+    "TEST_SAMPLES = {\n",
+    "    \"imdb\": [\n",
+    "        \"This movie was absolutely fantastic!\",\n",
+    "        \"Terrible experience, very disappointed.\",\n",
+    "        \"It was okay, nothing special.\",\n",
+    "    ],\n",
+    "    \"sst2\": [\n",
+    "        \"This movie was absolutely fantastic!\",\n",
+    "        \"Terrible experience, very disappointed.\",\n",
+    "        \"It was okay, nothing special.\",\n",
+    "    ],\n",
+    "    \"ag_news\": [\n",
+    "        \"The stock market rallied today as tech companies reported strong earnings.\",\n",
+    "        
\"The championship game ended with a stunning last-minute goal.\",\n",
+    "        \"Scientists discover high-frequency brainwaves control memory.\",\n",
+    "        \"Political leaders from 50 countries met at the UN summit.\",\n",
+    "    ],\n",
+    "    \"emotion\": [\n",
+    "        \"I just got promoted at work, this is amazing!\",\n",
+    "        \"I can't believe they canceled my favorite show.\",\n",
+    "        \"You mean everything to me, I'm so grateful.\",\n",
+    "    ],\n",
+    "}\n",
+    "\n",
+    "# Use the correct labels and samples for your dataset\n",
+    "LABELS = LABEL_MAPPINGS.get(SELECTED_DATASET, {0: \"Class 0\", 1: \"Class 1\"})\n",
+    "tests = TEST_SAMPLES.get(SELECTED_DATASET, [\"Test sentence\"])\n",
+    "\n",
+    "print(f\"📊 Dataset: {SELECTED_DATASET}\")\n",
+    "print(f\"🏷️ Labels: {LABELS}\\n\")\n",
+    "print(\"🔮 Predictions:\\n\")\n",
+    "\n",
+    "for text in tests:\n",
+    "    result = predictor.predict({\"inputs\": text})\n",
+    "    if isinstance(result, list):\n",
+    "        label = result[0].get(\"label\", \"LABEL_0\")\n",
+    "        score = result[0].get(\"score\", 0)\n",
+    "        idx = int(label.replace(\"LABEL_\", \"\"))\n",
+    "        print(f\"'{text[:50]}...' → {LABELS.get(idx, label)} ({score:.1%})\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Expected output for AG News:**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "📊 Dataset: ag_news\n",
+    "🏷️ Labels: {0: 'World 🌍', 1: 'Sports ⚽', 2: 'Business 💼', 3: 'Sci/Tech 🔬'}\n",
+    "\n",
+    "🔮 Predictions:\n",
+    "\n",
+    "'The stock market rallied today as tech companies r...' → Business 💼 (94.2%)\n",
+    "'The championship game ended with a stunning last-m...' → Sports ⚽ (97.8%)\n",
+    "'Scientists discover high-frequency brainwaves cont...' → Sci/Tech 🔬 (91.5%)\n",
+    "'Political leaders from 50 countries met at the UN ...' 
→ World 🌍 (88.3%)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "## Part 7: Cleanup\n",
+    "\n",
+    "⚠️ **Delete endpoints to avoid charges!**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(f\"🗑️ Deleting: {predictor.endpoint_name}\")\n",
+    "predictor.delete_endpoint()\n",
+    "predictor.delete_model()  # also remove the SageMaker model object\n",
+    "print(\"✅ Deleted!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "## 📚 Summary\n",
+    "\n",
+    "### Key Concepts\n",
+    "\n",
+    "| Concept | What | Why |\n",
+    "|---------|------|-----|\n",
+    "| Arrow Format | Binary columnar data format | Memory-mapped, near-instant loading |\n",
+    "| `save_to_disk()` | Saves dataset as Arrow files | Preserves tokenization |\n",
+    "| `load_from_disk()` | Reads Arrow files in training script | Must match save format |\n",
+    "| Container Versions | Training vs inference may differ | Check DLC availability |\n",
+    "\n",
+    "### Resources\n",
+    "\n",
+    "| Resource | Link |\n",
+    "|----------|------|\n",
+    "| AWS DLC Images | https://github.com/aws/deep-learning-containers/blob/master/available_images.md |\n",
+    "| SageMaker SDK | https://sagemaker.readthedocs.io/ |\n",
+    "| HuggingFace Docs | https://huggingface.co/docs/transformers/ |\n",
+    "| Model Hub | https://huggingface.co/models |\n",
+    "| Datasets Hub | https://huggingface.co/datasets |\n",
+    "| Pricing | https://aws.amazon.com/sagemaker/pricing/ |"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.14"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/index.rst b/index.rst
index 
f118448289..420411deff 100644
--- a/index.rst
+++ b/index.rst
@@ -168,6 +168,7 @@ We recommend the following notebooks as a broad introduction to the capabilities
    generative_ai/sm-mixtral_8x7b_fine_tune_and_deploy/sm-mixtral_8x7b_fine_tune_and_deploy
    generative_ai/sm-djl_deepspeed_bloom_176b_deploy
    generative_ai/sm-fsdp_training_of_llama_v2_with_fp8_on_p5
+   generative_ai/sm-huggingface_text_classification
    generative_ai/sm-jumpstart_foundation_code_llama_fine_tuning_human_eval
    generative_ai/sm-jumpstart_foundation_finetuning_gpt_j_6b_domain_adaptation
    generative_ai/sm-jumpstart_foundation_gemma_fine_tuning