
Commit 85c5eab

updated version of Fine-Tune RF-DETR on Object Detection Dataset notebook with instructions for hyperparameter tuning and multi-GPU training
1 parent 284292e commit 85c5eab

File tree

1 file changed (+31, -14 lines)


notebooks/how-to-finetune-rf-detr-on-detection-dataset.ipynb

Lines changed: 31 additions & 14 deletions
@@ -771,7 +771,7 @@
 "metadata": {
 "id": "_dGO4a7eTbFX"
 },
-"execution_count": 1,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -797,7 +797,7 @@
 "id": "ms0ps6ZCT2xs",
 "outputId": "6a5ffafe-73c6-45f9-f16b-36e400cc2ded"
 },
-"execution_count": 2,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -865,7 +865,7 @@
 "metadata": {
 "id": "8DCCwexcU6gO"
 },
-"execution_count": 4,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -949,7 +949,7 @@
 "id": "PtOufRspVekp",
 "outputId": "711403bd-2543-4582-de1b-1e6b369045af"
 },
-"execution_count": 5,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -1037,7 +1037,7 @@
 "id": "hQkMUyB0lROT",
 "outputId": "01fc243a-7911-4762-f44b-4652f748e138"
 },
-"execution_count": 6,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -1074,7 +1074,24 @@
 {
 "cell_type": "markdown",
 "source": [
-"## Train RF-DETR on custom dataset"
+"## Train RF-DETR on custom dataset\n",
+"\n",
+"### Choose the right `batch_size`\n",
+"\n",
+"Different GPUs have different amounts of VRAM (video memory), which limits how much data they can handle at once during training. To make training work well on any machine, you can adjust two settings: `batch_size` and `grad_accum_steps`. These control how many samples are processed at a time. The key is to keep their product equal to 16, which is our recommended total batch size. For example, on powerful GPUs like the A100, set `batch_size=16` and `grad_accum_steps=1`. On smaller GPUs like the T4, use `batch_size=4` and `grad_accum_steps=4`. This method, called gradient accumulation, lets the model simulate training with a larger batch size by accumulating gradients over several smaller batches before updating the weights.\n",
+"\n",
+"### Train with multiple GPUs\n",
+"\n",
+"You can fine-tune RF-DETR on multiple GPUs using PyTorch’s Distributed Data Parallel (DDP). Create a `main.py` script that initializes your model and calls `.train()` as usual, then run it from the terminal:\n",
+"\n",
+"```bash\n",
+"python -m torch.distributed.launch \\\n",
+"    --nproc_per_node=8 \\\n",
+"    --use_env \\\n",
+"    main.py\n",
+"```\n",
+"\n",
+"Replace `8` in the `--nproc_per_node` argument with the number of GPUs you want to use. This approach creates one training process per GPU and splits the workload automatically. Note that your effective batch size is multiplied by the number of GPUs, so you may need to adjust `batch_size` and `grad_accum_steps` to maintain the same overall batch size."
 ],
 "metadata": {
 "id": "vmT8f_bAq3zX"
@@ -1143,7 +1160,7 @@
 "id": "7UEX3GVCmYaq",
 "outputId": "a579ba09-43ab-4863-d2b6-94dbd77bbb6b"
 },
-"execution_count": 8,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "display_data",
@@ -1203,7 +1220,7 @@
 "id": "gcjxmZeqqAdv",
 "outputId": "356c1fea-316e-4b3d-888c-fe869e8adffd"
 },
-"execution_count": 9,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "display_data",
@@ -1249,7 +1266,7 @@
 "metadata": {
 "id": "xm-lmRWLswO4"
 },
-"execution_count": 10,
+"execution_count": null,
 "outputs": []
 },
 {
@@ -1304,7 +1321,7 @@
 "id": "msor_5HgAkm3",
 "outputId": "3c43efc2-c7d8-42b8-8d6d-c235f537222a"
 },
-"execution_count": 11,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "display_data",
@@ -1365,7 +1382,7 @@
 "id": "RFEgIOz1YDCe",
 "outputId": "8f885c5c-5122-44a6-d8ed-3783e442cc69"
 },
-"execution_count": 14,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "display_data",
@@ -1412,7 +1429,7 @@
 "id": "szxs3PZsBVxa",
 "outputId": "46b8ab4b-41a0-4fc3-ff63-bb28bad0b442"
 },
-"execution_count": 20,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "stream",
@@ -1439,7 +1456,7 @@
 "id": "fxqvXOQcsRF2",
 "outputId": "240e349f-4051-4c7a-eb74-2c521ac03e2b"
 },
-"execution_count": 21,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "display_data",
@@ -1472,7 +1489,7 @@
 "id": "WuiNB-UM1xsJ",
 "outputId": "822fea57-acb9-4d09-ad34-02aa640ad808"
 },
-"execution_count": 24,
+"execution_count": null,
 "outputs": [
 {
 "output_type": "display_data",
