Real-time Transformer-based Object Detector:

This model is not yet complete.

Results on COCO-val

For the backbone of the image encoder, we use the IN-1K classification pretrained weights from torchvision, which differ from those used by the official RT-DETR. Training RT-DETR from scratch without IN-1K pretrained weights may be difficult.
For training, we train the RT-DETR series with a 6x (~72 epochs) schedule on COCO and use the ModelEMA trick. We disable fp16 training.
For data augmentation, we use color jitter, random horizontal flip, random crop, and multi-scale training.
For the optimizer, we use AdamW with a weight decay of 0.0001 and a base per-image learning rate of 0.0001 / 16.
For the learning rate scheduler, we use a constant learning rate (0.0001), following the official setting.
For post-processing, we think applying NMS is still slightly helpful, even though it is not essential.
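
To make the optional NMS step concrete, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is an illustration only and not the code used in this repo: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the IoU threshold of 0.5 are all assumptions.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (hypothetical values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_thresh=0.5)  # the second box is suppressed
```

Since RT-DETR predicts a fixed set of queries, duplicates are rare, which is why NMS is treated here as an optional cleanup rather than a required step.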

Train RT-DETR

Single GPU

For example, to train RT-DETR-R18 on COCO:

python train.py --cuda -d coco --root path/to/coco -m rtdetr_r18 -bs 16 -size 640 --max_epoch 72 --eval_epoch 1 --ema --multi_scale
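
The `--ema` flag corresponds to the ModelEMA trick mentioned in the training notes. As a rough sketch of the idea (not this repo's implementation), an exponential moving average keeps a smoothed copy of the weights that is updated after each optimizer step. The function name `ema_update`, the plain-dict weight representation, and the decay values below are assumptions for illustration; real implementations often also warm up the decay.

```python
from typing import Dict


def ema_update(ema: Dict[str, float], model: Dict[str, float], decay: float = 0.9999) -> None:
    """Update in place: ema = decay * ema + (1 - decay) * model, per parameter."""
    for name, value in model.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value


# Toy example with a single scalar "parameter" (hypothetical values).
model_weights = {"w": 1.0}
ema_weights = dict(model_weights)   # the EMA copy starts from the model weights
model_weights["w"] = 2.0            # pretend an optimizer step changed the weight
ema_update(ema_weights, model_weights, decay=0.9)
# ema_weights["w"] is now approximately 0.9 * 1.0 + 0.1 * 2.0 = 1.1
```

The smoothed copy moves only a small step toward the current weights, which is what makes it more stable than the raw weights when evaluating.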

Multi GPU

For example, to train RT-DETR-R18 on COCO with 4 GPUs:

python -m torch.distributed.run --nproc_per_node=4 train.py --cuda -dist -d coco --root /data/datasets/ -m rtdetr_r18 -bs 16 -size 640 --max_epoch 72 --eval_epoch 1 --ema --sybn --multi_scale
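
The results table lists the batch as 4xb4, which suggests that `-bs 16` is the total batch size split across the 4 GPUs. Under that assumption, and assuming the learning rate scales linearly with the total batch size from the base per-image rate of 0.0001 / 16, the arithmetic can be sketched as follows. The helper names `per_gpu_batch_size` and `scaled_lr` are hypothetical and do not come from this repo.

```python
BASE_LR_PER_IMAGE = 0.0001 / 16  # base per-image learning rate from the training notes


def per_gpu_batch_size(total_batch_size: int, num_gpus: int) -> int:
    """Split the total batch evenly across GPUs (assumes -bs is the total)."""
    if total_batch_size % num_gpus != 0:
        raise ValueError("total batch size must be divisible by the number of GPUs")
    return total_batch_size // num_gpus


def scaled_lr(total_batch_size: int) -> float:
    """Linear scaling rule: learning rate = per-image rate * total batch size."""
    return BASE_LR_PER_IMAGE * total_batch_size


batch_per_gpu = per_gpu_batch_size(16, 4)  # 4 images per GPU, i.e. 4xb4
lr = scaled_lr(16)                         # about 0.0001
```

With a total batch size of 16, this reproduces the constant learning rate of 0.0001 stated in the training notes.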

Test RT-DETR

For example, to test RT-DETR-R18 on COCO-val:

python test.py --cuda -d coco --root path/to/coco -m rtdetr_r18 --weight path/to/rtdetr_r18.pth -size 640 -ct 0.4 --show
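
Assuming `-ct 0.4` is a confidence threshold (the flag is not documented in this section), its effect can be pictured as dropping every detection whose score falls below 0.4 before visualization. The `Detection` type and `filter_by_confidence` function below are hypothetical and only illustrate that filtering step; whether the repo uses `>=` or `>` is not known.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    box: Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)
    score: float                            # confidence in [0, 1]
    label: int                              # class index


def filter_by_confidence(dets: List[Detection], conf_thresh: float = 0.4) -> List[Detection]:
    """Keep only detections whose score is at least conf_thresh."""
    return [d for d in dets if d.score >= conf_thresh]


# Hypothetical detections for illustration.
detections = [
    Detection((10, 10, 50, 50), 0.92, 0),
    Detection((12, 14, 48, 52), 0.35, 0),
    Detection((100, 80, 160, 140), 0.41, 2),
]
visible = filter_by_confidence(detections, conf_thresh=0.4)
visible_labels = [d.label for d in visible]  # the 0.35 detection is dropped
```

A lower threshold shows more boxes at the cost of more false positives, so the value is mainly a visualization choice.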

Evaluate RT-DETR

For example, to evaluate RT-DETR-R18 on COCO-val:

python eval.py --cuda -d coco --root path/to/coco -m rtdetr_r18 --weight path/to/rtdetr_r18.pth -size 640

Demo

Detect with Image

python demo.py --mode image --path_to_img path/to/image_dirs/ --cuda -m rtdetr_r18 --weight path/to/weight -size 640 -ct 0.4 --show

Detect with Video

python demo.py --mode video --path_to_vid path/to/video --cuda -m rtdetr_r18 --weight path/to/weight -size 640 -ct 0.4 --show --gif

Detect with Camera

python demo.py --mode camera --cuda -m rtdetr_r18 --weight path/to/weight -size 640 -ct 0.4 --show --gif

Model	Batch	Scale	AP^val 0.5:0.95	AP^val 0.5	FLOPs (G)	Params (M)	Weight	Log
RT-DETR-R18	4xb4	640	45.5	63.0	66.8	21.0	ckpt	log
RT-DETR-R50	4xb4	640	50.2	68.5	113.7	40.4	ckpt	log
RT-DETR-R101	4xb4	640
