
YOLOX:

- For training, we train the YOLOX series for 300 epochs on COCO.
- For data augmentation, we use large-scale jitter (LSJ), Mosaic, and MixUp.
- For the optimizer, we use SGD with weight decay 0.0005 and a base per-image learning rate of 0.01 / 64.
- For the learning rate scheduler, we use cosine decay (see the sketch after this list).
- We are retraining YOLOX-M and YOLOX-L with more GPUs and will update their AP in the table below.
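
The following is a minimal PyTorch sketch of how these optimizer and schedule settings combine; it is illustrative only, not the repo's actual training code. `momentum=0.9` and `nesterov=True` are assumed common YOLOX defaults rather than settings stated above, and `model` is a placeholder.

```python
import torch

# 8 GPUs x 8 images per GPU (the 8xb8 setting in the table below).
total_batch_size = 64
base_lr_per_img = 0.01 / 64          # base per-image learning rate from the list above
lr = base_lr_per_img * total_batch_size  # assumed linear scaling with total batch size
max_epochs = 300

model = torch.nn.Conv2d(3, 16, 3)    # placeholder standing in for a YOLOX model

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=lr,
    momentum=0.9,                    # assumed; not stated in the list above
    weight_decay=5e-4,
    nesterov=True,                   # assumed; not stated in the list above
)

# Cosine decay over the full 300-epoch schedule.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_epochs)

for epoch in range(max_epochs):
    # ... run one training epoch with LSJ / Mosaic / MixUp augmentation ...
    scheduler.step()
```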

| Model   | Backbone     | Batch | Scale | AP<sup>val</sup><br>0.5:0.95 | AP<sup>val</sup><br>0.5 | FLOPs<br>(G) | Params<br>(M) | Weight |
| :-----: | :----------: | :---: | :---: | :--------------------------: | :---------------------: | :----------: | :-----------: | :----: |
| YOLOX-N | CSPDarkNet-N | 8xb8  | 640   | 30.4                         | 48.9                    | 7.5          | 2.3           | ckpt   |
| YOLOX-S | CSPDarkNet-S | 8xb8  | 640   | 39.0                         | 58.8                    | 26.8         | 8.9           | ckpt   |
| YOLOX-M | CSPDarkNet-M | 8xb8  | 640   | 44.6                         | 63.8                    | 74.3         | 25.4          | ckpt   |
| YOLOX-L | CSPDarkNet-L | 8xb8  | 640   | 48.7                         | 68.0                    | 155.4        | 54.2          | ckpt   |