For training, we train our YOLOvx series for 300 epochs on COCO.
For data augmentation, we use large-scale jitter (LSJ), Mosaic, and Mixup, following the setting of YOLOX, but we remove the rotation transformation used in YOLOX's strong augmentation.
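As a quick illustration of the Mixup augmentation mentioned above (a minimal sketch following the standard Mixup formulation with a Beta-distributed mixing ratio; the function name and plain-list image representation are ours, not this repository's implementation):

```python
import random

def mixup(img_a, img_b, alpha=1.0):
    # Blend two images pixel-wise with a random ratio lam ~ Beta(alpha, alpha).
    # Images are nested lists of floats here; a real pipeline would use tensors.
    lam = random.betavariate(alpha, alpha)
    blended = [[lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
               for row_a, row_b in zip(img_a, img_b)]
    # The labels of both images are kept and weighted by lam and (1 - lam).
    return blended, lam
```

In detection pipelines the same blending weight is also used to scale the loss contribution of each image's boxes.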
For the optimizer, we use AdamW with weight decay 0.05 and a base learning rate of 0.001 / 64 per image, scaled linearly with the total batch size.
For the learning rate scheduler, we use a linear decay schedule.
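The learning-rate settings above can be sketched as follows (a minimal illustration; the function names are ours, and we assume the linear schedule decays from the scaled base rate toward zero over training):

```python
def scaled_lr(batch_size, base_lr_per_image=0.001 / 64):
    # Effective learning rate grows linearly with the total batch size,
    # so batch size 64 recovers the base rate of 0.001.
    return base_lr_per_image * batch_size

def linear_decay(step, total_steps, init_lr):
    # Linearly decay the learning rate from init_lr to zero.
    return init_lr * (1.0 - step / total_steps)

lr = scaled_lr(batch_size=64)        # 0.001 at batch size 64
mid_lr = linear_decay(150, 300, lr)  # value halfway through 300 epochs
```

Under this linear-scaling rule, halving the batch size (as forced by limited memory) halves the effective learning rate as well.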
Due to our limited computing resources, we cannot train YOLOvx-X with a batch size of 128.