
update README

yjh0410 1 year ago
parent
commit
b3dca6aa77
8 changed files with 534 additions and 96 deletions
  1. requirements.txt (+0, -8)
  2. yolo/README.md (+349, -57)
  3. yolo/dataset/coco.py (+2, -8)
  4. yolo/dataset/customed.py (+6, -6)
  5. yolo/demo.py (+27, -16)
  6. yolo/tools/__init__.py (+0, -0)
  7. yolo/tools/convert_ours_to_coco.py (+149, -0)
  8. yolo/train.py (+1, -1)

+ 0 - 8
requirements.txt

@@ -16,12 +16,4 @@ imageio
 
 pycocotools
 
-onnxsim
-
-onnxruntime
-
-openvino
-
-loguru
-
 albumentations

+ 349 - 57
yolo/README.md

@@ -1,14 +1,265 @@
-# General Object Detection for Open World
+# YOLO Series Tutorial
+This directory contains the project code for the YOLO series tutorial; the RT-DETR series is also included in this project.
+
+## Environment setup
+- First, create a conda virtual environment. For example, we create one named `yolo_tutorial` with python=3.10:
+```Shell
+conda create -n yolo_tutorial python=3.10
+```
+
+- Then, activate the environment:
+```Shell
+conda activate yolo_tutorial
+```
+
+- Next, install the python packages and libraries used by this project.
+1. Install the required packages:
+```Shell
+pip install -r requirements.txt 
+```
+
+2. (optional) To speed up the deformable self-attention computation in the RT-DETR models, you may compile the CUDA ops for `MSDeformableAttention`:
+
+```bash
+cd models/rtdetr/basic_modules/ext_op/
+python setup_ms_deformable_attn_op.py install
+```
+
+My usual environment, for reference:
+- PyTorch = 2.2.0
+- Torchvision = 0.17.0
+
+If your device does not support torch 2.x, feel free to install another version; just make sure the torch version is above 1.0.
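+
+You can quickly confirm the installed versions and CUDA availability with a short check like the following:
+```Python
+import torch
+import torchvision
+
+print(torch.__version__)          # e.g. 2.2.0
+print(torchvision.__version__)    # e.g. 0.17.0
+print(torch.cuda.is_available())  # True if a usable CUDA device is detected
+```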
+
+## Experiments
+### Prepare the VOC data
+- Download the VOC2007 and VOC2012 data.
+```Shell
+cd <YOLO-TUTORIAL-V2/yolo/>
+cd dataset/scripts/
+sh VOC2007.sh
+sh VOC2012.sh
+```
+
+- Check the VOC data
+```Shell
+cd <YOLO-TUTORIAL-V2/yolo/>
+cd dataset/
+python voc.py --is_train --aug_type yolo
+```
+
+### COCO
+
+- Download the COCO 2017 data.
+```Shell
+cd <YOLO-TUTORIAL-V2/yolo/>
+cd dataset/scripts/
+sh COCO2017.sh
+```
+
+- Check the COCO data
+```Shell
+cd <YOLO-TUTORIAL-V2/yolo/>
+cd dataset/
+python coco.py --is_train --aug_type yolo
+```
+
+## Train
+For training, we provide a script named `train.sh` so that you can launch training with a single command. The script takes several command-line arguments; please follow the format below to use it correctly:
+
+```Shell
+bash train.sh <model> <data> <data_path> <batch_size> <num_gpus> <master_port> <resume_weight>
+```
+Here, `<model>` is the name of the model to train; `<data>` is the dataset name; `<data_path>` is the path to the dataset; `<batch_size>` and `<num_gpus>` are self-explanatory; `<master_port>` is the port used by DDP and can be set to any value, e.g. 1234 or 4662; `<resume_weight>` is a saved checkpoint used to resume an interrupted run; set it to `None` to train from scratch.
+
+For example, we use this script to train the `YOLOv1-R18` model of this project from scratch with 4 GPUs:
+```Shell
+bash train.sh yolov1_r18 coco path/to/coco 128 4 1699 None
+```
+
+If training is interrupted and we want to resume it, refer to the following command:
+```Shell
+bash train.sh yolov1_r18 coco path/to/coco 128 4 1699 path/to/yolov1_r18_coco.pth
+```
+Here, the last argument `path/to/yolov1_r18_coco.pth` is the checkpoint file saved during the previous training run.
+
+## Training on custom data
+Besides the two mainstream datasets, VOC and COCO, covered in this tutorial, this project also supports training on your own data. However, the data must be prepared from scratch following the requirements of this project, including annotation and conversion to the COCO format. If your data is already prepared but does not match this format, please look for another solution instead of forcing it into this project; otherwise we cannot help with any problems that arise. To use this project smoothly, please follow the steps below to prepare your data.
+
+- Step-1: Prepare the images (jpg, png, etc.) and organize them as a custom dataset, e.g. named `CustomedDataset`. Then create the label files with the open-source tool `labelimg`; please refer to its own documentation for usage. After annotation, you should end up with the following directory structure:
+
+```
+CustomedDataset
+|_ train
+|  |_ images     
+|     |_ 0.jpg
+|     |_ 1.jpg
+|     |_ ...
+|  |_ annotations
+|     |_ 0.xml
+|     |_ 1.xml
+|     |_ ...
+|_ val
+|  |_ images     
+|     |_ 0.jpg
+|     |_ 1.jpg
+|     |_ ...
+|  |_ annotations
+|     |_ 0.xml
+|     |_ 1.xml
+|     |_ ...
+|  ...
+```
+
+- Step-2: Modify the data-related configuration
+You need to edit the `customed_class_indexs` and `customed_class_labels` defined in `dataset/customed.py`; the former holds the class indices and the latter the class names. For example, we use the following definition, which you can take as a reference:
+```Shell
+# dataset/customed.py
+customed_class_indexs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
+customed_class_labels = ('bird', 'butterfly', 'cat', 'cow', 'dog', 'lion', 'person', 'pig', 'tiger', )
+```
+
+- Step-3: Convert the data to the COCO json format
+Although the labels have already been annotated in XML format with `labelimg`, we recommend converting them further into the COCO json format so that the COCO evaluation tools can be used. Run the commands below to prepare the json files for the `train` and `val` splits.
+
+```Shell
+cd <YOLO-TUTORIAL-V2/yolo/>
+cd tools
+# convert train split
+python convert_ours_to_coco.py --root path/to/customed_dataset/ --split train
+# convert val split
+python convert_ours_to_coco.py --root path/to/customed_dataset/ --split val
+```
+
+Then, we get a file named `train.json` and a file named `val.json`, as shown below.
+```
+CustomedDataset
+|_ train
+|  |_ images     
+|     |_ 0.jpg
+|     |_ 1.jpg
+|     |_ ...
+|  |_ annotations
+|     |_ 0.xml
+|     |_ 1.xml
+|     |_ ...
+|     |_ train.json
+|_ val
+|  |_ images     
+|     |_ 0.jpg
+|     |_ 1.jpg
+|     |_ ...
+|  |_ annotations
+|     |_ 0.xml
+|     |_ 1.xml
+|     |_ ...
+|     |_ val.json
+|  ...
+```
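+
+(Optional) If you want to double-check the converted annotations, the small sketch below loads the generated json with `pycocotools` (already listed in requirements.txt) and prints a few statistics; the path is only an example and should point to your own file:
+```Python
+from pycocotools.coco import COCO
+
+# Example path; replace it with the json file generated for your split
+coco = COCO("path/to/customed_dataset/train/annotations/train.json")
+
+print("images     :", len(coco.getImgIds()))
+print("annotations:", len(coco.getAnnIds()))
+print("categories :", [c["name"] for c in coco.loadCats(coco.getCatIds())])
+```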
+
+- Step-4: Check the data
+Next, verify that the data is ready by running the dataset script and looking at the visualized samples. If the visualizations show up correctly, the data is ready. Refer to the following commands.
+
+```Shell
+cd <YOLO-TUTORIAL-V2/yolo/>
+cd dataset
+# convert train split
+python customed.py --root path/to/customed_dataset/ --split train
+# convert val split
+python customed.py --root path/to/customed_dataset/ --split val
+```
+
+- Step-5: Train on the custom data
+Now we can train the models of this project on the custom data. For example, to train the `YOLOv1-R18` model, refer to the following command:
+
+```Shell
+cd <YOLO-TUTORIAL-V2/yolo/>
+bash train.sh yolov1_r18 customed path/to/customed_dataset/ 128 4 1699 None
+```
+
+- Step-6: Test on the custom data
+
+We can then test the trained model on the custom data. For example, to test the `YOLOv1-R18` model and visualize the detection results, refer to the following command:
+
+```Shell
+cd <YOLO-TUTORIAL-V2/yolo/>
+python test.py --root path/to/customed_dataset/ -d customed -m yolov1_r18 --weight path/to/checkpoint --show
+```
+
+- Step-7: Evaluate on the custom data
+
+We can also evaluate the trained model on the custom data. For example, to evaluate the `YOLOv1-R18` model and obtain the AP metrics, refer to the following command:
+
+```Shell
+cd <YOLO-TUTORIAL-V2/yolo/>
+python eval.py --root path/to/customed_dataset/ -d customed -m yolov1_r18 --weight path/to/checkpoint
+```
+
+## Demo
+This project provides `demo.py` for running detection on local data. Models trained on VOC/COCO/custom data can be used to process local images, local videos, or a laptop camera.
+
+For example, to run the COCO-trained `YOLOv1_R18` model on local images:
+```Shell
+python demo.py --mode image \
+               --path_to_img data/demo/images/ \
+               --cuda \
+               --img_size 640 \
+               --model yolov1_r18 \
+               --weight path/to/weight \
+               --dataset coco \
+               --show
+```
+
+For example, to run the COCO-trained `YOLOv1_R18` model on a local video and save the result as a GIF:
+```Shell
+python demo.py --mode video \
+               --path_to_vid data/demo/video \
+               --cuda \
+               --img_size 640 \
+               --model yolov1_r18 \
+               --weight path/to/weight \
+               --dataset coco \
+               --show \
+               --gif
+```
+
+For example, to run real-time detection with the laptop camera using the COCO-trained `YOLOv1_R18` model and save the result as a GIF:
+```Shell
+python demo.py --mode camera \
+               --cuda \
+               --img_size 640 \
+               --model yolov1_r18 \
+               --weight path/to/weight \
+               --dataset coco \
+               --show \
+               --gif
+```
+If you use an external camera, you need to slightly adjust the code in `demo.py`, as shown below:
+```Python
+    # --------------- Camera ---------------
+    if mode == 'camera':
+        ...
+
+        # laptop camera: index=0; external camera: index=1
+        cap = cv2.VideoCapture(index=0, apiPreference=cv2.CAP_DSHOW)
+    
+    ...
+```
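+
+If you are not sure which index your camera uses, a small OpenCV probe like the one below (not part of `demo.py`) can help you find an available device:
+```Python
+import cv2
+
+# Try the first few camera indices; purely a quick check
+for index in range(3):
+    cap = cv2.VideoCapture(index)
+    ok, _ = cap.read()
+    print("camera index {}: {}".format(index, "available" if ok else "not available"))
+    cap.release()
+```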
+
+
+-------------------
+
+# Tutorial of YOLO series
 
 ## Requirements
-- We recommend you to use Anaconda to create a conda environment:
+- We recommend using Anaconda to create a conda environment. For example, we create one named `yolo_tutorial` with python=3.10:
 ```Shell
-conda create -n odlab python=3.10
+conda create -n yolo_tutorial python=3.10
 ```
 
 - Then, activate the environment:
 ```Shell
-conda activate odlab
+conda activate yolo_tutorial
 ```
 
 - Requirements:
@@ -17,26 +268,24 @@ conda activate odlab
 pip install -r requirements.txt 
 ```
 
-2. (optional) Compile MSDeformableAttention ops for DETR series
+2. (optional) Compile `MSDeformableAttention` ops for RT-DETR series
 
 ```bash
-cd ./ppdet/modeling/transformers/ext_op/
-
+cd models/rtdetr/basic_modules/ext_op/
 python setup_ms_deformable_attn_op.py install
 ```
-See [details](./models/detectors/rtdetr/basic_modules/ext_op/)
 
 My environment:
 - PyTorch = 2.2.0
 - Torchvision = 0.17.0
 
-At least, please make sure your torch is version 1.x.
+At the very least, please make sure your torch version is >= 1.0.
 
 ## Experiments
 ### VOC
 - Download VOC.
 ```Shell
-cd <ODLab-World>
+cd <YOLO-TUTORIAL-V2/yolo/>
 cd dataset/scripts/
 sh VOC2007.sh
 sh VOC2012.sh
@@ -44,31 +293,25 @@ sh VOC2012.sh
 
 - Check VOC
 ```Shell
-cd <ODLab-World>
-python dataset/voc.py
+cd <YOLO-TUTORIAL-V2/yolo/>
+cd dataset/
+python voc.py --is_train --aug_type yolo
 ```
 
 ### COCO
 
 - Download COCO.
 ```Shell
-cd <ODLab-World>
+cd <YOLO-TUTORIAL-V2/yolo/>
 cd dataset/scripts/
 sh COCO2017.sh
 ```
 
-- Clean COCO
-```Shell
-cd <ODLab-World>
-cd tools/
-python clean_coco.py --root path/to/coco --image_set val
-python clean_coco.py --root path/to/coco --image_set train
-```
-
 - Check COCO
 ```Shell
-cd <ODLab-World>
-python dataset/coco.py
+cd <YOLO-TUTORIAL-V2/yolo/>
+cd dataset/
+python coco.py --is_train --aug_type yolo
 ```
 
 ## Train 
@@ -77,14 +320,14 @@ We kindly provide a script `train.sh` to run the training code. You need to foll
 bash train.sh <model> <data> <data_path> <batch_size> <num_gpus> <master_port> <resume_weight>
 ```
 
-For example, we use this script to train YOLO-N from the epoch-0:
+For example, we use this script to train `YOLOv1-R18` from scratch (epoch 0) with 4 GPUs:
 ```Shell
-bash train.sh yolo_n coco path/to/coco 128 4 1699 None
+bash train.sh yolov1_r18 coco path/to/coco 128 4 1699 None
 ```
 
-We can also continue training from existing weights by passing the model's weight file to the resume parameter.
+We can also resume training from an existing checkpoint by passing the model's weight file to the resume parameter.
 ```Shell
-bash train.sh yolo_n coco path/to/coco 128 4 1699 path/to/yolo_n.pth
+bash train.sh yolov1_r18 coco path/to/coco 128 4 1699 path/to/yolov1_r18_coco.pth
 ```
 
 
@@ -116,33 +359,24 @@ CustomedDataset
 ```
 
 - Step-2: Make the configuration for our dataset.
-```Shell
-cd <ODLab-World>
-cd config/data_config
-```
-You need to edit the `dataset_cfg` defined in `dataset_config.py`. You can refer to the `customed` defined in `dataset_cfg` to modify the relevant parameters, such as `num_classes`, `classes_names`, to adapt to our dataset.
+You need to edit the `customed_class_indexs` and `customed_class_labels` defined in `dataset/customed.py` to adapt them to your customed dataset.
 
 For example:
 ```Shell
-dataset_cfg = {
-    'customed':{
-        'data_name': 'AnimalDataset',
-        'num_classes': 9,
-        'class_indexs': (0, 1, 2, 3, 4, 5, 6, 7, 8),
-        'class_names': ('bird', 'butterfly', 'cat', 'cow', 'dog', 'lion', 'person', 'pig', 'tiger', ),
-    },
-}
+# dataset/customed.py
+customed_class_indexs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
+customed_class_labels = ('bird', 'butterfly', 'cat', 'cow', 'dog', 'lion', 'person', 'pig', 'tiger', )
 ```
 
 - Step-3: Convert customed to COCO format.
 
 ```Shell
-cd <ODLab-World>
+cd <YOLO-TUTORIAL-V2/yolo/>
 cd tools
 # convert train split
-python convert_ours_to_coco.py --root path/to/dataset/ --split train
+python convert_ours_to_coco.py --root path/to/customed_dataset/ --split train
 # convert val split
-python convert_ours_to_coco.py --root path/to/dataset/ --split val
+python convert_ours_to_coco.py --root path/to/customed_dataset/ --split val
 ```
 Then, we can get a `train.json` file and a `val.json` file, as shown below.
 ```
@@ -173,40 +407,98 @@ CustomedDataset
 - Step-4 Check the data.
 
 ```Shell
-cd <ODLab-World>
+cd <YOLO-TUTORIAL-V2/yolo/>
 cd dataset
 # convert train split
-python customed.py --root path/to/dataset/ --split train
+python customed.py --root path/to/customed_dataset/ --split train
 # convert val split
-python customed.py --root path/to/dataset/ --split val
+python customed.py --root path/to/customed_dataset/ --split val
 ```
 
-- Step-5 **Train**
+- Step-5 **Train** on the customed dataset
 
 For example:
 
 ```Shell
-cd <ODLab-World>
-python train.py --root path/to/dataset/ -d customed -m yolo_n -bs 16 -p path/to/yolo_n_coco.pth
+# With coco pretrained weight
+cd <YOLO-TUTORIAL-V2/yolo/>
+python train.py --root path/to/customed_dataset/ -d customed -m yolov1_r18 -bs 16 -p path/to/yolov1_r18_coco.pth
+```
+
+```Shell
+# Without coco pretrained weight
+cd <YOLO-TUTORIAL-V2/yolo/>
+python train.py --root path/to/customed_dataset/ -d customed -m yolov1_r18 -bs 16
 ```
 
-- Step-6 **Test**
+- Step-6 **Test** on the customed dataset
 
 For example:
 
 ```Shell
-cd <ODLab-World>
-python test.py --root path/to/dataset/ -d customed -m yolo_n --weight path/to/checkpoint --show
+cd <YOLO-TUTORIAL-V2/yolo/>
+python test.py --root path/to/customed_dataset/ -d customed -m yolov1_r18 --weight path/to/checkpoint --show
 ```
 
-- Step-7 **Eval**
+- Step-7 **Eval** on the customed dataset
 
 For example:
 
 ```Shell
-cd <ODLab-World>
-python eval.py --root path/to/dataset/ -d customed -m yolo_n --weight path/to/checkpoint
+cd <YOLO-TUTORIAL-V2/yolo/>
+python eval.py --root path/to/customed_dataset/ -d customed -m yolov1_r18 --weight path/to/checkpoint
 ```
 
-## Deployment
-1. [ONNX export and an ONNXRuntime](./deployment/ONNXRuntime/)
+## Demo
+I have provided some images in `data/demo/images/`, so you can run the following command to run a demo with a COCO-pretrained model:
+
+```Shell
+python demo.py --mode image \
+               --path_to_img data/demo/images/ \
+               --cuda \
+               --img_size 640 \
+               --model yolov1_r18 \
+               --weight path/to/weight \
+               --dataset coco \
+               --show
+```
+
+If you want to try this command with a VOC-pretrained model, you can refer to the following command:
+```Shell
+python demo.py --mode image \
+               --path_to_img data/demo/images/ \
+               --cuda \
+               --img_size 640 \
+               --model yolov1_r18 \
+               --weight path/to/weight \
+               --dataset voc \
+               --show
+```
+
+
+If you want to run a demo of streaming video detection, you need to set `--mode` to `video` and pass the path to the video via `--path_to_vid`.
+
+```Shell
+python demo.py --mode video \
+               --path_to_vid data/demo/videos/your_video \
+               --cuda \
+               --img_size 640 \
+               --model yolov1_r18 \
+               --weight path/to/weight \
+               --show \
+               --gif
+```
+
+If you want to run video detection with your camera, you need to set `--mode` to `camera`.
+
+```Shell
+python demo.py --mode camera \
+               --cuda \
+               --img_size 640 \
+               --model yolov1_r18 \
+               --weight path/to/weight \
+               --show \
+               --gif
+```

+ 2 - 8
yolo/dataset/coco.py

@@ -15,8 +15,6 @@ except:
 coco_class_indexs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90]
 coco_class_labels = ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat',  'traffic light',  'fire hydrant',  'stop sign',  'parking meter',  'bench',  'bird',  'cat',  'dog',  'horse',  'sheep',  'cow',  'elephant',  'bear',  'zebra',  'giraffe',  'backpack',  'umbrella',  'handbag',  'tie',  'suitcase',  'frisbee',  'skis',  'snowboard',  'sports ball',  'kite',  'baseball bat',  'baseball glove',  'skateboard',  'surfboard',  'tennis racket',  'bottle',  'wine glass',  'cup',  'fork',  'knife',  'spoon',  'bowl',  'banana',  'apple',  'sandwich',  'orange',  'broccoli',  'carrot',  'hot dog',  'pizza',  'donut',  'cake',  'chair',  'couch',  'potted plant',  'bed',  'dining table',  'toilet',  'tv',  'laptop',  'mouse',  'remote',  'keyboard',  'cell phone',  'microwave',  'oven',  'toaster',  'sink',  'refrigerator',  'book',  'clock',  'vase',  'scissors',  'teddy bear',  'hair drier',  'toothbrush')
 coco_json_files = {
-    'train2017_clean': 'instances_train2017_clean.json',
-    'val2017_clean'  : 'instances_val2017_clean.json',
     'train2017'      : 'instances_train2017.json',
     'val2017'        : 'instances_val2017.json',
     'test2017'       : 'image_info_test.json',
@@ -39,12 +37,8 @@ class COCODataset(Dataset):
         self.use_mask  = use_mask
         self.num_classes = 80
         # ----------- Data parameters -----------
-        try:
-            self.json_file = coco_json_files['{}_clean'.format(image_set)]
-            self.coco = COCO(os.path.join(self.data_dir, 'annotations', self.json_file))
-        except:
-            self.json_file = coco_json_files['{}'.format(image_set)]
-            self.coco = COCO(os.path.join(self.data_dir, 'annotations', self.json_file))
+        self.json_file = coco_json_files['{}'.format(image_set)]
+        self.coco = COCO(os.path.join(self.data_dir, 'annotations', self.json_file))
         self.ids = self.coco.getImgIds()
         self.class_ids = sorted(self.coco.getCatIds())
         self.dataset_size = len(self.ids)

+ 6 - 6
yolo/dataset/customed.py

@@ -28,7 +28,6 @@ class CustomedDataset(Dataset):
         self.image_set = image_set
         self.is_train  = is_train
         self.num_classes = len(customed_class_labels)
-        self.num_classes = 9
         # ----------- Path parameters -----------
         self.data_dir = data_dir
         self.json_file = '{}.json'.format(image_set)
@@ -200,6 +199,8 @@ if __name__ == "__main__":
                         help='data root')
     parser.add_argument('--is_train', action="store_true", default=False,
                         help='mixup augmentation.')
+    parser.add_argument('--aug_type', default="yolo", type=str, choices=["yolo", "ssd"],
+                        help='yolo, ssd.')
     
     args = parser.parse_args()
 
@@ -218,7 +219,6 @@ if __name__ == "__main__":
             ## Transforms
             self.train_img_size = 640
             self.test_img_size  = 640
-            self.random_crop_size = [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800]
             self.use_ablu = True
             self.aug_type = 'yolo'
             self.affine_params = {
@@ -232,7 +232,7 @@ if __name__ == "__main__":
                 'hsv_v': 0.4,
             }
 
-    class RTDetrBaseConfig(object):
+    class SSDBaseConfig(object):
         def __init__(self) -> None:
             self.max_stride = 32
             # ---------------- Data process config ----------------
@@ -247,12 +247,12 @@ if __name__ == "__main__":
             ## Transforms
             self.train_img_size = 640
             self.test_img_size  = 640
-            self.aug_type = 'rtdetr'
+            self.aug_type = 'ssd'
 
     if args.aug_type == "yolo":
         cfg = YoloBaseConfig()
-    elif args.aug_type == "rtdetr":
-        cfg = RTDetrBaseConfig()
+    elif args.aug_type == "ssd":
+        cfg = SSDBaseConfig()
 
     transform = build_transform(cfg, args.is_train)
     dataset = CustomedDataset(cfg, args.root, 'val', transform, args.is_train)

+ 27 - 16
yolo/demo.py

@@ -16,8 +16,11 @@ from utils.box_ops import rescale_bboxes
 from utils.vis_tools import visualize
 
 from models import build_model
-from config import build_config
+from config import build_config
+
+from dataset.voc  import voc_class_labels
 from dataset.coco import coco_class_labels
+from dataset.customed import customed_class_labels
 
 
 def parse_args():
@@ -43,18 +46,14 @@ def parse_args():
     # Model setting
     parser.add_argument('-m', '--model', default='yolo_n', type=str,
                         help='build yolo')
-    parser.add_argument('-nc', '--num_classes', default=80, type=int,
-                        help='number of classes.')
     parser.add_argument('--weight', default=None,
                         type=str, help='Trained state_dict file path to open')
-    parser.add_argument("--deploy", action="store_true", default=False,
-                        help="deploy mode or not")
     parser.add_argument('--fuse_conv_bn', action='store_true', default=False,
                         help='fuse Conv & BN')
 
     # Data setting
     parser.add_argument('-d', '--dataset', default='coco',
-                        help='coco, voc, crowdhuman, widerface.')
+                        help='coco, voc, customed.')
 
     return parser.parse_args()
                     
@@ -86,7 +85,8 @@ def detect(args,
         print(save_video_name)
         image_list = []
 
-        cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
+        # laptop camera: index=0; external camera: index=1
+        cap = cv2.VideoCapture(index=0, apiPreference=cv2.CAP_DSHOW)
         while True:
             ret, frame = cap.read()
             if ret:
@@ -252,30 +252,41 @@ def detect(args,
 
 def run():
     args = parse_args()
+    # Build config
+    cfg = build_config(args)
+
+    # Dataset config
+    if   args.dataset == "voc":
+        cfg.num_classes = 20
+        cfg.class_labels = voc_class_labels
+    elif args.dataset == "coco":
+        cfg.num_classes = 80
+        cfg.class_labels = coco_class_labels
+    elif args.dataset == "customed":
+        cfg.num_classes = len(customed_class_labels)
+        cfg.class_labels = customed_class_labels
+    else:
+        raise NotImplementedError("Unknown dataset: {}".format(args.dataset))
+    
     # cuda
-    if args.cuda:
+    if args.cuda and torch.cuda.is_available():
         print('use cuda')
         device = torch.device("cuda")
     else:
         device = torch.device("cpu")
 
-    # config
-    cfg = build_config(args)
-    cfg.num_classes = 80
-    cfg.class_labels = coco_class_labels
-    
-    # build model
+
+    # Build model
     model = build_model(args, cfg, False)
 
-    # load trained weight
+    # Load trained weight
     model = load_weight(model, args.weight, args.fuse_conv_bn)
     model.to(device).eval()
 
-    # transform
+    # Build transform
     transform = build_transform(cfg, is_train=False)
 
     print("================= DETECT =================")
-    # run
+    # Run demo
     detect(args         = args,
            mode         = args.mode,
            model        = model, 

+ 0 - 0
yolo/tools/__init__.py


+ 149 - 0
yolo/tools/convert_ours_to_coco.py

@@ -0,0 +1,149 @@
+import os
+import json
+import xml.etree.ElementTree as ET
+import glob
+
+import sys
+sys.path.append("..")
+from dataset.customed import customed_class_labels
+num_classes = len(customed_class_labels)
+categories  = customed_class_labels
+START_BOUNDING_BOX_ID = 1
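+# COCO category ids are 1-based, so each class name is mapped to its index + 1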
+PRE_DEFINE_CATEGORIES = {categories[i]: i + 1 for i in range(num_classes)} 
+
+
+def get(root, name):
+    vars = root.findall(name)
+    return vars
+
+def get_and_check(root, name, length):
+    vars = root.findall(name)
+    if len(vars) == 0:
+        raise ValueError("Can not find %s in %s." % (name, root.tag))
+    if length > 0 and len(vars) != length:
+        raise ValueError(
+            "The size of %s is supposed to be %d, but is %d."
+            % (name, length, len(vars))
+        )
+    if length == 1:
+        vars = vars[0]
+    return vars
+
+def get_filename_as_int(filename):
+    try:
+        filename = filename.replace("\\", "/")
+        filename = os.path.splitext(os.path.basename(filename))[0]
+        return int(filename)
+    except:
+        raise ValueError("Filename %s is supposed to be an integer." % (filename))
+
+def get_categories(xml_files):
+    """Generate category name to id mapping from a list of xml files.
+    
+    Arguments:
+        xml_files {list} -- A list of xml file paths.
+    
+    Returns:
+        dict -- category name to id mapping.
+    """
+    classes_names = []
+    for xml_file in xml_files:
+        tree = ET.parse(xml_file)
+        root = tree.getroot()
+        for member in root.findall("object"):
+            classes_names.append(member[0].text)
+    classes_names = list(set(classes_names))
+    classes_names.sort()
+    return {name: i for i, name in enumerate(classes_names)}
+
+def convert(xml_files, json_file):
+    json_dict = {"images": [], "type": "instances", "annotations": [], "categories": []}
+    if PRE_DEFINE_CATEGORIES is not None:
+        categories = PRE_DEFINE_CATEGORIES
+    else:
+        categories = get_categories(xml_files)
+    bnd_id = START_BOUNDING_BOX_ID
+    for i, xml_file in enumerate(xml_files):
+        if i % 100 == 0:
+            print('[{}] / [{}]'.format(i, len(xml_files)))
+        tree = ET.parse(xml_file)
+        root = tree.getroot()
+        filename = get_and_check(root, "filename", 1).text
+        ## The filename must be a number
+        image_id = get_filename_as_int(filename)
+        size = get_and_check(root, "size", 1)
+        width = int(get_and_check(size, "width", 1).text)
+        height = int(get_and_check(size, "height", 1).text)
+        image = {
+            "file_name": filename,
+            "height": height,
+            "width": width,
+            "id": image_id,
+        }
+        json_dict["images"].append(image)
+        ## Currently we do not support segmentation.
+        #  segmented = get_and_check(root, 'segmented', 1).text
+        #  assert segmented == '0'
+        for obj in get(root, "object"):
+            category = get_and_check(obj, "name", 1).text
+            if category not in categories:
+                new_id = len(categories)
+                categories[category] = new_id
+            category_id = categories[category]
+            bndbox = get_and_check(obj, "bndbox", 1)
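+            # labelimg/VOC boxes are 1-based [xmin, ymin, xmax, ymax]; shift xmin/ymin to 0-based and store as COCO-style [x, y, w, h]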
+            xmin = int(get_and_check(bndbox, "xmin", 1).text) - 1
+            ymin = int(get_and_check(bndbox, "ymin", 1).text) - 1
+            xmax = int(get_and_check(bndbox, "xmax", 1).text)
+            ymax = int(get_and_check(bndbox, "ymax", 1).text)
+            assert xmax > xmin
+            assert ymax > ymin
+            o_width = abs(xmax - xmin)
+            o_height = abs(ymax - ymin)
+            ann = {
+                "area": o_width * o_height,
+                "iscrowd": 0,
+                "image_id": image_id,
+                "bbox": [xmin, ymin, o_width, o_height],
+                "category_id": category_id,
+                "id": bnd_id,
+                "ignore": 0,
+                "segmentation": [],
+            }
+            json_dict["annotations"].append(ann)
+            bnd_id = bnd_id + 1
+
+    for cate, cid in categories.items():
+        cat = {"supercategory": "none", "id": cid, "name": cate}
+        json_dict["categories"].append(cat)
+
+    os.makedirs(os.path.dirname(json_file), exist_ok=True)
+    json_fp = open(json_file, "w")
+    json_str = json.dumps(json_dict)
+    json_fp.write(json_str)
+    json_fp.close()
+
+
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser(
+        description="Convert VOC-style annotations labeled by LabelImg to COCO format."
+    )
+    parser.add_argument("--root", default="path/to/customed_dataset", type=str,
+                        help="Directory path to dataset.", )
+    parser.add_argument("--split", default='train', 
+                        help="split of dataset.", type=str)
+    parser.add_argument("-anno", "--annotations", default='annotations', 
+                        help="Directory path to xml files.", type=str)
+    parser.add_argument("-json", "--json_file", default='train.json',
+                        help="Output COCO format json file.", type=str)
+    args = parser.parse_args()
+
+    data_dir = os.path.join(args.root, args.split)
+    anno_path = os.path.join(data_dir, args.annotations)
+    xml_files = glob.glob(os.path.join(anno_path, "*.xml"))
+    json_file = os.path.join(data_dir, args.annotations, '{}.json'.format(args.split))
+    print("Number of xml files: {}".format(len(xml_files)))
+    print("Converting to COCO format ...")
+    convert(xml_files, json_file)
+    print("Success: {}".format(json_file))

+ 1 - 1
yolo/train.py

@@ -196,7 +196,7 @@ def train():
         trainer.eval(model_eval)
         return
 
-    garbage = torch.randn(640, 1024, 73, 73).to(device) # 15 G
+    # garbage = torch.randn(640, 1024, 73, 73).to(device) # 15 G
 
     # ---------------------------- Train pipeline ----------------------------
     trainer.train(model)