
Masked AutoEncoder

1. Pretrain

We provide the bash script main_pretrain.sh for pretraining. You can modify the hyperparameters in the script to suit your own needs.

cd Vision-Pretraining-Tutorial/masked_image_modeling/
python main_pretrain.py --cuda \
                        --dataset cifar10 \
                        --model vit_t \
                        --mask_ratio 0.75 \
                        --batch_size 128 \
                        --optimizer adamw \
                        --weight_decay 0.05 \
                        --lr_scheduler cosine \
                        --base_lr 0.00015 \
                        --min_lr 0.0 \
                        --max_epoch 400 \
                        --eval_epoch 20
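
With --mask_ratio 0.75, MAE hides 75% of the image patches and trains the model to reconstruct them from the visible 25%, so the encoder only ever processes a quarter of the tokens. Below is a minimal sketch of MAE-style per-sample random masking; the function name and tensor shapes are illustrative assumptions, not this repo's exact code.

    # Sketch of MAE-style per-sample random masking (illustrative, not this repo's code).
    import torch

    def random_masking(x, mask_ratio=0.75):
        # x: [B, N, D] patch embeddings
        B, N, D = x.shape
        len_keep = int(N * (1 - mask_ratio))

        noise = torch.rand(B, N, device=x.device)        # uniform noise per patch
        ids_shuffle = torch.argsort(noise, dim=1)        # random permutation of patches
        ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

        # keep the first len_keep patches of the shuffled order
        ids_keep = ids_shuffle[:, :len_keep]
        x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

        # binary mask: 0 = kept (visible), 1 = masked (to be reconstructed)
        mask = torch.ones(B, N, device=x.device)
        mask[:, :len_keep] = 0
        mask = torch.gather(mask, 1, ids_restore)
        return x_visible, mask, ids_restore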

2. Finetune

We provide the bash script main_finetune.sh for finetuning. You can modify the hyperparameters in the script to suit your own needs.

cd Vision-Pretraining-Tutorial/masked_image_modeling/
python main_finetune.py --cuda \
                        --dataset cifar10 \
                        --model vit_t \
                        --batch_size 256 \
                        --optimizer adamw \
                        --weight_decay 0.05 \
                        --base_lr 0.0005 \
                        --min_lr 0.000001 \
                        --max_epoch 100 \
                        --wp_epoch 5 \
                        --eval_epoch 5 \
                        --pretrained path/to/vit_t.pth
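
Finetuning initializes the encoder from the MAE checkpoint passed via --pretrained, then trains with a short linear warmup (--wp_epoch 5) followed by cosine decay from --base_lr down to --min_lr. The sketch below shows one common form of such a warmup + cosine schedule; the repo's exact implementation may differ (e.g. per-iteration rather than per-epoch updates).

    # Sketch of a linear-warmup + cosine-decay learning-rate schedule
    # matching the flags above (assumed behavior, not this repo's exact code).
    import math

    def lr_at_epoch(epoch, base_lr=0.0005, min_lr=1e-6, wp_epoch=5, max_epoch=100):
        if epoch < wp_epoch:
            # linear warmup from 0 to base_lr
            return base_lr * epoch / wp_epoch
        # cosine decay from base_lr down to min_lr
        progress = (epoch - wp_epoch) / (max_epoch - wp_epoch)
        return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))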

3. Evaluate

  • Evaluate the top-1 & top-5 accuracy of ViT-Tiny on the CIFAR10 dataset (a minimal sketch of the metric computation follows the command):

    python main_finetune.py --cuda \
                        --dataset cifar10 \
                        -m vit_t \
                        --batch_size 256 \
                        --eval \
                        --resume path/to/vit_t_cifar10.pth
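
    A minimal sketch of how top-1 / top-5 accuracy is typically computed from classifier logits (illustrative only, not this repo's exact evaluation loop):

    # Sketch of top-k accuracy computation (illustrative).
    import torch

    @torch.no_grad()
    def topk_accuracy(logits, targets, ks=(1, 5)):
        # logits: [B, num_classes], targets: [B]
        maxk = max(ks)
        _, pred = logits.topk(maxk, dim=1)        # [B, maxk] predicted class indices
        correct = pred.eq(targets.unsqueeze(1))   # [B, maxk] boolean hits
        # a sample counts as correct at k if the target is among the top-k predictions
        return [correct[:, :k].any(dim=1).float().mean().item() * 100 for k in ks]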
    

4. Visualize Image Reconstruction

  • Visualize the images reconstructed by a ViT-Tiny pretrained with the MAE framework on the CIFAR10 dataset (see the unpatchify sketch after the command):

    python main_pretrain.py --cuda \
                        --dataset cifar10 \
                        -m vit_t \
                        --resume path/to/mae_vit_t_cifar10.pth \
                        --eval \
                        --batch_size 1
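
    With --batch_size 1, each iteration reconstructs a single image so it can be displayed. The sketch below shows the MAE-style "unpatchify" step that folds predicted per-patch pixels back into a full image; the patch size and function name are illustrative assumptions, not this repo's API.

    # Sketch of MAE-style unpatchify: per-patch pixel predictions -> image.
    import torch

    def unpatchify(pred, patch_size=16, img_channels=3):
        # pred: [B, N, patch_size*patch_size*C] predicted pixel values per patch
        B, N, _ = pred.shape
        h = w = int(N ** 0.5)  # patches per side, assuming a square grid
        x = pred.reshape(B, h, w, patch_size, patch_size, img_channels)
        x = torch.einsum('bhwpqc->bchpwq', x)  # regroup into image layout
        return x.reshape(B, img_channels, h * patch_size, w * patch_size)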
    

5. Experiments

  • On CIFAR10
| Method | Model | Epoch | Top-1 Acc. (%) | Weight | MAE weight |
|--------|-------|-------|----------------|--------|------------|
| MAE    | ViT-T | 100   | 91.2           | ckpt   | ckpt       |

6. Acknowledgment

Thanks to Kaiming He for his inspiring work on MAE; his research sheds light on the semantic differences between vision and language and offers valuable insights for subsequent vision studies. Thanks as well to the authors of the official MAE source code, and to IcarusWizard for reproducing the MAE implementation.