
Masked AutoEncoder

1. Pretrain

We provide the bash script main_pretrain.sh for pretraining. You can modify the hyperparameters in the script to suit your own needs.

cd Vision-Pretraining-Tutorial/masked_image_modeling/
python main_pretrain.py --cuda \
                        --dataset cifar10 \
                        --model vit_t \
                        --mask_ratio 0.75 \
                        --batch_size 128 \
                        --optimizer adamw \
                        --weight_decay 0.05 \
                        --lr_scheduler cosine \
                        --base_lr 0.00015 \
                        --min_lr 0.0 \
                        --max_epoch 400 \
                        --eval_epoch 20
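
With --mask_ratio 0.75, MAE hides 75% of the image patches and trains the model to reconstruct them from the visible 25%, so the encoder only ever processes a quarter of the tokens. Below is a minimal sketch of MAE-style per-sample random masking; the function name and tensor shapes are illustrative assumptions, not this repo's exact code.

    # Sketch of MAE-style per-sample random masking (illustrative, not this repo's code).
    import torch

    def random_masking(x, mask_ratio=0.75):
        # x: [B, N, D] patch embeddings
        B, N, D = x.shape
        len_keep = int(N * (1 - mask_ratio))

        noise = torch.rand(B, N, device=x.device)        # uniform noise per patch
        ids_shuffle = torch.argsort(noise, dim=1)        # random permutation of patches
        ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

        # keep the first len_keep patches of the shuffled order
        ids_keep = ids_shuffle[:, :len_keep]
        x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

        # binary mask: 0 = kept (visible), 1 = masked (to be reconstructed)
        mask = torch.ones(B, N, device=x.device)
        mask[:, :len_keep] = 0
        mask = torch.gather(mask, 1, ids_restore)
        return x_visible, mask, ids_restore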

2. Finetune

We provide the bash script main_finetune.sh for finetuning. You can modify the hyperparameters in the script to suit your own needs.

cd Vision-Pretraining-Tutorial/masked_image_modeling/
python main_finetune.py --cuda \
                        --dataset cifar10 \
                        --model vit_t \
                        --batch_size 256 \
                        --optimizer adamw \
                        --weight_decay 0.05 \
                        --base_lr 0.0005 \
                        --min_lr 0.000001 \
                        --max_epoch 100 \
                        --wp_epoch 5 \
                        --eval_epoch 5 \
                        --pretrained path/to/vit_t.pth
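
Finetuning initializes the encoder from the MAE checkpoint passed via --pretrained, then trains with a short linear warmup (--wp_epoch 5) followed by cosine decay from --base_lr down to --min_lr. The sketch below shows one common form of such a warmup + cosine schedule; the repo's exact implementation may differ (e.g. per-iteration rather than per-epoch updates).

    # Sketch of a linear-warmup + cosine-decay learning-rate schedule
    # matching the flags above (assumed behavior, not this repo's exact code).
    import math

    def lr_at_epoch(epoch, base_lr=0.0005, min_lr=1e-6, wp_epoch=5, max_epoch=100):
        if epoch < wp_epoch:
            # linear warmup from 0 to base_lr
            return base_lr * epoch / wp_epoch
        # cosine decay from base_lr down to min_lr
        progress = (epoch - wp_epoch) / (max_epoch - wp_epoch)
        return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))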

3. Evaluate

  • Evaluate the top-1 & top-5 accuracy of ViT-Tiny on the CIFAR10 dataset (a minimal sketch of the metric computation follows the command):

    python main_finetune.py --cuda \
                        --dataset cifar10 \
                        -m vit_t \
                        --batch_size 256 \
                        --eval \
                        --resume path/to/vit_t_cifar10.pth
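
    A minimal sketch of how top-1 / top-5 accuracy is typically computed from classifier logits (illustrative only, not this repo's exact evaluation loop):

    # Sketch of top-k accuracy computation (illustrative).
    import torch

    @torch.no_grad()
    def topk_accuracy(logits, targets, ks=(1, 5)):
        # logits: [B, num_classes], targets: [B]
        maxk = max(ks)
        _, pred = logits.topk(maxk, dim=1)        # [B, maxk] predicted class indices
        correct = pred.eq(targets.unsqueeze(1))   # [B, maxk] boolean hits
        # a sample counts as correct at k if the target is among the top-k predictions
        return [correct[:, :k].any(dim=1).float().mean().item() * 100 for k in ks]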
    

4. Visualize Image Reconstruction

  • Visualize the images reconstructed by a ViT-Tiny pretrained with the MAE framework on the CIFAR10 dataset (see the unpatchify sketch after the command):

    python main_pretrain.py --cuda \
                        --dataset cifar10 \
                        -m vit_t \
                        --resume path/to/mae_vit_t_cifar10.pth \
                        --eval \
                        --batch_size 1
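
    With --batch_size 1, each iteration reconstructs a single image so it can be displayed. The sketch below shows the MAE-style "unpatchify" step that folds predicted per-patch pixels back into a full image; the patch size and function name are illustrative assumptions, not this repo's API.

    # Sketch of MAE-style unpatchify: per-patch pixel predictions -> image.
    import torch

    def unpatchify(pred, patch_size=16, img_channels=3):
        # pred: [B, N, patch_size*patch_size*C] predicted pixel values per patch
        B, N, _ = pred.shape
        h = w = int(N ** 0.5)  # patches per side, assuming a square grid
        x = pred.reshape(B, h, w, patch_size, patch_size, img_channels)
        x = torch.einsum('bhwpqc->bchpwq', x)  # regroup into image layout
        return x.reshape(B, img_channels, h * patch_size, w * patch_size)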
    

5. Experiments

  • On CIFAR10
| Method | Model | Epoch | Top-1 Acc. (%) | Weight | MAE weight |
|--------|-------|-------|----------------|--------|------------|
| MAE    | ViT-T | 100   | 91.2           | ckpt   | ckpt       |

6. Acknowledgment

Thanks to Kaiming He for his inspiring work on MAE; his research sheds light on the semantic differences between vision and language and offers valuable insights for subsequent vision studies. Thanks as well to the authors of the official MAE source code, and to IcarusWizard for reproducing the MAE implementation.