# DeepMoE EfficientNet-B0 fine-tuned on iNaturalist 2019
This model is a Mixture-of-Experts (DeepMoE) variant of EfficientNet-B0, fine-tuned on the iNaturalist 2019 dataset to optimize both accuracy and computational efficiency (FLOP reduction).
## Training Results
- Final Score (Acc/FLOPs composite): 82.7375
- Final Validation Accuracy: 68.2%
- Expert Activation Ratio: 31.5%
- FLOPs Usage: 56.5% (compared to baseline B0)
- Baseline B0 Reference FLOPs: 388,184,000
- Total Runtime: 5402.31 seconds
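The FLOPs figures above can be tied together with a quick calculation. This is a hedged sketch: the 56.5% usage ratio and the 388,184,000 baseline come from the table, while `effective_flops` is a derived quantity, not a logged metric.

```python
# Derive the effective per-forward FLOPs of the MoE model from the
# reported baseline FLOPs and the FLOPs usage ratio above.
BASELINE_B0_FLOPS = 388_184_000
FLOPS_USAGE = 0.565  # fraction of baseline FLOPs actually executed

effective_flops = BASELINE_B0_FLOPS * FLOPS_USAGE
print(f"Effective FLOPs: {effective_flops:,.0f}")  # ≈ 219M per forward pass
```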
## Hyperparameters
- Batch Size: 256
- Gradient Accumulation Steps: 4
- Weight Decay: 0.005
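A batch size of 256 with 4 gradient-accumulation steps implies an effective batch of 1024. The toy loop below is an illustrative sketch of the usual accumulation pattern (scale each micro-batch loss by the step count, update once per cycle); the helper name and the toy gradient values are assumptions, not the training code.

```python
# Hedged sketch of gradient accumulation: 4 micro-batches of 256
# contribute to a single optimizer update of effective size 1024.
MICRO_BATCH = 256
ACCUM_STEPS = 4
EFFECTIVE_BATCH = MICRO_BATCH * ACCUM_STEPS  # 1024

def micro_batch_grad(step):
    # Stand-in for loss.backward() on one micro-batch (illustrative value).
    return 1.0

accumulated = 0.0
for step in range(ACCUM_STEPS):
    # Scale like loss / ACCUM_STEPS so the update matches one big batch.
    accumulated += micro_batch_grad(step) / ACCUM_STEPS
# optimizer.step() would fire here, once per ACCUM_STEPS micro-batches.
print(EFFECTIVE_BATCH, accumulated)  # 1024 1.0
```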
## Epochs
- Total Epochs: 10
- Joint Training Epochs: 10
- Routing-Frozen Finetuning Epochs: 0
## DeepMoE Architecture & Routing
- MoE Start Stage: 1
- Latent Dimension: 32
- Sparsity Penalty ($\lambda_g$): 0.0007
- Target Sparsity ($\mu$): 0.5
- ReLU Init (value / std): 1 / 1
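The sparsity penalty $\lambda_g$ and target sparsity $\mu$ above suggest a regularizer that pushes the mean gate activation toward $\mu$. The exact functional form used in training is not specified here; the sketch below shows one plausible form under that assumption, with illustrative gate values.

```python
# Hedged sketch: penalize deviation of the mean expert-gate activation
# from the target sparsity mu, weighted by lambda_g (values from above).
LAMBDA_G = 7e-4  # sparsity penalty weight (lambda_g)
MU = 0.5         # target activation ratio (mu)

def sparsity_penalty(gate_values):
    """One plausible penalty: lambda_g * |mean gate activation - mu|."""
    ratio = sum(gate_values) / len(gate_values)
    return LAMBDA_G * abs(ratio - MU)

gates = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # 3 of 8 experts active
print(sparsity_penalty(gates))  # 0.0007 * |0.375 - 0.5| = 8.75e-05
```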
## Learning Rates
- MoE Routing Parameters: 1.00e-02
- Classification Head: 2.00e-02
- Base Model (Body): 2.00e-03
- Finetune Phase (Frozen Routing): 0.00e+00
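The per-group rates above map naturally onto parameter groups of the kind passed to a PyTorch optimizer. This is a hedged sketch: only the rates come from the table; the group names are illustrative, and the actual grouping of parameters in training is an assumption.

```python
# Illustrative parameter-group config mirroring the learning rates above.
param_groups = [
    {"name": "moe_routing", "lr": 1.00e-02},      # MoE routing parameters
    {"name": "classifier_head", "lr": 2.00e-02},  # classification head
    {"name": "backbone", "lr": 2.00e-03},         # base model (body)
]

# During the routing-frozen finetune phase (0 epochs in this run),
# the routing group's LR is set to 0 to freeze it.
FINETUNE_ROUTING_LR = 0.0
print([g["lr"] for g in param_groups])
```

With PyTorch, the same structure would be passed as `torch.optim.SGD([{"params": ..., "lr": ...}, ...])`, one entry per group.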
Training was tracked using Weights & Biases.