DeepMoE EfficientNet-B0 fine-tuned on iNaturalist 2019

This model is a Mixture-of-Experts (DeepMoE) variant of EfficientNet-B0, fine-tuned on the iNaturalist 2019 dataset to optimize both accuracy and computational efficiency (FLOP reduction).
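
For orientation, a rough sketch of the general DeepMoE idea: a shallow embedding network predicts per-channel gates for each convolution stage, and channels ("experts") whose gates are zero can be skipped, which is where the FLOP savings come from. The class names and shapes below are illustrative, not the exact modules in this checkpoint.

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Illustrative DeepMoE-style gate: a shallow embedding predicts one
    non-negative gate per output channel ("expert") of a convolution stage."""

    def __init__(self, latent_dim: int, num_channels: int):
        super().__init__()
        self.proj = nn.Linear(latent_dim, num_channels)

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        # ReLU keeps gates non-negative; exact zeros mark inactive channels
        # whose computation can be skipped.
        return torch.relu(self.proj(embedding))

# Toy usage: scale a conv stage's output channels by the predicted gates.
conv_out = torch.randn(8, 40, 56, 56)        # (batch, channels, H, W)
embedding = torch.randn(8, 32)               # latent dimension 32, as in this card
gates = ChannelGate(32, 40)(embedding)       # (batch, channels)
gated = conv_out * gates[:, :, None, None]   # per-channel (per-expert) gating
```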

Training Results

  • Final Score (Acc/FLOPs composite): 82.7375
  • Final Validation Accuracy: 68.2%
  • Expert Activation Ratio: 31.5%
  • FLOPs Usage: 56.5% of baseline B0 (see the worked example after this list)
  • Baseline B0 Reference FLOPs: 388,184,000
  • Total Runtime: 5402.31 seconds
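
For reference, the FLOPs usage above is relative to the dense baseline; assuming the reference number is per forward pass, the implied absolute cost works out to roughly:

```python
baseline_flops = 388_184_000                   # dense EfficientNet-B0 reference FLOPs
flops_usage = 0.565                            # 56.5% of the baseline
print(f"{baseline_flops * flops_usage:,.0f}")  # 219,323,960 -> ~219 MFLOPs
```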

Hyperparameters

  • Batch Size: 256
  • Gradient Accumulation Steps: 4 (see the sketch after this list)
  • Weight Decay: 0.005
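
If the batch size is the per-step micro-batch, the effective batch size is 256 × 4 = 1,024. A minimal sketch of gradient accumulation under that assumption; the tiny model, random data, and optimizer type are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder model/data so the sketch runs standalone; in the real run these
# would be the DeepMoE EfficientNet-B0 and the iNaturalist 2019 loader.
model = nn.Linear(10, 5)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=2e-3, weight_decay=0.005)
loader = [(torch.randn(256, 10), torch.randint(0, 5, (256,))) for _ in range(8)]

accum_steps = 4                                  # from the hyperparameters above

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):           # micro-batches of 256
    loss = criterion(model(x), y)
    (loss / accum_steps).backward()              # scale so accumulated grads average
    if (step + 1) % accum_steps == 0:
        optimizer.step()                         # one update per 4 micro-batches
        optimizer.zero_grad()
```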

Epochs

  • Total Epochs: 10
    • Joint Training Epochs: 10
    • Routing-Frozen Finetuning Epochs: 0

DeepMoE Architecture & Routing

  • MoE Start Stage: 1
  • Latent Dimension: 32
  • Sparsity Penalty ($\lambda_g$): 0.0007 (see the sketch after this list)
  • Target Sparsity ($\mu$): 0.5
  • ReLU Init (Val / Std): 1 / 1
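
The sparsity penalty presumably acts on the gate activations to push overall expert usage toward the target; the exact regularizer isn't stated on this card, so the version below is only one plausible formulation:

```python
import torch

lambda_g = 7e-4   # sparsity penalty weight from the card (0.0007)
mu = 0.5          # target sparsity

def sparsity_loss(gates: torch.Tensor) -> torch.Tensor:
    # Penalize deviation of the mean gate activation from the target mu.
    # One plausible reading of (lambda_g, mu), not necessarily the exact loss used here.
    return lambda_g * (gates.mean() - mu).abs()

gates = torch.relu(torch.randn(8, 40))   # example gate activations
print(sparsity_loss(gates))              # would be added to the task loss during training
```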

Learning Rates

  • MoE Routing Parameters: 1.00e-02
  • Classification Head: 2.00e-02
  • Base Model (Body): 2.00e-03
  • Finetune Phase (Frozen Routing): 0.00e+00 (unused, since no routing-frozen epochs were run)
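
These rates map naturally onto optimizer parameter groups; a minimal sketch with placeholder modules and an assumed optimizer type:

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the real routing / head / body parts.
routing = nn.Linear(32, 40)
head = nn.Linear(1280, 1010)   # iNaturalist 2019 has 1,010 classes
body = nn.Linear(16, 16)

optimizer = torch.optim.SGD(
    [
        {"params": routing.parameters(), "lr": 1e-2},  # MoE routing parameters
        {"params": head.parameters(),    "lr": 2e-2},  # classification head
        {"params": body.parameters(),    "lr": 2e-3},  # base model (body)
    ],
    weight_decay=0.005,
)
```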

Training was tracked using Weights & Biases.
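
A minimal sketch of the kind of logging this implies; the project name and metric keys are illustrative, not the run's actual ones:

```python
import wandb

run = wandb.init(
    project="deepmoe-efficientnet-b0",   # illustrative project name
    config={"batch_size": 256, "grad_accum_steps": 4, "weight_decay": 0.005},
)
# Illustrative metric keys; the actual run's keys may differ.
wandb.log({"val_acc": 0.682, "expert_activation_ratio": 0.315, "flops_usage": 0.565})
run.finish()
```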
