Memory-SmolVLA · LIBERO · mean-pool bank · V4 hyperparameters

SmolVLA finetuned on the HuggingFaceVLA/libero dataset with a temporal memory-bank wrapper. The bank stores one mean-pooled token per write (compression_mode=mean_pool, ~170x reduction of the prefix). Bank capacity 16, FIFO eviction, write_stride=50, residual gate, injection at VLM layer 8. Action expert + state projection finetuned at lr=1e-5; memory modules trained from scratch at lr=1e-4. 30,000 optimizer steps, grad_accum=1.
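The bank behaviour described above (one mean-pooled token per write, capacity 16, FIFO eviction) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the repository's implementation: the names `mean_pool` and `MeanPoolBank`, the 170-token prefix length, and the 2-dimensional toy vectors are all assumptions chosen to mirror the numbers quoted in the text.

```python
from collections import deque


def mean_pool(tokens):
    """Average a list of equal-length token vectors into a single vector."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


class MeanPoolBank:
    """Hypothetical FIFO bank holding one pooled vector per write."""

    def __init__(self, capacity=16):
        # A deque with maxlen drops its oldest entry once full (FIFO eviction).
        self.entries = deque(maxlen=capacity)

    def write(self, prefix_tokens):
        # Compress the whole prefix to a single token before storing it.
        self.entries.append(mean_pool(prefix_tokens))

    def __len__(self):
        return len(self.entries)


bank = MeanPoolBank(capacity=16)
for t in range(20):
    # Toy prefix: 170 identical 2-d tokens whose value encodes the write index.
    prefix = [[float(t), float(t)] for _ in range(170)]
    bank.write(prefix)

oldest = bank.entries[0]   # writes 0..3 have been evicted
newest = bank.entries[-1]
```

After 20 writes the bank holds only the 16 most recent pooled tokens, so the oldest surviving entry comes from write 4.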

Repository (training code, eval pipeline, run logs): https://github.com/aleksantari/memory-smolVLA/tree/claude/feature/v5-all-fixes

Base model: HuggingFaceVLA/smolvla_libero

Eval results (sim, LIBERO via lerobot pipeline)

5 episodes × 10 tasks per suite (n=50/suite). All numbers were produced with scripts/eval_memory_libero_v2.py; the upstream HuggingFaceVLA/smolvla_libero baseline was measured the same way.

| Suite | Baseline smolvla_libero (no memory) | This model (no fix) | This model + step_increment=1 |
|---|---|---|---|
| libero_spatial | 76% | 64% | 62% |
| libero_object | 86% | 86% | 88% |
| libero_goal | 82% | 80% | 76% |
| libero_10 | 42% | 30% | 44% |
| Overall | 71.5% | 65% | 67.5% |
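The Overall row is the unweighted mean of the four suite success rates, which is straightforward to re-derive from the table. The snippet below is only a consistency check on the reported numbers; the dictionary layout and variable names are illustrative.

```python
# Success rates (%) per suite: (baseline, this model without fix, this model + step_increment=1)
results = {
    "libero_spatial": (76, 64, 62),
    "libero_object": (86, 86, 88),
    "libero_goal": (82, 80, 76),
    "libero_10": (42, 30, 44),
}

# Unweighted mean over suites for each column; every suite has n=50 episodes.
overall = [sum(column) / len(results) for column in zip(*results.values())]
print(overall)
```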

Important caveat on step_increment: the inference-time policy parameter step_increment must be set to 1 to match the bank-fill cadence the model was trained against. With the default of 50 (= chunk_size), the bank cycles every 16 env steps instead of every 800 (capacity 16 × write_stride 50), so at inference it holds only the prefixes of the last 16 env steps (effectively a short lookback) rather than the temporally spread snapshots seen in training. Without the fix, libero_10 reads 30%; with it, 44%, which recovers the lost performance and is neutral relative to the 42% baseline within sampling noise at n=50.
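The cadence mismatch can be reproduced with a small simulation. It assumes, hypothetically, that the policy keeps an internal counter advanced by step_increment on every env step and writes to the bank whenever that counter is a multiple of write_stride; the function `simulate_bank` and this counter logic are a simplified model for illustration, not the repository's code.

```python
from collections import deque


def simulate_bank(env_steps, step_increment, write_stride=50, capacity=16):
    """Return the env steps whose prefixes are still in the bank at the end."""
    bank = deque(maxlen=capacity)  # FIFO: oldest snapshot is dropped when full
    counter = 0                    # hypothetical internal step counter
    for env_step in range(env_steps):
        if counter % write_stride == 0:
            bank.append(env_step)
        counter += step_increment
    return list(bank)


trained_cadence = simulate_bank(800, step_increment=1)
default_cadence = simulate_bank(800, step_increment=50)

print(trained_cadence)  # snapshots spread across all 800 env steps
print(default_cadence)  # only the most recent 16 env steps
```

Under this model, step_increment=1 leaves the bank holding env steps 0, 50, ..., 750, while step_increment=50 triggers a write on every env step and leaves only steps 784 through 799.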

Honest reading on whether memory helps long-horizon: with a 50-episode binomial SE of ~7pp, the gap between 44% (memory + step_increment=1) and 42% (baseline) is within noise. Comparing memory-on (44%) against the same finetune without memory (48%, from the zeroproj ablation at 5ep) suggests memory is approximately neutral or slightly negative on libero_10 at this scale. We have NOT shown that memory improves long-horizon performance; we have shown that the architecture and training recipe are sound and that the previous catastrophic "0% in sim" results were eval-pipeline bugs.
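The ~7pp figure follows from the binomial standard error sqrt(p(1-p)/n) at n=50. The sketch below reproduces it and also estimates the standard error of the difference between the two libero_10 runs, under the simplifying assumption that the episodes are independent Bernoulli trials and that the two runs are independent of each other (episodes grouped by task are unlikely to be fully independent, so treat this as a rough estimate).

```python
import math


def binomial_se(p, n):
    """Standard error of a success rate p estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


n = 50
se_memory = binomial_se(0.44, n)    # memory + step_increment=1 on libero_10
se_baseline = binomial_se(0.42, n)  # baseline on libero_10

# Standard error of the difference, assuming the two runs are independent.
se_diff = math.sqrt(se_memory ** 2 + se_baseline ** 2)
gap = 0.44 - 0.42

print(f"SE memory:   {se_memory:.3f}")
print(f"SE baseline: {se_baseline:.3f}")
print(f"gap / SE of difference: {gap / se_diff:.2f}")
```

The 2pp gap is a small fraction of the standard error of the difference, which is why the comparison is read as inconclusive.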

How to use

git clone https://github.com/aleksantari/memory-smolVLA
cd memory-smolVLA
git checkout claude/feature/v5-all-fixes
pip install -e ".[dev]"

# Download this model:
huggingface-cli download tarmus/memory-smolvla-libero-meanpool-v4hp final.pt config.yaml --local-dir checkpoints/meanpool_v4hp

# Eval (in WSL with osmesa headless rendering):
export MUJOCO_GL=osmesa PYOPENGL_PLATFORM=osmesa PYTHONPATH=/path/to/LIBERO
python scripts/eval_memory_libero_v2.py \
    --checkpoint checkpoints/meanpool_v4hp/final.pt \
    --config checkpoints/meanpool_v4hp/config.yaml \
    --suite libero_10 \
    --n-episodes 5 \
    --step-increment 1                 # critical, see results table

Training recipe (V4 optimizer hyperparameters)

| Parameter | Value |
|---|---|
| Base | HuggingFaceVLA/smolvla_libero |
| training_mode | expert_finetune (action expert + state proj unfrozen, VLM frozen) |
| total_steps | 30,000 |
| grad_accum_steps | 1 |
| memory_lr | 1e-4 |
| expert_lr | 1e-5 |
| warmup_steps | 500 |
| weight_decay | 1e-4 |
| max_grad_norm | 1.0 |
| Hardware | RTX 5080 (Windows) / WSL Ubuntu-22.04 for sim eval |

See RUN_LOG.md in the repo for the full diagnostic narrative, including why the V4 hyperparameters work where the broader v5 plan's batch-fix hyperparameters (expert_lr=1e-4, 3000 steps, grad_accum=32) destroyed the action expert.
