Memory-SmolVLA · LIBERO · mean-pool bank · V4 hyperparameters
SmolVLA finetuned on the HuggingFaceVLA/libero dataset with a temporal memory-bank wrapper. The bank stores one mean-pooled token per write (compression_mode=mean_pool, ~170x reduction of the prefix). Bank capacity 16, FIFO eviction, write_stride=50, residual gate, injection at VLM layer 8. Action expert + state projection finetuned at lr=1e-5; memory modules trained from scratch at lr=1e-4. 30,000 optimizer steps, grad_accum=1.
Repository (training code, eval pipeline, run logs): https://github.com/aleksantari/memory-smolVLA/tree/claude/feature/v5-all-fixes
Base model: HuggingFaceVLA/smolvla_libero
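The bank described above (one mean-pooled token per write, capacity 16, FIFO eviction) can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the class name `MeanPoolBank`, its `write` method, and the plain-list token representation are all hypothetical, and the figure of roughly 170 prefix tokens is only inferred from the stated ~170x reduction.

```python
from collections import deque


def mean_pool(tokens):
    """Average a list of equal-length token vectors into a single vector."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


class MeanPoolBank:
    """Hypothetical FIFO bank that keeps one mean-pooled token per write."""

    def __init__(self, capacity=16):
        # A deque with maxlen drops its oldest entry on overflow (FIFO eviction).
        self.slots = deque(maxlen=capacity)

    def write(self, prefix_tokens):
        # Compress the whole prefix (assumed ~170 tokens) into one token.
        self.slots.append(mean_pool(prefix_tokens))


bank = MeanPoolBank(capacity=16)
for i in range(20):
    # Toy prefix: 170 identical 4-dimensional tokens whose values equal i.
    prefix = [[float(i)] * 4 for _ in range(170)]
    bank.write(prefix)
```

After 20 writes into a 16-slot bank, the four oldest entries have been evicted and the bank holds the pooled tokens from writes 4 through 19.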
Eval results (sim, LIBERO via lerobot pipeline)
5 episodes × 10 tasks per suite (n=50/suite). All numbers were produced with
scripts/eval_memory_libero_v2.py; the upstream
HuggingFaceVLA/smolvla_libero baseline was measured the same way.
| Suite | Baseline smolvla_libero (no memory) | This model (no fix) | This model + step_increment=1 |
|---|---|---|---|
| libero_spatial | 76% | 64% | 62% |
| libero_object | 86% | 86% | 88% |
| libero_goal | 82% | 80% | 76% |
| libero_10 | 42% | 30% | 44% |
| Overall | 71.5% | 65% | 67.5% |
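The Overall row is the unweighted mean of the four suite scores (equivalent to pooling episodes, since every suite has n=50). A short check of that arithmetic, using only the numbers in the table; the dictionary layout and the `overall` helper are illustrative names:

```python
# Success rates (%) copied from the table, per suite:
# (baseline, this model without the fix, this model with step_increment=1)
success = {
    "libero_spatial": (76, 64, 62),
    "libero_object": (86, 86, 88),
    "libero_goal": (82, 80, 76),
    "libero_10": (42, 30, 44),
}


def overall(column):
    """Unweighted mean over suites for one column of the table."""
    return sum(row[column] for row in success.values()) / len(success)


baseline_overall = overall(0)
no_fix_overall = overall(1)
with_fix_overall = overall(2)
```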
Important caveat on step_increment: the inference-time policy parameter
step_increment must be set to 1 to match the bank-fill cadence the
model was trained against. The default of 50 (= chunk_size) makes the bank
cycle every 16 env steps instead of every 800, and at inference the bank
ends up holding only the last 16 env steps' prefixes (effectively a short
lookback) instead of the temporally-spread snapshots training expected.
Without the fix, libero_10 reads 30%; with it, 44%, which closes the gap to
the 42% baseline (the remaining 2pp difference is within sampling noise at n=50).
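The cadence mismatch can be reproduced with a toy simulation. This is a sketch under stated assumptions, not the policy's actual code: it assumes an internal counter that advances by `step_increment` on every env step and triggers a bank write whenever the counter is a multiple of `write_stride`. The function name `held_env_steps` and the counter logic are hypothetical.

```python
from collections import deque


def held_env_steps(n_env_steps, step_increment, write_stride=50, capacity=16):
    """Return the env-step indices whose snapshots are still in the bank."""
    bank = deque(maxlen=capacity)  # FIFO: oldest snapshot is dropped on overflow
    counter = 0  # assumed internal step counter (hypothetical)
    for t in range(n_env_steps):
        if counter % write_stride == 0:
            bank.append(t)  # store the env-step index in place of the pooled token
        counter += step_increment
    return list(bank)


trained_cadence = held_env_steps(1000, step_increment=1)
default_cadence = held_env_steps(1000, step_increment=50)
```

Under these assumptions, after 1000 env steps the step_increment=1 bank holds 16 snapshots spaced 50 env steps apart (steps 200 through 950), while the step_increment=50 bank holds only the 16 most recent consecutive env steps (984 through 999).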
Honest reading on whether memory helps long-horizon: with 50-episode
binomial SE ~7pp, the 44% (memory + step_inc=1) vs 42% (baseline) gap is
within noise. Comparing memory-on (44%) vs same-finetune-no-memory (48%,
from the zeroproj ablation at 5ep) suggests memory is approximately
neutral or slightly negative on libero_10 at this scale. We have NOT
proved memory improves long-horizon performance; we have proved that the
architecture and training recipe are sound and the previous catastrophic
"0% in sim" results were eval-pipeline bugs.
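The ~7pp standard error quoted above follows from the binomial formula sqrt(p(1-p)/n). The sketch below recomputes it and also gives the standard error of the difference between two success rates, assuming the two evaluations are independent samples of 50 episodes each (an assumption, since the same tasks and seeds may be shared). The helper names are illustrative.

```python
import math


def binomial_se(p, n):
    """Standard error of a success rate p estimated from n episodes."""
    return math.sqrt(p * (1.0 - p) / n)


def diff_se(p1, n1, p2, n2):
    """Standard error of p1 - p2, assuming the two samples are independent."""
    return math.sqrt(binomial_se(p1, n1) ** 2 + binomial_se(p2, n2) ** 2)


se_memory = binomial_se(0.44, 50)        # about 0.070, i.e. ~7pp
se_gap = diff_se(0.44, 50, 0.42, 50)     # about 0.099, i.e. ~10pp
gap = 0.44 - 0.42                        # observed memory vs baseline gap
```

The observed 2pp gap is a small fraction of the roughly 10pp standard error on the difference, which is why it cannot be distinguished from noise at this sample size.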
How to use
git clone https://github.com/aleksantari/memory-smolVLA
cd memory-smolVLA
git checkout claude/feature/v5-all-fixes
pip install -e ".[dev]"
# Download this model:
huggingface-cli download tarmus/memory-smolvla-libero-meanpool-v4hp final.pt config.yaml --local-dir checkpoints/meanpool_v4hp
# Eval (in WSL with osmesa headless rendering):
export MUJOCO_GL=osmesa PYOPENGL_PLATFORM=osmesa PYTHONPATH=/path/to/LIBERO
python scripts/eval_memory_libero_v2.py \
--checkpoint checkpoints/meanpool_v4hp/final.pt \
--config checkpoints/meanpool_v4hp/config.yaml \
--suite libero_10 \
--n-episodes 5 \
--step-increment 1 # critical, see results table
Training recipe (V4 optimizer hyperparameters)
| Setting | Value |
|---|---|
| Base | HuggingFaceVLA/smolvla_libero |
| training_mode | expert_finetune (action expert + state proj unfrozen, VLM frozen) |
| total_steps | 30,000 |
| grad_accum_steps | 1 |
| memory_lr | 1e-4 |
| expert_lr | 1e-5 |
| warmup_steps | 500 |
| weight_decay | 1e-4 |
| max_grad_norm | 1.0 |
| Hardware | RTX 5080 (Windows) / WSL Ubuntu-22.04 for sim eval |
See RUN_LOG.md in the repo for the full diagnostic narrative including
why the V4 hyperparameters work where the broader v5 plan's batch fix
hyperparameters (expert_lr=1e-4, 3000 steps, grad_accum=32) destroyed
the action expert.