Memory-SmolVLA · LIBERO · mean-pool bank · V4 hyperparameters

SmolVLA finetuned on the HuggingFaceVLA/libero dataset with a temporal memory-bank wrapper. The bank stores one mean-pooled token per write (compression_mode=mean_pool, ~170x reduction of the prefix). Bank capacity 16, FIFO eviction, write_stride=50, residual gate, injection at VLM layer 8. Action expert + state projection finetuned at lr=1e-5; memory modules trained from scratch at lr=1e-4. 30,000 optimizer steps, grad_accum=1.
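To make the write path concrete, here is a minimal pure-Python sketch of a mean-pool FIFO bank with the capacity and write stride quoted above. It is not the repository's implementation: the class name `MeanPoolBank`, the method `observe`, the internal step counter, and the rule "write when the counter crosses a multiple of write_stride" are all assumptions made for illustration. The residual gate and the layer-8 injection are not modelled.

```python
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def mean_pool(tokens: Sequence[Vector]) -> Vector:
    """Average a sequence of token vectors into one vector."""
    if not tokens:
        raise ValueError("cannot pool an empty prefix")
    dim = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]


class MeanPoolBank:
    """Toy FIFO bank (hypothetical): one pooled vector per write."""

    def __init__(self, capacity: int = 16, write_stride: int = 50) -> None:
        self.write_stride = write_stride
        # A bounded deque drops its oldest entry on overflow, i.e. FIFO eviction.
        self.entries: Deque[Vector] = deque(maxlen=capacity)
        self.step = 0  # assumed internal counter, advanced once per call

    def observe(self, prefix_tokens: Sequence[Vector], step_increment: int = 1) -> bool:
        """Advance the counter; write when it crosses a multiple of write_stride."""
        before = self.step // self.write_stride
        self.step += step_increment
        wrote = self.step // self.write_stride > before
        if wrote:
            self.entries.append(mean_pool(prefix_tokens))
        return wrote


# 170 two-dimensional tokens stand in for a prefix; the sizes are illustrative only.
bank = MeanPoolBank()
prefix = [[float(i), 1.0] for i in range(170)]
n_writes = sum(bank.observe(prefix) for _ in range(1000))
print(n_writes, len(bank.entries))
```

Each write collapses the whole prefix to a single vector, which is where the reduction in stored tokens comes from; the bank never holds more than `capacity` entries regardless of episode length.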

Repository (training code, eval pipeline, run logs): https://github.com/aleksantari/memory-smolVLA/tree/claude/feature/v5-all-fixes

Base model: HuggingFaceVLA/smolvla_libero

Eval results (sim, LIBERO via lerobot pipeline)

5 episodes × 10 tasks per suite (n=50/suite). All numbers were produced with scripts/eval_memory_libero_v2.py; the upstream HuggingFaceVLA/smolvla_libero baseline was measured the same way.

| Suite | Baseline `smolvla_libero` (no memory) | This model (no fix) | This model + `step_increment=1` |
|---|---|---|---|
| libero_spatial | 76% | 64% | 62% |
| libero_object | 86% | 86% | 88% |
| libero_goal | 82% | 80% | 76% |
| libero_10 | 42% | 30% | 44% |
| Overall | 71.5% | 65% | 67.5% |
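The Overall row matches the unweighted mean of the four suite scores (equivalently, the pooled rate over 200 episodes, since every suite has n=50). A small check, with the per-suite numbers copied from the table and the column keys chosen here for illustration:

```python
# Success rates in percent, copied from the results table.
# The dictionary keys are labels invented for this sketch.
RESULTS = {
    "baseline": {"libero_spatial": 76, "libero_object": 86, "libero_goal": 82, "libero_10": 42},
    "memory_no_fix": {"libero_spatial": 64, "libero_object": 86, "libero_goal": 80, "libero_10": 30},
    "memory_step_inc_1": {"libero_spatial": 62, "libero_object": 88, "libero_goal": 76, "libero_10": 44},
}


def overall(per_suite: dict) -> float:
    """Unweighted mean over suites; valid as a pooled rate because n is equal per suite."""
    return sum(per_suite.values()) / len(per_suite)


OVERALL = {name: overall(scores) for name, scores in RESULTS.items()}
for name, value in OVERALL.items():
    print(f"{name}: {value:.1f}%")
```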

Important caveat on step_increment: the inference-time policy parameter step_increment must be set to 1 to match the bank-fill cadence the model was trained against. With the default of 50 (= chunk_size), the bank turns over every 16 env steps instead of every 800 (capacity 16 × write_stride 50), so at inference it holds only the prefixes of the last 16 env steps (effectively a short lookback) rather than the temporally spread snapshots seen in training. Without the fix, libero_10 reads 30%; with it, 44%, which recovers the gap and is level with the 42% baseline within the sampling noise of n=50.
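The 16-versus-800 figure can be reproduced with a small simulation. This is a sketch under an assumed counter rule (the counter advances by step_increment on every env step and a write happens whenever it crosses a multiple of write_stride); the function name `bank_contents` is hypothetical and the real wrapper may count differently.

```python
from collections import deque
from typing import List


def bank_contents(env_steps: int, step_increment: int,
                  write_stride: int = 50, capacity: int = 16) -> List[int]:
    """Return the env-step indices whose prefixes are in the bank at the end."""
    bank = deque(maxlen=capacity)  # FIFO eviction of the oldest write
    counter = 0
    for t in range(1, env_steps + 1):
        before = counter // write_stride
        counter += step_increment
        if counter // write_stride > before:  # crossed a stride boundary
            bank.append(t)
    return list(bank)


for inc in (1, 50):
    held = bank_contents(env_steps=2000, step_increment=inc)
    spacing = held[1] - held[0]
    print(f"step_increment={inc}: spacing={spacing}, "
          f"oldest entry is {held[-1] - held[0]} env steps behind the newest")
```

Under these assumptions, step_increment=1 gives entries spaced 50 env steps apart, so a full turnover of 16 entries takes 800 env steps, while step_increment=50 writes on every env step and the bank only ever covers the most recent 16.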

Honest reading on whether memory helps long-horizon tasks: with a 50-episode binomial SE of ~7pp, the gap between 44% (memory + step_increment=1) and 42% (baseline) is within noise. Comparing memory-on (44%) against the same finetune without memory (48%, from the zeroproj ablation at 5ep) suggests memory is approximately neutral or slightly negative on libero_10 at this scale. We have NOT shown that memory improves long-horizon performance; what the results do support is that the architecture and training recipe are sound and that the previous catastrophic "0% in sim" results were eval-pipeline bugs.
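The ~7pp figure follows from the standard binomial standard error, sqrt(p(1-p)/n). The sketch below computes it and a rough z-score for the 44% vs 42% comparison. It treats all 50 episodes as independent trials, which is a simplification given that they are grouped 5 per task, and the helper names are invented here.

```python
import math


def binomial_se(p: float, n: int) -> float:
    """Standard error of a success rate p estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


def diff_z(p1: float, p2: float, n1: int, n2: int) -> float:
    """z-score of p1 - p2 using the unpooled standard error of the difference."""
    se_diff = math.sqrt(binomial_se(p1, n1) ** 2 + binomial_se(p2, n2) ** 2)
    return (p1 - p2) / se_diff


se_memory = binomial_se(0.44, 50)
se_baseline = binomial_se(0.42, 50)
z = diff_z(0.44, 0.42, 50, 50)
print(f"SE(memory)={se_memory:.3f}  SE(baseline)={se_baseline:.3f}  z={z:.2f}")
```

A 2-point gap against a per-arm SE of about 7 points gives a z-score far below 1, which is why the comparison is read as noise rather than as evidence for or against memory.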

How to use

git clone https://github.com/aleksantari/memory-smolVLA
cd memory-smolVLA
git checkout claude/feature/v5-all-fixes
pip install -e ".[dev]"

# Download this model:
huggingface-cli download tarmus/memory-smolvla-libero-meanpool-v4hp final.pt config.yaml --local-dir checkpoints/meanpool_v4hp

# Eval (in WSL with osmesa headless rendering):
export MUJOCO_GL=osmesa PYOPENGL_PLATFORM=osmesa PYTHONPATH=/path/to/LIBERO
python scripts/eval_memory_libero_v2.py \
    --checkpoint checkpoints/meanpool_v4hp/final.pt \
    --config checkpoints/meanpool_v4hp/config.yaml \
    --suite libero_10 \
    --n-episodes 5 \
    --step-increment 1                 # critical, see results table
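To reproduce the whole results table rather than a single suite, the same command can be generated once per suite. The helper below only builds and prints the command lines; it uses the flags shown above and the suite names from the results table, and it assumes (not verified here) that the script accepts each of those names as a `--suite` value. The function name `eval_command` is hypothetical.

```python
import shlex
from typing import List

SUITES = ["libero_spatial", "libero_object", "libero_goal", "libero_10"]


def eval_command(suite: str, step_increment: int = 1, n_episodes: int = 5,
                 ckpt_dir: str = "checkpoints/meanpool_v4hp") -> List[str]:
    """Build the argv list for one eval run, mirroring the command above."""
    return [
        "python", "scripts/eval_memory_libero_v2.py",
        "--checkpoint", f"{ckpt_dir}/final.pt",
        "--config", f"{ckpt_dir}/config.yaml",
        "--suite", suite,
        "--n-episodes", str(n_episodes),
        "--step-increment", str(step_increment),
    ]


for suite in SUITES:
    # Print a copy-pasteable shell line; pass the list to subprocess.run to execute.
    print(shlex.join(eval_command(suite)))
```

Passing `step_increment=50` to the helper would produce the commands for the "no fix" column.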

Training recipe (V4 optimizer hyperparameters)


| Parameter | Value |
|---|---|
| Base | `HuggingFaceVLA/smolvla_libero` |
| training_mode | `expert_finetune` (action expert + state proj unfrozen, VLM frozen) |
| total_steps | 30,000 |
| grad_accum_steps | 1 |
| memory_lr | 1e-4 |
| expert_lr | 1e-5 |
| warmup_steps | 500 |
| weight_decay | 1e-4 |
| max_grad_norm | 1.0 |
| Hardware | RTX 5080 (Windows) / WSL Ubuntu-22.04 for sim eval |
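As an illustration of how the two learning rates and the warmup interact, the sketch below computes the effective learning rate of each parameter group at a given step. The values come from the table; the schedule shape (linear warmup, then constant) and the group layout are assumptions, since the table does not state them, and the names `warmup_scale` and `lrs_at` are invented for this example.

```python
from typing import Dict

# Values from the recipe table; the two-group layout is an assumption.
PARAM_GROUPS = [
    {"name": "memory", "lr": 1e-4, "weight_decay": 1e-4},
    {"name": "expert", "lr": 1e-5, "weight_decay": 1e-4},
]
WARMUP_STEPS = 500
TOTAL_STEPS = 30_000


def warmup_scale(step: int, warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear ramp from 0 to 1 over warmup_steps, then constant (assumed shape)."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)


def lrs_at(step: int) -> Dict[str, float]:
    """Effective learning rate per group at the given optimizer step."""
    scale = warmup_scale(step)
    return {group["name"]: group["lr"] * scale for group in PARAM_GROUPS}


for step in (0, 250, 500, TOTAL_STEPS):
    print(step, lrs_at(step))
```

Whatever the actual schedule, the memory modules train at ten times the expert learning rate throughout, which is the ratio the recipe relies on.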

See RUN_LOG.md in the repo for the full diagnostic narrative, including why the V4 hyperparameters work where the broader v5 plan's batch-fix hyperparameters (expert_lr=1e-4, 3000 steps, grad_accum=32) destroyed the action expert.
