Qwen3-4B playtesting the second draft of an RLVR environment conceptualized by Mira.
The focus is on one-shot roleplaying scenarios, split evenly between silly and serious, covering both narrative and problem-solving.
Training: 100 steps, cosine learning-rate decay, batch size 4, learning rate 1e-5, adapter rank 128, alpha 128.
They seemed fun, so releasing the merged model also, not just the adapter :)
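
For a rough sense of what the training settings listed above imply, here is a minimal sketch of the learning-rate curve and the adapter scaling factor. It makes several assumptions that the card does not state: no warmup, decay all the way to zero, and a LoRA-style adapter with the standard `alpha / rank` scaling. The function names are hypothetical, and the actual training code may differ.

```python
import math

# Values taken from the card.
TOTAL_STEPS = 100
PEAK_LR = 1e-5
ADAPTER_RANK = 128
ADAPTER_ALPHA = 128


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR, min_lr=0.0):
    """Cosine-decayed learning rate at a given step.

    Assumption: no warmup and a floor of min_lr (0.0 by default);
    the card only says "cosine decay".
    """
    # Clamp the step into [0, total_steps] and convert to a fraction.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def adapter_scaling(rank=ADAPTER_RANK, alpha=ADAPTER_ALPHA):
    """Scaling applied to the low-rank update.

    Assumption: standard LoRA scaling (alpha / rank), not a variant
    such as rank-stabilized scaling.
    """
    return alpha / rank


if __name__ == "__main__":
    for step in (0, 25, 50, 75, 100):
        print(f"step {step:3d}: lr = {cosine_lr(step):.2e}")
    print("adapter scaling:", adapter_scaling())
```

Under these assumptions, rank and alpha being equal means the adapter update is applied with a scaling factor of 1, and the learning rate falls to half its peak at step 50.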