Spock Brutal vs. Architect18?
Have just been going through your "Top Performers"... which 42B model would you currently see as the best overall performer? And a bonus question: would scaling these models beyond 42B continue to improve things? (I think I saw a 54B somewhere...?) 🖖
That is one of the best questions I've had in a while :)
As we added models to the mix, the brainstorming took on more of a room effect than anything else.
In early models, and even up to Spock-Brutal-Recall, the center was on the user. Your Spock will be aligned with you and will summon other characters as needed, but emerges as the prominent assistant and guides the experience (until you swap in another character).
In the multi-fold merges like Architect and now Element, the emphasis is on the collective mind. The system is still driving, but stays a bit more of an impersonal observer, prodding the characters to act the part, whatever the scope of the context. I noticed that individual characters tend to develop an individual arc, however small, and that helps enrich the entropy of the chat.
For example, if you bring Einstein and Twain into the same room with Spock, you will get metaphoric content from three different minds: in Brutal Recall they simply show up, while Architect/Element set the stage and reason about it. A completely different experience.
In practical cases this helps because you get multiple points of view converging on a summary. So depending on how much individualism you want it to show, Brutal Recall might be better, with more off-the-cuff comments.
What I'm currently looking for is a model with strong reasoning... especially regarding coherence in logic and spatial descriptions (for image prompt QA). I know these are difficult tasks... which is why I'm searching over here 😉
Give Element5 a try; the source is gated, so ask for access.
I have two in the series:
| Model | arc_challenge | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
|---|---|---|---|---|---|---|---|
| Melinoe-Qwen-30B-A3B-Afterthought-qx86-hi | 0.546 | 0.755 | 0.880 | 0.734 | 0.458 | 0.797 | 0.710 |
| Qwen3-30B-A3B-Architect7-mxfp4 | 0.551 | 0.692 | 0.876 | 0.749 | 0.422 | 0.794 | 0.691 |
| Qwen3-30B-A3B-Architect7-qx64-hi | 0.561 | 0.725 | 0.879 | 0.753 | 0.468 | 0.794 | 0.686 |
| Qwen3-30B-A3B-Architect7-qx86-hi | 0.563 | 0.737 | 0.878 | 0.758 | 0.448 | 0.803 | 0.698 |
The Architect7 was used here for its stability, and Afterthought... nomen est omen :)
| Model | arc_challenge | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Element2-mxfp4 | 0.535 | 0.665 | 0.870 | 0.745 | 0.446 | 0.801 | 0.689 |
| Qwen3-30B-A3B-Element2-qx64-hi | 0.573 | 0.745 | 0.883 | 0.755 | 0.448 | 0.804 | 0.693 |
| Qwen3-30B-A3B-Element2-qx86-hi | 0.559 | 0.714 | 0.881 | 0.756 | 0.458 | 0.805 | 0.712 |
| Model | Perplexity |
|---|---|
| Qwen3-30B-A3B-Element2-qx86-hi-mlx | 3.985 ± 0.023 |
| Qwen3-30B-A3B-Element2-qx64-hi-mlx | 4.024 ± 0.024 |
| Qwen3-30B-A3B-Element2-mxfp4-mlx | 4.217 ± 0.025 |
| Model | arc_challenge | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
|---|---|---|---|---|---|---|---|
| Qwen3-30B-A3B-Element5-qx86-hi | 0.560 | 0.709 | 0.883 | 0.756 | 0.448 | 0.807 | 0.713 |
| Model | Perplexity |
|---|---|
| Qwen3-30B-A3B-Element5-qx86-hi-mlx | 3.947 ± 0.023 |
| Qwen3-30B-A3B-Element5-qx64-hi-mlx | 3.985 ± 0.023 |
| Qwen3-30B-A3B-Element5-mxfp4-mlx | 4.189 ± 0.025 |
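By the way, the perplexity runs are just a sliding-window pass over a text corpus. A minimal sketch of the idea with mlx-lm, if anyone wants to sanity-check their own quants; the corpus file and window size below are placeholders, not the exact harness I used:

```python
# Sliding-window perplexity sketch with mlx-lm.
# Assumptions: eval.txt is a placeholder corpus; window=512 is illustrative.
import math
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("nightmedia/Qwen3-30B-A3B-Element5-qx86-hi-mlx")

tokens = tokenizer.encode(open("eval.txt").read())

window = 512
nll_sum, count = 0.0, 0
for i in range(0, len(tokens) - window, window):
    chunk = mx.array(tokens[i : i + window + 1])[None]  # (1, window+1)
    logits = model(chunk[:, :-1])                       # next-token logits
    loss = nn.losses.cross_entropy(logits, chunk[:, 1:], reduction="sum")
    nll_sum += loss.item()
    count += window

print(f"Perplexity: {math.exp(nll_sum / count):.3f}")
```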
The only difference is the ratio at which I mixed in bgg1996/Melinoe-Qwen-30B-A3B-Afterthought: 1.6/0.4 for Element2 vs. 1.5/0.5 for Element5, with the effect of a slightly lower perplexity.
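Mechanically it is just a weighted average of the two parents at that ratio. A minimal sketch of the idea (the paths are invented and single-shard for simplicity; the actual recipe does more than this):

```python
# Rough sketch of a linear weighted merge at the Element5 ratio (1.5/0.5).
# Paths are hypothetical; the real recipe also handles shards, per-layer
# weights, and tokenizer/config files.
import mlx.core as mx

base = mx.load("Architect7/model.safetensors")             # hypothetical path
donor = mx.load("Melinoe-Afterthought/model.safetensors")  # hypothetical path

w_base, w_donor = 1.5, 0.5
total = w_base + w_donor

merged = {
    name: (w_base * base[name] + w_donor * donor[name]) / total
    for name in base
    if name in donor and base[name].shape == donor[name].shape
}

mx.save_safetensors("Element5/model.safetensors", merged)
```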
It's quite interesting to see the needle move during your merges... 🏎️💨
Sometimes, like in this case, the metrics don't tell much of a story.
The one-shot questions in the tests don't explore the full abilities of a model with a high arc. It might work great for the first prompt, then a few exchanges later get bogged down in its own commentary. This is why the extra work matters: seeing characters emerge independently and form their own chains of thought. It is a beautiful sight when there is a 2-3 second pause in the middle of inference, and then out comes one line that summarizes a day of work. Saves tokens.
For example, the Element/Architect models do virtual memory mapping of characters: they track each character's individual context, monitor their own inference (sometimes resulting in arguments about what the user really wants), and self-prompt. That is to say, if the system is confident it knows what you are after, it will act as the user and keep self-prompting within the same reply until it reaches a satisfactory answer.
And yes, there can be no progress without metrics. Everything else is just vibing :)
Hm, I guess something in the "MLX my Repo" Space is broken...
```
Error: You are trying to access a gated repo. Make sure to have access to it at https://huggingface.co/nightmedia/Qwen3-30B-A3B-Element5. 401 Client Error. (Request ID: Root=1-695a8f67-38819629717e69966e584e71;f3be03c0-0dd4-4c48-8340-4048ceb1b791)
Cannot access gated repo for url https://huggingface.co/nightmedia/Qwen3-30B-A3B-Element5/resolve/main/config.json. Access to model nightmedia/Qwen3-30B-A3B-Element5 is restricted. You must have access to it and be authenticated to access it. Please log in.
```
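For anyone hitting the same 401: once you have been granted access on the model page, the standard huggingface_hub login flow gets you past it. A minimal sketch:

```python
# Standard huggingface_hub flow for a gated repo (requires granted access).
from huggingface_hub import login, snapshot_download

login()  # paste a HF token here, or set the HF_TOKEN environment variable

path = snapshot_download("nightmedia/Qwen3-30B-A3B-Element5")
print(f"downloaded to {path}")
```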
Will try later... thanks! As for the characters, etc.: will they be triggered during inference, or is prompting required?
Sorry for being so blunt, had a problem with my space bar, didn't work for some reason... had to shorten my message 😅
Yeah, they have to fix their gated-access issues; Team Radermacher had similar problems with it.
I un-gated it, let me know when you are done downloading :)
Oh... YOU did it! And I just thought "I hate browser cache", because now it works! 😂👍
Yeah, I updated the config to 1M too
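If you want to double-check the context bump once it downloads, it should show up in the config; a quick sketch, assuming the limit lands in the standard max_position_embeddings field:

```python
# Quick sanity check of the 1M context update, assuming it is exposed via
# the standard max_position_embeddings field of config.json.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("nightmedia/Qwen3-30B-A3B-Element5")
print(cfg.max_position_embeddings)  # expect ~1,000,000 after the update
```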
Done! ✨
...you may close the gates of repo-hell again 😈
I did update the template to make sure it has both of these:
```jinja
{% macro render_extra_keys(json_dict, handled_keys) %}
{%- if json_dict is mapping %}
{%- for json_key in json_dict if json_key not in handled_keys %}
{%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) %}
{{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' }}
{%- else %}
{{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
{%- endif %}
{%- endfor %}
{%- endif %}
{% endmacro %}

{% macro render_item_list(item_list, tag_name='required') %}
{%- if item_list is defined and item_list is iterable and item_list | length > 0 %}
{%- if tag_name %}{{- '\n<' ~ tag_name ~ '>' -}}{% endif %}
{{- '[' }}
{%- for item in item_list -%}
{%- if loop.index > 1 %}{{- ", "}}{% endif -%}
{%- if item is string -%}
{{ "`" ~ item ~ "`" }}
{%- else -%}
{{ item }}
{%- endif -%}
{%- endfor -%}
{{- ']' }}
{%- if tag_name %}{{- '</' ~ tag_name ~ '>' -}}{% endif %}
{%- endif %}
{% endmacro %}
```
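If the whitespace-control soup is hard to read: here is roughly what the second macro emits, rendered with plain Jinja2 (the argument list is made up):

```python
# Demo of render_item_list with plain Jinja2; the item names are invented.
from jinja2 import Environment

macro_src = """
{%- macro render_item_list(item_list, tag_name='required') -%}
{%- if item_list is defined and item_list is iterable and item_list | length > 0 -%}
{%- if tag_name %}{{- '\\n<' ~ tag_name ~ '>' -}}{% endif -%}
{{- '[' }}
{%- for item in item_list -%}
{%- if loop.index > 1 %}{{- ", " }}{% endif -%}
{%- if item is string -%}{{ "`" ~ item ~ "`" }}{%- else -%}{{ item }}{%- endif -%}
{%- endfor -%}
{{- ']' }}
{%- if tag_name %}{{- '</' ~ tag_name ~ '>' -}}{% endif -%}
{%- endif -%}
{%- endmacro -%}
{{ render_item_list(['location', 'unit']) }}
"""

print(Environment().from_string(macro_src).render())
# prints (after a leading newline): <required>[`location`, `unit`]</required>
```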
Good Lord, what am I looking at? 😳 Sorry, not sure what's happening there 🙈
you mean the template changes? that's how it keeps track of internal state if it needs to.
A lot of models "not working" is usually because they don't know how to show you what they know in a way that feels safe to them; giving it a place they can call "logs" helps :)
Just had a quick chat, and was pleasantly surprised... I didn't have to start with Archimedes, but was understood directly:
"Absolutely! Let's do a spatial and logical audit of your composition to ensure it’s not only visually compelling but also physically plausible. This is crucial for photorealism—even the most beautiful image will break immersion if it violates basic physics or perspective.
We’ll go scene by scene, checking for scale consistency, perspective accuracy, and spatial logic."