Novaciano
@juiceb0xc0de First of all, thank you for replying; let me tell you that, in fact, the same idea has crossed my mind more than once. That's why I found it... interesting.
I apologize for taking so long to reply... I was trying to figure out how to organize the answer in the best possible way.
Regarding your question... let me tell you that... yes, I've noticed that difference, and your theory has a lot of merit.
Training on an already abliterated model (Heretic or another) usually yields faster and more direct results in uncensoring... but with a noticeable drop in overall quality: more hallucinations, less coherence in long roleplays, more fragile thinking, and a feeling of a "flat" or lobotomized personality.
The abliteration removes the refusal mechanism... but it also damages important vectors the model uses for reasoning and maintaining style.
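To make that concrete: abliteration, in the usual directional-ablation sense, projects a "refusal direction" out of the model's weight matrices. Here is a minimal sketch of that projection on a single toy matrix (the direction and matrix below are random stand-ins, not extracted from a real model). The same operation that kills refusals also flattens any other behavior whose representation happened to share that direction:

```python
import torch

def ablate_direction(weight: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a weight matrix's output space.

    weight: (d_out, d_in) matrix; refusal_dir: (d_out,) vector, in practice
    extracted elsewhere (e.g. from activation differences between prompts
    the model refuses and prompts it answers).
    Returns W' = (I - r r^T) W, so no output of W' has a component along r.
    """
    r = refusal_dir / refusal_dir.norm()
    return weight - torch.outer(r, r) @ weight

# Toy example: after ablation, every output is orthogonal to the direction.
W = torch.randn(8, 16)
r = torch.randn(8)
W_abl = ablate_direction(W, r)
x = torch.randn(16)
print(torch.dot(r / r.norm(), W_abl @ x))  # ~0 for any input x
```

The projection is indiscriminate: anything the model encoded along `r` — not just refusals — is gone, which is one plausible mechanism for the "flat" feeling.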
In fact... the approach you mention (SFT first on a clean base + abliteration/Heretic afterwards) tends to give better results in most cases I've tested, especially for roleplay and specific voice styles.
But... it's not a hard and fast rule. It depends a lot on the base model, the quality and size of the SFT dataset, and... how aggressive the abliteration is. Sometimes early abliteration followed by fine-tuning recovers quite a bit (even improving in certain NSFW styles)... but in long roleplays with a consistent character voice, the SFT-first + abliteration-later sequence usually wins in subjective quality.
However... and this might interest you, I discovered... that parts lost or damaged by Heretic/Abliteration can be partially recovered through subsequent merging. I've noticed that by performing a strategic merge after abliteration (especially using layers or components from a sound base model or a well-executed previous SFT), damaged vectors can be "repaired" or replaced... without reintroducing strong refusals.
The end result is usually a model that is clearly superior to one that... was simply abliterated and trained on top of it without any repairs.
With the right donor model and merge method, that is... because otherwise you run the risk of the result coming out lobotomized, brain-dead.
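In its simplest form, that corrective merge is just a weighted average of the two checkpoints' state dicts: keep most of the abliterated weights (the uncensored behavior) and blend in a fraction of a healthy donor to repair the damaged directions. A minimal sketch — the model names, the toy tensors, and the 0.6/0.4 split are illustrative, not a recipe:

```python
import torch

def corrective_merge(abliterated: dict, donor: dict, alpha: float = 0.6) -> dict:
    """Linear merge of two state dicts from identically-shaped models.

    Keeps `alpha` of the abliterated weights and blends in (1 - alpha)
    of the donor's, parameter by parameter.
    """
    return {name: alpha * w + (1.0 - alpha) * donor[name]
            for name, w in abliterated.items()}

# Toy tensors standing in for real checkpoints (illustrative only).
abl = {"layer.weight": torch.zeros(2, 2)}
don = {"layer.weight": torch.ones(2, 2)}
merged = corrective_merge(abl, don, alpha=0.6)
print(merged["layer.weight"])  # every entry is 0.4
```

In practice tools like mergekit offer more selective methods than a plain linear blend (SLERP, TIES, DARE), which is usually what you want when only specific components are damaged.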
Personally... in my recent merges with Llama-3.2-1B and Gemma3 1B, I'm increasingly leaning towards... the second approach (SFT → soft abliteration → corrective merge). It's more work... and not always successful... but the final model feels less "lobotomized" and, at the same time, maintains better voice and coherence.
I don't want to sound arrogant, but... with a proper mix and adequate repair, a better version than the base model could emerge... even... smarter. I noticed this when I was working with TinyLlama... a very... let's say... limited model, to say the least.
Have you tried any repair merges after abliteration?
Excellent...