Instructions for using Lorenzob/synch-2-merged with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use Lorenzob/synch-2-merged with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Lorenzob/synch-2-merged", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Lorenzob/synch-2-merged", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Lorenzob/synch-2-merged", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Lorenzob/synch-2-merged with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Lorenzob/synch-2-merged"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Lorenzob/synch-2-merged",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/Lorenzob/synch-2-merged
- SGLang
How to use Lorenzob/synch-2-merged with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Lorenzob/synch-2-merged" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Lorenzob/synch-2-merged",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Lorenzob/synch-2-merged" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Lorenzob/synch-2-merged",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use Lorenzob/synch-2-merged with Docker Model Runner:
docker model run hf.co/Lorenzob/synch-2-merged
This repository is publicly accessible, but you must accept its access conditions: the Apache-2.0 license and the NVIDIA Nemotron-3 base license. The model embeds Lorenzo Bernardini's Fractal-RL / THESIA research; cite arXiv:2503.01307 for the cognitive-behaviors methodology.
Lorenzob/synch-2-merged
Full merged model: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 (120B MoE · 12B active) +
Lorenzob/synch-2 (v11 APOGEO LoRA · best reward +0.200).
Drop-in compatible with: HF Dedicated Endpoints, TGI, vLLM, HF Inference API, Together AI, Modal, RunPod, Replicate.
Quickstart · Dedicated Endpoint
from huggingface_hub import InferenceClient
client = InferenceClient("Lorenzob/synch-2-merged", token="<HF_TOKEN>")
out = client.chat_completion(
messages=[
{"role": "user",
"content": "Compute the Ollivier-Ricci curvature of K_5."},
],
max_tokens=512, temperature=0.7,
)
print(out.choices[0].message.content)
Quickstart · TGI (Text Generation Inference)
docker run --gpus all --shm-size 1g -p 8080:80 \
-v $PWD/data:/data \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id Lorenzob/synch-2-merged --trust-remote-code \
--num-shard 4 --max-input-length 4096 --max-total-tokens 8192
Quickstart · vLLM
python -m vllm.entrypoints.openai.api_server \
--model Lorenzob/synch-2-merged --trust-remote-code \
--dtype bfloat16 --tensor-parallel-size 4 \
--max-model-len 8192
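Once the server is up, any OpenAI-compatible client can query it. A minimal sketch using only the Python standard library; the endpoint and port assume the vLLM defaults from the command above:

```python
import json
import urllib.request


def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def ask(server: str, model: str, question: str) -> str:
    """POST the payload to the server and return the assistant's reply."""
    req = urllib.request.Request(
        f"{server}/v1/chat/completions",
        data=json.dumps(build_chat_request(model, question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Example (requires the server to be running):
# print(ask("http://localhost:8000", "Lorenzob/synch-2-merged",
#           "What is the capital of France?"))
```

The same client works unchanged against the SGLang and TGI servers shown elsewhere in this card, since all three expose the same `/v1/chat/completions` route.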
Quickstart · Local (transformers, full weights)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "Lorenzob/synch-2-merged", trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "Lorenzob/synch-2-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Who are you?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
Suggested Hardware
AWS · 4x H100 80GB (recommended); 8x A100 80GB also accepted. Total BF16 weights ~245 GB in
50 shards (model-XXXXX-of-00050.safetensors).
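The sizing above follows from simple arithmetic; this is a back-of-the-envelope sketch, and runtime overhead for KV cache and activations comes on top of the weights:

```python
# BF16 stores 2 bytes per parameter, so the weights alone need:
params = 120e9          # total parameters (MoE; only ~12B active per token)
bytes_per_param = 2     # bfloat16
weights_gb = params * bytes_per_param / 1e9

print(f"~{weights_gb:.0f} GB of weights")  # ~240 GB, matching the ~245 GB of shards

# 4x H100 80GB = 320 GB of HBM: fits the weights with headroom for
# KV cache and activations; 8x A100 80GB (640 GB) also works.
```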
Merge Details
- Base: NVIDIA Nemotron-3-Super-120B-A12B-BF16 (120B MoE, 12B active)
- Adapter: Lorenzob/synch-2 (v11 APOGEO, best reward +0.200)
- LoRA rank: 64 · LoRA alpha: 32
- Merge type: weight addition (peft merge_and_unload)
- Attention impl: SDPA (default), FlashAttention-2 supported
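The weight-addition merge folds each low-rank update into its base matrix, W' = W + (alpha/r) · B · A, which is what peft's merge_and_unload performs per layer. A toy NumPy illustration of that rule; the dimensions here are made up, while the real adapter uses r=64, alpha=32:

```python
import numpy as np

# LoRA merge rule: W' = W + (alpha / r) * B @ A
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 4, 2.0

W = rng.standard_normal((d_out, d_in))   # base weight
A = rng.standard_normal((r, d_in))       # LoRA "down" projection
B = rng.standard_normal((d_out, r))      # LoRA "up" projection


def merge_lora(W, A, B, alpha, r):
    """Fold the low-rank update into the base weight."""
    return W + (alpha / r) * (B @ A)


W_merged = merge_lora(W, A, B, alpha, r)

# The merged weight reproduces base + adapter applied separately:
x = rng.standard_normal(d_in)
y_two_pass = W @ x + (alpha / r) * (B @ (A @ x))
assert np.allclose(W_merged @ x, y_two_pass)
```

After the merge the adapter matrices are discarded, which is why the repo ships a single set of full-size shards rather than base weights plus a LoRA.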
Governance
See the dedicated documents in the repo:
- bias.md: bias analysis
- safety.md: safety considerations
- privacy.md: privacy implications
- explainability.md: interpretability notes
- accuracy_chart.png: eval results visual
Attribution
- Cognitive behaviors: Gandhi et al. 2025 (arXiv:2503.01307)
- Self-improving reasoner: karpathy/nanochat
- Fractal RL · LCTR · THESIA: Lorenzo Bernardini publications.
License
Apache-2.0 (this merged model) · the NVIDIA Nemotron-3 license applies to the underlying base weights.
Evaluation results
- Best Fractal-RL Composite Reward on Synch2 Internal Eval (private): 0.200 (self-reported)