Instructions for using Lorenzob/synch-2-merged with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use Lorenzob/synch-2-merged with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Lorenzob/synch-2-merged", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Lorenzob/synch-2-merged", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Lorenzob/synch-2-merged", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Lorenzob/synch-2-merged with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Lorenzob/synch-2-merged"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Lorenzob/synch-2-merged",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/Lorenzob/synch-2-merged
- SGLang
How to use Lorenzob/synch-2-merged with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Lorenzob/synch-2-merged" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Lorenzob/synch-2-merged",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Lorenzob/synch-2-merged" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Lorenzob/synch-2-merged",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use Lorenzob/synch-2-merged with Docker Model Runner:
docker model run hf.co/Lorenzob/synch-2-merged
This repository is publicly accessible, but you must accept its access conditions: the Apache-2.0 license and the NVIDIA Nemotron-3 base license. The model embeds Lorenzo Bernardini's Fractal-RL / THESIA research; cite arXiv:2503.01307 for the cognitive-behaviors methodology.
Lorenzob/synch-2-merged
Full merged model: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 (120B MoE · 12B active) +
Lorenzob/synch-2 (v11 APOGEO LoRA · best reward +0.200).
Drop-in compatible with: HF Dedicated Endpoints, TGI, vLLM, HF Inference API, Together AI, Modal, RunPod, Replicate.
Quickstart · Dedicated Endpoint
from huggingface_hub import InferenceClient
client = InferenceClient("Lorenzob/synch-2-merged", token="<HF_TOKEN>")
out = client.chat_completion(
messages=[
{"role": "user",
"content": "Compute the Ollivier-Ricci curvature of K_5."},
],
max_tokens=512, temperature=0.7,
)
print(out.choices[0].message.content)
Quickstart · TGI (Text Generation Inference)
docker run --gpus all --shm-size 1g -p 8080:80 \
-v $PWD/data:/data \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id Lorenzob/synch-2-merged --trust-remote-code \
--num-shard 4 --max-input-length 4096 --max-total-tokens 8192
Quickstart · vLLM
python -m vllm.entrypoints.openai.api_server \
--model Lorenzob/synch-2-merged --trust-remote-code \
--dtype bfloat16 --tensor-parallel-size 4 \
--max-model-len 8192
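Once the server is up, any OpenAI-compatible client can query it. A minimal sketch using only the Python standard library; the endpoint and port assume the vLLM defaults from the command above:

```python
import json
import urllib.request


def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def ask(server: str, model: str, question: str) -> str:
    """POST the payload to the server and return the assistant's reply."""
    req = urllib.request.Request(
        f"{server}/v1/chat/completions",
        data=json.dumps(build_chat_request(model, question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Example (requires the server to be running):
# print(ask("http://localhost:8000", "Lorenzob/synch-2-merged",
#           "What is the capital of France?"))
```

The same client works unchanged against the SGLang and TGI servers shown elsewhere in this card, since all three expose the same `/v1/chat/completions` route.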
Quickstart · Local (transformers, full weights)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "Lorenzob/synch-2-merged", trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "Lorenzob/synch-2-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Who are you?"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
Suggested Hardware
AWS · 4x H100 80GB (recommended); 8x A100 80GB also accepted. Total BF16 weights ~245 GB in
50 shards (model-XXXXX-of-00050.safetensors).
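The sizing above follows from simple arithmetic; this is a back-of-the-envelope sketch, and runtime overhead for KV cache and activations comes on top of the weights:

```python
# BF16 stores 2 bytes per parameter, so the weights alone need:
params = 120e9          # total parameters (MoE; only ~12B active per token)
bytes_per_param = 2     # bfloat16
weights_gb = params * bytes_per_param / 1e9

print(f"~{weights_gb:.0f} GB of weights")  # ~240 GB, matching the ~245 GB of shards

# 4x H100 80GB = 320 GB of HBM: fits the weights with headroom for
# KV cache and activations; 8x A100 80GB (640 GB) also works.
```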
Merge Details
- Base: NVIDIA Nemotron-3-Super-120B-A12B-BF16 (120B MoE, 12B active)
- Adapter: Lorenzob/synch-2 (v11 APOGEO, best reward +0.200)
- LoRA rank: 64 · LoRA alpha: 32
- Merge type: weight addition (peft merge_and_unload)
- Attention impl: SDPA (default), FlashAttention-2 supported
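The weight-addition merge folds each low-rank update into its base matrix, W' = W + (alpha/r) · B · A, which is what peft's merge_and_unload performs per layer. A toy NumPy illustration of that rule; the dimensions here are made up, while the real adapter uses r=64, alpha=32:

```python
import numpy as np

# LoRA merge rule: W' = W + (alpha / r) * B @ A
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 4, 2.0

W = rng.standard_normal((d_out, d_in))   # base weight
A = rng.standard_normal((r, d_in))       # LoRA "down" projection
B = rng.standard_normal((d_out, r))      # LoRA "up" projection


def merge_lora(W, A, B, alpha, r):
    """Fold the low-rank update into the base weight."""
    return W + (alpha / r) * (B @ A)


W_merged = merge_lora(W, A, B, alpha, r)

# The merged weight reproduces base + adapter applied separately:
x = rng.standard_normal(d_in)
y_two_pass = W @ x + (alpha / r) * (B @ (A @ x))
assert np.allclose(W_merged @ x, y_two_pass)
```

After the merge the adapter matrices are discarded, which is why the repo ships a single set of full-size shards rather than base weights plus a LoRA.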
Governance
See the dedicated documents in the repo:
- bias.md: bias analysis
- safety.md: safety considerations
- privacy.md: privacy implications
- explainability.md: interpretability notes
- accuracy_chart.png: eval results visual
Attribution
- Cognitive behaviors: Gandhi et al. 2025 (arXiv:2503.01307)
- Self-improving reasoner: karpathy/nanochat
- Fractal RL · LCTR · THESIA: Lorenzo Bernardini publications.
License
Apache-2.0 (this merged model) · the NVIDIA Nemotron-3 license applies to the underlying base weights.
Evaluation results
- Best Fractal-RL Composite Reward on Synch2 Internal Eval (private): 0.200 (self-reported)