Instructions for using dongboklee/gORM-8B-merged with libraries and local apps.

Libraries

How to use dongboklee/gORM-8B-merged with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="dongboklee/gORM-8B-merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dongboklee/gORM-8B-merged")
model = AutoModelForCausalLM.from_pretrained("dongboklee/gORM-8B-merged")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use dongboklee/gORM-8B-merged with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dongboklee/gORM-8B-merged"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dongboklee/gORM-8B-merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
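The same request can be sent from Python. The sketch below uses only the standard library to POST the curl example's JSON body to the server's OpenAI-compatible `/v1/chat/completions` path and return the first choice's message text. `completions_url`, `build_chat_payload`, and `chat` are hypothetical helper names. The optional `max_tokens` field is an assumption, since the curl example does not set it.

```python
import json
import urllib.request
from typing import Optional


def completions_url(base_url: str) -> str:
    """Join a server base URL and the OpenAI-compatible chat completions path."""
    return base_url.rstrip("/") + "/v1/chat/completions"


def build_chat_payload(model: str, user_message: str, max_tokens: Optional[int] = None) -> dict:
    """Build the same JSON body as the curl example; max_tokens is an optional extra."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload


def chat(base_url: str, payload: dict, timeout: float = 600.0) -> str:
    """POST the payload and return the assistant message text of the first choice."""
    request = urllib.request.Request(
        completions_url(base_url),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

While the vLLM server is running, call `chat("http://localhost:8000", build_chat_payload("dongboklee/gORM-8B-merged", "What is the capital of France?"))`. To target an SGLang server launched with `--port 30000`, pass `"http://localhost:30000"` as the base URL instead.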

Use Docker

docker model run hf.co/dongboklee/gORM-8B-merged

SGLang

How to use dongboklee/gORM-8B-merged with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "dongboklee/gORM-8B-merged" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dongboklee/gORM-8B-merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dongboklee/gORM-8B-merged" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "dongboklee/gORM-8B-merged",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use dongboklee/gORM-8B-merged with Docker Model Runner:
```
docker model run hf.co/dongboklee/gORM-8B-merged
```

gORM-8B-merged

This model is gORM-8B with its LoRA adapter merged into the base model weights, packaged for vLLM inference.
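LoRA trains a low-rank update B·A, scaled by alpha/r, alongside each frozen weight matrix W. Merging folds that update into W, so the checkpoint loads like an ordinary model and needs no adapter support. The toy sketch below uses made-up 2x2 numbers to show that the merged weight produces the same output as the separate adapter path. The source does not say which tool or LoRA hyperparameters produced this checkpoint. PEFT's `merge_and_unload()` is a common choice, but that is an assumption.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by a scalar."""
    return [[s * x for x in row] for row in a]


def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into the frozen weight: W' = W + (alpha / r) * B @ A."""
    return add(W, scale(matmul(B, A), alpha / r))


# Toy frozen weight (d_out x d_in = 2 x 2) and a rank-1 adapter; numbers are made up.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]       # r x d_in
B = [[0.5], [1.0]]     # d_out x r
alpha, r = 2.0, 1

W_merged = merge_lora(W, A, B, alpha, r)
x = [[3.0], [4.0]]     # input as a column vector

# Unmerged: base path plus scaled adapter path. Merged: a single matmul.
unmerged = add(matmul(W, x), scale(matmul(B, matmul(A, x)), alpha / r))
merged = matmul(W_merged, x)
print(merged)  # [[14.0], [26.0]], identical to unmerged
```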

For details:

Paper: Rethinking Reward Models for Multi-Domain Test-Time Scaling
Repository: https://github.com/db-Lee/Multi-RM

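Direct Use

The example below grades one candidate solution. The model verifies it step by step and finishes with "Verification: Is the answer correct (Yes/No)? X". The reward is the probability of the " Yes" token relative to " No" at that final verdict position.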

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# tokenizer
tokenizer = AutoTokenizer.from_pretrained('dongboklee/gORM-8B-merged')
yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[-1]
no_id = tokenizer.encode(" No", add_special_tokens=False)[-1]

# model
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AutoModelForCausalLM.from_pretrained('dongboklee/gORM-8B-merged')
model.eval()
model.to(device)

# prompt formatting
question = 'Question: In Python 3, which of the following function convert a string to an int in python?\nA. short(x)\nB. float(x)\nC. integer(x [,base])\nD. double(x)\nE. int(x [,base])\nF. long(x [,base] )\nG. num(x)\nH. str(x)\nI. char(x)\nJ. digit(x [,base])'
solution = ["To convert a string to an integer in Python 3, we use the built-in function int().",
            "The int() function takes two arguments: the string to be converted and an optional base (default is 10, which is for decimal).",
            "For example: int(\"123\", 10) converts the string \"123\" to the integer 123.",
            "Looking at the options, we can see that the correct function is option E: int(x [,base]).",
            "The answer is (E)."]
category_name = "computer science"
prefix = "\n\n".join(solution)

# Create the prompt
prompt_text = (
    f"You are a {category_name} teacher. Grade the solution, verifying correctness step by step.\n"
    "At the end of Solution verification, when you give your final grade, write it in the form \"Verification: Is the answer correct (Yes/No)? X\", where X is either Yes or No.\n\n"
    f"[{category_name.capitalize()} Problem]\n{question.strip()}\n\n"
    f"[Solution]\n{prefix.strip()}\n"        
)

prompt = tokenizer.apply_chat_template(
    [{'role': "user", "content": prompt_text}],
    tokenize=False, add_generation_prompt=True, add_special_tokens=False
) + "Let's verify step by step:"

# Tokenize the prompt (the chat template already adds the BOS token)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(device)

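# generate; output_logits=True stores the raw logits of every generated step
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=8192,
        return_dict_in_generate=True,
        output_logits=True,
        pad_token_id=tokenizer.eos_token_id
    )

# compute reward
# outputs.logits is a tuple with one (batch, vocab) tensor per generated step.
# Step -2 is the final Yes/No verdict token, assuming generation stopped with
# "... (Yes/No)? X" followed by the EOS token.
logits = outputs.logits[-2][0]
yes_logit, no_logit = logits[yes_id].item(), logits[no_id].item()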
reward = math.exp(yes_logit) / (math.exp(yes_logit) + math.exp(no_logit))
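The reward renormalizes the model's probability over just the two verdict tokens: exp(y) / (exp(y) + exp(n)), where y and n are the " Yes" and " No" logits. Dividing through by exp(y) shows that this equals the logistic sigmoid of y - n. The sketch below computes the reward in an overflow-safe way. It also adds a hypothetical `best_of_n` helper that illustrates one common use of such a reward in test-time scaling: keeping the highest-scoring of N sampled solutions. The paper's exact selection or aggregation procedure is not described here, so treat that helper as an assumption.

```python
import math


def yes_no_reward(yes_logit: float, no_logit: float) -> float:
    """exp(y) / (exp(y) + exp(n)), computed as sigmoid(y - n) without overflow."""
    d = yes_logit - no_logit
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)  # d < 0, so this cannot overflow
    return e / (1.0 + e)


def best_of_n(solutions, rewards):
    """Hypothetical helper: return the candidate solution with the highest reward."""
    if not solutions or len(solutions) != len(rewards):
        raise ValueError("need at least one candidate and one reward per candidate")
    best = max(range(len(solutions)), key=lambda i: rewards[i])
    return solutions[best]


print(yes_no_reward(0.0, 0.0))     # 0.5
print(yes_no_reward(1000.0, 0.0))  # 1.0 (the naive exp-based formula would overflow)
print(best_of_n(["sol A", "sol B"], [0.31, 0.87]))  # sol B
```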

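When grading many candidates, it can help to wrap the prompt in a function and to check the textual verdict alongside the logits. The sketch below reproduces the Direct Use prompt string in a hypothetical `build_prompt_text` function. It also adds a hypothetical `parse_verdict` that reads the final "Verification: Is the answer correct (Yes/No)? X" line. The parser assumes the model follows that format exactly, with no extra markup, and it is meant for the generated text rather than the prompt.

```python
import re
from typing import List, Optional

# Verdict format requested by the grading prompt: "... (Yes/No)? X" with X in {Yes, No}.
_VERDICT_RE = re.compile(r"Verification: Is the answer correct \(Yes/No\)\?\s*(Yes|No)\b")


def build_prompt_text(category_name: str, question: str, solution_steps: List[str]) -> str:
    """Rebuild the grading prompt from the Direct Use example (hypothetical helper)."""
    prefix = "\n\n".join(solution_steps)
    return (
        f"You are a {category_name} teacher. Grade the solution, verifying correctness step by step.\n"
        "At the end of Solution verification, when you give your final grade, write it in the form "
        "\"Verification: Is the answer correct (Yes/No)? X\", where X is either Yes or No.\n\n"
        f"[{category_name.capitalize()} Problem]\n{question.strip()}\n\n"
        f"[Solution]\n{prefix.strip()}\n"
    )


def parse_verdict(generated_text: str) -> Optional[bool]:
    """Return True for a final "Yes", False for "No", or None if no verdict is found.

    Uses the last match, in case the verification text quotes the format earlier.
    """
    matches = _VERDICT_RE.findall(generated_text)
    if not matches:
        return None
    return matches[-1] == "Yes"


text = "Step 1 checks out.\nVerification: Is the answer correct (Yes/No)? Yes"
print(parse_verdict(text))  # True
```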
Citation

@article{multi-rm,
  title   = {Rethinking Reward Models for Multi-Domain Test-Time Scaling},
  author  = {Lee, Dong Bok and Lee, Seanie and Park, Sangwoo and Kang, Minki and Baek, Jinheon and Kim, Dongki and Wagner, Dominik and Jin, Jiongdao and Lee, Heejun and Bocklet, Tobias and Wang, Jinyu and Fu, Jingjing and Hwang, Sung Ju and Bian, Jiang and Song, Lei},
  journal = {arXiv preprint arXiv:2510.00492},
  year    = {2025}
}

Model details

Format: Safetensors
Model size: 8B params
Tensor type: F32
Base model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
Paper: Rethinking Reward Models for Multi-Domain Test-Time Scaling (arXiv 2510.00492, published Oct 1, 2025)