add logitprocessor

by leejunhyeok - opened Dec 12, 2025

base: refs/heads/main ← from: refs/pr/5

+132 -1

add logitprocessor bc61b509

leejunhyeok

Motif Technologies org Dec 12, 2025

No description provided.

dongseokmotif

Motif Technologies org Dec 12, 2025

edited Dec 12, 2025

Please update the README as well (vLLM and parser usage).

Update README.md 51978884

Update README.md 47d28325

Update README.md 3bcafecb

dongseokmotif

Motif Technologies org Dec 12, 2025

Please also update the vllm serve instructions to use YaRN (scale factor 2, max len 131072).

remove unused dependency in logitproc bd52fd36

Update README.md 73e65dd8

TaehyunKimMotif

Motif Technologies org Dec 12, 2025

Rather than hard-coding the constants, I think it would be better to turn them into named variables so their meaning is clear.
e.g.
ngrams = [tuple(input_ids[i:i+n]) for i in range(0, len(input_ids) - n + 1, 256)]
freq = Counter(ngrams)
return {ng: c for ng, c in freq.items() if c > 7}

256: search_window
7: freq_threshold
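A minimal sketch of the suggestion above, assuming the snippet lives in a standalone helper. The function name `frequent_ngrams` and the constant names `SEARCH_WINDOW` and `FREQ_THRESHOLD` are hypothetical and only follow the reviewer's proposed naming; the actual code in the PR may be organized differently.

```python
from collections import Counter

# Hypothetical names, following the reviewer's proposal.
SEARCH_WINDOW = 256  # stride between the start positions of sampled n-grams
FREQ_THRESHOLD = 7   # an n-gram is reported when it occurs more than this many times


def frequent_ngrams(input_ids, n, search_window=SEARCH_WINDOW,
                    freq_threshold=FREQ_THRESHOLD):
    """Return {ngram: count} for sampled n-grams seen more than freq_threshold times."""
    starts = range(0, len(input_ids) - n + 1, search_window)
    freq = Counter(tuple(input_ids[i:i + n]) for i in starts)
    return {ng: c for ng, c in freq.items() if c > freq_threshold}


tokens = [1, 2, 3] * 10
print(frequent_ngrams(tokens, n=3, search_window=3))  # {(1, 2, 3): 10}
```

Exposing the two values as keyword arguments with named defaults keeps the behavior identical to the hard-coded version while letting callers and tests override them.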

Update logit_processors/logit_.py f19aee39

TaehyunKimMotif

Motif Technologies org Dec 12, 2025

edited Dec 12, 2025

In ThinkLogitsProcessor, ratio does not seem to be used anywhere. Is there somewhere it is needed?

Never mind, the PR diff was cut off when I looked at it, haha.

TaehyunKimMotif

Motif Technologies org Dec 12, 2025

logits = torch.full_like(logits, torch.finfo(torch.bfloat16).min)
Even if logits is always bf16, it seems better to take the min of the logits tensor's own dtype, i.e. torch.finfo(logits.dtype).min.

TaehyunKimMotif

Motif Technologies org Dec 12, 2025

In what form does past_token_ids arrive?
If the generated tokens keep being concatenated onto it, then

ngrams = [tuple(input_ids[i:i+n]) for i in range(0, len(input_ids) - n + 1, WINDOW_SIZE)]
does a lot of redundant checking, so I think the scan does not need to start from 0 every time.
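One way to avoid rescanning from position 0 is sketched below. It assumes past_token_ids is append-only (each call sees the previous tokens plus newly generated ones) and that the processor is allowed to keep per-request state; neither is confirmed in this thread. The class `IncrementalNgramCounter` and its attributes are hypothetical names, not part of the PR.

```python
from collections import Counter


class IncrementalNgramCounter:
    """Counts n-grams sampled every `stride` tokens, visiting each start once."""

    def __init__(self, n, stride):
        self.n = n
        self.stride = stride
        self.next_start = 0  # first sampled start position not yet counted
        self.freq = Counter()

    def update(self, token_ids):
        """Count only the n-grams that became complete since the last call."""
        last_start = len(token_ids) - self.n
        while self.next_start <= last_start:
            i = self.next_start
            self.freq[tuple(token_ids[i:i + self.n])] += 1
            self.next_start += self.stride
        return self.freq


counter = IncrementalNgramCounter(n=2, stride=2)
tokens = []
for tok in [1, 2] * 8:
    tokens.append(tok)
    counter.update(tokens)  # each call does work only for newly completed n-grams
print(counter.freq)  # Counter({(1, 2): 8})
```

The counts match a full rescan with the same stride, but the total work over a generation is proportional to the number of sampled positions rather than growing with every call.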

TaehyunKimMotif

Motif Technologies org Dec 12, 2025

edited Dec 12, 2025

I am not sure I understood this correctly, but ratio and ngram look independent of each other. Is that right?
If no budget remains, think_end should be forced regardless of the ngram check.
In that case it seems better to do the ratio check first and then, only if budget remains, run the ngram check when len(past_token_ids) % self.interval == 0.
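The proposed ordering can be sketched as a small decision function. Everything here is an assumption made for illustration: the function name, the parameter names, the returned labels, the idea that the budget is a token count derived from ratio, and the idea that a detected repetition also ends thinking. The thread does not define any of these.

```python
def next_action(num_generated, think_budget, interval, has_repeated_ngram):
    """Decide what to do at this decoding step (hypothetical sketch).

    think_budget: assumed maximum number of thinking tokens
                  (for example, derived from ratio and the max token limit).
    has_repeated_ngram: zero-argument callable that runs the n-gram check.
    """
    # 1. Budget check first: it applies at every step, independent of n-grams.
    if num_generated >= think_budget:
        return "end_thinking_budget"
    # 2. Budget remains: run the costlier n-gram check only every `interval` tokens.
    if num_generated % interval == 0 and has_repeated_ngram():
        return "end_thinking_repetition"
    return "continue"


print(next_action(100, 100, 10, lambda: False))  # end_thinking_budget
print(next_action(20, 100, 10, lambda: True))    # end_thinking_repetition
print(next_action(21, 100, 10, lambda: True))    # continue
```

Passing the n-gram check as a callable means it is never evaluated once the budget is exhausted or on steps that fall between intervals.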

TaehyunKimMotif

Motif Technologies org Dec 12, 2025

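edited Dec 12, 2025

Is ratio a concept that is commonly used in logit processors?
If not, it would be good for the README to explain what ratio means.
Even if it is a common concept, it is a variable that users can control from outside, so an explanation in the README still seems worthwhile, haha.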