add logitprocessor
Please update the README as well (vllm, parser usage).
For vllm serve, please also cover the yarn settings (scale factor 2, max len 131072).
Instead of hardcoding the constants, I think it would be better to turn them into named variables so their meaning is clear.
ex.
ngrams = [tuple(input_ids[i:i+n]) for i in range(0, len(input_ids) - n + 1, 256)]
freq = Counter(ngrams)
return {ng: c for ng, c in freq.items() if c > 7}
256: search_window
7: freq_threshold
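As a minimal sketch of this suggestion, the snippet can be wrapped in a function whose parameters carry the two proposed names, search_window and freq_threshold. The function name find_repeated_ngrams and the argument n are assumptions made for illustration and are not taken from the PR.

```python
from collections import Counter

# Names proposed in the review; the default values are the literals from the snippet.
SEARCH_WINDOW = 256   # stride between the start positions of sampled n-grams
FREQ_THRESHOLD = 7    # an n-gram is reported only if it occurs more often than this


def find_repeated_ngrams(input_ids, n,
                         search_window=SEARCH_WINDOW,
                         freq_threshold=FREQ_THRESHOLD):
    """Return {ngram: count} for n-grams seen more than freq_threshold times.

    Hypothetical helper: same logic as the reviewed snippet, with the
    magic numbers replaced by named parameters.
    """
    ngrams = [
        tuple(input_ids[i:i + n])
        for i in range(0, len(input_ids) - n + 1, search_window)
    ]
    freq = Counter(ngrams)
    return {ng: c for ng, c in freq.items() if c > freq_threshold}
```

With the defaults the behavior matches the original line; callers can pass smaller values in tests.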
In ThinkLogitsProcessor, ratio doesn't seem to be used anywhere. Is there somewhere it is needed?
The PR was showing up truncated for me, haha.
logits = torch.full_like(logits, torch.finfo(torch.bfloat16).min)
Even if logits is always bf16, it looks better to take the min from the dtype of logits itself.
In what form does past_token_ids come in?
If the generated tokens keep getting concatenated onto it, then
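A minimal sketch of the dtype comment, assuming the goal of the line is to mask every position with the most negative finite value: reading the minimum from logits.dtype keeps the fill value representable whatever precision the logits arrive in. The function name mask_all_logits is hypothetical.

```python
import torch


def mask_all_logits(logits: torch.Tensor) -> torch.Tensor:
    """Fill logits with the smallest finite value of their own dtype.

    Hypothetical helper: uses logits.dtype instead of a hardcoded
    torch.bfloat16, so float16 or float32 logits are handled as well.
    """
    return torch.full_like(logits, torch.finfo(logits.dtype).min)
```

A caller would then set only the allowed token's logit (for example the think-end token) back to a finite score.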
ngrams = [tuple(input_ids[i:i+n]) for i in range(0, len(input_ids) - n + 1, WINDOW_SIZE)]
seems to do a lot of duplicate checking; I don't think it needs to start from 0 every time.
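One way to avoid rescanning from position 0, sketched under the assumption that past_token_ids only ever grows by appending: keep the counter between calls and count only the n-gram start positions that have not been visited yet. The class name IncrementalNgramCounter and its attributes are hypothetical and not part of the PR.

```python
from collections import Counter


class IncrementalNgramCounter:
    """Counts strided n-grams over a growing token list without rescanning."""

    def __init__(self, n, stride):
        self.n = n                # n-gram length
        self.stride = stride      # distance between sampled start positions
        self.next_start = 0       # first start position not yet counted
        self.freq = Counter()     # running n-gram counts

    def update(self, token_ids):
        """Count n-grams that became complete since the previous call."""
        last_start = len(token_ids) - self.n
        while self.next_start <= last_start:
            i = self.next_start
            self.freq[tuple(token_ids[i:i + self.n])] += 1
            self.next_start += self.stride
        return self.freq
```

Each call does work proportional to the newly added tokens, and the running counts equal what a full scan over the same start positions would produce.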
I'm not sure I've understood this correctly, but
ratio and ngram look independent of each other. Is that right?
If there is no budget left, think_end has to be triggered regardless of ngram, I think.
If so, it seems better to do the ratio check first and, if there is remaining budget, run the ngram check when len(past_token_ids) % self.interval == 0.
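The proposed ordering can be sketched as below. The thread does not spell out how ratio maps to a budget, so this sketch assumes a plain token budget; think_budget, has_repeated_ngrams, and the function name are hypothetical stand-ins rather than names from the PR.

```python
def should_force_think_end(past_token_ids, think_budget, interval,
                           has_repeated_ngrams):
    """Decide whether the think-end token should be forced.

    Assumed semantics: think_budget is the maximum number of thinking
    tokens, and has_repeated_ngrams is a callable doing the n-gram check.
    """
    # 1) Budget check first: with no budget left, end thinking
    #    regardless of the n-gram state.
    if len(past_token_ids) >= think_budget:
        return True
    # 2) Budget remains: run the more expensive n-gram check
    #    only every `interval` tokens.
    if len(past_token_ids) % interval == 0:
        return has_repeated_ngrams(past_token_ids)
    return False
```

Putting the budget check first makes the two conditions independent: the n-gram check can never delay an exhausted budget.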
Is ratio a concept that is commonly used in logit processors?
If not, it would be good to have an explanation in the README of what ratio means.
Even if it is a commonly used concept, it is a variable that can be controlled from outside, so having an explanation in the README still seems like a good idea, haha.
- Rather than ratio, a variable name that conveys the meaning a bit better, such as thinking_ratio, would be nicer, haha.
I've fixed the items you pointed out.
LGTM
LGTM