Can this model produce text embeddings?

by dingguofeng - opened Sep 23, 2024

Discussion

dingguofeng

Sep 23, 2024

How can I feed a piece of text into the model and have it output that text's embedding?

jklj077

Qwen org Sep 23, 2024

not supported

dingguofeng

Sep 23, 2024

> not supported

QAQ~

wallacehuang

Sep 26, 2024

edited Sep 26, 2024

If I remember correctly, it should be possible to derive an embedding model from the LLM.

dingguofeng

Sep 26, 2024

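> If I remember correctly, it should be possible to derive an embedding model from the LLM.

Hi, I think that being able to get the embeddings would make a lot of other things possible. How would one go about deriving them?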

dingguofeng

Oct 8, 2024

This comment has been hidden

ljy666666

Mar 10, 2025

Has this been resolved?

dingguofeng

Mar 11, 2025

> Has this been resolved?

No, it hasn't been resolved. It seems it simply can't be done.

ljy666666

Mar 11, 2025

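> > Has this been resolved?
>
> No, it hasn't been resolved. It seems it simply can't be done.

Do you mean feeding the text in and then taking the hidden states inside the LLM?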
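
For readers wondering what "taking the hidden states" could look like in practice, here is a minimal sketch, not an officially supported embedding method (the maintainers' answer above is "not supported"). It assumes the checkpoint loads through `AutoModel` and that averaging the last layer's hidden states over non-padding tokens is an acceptable pooling choice. The function names `masked_mean` and `embed_text` are made up for this illustration, and because the model was not trained as an embedding model, the resulting vectors may work poorly for retrieval or similarity.

```python
from typing import List, Sequence


def masked_mean(token_vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is non-zero (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed_text(text: str, model_name: str = "Qwen/Qwen2.5-7B-Instruct") -> List[float]:
    """Hypothetical helper: mean-pool the last hidden layer into one vector."""
    # Heavy imports stay local so masked_mean can be used without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # AutoModel loads the transformer without the language-modeling head.
    model = AutoModel.from_pretrained(model_name, torch_dtype="auto")
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # last_hidden_state has shape (batch, seq_len, hidden_size); take the only item.
    hidden = outputs.last_hidden_state[0].float().tolist()
    mask = inputs["attention_mask"][0].tolist()
    return masked_mean(hidden, mask)
```

Calling `embed_text("some sentence")` would download and run the full model, so it needs enough memory for a 7B checkpoint. Mean pooling is only one assumption here; taking the hidden state of the last token is another common choice for decoder-only models.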

wallacehuang

Mar 11, 2025

That probably refers to token vectorization. I was mistaken earlier: deriving an embedding model in reverse is quite difficult and hard to pull off.

I see that Qwen offers an embedding API; I'm not sure whether it meets your needs:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # If you have not set this environment variable, put your API key here instead
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # base_url of the Bailian service
)

completion = client.embeddings.create(
    model="text-embedding-v3",
    input='The clothes are of good quality and look good, definitely worth the wait. I love them.',
    dimensions=1024,
    encoding_format="float",
)

print(completion.model_dump_json())
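
As a follow-up to the snippet above, here is a hedged sketch of one thing that can be done with the returned vectors: comparing two texts by cosine similarity. The helper names `embed` and `cosine_similarity` are invented for this illustration. `embed` assumes it is given the `client` object created above and that the response follows the OpenAI embeddings format, where the vector is at `response.data[0].embedding`.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embed(client, text: str, model: str = "text-embedding-v3") -> List[float]:
    """Hypothetical wrapper: request one embedding and return it as a list of floats."""
    response = client.embeddings.create(
        model=model,
        input=text,
        dimensions=1024,
        encoding_format="float",
    )
    # OpenAI-style responses carry one entry in `data` per input text.
    return response.data[0].embedding
```

With a configured client, `cosine_similarity(embed(client, "text A"), embed(client, "text B"))` gives a score between -1 and 1, where higher values mean the two texts are closer in the embedding space.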

ljy666666

Mar 11, 2025

> That probably refers to token vectorization. I was mistaken earlier: deriving an embedding model in reverse is quite difficult and hard to pull off.
>
> I see that Qwen offers an embedding API; I'm not sure whether it meets your needs.

Isn't Qwen's first layer just the embed_tokens layer? It converts the input tokens into embeddings. Is the output of that layer what you're looking for?
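
To make the embed_tokens suggestion concrete, here is a minimal sketch under the assumption that the checkpoint loads through `AutoModel` and exposes its input embedding layer via `get_input_embeddings()`. The names `lookup_rows` and `input_embeddings` are made up for this illustration. An embedding layer is a table lookup, so each token gets the same vector regardless of the surrounding text; the output is one context-free vector per token, not a single sentence embedding.

```python
from typing import List, Sequence


def lookup_rows(table: Sequence[Sequence[float]], token_ids: Sequence[int]) -> List[List[float]]:
    """What an embedding layer does conceptually: pick one table row per token id."""
    return [list(table[i]) for i in token_ids]


def input_embeddings(text: str, model_name: str = "Qwen/Qwen2.5-7B-Instruct") -> List[List[float]]:
    """Hypothetical helper: return the embed_tokens output for each token of `text`."""
    # Heavy imports stay local so lookup_rows can be used without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, torch_dtype="auto")

    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    embed_layer = model.get_input_embeddings()  # the model's token embedding module
    with torch.no_grad():
        vectors = embed_layer(input_ids)  # shape (1, seq_len, hidden_size)
    return vectors[0].float().tolist()
```

The toy call `lookup_rows([[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]], [2, 0, 2])` returns rows 2, 0 and 2 of the table, which shows that a repeated token id always maps to an identical vector.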
