Instructions for using openai/gpt-oss-120b with libraries, inference providers, notebooks, and local apps.

Libraries

How to use openai/gpt-oss-120b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openai/gpt-oss-120b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-120b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use openai/gpt-oss-120b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openai/gpt-oss-120b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openai/gpt-oss-120b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
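
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000 and returns the usual OpenAI-style response body. The helper names `build_chat_request` and `ask` are made up for this illustration and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (needs a running server, so it is left commented out):
# print(ask("http://localhost:8000", "openai/gpt-oss-120b", "What is the capital of France?"))
```

Pointing `base_url` at http://localhost:30000 should work the same way for the SGLang server shown further down, since both expose the same route.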

Use Docker

docker model run hf.co/openai/gpt-oss-120b

SGLang

How to use openai/gpt-oss-120b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "openai/gpt-oss-120b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openai/gpt-oss-120b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "openai/gpt-oss-120b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openai/gpt-oss-120b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use openai/gpt-oss-120b with Docker Model Runner:
```
docker model run hf.co/openai/gpt-oss-120b
```

ImportError: /lib64/libc.so.6: version `GLIBC_2.32' not found

#86

by yueqiren - opened Aug 8, 2025

Discussion

yueqiren

Aug 8, 2025

Hello,

I'm encountering a GLIBC compatibility issue when trying to use vLLM with flash-attention on a cluster system. The error occurs when attempting to import vLLM components that depend on CUDA extensions.
Error message:
ImportError: /lib64/libc.so.6: version `GLIBC_2.32' not found

Environment:
GLIBC version: 2.28
Python: 3.12.8
PyTorch: 2.9.0.dev20250804+cu128
CUDA: 12.8

Would it be possible to work around this issue by modifying or rebuilding the dependencies to match the existing GLIBC version?
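
For readers who want to confirm the mismatch on their own machine, here is a small sketch that reads the glibc version Python sees and compares it with the 2.32 named in the error. The helper names are hypothetical, and the check only reports the system glibc; it does not say which version a particular wheel needs.

```python
import platform
import re


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '2.28' into a comparable tuple (2, 28)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def glibc_is_at_least(found: str, required: str) -> bool:
    """Return True when the found glibc version is >= the required one."""
    return parse_version(found) >= parse_version(required)


# platform.libc_ver() returns ('glibc', '<version>') on glibc-based Linux
# and empty strings when it cannot determine the C library.
libc_name, libc_version = platform.libc_ver()
if libc_name == "glibc" and libc_version:
    verdict = "new enough" if glibc_is_at_least(libc_version, "2.32") else "older than 2.32"
    print(f"glibc {libc_version}: {verdict}")
else:
    print("Could not detect glibc on this system")
```

Versions are compared numerically, so 2.9 counts as older than 2.32 even though it sorts later as a string.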

yangjunxiao2021

Aug 9, 2025

Same issue. Are you using Ubuntu 20.04?

shuyuej

Aug 10, 2025

edited Aug 10, 2025

Please check my solution in the next post!

shuyuej

Aug 10, 2025

I solved this issue by building and installing glibc-2.32 and glibc-2.38 from source (the second one because glibc-2.34 is also required).
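
To find out which glibc versions a compiled extension asks for before building anything, one option is to scan the shared object for GLIBC_x.y version tags, much like running strings on it and grepping for GLIBC_. The sketch below is a heuristic under that assumption: the function names are hypothetical, and it reports every tag present in the file rather than resolving symbols the way the dynamic loader does.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Matches tags such as GLIBC_2.34 or GLIBC_2.2.5 in raw ELF bytes.
GLIBC_TAG = re.compile(rb"GLIBC_(\d+)\.(\d+)(?:\.(\d+))?")


def glibc_versions_in(data: bytes) -> List[Tuple[int, ...]]:
    """Return the sorted, de-duplicated GLIBC version tags found in raw bytes."""
    found = set()
    for match in GLIBC_TAG.finditer(data):
        found.add(tuple(int(group) for group in match.groups() if group is not None))
    return sorted(found)


def highest_glibc_tag(path: str) -> Optional[Tuple[int, ...]]:
    """Return the newest GLIBC tag mentioned in the file, or None if there is none."""
    versions = glibc_versions_in(Path(path).read_bytes())
    return versions[-1] if versions else None


# Example call (the path is a placeholder, so it is left commented out):
# print(highest_glibc_tag("/path/to/some_extension.so"))
```

If the highest tag is newer than the system glibc, an import of that file is expected to fail with the error from the first post.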

NOTE: my path is /projectnb/vkolagrp/brucejia. Please change it to yours.

For glibc-2.32:

wget -c https://ftp.gnu.org/gnu/glibc/glibc-2.32.tar.gz
tar -zxvf glibc-2.32.tar.gz
cd glibc-2.32
mkdir glibc-build && cd glibc-build
mkdir -p /projectnb/vkolagrp/brucejia/glibc
../configure --prefix=/projectnb/vkolagrp/brucejia/glibc
make -j"$(nproc)"
make install

For glibc-2.38:

export GLIBC_NEW=/projectnb/vkolagrp/brucejia/glibc-2.38
export SRC=/projectnb/vkolagrp/brucejia/src
mkdir -p "$SRC" && cd "$SRC"

wget -c https://ftp.gnu.org/gnu/glibc/glibc-2.38.tar.xz
tar -xf glibc-2.38.tar.xz
mkdir -p glibc-2.38-build && cd glibc-2.38-build
../glibc-2.38/configure --prefix="$GLIBC_NEW" --disable-werror
make -j"$(nproc)"
make install

Then launch vLLM with commands like these, replacing my paths with your own.

export GLIBC_NEW=/projectnb/vkolagrp/brucejia/glibc-2.38
export CONDA=/projectnb/vkolagrp/brucejia/.conda/envs/new
export GCC_LIBDIR="$(dirname "$(gcc -print-file-name=libstdc++.so.6)")"
export LD_LIBRARY_PATH="$GLIBC_NEW/lib:$CONDA/lib:$GCC_LIBDIR:${CUDA_HOME:+$CUDA_HOME/lib64}:$LD_LIBRARY_PATH"

$GLIBC_NEW/lib/ld-linux-x86-64.so.2 \
  --library-path "$GLIBC_NEW/lib:$CONDA/lib:$GCC_LIBDIR:${CUDA_HOME:+$CUDA_HOME/lib64}:$LD_LIBRARY_PATH" \
  "$CONDA/bin/python" -m vllm.entrypoints.cli.main serve openai/gpt-oss-20b

Best regards,

Shuyue
Aug 10th, 2025
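
The launch step in the post above runs the environment's Python through the new glibc's own dynamic loader and hands it an explicit library search path. As a rough sketch of the same idea in Python, the helper below only assembles that argument list; `build_loader_command` and the example paths are assumptions made for illustration.

```python
import posixpath
from typing import List, Sequence


def build_loader_command(
    glibc_prefix: str,
    env_prefix: str,
    extra_lib_dirs: Sequence[str],
    program_args: Sequence[str],
) -> List[str]:
    """Assemble argv for running an environment's Python under a custom glibc loader."""
    lib_dirs = [
        posixpath.join(glibc_prefix, "lib"),  # the new glibc comes first
        posixpath.join(env_prefix, "lib"),    # then the conda/venv libraries
        *extra_lib_dirs,                      # e.g. the libstdc++ and CUDA directories
    ]
    # Skip empty entries so an unset variable does not leave a stray ':' in the path.
    library_path = ":".join(d for d in lib_dirs if d)
    return [
        posixpath.join(glibc_prefix, "lib", "ld-linux-x86-64.so.2"),
        "--library-path",
        library_path,
        posixpath.join(env_prefix, "bin", "python"),
        *program_args,
    ]
```

The resulting list can be passed to `subprocess.run(command, check=True)`, for example with `program_args` set to `["-m", "vllm.entrypoints.cli.main", "serve", "openai/gpt-oss-20b"]` as in the shell version.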

MaxencedlBB

Aug 19, 2025

edited Aug 19, 2025

Same issue, but the provided solution does not work for me. I am using uv as my Python package manager.

Update: The issue is fixed. I was simply not using the most recent vLLM version. Make sure you are using a vLLM image at v0.10.1 or higher.
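
A quick way to check the installed version from inside the environment is sketched below. It assumes the distribution is published under the name vllm and compares only the numeric release part, so a pre-release of 0.10.1 would count as 0.10.1; the helper names are hypothetical.

```python
import re
from importlib import metadata

MINIMUM = (0, 10, 1)  # the version mentioned in the update above


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.10.1.dev3+g1a2b' -> (0, 10, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple = MINIMUM) -> bool:
    """Return True when the version's numeric release is >= the minimum."""
    return release_tuple(version) >= minimum


try:
    installed = metadata.version("vllm")
except metadata.PackageNotFoundError:
    print("vllm is not installed in this environment")
else:
    status = "OK" if meets_minimum(installed) else "older than 0.10.1, consider upgrading"
    print(f"vllm {installed}: {status}")
```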

mikodham

Aug 20, 2025

I got this instead:
(APIServer pid=17315) ERROR 08-20 06:57:55 [registry.py:415] subprocess.CalledProcessError: Command '['/[LOCAL_DIRECTORY]/.venv/bin/python', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGSEGV: 11>.
leading to:
(APIServer pid=17315) Value error, Model architectures ['GptOssForCausalLM'] failed to be inspected. Please check the logs for more details. [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]

I also made sure I installed the right vLLM version, v0.10.1, according to https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html#quickstart.
I also ran which vllm to make sure it points to the right installation.
Any other solutions? Thanks!
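
One way to narrow down a crash like this is to re-run a small piece of the failing process with Python's built-in faulthandler enabled, which prints a Python-level traceback when the interpreter receives SIGSEGV. This is a diagnostic sketch, not a fix; the helper names are hypothetical, and the suggestion to start with a bare import of vllm is an assumption about where the crash happens.

```python
import signal
import subprocess
import sys
from typing import Sequence


def describe_returncode(returncode: int) -> str:
    """Explain a subprocess return code; negative values mean death by signal on POSIX."""
    if returncode == 0:
        return "exited cleanly"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_with_faulthandler(python_args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the current interpreter with -X faulthandler and capture its output."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", *python_args],
        capture_output=True,
        text=True,
    )


# Example (left commented out because it needs vllm installed):
# result = run_with_faulthandler(["-c", "import vllm"])
# print(describe_returncode(result.returncode))
# print(result.stderr)  # a traceback here shows which import or call crashed
```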
