Instructions to use openai/gpt-oss-120b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openai/gpt-oss-120b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openai/gpt-oss-120b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-120b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openai/gpt-oss-120b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openai/gpt-oss-120b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openai/gpt-oss-120b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/openai/gpt-oss-120b
- SGLang
How to use openai/gpt-oss-120b with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "openai/gpt-oss-120b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openai/gpt-oss-120b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "openai/gpt-oss-120b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "openai/gpt-oss-120b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use openai/gpt-oss-120b with Docker Model Runner:
docker model run hf.co/openai/gpt-oss-120b
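The vLLM and SGLang servers above both expose an OpenAI-compatible chat-completions endpoint, so the same request works against either one. Only the port differs: 8000 for vLLM and 30000 for SGLang. What follows is a minimal sketch that uses only the Python standard library to mirror the curl calls. The helper names build_chat_request and send_chat_request are hypothetical. The sketch also assumes the response follows the usual OpenAI layout (choices[0].message.content).

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible chat-completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's text (assumes the OpenAI response layout)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, send_chat_request(build_chat_request("http://localhost:30000", "openai/gpt-oss-120b", "What is the capital of France?")) would query the SGLang server started above. Building the request separately from sending it lets you inspect the payload before any server is running.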
ImportError: /lib64/libc.so.6: version `GLIBC_2.32' not found
Hello,
I'm encountering a GLIBC compatibility issue when trying to use vLLM with flash-attention on a cluster system. The error occurs when attempting to import vLLM components that depend on CUDA extensions.
Error Message
ImportError: /lib64/libc.so.6: version `GLIBC_2.32' not found
Environment:
GLIBC version: 2.28
Python: 3.12.8
PyTorch: 2.9.0.dev20250804+cu128
CUDA: 12.8
Would it be possible to work around this issue by modifying or rebuilding the dependencies to match the existing GLIBC version?
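The error means that a compiled extension was linked against a newer glibc than the system provides: it asks for the GLIBC_2.32 symbol version, but the system has glibc 2.28. The sketch below is one way to confirm this from Python. The helper names are hypothetical. The system check relies on platform.libc_ver() and returns None when the interpreter does not report glibc.

```python
import platform
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]


def _to_version(text: str) -> Version:
    # "2.28" -> (2, 28); tuples compare component by component.
    return tuple(int(part) for part in re.findall(r"\d+", text))


def required_glibc(error_message: str) -> Optional[Version]:
    """Pull the version out of a loader error such as "version `GLIBC_2.32' not found"."""
    match = re.search(r"GLIBC_(\d+(?:\.\d+)+)", error_message)
    return _to_version(match.group(1)) if match else None


def system_glibc() -> Optional[Version]:
    """glibc version reported by the running interpreter, or None if it reports no glibc."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return _to_version(version)


def glibc_satisfies(required: Version, available: Version) -> bool:
    """True if the available glibc is at least as new as the required one."""
    return available >= required
```

In the environment listed above, system_glibc() would return (2, 28), and glibc_satisfies((2, 32), (2, 28)) is False, which matches the ImportError.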
Same issue. Are you using Ubuntu 20.04?
Please check my solution in the next post!
I solved this issue by installing glibc-2.32 and glibc-2.38 (the latter because it also requires glibc-2.34).
NOTE: my path is /projectnb/vkolagrp/brucejia. Please change it to yours.
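Before choosing which glibc to build, it helps to know which versions a prebuilt extension (for example, a flash-attention or vLLM .so file) actually asks for. Tools such as `objdump -T` or `readelf -V` list its versioned symbol requirements. The sketch below is a rougher, dependency-free approximation in Python, similar to `strings lib.so | grep GLIBC_`. It scans the file's raw bytes for GLIBC_x.y tags instead of parsing the ELF version sections, so treat its result as a hint. The function names are hypothetical, and the whole file is read into memory.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches tags such as GLIBC_2.17 or GLIBC_2.2.5 (but not GLIBC_PRIVATE).
_GLIBC_TAG = re.compile(rb"GLIBC_(\d+(?:\.\d+)+)")


def glibc_versions_referenced(path: str) -> List[Tuple[int, ...]]:
    """Distinct GLIBC_x.y tags found anywhere in the file, sorted oldest to newest."""
    data = Path(path).read_bytes()
    found = {
        tuple(int(part) for part in m.group(1).split(b"."))
        for m in _GLIBC_TAG.finditer(data)
    }
    return sorted(found)


def highest_glibc_needed(path: str) -> Tuple[int, ...]:
    """Newest tag referenced by the file, or () if none was found."""
    versions = glibc_versions_referenced(path)
    return versions[-1] if versions else ()
```

You can run highest_glibc_needed on each extension module in the environment's site-packages to see whether anything asks for a newer version than the one you plan to build.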
For glibc-2.32:
wget -c https://ftp.gnu.org/gnu/glibc/glibc-2.32.tar.gz
tar -zxvf glibc-2.32.tar.gz
cd glibc-2.32
mkdir glibc-build && cd glibc-build
mkdir /projectnb/vkolagrp/brucejia/glibc
../configure --prefix=/projectnb/vkolagrp/brucejia/glibc
make -j"$(nproc)"
make install
For glibc-2.38:
export GLIBC_NEW=/projectnb/vkolagrp/brucejia/glibc-2.38
export SRC=/projectnb/vkolagrp/brucejia/src
mkdir -p "$SRC" && cd "$SRC"
wget -c https://ftp.gnu.org/gnu/glibc/glibc-2.38.tar.xz
tar -xf glibc-2.38.tar.xz
mkdir -p glibc-2.38-build && cd glibc-2.38-build
../glibc-2.38/configure --prefix="$GLIBC_NEW" --disable-werror
make -j"$(nproc)"
make install
Then launch vLLM with commands like the following. Please replace my paths with your own.
# Newly built glibc and the conda environment that contains vLLM
export GLIBC_NEW=/projectnb/vkolagrp/brucejia/glibc-2.38
export CONDA=/projectnb/vkolagrp/brucejia/.conda/envs/new
# Directory that holds the compiler's libstdc++.so.6
export GCC_LIBDIR="$(dirname "$(gcc -print-file-name=libstdc++.so.6)")"
export LD_LIBRARY_PATH="$GLIBC_NEW/lib:$CONDA/lib:$GCC_LIBDIR:${CUDA_HOME:+$CUDA_HOME/lib64}:$LD_LIBRARY_PATH"
# Start Python through the new glibc's dynamic loader so that it uses the new libc.so.6
$GLIBC_NEW/lib/ld-linux-x86-64.so.2 \
    --library-path "$GLIBC_NEW/lib:$CONDA/lib:$GCC_LIBDIR:${CUDA_HOME:+$CUDA_HOME/lib64}:$LD_LIBRARY_PATH" \
    "$CONDA/bin/python" -m vllm.entrypoints.cli.main serve openai/gpt-oss-20b
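The key step above is to run Python through the new glibc's own dynamic loader (ld-linux-x86-64.so.2) with --library-path, instead of through the system loader. That way the interpreter and its extensions bind to the new libc.so.6. The Python sketch below only assembles the same argument list. build_loader_command is a hypothetical helper, and the x86-64 loader name is taken from the commands above. As a design choice, it drops empty path entries, such as the one ${CUDA_HOME:+$CUDA_HOME/lib64} produces when CUDA_HOME is unset, because the loader would otherwise treat an empty entry as the current directory.

```python
import os
from typing import List, Optional


def build_loader_command(glibc_new: str, conda: str, gcc_libdir: str,
                         cuda_home: Optional[str], args: List[str],
                         inherited: Optional[str] = None) -> List[str]:
    """Mirror the shell above: run the env's python via the new glibc's ld.so."""
    dirs = [os.path.join(glibc_new, "lib"), os.path.join(conda, "lib"), gcc_libdir]
    if cuda_home:  # same effect as ${CUDA_HOME:+$CUDA_HOME/lib64}
        dirs.append(os.path.join(cuda_home, "lib64"))
    if inherited:  # e.g. the previous $LD_LIBRARY_PATH
        dirs.extend(inherited.split(os.pathsep))
    # Skip empty entries so the current directory is never searched by accident.
    library_path = os.pathsep.join(d for d in dirs if d)
    loader = os.path.join(glibc_new, "lib", "ld-linux-x86-64.so.2")
    python = os.path.join(conda, "bin", "python")
    return [loader, "--library-path", library_path, python, *args]
```

You could then pass the resulting list to subprocess.run(cmd, check=True).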
Best regards,
Shuyue
Aug 10th, 2025
Same issue, but the provided solution does not work for me. I am using uv as my Python package manager.
Update: the issue is fixed. I simply was not using the most up-to-date vLLM version. Make sure you are using a vLLM image with v0.10.1 or higher.
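For a pip or uv installation (as opposed to a Docker image), you can check the installed vLLM version against the v0.10.1 minimum with importlib.metadata. This is a minimal sketch with hypothetical helper names. It compares only the leading numeric release components, so it handles local suffixes like +cu128. It also avoids the plain string comparison trap, where "0.9.2" sorts after "0.10.1". It does ignore pre-release markers, so 0.10.1rc1 counts as 0.10.1. If the packaging library is available, packaging.version.Version gives full PEP 440 ordering.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric release components, e.g. "0.10.1+cu128" -> (0, 10, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def installed_version(dist: str = "vllm") -> Optional[str]:
    """Version of an installed distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def at_least(version: str, minimum: str = "0.10.1") -> bool:
    """True if `version` is at or above `minimum` (numeric components only)."""
    return release_tuple(version) >= release_tuple(minimum)
```

For example, run `v = installed_version(); print(v, v is not None and at_least(v))` with the environment's interpreter.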
I got this instead:
(APIServer pid=17315) ERROR 08-20 06:57:55 [registry.py:415] subprocess.CalledProcessError: Command '['/[LOCAL_DIRECTORY]/.venv/bin/python', '-m', 'vllm.model_executor.models.registry']' died with <Signals.SIGSEGV: 11>.
which leads to:
(APIServer pid=17315) Value error, Model architectures ['GptOssForCausalLM'] failed to be inspected. Please check the logs for more details. [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
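The "died with <Signals.SIGSEGV: 11>" wording is how Python's subprocess.CalledProcessError describes a child process that a signal terminated. On POSIX, a return code of -N means the child was killed by signal N. Here N is 11 (SIGSEGV), so the model-inspection subprocess crashed in native code instead of raising a Python exception. This log alone does not show which library crashed. The sketch below decodes such return codes. describe_returncode and run_and_describe are hypothetical helper names.

```python
import signal
import subprocess
from typing import List


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable explanation."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as returncode -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_describe(cmd: List[str]) -> str:
    """Run a command, capture its output, and explain how it ended."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return describe_returncode(proc.returncode)
```

To get a Python-level stack trace from a crashing child, you can enable Python's faulthandler through the PYTHONFAULTHANDLER environment variable, which child processes inherit by default.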
I also made sure I installed the right vLLM version (v0.10.1), following https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html#quickstart.
I also ran "which vllm" to make sure it points to the correct vLLM installation.
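"which vllm" only shows which console script comes first on PATH. With uv-managed virtual environments, the interpreter that actually runs the server can import a different copy of vLLM. The sketch below reports three things side by side: the running interpreter, where the module would be imported from, and the script found on PATH. It uses importlib.util.find_spec and shutil.which, and describe_install is a hypothetical name. Run it with the same python that launches vLLM, such as the .venv interpreter shown in the log above.

```python
import importlib.util
import shutil
import sys
from typing import Dict, Optional


def describe_install(module: str = "vllm", script: Optional[str] = "vllm") -> Dict[str, Optional[str]]:
    """Report the running interpreter, the module's import location, and the script on PATH."""
    spec = importlib.util.find_spec(module)  # None if the module cannot be found
    return {
        "python": sys.executable,
        "module_origin": spec.origin if spec is not None else None,
        "script_on_path": shutil.which(script) if script else None,
    }
```

If module_origin does not sit under the same environment as python, or script_on_path belongs to another environment, the server is probably not using the installation you upgraded.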
Any other solutions? Thanks!
Aug 20, 2025