Instructions to use google/gemma-2-2b-jpn-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use google/gemma-2-2b-jpn-it with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-2-2b-jpn-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-jpn-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-jpn-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use google/gemma-2-2b-jpn-it with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "google/gemma-2-2b-jpn-it"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-2b-jpn-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/google/gemma-2-2b-jpn-it

SGLang

How to use google/gemma-2-2b-jpn-it with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-2-2b-jpn-it" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-2b-jpn-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-2-2b-jpn-it" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "google/gemma-2-2b-jpn-it",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use google/gemma-2-2b-jpn-it with Docker Model Runner:
```
docker model run hf.co/google/gemma-2-2b-jpn-it
```

hidden_act is missing　config.json

by s-natsu - opened Oct 16, 2024

Discussion

s-natsu

Oct 16, 2024

Thank you for the great work!

diff /gemma-2-2b-jpn-it/config.json /gemma-2-2b-it/config.json 
10,11c10,13
<   "dtype": "bfloat16",
<   "eos_token_id": 1,
---
>   "eos_token_id": [
>     1,
>     107
>   ],
13a16
>   "hidden_act": "gelu_pytorch_tanh",
24c27
<   "query_pre_attn_scalar": 224,
---
>   "query_pre_attn_scalar": 256,
29c32
<   "transformers_version": "4.44.2",
---
>   "transformers_version": "4.42.4",
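The same comparison can be made at the key level, which makes the missing entry easier to spot than in a line diff. This is only a sketch: `diff_config_keys` is a hypothetical helper, and the two dictionaries are excerpts copied from the diff above, not the full config.json files.

```python
def diff_config_keys(left, right):
    """Compare two config dicts.

    Returns three sorted lists: keys only in `left`, keys only in `right`,
    and keys present in both whose values differ.
    """
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    changed = sorted(k for k in set(left) & set(right) if left[k] != right[k])
    return only_left, only_right, changed


# Excerpts taken from the diff above (assumption: everything else is identical).
jpn_it = {
    "dtype": "bfloat16",
    "eos_token_id": 1,
    "query_pre_attn_scalar": 224,
    "transformers_version": "4.44.2",
}
it = {
    "eos_token_id": [1, 107],
    "hidden_act": "gelu_pytorch_tanh",
    "query_pre_attn_scalar": 256,
    "transformers_version": "4.42.4",
}

only_jpn, only_it, changed = diff_config_keys(jpn_it, it)
print("only in gemma-2-2b-jpn-it:", only_jpn)
print("only in gemma-2-2b-it:", only_it)
print("different values:", changed)
```

With real files, the two dictionaries would come from `json.load` on each config.json.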

The missing "hidden_act": "gelu_pytorch_tanh" line causes a problem.
When I try to serve gemma-2-2b-jpn-it with vLLM, it raises an error:

   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gemma2.py", line 421, in __init__
     self.model = Gemma2Model(config, cache_config, quant_config)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gemma2.py", line 265, in __init__
     self.start_layer, self.end_layer, self.layers = make_layers(
                                                     ^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 408, in make_layers
     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gemma2.py", line 267, in <lambda>
     lambda prefix: Gemma2DecoderLayer(int(prefix.split(".")[
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gemma2.py", line 200, in __init__
     hidden_act=config.hidden_act,
                ^^^^^^^^^^^^^^^^^
   File "/usr/local/lib/python3.12/dist-packages/transformers/configuration_utils.py", line 202, in __getattribute__
     return super().__getattribute__(key)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 AttributeError: 'Gemma2Config' object has no attribute 'hidden_act'. Did you mean: 'hidden_size'?

If I copy and paste the hidden_act line from /gemma-2-2b-it/config.json into this model's config.json, it works.
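The manual workaround above could also be scripted. The following is a minimal sketch, assuming you have a local, writable copy of the model's config.json; `add_hidden_act` is a hypothetical helper, and the default value is the one shown for gemma-2-2b-it in the diff.

```python
import json
from pathlib import Path


def add_hidden_act(config_path, value="gelu_pytorch_tanh"):
    """Add "hidden_act" to a config.json if it is missing.

    Returns True if the file was modified, False if the key already existed.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if "hidden_act" in config:
        return False  # already present, leave the file untouched
    config["hidden_act"] = value
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Calling `add_hidden_act("path/to/config.json")` (the path is a placeholder) a second time is a no-op, so the script is safe to rerun. Note that it rewrites the file with two-space indentation and appends the new key at the end.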

s-natsu changed discussion title from config,json diff from gemma-2-2b-it to hidden_act is missing　config.json Oct 16, 2024

GopiUppari

Google org Oct 17, 2024

Hi @s-natsu ,

hidden_act is a legacy (deprecated) parameter in some configurations; it is kept for backward compatibility with older models and configurations. It is overridden by hidden_activation.
hidden_act and hidden_size are different: hidden_act defines the non-linear activation function used in the model, while hidden_size defines the dimensionality of the hidden layers.
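To illustrate the relationship between the two names, here is a hedged sketch of a lookup that prefers hidden_activation and falls back to hidden_act. It is not vLLM's or Transformers' actual code: `resolve_activation` is a hypothetical helper, and the config objects are plain stand-ins. The stand-in for this model is assumed to carry hidden_activation, which the diff in this thread does not show either way.

```python
from types import SimpleNamespace


def resolve_activation(config, default="gelu_pytorch_tanh"):
    """Return the activation name, preferring hidden_activation over hidden_act."""
    for name in ("hidden_activation", "hidden_act"):
        value = getattr(config, name, None)
        if value is not None:
            return value
    return default


# Stand-ins for loaded configs (assumption: not real Gemma2Config objects).
jpn_like = SimpleNamespace(hidden_activation="gelu_pytorch_tanh")
it_like = SimpleNamespace(
    hidden_act="gelu_pytorch_tanh",
    hidden_activation="gelu_pytorch_tanh",
)

# Direct attribute access is what fails in the traceback above.
print(hasattr(jpn_like, "hidden_act"))
# The fallback lookup works for both shapes of config.
print(resolve_activation(jpn_like), resolve_activation(it_like))
```

A consumer that reads the activation this way would not depend on the legacy key being present in config.json.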

For further information, could you please refer to this reference

Thank you.

tmkyd

Nov 5, 2024

Hi, @GopiUppari

As shown in the code below, we are passing config.hidden_act to the constructor of Gemma2MLP, but hidden_act is not defined in the config.json of google/gemma-2-2b-jpn-it.
https://github.com/vllm-project/vllm/blob/ad23318928d40ef7ac969451afa0dc198428c04b/vllm/model_executor/models/gemma2.py#L202

In other Gemma2 models, such as google/gemma-2-2b-it, hidden_act is defined in config.json, so no error occurs in vLLM. In this case, should we correct vLLM, or is it more appropriate to modify the model’s config.json?
