It seems a GGUF file cannot be used with the response_format setting.
#5
by svjack - opened
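import llama_cpp

# Download the quantized GGUF weights from the Hub and load them with llama-cpp-python.
llm = llama_cpp.Llama.from_pretrained(
    repo_id="bartowski/aya-23-8B-GGUF",
    filename="aya-23-8B-Q4_K_M.gguf",
    #tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained("CohereForAI/aya-23-8B"),
    verbose=False,
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=3060 * 3,   # context window, in tokens
)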
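# The instruction below is in Chinese. In English: "Translate the following JSON
# content into Chinese, and keep the corresponding JSON format:"
prompt = '''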
将下面的json内容翻译成中文,并保留相应的json格式:
{'problem_description': "Two space agencies, Galactic Explorations and Interstellar Missions, are discussing the potential of Planet X-31 for human colonization. Galactic Explorations claims that Planet X-31 is an ideal candidate due to its Earth-like atmosphere and abundant water resources. Interstellar Missions, however, argues that Planet X-31 is not suitable for colonization because of its high levels of radiation, which they claim would make it impossible for humans to survive there. Galactic Explorations counters this argument by stating that humans could develop technology to shield themselves from radiation in the future. Which statement best describes the fallacy in Galactic Explorations' argument?", 'additional_problem_info': "A) The fallacy is that Galactic Explorations assumes humans can develop technology to shield themselves from radiation without any evidence. \nB) The fallacy is that Interstellar Missions is incorrect about the high levels of radiation on Planet X-31. \nC) The fallacy is that Galactic Explorations believes Planet X-31 is the only planet suitable for human colonization. \nD) The fallacy is that Interstellar Missions doesn't believe in the potential of human technological advancements.", 'chain_of_thought': "Galactic Explorations' argument assumes that humans will be able to develop technology to shield themselves from radiation in the future. However, there is no evidence presented in the problem description to support this claim. Therefore, their argument contains a fallacy.", 'correct_solution': 'A) The fallacy is that Galactic Explorations assumes humans can develop technology to shield themselves from radiation without any evidence.'}
'''
from IPython.display import clear_output
messages = [
    {
        "role": "user",
        "content": prompt
    }
]
response = llm.create_chat_completion(
    messages=messages,
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "problem_description": {"type": "string"},
                "additional_problem_info": {"type": "string"},
                "chain_of_thought": {"type": "string"},
                "correct_solution": {"type": "string"},
            },
            "required": ["problem_description", "additional_problem_info", "chain_of_thought", "correct_solution"],
        }
    },
    stream=True,
)
req = ""
for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if "content" not in delta:
        continue
    #print(delta["content"], end="", flush=True)
    req += delta["content"]
    clear_output(wait=True)
    print(req)
When I run this, the Python kernel dies.
Can someone help me? 😊
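One way to narrow down where the failure comes from is to exercise the pure-Python parts of the snippet above without loading the model. The sketch below is only an illustration under stated assumptions, not a fix for the crash: `build_prompt`, `collect_stream`, and `missing_keys` are hypothetical helper names, and `fake_chunks` is a hand-written stand-in that copies the chunk shape used in the loop above (`chunk["choices"][0]["delta"]`). It also serializes the payload with `json.dumps`, since the prompt above pastes a Python dict repr (single quotes), which is not valid JSON.

```python
import json

# Keys taken from the "required" list in the response_format schema above.
REQUIRED_KEYS = [
    "problem_description",
    "additional_problem_info",
    "chain_of_thought",
    "correct_solution",
]


def build_prompt(instruction, payload):
    """Join an instruction with a dict serialized as valid JSON text."""
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable.
    return instruction + "\n" + json.dumps(payload, ensure_ascii=False)


def collect_stream(chunks):
    """Concatenate the "content" pieces of streamed chat-completion chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)


def missing_keys(json_text, required=REQUIRED_KEYS):
    """Parse the output as JSON and list the required keys it lacks."""
    data = json.loads(json_text)
    return [key for key in required if key not in data]


if __name__ == "__main__":
    # Hypothetical stand-in for the streamed response; no model is loaded.
    fake_chunks = [
        {"choices": [{"delta": {"role": "assistant"}}]},
        {"choices": [{"delta": {"content": '{"problem_description": "x", '}}]},
        {"choices": [{"delta": {"content": '"chain_of_thought": "y"}'}}]},
    ]
    text = collect_stream(fake_chunks)
    print(text)
    print(missing_keys(text))
```

If these helpers behave as expected on hand-written chunks, the remaining suspects are the `create_chat_completion` call itself and the notebook display loop, which can then be tested separately (for example with `stream=False`, or without `response_format`).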
svjack changed discussion status to closed
svjack changed discussion status to open
You cannot do that with llama.cpp; you should do it with ONNX instead.
It is based on a T5 transformer.
alexrs changed discussion status to closed