Instructions to use quantumaikr/llama-2-70b-fb16-korean with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use quantumaikr/llama-2-70b-fb16-korean with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="quantumaikr/llama-2-70b-fb16-korean")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("quantumaikr/llama-2-70b-fb16-korean")
model = AutoModelForCausalLM.from_pretrained("quantumaikr/llama-2-70b-fb16-korean")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use quantumaikr/llama-2-70b-fb16-korean with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "quantumaikr/llama-2-70b-fb16-korean"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "quantumaikr/llama-2-70b-fb16-korean",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/quantumaikr/llama-2-70b-fb16-korean

SGLang

How to use quantumaikr/llama-2-70b-fb16-korean with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "quantumaikr/llama-2-70b-fb16-korean" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "quantumaikr/llama-2-70b-fb16-korean",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "quantumaikr/llama-2-70b-fb16-korean" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "quantumaikr/llama-2-70b-fb16-korean",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use quantumaikr/llama-2-70b-fb16-korean with Docker Model Runner:
```
docker model run hf.co/quantumaikr/llama-2-70b-fb16-korean
```

KoreanLM icon

quantumaikr/llama-2-70b-fb16-korean

Model Description

quantumaikr/llama-2-70b-fb16-korean is a Llama2 70B model finetuned the Korean Dataset

Usage

Start chatting with quantumaikr/llama-2-70b-fb16-korean using the following code snippet:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("quantumaikr/llama-2-70b-fb16-korean")
model = AutoModelForCausalLM.from_pretrained("quantumaikr/llama-2-70b-fb16-korean", torch_dtype=torch.float16, device_map="auto")

system_prompt = "### System:\n귀하는 지시를 매우 잘 따르는 AI인 QuantumLM입니다. 최대한 많이 도와주세요. 안전에 유의하고 불법적인 행동은 하지 마세요.\n\n"

message = "인공지능이란 무엇인가요?"
prompt = f"{system_prompt}### User: {message}\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, do_sample=True, temperature=0.9, top_p=0.75, max_new_tokens=4096)

print(tokenizer.decode(output[0], skip_special_tokens=True))

QuantumLM should be used with this prompt format:

### System:
This is a system prompt, please behave and help the user.

### User:
Your prompt here

### Assistant
The output of QuantumLM

Use and Limitations

Intended Use

These models are intended for research only, in adherence with the CC BY-NC-4.0 license.

Limitations and bias

Although the aforementioned dataset helps to steer the base language models into "safer" distributions of text, not all biases and toxicity can be mitigated through fine-tuning. We ask that users be mindful of such potential issues that can arise in generated responses. Do not treat model outputs as substitutes for human judgment or as sources of truth. Please use it responsibly.

Downloads last month: 240

Safetensors

Model size

69B params

Tensor type

F16

Model tree for quantumaikr/llama-2-70b-fb16-korean

Adapters

4 models

quantumaikr
/

llama-2-70b-fb16-korean