Instructions to use iRASC/Meerkat-Ko-8B-d6-w5-dpo with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use iRASC/Meerkat-Ko-8B-d6-w5-dpo with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="iRASC/Meerkat-Ko-8B-d6-w5-dpo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("iRASC/Meerkat-Ko-8B-d6-w5-dpo")
model = AutoModelForCausalLM.from_pretrained("iRASC/Meerkat-Ko-8B-d6-w5-dpo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use iRASC/Meerkat-Ko-8B-d6-w5-dpo with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "iRASC/Meerkat-Ko-8B-d6-w5-dpo"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "iRASC/Meerkat-Ko-8B-d6-w5-dpo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/iRASC/Meerkat-Ko-8B-d6-w5-dpo
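The curl request shown for vLLM can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server is already listening on `localhost:8000`; the helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "iRASC/Meerkat-Ko-8B-d6-w5-dpo"


def build_chat_request(user_text, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(user_text, **kwargs):
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(user_text, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` returns the assistant's reply as a string. Passing `base_url="http://localhost:30000"` should target an SGLang server started on that port instead.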
- SGLang
How to use iRASC/Meerkat-Ko-8B-d6-w5-dpo with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "iRASC/Meerkat-Ko-8B-d6-w5-dpo" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "iRASC/Meerkat-Ko-8B-d6-w5-dpo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "iRASC/Meerkat-Ko-8B-d6-w5-dpo" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "iRASC/Meerkat-Ko-8B-d6-w5-dpo",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use iRASC/Meerkat-Ko-8B-d6-w5-dpo with Docker Model Runner:
docker model run hf.co/iRASC/Meerkat-Ko-8B-d6-w5-dpo
iRASC/Meerkat-Ko-8B-d6-w5-dpo
This model was developed as a follow-up study to the iRASC/Llama-Ko-8B model.
Model description
This model is a large language model (LLM) developed to optimize Korean medical question answering (QA) performance. In particular, it was built with parameter-efficient methods so that it can be used efficiently even in resource-constrained environments such as a single consumer GPU (RTX 4090).
The model was created through the following process:
- Model merging: The English medical model dmis-lab/llama-3-meerkat-8b-v1.0 and the general-purpose Korean model beomi/Llama-3-Open-Ko-8B were merged using the DARE (Drop and Rescale) technique with density d=0.6. This produced a strong base model that effectively combines English medical knowledge with Korean language ability.
- Direct Preference Optimization (DPO): The merged model was then tuned with DPO using QLoRA (Quantized Low-Rank Adaptation) to further improve performance on Korean medical QA tasks (based on the KorMedMCQA dataset) and to align the model.
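The drop-and-rescale step can be illustrated on plain Python lists. This is a toy sketch of the general DARE idea, not the authors' merge pipeline: the choice of base weights, the merge weight `weight=0.5`, and the function names are assumptions made for illustration. Only the density d=0.6 comes from the description above.

```python
import random


def dare_sparsify(delta, density, rng):
    """Keep each entry with probability `density`; rescale survivors by 1/density."""
    return [x / density if rng.random() < density else 0.0 for x in delta]


def dare_merge(base, tuned_models, density=0.6, weight=0.5, seed=0):
    """Add the sparsified, rescaled delta of each tuned model onto the base weights."""
    rng = random.Random(seed)
    merged = list(base)
    for tuned in tuned_models:
        # Delta (task vector): what fine-tuning changed relative to the base.
        delta = [t - b for t, b in zip(tuned, base)]
        sparse = dare_sparsify(delta, density, rng)
        merged = [m + weight * s for m, s in zip(merged, sparse)]
    return merged


# Toy "weights": a shared base and two fine-tuned variants (made-up numbers).
base = [0.0, 1.0, 2.0, 3.0]
medical = [0.5, 1.0, 2.5, 3.0]
korean = [0.0, 1.5, 2.0, 2.0]
print(dare_merge(base, [medical, korean]))
```

Rescaling the surviving entries by 1/density keeps the expected value of each delta unchanged, which is why a large share of the delta can be dropped before merging.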
As a result, the model is optimized to give improved answers to Korean medical questions even under limited resources.
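For reference, the per-example objective behind DPO tuning can be written in a few lines. The sketch below uses the standard DPO loss on sequence log-probabilities. The value of `beta` and the numbers in the example are made-up illustration values, since the training hyperparameters are not stated here.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given sequence log-probabilities."""
    # How much more the policy favors each answer than the frozen reference does.
    chosen_gain = policy_chosen - ref_chosen
    rejected_gain = policy_rejected - ref_rejected
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)), computed in a numerically stable way.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the margin is 0.
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy raises the chosen answer and lowers the rejected one: smaller loss.
print(dpo_loss(-10.0, -18.0, -12.0, -15.0))
```

The loss falls as the policy widens the gap between the preferred and rejected answers relative to the reference model, which is the sense in which the merged model is aligned to the preference data.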
Detailed training information can be found here.
Intended use and limitations
Intended use
- Korean medical question answering: The model can be used to answer medical questions written in Korean.
- Medical information retrieval support: It can serve as the backend of a system that helps users search for medical information.
- Medical consultation chatbot development: It can be used as a base model for chatbots that provide simple medical consultation or information.
- Research: It can be used for research on LLM tuning in resource-constrained environments, model merging, DPO, and related topics.
Limitations and cautions
- Not a substitute for professional medical diagnosis: This model cannot replace diagnosis or consultation by a medical professional. Generated information should be used for reference only.
- Accuracy and currency of information: The model generates answers based on its training data, so information about recent medical developments or specific rare diseases may be inaccurate or missing.
- Hallucination: Like other LLMs, the model may produce information that is factually wrong or irrelevant. Important information must always be cross-checked.
- Bias: Biases inherent in the training data may affect the model's answers.
- Specialized for KorMedMCQA: DPO tuning was based on the KorMedMCQA dataset, so the model may be strong on QA in that format, but its performance may differ on other kinds of medical tasks.
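Because the DPO data is multiple-choice, inputs shaped like an exam question are likely the closest match to what the model was tuned on. The helper below is a hypothetical sketch of turning a question with lettered options into chat messages. The prompt template used in training is not documented here, so the layout, the function name `format_mcq`, and the sample question are all assumptions.

```python
def format_mcq(question, options, system_prompt=None):
    """Return chat messages for a multiple-choice question with lettered options."""
    letters = "ABCDE"
    if not 2 <= len(options) <= len(letters):
        raise ValueError("expected between 2 and 5 options")
    # One line for the question, then one line per option: "A. ...", "B. ...".
    lines = [question.strip()]
    lines += [f"{letter}. {text}" for letter, text in zip(letters, options)]
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": "\n".join(lines)})
    return messages


# Made-up sample question, for illustration only.
messages = format_mcq(
    "Which lipoprotein is commonly called 'good cholesterol'?",
    ["LDL", "HDL", "VLDL", "Chylomicron", "IDL"],
)
print(messages[0]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` in the same way as a hand-written `messages` list.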
Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "iRASC/Meerkat-Ko-8B-d6-w5-dpo"

# Load the tokenizer and the model in bfloat16, placed automatically on available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    # System prompt: "You are an AI assistant that answers medical questions."
    {"role": "system", "content": "당신은 의료 관련 질문에 답변하는 AI 어시스턴트입니다."},
    # User prompt: "What exactly is 'cholesterol'? How do HDL and LDL differ?"
    {"role": "user", "content": "'콜레스테롤'이 정확히 무엇인가요? HDL과 LDL은 어떻게 다른가요?"},
]

# Render the chat template and append the assistant header so the model starts its reply.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Stop on either the regular EOS token or the Llama 3 end-of-turn token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=1000,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens, skipping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))