Mamba-ko-2.8B
Mamba-ko-2.8B is a state space model, further pretrained (continually trained) on a synthetically generated dataset, korean_textbooks.
If you're interested in building large-scale language models to solve a wide variety of problems in a wide variety of domains, you should consider joining Allganize. For a coffee chat, or if you have any questions, feel free to contact me as well: kuotient.dev@gmail.com
I would like to thank Allganize Korea for their generosity in providing resources for this personal project. This project is not directly related to the company's goals or research.
TODO
🟢 Training with korean_textbooks dataset - DONE
More training with publicly available Korean corpora
🟡 Instruct tuning
What is Mamba?
Mamba is a new state space model architecture showing promising performance on information-dense data such as language modeling, where previous subquadratic models fall short of Transformers. It is based on the line of progress on structured state space models, with an efficient hardware-aware design and implementation in the spirit of FlashAttention.
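To make the "state space" idea concrete, here is a minimal sketch of a toy diagonal state space recurrence in plain Python. It is not Mamba itself: Mamba makes the parameters depend on the input (the "selective" part) and runs the scan with a hardware-aware kernel, and this sketch does neither. The function name `ssm_scan` and the parameter values are illustrative assumptions. What it does show is that the model carries a fixed-size state from one step to the next, so the work grows linearly with sequence length.

```python
from typing import List, Sequence


def ssm_scan(
    xs: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a toy diagonal state space recurrence over a 1-D input sequence.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)

    The parameters are fixed here; this is an illustrative simplification.
    """
    state = [0.0] * len(a)
    ys: List[float] = []
    for x in xs:
        # Decay the previous state and mix in the current input.
        state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
        # Read the output as a weighted sum of the state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return ys


# Hypothetical parameters: a two-dimensional state with different decay rates.
ys = ssm_scan(xs=[1.0, 0.0, 0.0, 2.0], a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, 0.5])
print(ys)
```

The state has the same size at every step, whatever the sequence length, which is the property that lets this family of models avoid the quadratic cost of attention.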
License
Apache 2.0
Model Details
Developed by
Jisoo Kim (kuotient)
Base Model
state-spaces/mamba-2.8b-slimpj
Model Benchmark
KoBEST
| Model | boolq | copa | hellaswag | sentineg |
|---|---|---|---|---|
| kuotient/mamba-ko-2.8b | 0.6213 | 0.6150 | 0.4014 | 0.3383 |
| state-spaces/mamba-2.8b-slimpj | 0.3343 | 0.4867 | 0.3452 | 0.3547 |
| kuotient/mamba-ko-2.8b-old (2B trained only) | 0.4236 | 0.5896 | 0.4012 | 0.4348 |
| kuotient/mamba-ko-2.8b-old-instruct | 0.4041 | 0.6505 | 0.4906 | 0.3348 |
| EleutherAI/polyglot-ko-1.3b | 0.3552 | 0.7196 | 0.5247 | 0.6790 |
| maywell/TinyWand-SFT | 0.3455 | 0.6142 | 0.3944 | N/A |
| microsoft/phi-2 | 0.3343 | 0.4792 | 0.3235 | N/A |
| TinyLlama/TinyLlama-1.1B | 0.3343 | 0.4784 | 0.3396 | N/A |
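To see what continued pretraining changed relative to the base model, the per-task differences can be computed from the table. The short sketch below copies the two relevant rows verbatim; the helper name `score_deltas` is an illustrative assumption. The table does not state the metric or the few-shot setting, so the differences are simply in whatever unit the table reports.

```python
from typing import Dict

# KoBEST scores copied from the table above.
scores: Dict[str, Dict[str, float]] = {
    "kuotient/mamba-ko-2.8b": {
        "boolq": 0.6213, "copa": 0.6150, "hellaswag": 0.4014, "sentineg": 0.3383,
    },
    "state-spaces/mamba-2.8b-slimpj": {
        "boolq": 0.3343, "copa": 0.4867, "hellaswag": 0.3452, "sentineg": 0.3547,
    },
}


def score_deltas(tuned: Dict[str, float], base: Dict[str, float]) -> Dict[str, float]:
    """Return tuned minus base for every task, rounded to four decimals."""
    return {task: round(tuned[task] - base[task], 4) for task in tuned}


deltas = score_deltas(
    scores["kuotient/mamba-ko-2.8b"],
    scores["state-spaces/mamba-2.8b-slimpj"],
)
for task, delta in deltas.items():
    print(f"{task:<10} {delta:+.4f}")
```

On these numbers, boolq, copa, and hellaswag improve over the base model, while sentineg is slightly lower.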
Thanks
Many thanks to maywell, who has contributed so much to the Korean LLM community and keeps it motivated.
Usage
pip install "causal_conv1d>=1.1.0" "mamba-ssm==1.1.1"
import torch
from transformers import AutoTokenizer, TextStreamer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "kuotient/mamba-ko-2.8b"

# Load the tokenizer and reuse the EOS token for padding.
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Load the weights in half precision through mamba_ssm.
model = MambaLMHeadModel.from_pretrained(
    model_name, device=device, dtype=torch.float16)

# Korean prompt: "Five examples of nutritious foods to give to children are as follows."
prompt = "아이들한테 제공할 영양가 있는 음식 5가지의 예시는 다음과 같다."

tokens = tokenizer(prompt, return_tensors='pt')
input_ids = tokens.input_ids.to(device)

# Print the text to stdout as it is generated.
streamer = TextStreamer(tokenizer)

out = model.generate(
    input_ids=input_ids,
    streamer=streamer,
    max_length=2000,
    temperature=0.7,
    top_p=0.7,
    eos_token_id=tokenizer.eos_token_id,
)
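The `temperature` and `top_p` arguments above control how the next token is sampled. The sketch below illustrates the general idea of temperature scaling followed by nucleus (top-p) filtering in plain Python. It is not the code that `mamba_ssm` runs, and the library's exact filtering rule may differ (for example, in whether the token that crosses the threshold is kept). The function names and the example logits are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Optional


def top_p_filter(logits: List[float], temperature: float, top_p: float) -> Dict[int, float]:
    """Return renormalized probabilities for the smallest set of top tokens
    whose cumulative probability reaches top_p (illustrative variant)."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until the mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def sample_token(
    logits: List[float],
    temperature: float = 0.7,
    top_p: float = 0.7,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one token id from the filtered distribution."""
    rng = rng or random.Random()
    nucleus = top_p_filter(logits, temperature, top_p)
    token_ids = list(nucleus)
    weights = [nucleus[t] for t in token_ids]
    return rng.choices(token_ids, weights=weights, k=1)[0]


# Hypothetical logits for a four-token vocabulary.
nucleus = top_p_filter([1.0, 0.8, 0.2, -1.0], temperature=0.7, top_p=0.7)
print(sorted(nucleus))
print(sample_token([1.0, 0.8, 0.2, -1.0], rng=random.Random(0)))
```

With these example logits only the two most likely tokens survive the filter, so the low-probability tail can never be sampled. Lowering `top_p` shrinks that set further, and raising `temperature` spreads probability more evenly before the cut is made.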