How to use BK-Lee/MoAI-7B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="BK-Lee/MoAI-7B")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("BK-Lee/MoAI-7B", dtype="auto")
How to use BK-Lee/MoAI-7B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "BK-Lee/MoAI-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "BK-Lee/MoAI-7B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use BK-Lee/MoAI-7B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "BK-Lee/MoAI-7B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "BK-Lee/MoAI-7B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "BK-Lee/MoAI-7B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "BK-Lee/MoAI-7B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use BK-Lee/MoAI-7B with Docker Model Runner:
docker model run hf.co/BK-Lee/MoAI-7B
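The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a server started as shown above is listening locally and returns the usual OpenAI-style completions response. The helper names `build_completion_request` and `complete` are hypothetical and not part of vLLM, SGLang, or the MoAI code.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style completions responses carry the text under choices[0].text
    return body["choices"][0]["text"]


# Example call (requires a running server):
# Port 8000 matches the vLLM example; use 30000 for the SGLang server.
# print(complete("http://localhost:8000", "BK-Lee/MoAI-7B", "Once upon a time,"))
```

Separating request construction from sending makes it easy to inspect the payload before any network traffic happens.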
This repository contains the weights of the model presented in MoAI: Mixture of All Intelligence for Large Language and Vision Models.
You need only the following seven steps.
# Download the MoAI code and install its dependencies
git clone https://github.com/ByungKwanLee/MoAI
cd MoAI
bash install
# Load the image as a uint8 tensor (C, H, W) and resize it to 490x490
from PIL import Image
from torchvision.transforms import Resize
from torchvision.transforms.functional import pil_to_tensor
image_path = "figures/moai_mystery.png"
image = Resize(size=(490, 490), antialias=False)(pil_to_tensor(Image.open(image_path)))
# Instruction prompt
prompt = "Describe this image in detail."
# Load MoAI (4-bit) together with its auxiliary models
# (segmentation, object detection, scene graph generation, OCR)
from moai.load_moai import prepare_moai
moai_model, moai_processor, seg_model, seg_processor, od_model, od_processor, sgg_model, ocr_model \
    = prepare_moai(moai_path='BK-Lee/MoAI-7B', bits=4, grad_ckpt=False, lora=False, dtype='fp16')
# Pre-process the image and prompt into model inputs
moai_inputs = moai_model.demo_process(image=image,
                                      prompt=prompt,
                                      processor=moai_processor,
                                      seg_model=seg_model,
                                      seg_processor=seg_processor,
                                      od_model=od_model,
                                      od_processor=od_processor,
                                      sgg_model=sgg_model,
                                      ocr_model=ocr_model,
                                      device='cuda:0')
# Generate the answer without tracking gradients
import torch
with torch.inference_mode():
    generate_ids = moai_model.generate(**moai_inputs, do_sample=True, temperature=0.9, top_p=0.95, max_new_tokens=256, use_cache=True)
# Decode, keeping only the text before the first '[U' marker
answer = moai_processor.batch_decode(generate_ids, skip_special_tokens=True)[0].split('[U')[0]
print(answer)
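The decoding line keeps only the text before the first `[U` in the decoded string, presumably to drop marker text that survives `skip_special_tokens=True`. Here is a small sketch of that truncation on a plain string. The helper name `strip_after_marker` and the sample string (including the `[UNUSED]` marker) are made up for illustration and do not come from the MoAI code.

```python
def strip_after_marker(decoded, marker="[U"):
    """Return the text before the first occurrence of marker.

    If the marker does not occur, the input is returned unchanged,
    because str.split then yields a single-element list.
    """
    return decoded.split(marker)[0]


# Hypothetical decoded output; the trailing marker text is invented.
sample = "The image shows a cat on a sofa.[UNUSED] leftover text"
print(strip_after_marker(sample))  # prints: The image shows a cat on a sofa.
```

Note that any answer that legitimately contains `[U` would also be cut at that point, so a longer, more specific marker may be safer if the exact marker string is known.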