Instructions to use llava-hf/llava-1.5-7b-hf with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use llava-hf/llava-1.5-7b-hf with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-1.5-7b-hf")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-1.5-7b-hf")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use llava-hf/llava-1.5-7b-hf with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "llava-hf/llava-1.5-7b-hf"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "llava-hf/llava-1.5-7b-hf",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker
docker model run hf.co/llava-hf/llava-1.5-7b-hf
- SGLang
How to use llava-hf/llava-1.5-7b-hf with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "llava-hf/llava-1.5-7b-hf" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "llava-hf/llava-1.5-7b-hf",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "llava-hf/llava-1.5-7b-hf" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "llava-hf/llava-1.5-7b-hf",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
- Docker Model Runner
How to use llava-hf/llava-1.5-7b-hf with Docker Model Runner:
docker model run hf.co/llava-hf/llava-1.5-7b-hf
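The vLLM and SGLang servers above expose an OpenAI-compatible chat completions endpoint. In that API, each request is independent. To continue a conversation, the client resends the earlier turns in `messages`, including the assistant's previous replies.

Below is a minimal sketch in Python that builds the request bodies only. It assumes the request shape from the curl examples. `build_request` and the placeholder reply text are hypothetical, and actually sending the request (for example with curl, as above) is left out.

```python
import json

MODEL = "llava-hf/llava-1.5-7b-hf"
IMAGE_URL = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"


def build_request(history, user_text, image_url=None):
    """Append a user turn to `history` and return the JSON body for /v1/chat/completions."""
    content = [{"type": "text", "text": user_text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    history.append({"role": "user", "content": content})
    # Every request carries the whole conversation so far, not just the new question.
    return json.dumps({"model": MODEL, "messages": history})


history = []
first = build_request(history, "Describe this image in one sentence.", IMAGE_URL)
# After POSTing `first`, record the server's answer as an assistant turn.
# The text below is a placeholder, not real model output.
history.append({"role": "assistant", "content": "<assistant reply from the first request>"})
second = build_request(history, "What else can you see in the background?")
```

The image stays in the first user message, so it is resent on every turn. The follow-up question carries no image of its own.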
How to have a continuous conversation
Thanks for your amazing work!
With your script we can only give one input. If I want to ask the model more than one question, what should I do?
If I give more than one input, the answer is completely unrelated to the image...
Here is my experiment:
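```python
import torch
from PIL import Image

# Assumes `processor`, `model` and `device` were created earlier
prompt = "USER: <image>\nwhat is the image about\nASSISTANT:"
raw_image = Image.open("/home/ubuntu/code/textual_inversion/zzz/sea.jpg")
inputs = processor(text=prompt, images=raw_image, return_tensors="pt").to(device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
# [2:] skips the first two tokens (likely BOS and the start of "USER"), hence the "ER:" prefix below
print(processor.decode(output[0][2:], skip_special_tokens=True))
```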
The image I provided: [image: sea.jpg]
Output (as expected):
```text
ER:
what is the image about
ASSISTANT: The image features a large body of water with a few boats scattered throughout the scene. The water appears to be calm and serene, with a few sailboats and a yacht visible in the distance. The sky above the water is clear and blue, creating a picturesque view of the ocean. The boats are positioned at various distances from each other, adding depth and interest to the scene.
```
For the second input, which has no image, I want the model to answer the question with reference to the image I provided before.
```python
prompt = "USER: is the image positive? can you describe the image again?\nASSISTANT:"
# No image and no earlier turns are passed here, so the model has nothing to refer back to
inputs = processor(text=prompt, return_tensors="pt").to(device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
```
Output (unrelated to the image):
```text
ER: is the image positive? can you describe the image again?
ASSISTANT: The image is a positive image of a human brain. It is a close-up view of the brain, showing its intricate structure and details. The image is in black and white, which adds to the dramatic and artistic nature of the photograph. The brain is the main subject of the image, and it is the focal point of the photograph.
```
In that case, you should include the previous messages and the image in the prompt before feeding it back to the model.
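With the raw prompt strings used in this thread, each new prompt does three things:
- it repeats the earlier turns;
- it keeps the `<image>` placeholder in the first USER turn only;
- it is passed to `processor` together with the same image again.

The sketch below assumes that the single-turn format from the question (`USER: <image>\n...\nASSISTANT:`) simply repeats for every turn. The exact separators LLaVA-1.5 was trained with may differ, so compare against the output of `processor.apply_chat_template` before relying on it. `build_prompt` is a hypothetical helper, not part of Transformers.

```python
def build_prompt(history, question):
    """Build a multi-turn prompt from (question, answer) pairs plus a new question.

    Assumed format: "USER: {q}\nASSISTANT: {a}" per turn, with the <image>
    placeholder only in the very first user turn.
    """
    lines = []
    for i, (q, a) in enumerate(history):
        image = "<image>\n" if i == 0 else ""
        lines.append(f"USER: {image}{q}\nASSISTANT: {a}")
    # The new question gets the image placeholder only if it is the first turn
    image = "<image>\n" if not history else ""
    lines.append(f"USER: {image}{question}\nASSISTANT:")
    return "\n".join(lines)


history = [(
    "what is the image about",
    "The image features a large body of water with a few boats scattered throughout the scene.",
)]
prompt = build_prompt(history, "is the image positive? can you describe the image again?")
print(prompt)
```

The resulting `prompt` is then used like the first request, e.g. `processor(text=prompt, images=raw_image, return_tensors="pt")`, so the model sees the image features again.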
Thanks @nielsr.
Yes, I just tested this with a conversation loop that keeps appending the past USER and ASSISTANT turns to the prompt, and it worked well.
Can you share the code? I don't understand how to do that.
Hi @prakashshubham, this is what I did to run a multi-round conversation. I hope you find it helpful.
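```python
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

queries = [
    "How many animated characters are there in this image?",
    "Answer with a single number in decimal format. Give no explanations.",
]

def generate_response(image):
    chat = []
    for i, query in enumerate(queries):
        content = [{"type": "text", "text": query}]
        if i == 0:
            # The image placeholder goes only in the first user turn
            content.insert(0, {"type": "image"})
        chat.append({"role": "user", "content": content})
        # Render the whole history so far into a single prompt string
        prompt = processor.apply_chat_template(chat, add_generation_prompt=True)
        # Pass the image on every turn: the prompt still contains its <image> token
        inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=300)
        # Decode only the newly generated tokens, not the echoed prompt
        answer = processor.decode(
            output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
        ).strip()
        chat.append({"role": "assistant", "content": [{"type": "text", "text": answer}]})
    return answer
```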
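Without the model calls, the loop above follows a simple pattern:
1. Append the user turn.
2. Generate from the entire history.
3. Append the answer, so the next turn can refer to it.

In this sketch, `run_conversation` and its `generate` callback are hypothetical names. `generate` stands in for the chat-template, `processor`, `model.generate` and decode steps. The stub used here only reports how many messages it received, which makes the growing history visible.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_conversation(queries: List[str], generate: Callable[[List[Message]], str]) -> List[Message]:
    """Ask each query in turn, giving `generate` the full history every time."""
    chat: List[Message] = []
    for query in queries:
        chat.append({"role": "user", "content": query})
        answer = generate(chat)  # the model sees every previous turn
        chat.append({"role": "assistant", "content": answer})
    return chat


def echo_history_length(history: List[Message]) -> str:
    """Hypothetical stub standing in for the real model call."""
    return f"seen {len(history)} messages"


chat = run_conversation(["first question", "second question"], echo_history_length)
for message in chat:
    print(message["role"], "->", message["content"])
```

In a real `generate`, decode `output[0][inputs["input_ids"].shape[-1]:]`, as the Transformers example at the top of the page does. This isolates the answer by token count. It is sturdier than cutting the decoded text at the length of the decoded prompt.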
I later managed to do it myself, but thanks anyway.