Instructions to use llava-hf/llava-1.5-7b-hf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use llava-hf/llava-1.5-7b-hf with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-1.5-7b-hf")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
model = AutoModelForImageTextToText.from_pretrained("llava-hf/llava-1.5-7b-hf")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use llava-hf/llava-1.5-7b-hf with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "llava-hf/llava-1.5-7b-hf"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llava-hf/llava-1.5-7b-hf",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
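
For readers who prefer Python over curl, the same request can be sketched with only the standard library. The helper names `build_chat_request` and `send` are hypothetical, and the response shape (`choices[0].message.content`) is assumed from the OpenAI-compatible API that the server advertises.

```python
import json
import urllib.request


def build_chat_request(base_url, model, question, image_url):
    """Build an OpenAI-style chat completion request with one question and one image."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumed response layout of an OpenAI-compatible chat completion
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "llava-hf/llava-1.5-7b-hf",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# print(send(request))  # uncomment once the server is running
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.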

Use Docker

docker model run hf.co/llava-hf/llava-1.5-7b-hf

SGLang

How to use llava-hf/llava-1.5-7b-hf with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "llava-hf/llava-1.5-7b-hf" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llava-hf/llava-1.5-7b-hf",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "llava-hf/llava-1.5-7b-hf" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llava-hf/llava-1.5-7b-hf",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use llava-hf/llava-1.5-7b-hf with Docker Model Runner:
```
docker model run hf.co/llava-hf/llava-1.5-7b-hf
```

How to have a continuous conversation

#19

by sucongCJS - opened Feb 23, 2024

Discussion

sucongCJS

Feb 23, 2024

Thanks for your amazing work!
According to your script, we can only pass a single input. If I want to ask the model more than one question, what should I do?
If I send a second input, the answer is completely unrelated to the image.
Here is my experiment:

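import torch
from PIL import Image

# processor, model, and device are assumed to be set up as in the model card script
prompt = "USER: <image>\nwhat is the image about\nASSISTANT:"
raw_image = Image.open("/home/ubuntu/code/textual_inversion/zzz/sea.jpg")
inputs = processor(text=prompt, images=raw_image, return_tensors="pt").to(device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
# [2:] skips the first two tokens, which is why the output below starts with "ER:"
print(processor.decode(output[0][2:], skip_special_tokens=True))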

The image I provided: [image not included]

Output (which is normal):
ER:
what is the image about
ASSISTANT: The image features a large body of water with a few boats scattered throughout the scene. The water appears to be calm and serene, with a few sailboats and a yacht visible in the distance. The sky above the water is clear and blue, creating a picturesque view of the ocean. The boats are positioned at various distances from each other, adding depth and interest to the scene.

The second input has no image. I want the model to answer the question by referring to the image I provided before.

# No image is passed in this call, only the follow-up question
prompt = "USER: is the image positive? can you describe the image again?\nASSISTANT:"
inputs = processor(text=prompt, return_tensors="pt").to(device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))

Output (which is irrelevant to the image):
ER: is the image positive? can you describe the image again?
ASSISTANT: The image is a positive image of a human brain. It is a close-up view of the brain, showing its intricate structure and details. The image is in black and white, which adds to the dramatic and artistic nature of the photograph. The brain is the main subject of the image, and it is the focal point of the photograph.

ggcristian

Apr 21, 2024

Hi, @sucongCJS

Were you able to find a way to do this in a script?

nielsr

Llava Hugging Face org Apr 22, 2024

In that case, you should include the previous messages and the image in the prompt before feeding it back to the model.
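
A minimal sketch of that idea, using the plain "USER: ... ASSISTANT:" prompt style from the first post. The helper `build_prompt` is hypothetical, and the newline used to separate turns is an assumption; check the model card for the exact separator the model expects.

```python
def build_prompt(history, new_question, image_token="<image>"):
    """Rebuild the full prompt from earlier (question, answer) pairs plus a new question.

    The image placeholder appears once, in the first user turn, so the same
    image still has to be passed to the processor on every call.
    """
    parts = []
    for i, (question, answer) in enumerate(history):
        prefix = f"{image_token}\n" if i == 0 else ""
        parts.append(f"USER: {prefix}{question}\nASSISTANT: {answer}")
    # The new question carries the placeholder only when there is no history yet
    prefix = f"{image_token}\n" if not history else ""
    parts.append(f"USER: {prefix}{new_question}\nASSISTANT:")
    return "\n".join(parts)


# First turn: identical to the single-turn prompt in the original question
first_prompt = build_prompt([], "what is the image about")

# Second turn: the earlier question and answer are replayed before the new question
second_prompt = build_prompt(
    [("what is the image about", "A calm sea with a few boats.")],
    "is the image positive? can you describe the image again?",
)
```

The resulting string would then go to the processor together with the original image, for example `processor(text=second_prompt, images=raw_image, return_tensors="pt")`, so the follow-up question is answered with the image in context.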

ggcristian

Apr 22, 2024

Thanks @nielsr .

Yes, I just tested this with a conversation loop that keeps appending the past USER and ASSISTANT turns, and it worked well.
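
A loop of that kind might look like the sketch below, which keeps the history as structured messages in the style of the Transformers snippet at the top of the page. The function `run_conversation` and the `generate_fn` callable are hypothetical; `generate_fn` stands in for whatever wraps `processor.apply_chat_template` and `model.generate`, and the layout of the assistant message is an assumption.

```python
def run_conversation(questions, image_url, generate_fn):
    """Ask several questions about one image while keeping the full history."""
    messages = []
    answers = []
    for i, question in enumerate(questions):
        content = [{"type": "text", "text": question}]
        if i == 0:
            # Attach the image to the first user turn only; later turns reuse it via the history
            content.insert(0, {"type": "image", "url": image_url})
        messages.append({"role": "user", "content": content})

        # generate_fn receives the whole history, not just the latest question
        answer = generate_fn(messages)
        messages.append(
            {"role": "assistant", "content": [{"type": "text", "text": answer}]}
        )
        answers.append(answer)
    return messages, answers


def echo_model(messages):
    """Stand-in for a real model call: reports how many messages it was given."""
    return f"(reply after seeing {len(messages)} messages)"


history, replies = run_conversation(
    ["What animal is on the candy?", "What color is it?"],
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG",
    echo_model,
)
```

Passing the model call in as a function keeps the history handling testable without loading the model.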

prakashshubham

Jun 21, 2024

Hi @ggcristian
Can you share the code for the same? I am not getting how to do that.

Yao-Lirong

Jul 3, 2024

edited Jul 3, 2024

Can you share the code for the same? I am not getting how to do that.

Hi @prakashshubham, this is what I did to conduct a multi-round conversation. I hope you find it helpful.

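import torch

# Assumes processor, model, and device are already set up, and that
# text_processor is an object exposing apply_chat_template (for example a
# tokenizer with a chat template); its definition is not shown in this post.
queries = [
    "<image>\nHow many animated characters are there in this image?",
    "Answer with a single number in decimal format. Give no explanations."
]

def generate_response(image):
    chat = []  # running history of user and assistant turns
    for query in queries:
        chat.append({"role": "user", "content": query})
        # Re-render the whole history so earlier turns (and the <image> token) stay in the prompt
        prompt = text_processor.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
        inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=300)
        output = processor.decode(output[0], skip_special_tokens=True)

        # Drop the decoded prompt from the front of the decoded output, keeping only the new answer
        input_ids = inputs["input_ids"]
        cutoff = len(text_processor.decode(
            input_ids[0],
            skip_special_tokens=True,
            clean_up_tokenization_spaces=True,
        ))
        answer = output[cutoff:]
        chat.append({"role": "assistant", "content": answer})
    return answer  # answer to the last query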

prakashshubham

Jul 3, 2024

I was later able to do it myself. But still, thanks for this.
