Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config

by thirdinwinter - opened Mar 13, 2025

Mar 13, 2025

Hi there, I am running BGE-VL-MLLM-S2 with the official sample code (the VL-MLLM-S1 sample code):

and I encountered the following warning:

Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add patch_size and vision_feature_select_strategy to the model's processing config or set directly with processor.patch_size = {{patch_size}} and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47

which comes from the following lines:
query_inputs = model.data_process(
    text=text,
    images=["img1.jpg", "img2.jpg"],
    q_or_c="q",
    task_instruction="Retrieve the target image that best meets the combined criteria by using both the provided image and the image retrieval instructions: "
)

Is there any way to fix it? I also found some similar solutions in https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/discussions/34 but I still do not know the appropriate values for these two variables.

thirdinwinter

Mar 13, 2025

Oops, sorry for my mistake! The link should be https://huggingface.co/BAAI/BGE-VL-MLLM-S1/discussions/1
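
One possible way to pick values for the two attributes, rather than guessing them, is to copy them from the model's own config, since LLaVA-NeXT style configs normally carry `vision_config.patch_size` and `vision_feature_select_strategy`. The sketch below is only an illustration under that assumption: `sync_processor_with_config` is a hypothetical helper (not part of transformers or the BGE-VL code), and the stand-in objects with `14` and `"default"` are placeholder values so the snippet runs without downloading anything; they are not confirmed values for BGE-VL-MLLM-S2.

```python
from types import SimpleNamespace


def sync_processor_with_config(processor, config):
    """Hypothetical helper: copy the two attributes named in the warning
    from a model config onto a processor object."""
    vision_config = getattr(config, "vision_config", None)
    patch_size = getattr(vision_config, "patch_size", None)
    strategy = getattr(config, "vision_feature_select_strategy", None)

    # Refuse to guess: if the config does not carry the values, stop here.
    if patch_size is None or strategy is None:
        raise ValueError(
            "config is missing vision_config.patch_size or "
            "vision_feature_select_strategy"
        )

    processor.patch_size = patch_size
    processor.vision_feature_select_strategy = strategy
    return processor


# Stand-in objects with placeholder values (assumptions for illustration only).
demo_config = SimpleNamespace(
    vision_config=SimpleNamespace(patch_size=14),
    vision_feature_select_strategy="default",
)
demo_processor = SimpleNamespace()

sync_processor_with_config(demo_processor, demo_config)
print(demo_processor.patch_size, demo_processor.vision_feature_select_strategy)
```

With the real model, the call would presumably be made once before `model.data_process(...)`, passing the model's processor and `model.config`. How the processor is exposed (for example as `model.processor`) depends on the model's remote code and is an assumption here, so it is worth checking before relying on it.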
