Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config
Hi there, I am running BGE-VL-MLLM-S2 with the official sample code (the VL-MLLM-S1 sample code), and I encountered the following warning:
Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add patch_size and vision_feature_select_strategy to the model's processing config or set directly with processor.patch_size = {{patch_size}} and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47
which comes from the following lines:
query_inputs = model.data_process(
    text=text,
    images=["img1.jpg", "img2.jpg"],
    q_or_c="q",
    task_instruction="Retrieve the target image that best meets the combined criteria by using both the provided image and the image retrieval instructions: "
)
Is there any way to fix it? I also found some similar solutions in https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/discussions/34, but I still do not know the appropriate values for these two variables.
Oops, sorry for my mistake! The link should be https://huggingface.co/BAAI/BGE-VL-MLLM-S1/discussions/1
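One possible way to find the two values is to read them from the model's own config rather than guessing. The sketch below is not an official fix. It assumes a LLaVA-NeXT style config that exposes `vision_config.patch_size` and `vision_feature_select_strategy`, and it assumes you can reach the processor object that `data_process` uses. The helper name `sync_processor_with_config` is hypothetical, and the `SimpleNamespace` objects with `14` and `"default"` are placeholders standing in for the real config and processor, not confirmed values for BGE-VL-MLLM-S2.

```python
from types import SimpleNamespace


def sync_processor_with_config(processor, config):
    """Copy the two attributes named in the warning from the model config to the processor.

    Hypothetical helper: assumes `config` follows the LLaVA-NeXT layout
    (config.vision_config.patch_size and config.vision_feature_select_strategy).
    """
    processor.patch_size = config.vision_config.patch_size
    processor.vision_feature_select_strategy = config.vision_feature_select_strategy
    return processor


# Stand-in objects so the sketch runs without downloading a model.
# The values 14 and "default" are placeholders, not verified for this checkpoint.
config = SimpleNamespace(
    vision_config=SimpleNamespace(patch_size=14),
    vision_feature_select_strategy="default",
)
processor = SimpleNamespace()

sync_processor_with_config(processor, config)
print(processor.patch_size, processor.vision_feature_select_strategy)
```

With a real model you would pass the loaded model's `config` and its processor instead of the stand-ins, before calling `data_process`. Whether the custom code in this repository exposes the processor as an attribute is something to check in the repository's modeling file.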
