Can we embed multiple images and text into a single embedding?
Can we have like 5 images and 5 sentences in one embedding?
Hey, if you want to encode 5 sentences into one embedding, you can just concatenate them. For images, the encode function does not support this, so you would need to implement a function yourself that converts the images into a sequence of tokens you can pass to the model. Basically, you need to reimplement what the encode function [1] does, but pass multiple images (that should not be too complicated). If you want to encode both text and images into a single embedding, you can do it in a similar way. Nevertheless, the model is only trained to encode single images and pure text into one embedding representation, so I don't know whether multimodal inputs or inputs with multiple images will produce good embeddings.
[1] https://huggingface.co/jinaai/jina-embeddings-v4/blob/main/modeling_jina_embeddings_v4.py#L487-L546
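For the text-only case, concatenating could look roughly like the sketch below. The helper name `embed_sentences_as_one` and the `encode_fn` parameter are made up for illustration: `encode_fn` stands for any callable that takes a list of strings and returns one vector per string (for example, a thin wrapper around whatever text-encoding call you already use with the model). It does not cover the multi-image case.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def embed_sentences_as_one(
    sentences: Sequence[str],
    encode_fn: Callable[[List[str]], Sequence[Vector]],
    separator: str = " ",
) -> Vector:
    """Join several sentences into one string and embed it as a single input.

    `encode_fn` is a hypothetical stand-in for your text encoder: it must
    accept a list of strings and return one embedding per string.
    """
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [s.strip() for s in sentences if s.strip()]
    if not cleaned:
        raise ValueError("need at least one non-empty sentence")
    joined = separator.join(cleaned)
    # One input string in, so exactly one embedding comes back.
    return encode_fn([joined])[0]
```

Keep in mind the joined text still has to fit into the model's maximum input length, otherwise the tail gets truncated.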
We also plan to support encoding multiple images at once into multiple embeddings, i.e., late chunking for images, e.g., to preserve context between PDF pages of the same document [1]. But first we need to run some experiments to see how well this works.
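To make the late chunking idea a bit more concrete: the whole input is encoded once, and the per-token vectors are then pooled separately for each chunk, so every chunk embedding has seen the full context. The sketch below only shows that pooling step with mean pooling. It is a generic illustration, not the planned implementation; the function name `late_chunk_pool` and the `(start, end)` span format are assumptions, and for images a span would presumably cover the tokens of one page.

```python
from typing import List, Sequence, Tuple


def late_chunk_pool(
    token_embeddings: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Mean-pool contextualized token vectors per chunk.

    token_embeddings: one vector per token, computed over the FULL input.
    spans: hypothetical (start, end) token index pairs, end exclusive.
    """
    pooled = []
    for start, end in spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span ({start}, {end})")
        chunk = token_embeddings[start:end]
        dim = len(chunk[0])
        # Average each dimension over the tokens belonging to this chunk.
        pooled.append(
            [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        )
    return pooled
```

The result is one embedding per span, which is the "multiple images into multiple embeddings" shape described above.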
Thank you for your reply! Makes sense. Super interesting work though! Thank you for sharing.