aimagelab/ReT-M2KR
How to use aimagelab/ReT2-M2KR-ColBERT-CLIP-ViT-L with Transformers:
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("aimagelab/ReT2-M2KR-ColBERT-CLIP-ViT-L", dtype="auto")

Official implementation of ReT-2: Recurrence Meets Transformers for Universal Multimodal Retrieval.
This model features a visual backbone based on openai/clip-vit-large-patch14 and a textual backbone based on colbert-ir/colbertv2.0.
The backbones have been fine-tuned on the M2KR dataset.
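Because the textual backbone is ColBERT, retrieval is scored with ColBERT-style late interaction (MaxSim): each query token is matched against its most similar document token, and the per-token maxima are summed. The sketch below illustrates only this scoring scheme on toy, hand-made token embeddings; the array shapes and values are hypothetical and are not produced by the actual ReT-2 model.

import numpy as np

def maxsim_score(query_emb, doc_emb):
    """ColBERT-style late-interaction score.

    query_emb: (num_query_tokens, dim) token embeddings
    doc_emb:   (num_doc_tokens, dim) token embeddings
    For each query token, take the max similarity over all
    document tokens, then sum over query tokens.
    """
    # (num_query_tokens, num_doc_tokens) similarity matrix
    sims = query_emb @ doc_emb.T
    return sims.max(axis=1).sum()

# Toy L2-normalized token embeddings: 2 query tokens, 3 document tokens, dim 4
q = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
d = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.8, 0.6, 0.0]])
print(maxsim_score(q, d))  # 1.0 + 0.8 = 1.8

Documents are then ranked by this score, which preserves token-level matching signals that a single pooled embedding would average away.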
@article{caffagni2025recurrencemeetstransformers,
  title={{Recurrence Meets Transformers for Universal Multimodal Retrieval}},
  author={Davide Caffagni and Sara Sarto and Marcella Cornia and Lorenzo Baraldi and Rita Cucchiara},
  journal={arXiv preprint arXiv:2509.08897},
  year={2025}
}
Base model: colbert-ir/colbertv2.0