ApsaraStackMaaS/EvoQwen2.5-VL-Retriever-7B-v1 Visual Document Retrieval • 8B • Updated Nov 4 • 72 • 17
Moondream3 Preview • Running on Zero • 18 • Process images and text to answer questions, caption, detect objects, and find points
Post 1088

Haystack can now see

The latest release of the Haystack OSS LLM framework adds a long-requested feature: image support! Notebooks below.

This isn't just about passing images to an LLM. We built several features to enable practical multimodal use cases.

What's new?
• Support for multiple LLM providers: OpenAI, Amazon Bedrock, Google Gemini, Mistral, NVIDIA, OpenRouter, Ollama and more (support for Hugging Face API coming soon)
• Prompt template language to handle structured inputs, including images
• PDF and image converters
• Image embedders using CLIP-like models
• LLM-based extractor to pull text from images
• Components to build multimodal RAG pipelines and Agents

I had the chance of leading this effort with @sjrhuschlee (great collab).

Below you can find two notebooks to explore the new features:
• Introduction to Multimodal Text Generation: https://haystack.deepset.ai/cookbook/multimodal_intro
• Creating Vision+Text RAG Pipelines: https://haystack.deepset.ai/tutorials/46_multimodal_rag

(image by @bilgeyucel)
Qwen/Qwen3-Coder-30B-A3B-Instruct Text Generation • 31B • Updated 27 days ago • 835k • 837
Shakker-Labs/FLUX.1-dev-LoRA-Vector-Journey Text-to-Image • Updated Oct 15, 2024 • 148 • 207
Intelligent-Internet/II-Medical-8B-1706 Text Generation • 8B • Updated Aug 12 • 297 • 135