denniscraandijk (Dennis)

upvoted a collection 4 months ago

Apertus LLM

Democratizing Open and Compliant LLMs for Global Language Environments: 8B and 70B open-data open-weights models, multilingual in >1000 languages • 4 items • Updated Oct 1, 2025 • 318

upvoted 5 papers 10 months ago

It's All in The [MASK]: Simple Instruction-Tuning Enables BERT-like Masked Language Models As Generative Classifiers

Paper • 2502.03793 • Published Feb 6, 2025 • 4

upvoted a collection 10 months ago

Babel

Collection

Open Multilingual Large Language Models Serving Over 90% of Global Speakers • 5 items • Updated Apr 15, 2025 • 18

upvoted a paper 10 months ago

Rank1: Test-Time Compute for Reranking in Information Retrieval

Paper • 2502.18418 • Published Feb 25, 2025 • 28

upvoted 2 collections 11 months ago

Nomic Embed v2

Collection

Multilingual Embedding Models • 5 items • Updated Apr 30, 2025 • 21

Tulu 3 Models

Collection

All models released with Tulu 3 -- state of the art open post-training recipes. • 11 items • Updated 13 days ago • 103

upvoted a paper 11 months ago

SPLADE-v3: New baselines for SPLADE

Paper • 2403.06789 • Published Mar 11, 2024 • 5

upvoted a collection 12 months ago

Lychee-KaLM-embedding

Collection

17 items • Updated Nov 24, 2025 • 25

upvoted 3 collections about 1 year ago

Granite 3.1 Language Models

Collection

A series of language models with 128K context length trained by IBM licensed under Apache 2.0 license. • 9 items • Updated Nov 17, 2025 • 68

Common Corpus

Collection

Largest multilingual pretraining data. • 1 item • Updated Nov 13, 2024 • 13

POTION

Collection

These are the flagship POTION models. Load them and use them with model2vec (https://github.com/MinishLab/model2vec) or sentence-transformers • 6 items • Updated Nov 13, 2025 • 14

upvoted a paper about 1 year ago

Contextual Document Embeddings

Paper • 2410.02525 • Published Oct 3, 2024 • 24

upvoted 2 papers over 1 year ago

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83

Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 82

upvoted 2 papers almost 2 years ago

GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence

Paper • 2310.05388 • Published Oct 9, 2023 • 4

Weaver: Foundation Models for Creative Writing

Paper • 2401.17268 • Published Jan 30, 2024 • 45

Dennis

AI & ML interests

Organizations

SuperBPE: Space Travel for Language Models

Gemini Embedding: Generalizable Embeddings from Gemini

EuroBERT: Scaling Multilingual Encoders for European Languages

Training Sparse Mixture Of Experts Text Embedding Models

denniscraandijk's activity