-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 152 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
Collections
Discover the best community collections!
Collections including paper arxiv:2404.14219
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 300 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 441 -
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 377 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 259
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 259 -
meta-llama/Llama-3.3-70B-Instruct
Text Generation • Updated • 597k • • 2.68k -
tencent/HunyuanVideo
Text-to-Video • Updated • 989 • • 2.13k -
Qwen/QwQ-32B-Preview
Text Generation • 33B • Updated • 8.54k • • 1.74k
-
AgentInstruct: Toward Generative Teaching with Agentic Flows
Paper • 2407.03502 • Published • 51 -
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper • 2406.08464 • Published • 72 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 259 -
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 31
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 627 -
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 189 -
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 56 -
ResLoRA: Identity Residual Mapping in Low-Rank Adaption
Paper • 2402.18039 • Published • 11
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 259 -
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Paper • 2408.01800 • Published • 92 -
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
Paper • 2411.09944 • Published • 12 -
MobileQuant: Mobile-friendly Quantization for On-device Language Models
Paper • 2408.13933 • Published • 16
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 24 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 85 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 152 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 627 -
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 189 -
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 56 -
ResLoRA: Identity Residual Mapping in Low-Rank Adaption
Paper • 2402.18039 • Published • 11
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 300 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 441 -
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 377 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 259
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 259 -
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Paper • 2408.01800 • Published • 92 -
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
Paper • 2411.09944 • Published • 12 -
MobileQuant: Mobile-friendly Quantization for On-device Language Models
Paper • 2408.13933 • Published • 16
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 259 -
meta-llama/Llama-3.3-70B-Instruct
Text Generation • Updated • 597k • • 2.68k -
tencent/HunyuanVideo
Text-to-Video • Updated • 989 • • 2.13k -
Qwen/QwQ-32B-Preview
Text Generation • 33B • Updated • 8.54k • • 1.74k
-
AgentInstruct: Toward Generative Teaching with Agentic Flows
Paper • 2407.03502 • Published • 51 -
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Paper • 2406.08464 • Published • 72 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 259 -
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Paper • 2402.10379 • Published • 31