small models
Approximating Two-Layer Feedforward Networks for Efficient Transformers (arXiv:2310.10837)
BitNet: Scaling 1-bit Transformers for Large Language Models (arXiv:2310.11453)
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models (arXiv:2310.16795)
LLM-FP4: 4-Bit Floating-Point Quantized Transformers (arXiv:2310.16836)
FP8-LM: Training FP8 Large Language Models (arXiv:2310.18313)
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving (arXiv:2310.19102)
Ziya2: Data-centric Learning is All LLMs Need (arXiv:2311.03301)
Mini-GPTs: Efficient Large Language Models through Contextual Pruning (arXiv:2312.12682)
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling (arXiv:2312.15166)
TinyLlama: An Open-Source Small Language Model (arXiv:2401.02385)
SliceGPT: Compress Large Language Models by Deleting Rows and Columns (arXiv:2401.15024)
Specialized Language Models with Cheap Inference from Limited Domain Data (arXiv:2402.01093)
Rethinking Optimization and Architecture for Tiny Language Models (arXiv:2402.02791)
Scaling Laws for Downstream Task Performance of Large Language Models (arXiv:2402.04177)
HARE: HumAn pRiors, a key to small language model Efficiency (arXiv:2406.11410)