FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation Paper • 2503.06680 • Published Mar 9 • 20
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation Paper • 2503.06680 • Published Mar 9 • 20
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28, 2024 • 104
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6, 2024 • 68
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs Paper • 2404.16375 • Published Apr 25, 2024 • 18
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024 • 51
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8, 2024 • 83
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance Paper • 2404.04125 • Published Apr 4, 2024 • 29