What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models Paper • 2601.06165 • Published 17 days ago • 16
COMPASS Collection A Framework for Evaluating Organization-Specific Policy Alignment in LLMs • 5 items • Updated 18 days ago • 5
Everyday Physics in Korean Contexts: A Culturally Grounded Physical Reasoning Benchmark Paper • 2509.17807 • Published Sep 22, 2025 • 1
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 19
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs Paper • 2601.01836 • Published 19 days ago • 10
AIM-Intelligence/COMPASS_Qwen2.5-7B-Instruct_LoRA Text Generation • 8B • Updated 18 days ago • 20 • 2
AIM-Intelligence/COMPASS-Policy-Alignment-Testbed-Dataset • Updated 18 days ago • 5.92k • 191 • 10