X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework Paper • 2601.03194 • Published Jan 6 • 2
Robust and Calibrated Detection of Authentic Multimedia Content Paper • 2512.15182 • Published Dec 17, 2025 • 17
EthicsMH: A Pilot Benchmark for Ethical Reasoning in Mental Health AI Paper • 2509.11648 • Published Sep 15, 2025 • 2
D-HUMOR: Dark Humor Understanding via Multimodal Open-ended Reasoning Paper • 2509.06771 • Published Sep 8, 2025 • 6
Query Attribute Modeling: Improving search relevance with Semantic Search and Meta Data Filtering Paper • 2508.04683 • Published Aug 6, 2025
DSBC : Data Science task Benchmarking with Context engineering Paper • 2507.23336 • Published Jul 31, 2025 • 2
Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees Paper • 2506.14606 • Published Jun 17, 2025 • 11
A Technical Study into Small Reasoning Language Models Paper • 2506.13404 • Published Jun 16, 2025 • 8
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark Paper • 2505.16968 • Published May 22, 2025 • 40
Uncovering Cultural Representation Disparities in Vision-Language Models Paper • 2505.14729 • Published May 20, 2025 • 1
ReEx-SQL: Reasoning with Execution-Aware Reinforcement Learning for Text-to-SQL Paper • 2505.12768 • Published May 19, 2025 • 4
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? Paper • 2408.05109 • Published Aug 9, 2024