B-score: Detecting biases in large language models using response history Paper • 2505.18545 • Published May 24, 2025 • 30
Understanding Generative AI Capabilities in Everyday Image Editing Tasks Paper • 2505.16181 • Published May 22, 2025 • 24
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance Paper • 2505.15952 • Published May 21, 2025 • 20
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs Paper • 2503.02003 • Published Mar 3, 2025 • 48
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13, 2025 • 43
hungnh1125/llava_siglip_llama3_8b_finetune_localize_difference_28_08_2024_8192_lora Updated Sep 5, 2024
hungnh1125/llava_siglip_llama3_8b_finetune_localize_difference_27_08_2024_8192_lora Updated Aug 28, 2024