BigCodeArena: Compare two AI models by sending them code and seeing their responses
Article: "BigCodeArena: Judging code generations end to end with code executions", Oct 7, 2025