Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security Paper • 2507.19399 • Published Jul 25, 2025 • 1
LionGuard 2: Building Lightweight, Data-Efficient & Localised Multilingual Content Moderators Paper • 2507.15339 • Published Jul 21, 2025
Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation Paper • 2507.11966 • Published Jul 16, 2025
Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications Paper • 2507.09820 • Published Jul 13, 2025
RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages Paper • 2507.05980 • Published Jul 8, 2025 • 1
MinorBench: A hand-built benchmark for content-based risks for children Paper • 2503.10242 • Published Mar 13, 2025 • 5
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection Paper • 2411.12946 • Published Nov 20, 2024 • 22