Strix-XSS-4B-RL (PoC)

⚠️ Proof of Concept: This model is an early research prototype demonstrating RL-trained XSS detection. It is not production-ready and requires a modified version of Strix to function properly.

Model Description

Strix-XSS-Qwen3-4B-RL is a specialized language model fine-tuned using reinforcement learning to detect Cross-Site Scripting (XSS) vulnerabilities. Built on Qwen3-4B-Thinking-2507, this model is designed to work as a sub-agent within Strix, an AI-powered penetration testing framework.

This model demonstrates the viability of using RL to train specialized security testing agents that can autonomously identify web application vulnerabilities.

Key Features

RL-Trained: Fine-tuned using reinforcement learning on simulated penetration testing environments
Specialized Focus: Optimized specifically for XSS vulnerability detection
Agent Architecture: Designed for multi-agent systems where different models handle different vulnerability types
Lightweight: 4B parameters, suitable for deployment on consumer hardware

Performance

Strix-XSS Evaluation: 0.79 (measured on strix-xss from the Prime Intellect environment hub)

This evaluation measures the model's ability to correctly identify XSS vulnerabilities in simulated web applications using the Strix testing framework.

Training Details

Training Data

Dataset Size: 135 examples
Environment: Simulated web application environment with Strix tooling
Training Method: Reinforcement Learning
Training Platform: Prime Intellect hosted training beta

Special thanks to Prime Intellect for providing the infrastructure that made this research possible! Their hosted training platform enabled efficient RL training for this security-focused model.

Base Model

Architecture: Qwen3-4B-Thinking-2507
Parameters: 4 billion
Context Length: [inherited from base model]

Intended Use

This model is a proof of concept intended to demonstrate:

The feasibility of RL-trained agents for vulnerability detection
Integration patterns for specialized sub-agents in security testing frameworks
Research directions for AI-powered penetration testing

Integration with Strix

This model is designed to work with a modified version of Strix that supports configurable sub-agent models for specific vulnerability types

Note: The public version of Strix does not yet support custom sub-agent models. This integration is currently experimental.

Limitations

PoC Status: Not production-ready; requires additional testing and validation
Specialized Scope: Trained only on XSS detection, not general security testing
Small Dataset: 135 training examples limits generalization
Simulated Training: Performance on real-world targets may vary
Framework Dependency: Designed specifically for Strix; may not work well in isolation

Model Versions

Full Precision: This repository
GGUF Quantized: Available at kusonooyasumi/strix-xss-qwen3-4b-rl-gguf
- Q4_K_M (recommended for most users)
- Q5_K_M (better quality)
- Q8_0 (high quality)
- FP16 (full quality)

License

This model is released under the MIT License. See LICENSE file for details.

Citation

If you use this model in your research, please cite:

@misc{strix-xss-rl,
  author = {oyasumi},
  title = {Strix-XSS-4B-RL: An RL-Trained Model for XSS Detection},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/kusonooyasumi/strix-xss-qwen3-4b-rl}}
}