🚀 Introducing FINAL-Bench Quantum — an open, neutral benchmark that finally puts quantum-computing methods on one fair yardstick.
Quantum results are notoriously hard to compare. The same "logical error rate" or "query fidelity" means very different things depending on the code, noise model, hardware, and shot count. FINAL-Bench Quantum fixes that: five events judged under identical, published protocols, where every number is labeled as either measured here or quoted from a source.
Five events: ① QEC Decoder ② Optimization (Max-Cut) ③ VQE ④ QRAM ⑤ Quantum Simulation
The rules are simple and strict:
- ✅ Track A (measured here, with 95% confidence intervals) is kept separate from Track B (quoted from papers, not directly comparable).
- 🔬 Simulation and real hardware are clearly distinguished, and no quantum-advantage claims are made.
- 🌍 Methods from Google, IBM, NVIDIA, USTC, Riverlane and more sit side by side, with origin flags and author credits.
- 📤 Anyone can submit their own method via the Submit tab for review and listing.
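Track A numbers carry 95% confidence intervals. The post does not say which interval the benchmark uses. As a minimal sketch, assuming each shot is an independent pass/fail trial, a Wilson score interval for a measured error rate could be computed like this. The function name and the example counts are illustrative and are not taken from the leaderboard.

```python
import math


def wilson_interval(failures: int, shots: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if shots <= 0 or not 0 <= failures <= shots:
        raise ValueError("need shots > 0 and 0 <= failures <= shots")
    p_hat = failures / shots
    z2 = z * z
    denom = 1.0 + z2 / shots
    center = (p_hat + z2 / (2 * shots)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / shots + z2 / (4 * shots * shots)) / denom
    # Clamp to [0, 1] to absorb floating-point round-off at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Illustrative counts only: 50 logical failures observed in 1000 shots.
low, high = wilson_interval(failures=50, shots=1000)
print(f"error rate 0.050, 95% CI [{low:.4f}, {high:.4f}]")
```

The Wilson interval is chosen here because it stays sensible when very few failures are observed, which is the common case for low logical error rates. A plain normal approximation collapses to zero width there.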
Already on the board: real IBM Heron r2 measurements (repetition-code distance boundary, 29–175× error reduction from d3 to d5), a real-chip QRAM query fidelity of 0.92, and H₂ VQE at chemical accuracy — always labeled honestly as simulation vs hardware.
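An error-reduction factor such as the d3 to d5 figure above is a ratio of two measured error rates. The post does not say how its range was obtained. The sketch below shows the plain ratio, plus one conservative way to turn two confidence intervals into bounds on the ratio. All numbers and function names are hypothetical.

```python
def reduction_factor(p_d3: float, p_d5: float) -> float:
    """How many times smaller the error rate is at distance 5 than at distance 3."""
    if p_d3 <= 0 or p_d5 <= 0:
        raise ValueError("error rates must be positive")
    return p_d3 / p_d5


def reduction_bounds(ci_d3: tuple[float, float], ci_d5: tuple[float, float]) -> tuple[float, float]:
    """Conservative bounds on the ratio from the two intervals' extremes."""
    (low3, high3), (low5, high5) = ci_d3, ci_d5
    if min(low3, low5) <= 0:
        raise ValueError("interval lower bounds must be positive")
    # Smallest ratio: lowest d3 rate over highest d5 rate, and vice versa.
    return low3 / high5, high3 / low5


# Hypothetical rates, not leaderboard data.
factor = reduction_factor(p_d3=0.02, p_d5=0.0004)
bounds = reduction_bounds(ci_d3=(0.018, 0.022), ci_d5=(0.0003, 0.0005))
print(f"reduction ~{factor:.0f}x, conservative range {bounds[0]:.0f}x to {bounds[1]:.0f}x")
```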
A leaderboard is only useful if you can trust it, so neutrality is the whole point: strong competitors stay in even when they beat the host, sources are quoted faithfully, and a simulation is never rounded up into a hardware claim.
🧬 Darwin Family: Zero Gradient Steps, GPQA Diamond 88.89%
How far can we push LLM reasoning *without* training?
Our team at VIDRAFT submitted this paper to Daily Papers yesterday, and it's currently #3. Huge thanks to everyone who upvoted — sharing the core ideas below.
Darwin Family is a training-free evolutionary merging framework. By recombining the weight spaces of existing LLM checkpoints — with zero gradient-based training — it reaches frontier-level reasoning.
- 🏆 Darwin-28B-Opus: GPQA Diamond 88.89%
- 💸 Zero gradient steps: not a single B200 or H200 hour needed
- 🧬 Consistent gains across 4B → 35B scale
- 🔀 Cross-architecture breeding between Transformer and Mamba families
- 🔁 Stable recursive multi-generation evolution
# Three Core Mechanisms
① 14-dim Adaptive Merge Genome: fine-grained recombination at both the component level (Attention / FFN / MLP / LayerNorm / Embedding) and the block level, which expands the search space used by prior evolutionary-merge methods.
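To make the idea of a merge genome concrete, here is a minimal sketch of component-level plus block-level recombination of two checkpoints. It is not the paper's 14-dimensional layout, which the post does not spell out. The parameter names, the gene dictionary, and the rule of averaging the component gene with the block gene are all assumptions made for illustration. Weights are plain Python lists so the example runs without any ML library.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def mix_ratio(name: str, component_genes: Dict[str, float], block_genes: Sequence[float]) -> float:
    """Share of parent B used for one parameter (0 = all A, 1 = all B)."""
    # Component gene: first component key found in the name, else an even split.
    ratio = next((g for key, g in component_genes.items() if key in name), 0.5)
    parts = name.split(".")
    # Block gene: only for names shaped like "blocks.<index>...." (hypothetical naming).
    if len(parts) > 1 and parts[0] == "blocks" and parts[1].isdigit():
        ratio = 0.5 * (ratio + block_genes[int(parts[1])])
    return ratio


def merge(parent_a: Weights, parent_b: Weights,
          component_genes: Dict[str, float], block_genes: Sequence[float]) -> Weights:
    """Linear interpolation of two checkpoints, with one ratio per parameter."""
    child: Weights = {}
    for name, values_a in parent_a.items():
        r = mix_ratio(name, component_genes, block_genes)
        child[name] = [(1 - r) * a + r * b for a, b in zip(values_a, parent_b[name])]
    return child


# Toy parents: A is all zeros and B is all ones, so each merged value equals its ratio.
names = ["embedding.weight", "blocks.0.attention.weight", "blocks.1.ffn.weight"]
parent_a = {n: [0.0, 0.0] for n in names}
parent_b = {n: [1.0, 1.0] for n in names}
component_genes = {"attention": 0.2, "ffn": 0.8, "embedding": 0.0}
block_genes = [0.4, 0.6]
child = merge(parent_a, parent_b, component_genes, block_genes)
print({name: values[0] for name, values in child.items()})
```

An evolutionary search would then treat the numbers in `component_genes` and `block_genes` as the genome and score each merged child on a benchmark, with no gradient step involved.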
② MRI-Trust Fusion — we diagnose each layer's reasoning contribution via an **MRI (Model Reasoning Importance)** signal and fuse it with evolutionary search through a **learnable trust parameter**. Trust the diagnostic too much and search collapses; ignore it and search becomes inefficient — Darwin learns the balance from data.
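One way to picture the trust trade-off is a convex blend of a per-layer diagnostic prior with freely evolved genes, where the trust weight is mutated together with the genes. This is only a sketch under that assumption; the post does not give the actual fusion rule or how trust is learned. The function names, the toy fitness, and the simple accept-if-not-worse search loop are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def clip01(x: float) -> float:
    return min(1.0, max(0.0, x))


def fuse(mri_prior: Sequence[float], evolved: Sequence[float], trust: float) -> List[float]:
    """Per-layer merge ratios: trust = 1 follows the diagnostic, trust = 0 ignores it."""
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    return [trust * m + (1.0 - trust) * e for m, e in zip(mri_prior, evolved)]


def evolve(mri_prior: Sequence[float], fitness: Callable[[List[float]], float],
           steps: int = 200, sigma: float = 0.1, seed: int = 0) -> Tuple[List[float], float, float]:
    """Gradient-free search: mutate genes and trust, keep the candidate if it is not worse."""
    rng = random.Random(seed)
    genes, trust = [0.5] * len(mri_prior), 0.5
    best = fitness(fuse(mri_prior, genes, trust))
    for _ in range(steps):
        cand_genes = [clip01(g + rng.gauss(0.0, sigma)) for g in genes]
        cand_trust = clip01(trust + rng.gauss(0.0, sigma))
        score = fitness(fuse(mri_prior, cand_genes, cand_trust))
        if score >= best:
            genes, trust, best = cand_genes, cand_trust, score
    return genes, trust, best


# Toy setup: the diagnostic prior is close to, but not exactly, the ideal ratios.
mri_prior = [0.8, 0.2, 0.5]
hidden_target = [0.9, 0.1, 0.5]


def toy_fitness(ratios: List[float]) -> float:
    return -sum((r - t) ** 2 for r, t in zip(ratios, hidden_target))


genes, trust, best = evolve(mri_prior, toy_fitness)
print(f"trust={trust:.2f}, fitness={best:.4f}")
```

In a real pipeline the fitness would be a benchmark score of the merged model rather than a distance to a known target, which is what makes the search expensive and a good prior valuable.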
③ Architecture Mapper — weight-space breeding across heterogeneous families. Attention × SSM crossover actually works.
# Why It Matters
> Diagnose latent capabilities already encoded in open checkpoints,
> and recombine them, no gradients required.