5 20

Jojun84

Jojun8403

AI & ML interests

None yet

Recent Activity

liked a model 8 days ago

FINAL-Bench/Darwin-398B-JGOS

reacted to SeaWolf-AI's post with 🔥 9 days ago

🚀 Introducing FINAL-Bench Quantum — an open, neutral benchmark that finally puts quantum-computing methods on one fair yardstick. Quantum results are notoriously hard to compare. The same "logical error rate" or "query fidelity" means very different things depending on the code, noise model, hardware, and shot count. FINAL-Bench Quantum fixes that: five events judged under identical, published protocols, where every number is labeled as either measured here or quoted from a source. Five events: ① QEC Decoder ② Optimization (Max-Cut) ③ VQE ④ QRAM ⑤ Quantum Simulation The rules are simple and strict: ✅ Track A (measured here, with 95% confidence intervals) is kept separate from Track B (quoted from papers, not directly comparable). 🔬 Simulation and real hardware are clearly distinguished, and no quantum-advantage claims are made. 🌍 Methods from Google, IBM, NVIDIA, USTC, Riverlane and more sit side by side, with origin flags and author credits. 📤 Anyone can submit their own method via the Submit tab for review and listing. Already on the board: real IBM Heron r2 measurements (repetition-code distance boundary, 29–175× error reduction from d3 to d5), a real-chip QRAM query fidelity of 0.92, and H₂ VQE at chemical accuracy — always labeled honestly as simulation vs hardware. A leaderboard is only useful if you can trust it, so neutrality is the whole point: strong competitors stay in even when they beat the host, sources are quoted faithfully, and a simulation is never rounded up into a hardware claim. Leaderboard: https://huggingface.co/spaces/FINAL-Bench/quantum-bench-leaderboard Article: https://huggingface.co/blog/FINAL-Bench/quantum-leaderboard #quantum #QEC #QuantumComputing #benchmark

upvoted an article 9 days ago

FINAL-Bench Quantum: An Open, Neutral Benchmark for Quantum-Computing Methods

View all activity

Organizations

None yet

liked a model 8 days ago

FINAL-Bench/Darwin-398B-JGOS

Text Generation • 403B • Updated 6 days ago • 342 • 28

reacted to SeaWolf-AI's post with 🔥 9 days ago

Post

6759

🚀 Introducing FINAL-Bench Quantum — an open, neutral benchmark that finally puts quantum-computing methods on one fair yardstick.

Quantum results are notoriously hard to compare. The same "logical error rate" or "query fidelity" means very different things depending on the code, noise model, hardware, and shot count. FINAL-Bench Quantum fixes that: five events judged under identical, published protocols, where every number is labeled as either measured here or quoted from a source.

Five events: ① QEC Decoder ② Optimization (Max-Cut) ③ VQE ④ QRAM ⑤ Quantum Simulation

The rules are simple and strict:
✅ Track A (measured here, with 95% confidence intervals) is kept separate from Track B (quoted from papers, not directly comparable).
🔬 Simulation and real hardware are clearly distinguished, and no quantum-advantage claims are made.
🌍 Methods from Google, IBM, NVIDIA, USTC, Riverlane and more sit side by side, with origin flags and author credits.
📤 Anyone can submit their own method via the Submit tab for review and listing.

Already on the board: real IBM Heron r2 measurements (repetition-code distance boundary, 29–175× error reduction from d3 to d5), a real-chip QRAM query fidelity of 0.92, and H₂ VQE at chemical accuracy — always labeled honestly as simulation vs hardware.

A leaderboard is only useful if you can trust it, so neutrality is the whole point: strong competitors stay in even when they beat the host, sources are quoted faithfully, and a simulation is never rounded up into a hardware claim.

Leaderboard: FINAL-Bench/quantum-bench-leaderboard
Article: https://huggingface.co/blog/FINAL-Bench/quantum-leaderboard

#quantum #QEC #QuantumComputing #benchmark

2 replies

upvoted an article 9 days ago

Article

FINAL-Bench Quantum: An Open, Neutral Benchmark for Quantum-Computing Methods

FINAL-Bench

•

9 days ago

• 17

liked a Space 9 days ago

FINAL-Bench Quantum Leaderboard

⚛

Neutral quantum-method benchmark — QEC decoders & more

liked a model 26 days ago

FINAL-Bench/Darwin-60B-DUO

Text Generation • Updated 20 days ago • 701 • 33

liked a model about 1 month ago

FINAL-Bench/Darwin-28B-REASON

Text Generation • 27B • Updated 10 days ago • 949 • 36

upvoted a paper about 1 month ago

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

Paper • 2605.14386 • Published May 14 • 62

upvoted an article about 1 month ago

Article

Training-Free Reasoning at 88.89% on GPQA Diamond: How Darwin Family Hit Frontier Scores Without a Single Gradient Step

FINAL-Bench

•

May 15

• 18

reacted to SeaWolf-AI's post with 🔥 about 1 month ago

Post

5430

🧬 Darwin Family: Zero Gradient Steps, GPQA Diamond 88.89%

How far can we push LLM reasoning *without* training?

Our team at VIDRAFT submitted this paper to Daily Papers yesterday, and it's
currently #3. Huge thanks to everyone who upvoted — sharing the core ideas below.

🔗 Paper: Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning (2605.14386)
🔗 arXiv: https://arxiv.org/abs/2605.14386
🔗 Model: FINAL-Bench/Darwin-28B-REASON
🔗 Model: FINAL-Bench/Darwin-28B-Opus

---

TL;DR

Darwin Family is a training-free evolutionary merging framework.
By recombining the weight spaces of existing LLM checkpoints — with zero
gradient-based training — it reaches frontier-level reasoning.

- 🏆 Darwin-28B-Opus: GPQA Diamond 88.89%
- 💸 Zero gradient steps — not a single B200 or H200 hour needed
- 🧬 Consistent gains across 4B → 35B scale
- 🔀 Cross-architecture breeding between Transformer and Mamba families
- 🔁 Stable recursive multi-generation evolution

#Three Core Mechanisms

① 14-dim Adaptive Merge Genome — fine-grained recombination at both
component level (Attention / FFN / MLP / LayerNorm / Embedding) and block
level, expanding the prior evolutionary-merge search space.

② MRI-Trust Fusion — we diagnose each layer's reasoning contribution
via an **MRI (Model Reasoning Importance)** signal and fuse it with
evolutionary search through a **learnable trust parameter**. Trust the
diagnostic too much and search collapses; ignore it and search becomes
inefficient — Darwin learns the balance from data.

③ Architecture Mapper — weight-space breeding across heterogeneous
families. Attention × SSM crossover actually works.

Why It Matters
> Diagnose latent capabilities already encoded in open checkpoints,
> and recombine them — no gradients required.

Replies and critiques welcome 🙌