TTS AGI

Team

community

Verified

https://ttsarena.org

TTS-AGI


AI & ML interests

Decentralized yet unified efforts to accelerate research on open text-to-speech (TTS) systems!

Recent Activity


sanchit-gandhi

authored a paper 2 days ago

Voxtral Realtime

Paper • 2602.11298 • Published 15 days ago • 16

gijs

updated a dataset 3 days ago

TTS-AGI/DACVAE-latents

Viewer • Updated 3 days ago • 28M • 372

gijs

published a dataset 3 days ago

TTS-AGI/DACVAE-latents

Viewer • Updated 3 days ago • 28M • 372

mrfakename

published a model 7 days ago

TTS-AGI/ACE-Step-1.5-Backup

Updated 29 days ago

mrfakename

in TTS-AGI/TTS-Arena-V2 14 days ago

[Model Addition Request] Lightning v3.1 - 44kHz High-Fidelity TTS with Ultra-Low Latency

#111 opened 14 days ago by stalwartcoder

pcuenq

posted an update about 2 months ago


👉 What happened in AI in 2025? 👈

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1 — Learning to Reason
DeepSeek not only releases a top-notch reasoning model, but also shows how to train such models and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2 — Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3 — "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models reach gold-medal level in math olympiads and top hard benchmarks. OpenAI releases strong open-source models, and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4 — Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT-5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯

Credits
🙏 NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫡 @reach-vb for the original idea, design and recipe

🙌 @ariG23498 and yours truly for compiling and verifying the 2025 edition

🥳 Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! 🥂


mrfakename

posted an update 3 months ago


Excited to share that I've joined the Hugging Face Fellows program! 🤗

Looking forward to contributing to, and working more closely with, the open-source ecosystem. Huge thanks to everyone who's supported me on this journey! 🚀

reach-vb

authored a paper 3 months ago

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation

Paper • 2510.06961 • Published Oct 8, 2025 • 11

Steveeeeeeen

authored 2 papers 3 months ago

Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement

Paper • 2510.23141 • Published Oct 27, 2025 • 5

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation

Paper • 2510.06961 • Published Oct 8, 2025 • 11

mrfakename

posted an update 4 months ago


Trained an emotion-controllable TTS model based on MiMo audio, using LAION's dataset.

It's still very early in the training run and the model does hallucinate at times, but the results look pretty good so far.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: https://huggingface.co/spaces/mrfakename/EmoAct-MiMo

(Turn 🔊 on to hear audio samples)


multimodalart

posted an update 4 months ago


Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert an entire HF repo (Model, Dataset or Space) into a single text file and feed it to a language model!

multimodalart/repo2txt