Demo not working

#2
by xezpeleta - opened

Hi! The microphone is working (I see the audio waveform), but no transcription is generated when I speak.

This is the error I see in the browser console:

/api/spaces/by-subdomain/mistralai-voxtral-mini-realtime:1  Failed to load resource: the server responded with a status of 400 ()

Could this be due to server overload?

Mistral AI_ org

Yes indeed, we deployed the server with vLLM and it's getting more requests than it can handle. We'll switch to local processing in the future when possible.

I get this error with my extensions enabled, on the latest Firefox version:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://de5282c3ca0c.edge.sdk.awswaf.com/de5282c3ca0c/526cf06acb0d/report. (Reason: CORS request did not succeed). Status code: (null).
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://de5282c3ca0c.edge.sdk.awswaf.com/de5282c3ca0c/526cf06acb0d/telemetry. (Reason: CORS request did not succeed). Status code: (null).

Without the extensions, it's still not working.
On another browser (Safari, latest version), it's not working either 😥

The safari console:
[Error] Failed to load resource: the server responded with a status of 400 () (mistralai-voxtral-mini-realtime, line 0)
Blocked a frame with origin "https://huggingface.co" from accessing a frame with origin "https://mistralai-voxtral-mini-realtime.hf.space". Protocols, domains, and ports must match.

[Screenshot: Safari console, 2026-02-05 at 10.32.58]
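For context on that last console error: the huggingface.co page and the embedded `mistralai-voxtral-mini-realtime.hf.space` iframe are different origins, so direct frame access is blocked by the same-origin policy; `window.postMessage` with explicit origin checks is the sanctioned channel. A minimal sketch of the receiving side (the `isTrustedOrigin` helper and the origin list are illustrative assumptions, not the demo's actual code):

```javascript
// Origins the embedded space is willing to accept messages from (assumed list).
const TRUSTED_ORIGINS = ["https://huggingface.co"];

// Validate a message's origin before acting on its payload.
function isTrustedOrigin(origin) {
  return TRUSTED_ORIGINS.includes(origin);
}

// In the embedded page (browser context):
//   window.addEventListener("message", (event) => {
//     if (!isTrustedOrigin(event.origin)) return; // drop messages from unknown frames
//     // handle event.data ...
//   });
//
// In the parent page, send with an explicit target origin:
//   iframe.contentWindow.postMessage(
//     { type: "ping" },
//     "https://mistralai-voxtral-mini-realtime.hf.space"
//   );
```

The error shown in the console is what the browser logs when a frame tries direct access instead.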


You should add VAD and debounce (ms) options for better latency control. I see the demo sends data every second even when I'm not speaking.
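A client-side gate along those lines could be sketched as follows (a hypothetical `makeSendGate` helper, not part of the demo's actual code): compute RMS energy per audio frame, only forward frames while energy exceeds a threshold, and keep the gate open for a debounce window so short pauses don't cut speech mid-sentence.

```javascript
// Hypothetical sketch of a VAD + debounce gate for a streaming demo.
// Frames with RMS energy above `threshold` open the gate; it stays open
// for `debounceMs` after the last loud frame so brief pauses aren't cut.
function makeSendGate({ threshold = 0.01, debounceMs = 300 } = {}) {
  let lastVoiceTime = -Infinity;
  return function shouldSend(frame, nowMs) {
    // Root-mean-square energy of a Float32Array audio frame.
    let sum = 0;
    for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
    const rms = Math.sqrt(sum / frame.length);
    if (rms >= threshold) lastVoiceTime = nowMs;
    return nowMs - lastVoiceTime <= debounceMs;
  };
}
```

Silent frames would then never reach the network, which should cut load on the backend as well.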

Mistral AI_ org

A WebGPU version will be coming once the implementation is done in transformers/transformers.js

In the meantime, we went back to the API.

Hi! I'm trying to use this realtime demo with audio that switches between English, Korean, and Japanese. It gets stuck on English and doesn't switch languages.

> Hi! I'm trying to use this realtime demo with audio that switches between English, Korean, and Japanese. It gets stuck on English and doesn't switch languages.

Agreed, I have tried talking in English, then Chinese. It stops transcribing when I switch to Chinese.

Mistral AI_ org

Right, we noticed the same.
Switching between Latin-script languages works, but not between languages with different alphabets. English and French work both independently and together, but English and Chinese only work when used independently.

Mistral AI_ org

To explain why the space failed: the vLLM endpoint could handle roughly 50 concurrent users, but the space had 80K users in a day.
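A rough back-of-envelope makes the gap concrete. Assuming an average session of 5 minutes (that figure is not stated in the thread, it's only an illustration):

```javascript
// Average concurrency implied by 80K daily users, under an
// assumed (not stated in the thread) 5-minute average session.
const usersPerDay = 80_000;
const sessionSeconds = 5 * 60;        // assumption
const secondsPerDay = 24 * 60 * 60;
const avgConcurrent = (usersPerDay * sessionSeconds) / secondsPerDay;
// avgConcurrent ≈ 278 average concurrent sessions, several times
// the ~50 concurrent users the vLLM endpoint could handle.
```

And that's the daily average; peak-hour concurrency would be higher still.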

There was no other way than to switch to the API.

When a transformers.js implementation is available, we'll switch back.

Any idea how feasible it is to get the switching to work across alphabets? Is it just down to what the model itself was trained on?

Mistral AI_ org
โ€ข
edited 26 days ago

> Any idea how feasible it is to get the switching to work across alphabets? Is it just down to what the model itself was trained on?

I would fine-tune the model to achieve that. I'm not on the science team, but my guess is that it's the attention in the text decoder, since the languages work on their own but not when mixed in the same sentence.

Mistral AI_ org

Since this seems resolved, I'll be closing this!

pandora-s changed discussion status to closed
