Demo not working
Hi! The microphone is working (I see the audio wave), but no transcription is generated when I speak.
This is the error I see in the browser console:
/api/spaces/by-subdomain/mistralai-voxtral-mini-realtime:1 Failed to load resource: the server responded with a status of 400 ()
Could this be due to server overload?
Yes indeed, we deployed the server with vLLM and it's getting more requests than it can handle. We'll switch to local processing in the future when possible.
I have this error, with my extensions + latest firefox version:
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://de5282c3ca0c.edge.sdk.awswaf.com/de5282c3ca0c/526cf06acb0d/report. (Reason: CORS request did not succeed). Status code: (null).
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://de5282c3ca0c.edge.sdk.awswaf.com/de5282c3ca0c/526cf06acb0d/telemetry. (Reason: CORS request did not succeed). Status code: (null).
Without the extensions, it's still not working.
On another browser (Safari, latest version), it's not working either.
The safari console:
[Error] Failed to load resource: the server responded with a status of 400 () (mistralai-voxtral-mini-realtime, line 0)
Blocked a frame with origin "https://huggingface.co" from accessing a frame with origin "https://mistralai-voxtral-mini-realtime.hf.space". Protocols, domains, and ports must match.
You should add VAD and debounce (ms) options to reduce unnecessary traffic. I see the demo send data every second even when I'm not speaking.
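For reference, a minimal sketch of what that client-side gating could look like, assuming a simple energy-based VAD; the threshold, timings, and the `sendChunk` callback are hypothetical, not part of the demo:

```javascript
// Hypothetical sketch: gate outgoing audio chunks with an RMS-energy VAD
// plus a debounce window, so silence is not streamed to the server.
const VAD_THRESHOLD = 0.01; // RMS level treated as "speech" (assumed value)
const DEBOUNCE_MS = 500;    // keep sending this long after speech stops

let lastSpeechAt = 0;

// Root-mean-square energy of a chunk of float samples in [-1, 1].
function rms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Called once per audio chunk; forwards the chunk only while speaking
// or within the debounce window after speech ends.
function maybeSend(samples, sendChunk, now = Date.now()) {
  if (rms(samples) > VAD_THRESHOLD) lastSpeechAt = now;
  if (now - lastSpeechAt <= DEBOUNCE_MS) sendChunk(samples);
}
```

With this kind of gate, chunks recorded during long silences are simply dropped instead of being posted to the endpoint every second.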
A WebGPU version will be coming once the implementation is done in transformers/transformers.js.
In the meantime, we went back to the API.
Hi! I'm trying to use this realtime demo with audio that switches between English, Korean, and Japanese. It gets stuck on English and doesn't switch languages.
Agreed. I tried talking in English, then Chinese; it stops transcribing when I switch to Chinese.
Right, I noticed the same.
Switching between Latin-alphabet languages works, but not between languages with different alphabets. English and French work both independently and together, but English and Chinese only work when used independently.
To explain why the Space failed: the vLLM endpoint could handle roughly 50 concurrent users, but the Space had 80K users in a day.
There was no option other than switching to the API.
When a transformers.js implementation is available, we'll switch back.
Any idea how feasible it is to get the switching to work across alphabets? Is it just down to what the model itself was trained on?
I would fine-tune the model to achieve that. I'm not on the science team, but my guess is that it's the attention in the text decoder, since the languages work alone but not when mixed in the same sentence.
Since this seems resolved, I'll be closing this!
