Demo not working
Hi! The microphone is working (I see the audio wave), but no transcription is generated when I speak.
This is the error I see in the browser console:
/api/spaces/by-subdomain/mistralai-voxtral-mini-realtime:1 Failed to load resource: the server responded with a status of 400 ()
Could this be due to server overload?
Yes indeed, we deployed the server with vLLM and it's getting more requests than it can handle. We'll switch to local processing in the future when possible.
I have this error, with my extensions + latest firefox version:
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://de5282c3ca0c.edge.sdk.awswaf.com/de5282c3ca0c/526cf06acb0d/report. (Reason: CORS request did not succeed). Status code: (null).
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://de5282c3ca0c.edge.sdk.awswaf.com/de5282c3ca0c/526cf06acb0d/telemetry. (Reason: CORS request did not succeed). Status code: (null).
Without the extensions, it's still not working.
On another browser (Safari, latest version), it's not working either.
The safari console:
[Error] Failed to load resource: the server responded with a status of 400 () (mistralai-voxtral-mini-realtime, line 0)
Blocked a frame with origin "https://huggingface.co" from accessing a frame with origin "https://mistralai-voxtral-mini-realtime.hf.space". Protocols, domains, and ports must match.
You should add VAD and debounce (ms) options to reduce unnecessary traffic. I see the demo send data every second even when I'm not speaking.
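For reference, a minimal sketch of what that client-side gating could look like, assuming a simple energy-based VAD; the threshold, timings, and the `sendChunk` callback are hypothetical, not part of the demo:

```javascript
// Hypothetical sketch: gate outgoing audio chunks with an RMS-energy VAD
// plus a debounce window, so silence is not streamed to the server.
const VAD_THRESHOLD = 0.01; // RMS level treated as "speech" (assumed value)
const DEBOUNCE_MS = 500;    // keep sending this long after speech stops

let lastSpeechAt = 0;

// Root-mean-square energy of a chunk of float samples in [-1, 1].
function rms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Called once per audio chunk; forwards the chunk only while speaking
// or within the debounce window after speech ends.
function maybeSend(samples, sendChunk, now = Date.now()) {
  if (rms(samples) > VAD_THRESHOLD) lastSpeechAt = now;
  if (now - lastSpeechAt <= DEBOUNCE_MS) sendChunk(samples);
}
```

With this kind of gate, chunks recorded during long silences are simply dropped instead of being posted to the endpoint every second.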
A WebGPU version will be coming once the implementation is done in transformers/transformers.js.
In the meantime, we went back to the API.
Hi! I'm trying to use this realtime demo with audio that switches between English, Korean, and Japanese. It gets stuck on English and doesn't switch languages.
Agreed. I tried talking in English, then Chinese; it stops transcribing when I switch to Chinese.
Right, I noticed the same.
Switching between Latin-alphabet languages works, but not between languages with different alphabets. English and French work both independently and together, but English and Chinese only work when used independently.
To explain why the Space failed: the vLLM endpoint could handle roughly 50 concurrent users, but the Space had 80K users in a day.
There was no option other than switching to the API.
When a transformers.js implementation is available, we'll switch back.
Any idea how feasible it is to get the switching to work across alphabets? Is it just down to what the model itself was trained on?
I would fine-tune the model to achieve that. I'm not on the science team, but my guess is that it's the attention in the text decoder, since the languages work alone but not when mixed in the same sentence.
Since this seems resolved, I'll be closing this!
