How to use aufklarer/PersonaPlex-7B-MLX-8bit with MLX:
# Download the model from the Hub
# (the quotes keep zsh, the default macOS shell, from expanding the brackets)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir PersonaPlex-7B-MLX-8bit aufklarer/PersonaPlex-7B-MLX-8bit
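The same download can also be scripted from Python. The sketch below relies on `snapshot_download` from `huggingface_hub`; the helper names `download_kwargs` and `download_model` are hypothetical and only mirror the `--local-dir` layout of the CLI command above.

```python
def download_kwargs(repo_id: str = "aufklarer/PersonaPlex-7B-MLX-8bit") -> dict:
    """Build arguments mirroring the CLI call: local dir named after the repo."""
    return {"repo_id": repo_id, "local_dir": repo_id.split("/")[-1]}


def download_model(repo_id: str = "aufklarer/PersonaPlex-7B-MLX-8bit") -> str:
    """Download every file in the repo and return the local directory path."""
    # Imported lazily so download_kwargs works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(repo_id))
```

Calling `download_model()` fetches the 8-bit weights (~9.1 GB) into `./PersonaPlex-7B-MLX-8bit`; pass the 4-bit model ID to fetch the smaller variant instead.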
How to use aufklarer/PersonaPlex-7B-MLX-8bit with Moshi:
# pip install moshi
# Run the interactive web server
python -m moshi.server --hf-repo "aufklarer/PersonaPlex-7B-MLX-8bit"
# Then open https://localhost:8998 in your browser
# pip install moshi
import torch
from moshi.models import loaders

# Load checkpoint info from HuggingFace
checkpoint = loaders.CheckpointInfo.from_hf_repo("aufklarer/PersonaPlex-7B-MLX-8bit")

# Load the Mimi audio codec
mimi = checkpoint.get_mimi(device="cuda")
mimi.set_num_codebooks(8)

# Encode audio (24kHz, mono)
wav = torch.randn(1, 1, 24000 * 10)  # [batch, channels, samples]
with torch.no_grad():
    codes = mimi.encode(wav.cuda())
    decoded = mimi.decode(codes)

PersonaPlex 7B full-duplex speech-to-speech model converted to MLX safetensors with 8-bit quantization for Apple Silicon.
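To make the tensor shapes in the Mimi encode example concrete, here is a small arithmetic sketch. It assumes Mimi's 12.5 Hz frame rate, which comes from Kyutai's description of the codec and is not stated in this card; the function name `expected_code_shape` is hypothetical.

```python
SAMPLE_RATE_HZ = 24_000   # from the encode example: 24 kHz mono input
FRAME_RATE_HZ = 12.5      # assumption: Mimi's published frame rate
NUM_CODEBOOKS = 8         # matches mimi.set_num_codebooks(8)


def expected_code_shape(batch: int, num_samples: int) -> tuple[int, int, int]:
    """Return the [batch, codebooks, frames] shape expected from mimi.encode."""
    samples_per_frame = int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)  # 1920 samples
    # Assumption: the input length is a whole number of frames.
    num_frames = num_samples // samples_per_frame
    return (batch, NUM_CODEBOOKS, num_frames)


print(expected_code_shape(1, 24_000 * 10))  # (1, 8, 125)
```

Under these assumptions, the 10-second random waveform in the example maps to 125 frames of 8 codes each.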
Converted from nvidia/personaplex-7b-v1 (based on Kyutai Moshi architecture).
Swift inference: soniqo/speech-swift
| Component | Architecture | Size |
|---|---|---|
| Temporal Transformer | 32-layer, 4096d, 32 heads (7B params) | ~6.5 GB (8-bit) |
| Depformer | 6-layer, 1024d, 16 heads, per-codebook weights | ~1.3 GB (8-bit) |
| Mimi Codec | SEANet encoder/decoder + 8L transformer + 16 RVQ codebooks | ~370 MB (fp16) |
| Embeddings | Text + 16 audio embeddings + output heads | ~940 MB (fp16) |
| Total | | ~9.1 GB |
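As a quick sanity check, the total can be reproduced by summing the approximate per-component sizes from the table. The values below are copied from the table and converted to GB; nothing here is measured.

```python
# Approximate on-disk sizes in GB, copied from the component table
component_sizes_gb = {
    "Temporal Transformer (8-bit)": 6.5,
    "Depformer (8-bit)": 1.3,
    "Mimi Codec (fp16)": 0.37,
    "Embeddings (fp16)": 0.94,
}

total_gb = sum(component_sizes_gb.values())
print(f"Total: ~{total_gb:.1f} GB")  # Total: ~9.1 GB
```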
let model = try await PersonaPlexModel.fromPretrained(
    modelId: "aufklarer/PersonaPlex-7B-MLX-8bit"
)
let response = model.respond(audio: samples, voice: .NATF0, steps: 100)

audio personaplex input.wav --model aufklarer/PersonaPlex-7B-MLX-8bit -o output.wav
| Variant | Quantization | Size | Model ID |
|---|---|---|---|
| 4-bit | 4-bit | ~4.9 GB | aufklarer/PersonaPlex-7B-MLX-4bit |
| 8-bit | 8-bit | ~9.1 GB | aufklarer/PersonaPlex-7B-MLX-8bit |
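One way to choose between the two variants is by memory budget. The following is a minimal sketch, assuming the sizes from the table approximate the memory needed for weights; the `pick_variant` helper and the 2 GB headroom default are illustrative assumptions, not measured requirements.

```python
from typing import Optional

# (model ID, approximate size in GB) from the variants table, largest first
VARIANTS = [
    ("aufklarer/PersonaPlex-7B-MLX-8bit", 9.1),
    ("aufklarer/PersonaPlex-7B-MLX-4bit", 4.9),
]


def pick_variant(memory_budget_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest variant whose weights plus headroom fit the budget."""
    for model_id, size_gb in VARIANTS:
        if size_gb + headroom_gb <= memory_budget_gb:
            return model_id
    return None  # neither variant fits


print(pick_variant(16.0))  # aufklarer/PersonaPlex-7B-MLX-8bit
print(pick_variant(8.0))   # aufklarer/PersonaPlex-7B-MLX-4bit
```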
18 voice presets available: NATF0-3, NATM0-3, VARF0-4, VARM0-4
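The preset ranges above can be expanded into individual names, which also confirms the count of 18. This is a small sketch; `expand_presets` is a hypothetical helper, and it assumes each range is inclusive at both ends.

```python
import re


def expand_presets(spec: str) -> list[str]:
    """Expand a range like 'NATF0-3' into ['NATF0', 'NATF1', 'NATF2', 'NATF3']."""
    match = re.fullmatch(r"([A-Z]+)(\d+)-(\d+)", spec.strip())
    if match is None:
        raise ValueError(f"Unrecognized preset range: {spec!r}")
    prefix, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return [f"{prefix}{i}" for i in range(start, end + 1)]


voices = [
    name
    for spec in "NATF0-3, NATM0-3, VARF0-4, VARM0-4".split(",")
    for name in expand_presets(spec)
]
print(len(voices))  # 18
```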