PersonaPlex-7B MLX 8-bit

PersonaPlex 7B full-duplex speech-to-speech model converted to MLX safetensors with 8-bit quantization for Apple Silicon.

Converted from nvidia/personaplex-7b-v1 (based on Kyutai Moshi architecture).

Swift inference: soniqo/speech-swift

Model Details

Component	Architecture	Size
Temporal Transformer	32-layer, 4096d, 32 heads (7B params)	~6.5 GB (8-bit)
Depformer	6-layer, 1024d, 16 heads, per-codebook weights	~1.3 GB (8-bit)
Mimi Codec	SEANet encoder/decoder + 8L transformer + 16 RVQ codebooks	~370 MB (fp16)
Embeddings	Text + 16 audio embeddings + output heads	~940 MB (fp16)
Total		~9.1 GB
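
As a quick sanity check on the table, the component sizes can be summed to reproduce the Total row. This is a minimal sketch: the numbers are copied from the table above, and the variable names are illustrative rather than part of any library.

```swift
// Component sizes in GB, copied from the Model Details table.
let componentSizesGB: [(name: String, sizeGB: Double)] = [
    ("Temporal Transformer", 6.5),
    ("Depformer", 1.3),
    ("Mimi Codec", 0.37),
    ("Embeddings", 0.94),
]

// Sum the components to reproduce the "Total" row.
let totalGB = componentSizesGB.reduce(0.0) { $0 + $1.sizeGB }

// Print each component with its approximate share of the total.
for component in componentSizesGB {
    let share = Int((component.sizeGB / totalGB * 100).rounded())
    print("\(component.name): \(component.sizeGB) GB (about \(share)%)")
}

// Round to two decimals to hide floating-point noise in the sum.
print("Total: \((totalGB * 100).rounded() / 100) GB")
```

The quantized Temporal Transformer accounts for most of the footprint, which is why the choice of quantization level dominates the overall size.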

Usage

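Swift:

// Load the 8-bit model by its model ID.
let model = try await PersonaPlexModel.fromPretrained(
    modelId: "aufklarer/PersonaPlex-7B-MLX-8bit"
)
// Generate a response to the input audio samples with the NATF0 voice preset.
let response = model.respond(audio: samples, voice: .NATF0, steps: 100)

Command line:

audio personaplex input.wav --model aufklarer/PersonaPlex-7B-MLX-8bit -o output.wav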
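
To get a feel for what `steps: 100` might mean in terms of audio length, the sketch below converts a step count into seconds and samples. It rests on assumptions that this model card does not state: that one generation step yields one Mimi frame, that Mimi runs at 12.5 frames per second, and that audio is sampled at 24 kHz. The helper names are hypothetical and not part of speech-swift.

```swift
// Assumptions (not confirmed by this model card): one generation step
// produces one Mimi frame, at 12.5 frames per second, with 24 kHz audio.
let frameRateHz = 12.5
let sampleRateHz = 24_000.0

/// Approximate audio duration covered by `steps` generation steps.
func durationSeconds(steps: Int) -> Double {
    Double(steps) / frameRateHz
}

/// Approximate number of audio samples covered by `steps` generation steps.
func sampleCount(steps: Int) -> Int {
    Int(Double(steps) * sampleRateHz / frameRateHz)
}

print(durationSeconds(steps: 100))
print(sampleCount(steps: 100))
```

If these assumptions hold, raising `steps` lengthens the response proportionally; check the speech-swift documentation for the actual semantics.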

Variants

Variant	Quantization	Size	Model ID
4-bit	4-bit	~4.9 GB	aufklarer/PersonaPlex-7B-MLX-4bit
8-bit	8-bit	~9.1 GB	aufklarer/PersonaPlex-7B-MLX-8bit
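
One way to choose between the two variants is to compare their sizes against the machine's memory. The following is a rough sketch, not part of speech-swift: `Variant`, `pickVariant`, and the 4 GB headroom default are hypothetical, and the real memory needed at inference time may differ from the weight sizes listed in the table.

```swift
import Foundation

// Hypothetical description of a variant; sizes come from the Variants table.
struct Variant {
    let modelId: String
    let sizeGB: Double
}

let variants = [
    Variant(modelId: "aufklarer/PersonaPlex-7B-MLX-8bit", sizeGB: 9.1),
    Variant(modelId: "aufklarer/PersonaPlex-7B-MLX-4bit", sizeGB: 4.9),
]

/// Returns the largest variant whose weights plus an assumed headroom
/// fit in `memoryGB`, or nil if neither fits.
func pickVariant(memoryGB: Double, headroomGB: Double = 4.0) -> Variant? {
    variants
        .filter { $0.sizeGB + headroomGB <= memoryGB }
        .max { $0.sizeGB < $1.sizeGB }
}

// Physical memory of the current machine, converted from bytes to GiB.
let physicalGB = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824

if let choice = pickVariant(memoryGB: physicalGB) {
    print("Suggested variant: \(choice.modelId)")
} else {
    print("Neither variant fits with the assumed headroom")
}
```

The headroom value is a placeholder for the operating system, activations, and caches; tune it to the workload rather than treating it as a recommendation.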

Voices

18 voice presets available: NATF0-3, NATM0-3, VARF0-4, VARM0-4
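
The range notation above expands to 18 individual names, which the short sketch below enumerates. Only `.NATF0` appears in the usage example, so treating the remaining names as following the same prefix-plus-index pattern is an assumption; the strings here are for illustration and are not the library's own type.

```swift
// Prefix and highest index for each preset family, from the list above.
let families: [(prefix: String, lastIndex: Int)] = [
    ("NATF", 3), ("NATM", 3), ("VARF", 4), ("VARM", 4),
]

// Expand each range into individual preset names, such as "NATF0".
let presetNames = families.flatMap { family in
    (0...family.lastIndex).map { "\(family.prefix)\($0)" }
}

print(presetNames.count)
print(presetNames.joined(separator: ", "))
```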

Model tree for aufklarer/PersonaPlex-7B-MLX-8bit

Base model: kyutai/moshiko-pytorch-bf16
Finetuned: nvidia/personaplex-7b-v1
Finetuned (37): this model
