Parakeet-TDT-CTC 110M — CoreML

CoreML export of nvidia/parakeet-tdt_ctc-110m for on-device speech recognition on Apple Silicon via FluidAudio.

CoreML Components

File	Size	Description
`Preprocessor.mlmodelc`	207 MB	Fused mel-spectrogram + FastConformer encoder
`Decoder.mlmodelc`	7.5 MB	1-layer LSTM prediction network
`JointDecision.mlmodelc`	2.7 MB	Single-step joint network (token + duration)
`parakeet_vocab.json`	18 KB	1024-token BPE vocabulary
`config.json`	2.5 KB	Model metadata and I/O contracts

Input: 16 kHz mono audio, fixed 15-second window (240,000 samples). Output: Token IDs, probabilities, and TDT duration predictions per encoder frame.
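
Because the window is fixed, shorter clips have to be padded and longer ones split before they reach the preprocessor. The sketch below shows that framing in plain Python. It assumes zero-padding and non-overlapping chunks, which may not match what FluidAudio actually does, and `fit_to_window` and `split_into_windows` are hypothetical helper names used only for illustration.

```python
SAMPLE_RATE = 16_000                            # Hz, from the input contract
WINDOW_SECONDS = 15
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS   # 240,000 samples


def fit_to_window(samples, window=WINDOW_SAMPLES):
    """Zero-pad or truncate a mono sample sequence to exactly `window` samples."""
    if len(samples) >= window:
        return list(samples[:window])
    return list(samples) + [0.0] * (window - len(samples))


def split_into_windows(samples, window=WINDOW_SAMPLES):
    """Split audio into consecutive non-overlapping windows, zero-padding the last.

    Returns an empty list for empty input.
    """
    return [
        fit_to_window(samples[start:start + window], window)
        for start in range(0, len(samples), window)
    ]
```

A 20-second clip (320,000 samples) would yield two windows under these assumptions, the second carrying 5 seconds of audio followed by 10 seconds of silence.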

Performance

Benchmarked with FluidAudio CLI on Apple M2 (release build):

Metric	Value
WER (LibriSpeech test-clean)	3.0%
RTFx (overall)	102x real-time
Peak memory	0.3 GB
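
RTFx is the ratio of audio duration to processing time, so higher means faster. The arithmetic can be sketched as follows; the function names are illustrative and not part of FluidAudio, and applying the overall benchmark figure to a single window is only a rough estimate.

```python
def rtfx(audio_seconds, processing_seconds):
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_time(audio_seconds, rtfx_value):
    """Estimated compute time for a clip at a given RTFx."""
    if rtfx_value <= 0:
        raise ValueError("rtfx_value must be positive")
    return audio_seconds / rtfx_value


# If the 102x overall figure held for one 15-second window,
# that window would take roughly 0.15 s to transcribe.
print(f"{processing_time(15.0, 102.0):.3f} s")
```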

NVIDIA's reference WER (greedy, GPU):

Benchmark	WER
LibriSpeech test-clean	2.4%
LibriSpeech test-other	5.2%
AMI	15.88%
Earnings-22	12.42%
GigaSpeech	10.52%
TEDLIUM-v3	4.16%
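
For reference, WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below assumes whitespace tokenization and no text normalization; published benchmarks typically normalize case and punctuation first, so it will not reproduce the figures in these tables exactly.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate via Levenshtein distance over whitespace-split words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```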

Usage with FluidAudio

# Transcribe
fluidaudiocli transcribe audio.wav --model-version tdt-ctc-110m

# Benchmark
fluidaudiocli asr-benchmark --subset test-clean --model-version tdt-ctc-110m

Models auto-download from this repo on first use. To pre-fetch:

fluidaudiocli download --model-version tdt-ctc-110m

Conversion

Exported from NeMo using mobius/models/stt/parakeet-tdt-ctc-110m/coreml/convert-tdt-coreml.py:

Preprocessor fuses mel-spectrogram extraction and the FastConformer encoder into a single CoreML model.
JointDecision is the single-step variant (encoder_step + decoder_step inputs) used by FluidAudio's TDT decoder.
All models are exported as MLProgram (iOS 17+ / macOS 14+) at float32 precision.
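
The single-step JointDecision design implies a decoding loop roughly like the one below: at each encoder frame the joint yields a token and a duration, and the frame index advances by that duration. This is a generic sketch of greedy TDT decoding, not FluidAudio's implementation. `decoder_step` and `joint_step` are hypothetical stand-ins for calls into `Decoder.mlmodelc` and `JointDecision.mlmodelc`, and the blank index, the start symbol, and the zero-duration handling are all assumptions.

```python
def tdt_greedy_decode(encoder_frames, decoder_step, joint_step,
                      blank_id, max_symbols_per_frame=10):
    """Greedy TDT decoding over a sequence of encoder frames.

    decoder_step(token, state) -> (decoder_output, new_state)   # hypothetical
    joint_step(encoder_frame, decoder_output) -> (token, duration)  # hypothetical
    """
    tokens = []
    # Assumption: the prediction network is primed with blank as start symbol.
    dec_out, state = decoder_step(blank_id, None)
    t = 0
    emitted_on_frame = 0
    while t < len(encoder_frames):
        token, duration = joint_step(encoder_frames[t], dec_out)
        if token != blank_id:
            tokens.append(token)
            # Only non-blank tokens update the prediction network.
            dec_out, state = decoder_step(token, state)
            emitted_on_frame += 1
        # A zero duration keeps us on the same frame. Force progress if that
        # would stall: blank with duration 0, or too many symbols on one frame.
        if duration == 0 and (token == blank_id
                              or emitted_on_frame >= max_symbols_per_frame):
            duration = 1
        if duration > 0:
            emitted_on_frame = 0
        t += duration
    return tokens


# Toy stand-ins so the loop can be exercised without the CoreML models.
BLANK = 1024  # assumed blank index: one past the 1024-token vocabulary


def toy_decoder_step(token, state):
    return token, state  # "decoder output" is simply the last token


def toy_joint_step(enc_frame, dec_out):
    if enc_frame is None:
        return BLANK, 1      # nothing to emit, move one frame
    return enc_frame, 2      # emit the frame's token and skip two frames


demo_tokens = tdt_greedy_decode([5, None, None, 7, None],
                                toy_decoder_step, toy_joint_step, BLANK)
```

Skipping ahead by the predicted duration is what lets TDT evaluate the joint on fewer frames than a conventional transducer, which advances one frame per blank.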

References

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition. arXiv:2305.05084, May 8, 2023.

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations. arXiv:2304.06795, April 13, 2023.