Constantin
@mnv
Creating impact — every day.
What’s under the hood?
Trained on NVIDIA’s Granary dataset: 120K hours of transcribed and pseudo-labeled English audio (LibriSpeech, Common Voice, YouTube-Commons, Libri-Light).
Handles punctuation, capitalization, and word-level timestamps out of the box.
Can run on systems with just 2GB of RAM, which makes it edge-device friendly—even if you’re not rocking a DGX box.
NVIDIA is opening up the dataset after Interspeech 2025, which adds more transparency to the training pipeline.