
Get started with Nvidia Nemo ASR

mlapi
1 min read · Jul 29, 2024

NVIDIA's NeMo framework contains many AI workflows, from generative AI to text-to-speech. One of them is automatic speech recognition (ASR).


In this tutorial we will see how to get started with NeMo ASR.

NeMo GitHub link — https://github.com/nvidia/nemo
NeMo version — main
Python version — 3.10.12
Sample audio file — https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
Colab notebook — https://colab.research.google.com/drive/1ezBj4jBRGZd0qmr6hCTeeqYwov2BIv39?usp=sharing

Let's get started by installing the NeMo framework:

pip uninstall -y pyarrow
python -m pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[all]

For this tutorial we will use a Parakeet model from NVIDIA. You can see all the Parakeet models here — https://huggingface.co/collections/nvidia/parakeet-659711f49d1469e51546e021

Specifically, we will use parakeet-rnnt-0.6b.

OK then, let's download the model and instantiate it:

import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name="nvidia/parakeet-rnnt-0.6b")

Now let's use the transcribe method to transcribe an audio file.

# download the sample audio (shell command; prefix with ! in Colab)
wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav

transcriptions = asr_model.transcribe(['2086-149220-0033.wav'])
print(transcriptions)
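Depending on the NeMo version, transcribe on an RNNT model may return a plain list of strings, a (best_hypotheses, all_hypotheses) tuple, or a list of Hypothesis objects that carry the string in a .text attribute. A small sketch that normalizes these shapes into a list of strings (the extract_texts helper is my own name, not part of NeMo, and the demo uses stand-in values rather than a real model output):

```python
def extract_texts(transcriptions):
    """Normalize possible return shapes of NeMo's transcribe() into a list of strings."""
    if isinstance(transcriptions, tuple):
        # older RNNT models returned (best_hypotheses, all_hypotheses);
        # keep only the best hypotheses
        transcriptions = transcriptions[0]
    # items may be plain strings or Hypothesis-like objects with a .text attribute
    return [getattr(t, "text", t) for t in transcriptions]

# Demo with stand-in values (no NeMo needed):
print(extract_texts(["hello world"]))                       # ['hello world']
print(extract_texts((["hello world"], [["hello world"]])))  # ['hello world']
```

This way the rest of your code can always work with a simple list of strings, whichever NeMo version you have installed.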

That's it! This is how you can generate a transcript for your audio file.

NOTE: NeMo ASR audio file requirements
1. The audio file should be mono (single channel)
2. The audio file sample rate should be 16 kHz
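Both requirements above can be checked with Python's standard-library wave module before calling transcribe. A minimal sketch (the check_asr_ready helper and the file name sample_16k_mono.wav are my own for illustration, not part of NeMo; the demo synthesizes a one-second 440 Hz tone so it is self-contained):

```python
import math
import struct
import wave

def check_asr_ready(path):
    """Return (is_mono, is_16khz) for a WAV file, per the requirements above."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels() == 1, wf.getframerate() == 16000

# Create a small synthetic mono 16 kHz WAV to demonstrate the check
with wave.open("sample_16k_mono.wav", "wb") as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit samples
    wf.setframerate(16000)   # 16 kHz
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.2 * math.sin(2 * math.pi * 440 * t / 16000)))
        for t in range(16000)  # one second of a 440 Hz tone
    )
    wf.writeframes(frames)

print(check_asr_ready("sample_16k_mono.wav"))  # (True, True)
```

If either check fails, convert the file first (for example with ffmpeg or an audio library) before passing it to the model.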

