Get started with NVIDIA NeMo ASR
NVIDIA’s NeMo framework covers many AI workflows, from generative AI to text-to-speech; one of them is automatic speech recognition (ASR).
In this tutorial we will see how to get started with NeMo ASR.
NeMo GitHub link — https://github.com/nvidia/nemo
NeMo version — main
Python version — 3.10.12
Sample audio file — https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
Colab notebook — https://colab.research.google.com/drive/1ezBj4jBRGZd0qmr6hCTeeqYwov2BIv39?usp=sharing
Let’s get started by installing the NeMo framework:
pip uninstall pyarrow
python -m pip install git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[all]
For this tutorial we will be using the Parakeet model from NVIDIA. You can see all the Parakeet models here — https://huggingface.co/collections/nvidia/parakeet-659711f49d1469e51546e021
Specifically, we will be using parakeet-rnnt-0.6b.
OK then, let’s download the model and initialize it:
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name="nvidia/parakeet-rnnt-0.6b")
Now let’s use the transcribe method to transcribe an audio file. First, download the sample file:
wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
transcriptions = asr_model.transcribe(['2086-149220-0033.wav'])
print(transcriptions)
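Depending on the NeMo version and decoder type, transcribe may return plain strings or Hypothesis objects that expose the text via a .text attribute (RNNT models in particular often return hypotheses). A small helper can normalize either shape — this is a hedged sketch, and the function name hypothesis_text is mine, not part of NeMo:

```python
def hypothesis_text(result):
    # NeMo's transcribe() can return plain strings or Hypothesis-like
    # objects with a .text attribute, depending on version and model.
    # This helper (an assumption, not a NeMo API) normalizes both shapes.
    if isinstance(result, str):
        return result
    return getattr(result, "text", str(result))
```

With it, printing the text of every result is just `[hypothesis_text(r) for r in transcriptions]`.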
And that’s it! This is how you generate a transcript for your audio file.
NOTE: NeMo ASR — audio file requirements
1. The audio file should be mono (single channel)
2. The audio file sample rate should be 16 kHz
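You can verify both requirements up front with the standard-library wave module before passing a file to the model. This is a minimal sketch; for actually converting a file, a tool such as ffmpeg (e.g. with the -ac 1 -ar 16000 options) does the job:

```python
import wave

def check_nemo_audio(path):
    # Verify the two NeMo ASR input requirements listed above:
    # the WAV file must be mono and sampled at 16 kHz.
    with wave.open(path, "rb") as w:
        is_mono = w.getnchannels() == 1
        is_16khz = w.getframerate() == 16000
    return is_mono and is_16khz
```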