The NVIDIA RIVA SDK is a comprehensive Speech AI toolkit with a wide-ranging set of features. You can use it to create Speech AI applications on the NVIDIA Jetson! Looky here:
Introduction
Speech is now a component of many different applications. Sometimes speech is integrated into a phone or computer, like Apple's Siri or the Google Assistant. Speech may also be built into dedicated devices, like the Amazon Echo with Alexa. These devices all work in much the same way. First, a wake-up word like "Hey Siri!" is processed locally. Subsequent voice commands are round-tripped to a server. The server processes the voice commands (Automatic Speech Recognition, or ASR) and then returns a response.
The comprehensive Speech AI toolkit from NVIDIA, named RIVA, can handle this scenario. In addition, RIVA can build applications where everything is handled on a local device, such as an NVIDIA Jetson.
RIVA is a comprehensive library that includes:
- Automatic speech recognition (ASR)
- Text-to-Speech synthesis (TTS)
- Neural Machine Translation (NMT) (language-to-language translation, for example English to Spanish)
- A collection of natural language processing (NLP) services, such as named entity recognition (NER), punctuation, and intent classification.
RIVA runs on the Jetson Orin and Xavier families of processors running JetPack 5 and above. In the video, we are using a Jetson Orin Nano Developer Kit and a Logitech headset with microphone.
Installation
Generally we don't cover installation walk-throughs. However, this one is challenging enough that it's worth an article. RIVA is currently in beta for the Jetsons (denoted as ARM64 or embedded in several places in the NVIDIA documentation). You may find that some of the directions change as time goes on.
With that said, this might be a little tough if you’re a beginner. We’ll assume that you are following along with the video.
RIVA Quick Start Guide
You will need to follow the RIVA Quick Start Guide; you should be able to follow along, starting in the Embedded section. You will need access to NVIDIA NGC, the warehouse for NVIDIA AI software. NGC requires a free developer account. NVIDIA has a couple of videos on setting up your account and generating an API key:
Then it’s a matter of (mostly) following the instructions. Modify the Docker daemon file:
$ sudo gedit /etc/docker/daemon.json
Then add the line:
"default-runtime": "nvidia"
at the end of the file, just before the closing } character. Remember to add a comma after the preceding entry. Then restart the Docker daemon:
$ sudo systemctl restart docker
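For reference, here is roughly what the complete daemon.json looks like after the edit. This is a sketch assuming the stock JetPack configuration; your file may contain additional entries:

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
```

Note the comma after the closing brace of the runtimes entry, before the new default-runtime line.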
Next, add the user to the Docker group. This makes working with permissions easier.
$ sudo usermod -aG docker $USER
$ newgrp docker
Then it’s time to install the NGC command line tool.
$ wget --content-disposition https://ngc.nvidia.com/downloads/ngccli_arm64.zip && unzip ngccli_arm64.zip && chmod u+x ngc-cli/ngc
$ find ngc-cli/ -type f -exec md5sum {} + | LC_ALL=C sort | md5sum -c ngc-cli.md5
$ echo "export PATH=\"\$PATH:$(pwd)/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
$ ngc config set
Download RIVA Quickstart
Now we are ready to download the RIVA Quick Start scripts, and then RIVA itself and the models. The NGC CLI tool is ngc.
$ ngc registry resource download-version nvidia/riva/riva_quickstart_arm64:2.12.0
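Inside the quickstart, config.sh controls which services and models get deployed. As a hedged sketch (these variable names come from the 2.12.0 quickstart and may change between releases; check your copy), the settings of interest look something like:

```shell
# Sketch of quickstart config.sh settings (names from the 2.12.0 release;
# verify against your copy). Enable only the services you need -- each
# enabled service downloads additional models and uses more memory.
service_enabled_asr=true
service_enabled_nlp=true
service_enabled_tts=true
service_enabled_nmt=false

# Language of the ASR/TTS models to deploy
language_code=("en-US")
```

Disabling services you don't need (NMT here, for example) saves download time and memory on a memory-constrained Jetson.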
Switch over to the downloaded quickstart directory and modify config.sh to meet your needs. Once you finish that, you are ready to initialize the server and download the models. Note that you need to use sudo here, which differs from the documentation:
$ sudo bash riva_init.sh
To start the RIVA server in a Docker container, there is a convenience script:
$ bash riva_start.sh
Install RIVA Python Client
The RIVA Python client is on GitHub. Before you start the install, make sure you have pip installed. It is named python3-pip in the Ubuntu repository.
You will also need to install testresources and the PortAudio libraries. Then add the user to the associated audio groups:
$ pip3 install testresources
$ sudo apt install portaudio19-dev
$ pip3 install pyaudio
$ sudo adduser $USER audio
$ sudo adduser $USER pulse-access
$ newgrp pulse-access
Then install the python-clients repository. Follow the directions in the README file. Here's the sequence we followed in the video, for reference. Make sure you are in the top-level directory before doing this.
$ git clone https://github.com/nvidia-riva/python-clients.git
$ cd python-clients
$ git submodule init
$ git submodule update --remote --recursive
$ pip install -r requirements.txt
$ python3 setup.py bdist_wheel
$ pip install --force-reinstall dist/*.whl
$ pip install nvidia-riva-client
You should be good to go at this point. There are several examples in the python-clients/examples directory. There are also Jupyter notebooks for some other examples.
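As a hedged example of running one of the clients (the script path and flags here are assumptions based on the python-clients README and may differ in your checkout), transcribing live audio from the headset microphone against the local server might look like:

```shell
# Run from the python-clients directory, with the RIVA server already started.
# localhost:50051 is the default RIVA gRPC endpoint.
$ python3 scripts/asr/transcribe_mic.py --server localhost:50051 --language-code en-US
```

Speak into the microphone and the transcript should stream to the terminal; Ctrl-C stops the client.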
Notes
RIVA and the models can use a lot of memory. In the demo in the video, we were using ~6.6 GB. Before deploying to a Jetson, you may want to develop on a PC or on a Jetson with more memory to speed up development.
The post Speech AI on NVIDIA Jetson Tutorial appeared first on JetsonHacks.