Kaldi is an extremely powerful NLP framework that allows for Automatic Speech Recognition, Speaker Diarization, and more; however, the Kaldi install process can be quite intimidating to first-time users. We've made the Kaldi install procedure as simple as possible so you can get started modeling ASAP! Let's dive in.
Time and Space
The most notable requirements are time and space. The Kaldi install takes over 40 GB of space and can take many hours to complete, so prepare accordingly! If you don't have the time or space to install Kaldi and instead want to get started immediately with a ready-to-go solution, check out the Cloud Speech-to-Text APIs section.
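Before starting, it can help to confirm that the target disk actually has the space. The sketch below is only an illustration: the 40 GB threshold is the figure quoted above, while the function name `free_gb` and the choice of the home directory as the default target are assumptions.

```shell
# Minimum free space in GB; 40 is the figure quoted in the text above.
REQUIRED_GB=40

# Print the free space, in whole GB, on the filesystem that holds the given directory.
free_gb() {
    # -P forces the portable one-line-per-filesystem format; -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Assumption: Kaldi will live under the home directory unless a path is passed in.
target_dir="${1:-${HOME:-.}}"
available=$(free_gb "$target_dir")

if [ "$available" -ge "$REQUIRED_GB" ]; then
    echo "OK: ${available} GB free in ${target_dir}"
else
    echo "Warning: only ${available} GB free in ${target_dir}; ${REQUIRED_GB} GB recommended"
fi
```

Pass a different directory as the first argument if you plan to install Kaldi somewhere other than your home directory.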
Unix-Like Operating System
Kaldi is not supported on Windows. If you are using Windows, we recommend that you install VMware and create a Debian-based virtual machine.
Automatic Kaldi Install
We've made the Kaldi install a breeze by providing you with an automated installation script. If you are comfortable with an automatic installation, then you can follow along here. If you'd prefer to perform the installation manually, then you can jump down to the next section - Manual Kaldi Install.
1) Install necessary packages
First, you'll need to make sure you have wget and git installed. wget comes natively installed on most Linux distributions, but you may need to open a terminal and install git with your package manager.
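A quick way to see what is missing is to ask the shell whether each tool is on your PATH. This is a minimal sketch, not part of the installation script: the helper name `missing_tools` is hypothetical, and git is included as an assumption because the later steps clone a repository.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools wget git)

if [ -z "$missing" ]; then
    echo "wget and git are both available"
else
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing:" $missing
    echo "On Debian or Ubuntu, try: sudo apt-get install" $missing
fi
```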
2) Fetch installation script
3) Run installation script
Open the installation script in a text editor and examine its contents to ensure that you are comfortable running it, then initiate the automatic installation procedure with the following terminal command:
sudo bash setup.sh
If you have multiple CPUs, you can perform a parallel build by supplying the number of processors you would like to use. For example, to use 4 CPUs, enter
sudo bash setup.sh 4
Running the above command will install all of Kaldi's dependencies, and then Kaldi itself. At one point, several minutes into the installation, you will be asked to confirm that all dependencies are installed. We suggest checking and confirming, but if you are following along on a fresh Ubuntu 20.04.3 LTS install (perhaps on a virtual machine), then you can skip the need to confirm by instead running
In this case, you do not need to interact with the terminal at all during installation. The installation will likely take several hours, so you can leave and come back to your computer when the installation is complete.
Jump down to the Getting Started with Kaldi section to learn how to start working with your freshly installed copy of Kaldi!
Manual Kaldi Install
1) Install necessary packages
Before manually installing Kaldi, we’ll need to install some additional packages. First, open a terminal, and run the following commands:
- You can copy these commands and paste them into the terminal by right-clicking in the terminal and selecting “Paste”.
- We’ll also need Intel MKL, which we will install later via Kaldi if you do not have it already.
2) Clone the Kaldi repo
Next, we need to clone the Kaldi repository so we can install Kaldi itself. In the terminal, navigate to the directory in which you’d like to clone the repository. In this case, we are cloning to the Home directory.
Run the following command:
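A minimal sketch of the clone step follows. The repository URL is the public kaldi-asr/kaldi repository on GitHub; the `KALDI_DIR` variable, the `run` helper, and the dry-run default are assumptions added here so the command can be reviewed before anything is downloaded.

```shell
KALDI_REPO="https://github.com/kaldi-asr/kaldi.git"
# Assumption: clone into the home directory, as described in the text above.
KALDI_DIR="${KALDI_DIR:-$HOME/kaldi}"

# Echo the command when DRY_RUN is 1 (the default here); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

if [ -d "$KALDI_DIR/.git" ]; then
    echo "Kaldi is already cloned at $KALDI_DIR"
else
    run git clone "$KALDI_REPO" "$KALDI_DIR"
fi
```

Set `DRY_RUN=0` in the environment to perform the clone for real instead of printing the command.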
3) Install tools
To begin installing Kaldi from the cloned repo, we’ll first need to perform the tools installation. Navigate into the tools directory with the following command:
Then install Intel MKL if you don’t already have it. This will take time, as MKL is a large library.
Now we check to ensure all dependencies are installed. Given our preparatory installations, you should get a message telling you that all dependencies are indeed installed.
If you do not have all dependencies installed, you will get an output telling you which dependencies are missing. Install any remaining packages you need, and then rerun the extras/check_dependencies.sh command. New required installations may now appear as a result of the dependencies you just installed. Continue alternating between these two steps (checking missing dependencies and installing them) until you receive a message saying that all dependencies are installed ("all OK.").
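The check-then-install cycle described above can be made a little less error-prone by letting the shell look for the "all OK." message. This is a sketch under stated assumptions: it must be run from inside the tools directory, and the helper name `deps_ok` is hypothetical.

```shell
# Succeed if the checker output read from stdin contains the "all OK." message.
deps_ok() {
    grep -q "all OK\."
}

# Assumption: the current directory is the tools directory of the Kaldi clone.
if [ -x extras/check_dependencies.sh ]; then
    output=$(extras/check_dependencies.sh 2>&1)
    printf '%s\n' "$output"
    if printf '%s\n' "$output" | deps_ok; then
        echo "All dependencies are installed."
    else
        echo "Some dependencies are missing: install them, then run this check again."
    fi
else
    echo "extras/check_dependencies.sh not found; run this from the Kaldi tools directory."
fi
```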
Finally, run make. See the install note below if you have a multi-CPU build.
If you have multiple CPUs, you can do a parallel build by supplying the "-j" option to make in order to expedite the install. For example, to use 4 CPUs, enter
make -j 4
4) Install src
Next, we need to perform the src install. First, navigate into the src directory.
And then run the following commands. See the install note below if you have a multi-CPU build. This build may take several hours for uniprocessor systems.
Again, you can supply the -j option to both make depend and make if you have multiple CPUs in order to expedite the install. For example, to use 4 CPUs, enter
make depend -j 4
make -j 4
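Rather than hard-coding 4, you can let the machine report how many processors it has. The sketch below only prints the two make commands named above with a detected job count; the helper names `job_count` and `src_build_commands` are hypothetical, and any steps that come before make depend are not reproduced here.

```shell
# Print the number of available processors, falling back to 1 if nproc is missing.
job_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        echo 1
    fi
}

# Print the two build commands from the text above for the given job count.
src_build_commands() {
    echo "make depend -j $1"
    echo "make -j $1"
}

# JOBS can be set in the environment to override the detected value.
JOBS="${JOBS:-$(job_count)}"
src_build_commands "$JOBS"
```

Printing the commands first keeps the sketch side-effect free; run the printed lines yourself from the src directory once you are happy with the job count.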
After this, Kaldi will have been successfully installed on your machine! You can move on to the next section to learn how to get started working with your freshly installed copy of Kaldi.
Getting Started with Kaldi
By now you've successfully installed Kaldi and are probably itching to dive in! Kaldi is a very complicated framework, so it has a bit of a learning curve. Check out our Kaldi Speech Recognition for Beginners tutorial to get started! In it, we learn how to do Automatic Speech Recognition (ASR) using pre-trained models on the Gettysburg Address! You can see the results below, with the original text first and Kaldi's transcription second:
FOUR SCORE AND SEVEN YEARS AGO OUR FATHERS BROUGHT FORTH ON THIS CONTINENT, A NEW NATION, CONCEIVED IN LIBERTY, AND DEDICATED TO THE PROPOSITION THAT ALL MEN ARE CREATED EQUAL
FOUR SCORE AN SEVEN YEARS AGO OUR FATHERS BROUGHT FORTH UND IS CONTINENT A NEW NATION CONCEIVED A LIBERTY A DEDICATED TO THE PROPOSITION THAT ALL MEN ARE CREATED EQUAL
Additionally, you can check out the Kaldi documentation for more information on using Kaldi.
Cloud Speech-to-Text APIs
If the Kaldi installation procedure seems too laborious, or if you don't want to learn how to build and train models in Kaldi and would prefer a ready-to-go Speech-to-Text solution, you can check out options for cloud Speech-to-Text APIs, which make it easy to send in an audio file and get its transcription.
Beyond transcription, AssemblyAI's API provides Audio Intelligence features like Summarization, Sentiment Analysis, Entity Detection, and more. Grab a token and get started for free below, or learn more about the API in the docs.
The exact ISO we are looking for is ubuntu-20.04.3-desktop-amd64.iso. Either use this direct download link, or find it on the releases page (also linked above). After downloading, check its hash by running certutil -hashfile ubuntu-20.04.3-desktop-amd64.iso md5 in the command prompt in the directory where the ISO file is stored. The result should be d14cb9b6f48feda0563cda7b5335e4c0.
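The certutil command above is for the Windows command prompt. If you downloaded the ISO on a Linux or macOS host instead, a comparable check might look like the sketch below. The file name and expected hash are the ones given above; the helper name `file_md5` and the fallback to the macOS `md5` tool are assumptions.

```shell
EXPECTED_MD5="d14cb9b6f48feda0563cda7b5335e4c0"
ISO="ubuntu-20.04.3-desktop-amd64.iso"

# Print the MD5 digest of a file using whichever tool is available.
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{ print $1 }'
    else
        md5 -q "$1"    # macOS fallback
    fi
}

if [ -f "$ISO" ]; then
    if [ "$(file_md5 "$ISO")" = "$EXPECTED_MD5" ]; then
        echo "Checksum matches."
    else
        echo "Checksum mismatch: download the ISO again."
    fi
else
    echo "$ISO not found in the current directory."
fi
```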