Skcript Technologies Private Limited

Voice recognition in UiPath - Explained

Voice recognition has made many everyday operations easier, so our RPA team set out to implement it in UiPath, and we succeeded! Read on to learn how.


Adding voice recognition to your bot makes it more interactive for users. It works in both attended and unattended bots. In attended bots, voice recognition can capture the user’s input. In unattended bots, spoken alerts can be raised whenever an error or exception occurs. For example, if a null-value exception arises, text to speech can announce that the value is null. Voice recognition also serves use cases such as self-help kiosks and ticket generation.

UiPath can perform various voice recognition activities. When I started adding voice recognition to one of my bots, I tried two approaches and ran into the following:

  1. UiPath released an official Google Speech package in 2018. You can find this package at the link below.

    First, I tried the official Google package. It installs two activities: Speech to Text and Text to Speech. It has high accuracy and is very easy to implement. But when I used the Speech to Text activity, a dialog box with Start and Stop buttons popped up every time I ran the bot. I could neither remove nor rename those two buttons, and they may be annoying for clients. Since the package was released only last year, there was no documentation or video on how to change or rename the buttons. Many users asked in reviews about removing them, but there was no response from the UiPath developers. So I had to find another way to use voice recognition.

  2. Finally, I wrote a Python script and invoked it from my workflow. It uses the Google speech engine and had very good accuracy. Because we invoke our own Python code, we can configure it to our requirements, get rid of those buttons, and modify the code whenever we need to. For these reasons, Python code is the better choice here than a built-in package. If you are fine with the Start and Stop buttons, you can use the Google Speech package instead.


In this case study, we will see how to implement voice recognition with Python code to build a more interactive attended robot. Let us apply our bot to a kiosk at a restaurant drive-thru. The kiosk shows a list of sandwiches. The user says the number of the food item they want and then the quantity. Once the user has spoken, the robot clicks the OK button and calculates the total price of the order.

Let’s see how it works.

Building up the Workflow:

Let’s first build the speech-to-text workflow. To use Python code, we must install the Python package in our workflow. Before installing the Python package in UiPath, make sure Python itself is installed, then install the PyAudio and SpeechRecognition libraries for Python.

We can install PyAudio and SpeechRecognition with the following commands:

In your terminal, run “pip install pyaudio” and “pip install SpeechRecognition”.
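
With both libraries installed, the Python side can be quite small. The following is a minimal sketch, not the original script from this project: the function name listen, its parameters, and the choice to return an empty string on failure are all assumptions made for illustration. It relies on the SpeechRecognition library's Recognizer, Microphone, and recognize_google calls, which send the recorded audio to the Google speech engine.

```python
def listen(timeout=5, phrase_time_limit=5):
    """Record one phrase from the default microphone and return it as text.

    Returns an empty string when nothing usable was heard (an assumption
    made for this sketch so that the caller always receives a string).
    """
    # Imported inside the function so the script still loads even if the
    # library is missing; requires SpeechRecognition and PyAudio.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    try:
        with sr.Microphone() as source:
            # Sample background noise briefly so quiet speech is not missed.
            recognizer.adjust_for_ambient_noise(source, duration=0.5)
            audio = recognizer.listen(
                source, timeout=timeout, phrase_time_limit=phrase_time_limit
            )
        # Send the recording to the Google speech engine.
        return recognizer.recognize_google(audio)
    except (sr.WaitTimeoutError, sr.UnknownValueError, sr.RequestError):
        # No speech started in time, speech was unintelligible,
        # or the speech service could not be reached.
        return ""
```

Because no dialog box is involved, recording starts as soon as the method is called, which is how the Start and Stop buttons are avoided. Returning a plain string keeps the later conversion to a UiPath String variable simple.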

Now we can install the Python package in UiPath. Add a Python Scope activity, inside which the other Python activities run. Specify the version of Python installed on your machine and the folder path where it is installed.

Then add a Load Python Script activity, which loads your Python code into UiPath. Give it the file path of your code; its result variable is of the PythonObject type. Next, insert a Beep activity (or something like a message box) to signal that the code has loaded and is ready to listen to the user’s voice. Now add an Invoke Python Method activity, which runs a specified method from the Python code directly in the workflow. Specify the instance, and make the output variable a PythonObject as well.

Now add a Get Python Object activity to convert the PythonObject into a String variable.

The final workflow will look like the above image.

Create another workflow and install the Google Speech package; from it, we will use the Text to Speech activity. As I said earlier, this package is very efficient; its only drawback is that we can’t remove the Start and Stop buttons when using the Speech to Text activity. Add a Text to Speech activity and specify the language, the service account file, and the text to convert into voice.

Then invoke the Python speech-to-text sequence we built earlier to capture the user’s reply.

The food item number is now captured from the user. We can match the item number to the food name using a Switch activity. In the Switch activity, assign the price that corresponds to each food number.

Then add another Text to Speech activity to ask the user for the item quantity, and invoke the speech-to-text sequence again to capture it. The assigned price is multiplied by the quantity to give the total price; for example, a quantity of 2 doubles the item price. All these operations are done in the Switch activity, and this is what the Switch activity looks like.
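
For readers who find code easier to follow than a Switch activity, the same lookup and multiplication can be sketched in Python. This is only an illustration: the menu entries, prices, and the function name order_total are hypothetical and do not come from the original workflow. It also assumes the recognized speech arrives as digits such as "2"; a spoken word like "two" would need extra handling.

```python
# Hypothetical menu for illustration: item number -> (name, unit price).
MENU = {
    1: ("Veg sandwich", 4.00),
    2: ("Chicken sandwich", 5.50),
    3: ("Egg sandwich", 4.50),
}


def order_total(item_text, quantity_text):
    """Return (item name, total price) for the recognized item and quantity.

    Both arguments are the strings produced by speech to text.
    Raises ValueError if either is not a usable number.
    """
    item_number = int(item_text.strip())
    quantity = int(quantity_text.strip())

    if item_number not in MENU:
        raise ValueError(f"Unknown item number: {item_number}")
    if quantity < 1:
        raise ValueError(f"Quantity must be at least 1, got {quantity}")

    # Equivalent of one Switch case: pick the price, then multiply.
    name, unit_price = MENU[item_number]
    return name, unit_price * quantity
```

With this hypothetical menu, order_total("2", "2") gives ("Chicken sandwich", 11.0), which mirrors what the Switch case for item 2 would assign.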

Go back to the main workflow and add another Text to Speech activity asking “Do you want to continue?”. If the user says yes, repeat the whole process. If the user says no, have the bot say “Bye. Bye.”
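
The control flow of the main workflow can be summarized with a short Python sketch. The names run_kiosk, take_order, speak, and listen are hypothetical stand-ins for the ordering sequence, the Text to Speech activity, and the speech-to-text sequence. Treating any answer other than "yes" as "no" is an assumption of this sketch, not something the original workflow specifies.

```python
def run_kiosk(take_order, speak, listen):
    """Repeat the ordering flow until the user declines to continue.

    take_order: callable that runs one full item/quantity/price cycle.
    speak:      callable that converts a text prompt to speech.
    listen:     callable that returns the user's spoken reply as text.
    Returns the number of orders taken.
    """
    orders = 0
    while True:
        take_order()
        orders += 1

        speak("Do you want to continue?")
        answer = listen().strip().lower()

        # Assumption: anything other than "yes" ends the session.
        if answer != "yes":
            speak("Bye. Bye.")
            return orders
```

Passing the three steps in as callables keeps the loop independent of how speech is actually captured, so it can be tried out with typed input before a microphone is involved.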

The final workflow will look like the above image.


With the approach from this case study, you can build interactive bots with voice recognition.
