The rise of automation, along with increased computational power, novel applications of statistical algorithms, and improved access to data, has resulted in the birth of the personal digital assistant market, popularly represented by Apple’s Siri, Microsoft’s Cortana, Google’s Google Assistant, and Amazon’s Alexa.
While each assistant specializes in slightly different tasks, they all seek to make your life easier through verbal interaction, so you don’t have to hunt down a keyboard to answer questions like “What’s the weather today?” or “Where is Switzerland?”. Despite the inherent “cool” factor of using a digital assistant, you may find that none of them caters to your specific needs. Fortunately, it’s relatively easy to build your own.
This tutorial will walk you through the basics of building your own digital virtual assistant in Python, complete with voice activation plus response to a few basic inquiries. From there, you can customize it to perform whatever tasks you need most.
Installing Python
To follow along with the code in this tutorial, you’ll need to have a recent version of Python installed. I’ll be using ActivePython, for which you have two choices:
Download and install the pre-built Virtual Assistant runtime environment (including Python 3.6) for Windows 10 or CentOS 7, or
Build your own custom Python runtime with just the packages you’ll need for this project by creating a free ActiveState Platform account.
Click the Get Started button, then choose Python 3.6 and the OS you’re working in. In addition to the standard packages included in ActivePython, we’ll need to add a few third-party packages for speech recognition, text-to-speech conversion, and audio playback (a pip alternative is sketched after the list):
- Speech Recognition Package – when you voice a question, we’ll need something that can capture it. The SpeechRecognition package allows Python to access audio from your machine’s microphone, transcribe it, save it to an audio file, and perform other similar tasks.
- Text to Speech Package – our assistant will need to convert your spoken question to text and, once it looks up an answer online, convert the response into a speakable phrase. For this purpose, we’ll use the gTTS (Google Text-to-Speech) package, which interfaces with Google Translate’s API. More information can be found in the gTTS documentation.
- Audio Playback Package – all that’s left is to give voice to the answer. The mpyg321 package lets Python play MP3 files; it wraps the mpg321 command-line player, which also needs to be installed on your system.
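If you’d rather manage these dependencies yourself instead of baking them into an ActiveState runtime, the same packages can be installed with pip. A minimal sketch, using the names as published on PyPI (PyAudio is what SpeechRecognition relies on for microphone access):

$ pip install SpeechRecognition gTTS mpyg321 PyAudio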
Once the runtime builds, you can download the State Tool and use it to install your runtime into a virtual environment.
And that’s it! You now have Python installed, as well as everything you need to build the sample application. In doing so, ActiveState takes the (sometimes frustrating) environment setup and dependency resolution portion out of your hands, allowing you to focus on actual development.
All the code used in this tutorial can be found in my GitHub repo.
All set? Let’s go.
Digital Assistant Voice Input
The first step in creating your own personal digital assistant is establishing voice communication. We’ll create two functions using the libraries we just installed: one for listening and another for responding. Before we do so, let’s import the libraries we installed, along with a few of the standard Python libraries:
import speech_recognition as sr
from time import ctime
import time
import os
from gtts import gTTS
import requests, json
Now let’s define a function called listen. This uses the SpeechRecognition library to activate your machine’s microphone, and then converts the audio to text in the form of a string. I find it reassuring to print out a statement when the microphone has been activated, as well as the stated text that the microphone hears, so we know it’s working properly. I also include conditionals to cover common errors that may occur if there’s too much background noise, or if the request to the Google Cloud Speech API fails.
def listen():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("I am listening...")
        audio = r.listen(source)
    data = ""
    try:
        data = r.recognize_google(audio)
        print("You said: " + data)
    except sr.UnknownValueError:
        print("Google Speech Recognition did not understand audio")
    except sr.RequestError as e:
        print("Request Failed; {0}".format(e))
    return data
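If background noise causes frequent UnknownValueError results, the recognizer can calibrate its energy threshold before listening. A minimal tweak to the with block above (adjust_for_ambient_noise is part of the SpeechRecognition API; the one-second duration is just a reasonable starting point), with the rest of listen unchanged:

    with sr.Microphone() as source:
        print("I am listening...")
        r.adjust_for_ambient_noise(source, duration=1)  # sample 1 second of room noise to set the threshold
        audio = r.listen(source)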
For the voice response, we’ll use the gTTS library. We’ll define a function respond that takes a string input, prints it, then converts the string to an audio file. This audio file is saved to the local directory and then played via a system call to the mpg321 command-line player.
def respond(audioString):
    print(audioString)
    tts = gTTS(text=audioString, lang='en')
    tts.save("speech.mp3")
    os.system("mpg321 speech.mp3")
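The os.system call assumes mpg321 is installed and on your PATH (on Linux, for example, apt install mpg321). If you’re on another platform, here’s a hedged sketch of a more portable helper, using the players I’d expect each OS to provide:

import os
import sys

def play_mp3(path):
    # play an MP3 with whatever command-line player the OS provides
    if sys.platform == "darwin":
        os.system("afplay " + path)   # afplay ships with macOS
    elif sys.platform == "win32":
        os.system("start " + path)    # opens the default Windows media player
    else:
        os.system("mpg321 " + path)   # Linux; requires mpg321 to be installed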
The listen and respond functions establish the most important aspects of a digital virtual assistant: the verbal interaction. Now that we’ve got the basic building blocks in place, we can build our digital assistant and add in some basic features.
Digital Assistant Voiced Responses
To construct our digital assistant, we’ll define another function called digital_assistant and provide it with a couple of basic responses:
def digital_assistant(data):
    # default to True so the assistant keeps listening unless told otherwise;
    # without this, an unrecognized phrase would leave listening undefined
    listening = True
    if "how are you" in data:
        respond("I am well")
    if "what time is it" in data:
        respond(ctime())
    if "stop listening" in data:
        listening = False
        print('Listening stopped')
    return listening
This function takes whatever phrase the listen function outputs as an input, and checks what was said. We can use a series of if statements to understand the voice query and output the appropriate response. To make our assistant seem more human, the first thing we’ll add is a response to the question “How are you?” Feel free to change the response to your liking.
The second basic feature included is the ability to respond with the current time. This is done with the ctime function from the time package.
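For reference, ctime returns the current local time as a human-readable string (the exact output will of course depend on when you run it):

>>> from time import ctime
>>> ctime()
'Thu Dec 19 08:04:03 2018'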
I also build in a “stop listening” command to terminate the digital assistant. The listening variable is a Boolean that is set to True when the digital assistant is active, and False when not. To test it out, we can write the following Python script, which includes all the previously defined functions and imported packages:
time.sleep(2)
respond("Hi Dante, what can I do for you?")
listening = True
while listening:
    data = listen()
    listening = digital_assistant(data)
Save the script as digital_assistant.py. Before we run the script via the command prompt, let’s check that ActiveState Python is running correctly by entering the following on the command line:
$ python3.6
If ActivePython installed correctly, you should see output that looks like this:
ActivePython 3.6.6.3606 (ActiveState Software Inc.) based on
Python 3.6.6 (default, Dec 19 2018, 08:04:03)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Note that if you have other versions of Python already installed on your machine, ActivePython may not be the default version. For instructions on how to make it the default for your operating system, see the ActiveState documentation.
With ActivePython now the default version, we can run the script at the command prompt using:
$ python3.6 digital_assistant.py
You should see and hear the output:
Hi Dante, what can I do for you?
I am listening...
Now you can respond with one of the three possibilities we defined in the digital_assistant function, and it will respond appropriately. Cool, right?
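If the microphone gives you trouble, you can also exercise the same logic from the keyboard by calling digital_assistant directly with typed strings (note this still invokes respond, so gTTS and your MP3 player need to be working):

print(digital_assistant("what time is it"))  # speaks the current time, prints True
print(digital_assistant("stop listening"))   # prints 'Listening stopped', then False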
Digital Assistant Google Maps Query
I find myself frequently wondering where a certain city or country is with respect to the rest of the world. Typically this means I open a new tab in my browser and search for it on Google Maps. Naturally, if my new digital assistant could do this for me, it would save me the trouble.
To implement this feature, we’ll add a new if statement into our digital_assistant function:
def digital_assistant(data):
    listening = True
    if "how are you" in data:
        respond("I am well")
    if "what time is it" in data:
        respond(ctime())
    if "where is" in data:
        # split into words; the location is the third word ("where is X");
        # a separate variable keeps the later checks working on the full phrase
        words = data.split(" ")
        location_url = "https://www.google.com/maps/place/" + str(words[2])
        respond("Hold on Dante, I will show you where " + words[2] + " is.")
        maps_arg = '/usr/bin/open -a "/Applications/Google Chrome.app" ' + location_url
        os.system(maps_arg)
    if "stop listening" in data:
        listening = False
        print('Listening stopped')
    return listening
The new if statement triggers when you say “where is” in the voice query, and appends the next word to a Google Maps URL. The assistant replies, and a command is issued to the operating system to open Chrome with that URL (the open command shown is macOS-specific). Google Maps will open in your Chrome browser and display the city or country you inquired about. If you use a different web browser, a different OS, or your applications live in a different location, adapt the command string accordingly.
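One portable alternative to the hard-coded Chrome path is Python’s built-in webbrowser module, which opens the URL in whatever default browser the OS reports. A minimal sketch (show_on_map is a hypothetical helper name, not part of the tutorial code):

import webbrowser

def show_on_map(location):
    # opens Google Maps in the system default browser on any OS
    webbrowser.open("https://www.google.com/maps/place/" + location)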
Digital Assistant Weather Query
If you live in a place where the weather can change on a dime, you may find yourself searching for the forecast every morning to ensure that you are adequately equipped before leaving the house. Done every day, this eats up significant time that could be better spent on other things.
To implement this within our digital assistant, we’ll add another if statement that recognizes the phrase “What is the weather in..?”
def digital_assistant(data):
    listening = True
    if "how are you" in data:
        respond("I am well")
    if "what time is it" in data:
        respond(ctime())
    if "where is" in data:
        words = data.split(" ")
        location_url = "https://www.google.com/maps/place/" + str(words[2])
        respond("Hold on Dante, I will show you where " + words[2] + " is.")
        maps_arg = '/usr/bin/open -a "/Applications/Google Chrome.app" ' + location_url
        os.system(maps_arg)
    if "what is the weather in" in data:
        api_key = "Your_API_key"
        weather_url = "http://api.openweathermap.org/data/2.5/weather?"
        # the location is the sixth word ("what is the weather in X")
        words = data.split(" ")
        location = str(words[5])
        url = weather_url + "appid=" + api_key + "&q=" + location
        js = requests.get(url).json()
        if js["cod"] != "404":
            weather = js["main"]
            temp = weather["temp"]
            hum = weather["humidity"]
            desc = js["weather"][0]["description"]
            resp_string = ("The temperature in Kelvin is " + str(temp) +
                           ". The humidity is " + str(hum) +
                           ", and the weather description is " + str(desc) + ".")
            respond(resp_string)
        else:
            respond("City Not Found")
    if "stop listening" in data:
        listening = False
        print('Listening stopped')
    return listening

time.sleep(2)
respond("Hi Dante, what can I do for you?")
listening = True
while listening:
    data = listen()
    listening = digital_assistant(data)
For the weather query to function, it needs a valid API key to obtain the weather data. You can sign up for a free key on the OpenWeatherMap website, then replace Your_API_key with the actual value. Once we concatenate the URL string, we use the requests package to call the OpenWeather API, which returns the weather data for the requested city as JSON; after some parsing, we extract the relevant fields.
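Note that OpenWeather returns temperatures in Kelvin by default. If you’d rather hear Celsius, you can either convert in code or append the documented units parameter when building the request URL inside the weather branch:

def kelvin_to_celsius(kelvin):
    # Kelvin and Celsius differ by a fixed offset
    return kelvin - 273.15

# or ask the API for metric units directly:
url = weather_url + "appid=" + api_key + "&q=" + location + "&units=metric"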
Conclusions
There is a myriad of digital assistants currently on the market, including:
- Google Assistant and Siri, which focus primarily on helping users with non-work related tasks like leisure and fitness.
- Cortana, which focuses on work efficiency.
- Alexa, which is more concerned with retail.
With modest expectations in mind, each does its job relatively well. But if you require more specificity, designing your own digital assistant is far from a pipe dream. Recent advances in speech recognition and text-to-speech conversion make it viable even for hobbyists, and working in Python greatly simplifies the task, letting you make any number of customizations to tailor your assistant to your needs.