TwilioGPT: Modernizing Telephone and Voice Systems Using LLMs and NLP
By Chris Wilson
Link to GitHub: https://github.com/jackparsons93/RAG_to_riches
This is the blog post for my capstone project, which uses sound recognition, the Twilio API, and the OpenAI API to create interactive voice assistants.
The first chatbot in my presentation uses a script called twilly.py. Twilly.py is a very primitive chatbot that uses ChatGPT to answer questions from an end user. The user calls the Twilio phone number, which fires a webhook that is routed through an ngrok tunnel to the Flask app’s POST route at /voice. The caller hears a voice saying, “Hello, I am a chatbot powered by ChatGPT. What would you like to talk about today?” The user can then ask ChatGPT any question over the phone.
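To make the flow concrete, here is a minimal sketch of a twilly.py-style webhook. The route names, model choice, and overall structure are my assumptions for illustration rather than the exact contents of the script.

```python
# Sketch of a twilly.py-style voice webhook (assumed route names and model).
from flask import Flask, request
from openai import OpenAI
from twilio.twiml.voice_response import VoiceResponse, Gather

app = Flask(__name__)
client = OpenAI()  # reads OPENAI_API_KEY from the environment


@app.route("/voice", methods=["POST"])
def voice():
    # Greet the caller and capture their speech with Twilio's built-in ASR.
    resp = VoiceResponse()
    gather = Gather(input="speech", action="/handle-speech", method="POST")
    gather.say("Hello, I am a chatbot powered by ChatGPT. "
               "What would you like to talk about today?")
    resp.append(gather)
    return str(resp)


@app.route("/handle-speech", methods=["POST"])
def handle_speech():
    # Twilio posts the transcribed speech as SpeechResult.
    question = request.form.get("SpeechResult", "")
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    answer = completion.choices[0].message.content
    resp = VoiceResponse()
    resp.say(answer)
    resp.redirect("/voice")  # loop back so the caller can ask another question
    return str(resp)


if __name__ == "__main__":
    app.run(port=5000)
```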
While this makes using ChatGPT more accessible for those who are visually impaired or cannot use a keyboard for some other reason, it doesn’t work perfectly. The major problem with this app is that it doesn’t respond the way you would want when you ask it to repeat itself. Its response to that request is: “Of course, let me know what you would like for me to repeat for you.” Here is a link to a recording of twilly.py: cut_no_repeat.wav
Here is the terminal output showing ChatGPT failing to repeat itself.
We fixed this problem in the next chatbot, called memory.py. The Flask app uses a session variable, setting “conversation history” to an empty array so that the app can tell ChatGPT the conversation history. It also incorporates a method called chat_gpt_response_with_history, which appends the session history back onto the prompt sent to ChatGPT. Now you can ask the app to repeat itself, and it will. Here is a link to the chat with memory.py: cut_repeat.wav
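A minimal sketch of that session-memory idea is below. The session key, model, and client usage are my assumptions (the real memory.py may differ), and the Flask app must set a secret_key for sessions to work.

```python
# Sketch of session-backed conversation history for memory.py (assumed names).
from flask import Flask, session
from openai import OpenAI

app = Flask(__name__)
app.secret_key = "change-me"  # required for Flask sessions
client = OpenAI()


def chat_gpt_response_with_history(user_text):
    # Replay the stored conversation on every turn so ChatGPT can honor
    # requests like "please repeat that."
    history = session.get("conversation_history", [])
    history.append({"role": "user", "content": user_text})
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=history,
    )
    answer = completion.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    session["conversation_history"] = history  # persists across webhook requests
    return answer
```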
Here is the terminal output of memory.py:
We now move on to the next program, multiple.py. It adds two features: the ability to select the speed at which the response is read back and the ability to ask a multiple choice question. If you select the slow speed, Twilio pauses for 3 seconds after every 4th word. The second feature, multiple choice questions, creates a session variable for each option A through D. It then formulates its queries to ChatGPT as multiple choice questions, and ChatGPT responds with the answer.
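Here is a sketch of how the slow-speed behavior could be produced with TwiML: speak the answer in four-word chunks with a three-second pause between them. This is a hypothetical helper, not the exact code in multiple.py.

```python
# Sketch of the "slow response" idea: 3-second pause after every 4th word.
from twilio.twiml.voice_response import VoiceResponse


def say_slowly(text, words_per_chunk=4, pause_seconds=3):
    resp = VoiceResponse()
    words = text.split()
    for i in range(0, len(words), words_per_chunk):
        resp.say(" ".join(words[i:i + words_per_chunk]))
        resp.pause(length=pause_seconds)  # renders <Pause length="3"/>
    return str(resp)
```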
Here is a link to the audio of a call to multiple.py: multipleChoice.wav
Here is the terminal output of multiple.py
Please note that my selection for the response speed was “Slow Response,” which the chatbot mistook for “croissants.” This mix-up indicates that the app still needs improvement. In the future I could prompt the end user to press 1 for fast and press 2 for slow.
Next comes the question of how to monetize the app and give users personalized access. To solve this problem we look at a program called passcode.py. This program uses Twilio and requires a passcode before a user can access the app over the phone. We now hear a call in which I intentionally enter an incorrect passcode followed by the correct passcode of 1337: cut_passcode.wav
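The sketch below shows one way passcode.py might collect the passcode with keypad (DTMF) input. The route names and prompts are my assumptions; the 1337 passcode comes from the call above.

```python
# Sketch of a DTMF passcode gate for passcode.py (assumed routes and prompts).
from flask import Flask, request
from twilio.twiml.voice_response import VoiceResponse, Gather

app = Flask(__name__)
PASSCODE = "1337"


@app.route("/voice", methods=["POST"])
def voice():
    resp = VoiceResponse()
    gather = Gather(input="dtmf", num_digits=4, action="/check-passcode", method="POST")
    gather.say("Please enter your four digit passcode.")
    resp.append(gather)
    return str(resp)


@app.route("/check-passcode", methods=["POST"])
def check_passcode():
    # Twilio posts the keys pressed as Digits.
    digits = request.form.get("Digits", "")
    resp = VoiceResponse()
    if digits == PASSCODE:
        resp.say("Passcode accepted. What would you like to ask ChatGPT?")
        resp.redirect("/ask")  # hand off to the chatbot flow
    else:
        resp.say("Incorrect passcode. Please try again.")
        resp.redirect("/voice")
    return str(resp)
```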
Below we can see terminal output of an incorrect passcode being entered, followed by the correct passcode of 1337, followed by a slow-speed answer to a question asking for the first 10 elements of the periodic table.
Now that we have a working passcode, the question comes up of how to cover compute and Twilio costs. To monetize the program, I built a Flask website that can take payments using Stripe. The website starts by asking a user to register with a username and password. It then asks the user to pay 5 dollars through Stripe to reveal the passcode that lets them log into the phone system and ask ChatGPT a question. This program is in the folder called Flask_stripe. Its main logic is in app.py, and the views are in the template folder.
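Here is a rough sketch of how the $5 Stripe Checkout step could look in a Flask route. The route names, product name, and the way the passcode is revealed are my assumptions about app.py, not its actual code; the real app gates the passcode behind login and payment status.

```python
# Sketch of a $5 Stripe Checkout flow that reveals the phone passcode.
import stripe
from flask import Flask, redirect, url_for

app = Flask(__name__)
stripe.api_key = "sk_test_..."  # Stripe test secret key


@app.route("/pay")
def pay():
    checkout = stripe.checkout.Session.create(
        mode="payment",
        line_items=[{
            "price_data": {
                "currency": "usd",
                "product_data": {"name": "TwilioGPT phone passcode"},
                "unit_amount": 500,  # $5.00 in cents
            },
            "quantity": 1,
        }],
        success_url=url_for("show_passcode", _external=True),
        cancel_url=url_for("pay", _external=True),
    )
    return redirect(checkout.url, code=303)


@app.route("/passcode")
def show_passcode():
    # In the real app this would only be shown to logged-in, paid users.
    return "Your passcode is 4247."
```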
This is the screen you see on the homepage of the app in a web browser.
I registered with the username chris_wilson123 and a password of password. I then logged into the app with that username and password, which brought up the next screen, shown below:
I then used the Stripe test credit card number 4242 4242 4242 4242, which is Stripe’s standard test card.
I then receive my passcode after paying.
I will now call the Twilio phone number to test it out. I first enter an incorrect passcode of 1234 and then the correct passcode of 4247. Here is an audio clip of the flask_stripe app: passcode_web.wav
In the next program, twilio_google.py, we look at using Twilio and ChatGPT with the Google Search API. We enter a query into Google about the upcoming Kansas City Chiefs games, then ask ChatGPT to extract the dates, times, and opponents from the returned JSON. The user then hears Twilio’s text-to-speech read the response. Here is a phone call of this app: Chiefs.wav. Please note the long pause while ChatGPT processes the search data.
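A sketch of that search-then-extract idea is below, using Google’s Custom Search JSON API. The API key, search engine ID, query, and prompt wording are placeholders; twilio_google.py may structure this differently.

```python
# Sketch: fetch Google results as JSON, then have ChatGPT extract game details.
import requests
from openai import OpenAI

client = OpenAI()


def upcoming_chiefs_games(api_key, cx):
    # Query the Custom Search JSON API.
    search = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": api_key, "cx": cx, "q": "Kansas City Chiefs upcoming games"},
    ).json()
    snippets = "\n".join(item.get("snippet", "") for item in search.get("items", []))

    # Ask ChatGPT to pull dates, times, and opponents out of the raw snippets.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "From these search snippets, list the dates, times, and "
                       "opponents of the upcoming Kansas City Chiefs games:\n" + snippets,
        }],
    )
    return completion.choices[0].message.content  # read back to the caller via <Say>
```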
Next we see another Twilio app, twilio_upcoming.py. Here we use the Google Calendar API to check upcoming events: check upcoming.wav
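The lookup itself likely follows the standard google-api-python-client pattern, roughly as sketched here; credential handling is omitted and the formatting is my own choice.

```python
# Sketch of listing upcoming Google Calendar events for Twilio to read aloud.
import datetime
from googleapiclient.discovery import build


def upcoming_events(creds, max_results=5):
    service = build("calendar", "v3", credentials=creds)
    now = datetime.datetime.utcnow().isoformat() + "Z"  # RFC3339 timestamp
    events = service.events().list(
        calendarId="primary", timeMin=now, maxResults=max_results,
        singleEvents=True, orderBy="startTime",
    ).execute().get("items", [])

    lines = []
    for event in events:
        start = event["start"].get("dateTime", event["start"].get("date"))
        lines.append(f"{event.get('summary', 'No title')} at {start}")
    return ". ".join(lines)  # a string Twilio can speak with <Say>
```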
Building on the calendar functionality, I wrote a program called better_calendar2.py that allows us to set an upcoming event. This involves one-shot prompting ChatGPT so it knows the format in which to create the calendar event. As we can hear in the call NEW_MEETING.wav, a new meeting with Vivian has been set for August 28th at 1pm.
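Below is a sketch of what that one-shot prompt and event insertion could look like. The example prompt, JSON shape, and time zone are my assumptions, not the exact prompt in better_calendar2.py.

```python
# Sketch: one-shot prompt ChatGPT into a fixed JSON event format, then insert it.
import json
from googleapiclient.discovery import build
from openai import OpenAI

client = OpenAI()

ONE_SHOT = (
    'Turn the request into JSON like this example:\n'
    'Request: "lunch with Sam on July 4th at noon for one hour"\n'
    '{"summary": "Lunch with Sam", "start": "2024-07-04T12:00:00", '
    '"end": "2024-07-04T13:00:00"}\n'
    'Reply with JSON only.'
)


def create_event(creds, spoken_request):
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": ONE_SHOT},
                  {"role": "user", "content": spoken_request}],
    )
    event = json.loads(completion.choices[0].message.content)
    body = {
        "summary": event["summary"],
        "start": {"dateTime": event["start"], "timeZone": "America/Chicago"},
        "end": {"dateTime": event["end"], "timeZone": "America/Chicago"},
    }
    service = build("calendar", "v3", credentials=creds)
    return service.events().insert(calendarId="primary", body=body).execute()
```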
In the next program, flash_lights.py, I use a Raspberry Pi and the sound recognition library to flash lights on the Raspberry Pi: lights.mp4. Please note that I have also done this with Twilio; I used it both to turn on lights and to spin a chicken feeder. I did not take a video of the Twilio chicken feeder because that would have required using two phones. I did, however, also hook the Raspberry Pi up to Twitch: when people donate bits, the chicken feeder motor spins. I did this with Twilio as well but do not have a video. Here is a video of the Raspberry Pi spinning the feeder for a Twitch donation: Vince spin.mp4. Here is a link to the finished chicken feeder Twitch app: https://github.com/jackparsons93/chicken_feeder/blob/main/app.py
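For context, here is a rough sketch of that pattern, assuming the SpeechRecognition library and RPi.GPIO; the pin number, trigger word, and recognizer are my guesses rather than the actual flash_lights.py code.

```python
# Sketch: listen for a voice command and flash an LED on a Raspberry Pi GPIO pin.
import time
import RPi.GPIO as GPIO
import speech_recognition as sr

LED_PIN = 18  # assumed BCM pin wired to the LED
GPIO.setmode(GPIO.BCM)
GPIO.setup(LED_PIN, GPIO.OUT)

recognizer = sr.Recognizer()
with sr.Microphone() as mic:
    while True:
        audio = recognizer.listen(mic)
        try:
            heard = recognizer.recognize_google(audio).lower()
        except sr.UnknownValueError:
            continue  # nothing intelligible was heard
        if "lights" in heard:
            # Flash the LED a few times when the trigger word is heard.
            for _ in range(3):
                GPIO.output(LED_PIN, GPIO.HIGH)
                time.sleep(0.5)
                GPIO.output(LED_PIN, GPIO.LOW)
                time.sleep(0.5)
```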
Next we move on to some classic NLP topics, such as sentiment analysis, in a program called twilio_vader.py. I used Twilio speech-to-text to observe whether the caller is feeling good or not. If there are indications that the caller is unhappy, I route them to an actual person. Here’s a call that simulates an unhappy customer who should be helped by an actual human rather than a phone system: transfer.wav
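A sketch of that routing step is below, using the vaderSentiment package’s compound score to decide when to dial a human. The threshold, phone number, and route name are placeholders, not the values in twilio_vader.py.

```python
# Sketch: score the caller's transcribed speech with VADER and transfer if negative.
from flask import Flask, request
from twilio.twiml.voice_response import VoiceResponse
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

app = Flask(__name__)
analyzer = SentimentIntensityAnalyzer()
AGENT_NUMBER = "+15551234567"  # placeholder number for a human agent


@app.route("/sentiment", methods=["POST"])
def route_by_sentiment():
    # Twilio posts the caller's transcribed speech as SpeechResult.
    text = request.form.get("SpeechResult", "")
    score = analyzer.polarity_scores(text)["compound"]  # -1 (negative) to +1 (positive)
    resp = VoiceResponse()
    if score < -0.3:
        resp.say("I'm sorry to hear that. Let me transfer you to a person.")
        resp.dial(AGENT_NUMBER)
    else:
        resp.say("Glad to hear things are going well. How else can I help?")
    return str(resp)
```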
Future work
Where to go from here? In the future I could make my own low-level C implementation using Asterisk and potentially build my own VoIP server featuring my own AI voices. To build a VoIP server, I would have to use Asterisk, a C program designed for Linux, together with the PJSIP and PJSUA C libraries. Doing that may entail finding a workaround for the problem with Apple's Silicon chips that currently prevents me from compiling these libraries on ARM architecture chips.
Another thing I could consider in future iterations of this project is making my own text-to-speech library using the Mozilla Common Voice dataset. That will also require overcoming obstacles, specifically the dimension mismatches in my TensorFlow code that I’ve run into when trying this before.
Chris Wilson