Hardware Used:
A “Car-puter”: a mini-ITX motherboard with a power supply that can run on 12 volts.
The “Car-puter” is housed in a hardened case, and the CPU chip is soldered to the board for robustness.
The motherboard carries an Intel Atom D2500 CPU (1.86 GHz, 2 cores) and 4096 MB of RAM. [#1]
External software installed:
Windows 7, MS-SQL server, Visual Studio development environment, Nuance Dragon Speech Recognition [#2]
How it works - Process flow:
1. When you speak into the microphone, Dragon converts it to text and inserts the text into a text box on the UI form.
2. When the chat-bot program detects the new text, it cleans the text by removing some punctuation characters (commas are kept; apostrophes are converted to ASCII 96, hex 60). After cleaning, the text is essentially a string of words.
3. The database is queried to see if the string of words is a command. A command executes separate code to complete its task. For example, “What time is it?” is a command that gets the time from the system clock, converts the time into words, and sends the words to the TTS system. If the input is NOT a command, the process continues with step 4 below.
4. The string of words is “Tokenized” (divided into individual words and stored in an array).
5. A list of N-Grams [#3] is created. If the input is 4 words, this will generate 10 N-Grams [#4]
6. Each word is also looked up in a data table to find “Input Extra Words.” The “Extra Words” handle synonyms and word classifications. If the input contains the word “Mother,” the extra words would contain synonyms like “Ma” and “Mom” and a classification like “female parent.” Similarly, words like “red,” “blue,” “green,” and “yellow” all carry the classification “color.”
7. All N-Grams and “input extra words” are used to build SQL queries that retrieve data rows from a table of potential replies. The result is a table holding the subset of potential replies that contain words from the original input. Some of the words in the N-Grams and “extra words” are “Stemmed” [#5] before the SQL queries are built.
8. The table of potential replies related to the initial input must now be scanned [#7], and various properties of each reply are awarded points. Each point value is multiplied by an adjustable amount, and the “Adjusted Points” are then totaled. The table is sorted by total points, and the potential reply with the highest total is sent to the Text-To-Speech code. My robot can use either Windows TTS or Dragon TTS. Each potential reply also has a list of “Output Extra Words” that are generated when the bot is “Dreaming” [#6].
9. The properties of the “input words” and each “potential reply” that are used to award the points are as follows:
A. The longest word in the input is also in the potential reply
B. The number of words in the matching N-Gram
C. The number of words in the input that are also in the potential reply (or output)
D. The number of words in the input that are also in the “output extra words” (The “output extra words” is the same as the “Potential reply extra words”)
E. The number of words in the potential reply that are also in the “input extra words”
F. The number of words in the “input extra words” that are also in the “output extra words”
G. The “Last Spoken Point Penalty” generates NEGATIVE points for a reply that has been spoken recently. This causes the bot to choose a different output, if one is available, so that it does not repeat itself. When a potential reply is chosen and spoken, the reply’s row in the potential reply table is updated with a “last spoken” timestamp. When points are awarded to each potential reply (Step 8, above), the NEGATIVE points lower the point total, which reduces the chance that the reply will be spoken again.
10. The potential reply with the highest number of points is sent to the Text-To-Speech code and spoken.
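The scoring in steps 8 to 10 can be sketched in a few lines. The sketch below is in Python rather than the VB.Net the bot actually uses, and everything specific in it is an assumption made for illustration: the multiplier values in WEIGHTS, the 5-minute RECENT window, the dictionary keys, and the sample replies. The source only says that each property is multiplied by an adjustable amount and that recently spoken replies are penalized.

```python
from datetime import datetime, timedelta

# Hypothetical multipliers, one per property A to G. The source only says
# each point is multiplied by an "adjustable amount"; these values are made up.
WEIGHTS = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 0.5, "E": 0.5, "F": 0.25, "G": 10.0}
RECENT = timedelta(minutes=5)  # assumed meaning of "spoken recently"


def score(input_words, input_extra, reply, now):
    """Total the adjusted points for one potential reply."""
    inp = set(input_words)
    extra = set(input_extra)
    reply_words = set(reply["text"].lower().split())
    output_extra = set(reply["extra"])
    longest = max(input_words, key=len)  # assumes at least one input word
    last = reply["last_spoken"]
    spoken_recently = last is not None and now - last < RECENT
    points = {
        "A": 1 if longest in reply_words else 0,  # longest input word is in the reply
        "B": reply["ngram_len"],                  # words in the matching N-Gram
        "C": len(inp & reply_words),              # input words also in the reply
        "D": len(inp & output_extra),             # input words in the output extra words
        "E": len(reply_words & extra),            # reply words in the input extra words
        "F": len(extra & output_extra),           # shared extra words
        "G": -1 if spoken_recently else 0,        # last-spoken penalty
    }
    return sum(WEIGHTS[key] * value for key, value in points.items())


def best_reply(input_words, input_extra, candidates, now):
    """Return the candidate with the highest total (same result as sort-and-take-first)."""
    return max(candidates, key=lambda r: score(input_words, input_extra, r, now))


now = datetime(2024, 1, 1, 12, 0)  # fixed timestamp so the example is repeatable
words = ["my", "dog", "eats", "steak"]
extra_words = ["pet", "animal", "meat"]
candidates = [
    {"text": "my dog loves steak", "extra": ["pet", "meat"], "ngram_len": 2,
     "last_spoken": now - timedelta(minutes=1)},
    {"text": "dogs love steak", "extra": ["pet", "meat"], "ngram_len": 1,
     "last_spoken": None},
]
print(best_reply(words, extra_words, candidates, now)["text"])  # dogs love steak
```

With these made-up weights, the closer match ("my dog loves steak") loses because it was spoken a minute earlier. Once the assumed 5-minute window has passed, it wins again.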
====
The “Common Output Table”:
The input “Hello” is a command, and it is mapped to a function named “greeting.” The “greeting” function looks up a list of common phrases that could be used as a reply to “Hello”
Sample Common Output Table:
Row Id   Phrase Type   Output
1        Greeting      Hi
2        Greeting      Howdy
3        Greeting      Hello
45       Valediction   Bye bye
56       Valediction   Good Bye
99       Valediction   See you later
101      Sorry         I'm sorry
101      Sorry         I apologize
A typical SQL query would look like “Select * from CommonOutputTable where PhraseType = ‘Greeting’”, which produces a numbered list of “Hi, Howdy, Hello.”
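As a rough illustration of that lookup, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for MS-SQL. The column names (Id, PhraseType, Output) and the phrases() helper are assumptions based on the sample table, not the bot's real schema or code.

```python
import sqlite3

# In-memory SQLite database standing in for the bot's MS-SQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CommonOutputTable (Id INTEGER, PhraseType TEXT, Output TEXT)"
)
conn.executemany(
    "INSERT INTO CommonOutputTable VALUES (?, ?, ?)",
    [
        (1, "Greeting", "Hi"),
        (2, "Greeting", "Howdy"),
        (3, "Greeting", "Hello"),
        (45, "Valediction", "Bye bye"),
    ],
)


def phrases(conn, phrase_type):
    """Return every output phrase of one type, in row-id order."""
    cur = conn.execute(
        "SELECT Output FROM CommonOutputTable WHERE PhraseType = ? ORDER BY Id",
        (phrase_type,),
    )
    return [row[0] for row in cur.fetchall()]


print(phrases(conn, "Greeting"))  # ['Hi', 'Howdy', 'Hello']
```

The phrase type is passed as a query parameter instead of being pasted into the SQL string, which avoids quoting problems if a phrase type ever contains an apostrophe.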
Most programming languages provide a “Random Number Generator” (RNG) function. However, nothing prevents the RNG from generating the same number several times in a row, e.g., 0,3,2,2,2,0,1 (2 is repeated), so I built two “enhanced” random functions: “RandomNoRepeat()” and “CardDeck()”.
RandomNoRepeat() stores its random number in a global variable. If the same number is picked again, a different random number is picked.
CardDeck() generates a list of any number of values (cards). This list is randomized (shuffled) and stored in a global variable. When a random number is needed, the top card is pulled from the deck, and its value is used. After all cards have been drawn, the card deck is re-shuffled. Either of these functions can be used to pick an appropriate reply from the common output table without repeating the output.
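A minimal Python sketch of both ideas follows. It is not the bot's VB.Net code: the class layout, method names, and the optional rng parameter are assumptions, and state is kept on an object here instead of in a global variable.

```python
import random


class RandomNoRepeat:
    """Pick a random index that differs from the previous pick."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.last = None

    def next(self, count):
        if count < 2:
            self.last = 0
            return 0  # only one choice, so a repeat cannot be avoided
        pick = self.rng.randrange(count)
        while pick == self.last:  # same number as last time: pick again
            pick = self.rng.randrange(count)
        self.last = pick
        return pick


class CardDeck:
    """Deal the values 0..count-1 in shuffled order, reshuffling when empty."""

    def __init__(self, count, rng=None):
        self.count = count
        self.rng = rng if rng is not None else random.Random()
        self.cards = []

    def draw(self):
        if not self.cards:  # deck exhausted (or never built): reshuffle
            self.cards = list(range(self.count))
            self.rng.shuffle(self.cards)
        return self.cards.pop()  # take the top card


greetings = ["Hi", "Howdy", "Hello"]
deck = CardDeck(len(greetings))
print([greetings[deck.draw()] for _ in range(3)])  # each greeting once, random order
```

One difference is worth noting. The deck guarantees that every value is used once before any value repeats, but the last card of one shuffle can match the first card of the next. RandomNoRepeat never repeats back to back, but it can leave a value unused for a long time.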
The bot can be put into “Learn Mode” where spoken input is stored as a potential reply. Thus, the chat-bot can learn new things simply by being spoken to.
The application contains forms for maintaining the Potential Replies table, the Common Output table, and several other tables.
Notes:
#1) The system could run faster with a faster CPU or with additional CPU cores.
#2) Win7 did not come with speech recognition. Dragon could be eliminated if Win 10 is used.
#3) An N-Gram is a group of words, with N signifying how many words are in the group. For example, a 3-gram is a group of 3 words, a 2-gram contains 2 words, and a 1-gram is a single word.
#4) With the input of 4 words (“My dog eats steak”) the following 10 N-Grams are generated:
• “My dog eats steak”
• “My dog eats”
• “My dog”
• “My”
• “dog eats steak”
• “dog eats”
• “dog”
• “eats steak”
• “eats”
• “steak”
The more words there are in the input, the more N-Grams are generated, which means more SQL queries, which means more potential replies to scan through, which means SLOWER response times.
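The list above is every contiguous run of words in the input, so an input of n words yields n(n+1)/2 N-Grams (4 words give 10, 8 words give 36). A short Python sketch of one way to generate them is shown here; the function names are illustrative, and tokenizing by whitespace is an assumption about text that has already been cleaned.

```python
def tokenize(text):
    """Split a cleaned input string into individual words."""
    return text.split()


def all_ngrams(words):
    """Return every contiguous word group, longest first for each start word."""
    ngrams = []
    for start in range(len(words)):
        for end in range(len(words), start, -1):
            ngrams.append(" ".join(words[start:end]))
    return ngrams


grams = all_ngrams(tokenize("My dog eats steak"))
print(len(grams))  # 10
for gram in grams:
    print(gram)
```

This prints the N-Grams in the same order as the list above, starting with “My dog eats steak” and ending with “steak”.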
#5) “Stemming” is the process of converting a word that contains a suffix to its root form by removing the suffix, e.g., “Walking” and “Walked” are converted to “Walk”. Some words are more complicated: “Ponies” must be stemmed to “Pony”, “Ate” to “Eat”, and “Ran” to “Run”.
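To make the two cases concrete, here is a deliberately naive Python sketch: a lookup table for irregular words, followed by a few suffix rules. It is not the stemmer the bot uses. The IRREGULAR table, the suffix list, and the length checks are assumptions, and simple rules like these get many words wrong (“running” would become “runn”).

```python
# Hypothetical lookup for irregular forms; a real table would be much larger.
IRREGULAR = {"ate": "eat", "ran": "run"}


def stem(word):
    """Reduce a word to a rough root form (naive; for illustration only)."""
    w = word.lower()
    if w in IRREGULAR:  # irregular words cannot be handled by suffix rules
        return IRREGULAR[w]
    if w.endswith("ies") and len(w) > 4:  # ponies -> pony
        return w[:-3] + "y"
    for suffix in ("ing", "ed"):  # walking, walked -> walk
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w


for word in ["Walking", "Walked", "Ponies", "Ate", "Ran"]:
    print(word, "->", stem(word))
```

The length checks stop short words such as “red” from being cut down to a single letter.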
#6) A separate program named “Dream” is run when the bot is offline (sleeping?). This program generates the “Output Extra Words” for each potential reply. Every time the bot learns a new word OR a new potential reply, the change can affect ALL OTHER POTENTIAL REPLIES, so this work is done offline instead of at “run time” to improve response time.
#7) The scanning takes place in a VB.Net table within the UI, which is also written in VB.Net, and all of the code used to award points is written in VB.Net. It may be possible to convert this code into SQL code to return a response in less time.