From Google Home to Apple’s upcoming HomePod, smart assistants are popping up left, right and centre. As technology becomes more and more embedded within our lives, and our relationship with it becomes increasingly intimate, we must stop and ask some rather uncomfortable questions.
But first, we need to know where it all began. The IBM Shoebox, introduced in 1961, is widely regarded as the first speech recognition system. It could understand 16 spoken words: the digits 0 to 9 and six command words used to perform arithmetic. All of this happened 56 years ago, 20 years before IBM’s first personal computer in 1981.
On April 21, 1962, IBM engineer William Dersch presented the Shoebox to the general public at the Seattle World’s Fair:
Progress was slow after this, and the next major advance came in the 1970s at Carnegie Mellon University. In 1976, researcher Raj Reddy led a group, funded by the US Department of Defense, to explore speech recognition further. Together, they created a sequence of four speech recognition systems: Hearsay, Dragon, Harpy, and Sphinx I/II. Hearsay-I was one of the first systems capable of continuous speech recognition, and the Dragon system later developed into Dragon NaturallySpeaking, software that is still used today for dictation.
In 1990, almost 30 years after the IBM Shoebox, Dragon Systems released the first consumer speech recognition software, Dragon Dictate, for just $9,000 per person. Seven years later, the much-improved Dragon NaturallySpeaking arrived. The application recognized continuous speech, so you could speak, well, naturally, at about 100 words per minute. However, you had to train the program for 45 minutes, and it was still fairly expensive at $695.
Here’s a clip of the first Dragon Dictate software being used, on a show called Computer Chronicles in 1990:
What I would consider the first voice assistant came about in 1996: VAL from BellSouth. VAL was a dial-in interactive voice recognition system that was supposed to give you information based on what you said on the phone. It paved the way for all the inaccurate voice-activated menus that would plague callers for the next 20 years and beyond.
No significant, world-changing discoveries were made in voice technologies throughout the 2000s. However, other technologies were developing rapidly, and we managed to go from the Nokia brick in 2000 to the iPhone 4 in 2010. The iPhone 4S, which introduced Siri in 2011, was the next massive leap in voice assistant technologies, and Siri was the first ‘assistanty’ assistant.
While there were some precursors to Siri, such as Google’s Voice Search, Siri was the first voice technology marketed as if it could replace a real personal assistant. It draws everything it knows about you from all the data on your phone and iCloud accounts to generate a contextual reply, and it responds to your voice input with personality.
It’s not just fun but funny. For example, if you tell Siri you want to hide a body, it helpfully volunteers nearby dumps and metal foundries. If you ask it the meaning of life, it’ll tell you the answer’s 42.
Speech recognition went from a functional utility to entertainment.
The child is all grown up.
And that brings us right up to now. Smart assistants took the slightly unexpected turn of becoming little speakers all around our homes.
They’re no longer there just to help us google something when we can’t be bothered to type; they’re here to wake us up and turn on our lights in the morning, or to reorder our coffee or washing-up liquid on Amazon Prime so we don’t need to worry about these things anymore.
They’re here to manage our lives so that we don’t have to.
This blog post is an adapted version of a talk I gave at PyCon 2017. You can watch the talk here.