The Voice in the Machine: Building Computers That Understand Speech Hardcover – March 23, 2012
With the explosive growth in speech applications on Android, iPhone and
other devices, The Voice in the Machine is a timely read. It
relates the 50+ year quest to develop voice recognition and synthesis, explains how
the technologies work, and contains enough anecdotes to make it fun.
There are many books on speech technology, but this is the first to
explain the technology against a backdrop of the broader forces that have shaped the
field. This will become a must-read text for those interested in what speech
technology is and how it has developed.
Roberto Pieraccini's fascinating book takes us on a tour of human speech,
modern techniques for speech understanding and generation, and the problems of
deploying it in real industrial applications. By using examples, he conveys the
essence of modern statistical speech processing without resorting to mathematics.
This book is both entertaining and educational, and highly recommended.
This is a fascinating tour of the development of modern speech
technologies and applications…A wonderful historical account of the growth of
About the Author
I have worked in speech technology research and business for more than 30 years. Prior to joining ICSI, I was the Chief Technology Officer of SpeechCycle, a company specializing in advanced spoken human-machine interaction systems for enterprise customer care (yes, those annoying "please tell me the reason you are calling about" computers that prevent you from talking to human operators when you need them). Trying to make those annoying computers better, I led an effort to develop new technology to make those computers learn from their own mistakes and, hopefully, improve.
Before SpeechCycle, around 2003-2005, I managed a speech research team at IBM T.J. Watson Research in Yorktown Heights, NY, and prior to that, between 1999 and 2003, I was at SpeechWorks International, which is now known as Nuance, today's largest worldwide computer speech company.
The turning point in my computer speech research career was when I joined AT&T Bell Laboratories (which then became AT&T Shannon Laboratories) in 1988, where I worked with some of the most influential scientists in computer speech, such as Larry Rabiner. I arrived at Bell Laboratories from Italy, where in the 1980s I was a researcher at CSELT, the laboratories of the national Italian telephone company.
During all this time, I wrote, as an author or co-author, about 150 scientific papers and articles in the fields of speech recognition, spoken language understanding and dialog, multimodal interaction, and machine learning. I am best known for my original contributions to statistical methods for spoken language understanding and machine learning for spoken dialog systems.
My book "The Voice in the Machine" on the history of computer speech understanding technology, published by MIT Press, tells the story of 60 years of computer speech technology, in a way that is accessible to general scientific readers.
Top Customer Reviews
Mostly about developments in the speech recognition field (for completeness, Pieraccini has one chapter on Text-to-Speech), it's a very well-written, comprehensive survey of the history and current developments in speech technology.
It covers everything from the earliest attempts, through all the government-sponsored ARPA speech recognition challenges, to recent commercial deployments. The book would well serve as a reference for a college course or just for leisure reading: it's the best example I've ever seen of a book that explains concepts behind complex math, intuitively, without using a single equation. Roberto's writing style could almost be called poetic. It definitely conveys the passion behind the science. You must get this!
The author starts out by describing in convincing detail why human speech is so complex and difficult to understand, and to recreate in a lab or a commercial setting. He then goes on to describe early attempts inspired by AI, eventually arriving at statistical approaches that are the basis of most modern speech processing systems.
I like the book in its broad coverage, and while I do realize that the book is not aimed at techies, I'd have appreciated a little more coverage of HMMs and EM.
At a handful of places, there are some editing oversights that are simply disappointing for a book from a writer of this caliber (Ch. 5: "...De Mori, who pursued a brilliant carrier first at McGill..." -- career, not carrier).
Nonetheless, the book is a good read for someone interested in this technology.
I see two main categories of people who might gain great advantage from reading this book. The first are those not involved in the evolution of speech technologies; the second are the insiders, who were involved either in research or at any level, even non-technical, in the speech industry. For the former, the book explains how a complex technology evolves in reality, with all the roadblocks, turns, and steep paths, while the author puts all his effort into explaining very complex engineering problems without formulas or technicalities, using simple and enlightening analogies and examples. The book will help them understand what is behind Siri, Google Voice, or any other speaking machine. For the latter, the professionals of the voice science and industry, it is very interesting to see how the author assembles a map of the past and current technology, the motivations and the forces behind it, and shows how all the pieces fit together in a technological landscape of the area in which they are currently engaged. For them it is like stepping out for a minute to gain a vantage point and different points of view.
I belong to the second category because I spent 20 years in R&D in the research lab in Italy where Roberto Pieraccini took his first steps, and then I was deeply involved in the newborn speech industry.
Most Recent Customer Reviews
A very good high-level overview. The earlier parts of the book provide more implementation details than the later parts, which tend to gloss over the details in favor of recounting...
Published 8 months ago by Brandon Fosdick