The Voice in the Machine: Building Computers That Understand Speech (MIT Press) Hardcover – March 23, 2012
With the explosive growth in speech applications on Android, iPhone and other devices, The Voice in the Machine is a timely read. It relates the 50+ year quest to develop voice recognition and synthesis, explains how the technologies work, and contains enough anecdotes to make it fun. (Alfred Z. Spector, Vice President of Research, Google, Inc.)
There are many books on speech technology, but this is the first to explain the technology against a backdrop of the broader forces that have shaped the field. This will become a must-read text for those interested in what speech technology is and how it has developed. (Robert Dale, Centre for Language Technology, Macquarie University)
Roberto Pieraccini's fascinating book takes us on a tour of human speech, modern techniques for speech understanding and generation, and the problems of deploying it in real industrial applications. By using examples, he conveys the essence of modern statistical speech processing without resorting to mathematics. This book is both entertaining and educational, and highly recommended. (Steve Young, Professor of Information Engineering, University of Cambridge)
This is a fascinating tour of the development of modern speech technologies and applications… A wonderful historical account of the growth of speech technology. (Choice)
About the Author
Roberto Pieraccini, Director of ICSI, the International Computer Science Institute in Berkeley, California, has been active for more than thirty years in speech research and technology.
Top customer reviews
The book is mostly about developments in the speech recognition field (for completeness, Pieraccini includes one chapter on text-to-speech), and it's a very well-written, comprehensive survey of the history and current state of speech technology.
It covers everything from the earliest attempts, through all the government-sponsored ARPA speech recognition challenges, to recent commercial deployments. The book would serve well as a reference for a college course or just as leisure reading: it's the best example I've ever seen of a book that explains the concepts behind complex math intuitively, without using a single equation. Roberto's writing style could almost be called poetic. It definitely conveys the passion behind the science. You must get this!
The author starts out by describing in convincing detail why human speech is so complex and difficult to understand, and to recreate in a lab or a commercial setting. He then goes on to describe early attempts inspired by AI, eventually arriving at statistical approaches that are the basis of most modern speech processing systems.
I like the book's broad coverage, and while I realize that the book is not aimed at techies, I'd have appreciated a little more coverage of HMMs (hidden Markov models) and EM (expectation-maximization).
In a handful of places, there are editing oversights that are simply disappointing in a book by a writer of this caliber (Ch. 5: "...De Mori, who pursued a brilliant carrier first at McGill..." -- career, not carrier).
Nonetheless, the book is a good read for someone interested in this technology.
I see two main categories of people who could benefit greatly from reading this book. The first are outsiders, who were not involved in the evolution of speech technologies; the second are insiders, who were involved either in research or at any level, even non-technical, in the speech industry. For the former, the book explains how a complex technology actually evolves, with all its roadblocks, turns, and steep paths, and the author puts great effort into explaining very complex engineering problems without formulas or technicalities, using simple and enlightening analogies and examples instead. The book will help them understand what is behind Siri, Google Voice, or any other speaking machine. For the latter, the professionals of voice science and industry, it is very interesting to see how the author assembles a map of past and current technology and the motivations and forces behind it, and shows how all the pieces fit together in the technological landscape of the area in which they currently work. For them, it is like stepping out for a minute to gain a vantage point and a different perspective.
I belong to the second category: I spent 20 years in R&D at the research lab in Italy where Roberto Pieraccini took his first steps, and then I was deeply involved in the newborn speech industry.
One last piece of advice for readers who would like to move from the author's examples to more technical reading: I found the Notes section very interesting, like a book inside the book. You might read it from top to bottom, and you will find formulas, pointers to the literature, and complementary thoughts there.
Now I'll eagerly await a continuation from Roberto Pieraccini that looks forward instead of backward, but I strongly suggest reading this marvelous book now.
Be aware that the target audience is NEITHER
- people who already understand computer speech technology (unless perhaps they want to learn some history) OR
- the intellectually lazy. This is a difficult subject, and to get the most out of it, you will occasionally have to close the book and think about what you have just read.
But assuming you are in this target audience (you're an engineer in another field, a physicist, an astronomer, basically someone curious about the world around you) and want to learn the basic history, ideas, successes, and failures of computer speech understanding, I have never come across a book nearly as good as this one.
I only wish there were a comparable book in similar fields such as computer vision or computer translation.