Recent advances in speech synthesis will enable the development of high-quality natural voice systems with broad application in education, business, entertainment, and medicine. Text to Speech Synthesis is the first book to comprehensively document these new research trends and paradigms, balancing coverage of research and applications. It brings together seminal research by leaders in the field, drawn from both academic and industrial laboratories worldwide.
The authors and editors offer broad coverage of several key areas, including new unit selection approaches, speech representations and modeling, data-driven synthesis schemes, and expressive speech synthesis.
Coverage includes:
This is an essential resource for all researchers working in speech synthesis and related areas such as multimedia signal processing, linguistics, and spoken user interfaces. It will also be valuable to any engineer, developer, or manager who must evaluate the latest speech technologies or integrate them into practical applications.
Dr. Shrikanth Narayanan is associate professor at the Signal and Image Processing Institute of USC's Electrical Engineering Department. He founded and directs USC's Speech Analysis and Interpretation Laboratory, and serves as research area director of the Integrated Media Systems Center, an NSF Engineering Research Center. He is associate editor of IEEE Transactions on Speech and Audio Processing, serves on the speech communication technical committee of the Acoustical Society of America, and was previously Principal Member of Technical Staff at AT&T Laboratories.
Dr. Abeer Alwan, a professor of electrical engineering at UCLA, established and directs the Speech Processing and Auditory Perception Laboratory there. Her research interests include modeling human speech production and perception mechanisms and applying these models to speech-processing applications such as noise-robust automatic speech recognition, compression, and synthesis. She is a Fellow of the Acoustical Society of America and recently served as editor-in-chief of the journal Speech Communication.
Most Helpful Customer Reviews
1 of 2 people found the following review helpful:
4.0 out of 5 stars
Compare TTS to ASR
This review is from: Text to Speech Synthesis: New Paradigms and Advances (Prentice Hall Imsc Press Multimedia Series) (Hardcover)
The field of TTS has been steadily improving, but it is still not perfect. If you listen to an extended stretch of TTS audio, you are unlikely to mistake it for a recording of a single human speaker. Here, the editors provide a set of research papers that map out the boundary of TTS.
What I found most interesting was the chapter comparing TTS with Automatic Speech Recognition (ASR). ASR is a much harder problem, especially if you want speaker independence, and its input audio can be noisy. TTS, by contrast, is effectively noise-free: the input text is always precisely known. But the chapter points out an ironic difference that is somewhat of a mirror image. ASR accuracy can be measured easily and objectively by comparing the ASR's output text with the text transcribed by a human listener, whereas the "goodness" of TTS audio output is judged very subjectively. This is one major unsolved TTS problem. The chapter goes into some of the ASR methods that have been brought successfully into TTS research. Most notable is the use of Hidden Markov Models. In ASR work, this was perhaps the biggest innovation of the last 10 years. It also shows some preliminary promise for TTS.
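To make the ASR side of that comparison concrete, here is a minimal sketch of how a recognizer's output might be scored against a human transcript. The review does not name a metric; word error rate (word-level edit distance divided by the number of reference words) is assumed here because it is the usual choice, and the function name `word_error_rate` is purely illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    `reference` is the human transcript, `hypothesis` is the ASR output.
    Both names are illustrative assumptions, not taken from the book.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "the cat sat on the mat"
    asr = "the cat sat on a mat"
    print(f"WER: {word_error_rate(human, asr):.3f}")
```

Nothing this simple exists for TTS, which is the reviewer's point: there is no reference string to compare synthesized audio against, so quality is usually judged by human listeners.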