Chapter 6













PlainTalk -- Text to Speech

Consequently, when the computer reads a text, it may err in its analysis of hierarchical segmentation and assignment of sentence stress. Since pitch is largely determined by segmentation and stress, incorrect information about these elements can result in unintelligible speech. To minimize the effects of such errors, we limit the range of pitch movement. Although the synthesizer sounds more realistic than people trying to impersonate computers, it still sounds very mechanical. When we can annotate text to specify phrase structure and focus, or generate text with a computer whose range of pitch can expand to match the range of human speakers, synthetic speech will sound better.
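One way to picture limiting the range of pitch movement is as a compression of the pitch contour toward its average value. The following is a minimal sketch under that assumption, not a description of any particular synthesizer: the function name `compress_pitch_range`, the `factor` parameter, and the sample contour values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def compress_pitch_range(contour_hz, factor=0.5):
    """Scale each pitch value toward the contour mean.

    factor = 1.0 leaves the contour unchanged; factor = 0.0 yields a
    monotone at the mean pitch. (Hypothetical helper for illustration.)
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    if not contour_hz:
        return []
    center = mean(contour_hz)
    # Shrinking the distance from the mean shrinks every pitch movement,
    # so a wrongly placed stress produces a smaller audible error.
    return [center + factor * (f0 - center) for f0 in contour_hz]


# Illustrative pitch targets in Hz (made-up values).
contour = [100.0, 140.0, 180.0, 120.0, 110.0]
compressed = compress_pitch_range(contour, factor=0.5)

original_range = max(contour) - min(contour)
compressed_range = max(compressed) - min(compressed)
```

With a factor of 0.5 the overall pitch range is halved. A misplaced accent then stands out less, at the cost of the flatter, more mechanical delivery described above.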

Other causes of the poor quality of synthetic speech arise from our inability to model the duration of the phonemes and the movement of the pitch as accurately as is needed to imitate human speech. More important, we still cannot analyze speech and use the resulting parameters in a way that accurately copies the human sound of the speech. At present, it is difficult to predict when we will solve these problems and build computers that sound like HAL.

Researchers in speech synthesis are now working in an area not portrayed in 2001. In the film, HAL is portrayed as a large machine whose connection to the world is a large red eye. At Bell Labs, we have attached a talking face to our computer, which simultaneously sends the same information to the synthesizer and the talking head. Thus the talking head receives information about the phonemes and their duration and uses the information to compute the appropriate position of its lips, jaw, and tongue. It also moves its eyebrows to enhance the stressed portions of the speech. Although the talking head in the picture is a flat mask, it can be covered by a textured face mask portraying any person you choose. The talking face not only makes the speech synthesizer more attractive and personable, it also enhances the intelligibility of the speech by letting the listener lipread while listening to the computer (cf. chapter 11). If HAL had had a real face, rather than one large eye, would it have been so easy to kill him -- by turning him off? I wonder.
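The data flow described for the talking head, in which phonemes and their durations drive the articulators, can be sketched roughly as follows. This is only an illustration under stated assumptions and not the Bell Labs implementation: the phoneme labels, the `JAW_OPENING` table, its numeric values, and the function `face_keyframes` are all hypothetical, and a real system would also compute lip and tongue positions and eyebrow movement.

```python
# Hypothetical jaw-opening targets (0.0 = closed, 1.0 = fully open).
# The values are invented for illustration only.
JAW_OPENING = {"m": 0.0, "p": 0.0, "b": 0.0, "aa": 1.0, "iy": 0.3, "uw": 0.2}
DEFAULT_OPENING = 0.4  # assumed fallback for phonemes not in the table


def face_keyframes(segments):
    """Turn (phoneme, duration_ms) pairs into face keyframes.

    Returns a list of (start_ms, phoneme, jaw_opening) tuples. The start
    times come from the same durations the synthesizer would use, which
    keeps the face in step with the audio.
    """
    keyframes = []
    start_ms = 0
    for phoneme, duration_ms in segments:
        opening = JAW_OPENING.get(phoneme, DEFAULT_OPENING)
        keyframes.append((start_ms, phoneme, opening))
        start_ms += duration_ms
    return keyframes


# A made-up three-phoneme sequence with durations in milliseconds.
segments = [("m", 80), ("aa", 150), ("m", 70)]
keyframes = face_keyframes(segments)
```

Because the keyframe times are derived from the same phoneme durations that go to the synthesizer, the mouth closes for each "m" exactly when the corresponding sound begins, which is what makes lipreading possible.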


Acknowledgments

I would like to thank my wife, Virginia, and my colleagues Bernd Möbius, Chilin Shih, Richard Sproat, Michael Tanenblatt, Evelyne Tzoukermann, and Jan van Santen for their helpful comments. I would also like to thank my directors, L. R. Rabiner and N. S. Jayant, for their encouragement and support.



