Chapter 7








To demonstrate the importance of the auditory sense, try watching the television news with the sound turned off. Then try it again with the sound on, but without looking at the picture. Next, try a similar experiment with a videotape of the movie 2001. You will probably find it easier to follow the stories with your ears alone than with your eyes alone. This happens even though our eyes transmit much more information to our brains than our ears do: about fifty billion bits per second from both eyes, versus approximately a million bits per second from both ears. The result is surprising. There is a saying that a picture is worth a thousand words, yet this exercise illustrates the superior power of spoken language to convey our thoughts. Part of that power lies in the close link between verbal language and conscious thinking. Until recently, a popular theory held that thinking was subvocalized speech. (J. B. Watson, the founder of behaviorism, attached great importance to the small movements of the tongue and larynx made while we think.) Although we now recognize that thoughts incorporate both language and visual images, the auditory sense is widely accepted as crucial to the acquisition of knowledge, the very knowledge we need in order to recognize speech in the first place.

Yet many people consider blindness a more serious handicap than deafness. On careful consideration, this view proves to be a misconception. With modern mobility techniques, blind persons with appropriate training have little difficulty going from place to place. The blind employees of my first company (Kurzweil Computer Products, Inc., which developed the Kurzweil Reading Machine for the Blind) routinely traveled around the world. Reading machines can provide access to the world of print, and visually impaired people experience few barriers to communicating with others, whether in groups or in individual encounters. For the deaf, however, the barrier to understanding what other people are saying is fundamental.

We learn to understand and produce spoken language during our first year of life, years before we can understand or create written language. HAL apparently spent years learning human speech by listening to his teacher, whom he identifies as Mr. Langley, at the HAL lab in Urbana, Illinois. Studies with humans have shown that groups of people solve problems dramatically faster when they can communicate verbally than when they are restricted to other methods. HAL and his human colleagues amply demonstrate this finding. Intelligent machines that understand spoken language therefore make possible an optimal mode of communication. In recent years, a major goal of artificial intelligence research has been to make our interactions with computers more natural and intuitive. HAL's primarily verbal communication with the crew members is a clear example of an intuitive user interface.


The Roots of Automatic Speech Recognition (ASR)

With our five lessons about creating speech-recognition systems in mind, it is interesting to examine historical attempts to endow machines with the ability to understand human speech. The effort goes back to Alexander Graham Bell, and the roots of the story go back even farther, to Bell's grandfather, Alexander Bell, a widely known lecturer and speech teacher. The elder Bell's son, Alexander Melville Bell, created a phonetic system called visible speech for teaching the deaf to speak. At the age of twenty-four, Alexander Graham Bell began teaching his father's system of visible speech to instructors of the deaf in Boston. He fell in love with, and later married, one of his students, Mabel Hubbard, who had been deaf since the age of four as a result of scarlet fever. The marriage deepened his commitment to applying his inventiveness to overcoming the handicaps of deafness.

He built a device he called a phonautograph to make visual patterns from sound. He attached a thin stylus to an eardrum he had obtained from a medical school; speaking at the eardrum made it vibrate, and the stylus traced the resulting patterns on a smoked glass screen. His wife, however, was unable to understand speech by looking at these patterns. The device could convert speech sounds into pictures, but the pictures were highly variable and showed no consistent pattern, even when the same person spoke the same word.

