Vocal resonances and broad band excitation

This site introduces a technique developed in this lab for measuring acoustic properties of the vocal tract quickly and non-invasively, during speech or singing. We use it as a research tool, but we have also developed it for use as a speech trainer. (Because we use some technical language, you may wish to read Introduction to vocal tract acoustics before continuing.)

The speech training application uses visual feedback about the vocal tract to help people produce speech sounds in a language they are learning. In principle, one might hope to provide such feedback from the speech sounds alone. However, such feedback is inherently limited in precision and practicality. Even the most advanced speech recognition systems still misrecognise words, which shows the limits of their precision as measures of pronunciation. The basic problem is that the speech signal alone does not contain enough information to allow us to work out, quickly and precisely, the configuration of the vocal tract: the voice has energy only at the harmonics of its pitch frequency, so it samples the acoustic response of the tract only at those widely spaced frequencies. This is not a problem for understanding speech, but it may be a problem for learning precise pronunciation. Our approach is therefore to introduce an additional signal with more information in the frequency domain.

In model experiments using the laboratory prototype, we have shown that one or two hours' training using visual feedback of some key features of the acoustical response of a subject's vocal tract improves the accuracy and intelligibility of pronunciation of foreign phonemes by monolingual adults.

Vocal tract resonance measurements:

We inject into the vocal tract an acoustic current that is synthesised to give high-resolution frequency information over the frequency range of interest. We first calibrate the system by measuring the sound pressure just outside the lips, with the mouth closed (p_closed). We then inject the same acoustic current into the vocal tract in parallel with the external field and again measure the pressure at the lips (p_open). The ratio p_open/p_closed then shows the frequencies of the tract resonances.
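A minimal sketch of the ratio calculation, in Python. It assumes that p_open and p_closed are already available as complex pressure amplitudes at the same list of injected frequencies, and that each resonance shows up as a local maximum of |p_open/p_closed| (as the arrows in the figure below suggest). The function names, the 5 Hz grid and the toy response are hypothetical and chosen only for illustration; a real measurement would need smoothing and a noise threshold.

```python
def resonance_ratio(p_open, p_closed):
    """Magnitude of p_open / p_closed at each injected frequency.

    p_open, p_closed: complex pressure amplitudes at the lips, one per
    frequency component, measured with the mouth open and closed.
    """
    if len(p_open) != len(p_closed):
        raise ValueError("spectra must have the same length")
    return [abs(o) / abs(c) for o, c in zip(p_open, p_closed)]


def local_maxima(freqs, values, min_ratio=1.0):
    """Frequencies where values has a strict local maximum above min_ratio."""
    peaks = []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1] and values[i] > min_ratio:
            peaks.append(freqs[i])
    return peaks


# Hypothetical example: 0 to 3000 Hz in 5 Hz steps, flat calibration,
# and a toy open-mouth response with resonances at 500 Hz and 1500 Hz.
freqs = [5 * k for k in range(601)]


def toy_open(f):
    return 1 + sum(4 / (1 + ((f - fr) / 50) ** 2) for fr in (500, 1500))


p_closed = [1 + 0j for _ in freqs]
p_open = [complex(toy_open(f), 0) for f in freqs]
ratio = resonance_ratio(p_open, p_closed)
print(local_maxima(freqs, ratio))  # [500, 1500]
```

Dividing by the closed-mouth measurement removes the frequency dependence of the source and microphone, so only the effect of the open tract remains in the ratio.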

graph showing voice harmonics and resonances independently.
This measurement made during phonation shows two superposed signals. The p_open/p_closed ratio is the broad band signal, and the arrows show the resonances R1, R2, R3 and R4. The harmonics of the voice are at frequencies f0, 2f0, 3f0 etc., with the 8th harmonic and above hidden by the broad band signal. Notice that the fifth harmonic (5f0) falls close to R2, which gives it a boost in amplitude. (The resonance provides impedance matching between the high impedance of the glottis and the low impedance of the radiation field at the lips.)
The vocal tract behaves approximately as an acoustic duct about 170 mm long, nearly closed by the vocal folds and open at the mouth. A cylinder of length L, closed at one end, has resonances at v/4L, 3v/4L, 5v/4L etc., where v is the speed of sound. (See pipes and harmonics.) For such a cylinder the resonances would fall at frequencies of about 0.5, 1.5, 2.5, 3.5 and 4.5 kHz. The vocal tract shape varies as the lips and tongue are moved in speech. So here, while R1 and R4 fall close to the values for the cylinder, R2 is lower and R3 higher than for a cylinder.
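The quarter-wavelength series for a closed-open cylinder can be checked numerically. This sketch assumes an ideal cylinder with no end correction and v = 343 m/s (air at about 20 °C); in the warm, humid air of the vocal tract v is somewhat higher, which raises every resonance in proportion. The function name and default values are illustrative only.

```python
def closed_open_resonances(length_m, n_modes=5, speed_of_sound=343.0):
    """Resonances of an ideal cylinder closed at one end, open at the other.

    f_n = (2n - 1) * v / (4 L) for n = 1, 2, 3, ...
    No end correction is applied (an assumption of this sketch).
    """
    base = speed_of_sound / (4.0 * length_m)
    return [(2 * n - 1) * base for n in range(1, n_modes + 1)]


# A 170 mm duct, roughly the length of an adult vocal tract.
for f in closed_open_resonances(0.170):
    print(f"{f:.0f} Hz")
```

With L = 0.170 m this prints roughly 504, 1513, 2522, 3531 and 4540 Hz, the 0.5 to 4.5 kHz series quoted above. Only odd multiples of the lowest resonance appear because the closed end is a pressure antinode and the open end a pressure node.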
From the broad band response we can determine the resonances of the vocal tract, independently of the speech signal. The resonant frequencies are interesting for fundamental acoustical phonetic research but, if we extract and display them immediately, they can be used to drive a cursor for speech training. This is how we do it in the real time version.

diagram showing how to extract vocal tract resonances in real time

Schematic diagram. (a) The spectrum of the speech signal alone. This male voice has harmonic partials spaced at the pitch frequency, 126 Hz. (b) The injected signal has frequency components spaced at 5 Hz, whose amplitudes are calibrated (in this case) using the radiation field outside the speaker's mouth. (c) The sum of the speech signal and the broadband signal (including the effects of the resonances) goes from the microphone to the analogue-to-digital converter. The speech signal is used to measure pitch and amplitude; then the harmonic components below 1 kHz are removed. (d) The resonances are detected from the remaining, interpolated signal. Similarly, the broadband signal may be removed to leave just the speech harmonics. In the version of the device used for speech training, the resonance frequencies are used to position the cursor on the vowel plane (see below). Notice that the signal-to-noise ratio in these figures is lower than in the preceding figure. This is a consequence of making the measurements rapidly.
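Step (c) can be illustrated with one simple approach, which is not necessarily the one used in the device: mark every frequency bin within a small window of a voice harmonic below 1 kHz and replace it by linear interpolation between the nearest unaffected bins. The names, the 10 Hz window and the toy spectrum below are hypothetical; the pitch f0 is assumed to have been measured already. Resonances would then be located by peak-picking on the cleaned spectrum.

```python
import bisect


def harmonic_frequencies(f0, f_max):
    """Voice harmonics n*f0 strictly below f_max."""
    if f0 <= 0:
        raise ValueError("f0 must be positive")
    harmonics = []
    n = 1
    while n * f0 < f_max:
        harmonics.append(n * f0)
        n += 1
    return harmonics


def remove_harmonics(freqs, spectrum, f0, f_max=1000.0, half_width=10.0):
    """Replace bins within half_width of a harmonic below f_max by linear
    interpolation between the nearest unaffected bins on either side.

    freqs must be sorted in ascending order; returns a new list.
    """
    harmonics = harmonic_frequencies(f0, f_max)
    masked = [any(abs(f - h) <= half_width for h in harmonics) for f in freqs]
    good = [i for i, m in enumerate(masked) if not m]  # kept bins, ascending
    if not good:
        raise ValueError("every bin is masked")
    cleaned = list(spectrum)
    for i, is_masked in enumerate(masked):
        if not is_masked:
            continue
        k = bisect.bisect_left(good, i)  # good[k] is the first kept bin above i
        left = good[k - 1] if k > 0 else None
        right = good[k] if k < len(good) else None
        if left is None:
            cleaned[i] = spectrum[right]
        elif right is None:
            cleaned[i] = spectrum[left]
        else:
            t = (freqs[i] - freqs[left]) / (freqs[right] - freqs[left])
            cleaned[i] = spectrum[left] + t * (spectrum[right] - spectrum[left])
    return cleaned


# Hypothetical example: a smooth broadband response on a 5 Hz grid plus strong
# harmonics of a 126 Hz voice (spikes within 10 Hz of each harmonic).
freqs = [5.0 * k for k in range(401)]            # 0 to 2000 Hz
broadband = [1.0 + f / 1000.0 for f in freqs]    # toy smooth response
measured = [b + (10.0 if any(abs(f - n * 126.0) <= 10.0 for n in range(1, 16)) else 0.0)
            for f, b in zip(freqs, broadband)]
cleaned = remove_harmonics(freqs, measured, f0=126.0)
```

Because the 5 Hz spacing of the injected signal is much finer than the 126 Hz spacing of the harmonics, most bins are untouched and the gaps to be bridged are only a few bins wide. Harmonics above f_max are deliberately left in place, as in the description above.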

How it looks:

screen dump of real time display

This is a screen dump of the feedback display in the current speech trainer device, set up with targets from Australian English. The background ellipses are measurements of the vowels of 33 Australian men, with the mean values for each vowel at the centre of each ellipse. The semi-axes are the standard deviations in R1 and R2. These or other areas can be used as targets in speech training. A cursor on the monitor (the cross at (1190, 530)) shows the current configuration of the subject's own vocal tract. Initially, subjects 'steer' the cursor by consciously controlling jaw and tongue position. Speakers of the language displayed can 'aim' towards one of the vowels shown. After some practice, however, it becomes nearly as automatic as using a joystick or a mouse: one just 'makes it go' where one wants, without thinking of the muscular details. In other words, a visual feedback loop is unconsciously used to train articulation.
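Each target ellipse is fully described by four numbers. The sketch below assumes axis-aligned ellipses centred on the mean R1 and R2, with the sample standard deviations as semi-axes, and uses a normalised distance to decide whether a cursor position lies inside a target (distance at most 1). The function names, vowel labels and data values are hypothetical and are not the Australian English measurements shown in the display.

```python
import math
from statistics import mean, stdev


def vowel_target(r1_values, r2_values):
    """Ellipse centre (means) and semi-axes (sample standard deviations), in Hz."""
    return (mean(r1_values), mean(r2_values), stdev(r1_values), stdev(r2_values))


def ellipse_distance(r1, r2, target):
    """Normalised distance of (R1, R2) from a target; at most 1 means inside."""
    m1, m2, s1, s2 = target
    return math.hypot((r1 - m1) / s1, (r2 - m2) / s2)


def nearest_vowel(r1, r2, targets):
    """Label of the target that is nearest in normalised distance."""
    return min(targets, key=lambda label: ellipse_distance(r1, r2, targets[label]))


# Hypothetical measurements (R1 values, R2 values) for two made-up vowels.
targets = {
    "vowel A": vowel_target([700, 750, 800], [1200, 1300, 1400]),
    "vowel B": vowel_target([300, 320, 340], [2100, 2200, 2300]),
}
cursor_r1, cursor_r2 = 740, 1250   # hypothetical current resonances
print(nearest_vowel(cursor_r1, cursor_r2, targets))                    # vowel A
print(ellipse_distance(cursor_r1, cursor_r2, targets["vowel A"]) <= 1)  # True
```

Scaling each axis by its own standard deviation means that a given error in R1, which normally varies less between speakers than R2, counts for more than the same error in R2.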

Does it work?

For a report of a trial experiment using a prototype system as a language trainer, see our papers:


Joe Wolfe / J.Wolfe@unsw.edu.au
phone 61-2-9385 4954 (UT + 10, +11 Oct-Mar)