Vocal resonances and broad band excitation
This site introduces a technique developed in this lab for measuring acoustic properties of the vocal tract quickly and non-invasively, during speech or singing. We use it as a research tool, but we have also developed it for use as a speech trainer. (Because we use some technical language, you may wish to read Introduction to vocal tract acoustics before continuing.)
The speech training application uses visual feedback about the vocal tract to help people produce speech sounds in a language they are learning. In principle, one might hope to derive such feedback from the speech sounds alone. However, feedback obtained that way is inherently limited in precision and practicality. Even the most advanced speech recognition systems still mistake words, which indicates how limited they are as precise measures of pronunciation. The basic problem is that the speech signal alone does not contain enough information to allow us to work out, quickly and precisely, the configuration of the vocal tract. This is not a problem for understanding speech, but it may be a problem in learning precise pronunciation. Our approach is therefore to introduce a signal with more information in the frequency domain.
In model experiments using the laboratory prototype, we have shown that one or two hours' training using visual feedback of some key features of the acoustical response of a subject's vocal tract improves the accuracy and intelligibility of pronunciation of foreign phonemes by monolingual adults.
Vocal tract resonance measurements:
We inject into the vocal tract an acoustic current which is synthesised to give high resolution frequency information over the frequency range of interest. We first calibrate the system by measuring the sound pressure just outside the lips, with the mouth closed (pclosed). We then inject the same acoustic current with the mouth open, so that the vocal tract is in parallel with the external field, and again measure the pressure at the lips (popen). The ratio popen/pclosed then shows the frequencies of the tract resonances.
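As an illustration of the last step, the ratio popen/pclosed can be formed frequency by frequency and its local maxima taken as candidate resonances. The following is a minimal sketch under stated assumptions, not the lab's actual processing: the function name `resonance_peaks`, the `min_ratio` threshold and the synthetic spectra are all invented for the example.

```python
def resonance_peaks(freqs_hz, p_open, p_closed, min_ratio=1.0):
    """Return (frequency, ratio) pairs at local maxima of p_open / p_closed."""
    ratio = [po / pc for po, pc in zip(p_open, p_closed)]
    peaks = []
    for i in range(1, len(ratio) - 1):
        is_local_max = ratio[i] > ratio[i - 1] and ratio[i] >= ratio[i + 1]
        if is_local_max and ratio[i] >= min_ratio:
            peaks.append((freqs_hz[i], ratio[i]))
    return peaks


# Synthetic illustration only: a flat calibration and two made-up
# resonance bumps centred on 500 Hz and 1500 Hz (not measured data).
freqs = list(range(200, 3001, 5))  # frequency points spaced at 5 Hz
p_closed = [1.0] * len(freqs)
p_open = [
    4.0 / (1.0 + ((f - 500) / 50.0) ** 2)
    + 3.0 / (1.0 + ((f - 1500) / 50.0) ** 2)
    + 0.5
    for f in freqs
]

peaks = resonance_peaks(freqs, p_open, p_closed)
peak_freqs = [f for f, _ in peaks]
print(peak_freqs)
```

Dividing by the closed-mouth calibration cancels the frequency dependence of the source and the radiation field, so only features introduced by the tract remain in the ratio. A real measurement would be noisier and would probably need smoothing before peak picking.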
This measurement, made during phonation, shows two superposed signals. The popen/pclosed ratio is the broad band signal, and the arrows show the resonances R1, R2, R3, R4. The harmonics of the voice are at frequencies fo, 2fo, 3fo etc, with the 8th harmonic and above being hidden by the broad band signal. Notice that the fifth harmonic (5fo) falls close to R2, which gives it a boost in amplitude. (The resonance provides impedance matching between the high impedance of the glottis and the low impedance of the radiation field at the lips.)
The vocal tract behaves as an acoustic duct about 170 mm long, nearly closed by the vocal folds and open at the mouth. A cylinder of length L, closed at one end, has resonances at f0 = v/4L, at 3f0, 5f0 etc, where v is the speed of sound. (See pipes and harmonics.) For such a cylinder the resonances would fall at frequencies of about 0.5, 1.5, 2.5, 3.5 and 4.5 kHz. The vocal tract shape varies as the lips and tongue are moved in speech. So here, while R1 and R4 fall close to the values for the cylinder, R2 is lower and R3 higher than for a cylinder.
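The cylinder figures quoted above can be checked with a few lines of arithmetic. This sketch assumes a speed of sound of 343 m/s (a typical value for room-temperature air, not stated in the text; the air in the vocal tract is warmer and more humid, so the real value is somewhat higher). The function name is invented for the example.

```python
def closed_open_resonances(length_m, speed_of_sound=343.0, count=5):
    """Resonances of an ideal cylinder closed at one end: f0, 3*f0, 5*f0, ..."""
    f0 = speed_of_sound / (4.0 * length_m)
    return [(2 * n - 1) * f0 for n in range(1, count + 1)]


# 170 mm duct, as in the text; 343 m/s is an assumed speed of sound.
resonances = closed_open_resonances(0.170)
print([round(f) for f in resonances])  # → [504, 1513, 2522, 3531, 4540]
```

These values round to the 0.5, 1.5, 2.5, 3.5 and 4.5 kHz given for the idealised cylinder.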
From the broad band response we can determine the resonances
of the vocal tract, independently of the speech signal. The resonant frequencies are interesting for fundamental acoustical
phonetic research but, if we extract and display them immediately, they can be used to drive a cursor for speech training. This is how we do it in the real time version.
Schematic diagram. (a) shows the spectrum of the
speech signal alone. This male voice has harmonic partials
spaced at the pitch frequency 126 Hz. (b) The injected
signal has frequencies spaced at 5 Hz, whose amplitudes are
calibrated (in this case) using the radiation field outside
the speaker's mouth. (c) The sum of the speech signal
and the broad band signal (including the effects of the resonances)
goes from the microphone to the analogue-digital converter. The speech signal is
used to measure pitch and amplitude; then the harmonic components
below 1 kHz are removed. (d) The resonances are detected
from the remaining interpolated signal. Similarly, the broadband
signals may be removed to leave just the speech harmonics.
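Steps (c) and (d) of the caption can be sketched as follows: spectral points close to a voice harmonic below 1 kHz are discarded and replaced by linear interpolation between the neighbouring broad band points. This is only an illustration under stated assumptions. The function name, the 10 Hz exclusion half-width and the synthetic spectrum are hypothetical, and the real device may remove harmonics and interpolate differently.

```python
def remove_harmonics(freqs_hz, magnitudes, f0_hz, cutoff_hz=1000.0, half_width_hz=10.0):
    """Replace points near voice harmonics (below cutoff) by linear interpolation."""

    def near_harmonic(f):
        if f > cutoff_hz:
            return False
        n = round(f / f0_hz)  # index of the nearest harmonic
        return n >= 1 and abs(f - n * f0_hz) <= half_width_hz

    masked = [near_harmonic(f) for f in freqs_hz]
    kept = [i for i, m in enumerate(masked) if not m]
    cleaned = list(magnitudes)
    if not kept:
        return cleaned  # nothing left to interpolate from
    for i, is_masked in enumerate(masked):
        if not is_masked:
            continue
        left = max((k for k in kept if k < i), default=None)
        right = min((k for k in kept if k > i), default=None)
        if left is None:
            cleaned[i] = magnitudes[right]
        elif right is None:
            cleaned[i] = magnitudes[left]
        else:
            t = (freqs_hz[i] - freqs_hz[left]) / (freqs_hz[right] - freqs_hz[left])
            cleaned[i] = magnitudes[left] + t * (magnitudes[right] - magnitudes[left])
    return cleaned


# Synthetic illustration: a flat broad band level of 1.0 on a 5 Hz grid,
# with made-up voice harmonics of a 126 Hz pitch added as spikes.
f0 = 126.0
freqs = [float(f) for f in range(100, 1501, 5)]
spectrum = [1.0 + (10.0 if abs(f - round(f / f0) * f0) <= 2.5 else 0.0) for f in freqs]
cleaned = remove_harmonics(freqs, spectrum, f0)
```

In this toy case the spikes below 1 kHz disappear and the flat broad band level is recovered, while the harmonics above 1 kHz are left untouched, as the caption describes.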
In the version of the device used for speech training, the resonance frequencies are used to position the cursor on the vowel plane (see below). Notice that the signal:noise ratio in these figures is lower than in the preceding figure. This is a consequence of making the measurements rapidly.
How it looks:
This is a screen dump of the feedback display in the current
speech trainer device, set up with targets from Australian
English. The background ellipses are measurements of the vowels
of 33 Australian men, with mean values for each vowel at the
centre of each ellipse. The semi-axes are the standard deviations
in R1 and R2. These or other areas can be used as targets
in speech training. A cursor on the monitor (the cross at
(1190,530)) shows the current configuration of the subject's
own vocal tract. Initially, subjects 'steer' the motion of
the cursor by consciously controlling jaw and tongue position.
Speakers of the language displayed can 'aim' towards one of
the vowels shown. After some practice, however, it becomes
nearly as automatic as using a joy-stick or a mouse - one
just 'makes it go' where one wants, without thinking of the
muscular details. In other words, a visual feedback loop is
unconsciously used to train articulation.
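The target test implied by the display can be sketched as a point-in-ellipse check: each vowel target is an ellipse centred on the mean (R1, R2) with semi-axes equal to the standard deviations, and the cursor "hits" a target when it falls inside. The class name, the target labels and all the numbers for the targets below are placeholders, not the measured Australian English data. The cursor position assumes that the cross at (1190,530) reads as R2 = 1190 Hz and R1 = 530 Hz.

```python
from dataclasses import dataclass


@dataclass
class VowelTarget:
    label: str
    mean_r1: float  # Hz
    mean_r2: float  # Hz
    sd_r1: float    # Hz, semi-axis along R1
    sd_r2: float    # Hz, semi-axis along R2

    def contains(self, r1, r2):
        """True if (r1, r2) lies inside the axis-aligned ellipse."""
        dx = (r1 - self.mean_r1) / self.sd_r1
        dy = (r2 - self.mean_r2) / self.sd_r2
        return dx * dx + dy * dy <= 1.0


def targets_hit(targets, r1, r2):
    """Labels of all targets whose ellipse contains the cursor."""
    return [t.label for t in targets if t.contains(r1, r2)]


# Placeholder targets with invented values (not measured data).
targets = [
    VowelTarget("target A", mean_r1=550.0, mean_r2=1200.0, sd_r1=60.0, sd_r2=120.0),
    VowelTarget("target B", mean_r1=300.0, mean_r2=2200.0, sd_r1=40.0, sd_r2=150.0),
]

hits = targets_hit(targets, r1=530.0, r2=1190.0)
print(hits)  # → ['target A']
```

Scaling each axis by its standard deviation means a subject is judged against the natural spread of each vowel, so a broadly distributed vowel is an easier target than a tightly clustered one.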
Does it work?
For a report of a trial experiment using a prototype system
as a language trainer, see our papers:
- Dowd, A., Smith, J.R. and Wolfe, J. (1998) "Learning
to pronounce vowel sounds in a foreign language using acoustic
measurements of the vocal tract as feedback in real time"
Language and Speech, 41, 1-20.
- Epps, J., Smith, J.R. and Wolfe, J. (1997) "A novel instrument
to measure acoustic resonances of the vocal tract during
speech" Measurement Science and Technology 8,
- Donaldson, T., Wang, D., Smith, J. and Wolfe, J. (2003) "Vocal
tract resonances: a preliminary study of sex differences
for young Australians", Acoustics Australia,
- Epps, J., Dowd, A., Smith, J.R. and Wolfe, J. (1997) "Real
time measurements of the vocal tract resonances during speech"
Eurospeech'97 (G. Kokkinakis, N. Fakotakis &
E. Dermatas, eds.) Rhodes, 721-724.
- Joliveau, E., Smith, J. and Wolfe, J. (2004) "Tuning
of vocal tract resonances by sopranos", Nature,
- Joliveau, E., Smith, J. and Wolfe, J. (2004) "Vocal
tract resonances in singing: the soprano voice", J.
Acoust. Soc. America, 116, 2434-2439.