-
What are the levels of speech?
- Linguistic for speaker
- Physiological for speaker
- Acoustic- sound waves
- Physiological for listener
- Linguistic for listener
-
Which level is the easiest to study?
acoustic level
-
Linear source-filter theory
- expresses articulatory-acoustic relationships
- *one of the most important/best theories in our field
-
What is involved in speech production?
- need a power source (breath support)
- we get a complex periodic signal from the vocal folds (they vibrate in 3 different ways and come together)
- speech is changed by changing the shapes of your cavities
-
What is the source of sound for speech?
- vocal folds (vibration)
- *for some consonants, the source is more complex (can be in the vocal tract or a combination of both- voiceless sounds)
-
What makes you sound like you?
shapes of pharynx, nasal, and oral/mouth cavities
-
What is the filter for speech?
- vocal tract (frequency dependent like all filters)
- resonator (air filled cavity)
-
What does the resonator do for you?
- natural frequencies change in resonator (ear does everything for you so you can perceive differences)
- 3-6 syllables per second
-
How are the source and filter related?
- they are assumed to be independent of each other (an assumption made for convenience)
- this implies that you can change the output of the vocal folds without changing the vocal tract and vice-versa
-
What do the vocal folds and vocal tract give you?
- vocal folds- fundamental frequency, harmonics, and amplitude changes
- vocal tract- articulation
-
How are vowels modeled?
as a tube closed at one end and open at the other
-
What is the formula to calculate where the resonant frequencies will be?
- Fn = (2n-1)c/4l
- Fn = resonant frequency
- n = integer (if looking for the 1st resonance you put 1, if looking for the 2nd you put 2, etc)
- c = speed of sound
- l = length of the tube (the denominator is 4 times the length of the tube)
-
What is the first resonant frequency with a tube length of 17 cm and speed of sound is 34,000 cm/s?
- Fn = (2n-1)c/4l
- (2-1)*34000/(4*17) = 500 Hz
- *the longer the tube the lower the resonant frequencies, the shorter the tube the higher the resonant frequencies
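The worked example above can be checked with a minimal sketch, assuming a uniform tube closed at one end and open at the other. The function name `resonant_frequency` is made up for illustration.

```python
def resonant_frequency(n, tube_length_cm, speed_of_sound_cm_s=34000.0):
    """Return the n-th resonance (Hz) of a tube closed at one end: Fn = (2n - 1) * c / (4 * l)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (2 * n - 1) * speed_of_sound_cm_s / (4.0 * tube_length_cm)


# First three resonances of the 17 cm tube from the card above
first_three = [resonant_frequency(n, 17.0) for n in (1, 2, 3)]
print(first_three)  # [500.0, 1500.0, 2500.0]
```

The resonances come out 1000 Hz apart, and a longer tube gives lower values.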
-
How many resonances are there for a tube?
- infinite
- we only need to consider the first 3 or 4 (the model is valid to only about 5 kHz)
-
What happens when the shape of the tube changes going from one vowel to another?
resonant frequencies change
-
Why doesn't changing the frequency/energy of the source of vibration change the resonant frequencies of the pipe/vocal tract?
the source and filter are independent of one another
-
What are formant frequencies?
- resonant frequencies of vowels
- *do NOT confuse with fundamental frequency!
-
How do a curved tube (vocal tract) and a straight tube (model) behave out to 5 kHz?
- acoustically identically
- the curve begins to affect acoustic signals with a short wavelength
-
What happens if the tube has uniform cross sectional area?
the resonances are equally spaced
-
Does all of the energy come from the source or filter?
- source
- vocal fold vibration for vowels
-
What does changing the length of the tube do?
- changes the resonance frequencies
- influenced by age and sex
- l = 14.5 cm for females
- l = 8.75 cm for children
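As a rough sketch of how tube length moves the first resonance, the lengths above can be put into the same tube formula, Fn = (2n-1)c/4l with n = 1. The speed of sound of 34,000 cm/s is the value used earlier in these notes. The uniform-tube assumption and the name `first_resonance` are illustrative only.

```python
SPEED_OF_SOUND_CM_S = 34000.0

# Tube lengths in cm taken from the notes above
tube_lengths_cm = {"female": 14.5, "child": 8.75}


def first_resonance(length_cm):
    # n = 1 in Fn = (2n - 1) * c / (4 * l)
    return SPEED_OF_SOUND_CM_S / (4.0 * length_cm)


for label, length_cm in tube_lengths_cm.items():
    print(label, round(first_resonance(length_cm)), "Hz")
# female 586 Hz
# child 971 Hz
```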
-
What does every formant/resonant/natural frequency have?
its own frequency, amplitude, and bandwidth
-
How are different vowels modeled?
acoustically by different vocal tract shapes
-
Phonetically, how are vowels distinguished?
position of the tongue
-
What happens if a constriction is placed on the tube/vocal tract?
the resonances change
-
What happens if you change the articulation?
you change the vocal tract shape, and the resonance frequencies, amplitudes, and bandwidths
-
The output energy of a vowel is the product of:
- the source energy
- the size and shape of the resonator
- the radiation characteristic (adds +6 dB/octave)
- the source slope of -12 dB/octave plus the radiation slope of +6 dB/octave gives an output slope of -6 dB/octave
-
What are glottal source characteristics for vowels?
- vocal fold vibration is periodic
- fo or F0 is used to indicate the vocal fundamental frequency
- the amplitude of the harmonics decreases by 12 dB/octave
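A minimal sketch of the 12 dB/octave roll-off, assuming an idealized source whose harmonic levels fall on a straight line in dB per octave. The 100 Hz fundamental and the name `harmonic_level_db` are assumptions for illustration.

```python
import math


def harmonic_level_db(k, slope_db_per_octave=-12.0):
    """Level of the k-th harmonic relative to the fundamental (k = 1), in dB."""
    # log2(k) is the number of octaves between the fundamental and harmonic k
    return slope_db_per_octave * math.log2(k)


f0_hz = 100.0  # assumed fundamental frequency, for illustration only
for k in (2, 4, 8):
    print(k * f0_hz, "Hz:", harmonic_level_db(k), "dB")
```

Each doubling of frequency (200, 400, 800 Hz here) lowers the level by another 12 dB.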
-
What gives you amplitude changes?
- source
- only changing source and not filter makes resonant frequencies stay the same
-
What are filter characteristics for vowels?
- the vocal tract is a dynamic filter (changes constantly)
- it is frequency dependent
- it has, theoretically, an infinite number of resonances (only care about 1st 3 or 4 for vowels)
- each resonance has a center frequency, an amplitude, and a bandwidth
- for speech, these resonances are called formants
- formants are numbered in succession from the lowest (F1, F2, F3, etc)
- the formants together form the transfer function (input-output relationship; formants become physically evident only when energized)
-
Which harmonic has the highest amplitude?
the one closest to a formant (resonant) frequency of the vowel
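A small sketch of finding the harmonic that sits nearest a formant, assuming harmonics are whole-number multiples of the fundamental. The values 120 Hz and 500 Hz and the name `nearest_harmonic` are illustrative assumptions. Which harmonic is actually strongest also depends on the source roll-off.

```python
def nearest_harmonic(f0_hz, formant_hz):
    """Return (harmonic number, frequency in Hz) of the harmonic closest to formant_hz."""
    # Harmonics are integer multiples of f0; the lowest one is the fundamental itself
    k = max(1, round(formant_hz / f0_hz))
    return k, k * f0_hz


print(nearest_harmonic(120.0, 500.0))  # (4, 480.0)
```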
-
What is radiation characteristic?
- acoustic effect when a sound leaves a small area and enters a large one (like speaker)
- the effect is to raise the slope of the spectrum by +6 dB/octave
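A short sketch of how the radiation slope combines with the source slope, using the -12 dB/octave source value given elsewhere in these notes. Levels in dB add where linear magnitudes multiply, so the slopes simply sum.

```python
SOURCE_SLOPE_DB_PER_OCTAVE = -12.0    # glottal source roll-off, from the notes
RADIATION_SLOPE_DB_PER_OCTAVE = 6.0   # radiation characteristic, from the notes

# Multiplying magnitudes corresponds to adding dB values
net_slope = SOURCE_SLOPE_DB_PER_OCTAVE + RADIATION_SLOPE_DB_PER_OCTAVE
print(net_slope)  # -6.0
```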
-
*What are the acoustic phonetic relationships for vowels?
- F1 is inversely related to tongue height (raise tongue, low F1 and vice versa)
- F2 is directly related to tongue advancement (back vowels have low F2, front vowels have high F2)
- lip rounding lowers all formant frequencies (because you're making the vocal tract longer)
- you can calculate how close a person is to the sound they are trying to make
-
What does perturbation mean?
constriction
-
What is the perturbation theory?
- volume velocity variations reflect the way air particles vibrate at a particular point in the vocal tract (how the air is passing through vocal folds)
- at some points, vibration is minimal (node); at others, maximal (antinodes)
- for F1, the antinode is at the open end of the tube (mouth) and the node is at the closed end (vocal folds)
- for F2, there are 2 antinodes and 2 nodes, etc
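A sketch of where the nodes and antinodes fall, assuming an idealized uniform tube closed at the vocal folds and open at the lips, with volume velocity following a sine pattern measured from the closed end. The function names and the 17 cm length are illustrative assumptions.

```python
def node_positions(n, length_cm):
    """Distances from the closed end (vocal folds) where volume velocity is minimal for formant n."""
    m = 2 * n - 1
    return [2 * k * length_cm / m for k in range(n)]


def antinode_positions(n, length_cm):
    """Distances from the closed end where volume velocity is maximal for formant n."""
    m = 2 * n - 1
    return [(2 * k + 1) * length_cm / m for k in range(n)]


print(node_positions(1, 17.0), antinode_positions(1, 17.0))  # [0.0] [17.0]
print(node_positions(2, 17.0), antinode_positions(2, 17.0))
```

For F1 this gives one node at the vocal folds and one antinode at the lips. For F2 it gives two of each, with a node always at the vocal folds and an antinode always at the lips.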
-
Where is there always an antinode?
lips
-
Where is there always a node?
vocal folds
-
What happens when there is a constriction near a node?
formant frequency will increase
-
What happens when there is a constriction near an antinode?
formant frequency will decrease
-
Perturbation theory, if a change in cross sectional area is applied (a perturbation):
- the acoustic effect depends on proximity to a node or an antinode (antinode = lower freq.; node = higher freq.)
- lip constrictions lower all formant frequencies
- laryngeal constrictions raise all formant frequencies
-
What do amplitudes depend on?
formant frequencies
-
If F1 is lowered (raised), what happens to A1?
it lowers (rises)
-
If 2 formant frequencies move closer together:
both peaks increase in amplitude
-
How do you raise or lower formant frequencies?
change articulators (3-6 syllables per second)
-
What are source-filter interactions?
- independent of one another
- BUT some vocal tract shapes may affect vocal fold vibration:
- singers' formant (to be heard over background noise)
- high impedance constrictions require greater subglottal air pressure
- vocal tract - vocal fold coupling during open phase of vibratory cycle
-
What can the linear source-filter theory be used to describe?
the acoustics of consonants as well as vowels
-
Why, for consonants, is the source not always at the level of the vocal folds?
- some sources are in the vocal tract
- these sources are aperiodic
- durations and amplitudes also are different from vowels
-
What does the source-filter theory give us?
a series of expectations for the acoustic characteristics for consonants
-
How are fricatives modeled?
as a tube with a very severe constriction
-
What are characteristics of fricatives?
- the air exiting the constriction is turbulent
- zeros or antiformants can be found in the spectrum
- because of the turbulence, there is no periodicity unless accompanied by voicing
-
What are characteristics of nasal consonants?
- velopharyngeal port is open and the oral cavity is completely blocked at some point
- the side-branch resonator produces antiformants (zeros)
- the overall vocal tract is longer than for vowels
- oral formants, nasal formants, nasal antiformants
- nasal murmur
-
What are characteristics of stops?
- the tube model is not altered very much
- time domain is critical
- there is a complete closure of the vocal tract somewhere
- pressure builds up behind the closure
- rapid release
- articulation results in a burst and transitions
-
What does analog mean?
storing ALL the information on a wave
-
What does digital mean?
samples the wave at specific times, taking only a few points and storing that information (connects the dots for you and doesn't record amplitudes between the samples)
-
What is a spectrograph?
- an instrument that can capture the dynamics of speech
- acoustic signals vary only in frequency, amplitude and time; the sound spectrograph captures all of these
-
What is a spectrogram?
the output (usually a hardcopy) of a spectrograph
-
What is a wide-band filter good for?
looking at formant frequencies
-
What is a narrow-band filter good for?
looking at harmonics and fundamental frequency
-
What do black areas of a spectrogram indicate?
highest amplitudes
-
What do white areas on a spectrogram indicate?
the noise floor
-
What do shades of gray in a spectrogram indicate?
- intensity
- amplitudes between highest amplitude and noise floor
- the more intense the signal is at a particular frequency and time, the darker the trace
-
What is the Nyquist theorem?
- in order to represent a signal faithfully, it must be sampled at a rate of at least twice its highest frequency
- if you don't pick the right sampling rate, you don't get accurate output (if you get the wrong output, all your measurements are wrong)
-
What is presampling or brickwall filtering?
- removes all of the energy above the Nyquist frequency
- the clinician/researcher determines the Nyquist frequency
- some knowledge of speech and speech and language disorders is required
-
What is aliasing?
- when the output doesn't match the input
- when you don't follow Nyquist rule
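A minimal sketch of aliasing for a single pure tone, assuming no presampling filter is applied. The name `alias_frequency` and the 10,000 Hz sampling rate are illustrative assumptions.

```python
def alias_frequency(signal_hz, sampling_rate_hz):
    """Frequency at which a sampled tone appears, folded into 0..sampling_rate/2."""
    folded = signal_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)


print(alias_frequency(4000.0, 10000.0))  # 4000.0 (below half the sampling rate, unchanged)
print(alias_frequency(7000.0, 10000.0))  # 3000.0 (above half the sampling rate, output no longer matches input)
```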
-
What are discrete numbers?
dots along wave (not continuous measurement)
-
What is sampling rate?
how many times you take a discrete number
-
What is sampling?
how many times per second the amplitude will be recorded
-
What does sampling for digital signal processing do?
- analog-to-digital conversion
- signal must be sampled at the Nyquist rate
- sampling rate decides the times at which the signal will be sampled
- sampling converts the acoustic signal into a series of numbers
- instead of amplitudes at all instances of time, no matter how small the time interval, amplitudes in the digital world exist only at the sampling interval
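A small sketch of sampling, assuming the analog signal is a pure sine wave. The function name `sample_sine`, the 100 Hz tone, and the 1000 Hz sampling rate are illustrative assumptions.

```python
import math


def sample_sine(frequency_hz, sampling_rate_hz, n_samples):
    """Return (time, amplitude) pairs taken only at multiples of the sampling interval."""
    interval_s = 1.0 / sampling_rate_hz
    return [
        (i * interval_s, math.sin(2 * math.pi * frequency_hz * i * interval_s))
        for i in range(n_samples)
    ]


samples = sample_sine(100.0, 1000.0, 5)
for time_s, amplitude in samples:
    print(time_s, amplitude)
```

The list holds amplitudes only at 0, 1, 2, 3, and 4 sampling intervals. Nothing is stored for the times in between.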
-
What happens to the samples determined by the sample rate?
they are chopped into discrete numbers (converting amplitude variations into discrete numbers)
-
What is quantization?
- discrete number of amplitude levels
- the more quantizer levels available, the more the discrete signal represents the original analog signal (higher the rate, smaller the interval)
- in our applications, 16-bit quantizers over a 20-volt range are typical (this yields an amplitude resolution of about 300 microvolts and a signal to noise ratio of 96 dB)
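The 16-bit figures above can be checked with a short sketch, assuming an ideal quantizer with 2^bits equally spaced levels. The function names are illustrative.

```python
import math


def quantizer_step_volts(n_bits, range_volts):
    """Size of one amplitude step for an ideal quantizer."""
    return range_volts / (2 ** n_bits)


def quantizer_dynamic_range_db(n_bits):
    """Ratio of the full range to one step, expressed in dB."""
    return 20.0 * math.log10(2 ** n_bits)


print(quantizer_step_volts(16, 20.0) * 1e6)  # about 305 microvolts
print(quantizer_dynamic_range_db(16))        # about 96.3 dB
```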
-
What happens after A/D (analog to digital) conversion?
- the signal is stored as a stream of numbers
- time is recovered from the sample index and the sampling rate (time = index / sampling rate)
- the amplitude is the stored number (quantization process)
- in this form, many operations can be performed (you can do anything you want)
-
What is involved in a waveform display?
- duration measurements (speech changes gradually)
- signal editing
- amplitude measurements (rms is most common)
- vocal fundamental frequency
- *some consistent rules need to be adopted for duration and signal editing
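A minimal sketch of an rms amplitude measurement over a stretch of stored samples. The function name `rms` and the sample values are illustrative.

```python
import math


def rms(samples):
    """Root-mean-square amplitude: square each sample, average, then take the square root."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


print(rms([1.0, -1.0, 1.0, -1.0]))  # 1.0
```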
-
What is a digital spectrograph?
a series of spectra based on the FFT (fast Fourier transform) or LPC (linear predictive coding)
-
How is amplitude depicted in a digital spectrograph?
as shades of gray
-
What is an example of a digital spectrograph?
- Praat
- does the work for us
-
What is linear predictive coding (LPC)?
- you can predict where the next dot (amplitude) will be based on previous cycles (as few as 10 to 15 previous samples is all that is required)
- speech does not generally vary wildly from sample to sample (highly predictable)
-
What is the equation for LPC?
- y = a0 + a1*x(n-1) + a2*x(n-2) + ...
- y = amplitude of the next sample
- x(n-1), x(n-2), ... = the previous samples (1 back, 2 back, etc)
- a = estimates of the resonances of vocal tract (can represent sections of vocal tract)
- allows you to talk on the phone (can guess what speech will be so it only has to transfer so many numbers)
- individuals with voice/hearing problems have problems with being understood on the phone
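A sketch of the prediction idea only, not of a full LPC analysis. Real LPC estimates 10 to 15 coefficients from the speech itself. Here the signal is assumed to be a pure sine wave, for which two coefficients are known exactly, and a0 is taken as 0. The 100 Hz tone, the 8000 Hz sampling rate, and the name `predict_next` are illustrative assumptions.

```python
import math


def predict_next(previous, coefficients, a0=0.0):
    """Predict the next sample as a0 + a1*x(n-1) + a2*x(n-2) + ...

    previous is ordered oldest to newest; coefficients is [a1, a2, ...].
    """
    recent_first = previous[::-1]
    return a0 + sum(a * x for a, x in zip(coefficients, recent_first))


omega = 2 * math.pi * 100.0 / 8000.0  # 100 Hz tone at an assumed 8000 Hz sampling rate
signal = [math.sin(omega * n) for n in range(50)]
coefficients = [2 * math.cos(omega), -1.0]  # exact two-term predictor for a pure sine wave

predicted = predict_next(signal[:49], coefficients)
print(abs(predicted - signal[49]))  # prediction error, very close to zero
```

Because the next sample can be rebuilt from earlier ones, only the coefficients and a small error need to be sent, which is the idea behind the phone example.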
-
What is a wideband spectrogram?
- short time window (0.005, 0.007, 0.009 s)
- good for measuring formant frequencies (of vowels)
-
What is a narrowband spectrogram?
- long time window (0.1, 0.05 s)
- good for showing and measuring harmonics
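A rough sketch of why window length matters, assuming the window values are in seconds and using the rule of thumb that analysis bandwidth is about 1 divided by the window duration. The exact figure depends on the window shape, and the name `approximate_bandwidth_hz` is illustrative.

```python
def approximate_bandwidth_hz(window_s):
    """Rough analysis bandwidth for a window of the given duration (reciprocal rule of thumb)."""
    return 1.0 / window_s


for window_s in (0.005, 0.05):
    print(window_s, "s ->", round(approximate_bandwidth_hz(window_s)), "Hz")
# 0.005 s -> 200 Hz
# 0.05 s -> 20 Hz
```

A bandwidth of roughly 200 Hz is wider than the spacing between typical harmonics, so harmonics blur together and formants stand out. A bandwidth of roughly 20 Hz is narrow enough to show individual harmonics.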