SECOND YEAR COURSE AUTUMN TERM
PERCEPTION
Hearing Lecture Notes (6): Auditory Object Recognition & Music
For more material relevant to this topic, see the McGill
auditory pages.
1. TIMBRE
Vowel sounds
Vowel sounds in speech differ in the relative amplitudes of their
harmonics. A particular vowel has harmonics that have a greater amplitude
near the formant frequencies. A formant is a resonant frequency of
the vocal tract. As you change the pitch of a vowel, you change the fundamental
frequency and the spacing of the harmonics, but the formant frequencies
stay the same. If you change the vowel without changing the pitch of the
voice, the fundamental and the harmonic spacing stay the same but the formant
frequencies change. Here is the spectrum of the vowel in "bit"
on a fundamental frequency of 125 Hz.
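The separation between pitch and vowel quality described above can be sketched numerically. This is a minimal illustration, not a model of the vocal tract: the formant frequencies and bandwidth below are assumed round values chosen for illustration (they are not taken from these notes), and the peak-shaped resonance function is a hypothetical simplification.

```python
# Assumed, illustrative formant frequencies and bandwidth (not from the notes).
FORMANTS_HZ = (400.0, 2000.0, 2550.0)
BANDWIDTH_HZ = 100.0


def formant_envelope(freq_hz):
    """Sum of simple resonance peaks, one centred on each assumed formant."""
    half_width = BANDWIDTH_HZ / 2.0
    return sum(
        1.0 / (1.0 + ((freq_hz - formant) / half_width) ** 2)
        for formant in FORMANTS_HZ
    )


def harmonic_amplitudes(f0_hz, max_hz=3000.0):
    """Return (frequency, relative amplitude) for each harmonic up to max_hz."""
    count = int(max_hz // f0_hz)
    return [(k * f0_hz, formant_envelope(k * f0_hz)) for k in range(1, count + 1)]


def strongest_harmonic(f0_hz):
    """Frequency of the harmonic with the largest relative amplitude."""
    return max(harmonic_amplitudes(f0_hz), key=lambda pair: pair[1])[0]
```

Calling `harmonic_amplitudes(125.0)` and `harmonic_amplitudes(250.0)` gives harmonics spaced 125 Hz and 250 Hz apart respectively, but in both cases the largest amplitudes fall on the harmonics nearest the assumed formants. Changing `FORMANTS_HZ` while keeping the same fundamental would correspond to changing the vowel at a fixed pitch.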
Musical instruments
The synthetic sounds produced by a simple keyboard synthesiser differ in:
* the relative amplitude of their harmonics;
* their attack time and decay time.
For most synthesisers the relative amplitudes of the different harmonics
stay constant throughout the sound.
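A simple synthesiser note of this kind can be sketched as a fixed set of harmonic amplitudes multiplied by a single amplitude envelope. The function names, the linear attack and decay ramps, and the default parameter values are all assumptions made for illustration; real synthesisers use a variety of envelope shapes.

```python
import math


def amplitude_envelope(t, duration, attack, decay):
    """Linear rise over `attack` seconds, linear fall over the last `decay` seconds."""
    if t < attack:
        return t / attack
    if t > duration - decay:
        return max(0.0, (duration - t) / decay)
    return 1.0


def synth_note(f0_hz, harmonic_amps, duration=0.5, attack=0.02, decay=0.1, rate=8000):
    """Additive synthesis: the relative harmonic amplitudes never change;
    only the overall envelope varies during the note."""
    samples = []
    for i in range(int(duration * rate)):
        t = i / rate
        tone = sum(
            amp * math.sin(2.0 * math.pi * (k + 1) * f0_hz * t)
            for k, amp in enumerate(harmonic_amps)
        )
        samples.append(amplitude_envelope(t, duration, attack, decay) * tone)
    return samples
```

For example, `synth_note(220.0, [1.0, 0.5, 0.25])` produces half a second of a three-harmonic tone. Two such "instruments" differ only in the list of harmonic amplitudes and in the attack and decay times.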
The sounds produced by a natural musical instrument are much more
complex; the different harmonics start and stop at different times and change
in relative amplitude throughout the "steady-state" of the note.
Our ability to distinguish one natural musical instrument from another depends
more on the attack (and decay) than on the "steady-state". The nature
of the attack and the relative amplitudes during the steady-state are not
constant for a particular instrument. They depend on the style of playing,
where in the range of the instrument the note lies, and so on.
Auditory Scene Analysis
The ears receive waves from many different sound sources at the same time, e.g.
multiple talkers, instruments, cars or machinery. In order to recognise
the pitch and timbre of the sound from a particular source the brain must
decide which frequencies "belong together" and have come from
this source. The problem is formally similar to that of "parsing"
a visual scene into separate objects.
Principles enunciated by the Gestalt psychologists in vision are useful
as heuristics for helping to decide which sounds will be grouped together:
proximity, similarity, good continuation and common fate all have auditory
analogues.
The brain needs to group simultaneously (separating out which of the frequency
components present at a particular time have come from the same sound source)
and also successively (deciding which group of components at one time is
a continuation of a previous group).
Auditory streaming
Auditory streaming is the formation of perceptually distinct apparent sound
sources. Temporal order judgement is good within a stream but bad between
streams. Examples include:
* implied polyphony;
* a noise burst replacing a consonant in a sentence;
* a click superimposed on a sentence or melody.
Grouping Principles
(i) Proximity and similarity
* Tones close in frequency will group together, so as to minimise the extent
of frequency jumps and the number of streams.
* Tones with similar timbre will tend to group together.
* Speech sounds of similar pitch will tend to be heard from the same speaker.
* Sounds from different locations are harder to group together across time
than those from the same location.
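Grouping by frequency proximity can be illustrated with a toy sketch. This is not a model of the auditory system: the greedy assignment rule, the function names and the 5-semitone threshold are all assumptions made purely for illustration.

```python
import math


def semitone_distance(f1_hz, f2_hz):
    """Size of the frequency jump between two tones, in semitones."""
    return abs(12.0 * math.log2(f2_hz / f1_hz))


def group_by_proximity(tone_freqs_hz, max_jump_semitones=5.0):
    """Assign each successive tone to the stream whose most recent tone is
    closest in frequency; start a new stream if every jump is too large."""
    streams = []
    for freq in tone_freqs_hz:
        best_stream = None
        best_distance = max_jump_semitones
        for stream in streams:
            distance = semitone_distance(stream[-1], freq)
            if distance <= best_distance:
                best_stream, best_distance = stream, distance
        if best_stream is None:
            streams.append([freq])
        else:
            best_stream.append(freq)
    return streams
```

Given a sequence that alternates between a low and a high register, such as `[440, 880, 466, 932, 494, 988]`, this rule returns two streams, one per register, which is loosely analogous to hearing two interleaved melodic lines in implied polyphony.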
(ii) Common fate
* Sounds from a common source tend to start and stop at the same time and
change in amplitude or frequency together (vibrato).
* A single component is easy to hear out if it is the only one to change
in a complex.
(iii) Good continuation
Abrupt discontinuities in frequency or pitch can give the impression of
a different sound source.
Continuity Effect
A sound that is interrupted by a noise that masks it can appear to be continuous.
Alternations of sound and mask can give the illusion of continuity, with
the auditory system interpolating across the mask.
MUSIC PERCEPTION
Tuning
Consonant intervals have harmonics that do not beat together to give roughness,
i.e. intervals at small-integer frequency ratios: 2:1 (octave), 3:2 (fifth),
4:3 (fourth), 5:4 (major third).
Unfortunately, a scale based on such intervals is not internally consistent
and does not allow modulations.
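The inconsistency can be checked with exact arithmetic. The sketch below stacks pure intervals and compares the result with whole octaves; the variable names are chosen for illustration, and "Pythagorean comma" is the standard name for the first discrepancy rather than a term used in these notes.

```python
from fractions import Fraction

OCTAVE = Fraction(2, 1)
FIFTH = Fraction(3, 2)
MAJOR_THIRD = Fraction(5, 4)


def stack(interval, count):
    """Frequency ratio reached by stacking `count` copies of an interval."""
    return interval ** count


# Twelve pure fifths overshoot seven octaves (the Pythagorean comma).
pythagorean_comma = stack(FIFTH, 12) / stack(OCTAVE, 7)

# Three pure major thirds fall short of one octave.
thirds_gap = OCTAVE / stack(MAJOR_THIRD, 3)
```

Neither ratio equals 1, so a scale built from pure fifths or pure major thirds cannot return exactly to its starting note, and an interval that is pure in one key will be mistuned in another.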
Equal temperament sacrifices some consonance in the primary intervals for
an equal size of semitone (a frequency ratio of 2^(1/12), about 1.0595), and
so sounds equally in tune in any key.
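How much consonance is sacrificed can be estimated by comparing each equal-tempered interval with its small-integer ratio. The sketch below measures the difference in cents (hundredths of an equal-tempered semitone); the dictionary layout and function names are assumptions for illustration.

```python
import math

SEMITONE = 2 ** (1 / 12)  # frequency ratio of one equal-tempered semitone

# Interval name -> (size in equal-tempered semitones, pure frequency ratio)
INTERVALS = {
    "octave": (12, 2 / 1),
    "fifth": (7, 3 / 2),
    "fourth": (5, 4 / 3),
    "major third": (4, 5 / 4),
}


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


def deviation_cents(name):
    """Equal-tempered size minus pure size; positive means tempered is wider."""
    semitones, pure_ratio = INTERVALS[name]
    return cents(SEMITONE ** semitones) - cents(pure_ratio)
```

With these definitions the equal-tempered fifth comes out about 2 cents narrower than 3:2, the fourth about 2 cents wider than 4:3, and the major third nearly 14 cents wider than 5:4, while the octave remains exact.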
Absolute pitch
About 1 person in 10,000 has "Absolute Pitch" - they can identify
the pitch of a musical note without the use of an external reference pitch.
Most people can only give pitch names relatively - "if that is A this
must be C". Absolute pitch is much more common in people who had musical
training at an early age than among those who started later, and is probably
more common in those whose early training involved learning the names of
notes. It can be a liability, since pitch perception can change as you grow
older, and international pitch standards also change. A more common absolute
ability is the ability to tell when a piece of music is being played in
the correct key.
Melody
The pitch of a tone can be regarded as having chroma (musical note name)
and height (which octave). Melodies are hard to recognise if only chroma
is maintained (transposing notes by octaves). Overall contour is an important
attribute of melody, and allows variation of chroma within a recognisable
framework.
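The distinction between chroma, height and contour can be made concrete with a small sketch. It assumes MIDI note numbering and a reference of A4 = 440 Hz, neither of which comes from these notes, and the function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_number(freq_hz, a4_hz=440.0):
    """Nearest MIDI note number for a frequency (assumes A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


def chroma_and_height(midi):
    """Split a MIDI note into chroma (note name) and height (octave number)."""
    return NOTE_NAMES[midi % 12], midi // 12 - 1


def contour(midi_notes):
    """Direction of each step in a melody: +1 up, 0 same, -1 down."""
    return [(b > a) - (b < a) for a, b in zip(midi_notes, midi_notes[1:])]


def scramble_octaves(midi_notes, octave_shifts):
    """Move each note by whole octaves, keeping its chroma unchanged."""
    return [note + 12 * shift for note, shift in zip(midi_notes, octave_shifts)]
```

For the melody `[60, 62, 64, 60]` (C, D, E, C), `scramble_octaves(melody, [0, 1, -1, 2])` keeps the same sequence of chromas but changes the contour from `[1, 1, -1]` to `[1, -1, 1]`, which illustrates why octave-scrambled melodies are hard to recognise.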