SECOND YEAR COURSE AUTUMN TERM

PERCEPTION

Hearing Lecture Notes (6): Auditory Object Recognition & Music
For more material relevant to this topic, see the McGill auditory pages.

1. TIMBRE
Vowel sounds

Vowel sounds in speech differ in the relative amplitudes of their harmonics. In a particular vowel, the harmonics closest to the formant frequencies have the greatest amplitude. A formant is a resonant frequency of the vocal tract. As you change the pitch of a vowel, you change the fundamental frequency and the spacing of the harmonics, but the formant frequencies stay the same. If you change the vowel without changing the pitch of the voice, the fundamental and the harmonic spacing stay the same but the formant frequencies change. Here is the spectrum of the vowel in "bit" on a fundamental frequency of 125 Hz.
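The relationship between pitch and formants can be sketched numerically. The following is a minimal illustration, not a model of the vocal tract: each formant is treated as a simple resonance peak, and the formant frequencies (400 Hz and 2000 Hz), the bandwidth, and the function names are assumptions chosen for illustration rather than measured values for the vowel in "bit".

```python
def harmonic_amplitudes(f0, formants, bandwidth=100.0, max_freq=4000.0):
    """Return (frequency, amplitude) pairs for the harmonics of f0.

    Each formant is modelled as a simple resonance peak (an assumption);
    a harmonic's amplitude is the sum of the peaks at its frequency.
    """
    harmonics = []
    n = 1
    while n * f0 <= max_freq:
        freq = n * f0
        amp = sum(1.0 / (1.0 + ((freq - fc) / bandwidth) ** 2) for fc in formants)
        harmonics.append((freq, amp))
        n += 1
    return harmonics


def peak_below(harmonics, limit):
    """Frequency of the strongest harmonic below `limit` Hz."""
    candidates = [h for h in harmonics if h[0] < limit]
    return max(candidates, key=lambda h: h[1])[0]


# Hypothetical formant frequencies (illustrative only).
FORMANTS = [400.0, 2000.0]

low = harmonic_amplitudes(125.0, FORMANTS)   # harmonics 125 Hz apart
high = harmonic_amplitudes(250.0, FORMANTS)  # harmonics 250 Hz apart

# The harmonic spacing follows the fundamental, but in both cases the
# strongest low-frequency harmonic is the one nearest the 400 Hz formant.
print(peak_below(low, 1000.0), peak_below(high, 1000.0))
```

Raising the fundamental from 125 Hz to 250 Hz halves the number of harmonics below 4 kHz, yet the strongest harmonics remain those nearest the (fixed) formant frequencies.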

Musical instruments

The synthetic sounds produced by a simple keyboard synthesiser differ in:

* the relative amplitude of their harmonics;

* their attack time and decay time.

For most synthesisers the relative amplitudes of the different harmonics stay constant throughout the sound.
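A minimal sketch of such a synthetic note is given below, assuming a linear attack and decay and a fixed set of harmonic amplitudes. The function names, the envelope shape, and the parameter values are all illustrative assumptions, not a description of any particular synthesiser.

```python
import math


def envelope(t, duration, attack, decay):
    """Linear attack/decay amplitude envelope with values between 0 and 1."""
    if t < 0 or t > duration:
        return 0.0
    if t < attack:
        return t / attack
    if t > duration - decay:
        return (duration - t) / decay
    return 1.0


def synth_note(f0, harmonic_amps, duration=0.5, attack=0.05, decay=0.2,
               sample_rate=8000):
    """Return samples of a note whose harmonic amplitudes never change.

    harmonic_amps[k] is the amplitude of harmonic k + 1; only the overall
    envelope varies over time.
    """
    samples = []
    for i in range(int(duration * sample_rate)):
        t = i / sample_rate
        wave = sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t)
                   for k, a in enumerate(harmonic_amps))
        samples.append(envelope(t, duration, attack, decay) * wave)
    return samples


# Two "instruments" at the same pitch, differing only in harmonic
# amplitudes and in attack/decay times (hypothetical settings).
note = synth_note(220.0, [1.0, 0.5, 0.25])
plucked = synth_note(220.0, [1.0, 0.1, 0.6], attack=0.005, decay=0.4)
```

Because the same envelope multiplies every harmonic, the spectrum keeps the same shape from start to finish, which is the simplification described above.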

The sounds produced by a natural musical instrument are much more complex; the different harmonics start and stop at different times and change in relative amplitude throughout the "steady-state" of the note. Our ability to distinguish one natural musical instrument from another depends more on the attack (and decay) than on the "steady-state". The nature of the attack and the relative amplitudes during the steady-state are not constant for a particular instrument. They depend on the style of playing, where in the range of the instrument the note lies, and so on.


Auditory Scene Analysis

The ears receive sound waves from many different sources at the same time, e.g. multiple talkers, instruments, cars, machinery, etc. In order to recognise the pitch and timbre of the sound from a particular source, the brain must decide which frequencies "belong together" and have come from this source. The problem is formally similar to that of "parsing" a visual scene into separate objects.

Principles enunciated by the Gestalt psychologists in vision are useful as heuristics for helping to decide which sounds will be grouped together: proximity, similarity, good continuation and common fate all have auditory analogues.

The brain needs to group simultaneously (separating out which of the frequency components present at a particular time have come from the same sound source) and also successively (deciding which group of components at one time is a continuation of a previous group).

Auditory streaming

Auditory streaming is the formation of perceptually distinct apparent sound sources. Temporal order judgement is good within a stream but bad between streams. Examples include:

* implied polyphony;

* a noise burst replacing a consonant in a sentence;

* a click superimposed on a sentence or melody.


Grouping Principles

(i) Proximity

* Tones close in frequency will group together, so as to minimise the extent of frequency jumps and the number of streams.

* Tones with similar timbre will tend to group together.

* Speech sounds of similar pitch will tend to be heard from the same speaker.

* Sounds from different locations are harder to group together across time than those from the same location.
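Grouping by frequency proximity can be illustrated with a toy heuristic. The sketch below is not a model of the auditory system: it assigns each tone to the existing stream whose most recent tone is nearest in frequency, and starts a new stream when the jump would exceed a threshold. The 5-semitone threshold and the function names are arbitrary assumptions, and the sketch ignores tempo, timbre and location.

```python
import math


def semitones(f1, f2):
    """Size of the interval between two frequencies, in semitones."""
    return abs(12 * math.log2(f2 / f1))


def group_by_proximity(tones, max_jump=5.0):
    """Greedily split a tone sequence (in Hz) into streams.

    Each tone joins the stream whose last tone is closest in frequency,
    provided the jump is at most `max_jump` semitones (a hypothetical
    threshold); otherwise it starts a new stream.
    """
    streams = []
    for freq in tones:
        best, best_dist = None, None
        for stream in streams:
            dist = semitones(stream[-1], freq)
            if dist <= max_jump and (best is None or dist < best_dist):
                best, best_dist = stream, dist
        if best is None:
            streams.append([freq])
        else:
            best.append(freq)
    return streams


# A rapid alternation between a low and a high register, as in implied
# polyphony, separates into two streams.
sequence = [440, 1000, 466, 1050, 494, 1100]
streams = group_by_proximity(sequence)
print(streams)  # [[440, 466, 494], [1000, 1050, 1100]]
```

Within each resulting stream the frequency jumps are small, and the number of streams is no larger than needed to keep them small.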

(ii) Common fate

* Sounds from a common source tend to start and stop at the same time and change in amplitude or frequency together (vibrato).

* A single component is easy to hear out if it is the only one to change in a complex.

(iii) Good continuation

Abrupt discontinuities in frequency or pitch can give the impression of a different sound source.

Continuity Effect

A sound that is interrupted by a noise that masks it can appear to be continuous. Alternations of the sound and the masking noise can give the illusion of continuity, with the auditory system interpolating across the masking noise.


MUSIC PERCEPTION
Tuning

Consonant intervals are those whose harmonics do not beat together to give roughness, i.e. those with small-integer frequency ratios: 2:1 (octave), 3:2 (fifth), 4:3 (fourth), 5:4 (major third).

Unfortunately, a scale based on such intervals is not internally consistent and does not allow modulations.
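The inconsistency can be checked with exact arithmetic. The two examples below are standard illustrations chosen here, not taken from the lecture: stacking pure fifths never lands exactly on an octave, and neither does stacking pure major thirds.

```python
from fractions import Fraction

octave = Fraction(2, 1)
fifth = Fraction(3, 2)
major_third = Fraction(5, 4)

# Twelve pure fifths should span seven octaves, but overshoot slightly.
twelve_fifths = fifth ** 12
seven_octaves = octave ** 7
pythagorean_comma = twelve_fifths / seven_octaves
print(pythagorean_comma)        # 531441/524288, slightly more than 1

# Three pure major thirds should span one octave, but fall short.
three_thirds = major_third ** 3
print(three_thirds)             # 125/64, slightly less than 2
print(octave / three_thirds)    # 128/125
```

Since these products of small-integer ratios cannot all equal exact powers of 2, some intervals must be mistuned in any fixed scale built from pure intervals.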

Equal temperament sacrifices some consonance in the primary intervals for an equal size of semitone (a frequency ratio of 2**(1/12), approximately 1.0595), and so sounds equally in tune in any key.
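The size of the sacrifice can be worked out by comparing each pure interval with its equal-tempered counterpart. The sketch below expresses the difference in cents (hundredths of an equal-tempered semitone); the function names are illustrative.

```python
import math

SEMITONE = 2 ** (1 / 12)  # equal-tempered semitone, about 1.0595


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


def tempering_error(num, den, n_semitones):
    """Equal-tempered interval minus the pure interval num:den, in cents."""
    return cents(SEMITONE ** n_semitones) - cents(num / den)


intervals = {
    "octave": (2, 1, 12),
    "fifth": (3, 2, 7),
    "fourth": (4, 3, 5),
    "major third": (5, 4, 4),
}

for name, (num, den, n) in intervals.items():
    print(f"{name:12s} {num}:{den}  error {tempering_error(num, den, n):+.2f} cents")
```

The octave is exact, the fifth and fourth are each about 2 cents out, and the major third is nearly 14 cents wide, which is why it is the roughest of the primary intervals in equal temperament.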

Absolute pitch

About 1 person in 10,000 has "Absolute Pitch" - they can identify the pitch of a musical note without the use of an external reference pitch. Most people can only give pitch names relatively - "if that is A this must be C". Absolute pitch is much more common in people who had musical training at an early age than among those who started later, and is probably more common in those whose early training involved learning the names of notes. It can be a liability, since pitch perception can change as you grow older, and international pitch standards also change. A more common absolute ability is the ability to tell when a piece of music is being played in the correct key.

Melody

The pitch of a tone can be regarded as having chroma (musical note name) and height (which octave). Melodies are hard to recognise if only chroma is maintained (i.e. when individual notes are transposed by octaves). Overall contour is an important attribute of melody, and allows variation of chroma within a recognisable framework.
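Chroma, height and contour can be made concrete with a small sketch. It assumes notes are written as MIDI note numbers with 60 as middle C (C4); that numbering, the example melody and the function names are conventions adopted here for illustration, not part of the lecture.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma(midi_note):
    """Note name, ignoring octave."""
    return NOTE_NAMES[midi_note % 12]


def height(midi_note):
    """Octave number, using the convention that MIDI note 60 is C4."""
    return midi_note // 12 - 1


def contour(melody):
    """Pattern of rises (+), falls (-) and repeats (=) between notes."""
    return ["+" if b > a else "-" if b < a else "=" for a, b in zip(melody, melody[1:])]


melody = [60, 62, 64, 60]       # C4 D4 E4 C4
scrambled = [60, 74, 52, 72]    # same chroma, notes moved to other octaves
transposed = [65, 67, 69, 65]   # different chroma, same contour

print([chroma(n) for n in scrambled])  # ['C', 'D', 'E', 'C']
print(contour(melody))                 # ['+', '+', '-']
print(contour(scrambled))              # ['+', '-', '+']
print(contour(transposed))             # ['+', '+', '-']
```

The octave-scrambled version keeps every note name but destroys the contour, whereas the transposed version changes every note name but keeps the contour.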

