Research

An Empirical Explanation: An Explanation of Pitch and Music Based on the Statistical Structure of Natural Tonal Sounds
Figure 1 / Diagram of the "inverse auditory problem", showing the conflation of the physical factors at the ear that the listener nevertheless needs to parse.
Figure 2 / The intervals of the chromatic scale. A one-octave portion of a piano keyboard indicating the chromatic scale tones, the name of the musical interval corresponding to the combination of each tone with the lowest note in the scale (C in this example) and the fundamental frequency ratio that defines each interval in the Just Intonation tuning scheme.
Figure 3 / The consonance ordering of musical intervals. Graph shows the consonance rankings assigned to each of the 12 chromatic scale tone pairings in the seven studies reported by Malmberg (1918). The median consonance values are indicated by the open circles connected by a dashed line. The octave (e.g., C4 and C5) is judged the most consonant and the minor second (e.g., C and D♭) is judged the least consonant.
Figure 4 / Analysis of speech segments. A) Variation of sound pressure level over time for a representative utterance from the TIMIT corpus (the sentence in this example is "She had your dark suit in greasy wash water all year"). B) Blowup of a 0.1 second segment extracted from the utterance (in this example the vowel sound in "dark"). C) The spectrum of the extracted segment in (B), generated by application of a fast Fourier transform. All amplitude and frequency values in a given spectrum were normalized according to Fn = F/Fm and An = A/Am, where Am and Fm are the maximum amplitude and its associated frequency, A and F are any given amplitude and frequency values in the spectrum, and An and Fn are the normalized values. This method of normalization avoids any assumptions about the structure of human speech sounds, e.g., that such sounds should be conceptualized in terms of ideal harmonic series.
Figure 5 / Statistical characteristics of American English speech sounds based on an analysis of the spectra extracted from the >100,000 segments (200 per speaker) in the TIMIT corpus. Mean normalized amplitude is plotted as a function of normalized frequency, the maxima indicating the normalized frequencies at which power tends to be concentrated. The plot shows the statistical spectrum for the octave interval bounded by the frequency ratios 1 and 2. Error bars show the 95% confidence interval of the mean at each local maximum.
Figure 6 / Comparison of the normalized spectrum of human speech sounds and the intervals of the chromatic scale. The majority of the musical intervals of the chromatic scale (arrows) correspond to the mean amplitude peaks in the normalized spectrum of human speech sounds, shown here over a single octave. The names of the musical intervals and the frequency ratios corresponding to each peak are indicated. The frequency ratios at the local maxima closely match the frequency ratios that define the chromatic scale intervals (see Figure 2).
Figure 7 / Consonance rankings predicted from the normalized spectrum of speech sounds. Median consonance rank of musical intervals (from Figure 3) plotted against the residual mean normalized amplitude at different frequency ratios. Consonance rank decreases progressively as the relative concentration of power at the corresponding maxima in the normalized speech sound spectrum decreases.
These observations support the hypothesis that the perceptual response to periodic sound stimuli is determined by the statistical relationship between ambiguous acoustical stimuli and their various possible natural sources, and suggest that other puzzling aspects of tone perception may be explainable in similar terms.
Following up on these observations, we have recently reported that the basis of these phenomena in speech sounds is the ratio of the frequencies of the first two formants (Ross et al., 2007).
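The normalization described in the Figure 4 caption (Fn = F/Fm and An = A/Am) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names are hypothetical, a direct discrete Fourier transform stands in for the fast Fourier transform so that the example needs only the standard library, and skipping the zero-frequency bin is an assumption. The 16 kHz sampling rate and the synthetic two-component tone are likewise illustrative choices, not values taken from the analysis.

```python
import cmath
import math


def amplitude_spectrum(samples):
    """Amplitudes of DFT bins 1..N//2 (zero-frequency bin omitted).

    A direct DFT is used here for self-containment; an FFT routine
    would give the same amplitudes faster.
    """
    n = len(samples)
    # Precompute the N complex roots of unity once.
    twiddle = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    return [
        abs(sum(x * twiddle[(k * i) % n] for i, x in enumerate(samples)))
        for k in range(1, n // 2 + 1)
    ]


def normalize_spectrum(samples, sample_rate):
    """Return (Fn, An) with Fn = F/Fm and An = A/Am.

    Am is the maximum amplitude in the spectrum and Fm is the
    frequency at which that maximum occurs.
    """
    n = len(samples)
    amps = amplitude_spectrum(samples)
    freqs = [k * sample_rate / n for k in range(1, n // 2 + 1)]
    peak = max(range(len(amps)), key=amps.__getitem__)
    a_m, f_m = amps[peak], freqs[peak]
    if a_m == 0:
        raise ValueError("segment is silent; spectrum cannot be normalized")
    fn = [f / f_m for f in freqs]
    an = [a / a_m for a in amps]
    return fn, an


# Illustrative input (assumed values): a 0.1 s segment sampled at 16 kHz,
# containing a 200 Hz component and a weaker 400 Hz component.
rate = 16000
n_samples = 1600
tone = [
    math.sin(2 * math.pi * 200 * t / rate)
    + 0.5 * math.sin(2 * math.pi * 400 * t / rate)
    for t in range(n_samples)
]
fn, an = normalize_spectrum(tone, rate)
peak_index = an.index(max(an))
```

After normalization the strongest component sits at Fn = 1 with An = 1, and every other component is expressed as a frequency ratio and an amplitude fraction relative to it. Because each segment is rescaled this way, spectra from different speakers and utterances can be averaged on a common axis without assuming a harmonic series in advance.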
References
Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL (1990) DARPA-TIMIT Acoustic-phonetic continuous speech corpus [CD-ROM]. Gaithersburg, MD: US Department of Commerce.
Malmberg CF (1918) The perception of consonance and dissonance. Psychol Monogr 25: 93-133.
Schwartz D, Purves D (2004) Pitch is determined by naturally occurring periodic sounds. Hearing Research 194: 31-46.
Schwartz DA, Howe CQ, Purves D (2003) The statistical structure of human speech sounds predicts musical universals. J Neurosci 23(18): 7160-7168.
Ross D, Choi J, Purves D (2007) Musical intervals in speech. PNAS 104(23): 9852-9857.










