A Primer on Probabilistic Approaches to Visual Perception
A growing body of evidence indicates that visual percepts are generated according to the empirical significance of light stimuli, rather than the characteristics of the stimuli as such. These findings naturally raise the question of how, in these terms, visual perception can be formally rationalized. For those interested in thinking about this question in more formal terms, this primer compares Bayesian decision theory and empirical ranking theory, two different frameworks for understanding the human visual system in a fundamentally statistical manner; both approaches are predicated on the general idea that the visual system must solve the inverse optics problem by using empirical information derived from past experience.
The first and most influential advocate of using past experience as a means of contending with the uncertain provenance of visual stimuli was Hermann von Helmholtz (1866/1924). Helmholtz summarized his conception of this empirical contribution to visual percepts by proposing that the raw "sensations" generated by the physiological infrastructure of the eye and the input stages of the visual brain could be modified by information derived from experience. Helmholtz described this process as making "unconscious inferences" about reality, thus generating perceptions more nearly aligned with stimulus sources when input-level sensations proved inadequate (op cit., vol. III, p. 10 ff.). Despite these speculations and the ensuing debate during the second half of the 19th C., vision science during most of the 20th C. was understandably dominated by the enormous success of modern neurophysiology and neuroanatomy. A plausible assumption in much contemporary vision research has thus been that understanding visual perception will be best achieved by gleaning increasingly precise information about the receptive field properties of visual neurons and the synaptic connectivity that gives rise to these properties. As a result, the role of past experience in determining what observers see has, until recently, received relatively little attention.
If the visual system uses empirical information to generate perceptions that reflect the real-world conditions and object relationships that observers have always had to respond to by appropriate visually-guided behavior, then understanding vision inevitably means understanding how, in statistical terms, physical sources are related to retinal images. By far the most popular approach to meeting this challenge has been Bayesian decision theory.
Thomas Bayes was an 18th C. minister and amateur mathematician whose paper entitled "An Essay towards Solving a Problem in the Doctrine of Chances" was published posthumously in 1763. The manuscript proved a theorem showing how conditional probabilities are used in making inferences. Although Bayes' purpose in elaborating his eponymous theorem remains obscure, it has been applied to advantage in a number of disciplines as a framework for addressing statistical problems whose solution depends on an assessment of hypotheses that are only more or less likely to be true as a result of complex circumstances. In vision research, Bayes' theorem was initially used to develop pattern recognition strategies for computer vision. More recently, however, the framework provided by the theorem has been advocated as a means of rationalizing visual perception (or at least the judgments associated with visual perception).
Bayes' theorem is usually written in the form
$$P(H \mid E) = \frac{P(H)\,P(E \mid H)}{P(E)}$$
where H is a hypothesis, E the evidence pertinent to its validity and P probability. The first term on the right side of Bayes' equation, P(H), is referred to as the prior probability distribution or simply the prior, and is a statistical measure of confidence in the hypothesis, absent any present evidence pertinent to its truth or falsity. With respect to vision, the prior describes the relative probabilities of different physical states of the world pertinent to retinal images, i.e., the relative frequency of occurrence of various illuminants, surface reflectance values, object sizes and so on. The second term, P(E|H), is called the likelihood function. If hypothesis H were true, this term indicates the probability that the evidence E would have been available to support it. In the context of vision, given a particular state of the physical world (i.e., a particular combination of illumination, reflectance properties, object sizes etc.), the likelihood function describes the probability that the state would generate the retinal projection in question. The product of the prior and the likelihood function, divided by a normalization constant, P(E), gives the posterior probability distribution, P(H|E). The posterior distribution defines the probability of hypothesis H being true, given the evidence E. In vision, the posterior probability distribution thus indicates the relative probability of a given retinal image having been generated by one or another of the different physical realities that might be the source of the image.
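The arithmetic of the theorem can be sketched for a small, discrete set of hypotheses. This is a minimal illustration only: the two hypothesis names and all of the probabilities are invented for the example and do not come from any measurement.

```python
def posterior(prior, likelihood):
    """Return P(H|E) for every hypothesis H, given P(H) and P(E|H)."""
    # Numerator of Bayes' theorem for each hypothesis: P(H) * P(E|H).
    joint = {h: prior[h] * likelihood[h] for h in prior}
    # Normalization constant P(E): the numerators summed over all hypotheses.
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the evidence has zero probability under every hypothesis")
    return {h: value / evidence for h, value in joint.items()}


# Hypothetical numbers, chosen only for illustration.
prior = {"source_a": 0.7, "source_b": 0.3}        # P(H)
likelihood = {"source_a": 0.2, "source_b": 0.9}   # P(E|H)

post = posterior(prior, likelihood)
for hypothesis, probability in post.items():
    print(f"P({hypothesis} | E) = {probability:.3f}")
```

With these made-up values, the source that was less probable a priori becomes the more probable one once the evidence is taken into account, because the evidence is much more likely under that hypothesis.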
To illustrate how Bayes' theorem might be used to rationalize the percepts elicited by a visual stimulus, consider, as a highly simplified example, the apparent brightness of a patch having a particular luminance value in the retinal image. (Luminance refers to the light reflected from a surface corrected for the sensitivity of human vision; we use it here as a practical index of the amount of light that actually falls on the retina.) In terms of physical measurement, the luminance of the patch is, to a first approximation, the product of the amount of light falling on the object surface (the illumination) and the reflectance efficiency function of the surface underlying that part of the image. To further simplify the situation, consider illumination (Ω) and reflectance (R) the only parameters needed to specify the physical reality underlying the image (in fact, the amount of light in the stimulus will, in normal circumstances, be affected by the transmittance of the intervening atmosphere and a host of other factors).
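The ambiguity that this simplification leaves in place can be made concrete with a few lines of code. The sketch assumes the first-approximation relation described above (luminance as the product of illumination and reflectance); the three sources are hypothetical values picked so that their products coincide.

```python
import math


def luminance(illumination, reflectance):
    """First-approximation luminance: illumination times reflectance."""
    return illumination * reflectance


# Hypothetical sources: (illumination on an arbitrary 0-100 scale,
# reflectance as a fraction between 0 and 1).
sources = [(100.0, 0.2), (50.0, 0.4), (25.0, 0.8)]
luminances = [luminance(omega, r) for omega, r in sources]

for (omega, r), lum in zip(sources, luminances):
    print(f"illumination={omega:5.1f}  reflectance={r:.1f}  luminance={lum:.1f}")

# All three sources are indistinguishable from the luminance alone.
same = all(math.isclose(lum, luminances[0]) for lum in luminances)
print("same luminance:", same)
```

A brightly lit dark surface and a dimly lit light surface return the same luminance, which is why the luminance by itself cannot specify its source.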
Applying Bayes' theorem, the brightness experienced by an observer in this example would be derived from the pertinent posterior probability distribution P(Ω, R | L), where L stands for the luminance of the image. The first step, then, is to compute the prior distribution P(Ω, R), i.e., the probability distribution of different conditions of illumination occurring in conjunction with different surface reflectance values in the physical world (Figure 1A). In principle, this distribution could be generated by sampling a large number of points in typical physical environments, measuring at each location the illumination and surface reflectance (a difficult but not impossible task; in practice, the illumination and reflectance values are usually assumed). The next step is to derive the likelihood function P(L | Ω, R), which describes, for each possible combination of illumination and reflectance, the probability that the combination would generate the luminance value of the image patch in question. If this process were free of noise, then particular combinations of illumination and reflectance values would generate a specific value of luminance. With respect to the luminance of the image patch under consideration, some of these combinations would inevitably produce that luminance and therefore have a probability of 1 in the likelihood function, whereas all others would be incapable of producing that luminance, and thus have a probability of 0. Because biological image formation is inevitably noisy, some Gaussian noise is typically added, making the values of 1 in the likelihood function somewhat less than 1, and values of 0 somewhat larger than 0.
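The two forms of the likelihood function described above might be sketched as follows. The function names, the tolerance `tol`, the target luminance, and the noise width `sigma` are all assumptions made for the example. The Gaussian version is left unnormalized, so it gives relative weights rather than calibrated probabilities; this is harmless when the result is later divided by the normalization constant.

```python
import math


def likelihood_noise_free(illumination, reflectance, target, tol=1e-9):
    """P(L | Ω, R) without noise: 1 if the product equals the target, else 0."""
    return 1.0 if abs(illumination * reflectance - target) <= tol else 0.0


def likelihood_gaussian(illumination, reflectance, target, sigma):
    """Unnormalized Gaussian weight that falls off as the product departs from the target."""
    error = illumination * reflectance - target
    return math.exp(-0.5 * (error / sigma) ** 2)


target = 20.0  # hypothetical luminance of the image patch
sigma = 2.0    # hypothetical noise level, in luminance units

on_line = (50.0, 0.4)    # product is 20: can produce the target
off_line = (50.0, 0.45)  # product is 22.5: cannot, absent noise

for omega, r in (on_line, off_line):
    hard = likelihood_noise_free(omega, r, target)
    soft = likelihood_gaussian(omega, r, target, sigma)
    print(f"illumination={omega}  reflectance={r}  noise-free={hard}  gaussian={soft:.3f}")
```

In the noise-free case the second combination is excluded outright; with noise it keeps a reduced but non-zero weight.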
Figure 1 / A Bayesian approach to characterizing the relationship between a visual target of luminance L and its possible physical sources. A) The prior distribution of illuminations (Ω) and reflectance values (R) in the physical world. The distribution (which is didactic only) shows illumination varying on an arbitrary scale of 0 to 100, and reflectance varying from 0% to 100%. B) The dashed red line on the surface of the prior distribution indicates the position where the product of illumination and reflectance equals the luminance (L) of the target. If the image formation process is assumed to be free of noise, the posterior distribution, P(Ω, R | L), obtained by multiplying the prior by a likelihood function is the section of the prior distribution along the dashed line. C) The addition of Gaussian noise to the image formation process makes the posterior distribution 'thicker' but does not alter the fact that the posterior is effectively a section of the prior.
Finally, the posterior distribution, P(Ω, R | L), is obtained by multiplying the prior distribution in Figure 1A by the likelihood function (which comprises zeros and ones, or, with added noise, approximations thereof) (Figure 1B). The posterior distribution therefore describes the relative probabilities of all the possible physical sources that could have generated the specific image under consideration. In effect, the posterior is a section of the prior distribution; in the example here the section is the set of all the specific combinations of illumination and reflectance values whose product is L (Figure 1B). The addition of Gaussian or other noise to the likelihood function only increases the 'thickness' of the section (Figure 1C).
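The claim that the posterior is effectively a section of the prior can be checked numerically on a grid. The sketch below is didactic only: the shape of the prior (a smooth bump centred on an illumination of 60 and a reflectance of 0.5), the grid size, the target luminance, and the noise width are all assumed, and the likelihood is an unnormalized Gaussian.

```python
import math


def didactic_prior(n=51):
    """Hypothetical prior P(Ω, R) on an n-by-n grid, normalized to sum to 1."""
    weights = {}
    for i in range(n):
        omega = 100.0 * i / (n - 1)          # illumination, arbitrary 0-100 scale
        for j in range(n):
            r = j / (n - 1)                  # reflectance as a fraction, 0-1
            # Assumed shape: a smooth bump centred on (60, 0.5).
            weights[(omega, r)] = math.exp(
                -0.5 * ((omega - 60.0) / 25.0) ** 2 - 0.5 * ((r - 0.5) / 0.2) ** 2
            )
    total = sum(weights.values())
    return {source: w / total for source, w in weights.items()}


def posterior_given_luminance(prior, target, sigma):
    """P(Ω, R | L): prior times Gaussian likelihood, divided by P(L)."""
    joint = {
        (omega, r): p * math.exp(-0.5 * ((omega * r - target) / sigma) ** 2)
        for (omega, r), p in prior.items()
    }
    evidence = sum(joint.values())           # normalization constant P(L)
    return {source: value / evidence for source, value in joint.items()}


prior = didactic_prior()
target, sigma = 20.0, 1.0                    # hypothetical luminance and noise level
post = posterior_given_luminance(prior, target, sigma)


def mass_near_line(distribution):
    """Probability mass within three noise widths of the line Ω * R = L."""
    return sum(
        p for (omega, r), p in distribution.items()
        if abs(omega * r - target) <= 3 * sigma
    )


print(f"prior mass near the iso-luminance line:     {mass_near_line(prior):.3f}")
print(f"posterior mass near the iso-luminance line: {mass_near_line(post):.3f}")
```

Nearly all of the posterior mass lies in a thin band around the line where illumination times reflectance equals L, whereas the prior spreads its mass over the whole grid. Increasing `sigma` widens the band, which corresponds to the 'thicker' section.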
Bayesian decision theory
Because the posterior distribution indicates only the relative probabilities of a set of possible image sources, a particular source (i.e., a particular combination of illumination and reflectance in the example above) must be selected from this set if the aim is to predict what an observer will actually see. The usual way of addressing this further issue is to assume that the visual system makes this choice according to the behavioral consequences associated with each perceptual "decision". The influence of various consequences is typically expressed in terms of the discrepancy between the decision made and the actual state of the world, which over the full range of the possible choices defines a gain-loss function. Since there is no a priori way to model this function (indeed, given the enormous number of variables involved, a realistic gain-loss function for some aspect of vision would be extraordinarily difficult to determine), the relative cost of different behavioral responses is assumed. For example, a common assumption is that observers will "choose" the percept that corresponds to the maximum value in the posterior probability distribution, since this choice would generally minimize the discrepancy between the percept and the actual state of the world.
Whatever the assumption made about an appropriate gain-loss function, applying the resulting rule to the posterior probability distribution allows the investigator to state the specific physical condition that, in this conceptual framework, corresponds to what observers actually see in response to the stimulus in question. In the example of brightness perception used here, the brightness seen in response to the luminance of the stimulus would thus correspond to the most probable physical reflectance and illumination values underlying the stimulus, much as Helmholtz had initially suggested (since we don't generally think of perceiving illumination, Helmholtz and others since have tended to describe this interpretation in terms of seeing the most likely underlying reflectance). The need for this additional gain-loss function in using Bayes' theorem to predict visual percepts explains why this general approach is referred to as Bayesian decision theory.
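How the assumed gain-loss function shapes the predicted percept can be sketched with a toy posterior. Everything here is hypothetical: the three candidate reflectance values, their posterior probabilities, and the two loss functions, which stand in for whatever gain-loss function an investigator might assume.

```python
def expected_loss(choice, posterior, loss):
    """Expected loss of reporting `choice`, averaged over the posterior."""
    return sum(p * loss(choice, actual) for actual, p in posterior.items())


def decide(posterior, loss):
    """Pick the candidate source with the smallest expected loss."""
    return min(posterior, key=lambda choice: expected_loss(choice, posterior, loss))


def zero_one_loss(choice, actual):
    """Any error costs the same; minimizing it selects the posterior maximum."""
    return 0.0 if choice == actual else 1.0


def squared_loss(choice, actual):
    """Cost grows with the squared discrepancy between choice and actual state."""
    return (choice - actual) ** 2


# Hypothetical posterior over three candidate reflectance values.
posterior = {0.2: 0.5, 0.6: 0.3, 0.8: 0.2}

map_choice = decide(posterior, zero_one_loss)
squared_choice = decide(posterior, squared_loss)
print("zero-one loss picks", map_choice)
print("squared loss picks", squared_choice)
```

Under the all-or-nothing loss the rule reduces to choosing the posterior maximum, whereas the squared loss favours a middle value that is never far wrong. The same posterior thus yields different predictions depending on the loss that is assumed.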
In sum, Bayesian decision theory in its basic application determines the physical source(s) capable of generating a given retinal image and the relative probabilities of their actually having done so; the percepts predicted are therefore explicit models of world structure.
Empirical ranking theory
The application of Bayesian decision theory to vision is clearly an important advance in that it formalizes Helmholtz's general proposal about "visual inferences" as a means of contending with stimulus uncertainty. Nonetheless, its implementation presents both conceptual and practical difficulties. With respect to the conceptual implications of Bayesian theory applied to visual perception, the intuitively appealing idea that percepts correspond to physical characteristics such as surface reflectance is problematic and in many instances false (as we explain in a later section). The practical obstacles are the difficulty of determining the physical parameters relevant to any specific prior, and the need for a decision rule based on an assumed gain-loss function. Is there, then, any other way of conceptualizing how vision utilizes empirical information to deal with the inverse optics problem?
The alternative to Bayesian decision theory that we have used in rationalizing visual percepts begins by abandoning the idea that vision entails inferences (whether conscious or unconscious) about the properties of the physical world, the concept inherent in the application of Bayes' theorem to visual perception. The conceptual basis of this alternative approach is that the percept elicited by any particular stimulus parameter (e.g., the brightness elicited by the luminance of a stimulus) corresponds not to a statistically determined value of the relevant qualities in the physical world (e.g., the most likely illumination and reflectance values underlying that luminance), but rather to the relative frequency of occurrence of that particular stimulus parameter in relation to all other instances of that parameter experienced in the past. For example, with respect to the perceptual quality of brightness, the brightness perceived in response to the luminance of a region of a visual scene would be determined by how often that specific luminance had occurred relative to the occurrence of all the other luminance values in that context in the past experience of observers. In other words, the brightness elicited by a target is determined by the empirical rank of the relevant luminance value within the full range of experience with similar scenes (see Yang and Purves, 2004; Howe and Purves, 2005 for examples of how this approach has actually been used). The biological rationale for this approach is that it is obviously desirable to have the full perceptual range for any visual quality (from the brightest percept we can have to the dimmest, for example) aligned with the full range of the relevant stimulus parameters generated by the physical world (from the most intense luminance experienced in visual stimuli to the least intense).
Figure 2 / The empirical ranking approach to characterizing the relationship between images and their possible physical sources. A) The prior distribution of illumination and reflectance values shown in Figure 1A can be integrated along the dashed red lines (which represent only a few examples) to produce the marginal distribution in (B). Each dashed line is an iso-luminant line along which the product of illumination and reflectance is a specific luminance value. B) The marginal distribution derived by integrating the distribution in (A) along iso-luminant lines. This distribution describes the relative probability of occurrence of the physical sources of different luminance intensities in human experience. C) The cumulative probability distribution derived from (B). The cumulative probability for any specific luminance value, l, is the summed probability of occurrence of the physical sources that generate luminance values less than or equal to that luminance, derived by calculating the area underneath the curve in (B) and to the left of the position where x = l. Thus, the y-value of each point on the cumulative distribution indicates the percentage of physical sources that generate luminance values less than or equal to a specific luminance, providing a basis for ranking that luminance value in past experience. In the example shown, luminance L' holds a higher rank (r') than luminance L (which holds rank r), and should thus be seen as brighter than L.
To illustrate how this concept of visual perception can be used to predict what observers actually see in response to a given stimulus, we continue with the example of the perceived brightness of a patch in the retinal image. Prior experience with the relative frequency of different states of the world (i.e., the prior distribution in Figure 1A) is integrated according to the luminance values produced by every combination of illumination and reflectance in the distribution (Figure 2A), in this way generating a marginal (i.e., integrated) form of the prior (Figure 2B). The marginal distribution in Figure 2B describes the probabilities of occurrence of the full range of possible luminance values, each point in the distribution being the summed probability of occurrence of all the physical conditions that could have generated a specific luminance.
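The integration step can be approximated on a grid by summing the prior mass of every combination whose luminance falls in the same bin. As before, this is a sketch under stated assumptions: the prior is a made-up smooth bump, and the bin width of 5 luminance units is an arbitrary choice that stands in for integrating along iso-luminant lines.

```python
import math


def didactic_prior(n=51):
    """Hypothetical prior P(Ω, R) on an n-by-n grid, normalized to sum to 1."""
    weights = {}
    for i in range(n):
        omega = 100.0 * i / (n - 1)          # illumination, arbitrary 0-100 scale
        for j in range(n):
            r = j / (n - 1)                  # reflectance as a fraction, 0-1
            # Assumed shape: a smooth bump centred on (60, 0.5).
            weights[(omega, r)] = math.exp(
                -0.5 * ((omega - 60.0) / 25.0) ** 2 - 0.5 * ((r - 0.5) / 0.2) ** 2
            )
    total = sum(weights.values())
    return {source: w / total for source, w in weights.items()}


def luminance_marginal(prior, bin_width=5.0, max_luminance=100.0):
    """Sum prior mass over all (Ω, R) whose product falls in each luminance bin."""
    n_bins = int(round(max_luminance / bin_width))
    marginal = [0.0] * n_bins
    for (omega, r), p in prior.items():
        # Index of the bin containing this source's luminance; the top edge
        # (luminance == max_luminance) is folded into the last bin.
        index = min(int(omega * r / bin_width), n_bins - 1)
        marginal[index] += p
    return marginal


bin_width = 5.0
marginal = luminance_marginal(didactic_prior(), bin_width=bin_width)
for index, mass in enumerate(marginal):
    low = index * bin_width
    print(f"luminance {low:5.1f} to {low + bin_width:5.1f}: {mass:.4f}")
```

Each entry of `marginal` is the summed probability of all the illumination and reflectance combinations that produce a luminance in that bin, so the entries sum to 1.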
The cumulative probability distribution derived from this marginal distribution is, in effect, an empirical scale that orders the full range of luminance values according to past experience (Figure 2C). The rank of any given luminance on this scale is determined by the percentage of all the physical sources that generated luminance values less than the value at issue, and the percentage that generated greater luminance values. The higher the percentage of physical sources that in past experience generated luminance values less than the luminance at issue, the higher that luminance ranks on the empirical scale defined by the cumulative distribution, and thus the brighter the percept elicited, as illustrated in Figure 2C. Such ranking indicates how the full range of a physical feature (luminance in this example) is mapped to the full range of the corresponding perceptual space (brightness) according to past experience, which is why this approach is referred to as "empirical ranking theory".
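The ranking step can be sketched with an empirical cumulative distribution built from a sample of past luminance values. The sample is synthetic: the uniform illumination, the Beta-distributed reflectance, the sample size, and the seed are assumptions standing in for real measurements of natural scenes.

```python
import bisect
import random


def sample_luminances(n, seed=0):
    """Draw n hypothetical past luminance values, returned in sorted order."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        illumination = rng.uniform(0.0, 100.0)    # assumed: arbitrary 0-100 scale
        reflectance = rng.betavariate(2.0, 3.0)   # assumed: a fraction in [0, 1]
        values.append(illumination * reflectance)
    values.sort()
    return values


def empirical_rank(sorted_luminances, value):
    """Fraction of past luminances less than or equal to `value`."""
    return bisect.bisect_right(sorted_luminances, value) / len(sorted_luminances)


past = sample_luminances(10_000)

rank_low = empirical_rank(past, 20.0)    # luminance L
rank_high = empirical_rank(past, 40.0)   # luminance L'
print(f"rank of L  = 20: {rank_low:.3f}")
print(f"rank of L' = 40: {rank_high:.3f}")
print("L' predicted to look brighter than L:", rank_high > rank_low)
```

The rank is read directly off the cumulative distribution, without reference to the illumination or reflectance behind any individual sample. A different set of past luminances, for instance from a different scene context, would assign the same two luminance values different ranks.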
Notice that the rank of a luminance value bears no direct relationship to the possible underlying values of reflectance and illumination, taking on meaning only in comparison to the rank of other luminance values on the same empirical scale (this same general argument would apply to any perceptual quality). As a result, the perceptual prediction generated by empirical ranking always entails an assessment of two or more stimuli, or two or more regions within a stimulus.
It should be apparent from this account that the difference between these two empirical approaches - Bayesian decision theory and empirical ranking theory - lies in their different conceptions of visual perception. Bayesian decision theory, as it has typically been applied to vision, supposes that perceptions are effectively inferences about physical properties of the objects and conditions in the world. In empirical ranking theory, visual perceptions are conceived as statistical constructs that have no direct correspondence to the possible real-world sources of a stimulus. In this alternative framework, visual percepts are simply subjective sensations that link visual stimuli to the empirical significance of their sources according to the success or failure of visually guided behavior in past experience. Deciding which approach is the more useful and the more appropriate framework for predicting and understanding visual percepts will depend on the ability of these theories to explain the full range of the numerous puzzles vision presents, and on which seems to offer the more realistic account of how vision has evolved.
References
Bayes T (1763) An essay towards solving a problem in the doctrine of chances. Philos Trans R Soc 53: 370-418.
Yang Z, Purves D (2004) The statistical structure of natural light patterns determines perceived light intensity. Proc Natl Acad Sci USA 101: 8745-8750.
Howe CQ, Purves D (2005) Perceiving Geometry: Geometrical Illusions Explained in Terms of Natural Scene Statistics. New York: Springer.
Howe CQ, Lotto RB, Purves D (2006) Empirical approaches to understanding visual perception. J Theor Biol 241: 866-875.