An Empirical Explanation: Objective and Perceived Motion

Motion has different meanings in physics and psychophysics. In Newtonian terms, motion refers to the absolute speed and direction of an object in a Euclidean frame of reference. Physical motion ranges from total immobility to the speed of light, in all possible directions in 3-D space. In psychophysics, however, motion is defined subjectively by our ability to sense object motion. The relevant range of perceived speeds is thus vastly more restricted: we don’t see the hour hand of a clock or a bullet that has been fired as moving objects, even though both move at physical rates that are easily measured. The range of projected object speeds that humans have evolved to see as motion is from roughly 0.1°/s to 150-200°/s; below the lower end of this range objects appear to be standing still, and as speeds approach the upper end of the range they generate only a sense of visual blur and are ultimately invisible. (The range of physical motion that elicits a motion percept is expressed in degrees per second on an image plane because the projected speed of objects moving at the same physical speed but at different distances varies greatly; see Figure 1.) Likewise, the directions of the 3-D sources are not specified by the 2-D monocular or binocular information.
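The dependence of projected speed on distance can be illustrated with a short calculation. The sketch below assumes the simplest case, an object crossing the line of sight at right angles, for which the instantaneous projected speed is the physical speed divided by the viewing distance (in radians per second). The function names, the 200°/s cutoff (the text gives 150-200°/s) and all speeds, sizes and distances are illustrative assumptions, not values from the text.

```python
import math

# Perceptual limits quoted in the text (deg/s). The upper limit is given as a
# range of 150-200 deg/s; 200 is used here as an assumption.
LOWER_LIMIT_DEG_S = 0.1
UPPER_LIMIT_DEG_S = 200.0


def angular_speed_deg_per_s(physical_speed_m_s, distance_m):
    """Projected speed (deg/s) of an object crossing the line of sight at right angles."""
    return math.degrees(physical_speed_m_s / distance_m)


def classify(projected_speed_deg_s):
    """Place a projected speed relative to the range that is seen as motion."""
    if projected_speed_deg_s < LOWER_LIMIT_DEG_S:
        return "appears stationary"
    if projected_speed_deg_s > UPPER_LIMIT_DEG_S:
        return "blur or invisible"
    return "seen as moving"


# Same physical speed, different distances (hypothetical values).
walker_speed = 1.5  # m/s
for distance in (2.0, 20.0, 200.0):
    omega = angular_speed_deg_per_s(walker_speed, distance)
    print(f"walker at {distance} m: {omega:.3f} deg/s -> {classify(omega)}")

# Tip of a 0.1 m hour hand (one revolution per 12 h) viewed from 1 m,
# and a bullet at 400 m/s passing 10 m from the observer (hypothetical values).
hour_hand_tip_speed = 0.1 * 2 * math.pi / (12 * 3600)  # m/s
examples = (("hour hand", hour_hand_tip_speed, 1.0), ("bullet", 400.0, 10.0))
for label, speed, distance in examples:
    omega = angular_speed_deg_per_s(speed, distance)
    print(f"{label}: {omega:.5f} deg/s -> {classify(omega)}")
```

Under these assumptions the walker's projected speed falls by a factor of ten for each tenfold increase in distance, while the hour hand and the bullet fall below and above the visible range respectively.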
These discrepancies between the physical world and the motions we see are of course major problems for the evolution of useful vision: observers must respond accurately to the real-world speeds and directions of objects, but can do so only on the basis of the speeds and directions projected onto the retinal image plane. The inability of the projected image sequences to uniquely specify the physical motion of the objects defines the inverse optics problem as it applies to motion. When objects in three-dimensional space project onto a two-dimensional surface, size, distance, orientation, and speed are all conflated in the image plane (Figure 1). Thus, the changes in position that uniquely define motion in physical terms are always ambiguous with respect to the possible sources of the retinal image sequence, as has long been noted. How the visual system nonetheless produces motion percepts that lead to generally successful behavior is not known.
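A minimal sketch can make the ambiguity concrete. It assumes an idealized pinhole projection onto an image plane at unit focal length (a simplification, not a model of the eye) and shows that two hypothetical sources differing tenfold in distance and in physical speed generate exactly the same image sequence. The function names and all numerical values are assumptions chosen for illustration.

```python
import math


def project(point, focal_length=1.0):
    """Pinhole projection of a 3-D point (x, y, z), z > 0, onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def image_sequence(start, velocity, times):
    """Image-plane positions of an object moving at constant 3-D velocity."""
    sequence = []
    for t in times:
        position = tuple(p + v * t for p, v in zip(start, velocity))
        sequence.append(project(position))
    return sequence


times = [0.0, 0.25, 0.5, 0.75, 1.0]  # seconds

# Source A: near and slow. Source B: ten times farther and ten times faster.
near_slow = image_sequence(start=(0.0, 0.0, 2.0), velocity=(1.0, 0.0, 0.0), times=times)
far_fast = image_sequence(start=(0.0, 0.0, 20.0), velocity=(10.0, 0.0, 0.0), times=times)

# Check that the two image sequences agree at every time step.
same = all(
    math.isclose(a[0], b[0], abs_tol=1e-12) and math.isclose(a[1], b[1], abs_tol=1e-12)
    for a, b in zip(near_slow, far_fast)
)
print("identical image sequences:", same)
```

Any common scaling of distance and velocity yields the same result, so the image sequence alone cannot distinguish among this family of physical sources.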
If the motion seen in response to a stimulus is also explainable in the empirical framework that rationalizes many aspects of lightness/brightness, color, form, distance and depth, then the perception of motion elicited by the image sequence in Figure 1, or any other sequence of retinal images, should accord with – and be predicted by – the relationship between the retinal image sequence and accumulated human experience with all the physical sources that have generated the same or a similar stimulus sequence in the past. A formidable obstacle in testing the merits of an empirical explanation of perceived motion is the difficulty of determining the frequency distribution of the speeds and directions of moving objects with any present technology. In an empirical account, this information is needed as a proxy for the projected speeds and distances underlying the real-world motions that humans will have discovered by the relative success of ensuing behaviors in response to image sequences. Although data relating projected images to real-world geometry can be readily obtained for static scenes using laser range scanning, there is at present no way of collecting information about the direction, speed and 3-D position of moving objects in the real world. Nonetheless, human experience with real-world object motion can be determined to a rough approximation in at least two different ways.
One approach is computing the frequency distribution of all the physical displacements that could, in principle, have generated a simple moving stimulus (e.g., the stimulus sequence in Figure 1), assuming that all physical motions are equally likely to occur. Although the assumption is certainly false – the prevalence of natural objects that routinely move, gravity, friction and a host of other factors that bias the image sequences that humans have seen in the past are all relevant – this approach provides a starting point in understanding the probability distributions of the possible sources of motion stimuli.
A second approach is to approximate reality in a simulated environment (Figure 2A). If a virtual environment is populated with moving objects that behave in a roughly realistic way, the frequency of occurrence of different image sequences can be determined empirically. Although grossly simplified, this surrogate for experience with moving objects accurately represents the transformations between movements in 3-D space and their 2-D projections. By sampling the image plane in all directions over a range of spatial and temporal intervals, one can determine the probability of projected speeds and directions arising from the 3-D sources underlying a given image sequence (Figure 2B). In the same general way, it is possible to determine the frequency of occurrence of the directions of projected motion for various stimuli, as described later in the chapter. These data can then be used to predict the perceived speed and direction of specific motion stimuli in complementary psychophysical studies, thereby testing the hypothesis that motion percepts are generated empirically. The following sections indicate how these approaches can be used to explain some otherwise mysterious aspects of motion perception.
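The general idea of tallying projected speeds in a virtual environment can be sketched as a small Monte Carlo simulation. This is not the simulation shown in Figure 2; it is a toy version under loudly stated assumptions: point objects start at uniformly random positions in a box in front of an observer at the origin, move in uniformly random 3-D directions at speeds drawn uniformly up to a hypothetical maximum, and the projected speed is taken as the change in visual direction over a short time step. All names, ranges and bin edges are illustrative.

```python
import math
import random


def random_unit_vector(rng):
    """Direction drawn uniformly from the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def angle_between_deg(p, q):
    """Angle in degrees between two 3-D vectors, via atan2 for small-angle accuracy."""
    cross = (
        p[1] * q[2] - p[2] * q[1],
        p[2] * q[0] - p[0] * q[2],
        p[0] * q[1] - p[1] * q[0],
    )
    cross_norm = math.sqrt(sum(c * c for c in cross))
    dot = sum(a * b for a, b in zip(p, q))
    return math.degrees(math.atan2(cross_norm, dot))


def sample_projected_speeds(n, dt=0.05, max_speed=5.0, seed=0):
    """Projected speeds (deg/s) of n simulated objects seen from the origin."""
    rng = random.Random(seed)
    speeds = []
    for _ in range(n):
        # Hypothetical scene: a box 20 m wide and high, 1 to 50 m deep.
        start = (rng.uniform(-10, 10), rng.uniform(-10, 10), rng.uniform(1, 50))
        speed = rng.uniform(0.0, max_speed)  # physical speed in m/s
        direction = random_unit_vector(rng)
        end = tuple(s + speed * dt * d for s, d in zip(start, direction))
        speeds.append(angle_between_deg(start, end) / dt)
    return speeds


def fraction_per_bin(values, edges):
    """Fraction of values falling in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return [c / len(values) for c in counts]


speeds = sample_projected_speeds(5000)
edges = [0.0, 1.0, 10.0, 100.0, 1000.0]  # deg/s
for lo, hi, frac in zip(edges, edges[1:], fraction_per_bin(speeds, edges)):
    print(f"{lo:7.1f} to {hi:7.1f} deg/s: {frac:.3f}")
```

A more realistic surrogate would add the factors the text mentions, such as gravity, friction and the kinds of objects that actually move, and would sample directions of projected motion as well as speeds.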


Purves D, Lotto B (2011) Why We See What We Do Redux: A Wholly Empirical Theory of Vision. Sunderland, MA: Sinauer Associates.

Wojtach WT, Sung K, Truong S, Purves D (2008) An empirical explanation of the flash-lag effect. Proc Natl Acad Sci 105(42): 16338-16343.

Purves D, Lotto B (2003) Why We See What We Do: An Empirical Theory of Vision. Sunderland, MA: Sinauer Associates.