
An empirical explanation: objective and perceived motion
Motion has different meanings in physics and psychophysics. In Newtonian terms, motion refers to the absolute speed and direction of an object in a Euclidean frame of reference. The absolute limits of physical motion are from total immobility to the speed of light in all possible directions in 3-D space. In psychophysics, however, motion is defined subjectively by our ability to sense object motion. The relevant range of perceived speeds is thus vastly more restricted: we don’t see the hour hand of a clock or a bullet that has been fired as moving objects, even though both move at physical rates that are easily measured. The range of projected object speeds that humans have evolved to see as motion is from roughly 0.1°/s to 150-200°/s; below the lower end of this range objects appear to be standing still, and as speeds approach the upper end of the range they generate only a sense of visual blur and are ultimately invisible. (The range of physical motion that elicits a motion percept is expressed in degrees per second on an image plane because the projected speed of objects moving at the same physical speed but at different distances varies greatly; see Figure 1.) Likewise, the 3-D directions that we see are inferences from 2-D monocular or binocular information; as should be clear from Chapter 5, this 2-D information cannot specify the actual positions of objects in space.
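The dependence of projected speed on distance can be illustrated with a short calculation. The sketch below is an illustration only, not part of the original analysis: it assumes an object moving perpendicular to the line of sight at the moment it is directly ahead of the observer, in which case the angular speed is simply physical speed divided by distance. The function name and the example speed of 1.5 m/s are hypothetical choices.

```python
import math

def angular_speed_deg_per_s(speed_m_per_s, distance_m):
    """Angular speed (deg/s) of an object moving perpendicular to the
    line of sight, evaluated at the moment it is directly ahead.
    Assumption: angular speed = v / d (radians per second)."""
    return math.degrees(speed_m_per_s / distance_m)

# The same physical speed (about a walking pace) seen at several distances
for distance in (1.0, 10.0, 100.0, 1000.0):
    omega = angular_speed_deg_per_s(1.5, distance)
    print(f"{distance:7.1f} m -> {omega:8.3f} deg/s")
```

Under these assumptions the same 1.5 m/s object projects at roughly 86°/s when 1 m away but at less than 0.1°/s when 1 km away, which is below the approximate lower limit for seeing motion given above.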
These discrepancies between the physical world and the motions we see are of course major problems for the evolution of useful vision: observers must respond accurately to the real-world speeds and directions of objects, but can do so only on the basis of the speeds and directions projected onto the retinal image plane. The inability of the projected image sequences to uniquely specify the physical motion of the objects defines the inverse optics problem as it applies to motion. When objects in three-dimensional space project onto a two-dimensional surface, size, distance, orientation, and speed are all conflated in the image plane (Figure 1). Thus, the changes in position that uniquely define motion in physical terms are always ambiguous with respect to the possible sources of the retinal image sequence, as has long been noted. How the visual system nonetheless produces motion percepts that lead to generally successful behavior is not known.
If the motion seen in response to a stimulus is also explainable in the empirical framework that rationalizes many aspects of lightness/brightness, color, form, distance and depth, then the perception of motion elicited by the image sequence in Figure 1, or any other sequence of retinal images, should accord with – and be predicted by – the relationship between the retinal image sequence and accumulated human experience with all the physical sources that have generated the same or a similar stimulus sequence in the past.
A formidable obstacle in testing the merits of an empirical explanation of perceived motion is the difficulty of determining the frequency distribution of the speeds and directions of moving objects with any present technology. In an empirical account, this information is needed as a proxy for the projected speeds and distances underlying the real-world motions that humans will have discovered by the relative success of ensuing behaviors in response to image sequences. The rationale for this strategy is that it provides a means of contending with the inverse problem, in this case as it pertains to motion. Although data relating projected to real-world geometry can be readily obtained for static scenes using laser range scanning, there is at present no way of collecting information about the direction, speed and 3-D position of moving objects in the real world.
Nonetheless, human experience with real-world object motion can be determined to a rough approximation in at least two different ways. One approach is computing the frequency distribution of all the physical displacements that could, in principle, have generated a simple moving stimulus (e.g., the stimulus sequence in Figure 1), assuming that all physical motions are equally likely to occur. Although the assumption is certainly false – the prevalence of natural objects that routinely move, gravity, friction and a host of other factors that bias the image sequences that humans have seen in the past are all relevant – this approach provides a starting point in understanding the probability distributions of the possible sources of real-world motion stimuli.
A second approach is to approximate reality in a simulated environment (Figure 2A). If a virtual environment is populated with moving objects that behave in a roughly realistic way, the frequency of occurrence of different image sequences can be determined empirically. Although grossly simplified, this surrogate for experience with moving objects accurately represents the transformations between movements in 3-D space and their 2-D projections. By sampling the image plane in all directions over a range of spatial and temporal intervals, one can determine the probability of projected speeds and directions arising from the 3-D sources underlying a given image sequence (Figure 2B). In the same general way, it is possible to determine the frequency of occurrence of the directions of projected motion for various stimuli, as described later in the chapter. These data can then be used to predict the perceived speed and direction of specific motion stimuli in complementary psychophysical studies, in this way testing the hypothesis that motion percepts are generated empirically. The following sections indicate how these approaches can be used to explain some otherwise mysterious aspects of motion perception.
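A toy version of such a simulation can be sketched in a few lines. This is not the simulation used in the studies described here; it is a minimal sketch in which every parameter is an assumption made for illustration: point objects start at depths of 1 to 50 units in front of a pinhole, move in uniformly random 3-D directions at speeds drawn uniformly from 0 to 10 units per second, and the projected speed is measured as the change in visual angle over a short time step.

```python
import math
import random
import statistics

def visual_angle_deg(p, q):
    """Angle in degrees between the rays from the pinhole (origin) to p and q."""
    cross = (p[1] * q[2] - p[2] * q[1],
             p[2] * q[0] - p[0] * q[2],
             p[0] * q[1] - p[1] * q[0])
    dot = sum(a * b for a, b in zip(p, q))
    return math.degrees(math.atan2(math.sqrt(sum(c * c for c in cross)), dot))

def sample_projected_speeds(n, seed=0, dt=0.01):
    """Projected speeds (deg/s) of n randomly moving points.
    All ranges below are hypothetical choices, not values from the source."""
    rng = random.Random(seed)
    speeds = []
    for _ in range(n):
        z = rng.uniform(1.0, 50.0)            # depth in front of the pinhole
        start = (rng.uniform(-z, z), rng.uniform(-z, z), z)
        # Uniformly random 3-D direction from a normalised Gaussian vector
        d = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in d))
        speed_3d = rng.uniform(0.0, 10.0)     # physical speed, units per second
        end = tuple(s + speed_3d * c / norm * dt for s, c in zip(start, d))
        speeds.append(visual_angle_deg(start, end) / dt)
    return speeds

projected = sample_projected_speeds(5000)
print("median projected speed (deg/s):", statistics.median(projected))
print("mean projected speed (deg/s):  ", statistics.mean(projected))
```

Sorting the returned list gives an empirical cumulative distribution of projected speeds for this toy world, which is the kind of quantity the text refers to when it speaks of the probability of projected speeds arising from 3-D sources.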
Understanding perceived speed in empirical terms: the flash-lag effect
Since motion entails both the perception of speed and the perception of direction, both these aspects of perceived motion need to be explained in empirical terms. This section and the following take up speed.
Perhaps the phenomenon pertinent to motion speed that has most intrigued investigators over the decades is the so-called flash-lag effect (Figure 3) (reviewed in Eagleman and Sejnowski, 2007). When a flash of light is presented in physical alignment with a continuously moving stimulus, the flash is perceived to lag behind the moving stimulus; moreover, the faster the speed of the stimulus, the greater the lag. Despite numerous attempts to explain this effect, there is no generally accepted account. Considered in the framework here, the flash-lag effect should simply be another signature of an empirical strategy of visual processing as it applies to motion sequences, to the perception of object speed in particular.
This supposition can be tested by asking whether the amount of lag seen by subjects is accurately predicted by the frequency of occurrence of image sequences arising from typical sources in 3-D space. The first step in this approach is to determine the relevant psychophysical function by having observers align a flash with the moving bar, varying the speed of the moving object over the full range of speeds that elicit an easily measured flash-lag effect (up to ~50°/s) (Figure 4A). The resulting perceptual function is shown in Figure 4B.
To examine whether the function in Figure 4B is consistent with an empirical explanation of perceived speed, it is necessary to determine how stimuli are statistically related to moving objects in 3-D space translating over the range of speeds that elicit motion percepts in humans. As indicated in Figure 2, image sequences generated by objects moving in a virtual 3-D environment can be repeatedly sampled to determine the frequency of occurrence of different projected speeds generated by the speeds and trajectories of 3-D sources in the simulated world (Figure 5A). In an empirical framework, the speeds perceived should be given by the distribution of projected speeds that humans have always experienced (recall from Figure 1 that there is no unique relationship between a 2-D speed and the underlying 3-D speed of its generative source). Representing the probability distribution of projected image speeds in cumulative form (Figure 5B) therefore defines, to a first approximation, human experience with projected stimulus speeds over the range used in psychophysical testing (see Figure 4). This is the same general method used in the studies of lightness/brightness, color, form and distance described in earlier chapters. If the flash-lag effect signifies an empirical strategy of visual processing, then the lag reported by observers for different stimulus speeds (see Figure 4B) should be accurately predicted by the relative positions of different image speeds in the cumulative probability distribution.
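The idea of ranking a stimulus speed within a cumulative distribution can be made concrete with a small sketch. The code below is illustrative only: the "experience" it ranks against is a placeholder drawn from an exponential distribution with an arbitrary mean of 8°/s, chosen simply because it is biased toward slow speeds, and is not the distribution in Figure 5. The helper name `make_rank_function` is hypothetical.

```python
import random
from bisect import bisect_left

def make_rank_function(experienced_speeds):
    """Return a function giving the percentile rank (0-100) of a stimulus
    speed within the cumulative distribution of experienced speeds."""
    ordered = sorted(experienced_speeds)

    def rank(stimulus_speed):
        # Percentage of experienced speeds strictly slower than the stimulus
        return 100.0 * bisect_left(ordered, stimulus_speed) / len(ordered)

    return rank

rng = random.Random(0)
# Placeholder for accumulated experience (assumption: exponential, mean 8 deg/s)
experience = [rng.expovariate(1.0 / 8.0) for _ in range(20000)]
rank = make_rank_function(experience)

for speed in (0.0, 10.0, 20.0, 30.0, 40.0, 50.0):
    print(f"{speed:5.1f} deg/s -> empirical rank {rank(speed):5.1f}%")
```

With a slow-biased distribution of this kind, equal steps in stimulus speed produce progressively smaller steps in rank, so a prediction based on rank is non-linear in speed. Whether the resulting curve matches a measured psychophysical function depends entirely on the distribution used.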
Figure 6 indicates that the non-linear psychometric function in Figure 4B is indeed predicted by the empirical ranking of image speeds in the accumulated experience of human observers. As in the organization of perceptual spaces described in earlier chapters, the higher a stimulus sequence ranks on the empirical scale in Figure 5B, the faster the perceived motion. Since a stationary flash has an image speed of 0°/s (and therefore an empirical rank of 0%), comparing the rank of stimuli with different speeds predicts that any moving stimulus will lead a flash. Since increases in image speed correspond to higher empirical ranks, the flash-lag effect should increase accordingly, but in the non-linear manner indicated by the psychophysical results shown in Figure 4B.
A concern about this approach is of course the adequacy of any simulated environment in determining the relationship between image sequences and their possible sources, an approach made necessary by the inability of present technology to capture such relationships in nature. It is obvious that a variety of real-world factors can influence the speed of objects, and that these are not included in the simulation (see above). In fact, the actual speeds of objects are less important in the determination of image speeds than one would imagine. The reason is that the effect of perspective transformation tends to trump these other influences (see Figure 1), and the simulation captures perspective quite precisely. Thus changing the distribution of 3-D speeds of objects in the environment has surprisingly little influence on the distributions in Figure 5 (see Wojtach et al., 2008). Since perspective transformation generates image speeds that are always less than (or in rare cases equal to) the speeds of 3-D objects, the effect of different distributions of object speeds in the production of image speeds is largely nullified, with projected speeds being strongly biased toward slower values. This bias is apparent in statistical analyses of image speeds in movies (Dong and Atick, 1995), or simply in a priori calculations (Ullman, 1979; Yuille and Ullman, 1990), and is the primary cause of the distributions of image speeds that humans actually experience.
In summary, the correspondence of the observed and predicted results in Figure 6 argues that the generally successful visual behavior elicited by motion stimuli, at least with respect to apparent speed, arises empirically; indeed, the inverse optics problem effectively rules out a solution based on a spatiotemporal analysis of images as such.
Understanding the perceived direction of moving objects in empirical terms
Motion is defined by speed and direction, and so far only speed has been considered. Can the perceived direction of motion also be accounted for in empirical terms?
In exploring this question, another class of motion anomalies is especially useful, namely the changes in apparent direction that occur when moving objects are seen, as they commonly are, through an occluding aperture (Figure 7). When, for example, a moving line translating horizontally from left to right at a constant speed is viewed through a circular opening that occludes its ends, the perceived direction of movement is downward and to the right. Moreover, this change in the direction seen depends on the shape of the mask, being quite different for different occluders.
These remarkable changes in perceived direction as a result of occlusion were first studied in detail by Hans Wallach (1935/1996) some 70 years ago and have come to be known as “aperture effects”. Their dramatic nature has attracted much attention, not least because of the relevance of these phenomena to understanding the neural basis of motion processing. How, then, can these aperture effects be rationalized in an empirical framework, and how does an empirical explanation compare practically and theoretically with others that have been offered over the years?
A useful starting place in considering aperture effects in empirical terms is to consider the probability distributions of the projected directions of unoccluded lines generated by objects moving in 3-D space. As illustrated in Figure 8A, these distributions can be derived using a pinhole camera model to reproduce the perspective transformation of moving lines routinely experienced by humans (again to a rough approximation and with all the provisos alluded to earlier in relation to speed). However, because the analysis entails projected lines rather than points, 2-D orientation and length must also be specified (Figure 8B). The average directions of unoccluded line movements on the image plane for the full range of projected orientations are shown in Figure 8C. The image sequences of such lines with any specific length and orientation have about the same probability of moving in any direction on the image plane (see Figure 8C). This approximately uniform probability distribution of image directions in full view thus describes the most general statistical feature of linear image sequences that humans have witnessed over evolutionary time as a result of generative sources moving in 3-D space.
When a line moves behind an aperture, however, the probability distribution of possible image directions necessarily includes only a subset of the 2-D directions illustrated in Figure 8C. This altered distribution of image directions will depend on the specific shape of the aperture and the orientation of the projected line being considered (see below). The nature of these differences in the experience of projected directions that humans will always have had as a result of an occluding aperture thus offers a way of examining whether the perceived directions elicited by different aperture shapes and line orientations can be accounted for in empirical terms. If the motion direction is determined empirically, then these effects should be predictable on this basis.
Perhaps the simplest phenomenon to explore in this way is the effect of a circular aperture (Figure 9A); that is, how does a symmetrical aperture of this sort affect the uniform probability distribution of the projected directions of unoccluded moving lines illustrated in Figure 8C? When a circular template is systematically applied to the image plane, the probability distributions of the 2-D directions of differently oriented lines capable of generating image sequences within such an aperture are indeed changed. First, as shown in Figure 7, lines moving in half the possible directions on the image plane are irrelevant if the sequence of images of the line is moving from left to right (i.e., lines moving from right to left cannot contribute to the relevant distribution of projected directions). More importantly, however, the frequency of occurrence of lines moving from left to right that can satisfy the aperture will be strongly biased in favor of the direction orthogonal to the line (Figure 9B) (for reasons explained below). The shape of this distribution gives the predicted direction; and since only one direction can be seen, the direction humans have experienced most frequently should accord with that perceived by observers (the indicator is thus the mode of the probability distribution; see the red arrows in Figure 9B). The green arrows in Figure 9B are the directions that subjects report in psychophysical testing (see inset in Figure 9 and Sung et al., 2008). As is apparent, the predicted directions closely match the directions seen by observers.
This result, however, is not as impressive as it might seem. The perceived direction of motion perpendicular to the orientation of a moving line in a circular aperture can be accounted for in several other ways (Wallach, 1935/1996; Adelson & Movshon, 1982). More telling is how well this or any other approach deals with effects of occlusion on perceived direction that are far more difficult to explain. A case in point is viewing moving lines through a vertical slit (Figure 10A), which generates a perception of approximately vertical movement. The complication is that the perceived direction varies significantly with the orientation of the projected line (Figure 10B). As for a circular aperture, the probability distributions of the projected directions as a function of orientation can be determined, now using a template in the form of a vertical slit applied to all possible locations in the image plane. The empirical predictions made on this basis can then be compared to the perceived directions determined psychophysically, as shown in Figure 10B. The fact that the perceived directions in a vertical slit aperture are biased counterclockwise, and that this effect is greater for larger angles, was overlooked in previous psychophysical studies (e.g., Wallach, 1935/1996; Hildreth, 1984; also described in Shimojo, Silverman & Nakayama, 1989). The empirical prediction of these anomalous effects (compare the directions of the red and green arrows in Figure 10) thus offers much stronger support for the conclusion that perceived direction is determined empirically than does the equally successful prediction of direction in a circular aperture.
A triangular aperture presents even greater challenges for any explanation of perceived direction. Wallach (1935/1996) reported, again qualitatively, that the perceived directions of movement elicited by triangular apertures deviate from the direction normal to a moving line when line orientations are not 45°. To examine how a triangular aperture affects the probability distributions of projected lines in different orientations, a triangular template can be used (Figure 11A) to sample the frequency of occurrence of lines on an image plane generated by uniformly distributed lines in 3-D space (Figure 11B). The results of psychophysical testing (green arrows in Figure 11B) quantitatively confirm Wallach’s observations that directions seen in a triangular aperture vary systematically as a function of the orientation of the moving line (see also Yang et al., 2001). For example, when the orientation of the moving line was 30°, the observed direction was biased by 24° counterclockwise with respect to the direction normal to the line. On the other hand, when the orientation of the line was 60°, the perceived direction was biased by 24° clockwise. Overall, the perceived direction was always biased toward the longer arm of the triangle when the line orientation differed from 45°. The empirical predictions based on the probability distributions of the projected directions (red arrows) correctly capture these biases.
Taken together, these observations support the idea that the perceived direction of motion is indeed empirically determined. The relationship between projected images generated by a reasonable approximation of real-world sources in a virtual environment accounts not only for the better known aperture effects, but also for subtle biases in perceived direction, some of them not previously noted.
Why, then, do image sequences observed through apertures change in the way they do? To understand the distributions in Figures 10 and 11, consider the biased directions of image sequences projected through a circular aperture. As shown in Figure 12A, there is a direction of motion (black arrow) that entails the minimum projected length (red line) that can fully fill the aperture. A projected line traveling in any other direction must be longer if it is to fill the aperture (blue line); shorter lines in either case will fall inside the aperture boundary at some point in their traverse, producing a very different stimulus (and a different perceived direction). Because a line of any length includes all lines shorter than that length, far more projected lines moving orthogonally to their orientation will satisfy the aperture compared to lines moving in other directions. As a result, the distribution of directions that satisfy the circular aperture is strongly biased in the direction normal to the orientation of any line, as indicated by the probability distributions in Figure 9B. These simple facts about geometry provide the major part of the explanation for the biases observed for any aperture, and by the same token are the major factors affecting how humans will always have experienced the projections of objects moving behind apertures.
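The geometric argument for a circular aperture can be checked numerically. The sketch below is an illustration under stated assumptions, not the procedure used in the studies cited here. It assumes a straight line of fixed orientation whose midpoint passes through the centre of an aperture of diameter D while translating at an angle theta from its own normal; under that assumption the shortest line that spans the aperture throughout its traverse works out to D / cos(theta), which the code compares against a brute-force search. To turn lengths into frequencies, the code further assumes, purely for illustration, that projected line lengths follow an exponential distribution; the function names and the mean length of 2 are hypothetical.

```python
import math

def min_length_numeric(diameter, theta, steps=2000):
    """Brute-force the shortest line that covers every chord of a circular
    aperture while drifting sideways at angle theta from its normal."""
    r = diameter / 2.0
    half = 0.0
    for i in range(steps + 1):
        s = -r + diameter * i / steps             # line position along its normal
        chord_half = math.sqrt(max(r * r - s * s, 0.0))
        drift = abs(s * math.tan(theta))          # sideways shift of the midpoint
        half = max(half, chord_half + drift)
    return 2.0 * half

def min_length_closed_form(diameter, theta):
    """Closed form derived for this sketch: D / cos(theta)."""
    return diameter / math.cos(theta)

def fraction_filling(diameter, theta, mean_length):
    """Fraction of lines long enough to fill the aperture, assuming
    exponentially distributed lengths: P(L >= x) = exp(-x / mean_length)."""
    return math.exp(-min_length_closed_form(diameter, theta) / mean_length)

# Relative frequency of each direction of motion (degrees from the normal)
weights = {a: fraction_filling(1.0, math.radians(a), 2.0) for a in range(-89, 90)}
mode = max(weights, key=weights.get)
print("most frequent direction relative to the line's normal (deg):", mode)
```

Because the minimum length grows as the direction of motion departs from the normal, any length distribution in which longer lines are no more common than shorter ones yields a distribution of satisfying directions that peaks at the normal, which is the bias described above for the circular aperture.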
However, the fact that longer lines include shorter ones is not the only geometrical factor contributing to the biased distributions. The most important additional effect for present purposes is that objects in perspective projection have dimensions that are smaller than or equal to their physical dimensions (Figure 12B). As a result, the bases of the distributions for a circular aperture, for example, are narrower than they would be otherwise (see Figure 9A). For a circular aperture, this further influence of perspective does not affect the mode of the distribution, which remains orthogonal to the orientation of the line. In a vertical slit or triangular aperture, however, the biases illustrated in Figure 12B cause the frequency of occurrence of the projected directions to change as a function of the projected orientation of the line sequence (see Figures 10B and 11B). The reason is that the projected line length and distance traveled needed to satisfy the aperture in these cases are not the same, as they are for a circular aperture. Thus, the biased generation of more short lines than long ones arising from perspective alters the distribution as the orientation of the line in the context of an aperture changes, explaining the specific biases in Figures 10B and 11B.
In summary, the statistical differences generated by the geometry of different apertures describe, to a first approximation, the way the human visual system will always have experienced the projection of light subjected to the sorts of occlusions that routinely occur in natural viewing, and thus the directions of motion that people see in response to these stimuli.










