Introduction



An essential quality that distinguishes all animals from plants is their capacity for voluntary movement. Animals move to find mates, shelter, and something to eat, and to avoid being eaten themselves. But the ability to move brings with it the requirement to sense movement, whether to guide one's progress through the world, or to detect the movement of other mobile animals such as approaching predators. For sighted animals, this means sensing movement in the retinal image.

The need to sense retinal motion, and sense it as quickly as possible, places great demands on the visual system. Movement is characterised by subtle but highly structured changes in retinal illumination over space and over time.

Consider these two views of a lizard, taken at slightly different times. Is the lizard moving? It is difficult to tell simply by inspection. Recent experimental reports of 'change blindness' demonstrate how poor we are at finding even large changes in image content.

To sense movement very early in processing, the visual system relies on specialised neural processes.

These processes make use of information about localised changes of image intensity over time. The image on the left contains the point-by-point differences in intensity between the two images above. Bright areas correspond to regions in the image that increased in intensity from frame 1 to frame 2, and dark areas correspond to regions that darkened. Grey regions were unchanged.
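The difference image described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name frame_difference and the toy one-row frames are invented for the example and are not part of the tutorial's own material.

```python
def frame_difference(frame1, frame2):
    """Return frame2 - frame1, point by point.

    Positive values mark pixels that brightened from frame 1 to frame 2,
    negative values mark pixels that darkened, and zeros are unchanged.
    """
    return [
        [later - earlier for earlier, later in zip(row1, row2)]
        for row1, row2 in zip(frame1, frame2)
    ]


# Hypothetical one-row image: a bright bar shifts one pixel to the right.
frame1 = [[0, 0, 1, 1, 0, 0]]
frame2 = [[0, 0, 0, 1, 1, 0]]

diff = frame_difference(frame1, frame2)
print(diff)
```

In this toy case the only non-zero entries sit at the two edges of the bar, which is the sense in which the difference image isolates the moving parts of the scene.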

This representation effectively isolates those parts of the image that contain movement. However, to code the direction of movement, we need to combine this temporal change information with information about spatial change (intensity edges). Referring back to the two movie frames above, we can see that the increases of intensity over time came from image regions that contain spatial edges that are bright on the left and dark on the right (e.g. the snout). Decreases over time were associated with edges of opposite contrast polarity. These space-time pairings signify motion from left to right. A reversal of polarity either in the temporal signal or in the spatial signal would signify motion in the opposite direction.
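The pairing rule can be made concrete with a small sketch. It assumes a one-dimensional image, a central-difference estimate of the spatial gradient, and a simple sum over pixels; the function name motion_direction and the test signals are hypothetical. A brightening pixel (positive temporal change) on an edge that is bright on the left (negative spatial gradient) votes for rightward motion, so the vote at each pixel is the negative product of the two signals.

```python
def motion_direction(frame1, frame2):
    """Classify 1-D motion as 'right', 'left' or 'none' from space-time pairings."""
    # Temporal change at each pixel: positive means the pixel brightened.
    temporal = [b - a for a, b in zip(frame1, frame2)]
    # Spatial gradient (central difference) of the average of the two frames:
    # negative means "bright on the left, dark on the right".
    mean = [(a + b) / 2 for a, b in zip(frame1, frame2)]
    vote = 0.0
    for i in range(1, len(mean) - 1):
        spatial = (mean[i + 1] - mean[i - 1]) / 2
        # Opposite signs of temporal and spatial change signal rightward motion.
        vote += -temporal[i] * spatial
    if vote > 0:
        return "right"
    if vote < 0:
        return "left"
    return "none"


bar_t1 = [0, 0, 1, 1, 1, 0, 0, 0]
bar_t2 = [0, 0, 0, 1, 1, 1, 0, 0]   # same bar, one pixel further right

print(motion_direction(bar_t1, bar_t2))  # bar moves right
print(motion_direction(bar_t2, bar_t1))  # frames swapped: temporal polarity reversed
```

Swapping the two frames reverses the polarity of the temporal signal while leaving the spatial signal unchanged, and the reported direction flips accordingly.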

Neural motion detectors do appear to use a strategy based on such space-time pairings to encode image motion. Four-stroke apparent motion is a clear demonstration of this strategy in operation. Pairs of samples are taken from the image, separated in both space and in time, to pick up the spatiotemporal structure created by movement.
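One way to picture sampling pairs of points separated in space and time is a delay-and-compare correlator. The sketch below is an assumption-laden illustration in the style of a Reichardt detector, which is a different formulation from the motion energy model mentioned in this tutorial and is not taken from it. The function name correlator_response and the offset parameter are hypothetical.

```python
def correlator_response(frame1, frame2, offset=1):
    """Opponent correlator over two 1-D frames.

    Each rightward unit multiplies the intensity at position x in the first
    frame by the intensity at x + offset in the second frame. Each leftward
    unit pairs the mirror-image samples. The difference is positive for
    rightward motion and negative for leftward motion.
    """
    span = range(len(frame1) - offset)
    rightward = sum(frame1[x] * frame2[x + offset] for x in span)
    leftward = sum(frame1[x + offset] * frame2[x] for x in span)
    return rightward - leftward


early = [0, 0, 1, 1, 1, 0, 0, 0]
late = [0, 0, 0, 1, 1, 1, 0, 0]    # same bar, shifted one pixel right

print(correlator_response(early, late))   # positive: rightward
print(correlator_response(late, early))   # negative: leftward
```

The offset plays the role of the spatial separation between the two samples, and the interval between the frames plays the role of the temporal separation.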

A great deal of perceptual and physiological research has been conducted to discover the properties of these mechanisms. One particular model, known as the motion energy model, has become the reference model for early motion detection in the human visual system. A guide to the model, and how to implement it using Matlab, can be found here.

Studies of motion detection in random dot kinematograms (RDKs) and gratings have attempted to establish how far apart the samples are taken in space and time. Early research indicated that samples were provided by relatively simple receptive fields that extract intensity variation, but recent research on second-order motion indicates that the receptive fields may be more sophisticated than this.

Detection of motion is only the very first stage of processing in a complex system containing a number of levels. The limitations of the first stage are illustrated by the 'aperture problem':

In this image, the square follows a diamond-shaped path, but a motion detector positioned along one vertical edge (receptive field inside the circular aperture) can signal only the horizontal left-right component of this motion.

To solve this problem, the system must compare the responses of different detectors. In the example on the left, if the output of detectors responding to the motion of the horizontal edges is integrated with responses to the vertical edges, then the global motion of the square can be deduced. For instance, if the vertical response is 'right' at the same time as the horizontal response is 'down', then the square must be moving obliquely down and to the right. If the horizontal response is 'up', then the square must be moving obliquely up and to the right.
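The comparison of detector responses can be sketched as a small linear problem. Each detector is assumed to report only the speed of its edge along the edge's normal, and the global velocity is the one vector consistent with both reports, a scheme often called intersection of constraints. The function name, the coordinate convention (x to the right, y upward) and the unit speeds are assumptions made for illustration.

```python
def intersect_constraints(normal1, speed1, normal2, speed2):
    """Solve v . normal1 = speed1 and v . normal2 = speed2 for v = (vx, vy)."""
    det = normal1[0] * normal2[1] - normal1[1] * normal2[0]
    if abs(det) < 1e-12:
        # Parallel edges give the same constraint twice, so the
        # aperture problem is not resolved.
        raise ValueError("edge normals are parallel")
    vx = (speed1 * normal2[1] - speed2 * normal1[1]) / det
    vy = (normal1[0] * speed2 - normal2[0] * speed1) / det
    return vx, vy


# A vertical edge (normal along x) reports 'right' at unit speed.
# A horizontal edge (normal along y) reports 'down', i.e. speed -1 along y.
down_right = intersect_constraints((1, 0), 1.0, (0, 1), -1.0)
# The same vertical response paired with 'up' from the horizontal edge.
up_right = intersect_constraints((1, 0), 1.0, (0, 1), 1.0)

print(down_right, up_right)
```

With perpendicular edges, as on the square, the solution is simply the two reported components placed side by side; the general form also covers edges at other orientations.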

In the last twenty years a great deal of experimental research has studied motion integration using, for example, plaid patterns and drifting dot patterns. The underlying mechanism is usually conceptualised as a two-dimensional process, in which signals at different retinal locations are integrated to arrive at global motion signals.

Even this level of analysis is limited, because it is restricted to rigid, 2-D motion. Models of motion integration aim to combine signals based on their 2-D image properties (i.e. local direction and velocity). They generally assume that the local motion signals were generated by a rigidly moving surface. Natural images are more complex in two respects. First, they carry 3-D information about solid objects. Second, many natural objects are non-rigid. They may consist of rigid but mobile parts, joined at points of articulation (e.g. human bodies), or they may be fluid and deformable (e.g. liquids, fabrics). Demonstrations of kinetic depth and biological motion show that information about 3-D structure can be derived from retinal motion. This requires a further level of analysis in which local (integrated) signals are used to build a representation of 3-D shape. At this level, information about other visual attributes, such as stereo depth and shadows, can make a contribution.

We can therefore divide up motion analysis, and the demonstrations in this tutorial, into at least three levels of processing:

Motion sensing

  • four-stroke motion
  • motion aftereffect
  • RDKs
  • second-order motion

2-D integration

  • motion capture
  • direction repulsion
  • plaid motion

3-D interpretation

  • kinetic depth
  • stereokinetic motion
  • biological motion
  • shadow motion
  • transformational motion

For a recent review of research in all these areas of motion perception, see Smith and Snowden (1994).