Teaching a machine to see the houses for the trees
Madison, Wisc. — A man with a corn-husking machine has a problem: Only half the ears entering his device are pointed in the right direction to be stripped clean. He can easily build a mechanism to flip the backward cobs around - if only his machine could tell which ones those were. The man now relies on workers to spot and flip. But he wants the speed and tireless reliability of full automation. He wants his machine to see.
Machine vision comparable to human sight is a major goal of researchers in robotics. Sighted machines will be capable of tasks many times more complex than making heads or tails of corn cobs - quality control inspections of electrical components, for example, or remote detection of military targets from among decoys.
Today's conventional computers identify an object by matching its image to a template stored in memory. Because a three-dimensional object can be oriented in an almost infinite number of ways, a similar number of templates must be provided. Finding a match takes time.
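The cost of the template approach can be illustrated with a toy sketch in Python. Everything in it is an assumption made for illustration: the text-grid image format, the function names, and the restriction to four right-angle rotations. A real object seen from arbitrary angles would need far more stored templates, which is the point of the paragraph above.

```python
def rotate90(grid):
    """Rotate a grid (a list of equal-length strings) 90 degrees clockwise."""
    return ["".join(row[c] for row in reversed(grid))
            for c in range(len(grid[0]))]


def all_rotations(grid):
    """Return the grid in its four right-angle orientations (hypothetical template set)."""
    rotations = [list(grid)]
    for _ in range(3):
        rotations.append(rotate90(rotations[-1]))
    return rotations


def matches_template(image, template):
    """True if the image equals the template in any stored orientation.

    Each orientation costs one full comparison, so the work grows
    with the number of orientations that must be stored.
    """
    return any(image == rotation for rotation in all_rotations(template))


# A made-up "L" shape used as the stored template.
L_SHAPE = ["X.",
           "X.",
           "XX"]

if __name__ == "__main__":
    tipped_over = ["XXX",
                   "X.."]
    print(matches_template(tipped_over, L_SHAPE))
```

Even in this toy case, one object needs four templates. Allowing finer rotations, changes in size, or tilts in depth multiplies that count.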
Electrical engineer Rafael Sela of the University of Wisconsin here believes an unconventional computer combination is the key to machine vision. His approach is to create a hybrid computer, consisting of a conventional computer linked to an optical computer.
An optical computer has no moving parts. Its input is an image. Its processor is a special kind of hologram that bends image light in meaningful ways. Its output is a light pattern that can tell a conventional computer if a certain object is the image's source.
The hybrid computer, once perfected, will be a machine that instantly recognizes certain objects regardless of position or size. It will see them.
``Sighted'' machines in use now typically consist of a video camera attached to a conventional computer. A video picture grid of 500 by 500 pixels (picture elements) yields a quarter million image points. Sifting through all that information, pixel by pixel, takes a great deal of digital computer time.
A machine vision designer must write a program that tells a conventional computer how to identify essential features. ``Suppose the problem is classification - you want a machine that tells oranges from pears by sight,'' says Professor Sela. ``First, you examine many examples of each class of object. You determine those features unique to each class - say, asymmetry, the ratio of width to height. If the ratio is close to one, you've got an orange. If it's not close to one, you've got a pear.''
A single feature isn't usually enough. So you, as designer, must measure other distinguishing features. Then you determine a ``decision rule'' from representative feature values. The decision rule separates the values for ``orange'' features from those of ``pear'' features. A conventional computer's program applies the rule to a set of values in the image information it receives. It then can decide which class an object in a scene belongs to. The more complicated the scene, the more calculations are required before the decision rule can be applied.
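The orange-and-pear rule can be written down as a minimal Python sketch. It covers only the single-feature case from Sela's example, the ratio of width to height. The function name, the measurements, and the tolerance of 0.15 are assumptions chosen for illustration, not values from his work.

```python
def classify_by_ratio(width, height, tolerance=0.15):
    """Apply a one-feature decision rule: a ratio close to one means orange.

    The tolerance is a hypothetical cutoff; in practice a designer would
    pick it from measurements of many example oranges and pears.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = width / height
    return "orange" if abs(ratio - 1.0) <= tolerance else "pear"


if __name__ == "__main__":
    # Made-up measurements in centimeters.
    print(classify_by_ratio(7.0, 7.2))
    print(classify_by_ratio(6.0, 9.0))
```

A rule built on several features would have the same shape: measure each feature, then test the measured values against boundaries drawn from representative examples.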
``A `dumb' camera gives the conventional computer a flat, single picture of what it sees,'' says Sela. It must measure the image point by point, looking for key features. ``Most of this information is useless. The digital computer has to search the scene for clues to tell what a particular object is. An optical computer can extract information about a scene before giving it to the digital computer.'' Here's how the optical computer simplifies a scene:
Rather than microchips and software, an optical computer is made of mirrors, prisms, lenses, and holograms that act like lenses. Image light that enters is rerouted, added, multiplied. The effect is the same as performing millions of conventional calculations on the image. The intensity and location of the light that exits represent a handful of key values to a digital computer. In effect, the optical computer automatically compresses millions of bits of visual data into a few essential ones. The result is a tremendous gain in speed.
The speed advantage applies only to certain kinds of tasks - skimming texts, for example. Let's say you needed to find all the articles about machine vision printed in the last five years. You could set up a machine to ``look'' at every page on microfiche. Whenever the words ``machine vision'' appeared, your optical computer would produce a bright spot of light. Your conventional computer would in turn flag all articles with spots.
This system would be far faster than a manual search, of course, but it has limits. The optical computer would have to be adjusted for the typeface it was examining. Even then, it might be confused by words like ``visit.'' A similar device has read printed zip codes for the United States Postal Service for many years. When it comes to handwritten addresses, however, the zip code reader is illiterate. Human letter sorters are much better equipped for the variety of human script.
Truly flexible machine vision is quite difficult, according to Prof. Roland Chin, one of Sela's colleagues at the University of Wisconsin. ``For a machine to have that kind of understanding, its knowledge base has to be large enough to deal with all the alternatives it may encounter.
``A machine looking for houses in photographs must understand a house under trees, a house with snow on the roof, a house in the rain. Humans are good at that sort of thing.'' Humans can search their memories quickly and recall a small incident from 20 years ago. For a similar search, says Professor Chin, ``a digital computer's retrieval time would be enormous.''