
Researchers are improving peripheral vision in AI models

Peripheral vision allows people to see shapes that are not directly in our line of sight, albeit with less detail. This ability expands our field of view and comes in handy in many situations, such as detecting a vehicle approaching our car from the side.

Unlike humans, AI doesn’t have peripheral vision. Equipping computer vision models with this capability could help detect approaching dangers more effectively or predict whether a human driver would notice an oncoming object.

To take a step in this direction, MIT researchers developed an image dataset that allows them to simulate peripheral vision in machine learning models. They found that training models on this dataset improved the models’ ability to detect objects in the visual periphery, although the models still performed worse than humans.

Their results also showed that, unlike with humans, neither the size of objects nor the amount of visual clutter in a scene had a strong impact on the models’ performance.

“Something fundamental is going on here. We tested so many different models, and even when we train them, they get a little bit better, but they are not quite like humans. So the question is: What are these models missing?” says Vasha DuTell, a postdoc and co-author of a paper detailing this study.

Answering this question could help researchers develop machine learning models that can better see the world the way humans do. In addition to improving driver safety, such models could be used to develop displays that are easier for people to read.

Additionally, a deeper understanding of peripheral vision in AI models could help researchers better predict human behavior, adds lead author Anne Harrington MEng ’23.

“If we can truly capture the essence of what is represented in the periphery, modeling peripheral vision can help us understand the features of a visual scene that make our eyes move to gather more information,” she explains.

Her co-authors include Mark Hamilton, a graduate student in electrical engineering and computer science; Ayush Tewari, a postdoc; Simon Stent, research director at the Toyota Research Institute; and senior authors William T. Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Ruth Rosenholtz, a senior research scientist in the Department of Brain and Cognitive Sciences and a member of CSAIL. The research will be presented at the International Conference on Learning Representations.

“Any time a human interacts with a machine – a car, a robot, a user interface – it is hugely important to understand what the person can see. Peripheral vision plays a critical role in that understanding,” says Rosenholtz.

Simulation of peripheral vision

Extend your arm in front of you and raise your thumb: the small area around your thumbnail is seen by your fovea, the small depression in the center of your retina that provides the sharpest vision. Everything else you can see is in your visual periphery. Your visual cortex represents a scene with less detail and reliability the farther it gets from that sharp point of focus.

Many existing approaches to modeling peripheral vision in AI represent this deteriorating detail by blurring the edges of images, but the information loss that occurs in the optic nerve and visual cortex is far more complex.

For a more accurate approach, the MIT researchers started with a technique used to model peripheral vision in humans. Known as the texture tiling model, this method transforms images to represent a human’s loss of visual information.

They modified this model so that it can transform images in a similar way, but more flexibly, without needing to know in advance where the person or AI will direct its eyes.
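To give a feel for the idea, here is a minimal Python sketch, and only a sketch: it approximates eccentricity-dependent information loss with a Gaussian blur that grows with distance from a fixation point, whereas the actual texture tiling model replaces peripheral regions with locally matched texture statistics. The function name and parameters are hypothetical, not from the paper.

```python
# A minimal sketch, NOT the authors' texture tiling model: it approximates
# eccentricity-dependent information loss with a Gaussian blur that grows
# with distance from a fixation point. The real model instead replaces
# peripheral regions with locally matched texture statistics.
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(image, fixation, max_sigma=8.0, levels=6):
    """image: H x W x C float array in [0, 1]; fixation: (row, col).
    Returns a copy where blur increases with eccentricity from fixation."""
    h, w = image.shape[:2]
    rows, cols = np.mgrid[0:h, 0:w]
    # Eccentricity: distance of each pixel from the fixation point,
    # normalized to [0, 1].
    ecc = np.hypot(rows - fixation[0], cols - fixation[1])
    ecc /= ecc.max()

    # Precompute blurred copies at increasing sigma (spatial axes only).
    sigmas = np.linspace(0.0, max_sigma, levels)
    pyramid = [gaussian_filter(image, sigma=(s, s, 0)) for s in sigmas]

    # Assign each pixel the pyramid level that matches its eccentricity.
    level = np.clip((ecc * (levels - 1)).astype(int), 0, levels - 1)
    out = np.empty_like(image)
    for i in range(levels):
        mask = level == i
        out[mask] = pyramid[i][mask]
    return out
```

Because the fixation point is just an argument, the same image can be re-rendered for any gaze location, which mirrors the flexibility the researchers describe.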

“This allows us to model peripheral vision as faithfully as is done in human vision research,” says Harrington.

The researchers used this modified technique to generate a huge dataset of transformed images that appear more textural in certain areas, representing the loss of detail that occurs as a human looks farther into the periphery.

They then used this dataset to train several computer vision models and compared their performance with that of humans on an object detection task.

“We had to be very clever in how we set up the experiment so we could also test it in the machine learning models. We didn’t want to have to retrain the models on a toy task that they weren’t meant to be doing,” she says.

Strange performance

Humans and models were shown pairs of transformed images that were identical, except that one image had a target object located in the periphery. Then, each participant was asked to pick the image with the target object.
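On the model side, this setup can be scored as a two-alternative forced choice. The sketch below shows one plausible way to do so; the scoring rule (comparing target-class logits across the pair) and the function name are assumptions for illustration, not necessarily the paper’s exact protocol.

```python
# Illustrative sketch of a two-alternative forced-choice trial: the model
# "sees" both images of a pair and is credited when it assigns higher
# target-class evidence to the image that actually contains the target.
# The scoring rule here is an assumption, not the paper's exact protocol.
import torch

@torch.no_grad()
def two_afc_accuracy(model, pairs, target_class):
    """pairs yields (img_with_target, img_without_target) tensors of shape
    (C, H, W); returns the fraction of trials the model gets right."""
    model.eval()
    correct, total = 0, 0
    for with_target, without_target in pairs:
        batch = torch.stack([with_target, without_target])  # (2, C, H, W)
        logits = model(batch)                                # (2, num_classes)
        scores = logits[:, target_class]
        correct += int(scores[0] > scores[1])
        total += 1
    return correct / total
```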

“One thing that really surprised us was how good people were at detecting objects in their periphery. We went through at least 10 different sets of images that were just too easy. We had to keep using smaller and smaller objects,” adds Harrington.

The researchers found that training models from scratch on their dataset led to the greatest performance boosts, improving their ability to detect and recognize objects. Fine-tuning a model on their dataset, a process in which a pretrained model is adjusted so it can perform a new task, resulted in smaller gains.
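As a rough illustration of the two regimes being compared, the sketch below contrasts training from scratch (random initialization) with fine-tuning a pretrained network; the choice of a torchvision ResNet-50 is an assumption for illustration, not the architecture reported in the paper.

```python
# Hedged sketch of the two training regimes; ResNet-50 is a stand-in
# architecture (an assumption, not the paper's exact setup).
import torch
from torchvision.models import resnet50, ResNet50_Weights

def make_model(from_scratch, num_classes):
    if from_scratch:
        # Train from scratch: random initialization, every parameter is
        # learned directly on the peripheral-vision dataset.
        return resnet50(weights=None, num_classes=num_classes)
    # Fine-tune: start from ImageNet weights, swap the classification head,
    # and continue training on the new dataset.
    model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    return model
```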

But in every case, the machines weren’t as good as humans, and they were especially bad at detecting objects in the far periphery. Their performance also didn’t follow the same patterns as humans’.

“That might suggest the models aren’t using context in the same way humans do to perform these detection tasks. The models’ strategy might be different,” says Harrington.

The researchers plan to continue exploring these differences, with the goal of finding a model that can predict human performance in the visual periphery. This could, for example, enable AI systems that alert drivers to hazards they might not see. They also hope to inspire other researchers to conduct additional computer vision studies with their publicly available dataset.

“This work is important because it contributes to our understanding that human vision in the periphery should not be considered just impoverished vision due to the limited number of photoreceptors we have, but rather a representation that is optimized for performing tasks of real-world consequence,” says Justin Gardner, an associate professor in the Department of Psychology at Stanford University, who was not involved with this work. “Moreover, the work shows that neural network models, despite their advances in recent years, are unable to match human performance in this regard, which should lead to more AI research that learns from the neuroscience of human vision. This future research will be aided significantly by the database of images provided by the authors to mimic peripheral human vision.”

This work is supported, in part, by the Toyota Research Institute and the MIT CSAIL METEOR Fellowship.
