Four-legged robot traverses tricky terrains thanks to improved 3D vision

Researchers led by the University of California San Diego have developed a new model that trains four-legged robots to see more clearly in 3D. The advance enabled a robot to autonomously cross challenging terrain with ease — including stairs, rocky ground and gap-filled paths — while clearing obstacles in its way.

The researchers will present their work at the 2023 Conference on Computer Vision and Pattern Recognition (CVPR), which will take place from June 18 to 22 in Vancouver, Canada.

“By providing the robot with a better understanding of its surroundings in 3D, it can be deployed in more complex environments in the real world,” said study senior author Xiaolong Wang, a professor of electrical and computer engineering at the UC San Diego Jacobs School of Engineering.

The robot is equipped with a forward-facing depth camera on its head. The camera is tilted downwards at an angle that gives it a good view of both the scene in front of it and the terrain beneath it.

To improve the robot’s 3D perception, the researchers developed a model that first takes 2D images from the camera and translates them into 3D space. It does this by looking at a short video sequence consisting of the current frame and a few previous frames, then extracting pieces of 3D information from each 2D frame. Alongside the visual data, the model takes in information about the robot’s leg movements, such as joint angle, joint velocity and distance from the ground. It compares the information from the previous frames with information from the current frame to estimate the 3D transformation between the past and the present.
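To make that lift-and-align step concrete, here is a minimal PyTorch sketch of how 2D depth frames might be encoded into coarse 3D voxel feature volumes, and how a past volume could be re-aligned to the present using an estimated rigid transform. It is an illustration under assumed shapes, not the authors’ implementation; the names `Lift2Dto3D` and `align_to_current`, the architecture and the grid sizes are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Lift2Dto3D(nn.Module):
    """Encode a 2D depth image into a coarse 3D voxel feature grid.
    Hypothetical architecture and sizes, for illustration only."""
    def __init__(self, feat_dim=16, grid=(8, 16, 16)):
        super().__init__()
        d, h, w = grid
        self.grid, self.feat_dim = grid, feat_dim
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((h, w)),
            nn.Conv2d(64, d * feat_dim, 1),  # predict features for each depth slice
        )

    def forward(self, depth_img):              # (B, 1, H, W) depth image
        b = depth_img.shape[0]
        d, h, w = self.grid
        x = self.encoder(depth_img)             # (B, d*feat_dim, h, w)
        return x.view(b, self.feat_dim, d, h, w)  # (B, C, D, H, W) voxel grid

def align_to_current(past_volume, transform):
    """Warp a past voxel volume into the current frame's coordinates,
    given an estimated 3x4 rigid transform (rotation + translation)."""
    grid = F.affine_grid(transform, past_volume.shape, align_corners=False)
    return F.grid_sample(past_volume, grid, align_corners=False)

# Example: lift a previous frame, then re-align it (identity motion here).
lift = Lift2Dto3D()
vol_prev = lift(torch.randn(1, 1, 64, 64))
identity = torch.eye(3, 4).unsqueeze(0)  # stand-in for the estimated ego-motion
vol_aligned = align_to_current(vol_prev, identity)
print(vol_aligned.shape)                 # torch.Size([1, 16, 8, 16, 16])
```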

The model fuses all that information together so that it can use the current frame to synthesize the previous frames. As the robot moves, the model checks the synthesized frames against the frames that the camera has already captured. If they are a good match, then the model knows that it has learned the correct representation of the 3D scene. Otherwise, it makes corrections until it gets it right.
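The check-and-correct loop described above amounts to a self-supervised reconstruction loss. A hedged sketch of one plausible form, assuming the same voxel volumes as in the previous sketch: synthesize each past depth frame from the current volume, then penalize the mismatch with the frame the camera actually captured. `Decoder3Dto2D`, `warp_volume` and the L1 penalty are illustrative choices, not drawn from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_volume(volume, transform):
    """Resample a voxel volume under an estimated 3x4 rigid transform."""
    grid = F.affine_grid(transform, volume.shape, align_corners=False)
    return F.grid_sample(volume, grid, align_corners=False)

class Decoder3Dto2D(nn.Module):
    """Project a voxel feature volume back into a 2D depth image.
    Hypothetical decoder, for illustration only."""
    def __init__(self, feat_dim=16, grid_depth=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_dim * grid_depth, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, volume):                      # (B, C, D, H, W)
        b, c, d, h, w = volume.shape
        return self.net(volume.view(b, c * d, h, w))  # (B, 1, H, W) depth

def consistency_loss(current_volume, past_frames, transforms, decoder):
    """Synthesize each past frame from the current volume and compare it
    with what the camera actually recorded at that timestep."""
    loss = 0.0
    for frame, tfm in zip(past_frames, transforms):
        synthesized = decoder(warp_volume(current_volume, tfm))
        loss = loss + F.l1_loss(synthesized, frame)
    return loss / len(past_frames)

# Toy usage with placeholder shapes and an identity transform.
dec = Decoder3Dto2D()
vol = torch.randn(1, 16, 8, 16, 16)        # current frame's feature volume
frames = [torch.randn(1, 1, 16, 16)]       # one captured past frame
tfms = [torch.eye(3, 4).unsqueeze(0)]      # placeholder estimated transform
print(consistency_loss(vol, frames, tfms, dec))
```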

The 3D representation is used to control the robot’s movement. By synthesizing visual information from the past, the robot is able to remember what it has seen, as well as the actions its legs have taken before, and use that memory to inform its next moves.
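One way such a memory could feed into control, again purely as an assumption-laden sketch: pool the fused volumetric memory into a compact feature vector, concatenate it with proprioceptive readings, and map the result to per-joint commands. The feature sizes, the 12-joint action space and the two-layer MLP are placeholders, not the paper’s architecture.

```python
import torch
import torch.nn as nn

class LocomotionPolicy(nn.Module):
    """Toy policy head fusing volumetric memory with proprioception."""
    def __init__(self, feat_dim=16, proprio_dim=36, action_dim=12):
        super().__init__()
        # Pool the fused 3D memory into a compact visual feature vector.
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + proprio_dim, 256), nn.ELU(),
            nn.Linear(256, action_dim),   # e.g. target joint positions
        )

    def forward(self, memory_volume, proprio):
        # memory_volume: (B, C, D, H, W) memory fused over recent frames
        # proprio: (B, proprio_dim) joint angles, joint velocities, etc.
        visual = self.pool(memory_volume).flatten(1)   # (B, C)
        return self.mlp(torch.cat([visual, proprio], dim=1))

policy = LocomotionPolicy()
action = policy(torch.randn(1, 16, 8, 16, 16), torch.randn(1, 36))
print(action.shape)   # torch.Size([1, 12]): one command per assumed joint
```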

“Our approach allows the robot to build a short-term memory of its 3D surroundings so that it can act better,” said Wang.

The new study builds on the team’s previous work, in which researchers developed algorithms that combine computer vision with proprioception (the sense of movement, direction, speed, location and touch) to enable a four-legged robot to walk and run on uneven ground while avoiding obstacles. The advance here is that the improved 3D perception, combined with proprioception, allows the robot to traverse more challenging terrain than before.

“What’s exciting is that we have developed a single model that can handle different kinds of challenging environments,” said Wang. “That’s because we have created a better understanding of the 3D surroundings that makes the robot more versatile across different scenarios.”

The approach has its limitations, however. Wang notes that the current model does not guide the robot to a specific goal or destination. When deployed, the robot simply walks a straight path; if it encounters an obstacle, it avoids it by veering off onto another straight path. “The robot does not control exactly where it goes,” he said. “In future work, we would like to include more planning techniques and complete the navigation pipeline.”

Video: https://youtu.be/vJdt610GSGk

Paper title: “Neural Volumetric Memory for Visual Locomotion Control.” Co-authors include Ruihan Yang, UC San Diego, and Ge Yang, Massachusetts Institute of Technology.

This work was supported in part by the National Science Foundation (CCF-2112665, IIS-2240014, 1730158 and ACI-1541349), an Amazon Research Award and gifts from Qualcomm.


