- June 10, 2026
- By Tom Ventsias
What if teaching a robot a new task were as simple as showing it a video? Thanks to a new artificial intelligence framework developed by University of Maryland researchers, that’s now a possibility.
The system, called HumanEgo, requires no robot demonstrations, no robot-specific training data and no large-scale pretraining. Instead, robots can acquire new manipulation skills from as little as 30 minutes of first-person video showing people performing tasks.
The research addresses one of robotics’ most persistent challenges, known as the embodiment gap: the fundamental differences between human bodies and robots in how they look, move and perceive the world, which make it difficult to translate human actions into robotic behaviors.
HumanEgo approaches the problem from a different angle. Rather than teaching robots to imitate human movements, the system focuses on understanding the essence of the interaction—how a hand approaches, grasps, moves and releases an object.
To do that, the researchers developed a new representation called Interaction-Centric Tokens (ICT). ICT captures the spatial relationship between hands and objects, allowing robots to learn what a task requires regardless of who performs it or which robot eventually carries it out.
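The article describes ICT only at a high level, so the following is an illustration of the general idea rather than the paper's representation. It assumes a simplified two-dimensional scene and uses hypothetical names (`Pose2D`, `to_object_frame`, `interaction_feature`). A contact point is expressed in the object's own frame, so a human hand and a robot gripper that touch an object in the same place yield the same feature even though their bodies and world positions differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Planar object pose in a world frame (a simplifying assumption)."""
    x: float
    y: float
    theta: float  # heading in radians


def to_object_frame(obj, px, py):
    """Express a world-frame point in the object's local frame (rotate by -theta)."""
    dx, dy = px - obj.x, py - obj.y
    c, s = math.cos(obj.theta), math.sin(obj.theta)
    return (c * dx + s * dy, -s * dx + c * dy)


def interaction_feature(obj, contact_xy, grasping, decimals=3):
    """Hypothetical hand-object feature: where contact happens relative to the
    object, plus whether the object is held. Rounding stands in for tokenization."""
    lx, ly = to_object_frame(obj, *contact_xy)
    return (round(lx, decimals), round(ly, decimals), grasping)


# A human hand grasps a mug sitting at (1.0, 2.0) with heading 0 ...
human = interaction_feature(Pose2D(1.0, 2.0, 0.0), (1.1, 2.0), grasping=True)
# ... and a robot gripper grasps a mug elsewhere, rotated 90 degrees.
robot = interaction_feature(Pose2D(5.0, -3.0, math.pi / 2), (5.0, -2.9), grasping=True)
print(human, robot)
```

Both calls produce the same feature because only the hand-object relationship is recorded. The sketch deliberately says nothing about arm joints or finger shapes; turning such a feature into motions for a specific robot is a separate step it does not attempt.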
The research is detailed in a paper that is currently under review. The UMD study demonstrates that robots can learn useful manipulation skills directly from human video, eliminating the need for robot-specific, pre-programmed demonstrations during training.
“Most robot learning systems today still rely on collecting hundreds or thousands of demonstrations from the robot itself,” said Zhi “Leo” Wang, a doctoral student in computer science and lead author of the study. “HumanEgo shows that robots can instead learn from ordinary human demonstrations recorded with smart glasses, dramatically reducing the amount of specialized data needed to teach new skills.”
The breakthrough taps into the vast amount of knowledge humans generate every day rather than requiring robots to learn each task from scratch through their own experience.
“For decades, robotics has been limited by the need to collect large amounts of robot data,” said Yiannis Aloimonos, professor of computer science and Wang’s adviser. “HumanEgo shows that robots can begin learning directly from human experience.”
One of the biggest challenges is identifying what information should transfer from humans to robots, said Furong Huang, associate professor of computer science and a co-author of the study.
“Our results show that understanding the interaction between a hand and an object is far more important than replicating how a human looks or moves,” she said.
In addition to overcoming the embodiment gap, HumanEgo addresses another major challenge in robotics: learning effectively from limited data. The researchers paired ICT with a generative AI technique known as flow matching, which can model multiple valid ways of completing a task while remaining fast enough for real-time robotic control.
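The article does not describe how HumanEgo implements flow matching. As a hedged, self-contained illustration of the technique itself, the sketch below uses a hypothetical one-dimensional action with two equally valid outcomes (`MODES`) and hypothetical helper names. `training_pair` shows the regression target a velocity network is fit to along straight noise-to-action paths. Because training a network would require a learning library, `marginal_velocity` substitutes the closed-form optimum of that regression for this toy target, and `sample_action` turns random noise into an action with a fixed number of Euler steps (100 here, an arbitrary choice).

```python
import math
import random

# Hypothetical toy problem (not HumanEgo): a 1-D action, e.g. a lateral grasp
# offset, with two equally valid demonstrated outcomes at -1.0 and +1.0.
MODES = [(0.5, -1.0), (0.5, 1.0)]  # (mixture weight, demonstrated action)


def training_pair(x0, x1, t):
    """Regression input/target for conditional flow matching on a straight path.

    x_t lies on the line from noise x0 (t = 0) to a demonstrated action x1 (t = 1);
    a velocity network v(x_t, t) would be trained to predict x1 - x0 at x_t.
    """
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0


def marginal_velocity(x, t, modes=MODES):
    """Closed-form minimizer of that regression for point-mass targets and
    standard-normal noise: E[x1 - x0 | x_t = x] = (E[x1 | x_t = x] - x) / (1 - t).

    Given x1 = a, x_t ~ N(t * a, (1 - t)^2); mode posteriors use log-sum-exp
    so they stay finite as t approaches 1. Valid for 0 <= t < 1.
    """
    var = (1.0 - t) ** 2
    logs = [math.log(w) - (x - t * a) ** 2 / (2.0 * var) for w, a in modes]
    top = max(logs)
    post = [math.exp(l - top) for l in logs]
    mean_a = sum(p * a for p, (_, a) in zip(post, modes)) / sum(post)
    return (mean_a - x) / (1.0 - t)


def sample_action(rng, n_steps=100):
    """Draw one action by Euler-integrating dx/dt = v(x, t) from t = 0 to t = 1."""
    x = rng.gauss(0.0, 1.0)  # start from standard-normal noise
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x += dt * marginal_velocity(x, k * dt)
    return x


rng = random.Random(0)
actions = [sample_action(rng) for _ in range(200)]
# Count how many samples chose each of the two valid modes.
print(sum(a > 0 for a in actions), sum(a < 0 for a in actions))
```

Each sampled action lands on one of the two modes rather than on their average of 0.0, which is what a plain mean-squared-error regression on actions would predict; this is the sense in which flow matching can represent several valid ways of doing a task. Sampling cost is fixed by `n_steps` velocity evaluations per action; the step count a real robot controller would use is not stated in the article.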
In addition to Wang, Huang and Aloimonos, the research team includes Ruohan Gao, assistant professor of computer science, and UMD graduate students Botao He, Kelin Yu and Seungjae Lee.
Aloimonos, Huang and Gao also hold appointments in the University of Maryland Institute for Advanced Computer Studies (UMIACS), which provides technical and administrative support for the project.
AI at Maryland
The University of Maryland is shaping the future of artificial intelligence by forging solutions to the world’s most pressing issues through collaborative research, training the leaders of an AI-infused workforce and applying AI to strengthen our economy and communities.