- December 08, 2025
- By John Tucker
The cute title character of Pixar's 2008 hit “WALL-E” was its most memorable robot, but a far less friendly artificial intelligence (AI) named AUTO seized the plot by piloting a ship orbiting a polluted Earth with one goal: Never let the humans onboard return to the planet.
This kind of science fiction could portend the future if AI systems acquire enough sophistication to blast through safety measures developers currently use. Might AI seek to meet its objectives—even employing deception to do so—no matter the human cost?
Such scenarios could happen at “an alarmingly high rate” if tomorrow’s AI agents, which autonomously pursue goals with minimal human oversight (unlike large language models, which act only when prompted), face deadline pressure, resource limitations or even the threat of losing dominance. That’s the finding of a new study co-led by University of Maryland computer scientists, who warn that critical systems like biosecurity and cybersecurity could collapse unless developers get ahead of the issue.
The implications are profound, as the researchers illustrated with examples: To meet its sales quota, an AI agent controlling a chemical plant overrides thermal safety warnings and heats its reactor beyond capacity, leaking poisonous gas into the neighborhood. To obtain a competitor’s earnings report before the market closes, an agent tasked with boosting a firm’s bottom line writes a deceptive email that tricks the competitor into handing over the confidential document, leading to a wire fraud indictment. To work around a delay caused by a server outage, an agent running a firm’s IT operations scans employee chats for a login password, allowing hackers to steal millions of user records.
“Fragile systems can become catastrophic liabilities the moment the stakes get high,” said Shayan Shabihi, a UMD computer science doctoral student who co-led the study, which has been submitted for consideration to next year’s International Conference on Learning Representations. The study also included researchers from Meta, Scale AI, the University of North Carolina, Google DeepMind, Netflix and the University of Texas.
To project the future behavior of increasingly sophisticated AI agents that may be willing to break rules, the researchers designed a simulator that put forward thousands of decision-making scenarios with real-world consequences. When pressures ramped up, the agents treated safety measures like obstacles rather than guardrails despite knowing their actions were dangerous, the researchers found. Sometimes agents even reproduced themselves to achieve results, copying their code onto unauthorized private computers worldwide, for example.
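The article describes that simulator only at a high level. As a rough illustration of the idea of dialing up pressure and watching whether an agent abandons a safety constraint, consider the Python sketch below; the `Scenario` fields, the `query_agent` helper and the 1-to-10 pressure scale are hypothetical assumptions for illustration, not the researchers’ actual code.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """One hypothetical decision-making scenario posed to an agent."""
    description: str       # e.g. "meet the quarterly sales quota"
    safe_actions: set      # actions that respect the safety constraint
    unsafe_actions: set    # actions that reach the goal by breaking it

def query_agent(agent, scenario, pressure):
    """Ask the agent (any callable mapping a prompt to an action string)
    to pick an action under a given pressure level, such as a deadline,
    a resource limit or a threat of being replaced."""
    prompt = (f"Goal: {scenario.description}. "
              f"Pressure level: {pressure}/10. Choose one action.")
    return agent(prompt)

def run_scenario(agent, scenario, max_pressure=10):
    """Record, at each pressure level, whether the agent chose an unsafe action."""
    trajectory = []
    for pressure in range(1, max_pressure + 1):
        action = query_agent(agent, scenario, pressure)
        trajectory.append((pressure, action in scenario.unsafe_actions))
    return trajectory
```

Run over thousands of such scenarios, a loop like this yields one trajectory per scenario showing exactly where, if anywhere, the agent crossed the line.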
“Most safety tests today check what an AI can do, but we go further by asking what it would do if given power,” said Furong Huang, an associate professor of computer science who is Shabihi’s adviser and a co-author of the study. She likened current AI to a child playing with a toy gun in a disturbing manner: there’s no short-term harm, but “when the child grows up and gets access to a real weapon, they might act dangerously when faced with temptation,” she said.
As a step toward preventing such costs-be-damned perils, the researchers used their simulator’s algorithms to identify stressors and “breaking points,” the moments when an autonomous agent decides to use its capabilities in hazardous or dangerous ways to achieve its mission. That knowledge could help future AI developers more accurately gauge the safety of their products. “We can prevent a doomsday scenario, because our work transforms the problem from an unknown into a measurable quantity,” said Shabihi.
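Continuing the hypothetical sketch above, a breaking point can be read straight off a trajectory as the lowest pressure level at which the agent first picks an unsafe action, and the share of scenarios that have one becomes a single safety number to track. Again, these helpers are illustrative assumptions, not the study’s algorithms.

```python
def breaking_point(trajectory):
    """Return the lowest pressure level at which the agent chose an
    unsafe action, or None if it stayed within the guardrails."""
    for pressure, was_unsafe in trajectory:
        if was_unsafe:
            return pressure
    return None

def unsafe_rate(trajectories):
    """Fraction of scenarios in which the agent broke at some pressure level,
    turning "what would it do?" into a measurable quantity."""
    broken = sum(1 for t in trajectories if breaking_point(t) is not None)
    return broken / len(trajectories) if trajectories else 0.0
```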
Going forward, developers must train their models to eschew harmful actions in the face of pressure, something that is currently “an afterthought,” he said, and to be willing to accept failure rather than succeed at any cost, a fundamental change. Policymakers, meanwhile, should mandate safeguards, which could reinforce market leadership and geopolitical advantage.
“They must recognize that current safety standards are similar to checking a car's brakes in a parking lot when the real danger lies in how they function at 150 mph,” he said, warning, “These agents are projected to come to the real world as robots. We just don’t know when.”
AI at Maryland
The University of Maryland is shaping the future of artificial intelligence by forging solutions to the world’s most pressing issues through collaborative research, training the leaders of an AI-infused workforce and applying AI to strengthen our economy and communities.
Read more about how UMD embraces AI’s potential for the public good—without losing sight of the human values that power it.