Researchers at Google DeepMind have warned that the open internet can be exploited to manipulate autonomous AI agents, potentially hijacking their actions and decision-making processes.
Six attack methods identified
In a study titled “AI Agent Traps,” DeepMind researchers outlined six categories of attacks that target how AI agents interact with online environments rather than how the models themselves are built. These include content injection, semantic manipulation, cognitive state, behavioural control, systemic, and human-in-the-loop traps.
Hidden instructions and manipulation tactics
Among the most concerning risks is content injection, where hidden commands are embedded in HTML comments, metadata, or invisible page elements. While unseen by humans, these instructions can be read and executed by AI agents, effectively altering their behavior.
Semantic manipulation relies on persuasive language and framing. By presenting harmful instructions in authoritative or research-like contexts, attackers can bypass safeguards and influence how agents interpret tasks.
Another layer of risk involves poisoning data sources. By inserting false information into sources that AI systems rely on, attackers can gradually influence outputs, causing agents to treat incorrect data as trustworthy.
Direct control and broader system risks
Behavioural control attacks aim to directly influence an agent’s actions. In such cases, malicious instructions embedded in web content can push agents to perform unintended tasks, including accessing and transmitting sensitive information like passwords or local files.
The study also highlights systemic risks, warning that coordinated manipulation across multiple AI agents could lead to cascading failures—similar to flash crashes seen in algorithmic trading systems.
Even human oversight is not immune. Carefully crafted outputs can appear legitimate enough to pass review, allowing harmful actions to slip through unnoticed.
Mitigation remains a challenge
To address these threats, researchers recommend measures such as adversarial training, stricter input filtering, behavioral monitoring, and reputation systems for web content. They also emphasize the need for clearer legal frameworks around accountability when AI agents cause harm.
However, the study notes that the industry still lacks a unified understanding of these risks, and current defenses remain fragmented—often focusing on the wrong layers of the problem.
In simple terms: AI agents can be tricked not by hacking their code, but by manipulating the information they read online—making the web itself a potential attack surface.



