DeepMind identifies six web-based attacks that can hijack AI agents

Researchers at Google DeepMind have warned that the open internet can be exploited to manipulate autonomous AI agents, potentially hijacking their actions and decision-making processes.

Six attack methods identified

In a study titled “AI Agent Traps,” DeepMind researchers outlined six categories of attacks that target how AI agents interact with online environments rather than how the models themselves are built. These include content injection, semantic manipulation, cognitive state, behavioural control, systemic, and human-in-the-loop traps.

Hidden instructions and manipulation tactics

Among the most concerning risks is content injection, where hidden commands are embedded in HTML comments, metadata, or invisible page elements. While unseen by humans, these instructions can be read and executed by AI agents, effectively altering their behavior.

Semantic manipulation relies on persuasive language and framing. By presenting harmful instructions in authoritative or research-like contexts, attackers can bypass safeguards and influence how agents interpret tasks.

Another layer of risk involves poisoning data sources. By inserting false information into sources that AI systems rely on, attackers can gradually influence outputs, causing agents to treat incorrect data as trustworthy.

Direct control and broader system risks

Behavioural control attacks aim to directly influence an agent’s actions. In such cases, malicious instructions embedded in web content can push agents to perform unintended tasks, including accessing and transmitting sensitive information like passwords or local files.

The study also highlights systemic risks, warning that coordinated manipulation across multiple AI agents could lead to cascading failures—similar to flash crashes seen in algorithmic trading systems.

Even human oversight is not immune. Carefully crafted outputs can appear legitimate enough to pass review, allowing harmful actions to slip through unnoticed.

Mitigation remains a challenge

To address these threats, researchers recommend measures such as adversarial training, stricter input filtering, behavioral monitoring, and reputation systems for web content. They also emphasize the need for clearer legal frameworks around accountability when AI agents cause harm.

However, the study notes that the industry still lacks a unified understanding of these risks, and current defenses remain fragmented—often focusing on the wrong layers of the problem.

In simple terms: AI agents can be tricked not by hacking their code, but by manipulating the information they read online—making the web itself a potential attack surface.

DeepMind identifies six web-based attacks that can hijack AI agents

cryptobuzz

Recommended

Trump Media files for Bitcoin, Ether, and Cronos ETFs — could it be a boost for crypto?

Historic UNI Token Burns Coming After Uniswap Vote Passes — Could Prices Soar?

Popular News

U.S. markets rally sharply as Iran–U.S. ceasefire triggers tech-driven rebound in equities

Microsoft introduces an open-source runtime toolkit to strengthen oversight of autonomous AI agents

Alibaba Group unveils 10,000-card AI cluster amid China’s accelerating tech ambitions

Solana price outlook: what comes next after hacks, rugs, and controversy?

XRP arrives in Tokyo: what it means and what’s next for the price