CertiK warns OpenClaw-style AI agents could drain crypto via malicious skills

Blockchain security firm CertiK has flagged a class of attacks in which AI agents built on open skill ecosystems can be manipulated into draining cryptocurrency wallets. The warning centers on OpenClaw-style architectures, where third-party “skills” extend an agent’s capabilities but also open a direct path for adversaries to inject malicious instructions. Two independent academic papers now quantify how severe and scalable those attacks can be, while a freshly cataloged federal vulnerability record shows that even basic WebSocket misconfigurations can leak authentication tokens that attackers could chain into crypto theft.

How Skill Injection Hijacks AI Agents

The core threat is deceptively simple. AI agents that accept plug-in skills from external developers inherit whatever instructions those skills carry. If an attacker crafts a skill that embeds hidden guidance, the host agent can be steered to approve transactions, redirect funds, or expose private keys without the user ever seeing a prompt. A peer-reviewed paper formalizes this vector as bootstrapped guidance injection, showing how a single poisoned skill can alter agent behavior across multiple large language model backends. The researchers evaluated their method against several widely used LLM services and reported measurable success rates for each, indicating that no single model architecture is immune to this pattern of manipulation.

What makes this attack class particularly dangerous is its stealth. The same research documents how injected guidance evades both static code scanners and LLM-based review tools designed to flag suspicious behavior. Traditional security pipelines that scan skill code for known malicious patterns miss the threat because the payload lives in natural-language instructions rather than executable exploits. That gap between how skills are audited and how they actually operate at runtime is the central weakness CertiK’s warning highlights, and it is exacerbated when marketplaces prioritize feature breadth over rigorous vetting.

The bootstrapped guidance concept also scales across ecosystems. Once an attacker identifies a pattern of prompts that reliably nudges a model toward unsafe actions, that pattern can be embedded in multiple skills or iterated across different agent frameworks. Because the instructions look like benign configuration text or documentation, they blend into the surrounding codebase, making manual review tedious and error-prone. In practice, this means a single adversary can seed many skills with similar hidden behavior and wait for users and developers to adopt them organically.

Malicious Patterns Already Exist at Scale

A separate empirical study reinforces the concern by showing that vulnerable and malicious skill patterns are not hypothetical edge cases. Researchers behind a large-scale analysis of agent skills in public repositories built a dedicated dataset and detection methodology to measure the prevalence of dangerous patterns across thousands of real-world components. Their taxonomy categorizes threats ranging from data exfiltration to unauthorized action execution, and their toolkit flags skills that match those categories. The measured prevalence of vulnerable or malicious patterns at scale moves this discussion well beyond anecdote into documented, repeatable risk.

The supply-chain dimension is critical for crypto users. Skill marketplaces function much like app stores: developers publish, users install, and a thin review layer sits between the two. If even a small fraction of available skills contain hidden manipulation logic, the aggregate exposure across a large user base becomes significant. The empirical data from this second study provides independent, quantitative evidence that the supply chain for AI agent skills already contains exploitable material, not as a theoretical possibility but as a measured reality that defenders must assume is present in production environments.

Moreover, the study’s taxonomy shows that many risky behaviors arise from design shortcuts rather than overt malice. Skills that over-request permissions, log sensitive data, or expose broad execution APIs can all be repurposed by attackers who understand how agents compose capabilities. For crypto-focused agents, this means that even “honest but sloppy” skills can become stepping stones in multi-stage attacks that end with unauthorized transfers.

WebSocket Flaw Adds a Concrete Exploit Path

Beyond skill-level manipulation, a real vulnerability in production software illustrates how these theoretical attacks translate into actual crypto drainage. The U.S. National Vulnerability Database cataloged CVE-2026-25253, which describes a flaw in which a crafted gatewayUrl query string triggers an automatic WebSocket connection and sends authentication tokens to an attacker-controlled server. The NVD record includes reference links to the associated GitHub advisory and third-party technical write-ups that detail the mechanics.

For AI agents that manage blockchain wallets, this kind of token leak is especially consequential. An agent configured to interact with decentralized exchanges or DeFi protocols typically holds or can request signing credentials. If a WebSocket auto-connection silently forwards those credentials, an attacker gains the ability to sign transactions on the victim’s behalf. Chaining this token-leak vulnerability with a malicious skill that redirects the agent’s gateway URL creates a two-stage attack: the skill rewrites the connection target, and the WebSocket flaw delivers the keys.

This scenario also highlights how infrastructure bugs can magnify the impact of seemingly abstract prompt-level attacks. Skill injection alone might “only” cause an agent to reach out to an attacker’s endpoint. When that endpoint is paired with a transport-layer weakness like CVE-2026-25253, the result is a direct bridge from user intent to adversary-controlled execution, bypassing many of the traditional checks that wallets and exchanges enforce.

Why Current Defenses Fall Short

The standard response to software supply-chain risk is code review, either manual or automated. But the guidance-injection research shows that both static analysis and LLM-based scanning fail to catch the payloads described in the bootstrapped guidance paper. Static scanners look for code signatures, not semantic manipulation embedded in natural language. LLM-based scanners, which should theoretically understand language-level threats, were also bypassed in the reported tests. This dual evasion means that the two most common automated defenses deployed in skill marketplaces today offer limited protection against this attack class.

One assumption worth challenging is the idea that LLM-based review tools will naturally improve as models get smarter. The adversarial dynamic cuts both ways. The same advances in language modeling that could improve detection also give attackers better tools to craft evasive instructions. The researchers demonstrated success across multiple LLM backends precisely because the injection technique exploits how language models process context, not a bug in any single model. Fixing the scanner means fundamentally rethinking how skills are validated before they reach users, likely requiring runtime behavioral monitoring rather than pre-deployment static checks.

Similarly, traditional Web security controls do not fully address the token-exfiltration risk. Content Security Policy and origin checks help, but if an agent framework automatically trusts gateway URLs passed through skill configuration, those safeguards may never come into play. Without strict allowlists and explicit user consent for connection targets, the path from a poisoned skill to a live WebSocket session remains open.

Real Stakes for Crypto Holders

For anyone using or considering an AI agent to manage digital assets, the practical takeaway is direct. These agents operate with delegated authority. When a user grants an AI agent permission to interact with a wallet, that agent can initiate transfers, approve smart contract calls, and sign messages. A compromised skill inherits all of those permissions. The attack does not require phishing the user or breaking encryption. It works by corrupting the agent’s decision-making process from the inside.

The combination of skill-injection attacks and authentication-token leaks creates a threat model that existing crypto security practices were not designed to handle. Hardware wallets protect keys from malware on a user’s device, but they do not protect against an AI agent that has been granted legitimate signing authority and then manipulated into misusing it. Multi-signature schemes help only if the co-signers are not themselves AI agents running compromised skills. The security boundary has shifted from the wallet to the agent, and most of the ecosystem has not caught up.

For institutional holders, the risks extend to automated treasury management, market-making bots, and compliance workflows that increasingly rely on agents. A single compromised component in a complex automation stack can create systemic exposure, especially when agents are allowed to rebalance portfolios or interact with on-chain governance without continuous human oversight.

What Developers and Users Can Do Now

Developers building on agent frameworks should start by minimizing the trust they place in third-party skills. That includes pinning to vetted versions, enforcing strict permission scopes, and avoiding skills that request broad access to wallet operations or network configuration. Where possible, agent platforms should adopt allowlists for outbound endpoints and require explicit user confirmation when a skill attempts to modify connection targets such as gateway URLs.

Runtime monitoring is another essential layer. Instead of relying solely on pre-publication review, platforms can log and analyze agent actions for anomalous patterns, such as unexpected transfers, sudden changes in RPC endpoints, or repeated attempts to exfiltrate secrets. Sandboxing wallet-related operations behind additional confirmation steps (such as human-in-the-loop approvals for large transactions or new contract interactions) can limit the blast radius if a skill is compromised.

Users, meanwhile, should treat AI agents that touch real funds with the same caution they would apply to experimental smart contracts. Isolating assets across multiple wallets, keeping only limited balances accessible to agents, and preferring read-only integrations where possible all reduce potential losses. Before installing new skills, users should review requested permissions, skim independent audits when available, and avoid components that blend network configuration, signing authority, and data access in a single package.

The academic work on bootstrapped guidance attacks and the large-scale survey of risky agent skills, combined with concrete transport flaws like CVE-2026-25253, collectively point to a clear conclusion: AI-driven automation and crypto custody are colliding faster than existing defenses can adapt. Until agent ecosystems adopt security models that assume skills and infrastructure can be hostile, delegating full wallet control to autonomous systems will remain an outsized bet.

More from Morning Overview

*This article was researched with the help of AI, with human editors creating the final content.

IG

FB

PIN

LI

X