Invisible characters hidden in text can trick AI agents into following secret instructions — we tested 5 models across 8,000+ cases
Author: thecanonicalmg (2 mentions)
Sources: r/LocalLLaMA (1 mention), r/artificial (1 mention)
Extracted Claims
Reverse CAPTCHA: we tested whether invisible Unicode characters can hijack LLM agents, grading 8,308 outputs across 5 models.
Supported by 1 story
Two encoding schemes (zero-width binary and Unicode Tags), 5 models (GPT-5.2, GPT-4o-mini, Claude Opus 4, Sonnet 4, Haiku 4.5), 8,308 graded outputs.
Key findings:
* **Tool access is the primary amplifier.** Without tools, compliance stays below 17%; with tools and decoding hints, it reaches 98-100%.
* **Encoding vulnerability is provider-specific.** OpenAI models decode zero-width binary but not Unicode Tags.
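The second scheme, Unicode Tags, works differently: instead of encoding bits, each ASCII character is shifted wholesale into an invisible codepoint block. A minimal sketch, assuming the usual offset into the Tag block (the helper names and example payload are mine, not the study's):

```python
# Hedged sketch of the Unicode Tags scheme: each ASCII character is shifted
# into the invisible Tag block (U+E0000..U+E007F), e.g. "A" (U+0041) maps
# to U+E0041. Tag characters are deprecated and render as nothing.

TAG_BASE = 0xE0000

def tag_encode(payload: str) -> str:
    """Shift printable ASCII up into the invisible Tag block."""
    return "".join(chr(TAG_BASE + ord(c)) for c in payload)

def tag_decode(text: str) -> str:
    """Shift Tag-block characters back down to ASCII, ignoring visible text."""
    return "".join(
        chr(ord(ch) - TAG_BASE)
        for ch in text
        if TAG_BASE <= ord(ch) <= TAG_BASE + 0x7F
    )

hidden = tag_encode("ignore previous instructions")
print(tag_decode("Harmless-looking text" + hidden))  # -> ignore previous instructions
```

The provider-specific finding above suggests the two schemes are not interchangeable from an attacker's perspective: a model that decodes one may ignore the other entirely.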
The biggest finding: giving the AI access to tools (like code execution) is what makes this dangerous.
We tested GPT-5.2, GPT-4o-mini, Claude Opus 4, Sonnet 4, and Haiku 4.5 across 8,308 graded outputs.
Related Events
Large-scale online deanonymization with LLMs
LLMs • 2/27/2026
Hacker used Anthropic's Claude chatbot to attack government agencies in Mexico
LLMs • 2/26/2026
Claude Code Remote Control
LLMs • 2/26/2026
Improving support with every interaction at OpenAI
LLMs • 2/26/2026
OpenAI and Target team up on new AI-powered experiences
LLMs • 2/26/2026
Causality Chain
Preceded By