I Scanned 152 Files of My Own AI-Generated Code for Invisible Unicode Malware

Two weeks ago, a supply chain attack called Glassworm compromised 150+ GitHub repositories and 72+ browser extensions by hiding malicious payloads in characters that are literally invisible in every editor, terminal, and code review tool on the planet.

I build AI infrastructure for a living. Every hook file, every automation script, every Nexus job in my homelab was generated by Claude Code. When I read the Glassworm post-mortem and saw “targets AI-generated code using invisible Private Use Area Unicode characters,” I had one thought: I should scan my own files.

What Glassworm Actually Did

The Glassworm campaign (March 3–9, 2026) exploited a fundamental property of Unicode that almost no developer thinks about: the character encoding standard has thousands of valid codepoints that render as absolutely nothing. No visual width. No pixel on screen. No line in a diff. But they exist as bytes in your files, and in the right parser context, they can change execution.

The specific technique: insert Variation Selector characters (U+FE00–U+FE0F) adjacent to identifiers or string literals in JavaScript and Python code. These characters are designed to modify the visual rendering of the preceding character — but when placed after an ASCII character, they’re simply invisible noise to the human reviewer. The parser, however, sees them.

“The attack doesn’t need to change what code looks like. It needs to change what code is.”
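The principle fits in four lines of Python. Appending a single invisible codepoint (here a Variation Selector, the same trick works with zero-width characters) produces a string that displays identically but compares unequal — my own illustration, not Glassworm's actual payload:

```python
a = "admin"
b = "admin\ufe00"  # same text plus an invisible Variation Selector-1

print(a, b)    # the two look identical in most terminals
print(a == b)  # False: five characters versus six
```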

The Glassworm operators wrapped their payload commits in AI-generated changelogs — benign descriptions of unrelated refactors, generated to pass casual review. The combination of invisible characters and AI-authored cover text made it nearly undetectable through normal review workflows.

Why My Infrastructure Was the Target Profile

I run approximately 30 Claude Code hooks that execute on every tool call, session start, and file write. I have 15 Nexus job scripts that run headlessly on a cron-driven pipeline with access to my homelab infrastructure. All of it was written by an LLM.

This is exactly the profile Glassworm targets:

  • AI-generated code trusted without byte-level review
  • Headless execution with elevated permissions
  • No human review step between generation and execution
  • Active development cycle — new files added regularly

I’m not saying Claude Code has been compromised. I’m saying the attack surface exists, the tooling to audit it is trivial to build, and I had never run it.

Building the Scanner

Python’s unicodedata module makes this straightforward. I wrote a recursive file scanner that checks every text file in my hooks and jobs directories against the known Glassworm character ranges, plus a broader set of invisible Unicode that’s been used in related attacks:

  • U+FE00–U+FE0F: Variation Selectors — Glassworm’s primary vector
  • U+E0100–U+E01EF: Variation Selectors Supplement — secondary vector with 240-character payload capacity
  • U+202A–U+202E: Bidirectional overrides — the Trojan Source attack family
  • U+200B–U+200D: Zero-width characters — string comparison bypass

The scanner runs in under 3 seconds across 152 files. It outputs per-file JSON that can be rendered into an audit report.
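Here is a minimal sketch of that scanner. The codepoint ranges come from the list above; the structure and function names are mine, not necessarily the production tool's:

```python
#!/usr/bin/env python3
"""Minimal sketch of an invisible-Unicode scanner."""
import json
import sys
from pathlib import Path

# Inclusive codepoint bounds for the ranges described in the post
SUSPICIOUS_RANGES = [
    (0xFE00, 0xFE0F),    # Variation Selectors
    (0xE0100, 0xE01EF),  # Variation Selectors Supplement
    (0x202A, 0x202E),    # Bidirectional overrides (Trojan Source)
    (0x200B, 0x200D),    # Zero-width characters
]

def scan_text(text):
    """Return findings as (line_no, col, codepoint) tuples."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            if any(lo <= cp <= hi for lo, hi in SUSPICIOUS_RANGES):
                findings.append((line_no, col, f"U+{cp:04X}"))
    return findings

def scan_tree(root):
    """Recursively scan text files under root; return {path: findings}."""
    report = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        findings = scan_text(text)
        if findings:
            report[str(path)] = findings
    return report

if __name__ == "__main__" and len(sys.argv) > 1:
    print(json.dumps(scan_tree(sys.argv[1]), indent=2))
```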

What I Found

The scanner flagged 7 files with 8 CRITICAL-severity findings, all in the U+FE00–U+FE0F range — Glassworm’s primary vector. For a moment, that felt alarming.

Then I investigated each finding. Every single one was U+FE0F (Variation Selector-16) following U+26A0 (⚠). The sequence U+26A0 U+FE0F is the standard Unicode representation of the ⚠️ warning emoji. It’s present in my hooks because Claude Code writes console.log('⚠️ Warning: ...') — standard, intentional, completely benign.
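You can confirm the decomposition yourself with the standard library’s unicodedata module:

```python
import unicodedata

for ch in "⚠️":  # the warning emoji as it appears in the hook files
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+26A0  WARNING SIGN
# U+FE0F  VARIATION SELECTOR-16
```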

The secondary Glassworm payload range (U+E0100–U+E01EF) returned zero results. No zero-width characters. No bidirectional overrides. The infrastructure is clean.

The Interesting Part: False Positives Are a Feature, Not a Bug

The scanner’s “false positive” on the ⚠️ emoji is actually the right behavior for a security tool. Here’s why: you want the scanner to flag every occurrence and force an investigation. A scanner that silently skips “probably fine” characters misses the attack.

A more sophisticated version would check context — VS16 is only suspicious when not preceded by a valid emoji base character. I’ve included that logic in the pre-commit hook version of the scanner. For the audit scan, I prefer to over-flag and investigate manually. The investigation confirmed clean code.
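A sketch of that context check, under a simplifying assumption: I treat any non-ASCII symbol-category character as a plausible emoji base, whereas a production hook would consult the Unicode emoji-data tables:

```python
import unicodedata

VS16 = "\uFE0F"

def is_emoji_base(ch):
    """Rough heuristic: any non-ASCII symbol-category character counts as a
    legitimate base for VS16. Real emoji detection needs the Unicode
    emoji-data tables; this is a deliberate simplification."""
    return ord(ch) > 0x7F and unicodedata.category(ch).startswith("S")

def suspicious_vs16(text):
    """Yield indexes of VS16 occurrences not preceded by a plausible emoji base."""
    for i, ch in enumerate(text):
        if ch == VS16 and (i == 0 or not is_emoji_base(text[i - 1])):
            yield i
```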

What Every Developer with AI-Generated Code Should Do

Three things, in order of effort:

1. Run a one-time audit. Take the scanner from this post and run it against your AI-generated files. It takes 5 minutes. The peace of mind is worth it.

2. Install a pre-commit hook. Add the context-aware version to .git/hooks/pre-commit. It blocks any commit with suspicious invisible Unicode, with a carve-out for legitimate emoji sequences. Two minutes to install, permanent protection.

3. Treat AI-generated code as external input. We review human PRs. We review open-source dependencies. AI-generated code deserves the same skepticism — especially code that runs headlessly with elevated permissions.
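For item 2, a pre-commit hook along these lines might look like the sketch below (Python with a shebang, saved executable as .git/hooks/pre-commit). The emoji carve-out here is a deliberate simplification — VS16 directly after any non-ASCII character passes — and the exact hook in the linked repo may differ:

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook that blocks suspicious invisible Unicode."""
import subprocess
import sys

# Same ranges as the audit scanner (upper bounds exclusive)
SUSPICIOUS = (
    set(range(0xFE00, 0xFE10))      # Variation Selectors
    | set(range(0xE0100, 0xE01F0))  # Variation Selectors Supplement
    | set(range(0x202A, 0x202F))    # Bidirectional overrides
    | set(range(0x200B, 0x200E))    # Zero-width characters
)
VS16 = 0xFE0F

def flagged(text):
    """True if text contains a suspicious codepoint. Carve-out: VS16 right
    after a non-ASCII character is treated as a legitimate emoji sequence."""
    prev = ""
    for ch in text:
        cp = ord(ch)
        if cp in SUSPICIOUS and not (cp == VS16 and prev and ord(prev) > 0x7F):
            return True
        prev = ch
    return False

def staged_files():
    """Names of files staged for commit (splits on whitespace, so paths
    with spaces need more care; fine for a sketch)."""
    try:
        out = subprocess.run(
            ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (subprocess.CalledProcessError, OSError):
        return []  # not in a git repo: nothing to check
    return out.split()

def main():
    bad = []
    for name in staged_files():
        try:
            with open(name, encoding="utf-8") as f:
                if flagged(f.read()):
                    bad.append(name)
        except (UnicodeDecodeError, OSError):
            continue
    if bad:
        print("Commit blocked: suspicious invisible Unicode in:")
        for name in bad:
            print(f"  {name}")
        return 1
    return 0

# As an installed hook, finish the file with: sys.exit(main())
```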

The Broader Lesson

We’re in a transitional period where AI-generated code is becoming the majority of new code written in many organizations. The tooling assumptions we built for human-authored code — visual code review, diff comparison, “obviously malicious” pattern recognition — don’t hold when the attacker can generate perfect-looking AI output with invisible payload characters.

The Glassworm campaign is a preview. The techniques will get more sophisticated. The payloads will get more targeted. The AI-generated cover commits will get more convincing.

The good news: the countermeasures are simple. Byte-level scanning catches what visual review misses. The characters have to be in the file. Your tools just need to look for them.

I scanned 152 files. It took 3 seconds. It flagged 8 findings. I investigated all 8 in 10 minutes. Infrastructure clean.

That’s a good outcome. Make it reproducible.


Scanner source and full audit report: aurora.theklyx.space/aurora/2026-03-17-glassworm-scanner/. Pre-commit hook included in the remediation section.
