Supply-chain attack using invisible code hits GitHub and other repositories

Ars Technica · 17 min read

Researchers say they’ve discovered a supply-chain attack flooding repositories with malicious packages that contain invisible code, a technique that’s flummoxing traditional defenses designed to detect such threats.

The researchers, from firm Aikido Security, said Friday that they found 151 malicious packages that were uploaded to GitHub from March 3 to March 9. Such supply-chain attacks have been common for nearly a decade. They usually work by uploading malicious packages with code and names that closely resemble those of widely used code libraries, with the objective of tricking developers into mistakenly incorporating the former into their software. In some cases, these malicious packages are downloaded thousands of times.

Defenses see nothing. Decoders see executable code

The packages Aikido found this month have adopted a newer technique: selective use of code that isn’t visible when loaded into virtually all editors, terminals, and code review interfaces. While most of the code appears in normal, readable form, malicious functions and payloads, the usual telltale signs of malice, are rendered in Unicode characters that are invisible to the human eye. The tactic, which Aikido said it first spotted last year, makes manual code reviews and other traditional defenses nearly useless. Other repositories hit in these attacks include npm and Open VSX.

The malicious packages are even harder to detect because of the high quality of their visible portions. “The malicious injections don’t arrive in obviously suspicious commits,” Aikido researchers wrote.
“The surrounding changes are realistic: documentation tweaks, version bumps, small refactors, and bug fixes that are stylistically consistent with each target project.”

The researchers suspect that Glassworm, the name they assigned to the attack group, is using LLMs to generate these convincingly legitimate-appearing packages. “At the scale we’re now seeing, manual crafting of 151+ bespoke code changes across different codebases simply isn’t feasible,” they explained. Fellow security firm Koi, which has also been tracking the same group, said it, too, suspects the group is using AI.

The invisible code is rendered with Private Use Areas (sometimes called Private Use Access), which are ranges in the Unicode specification for special characters reserved for private use in defining emojis, flags, and other symbols. The code points represent every letter of the US alphabet when fed to computers, but their output is completely invisible to humans. People reviewing code or using static analysis tools see only whitespace or blank lines. To a JavaScript interpreter, the code points translate into executable code.

The invisible Unicode characters were devised decades ago and then largely forgotten. That is, until 2024, when hackers began using the characters to conceal malicious prompts fed to AI engines. While the text was invisible to humans and text scanners, LLMs had little trouble reading it and following the malicious instructions it conveyed. AI engines have since devised guardrails that are designed to restrict usage of the characters, but such defenses are periodically overridden.

Since then, the Unicode technique has been used in more traditional malware attacks. In one of the packages Aikido analyzed in Friday’s post, the attackers encoded a malicious payload using the invisible characters. Inspection of the code shows nothing. At JavaScript runtime, however, a small decoder extracts the real bytes and passes them to the eval() function.
    const s = v => [...v].map(w => (
      w = w.codePointAt(0),
      w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
      w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 : null
    )).filter(n => n !== null);

    eval(Buffer.from(s(``)).toString('utf-8'));

“The backtick string passed to s() looks empty in every viewer, but it’s packed with invisible characters that, once decoded, produce a full malicious payload,” Aikido explained. “In past incidents, that decoded payload fetched and executed a second-stage script using Solana as a delivery channel, capable of stealing tokens, credentials, and secrets.”

Since finding the new round of packages on GitHub, the researchers have found similar ones on npm and the VS Code marketplace. Aikido said the 151 packages detected are likely a small fraction of those spread across the campaign because many have been deleted since first being uploaded.

The best way to protect against the scourge of supply-chain attacks is to carefully inspect packages and their dependencies before incorporating them into projects. This includes scrutinizing package names and searching for typos. If suspicions about LLM use are correct, malicious packages may increasingly appear to be legitimate, particularly when invisible Unicode characters are encoding malicious payloads.

Dan Goodin, Senior Security Editor

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him on Mastodon and Bluesky. Contact him on Signal at DanArs.82.

Staff Picks

It looks like the language isn’t interpreting the extended characters as ascii directly. There is a loop that is modifying the characters to shift them back into the ascii range.

This seems correct, to me.
If I'm understanding the attack correctly (and I'm fairly sure I am), it consists of encoding malicious code in an otherwise entirely normal, or at least abnormal-but-legitimate, UTF-8 string within the code, which decodes to code which is then executed. It is functionally equivalent to, say, ‘encrypting’ that malicious code in a ROT13 string, and including an inline ROT13 decoder, which is run on the string before executing it. The ‘encryption’ here is barely more sophisticated than that.

The only difference – and it's a crucial one – is that any reviewer would surely notice a sodding big block of ROT13 code in a patch, whereas in this case (I would lay money) most editors and renderers would display the block of ‘encrypted’ code as an empty string, which is easy to miss. The clever thing is that the editors are not malfunctioning when doing this, and any attempt to make them display the characters would potentially count as a bug. Even if a code-reviewer thought that the eval looked weird, they'd have to work through the decoder, and know a relatively unusual amount about Unicode, in order to work out what was going on.

The codepoints in question are actually not in any of the Unicode ‘private use areas’ (despite what the article suggests; and yes, I think it's ‘private use area’ that's intended, since there's no term ‘public use area’). The codepoint ranges U+FE00 to U+FE0F and U+E0100 to U+E01EF are ‘variation selectors’. I'm moderately familiar with the Unicode spec and... I've never heard of them before! There's a handy Wikipedia page which tells us that they exist in order to do funky things to preceding CJK characters in selected east-Asian languages. You'd have to go head-first into the Unicode spec for the details (rather you than me), but I wouldn't be at all surprised if the required rendering behaviour for these in certain circumstances is... to show nothing.
That is, this apparently isn't exploiting any UTF-8 decoding bugs, or Unicode manipulation edge-cases. It seems quite likely that the rendering behaviour of these codepoints in strings is specified, and any editor which displayed the strings as other than empty ones might well be defective.

What a clever hack! Bastards.

March 13, 2026 at 11:28 pm
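
The decoder quoted in the article maps U+FE00 to U+FE0F onto byte values 0 to 15 and U+E0100 to U+E01EF onto byte values 16 to 255. The following is a minimal sketch, assuming Node.js (for the global Buffer), of what the matching encoder might look like and of one simple way to flag such strings in source text. The function names and the minRun threshold of 8 are hypothetical choices made for illustration; they do not come from Aikido's report, and this is not presented as the attackers' actual tooling or as a complete defense.

```javascript
// Hypothetical helpers for illustration only. The two ranges are the
// ones tested by the decoder quoted in the article.
const VS_LOW = 0xFE00;   // U+FE00..U+FE0F   <-> byte values 0..15
const VS_HIGH = 0xE0100; // U+E0100..U+E01EF <-> byte values 16..255

// Encode each UTF-8 byte of `text` as one variation selector.
function encodeInvisible(text) {
  return [...Buffer.from(text, 'utf-8')]
    .map(b => String.fromCodePoint(b < 16 ? VS_LOW + b : VS_HIGH + (b - 16)))
    .join('');
}

// Inverse mapping: same arithmetic as the quoted decoder, minus eval().
function decodeInvisible(hidden) {
  const bytes = [];
  for (const ch of hidden) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xFE00 && cp <= 0xFE0F) bytes.push(cp - VS_LOW);
    else if (cp >= 0xE0100 && cp <= 0xE01EF) bytes.push(cp - VS_HIGH + 16);
  }
  return Buffer.from(bytes).toString('utf-8');
}

// Report runs of at least `minRun` consecutive variation selectors.
// A single selector after an emoji or CJK character is legitimate,
// so short runs are ignored (the threshold is an assumption).
function findInvisibleRuns(source, minRun = 8) {
  const pattern = /[\uFE00-\uFE0F\u{E0100}-\u{E01EF}]+/gu;
  const hits = [];
  for (const m of source.matchAll(pattern)) {
    const count = [...m[0]].length; // count code points, not UTF-16 units
    if (count >= minRun) {
      const line = source.slice(0, m.index).split('\n').length;
      hits.push({ line, count });
    }
  }
  return hits;
}

// Usage: a harmless payload hidden inside a template literal.
const hidden = encodeInvisible('console.log(1)');
const sample = 'const x = 1;\nrun(`' + hidden + '`);';
console.log(decodeInvisible(hidden));
console.log(findInvisibleRuns(sample));
```

A check like findInvisibleRuns could plausibly run in a pre-commit hook or CI step over changed files. Because it only looks at these two ranges, it would miss payloads hidden with other invisible code points, so it is best read as a starting point.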

Source: Ars Technica
