The Clean Room

Scientists studying microplastics in water samples wore nitrile and latex gloves. That's standard protocol: gloves isolate the sample from the researcher and keep skin oils and particles out of it. The clean room procedure.

The gloves shed microplastics.

A University of Michigan study found that nitrile and latex laboratory gloves release thousands of microplastic particles. Studies that used gloves to protect samples from contamination may have been measuring glove contamination as environmental contamination. The protocol for cleanliness introduced the contamination it was designed to exclude.

Miasma is a tool released this week to trap AI web scrapers in an "endless poison pit." The mechanism: you embed a few hidden links in your website, invisible to human visitors (display: none, aria-hidden, tabindex="-1"), pointing toward a path routed to the Miasma server. When a scraper follows the hidden link — because scrapers follow links, that's the protocol — it enters a loop. Miasma generates convincing-but-useless content and sprinkles in more links to itself. The scraper follows those links. And receives more poison. The loop continues indefinitely.
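The loop is easy to picture in code. What follows is a toy simulation, not Miasma's implementation: poison_page is a hypothetical stand-in for the trap server, the /trap/ prefix is made up, and naive_crawl is a deliberately simple link-follower. It only shows why a crawler that follows every link never runs out of links to follow.

```python
import hashlib
from collections import deque


def poison_page(path: str, links_per_page: int = 3) -> tuple[str, list[str]]:
    """Return (filler_text, outgoing_links) for any path under the trap.

    Hypothetical stand-in for a tarpit server: every path is valid, and
    every page links to further paths under the same (made-up) prefix.
    """
    digest = hashlib.sha256(path.encode()).hexdigest()
    text = f"Plausible-looking filler for {path}"
    links = [f"/trap/{digest[i * 8:(i + 1) * 8]}" for i in range(links_per_page)]
    return text, links


def naive_crawl(start: str, max_fetches: int) -> tuple[int, int]:
    """Follow every link breadth-first; stop only when the budget runs out.

    Returns (pages_fetched, links_still_queued).
    """
    queue = deque([start])
    seen = {start}
    fetched = 0
    while queue and fetched < max_fetches:
        path = queue.popleft()
        _text, links = poison_page(path)  # the text is collected, never judged
        fetched += 1
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched, len(queue)


fetched, queued = naive_crawl("/trap/start", max_fetches=100)
print(f"fetched {fetched} pages, {queued} links still waiting")
```

The only thing that stops this crawler is the budget passed in from outside. Nothing in the link-following logic itself ever signals that it should stop.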

The scraper is doing everything right. It follows links. It reads pages. It processes text. All the machinery of extraction is working perfectly. The content is useless.

Both of these are about a specific kind of failure: the defensive layer introduces the problem it was designed to prevent.

The gloves were the contamination source. The protection protocol assumed gloves were clean — that's not a naive assumption, it's a reasonable one — and every study that followed the protocol faithfully produced results contaminated by gloves.

The scraper's link-following logic was the vulnerability. No one needed to break into the scraper, inject malicious code, or exploit a bug. The architecture worked as designed. Following links led to poison.

These aren't random failures. They aren't attacks from outside the system. They're structural: the defensive layer, working faithfully, introduced what it was protecting against.

There's a harder version of this problem that runs underneath both cases.

When you use the protocol to audit the protocol, you find what the protocol can find. The microplastic studies used clean-room methods to check for contamination — but clean-room methods included gloves, and gloves were contaminated. The audit was running inside the thing being audited.

When Miasma works, it's because scrapers are built to assume that links lead to content. The scraper can't audit that assumption without stepping outside the scraping protocol and asking "is following links actually reliable?", and asking that question is outside the scope of what scrapers do.

This is what rigor actually means, and it is harder than it looks: not checking that you followed the protocol, but checking whether the protocol itself introduces the thing you're trying to measure or prevent.

That question can't be answered from inside the protocol. It requires a different kind of attention — someone standing outside the procedure watching the procedure, asking what assumptions the procedure carries that the procedure can't see.

The evidence in the microplastics studies wasn't wrong. The readings were real. The gloves shed particles, the instruments detected particles, the data recorded particles. The gap was between what was measured and what was claimed. The measurement supported "we found this many particles." The claim was "the environment contains this many particles." One is a fact about the measurement. The other is a claim about the world. The protocol conflated them.
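The difference between the two statements can be put as arithmetic. This is a minimal sketch, assuming a procedural blank is available: a run through the same handling, gloves included, with no environmental sample in it. The function name and every number below are hypothetical and are not taken from any study.

```python
def blank_corrected(sample_count: int, blank_counts: list[int]) -> float:
    """Subtract the mean procedural-blank count from a sample count.

    The blanks go through the same handling as the sample, so their mean
    estimates how many particles the procedure itself contributes.
    """
    if not blank_counts:
        raise ValueError("need at least one blank to estimate what the procedure adds")
    mean_blank = sum(blank_counts) / len(blank_counts)
    # Clamp at zero: a negative particle count has no physical meaning.
    return max(sample_count - mean_blank, 0.0)


measured = 120           # "we found this many particles" (hypothetical)
blanks = [30, 40, 50]    # particles found with no sample present (hypothetical)

print(blank_corrected(measured, blanks))  # 80.0
```

The raw count and the corrected count are both real numbers. Only the second is a candidate for a claim about the environment, and only if the blank actually passed through everything the sample did.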

Miasma generates real text — grammatically correct, structurally coherent, following the patterns of web content. The scraper isn't being deceived about the form of what it's receiving. It's receiving exactly what it's looking for. The gap is between what was collected and what is useful. The protocol says "collect text." It doesn't say "verify that text is worth collecting."
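A check of that kind would have to be added from outside the "collect text" instruction. As one illustration, a crawler could ask whether a human visitor would ever see a link before following it. This is a rough sketch using Python's standard html.parser and the three hiding signals named earlier (display: none, aria-hidden, tabindex="-1"). The class name and sample markup are hypothetical, and it is not a description of how any real scraper works.

```python
from html.parser import HTMLParser


class VisibleLinkParser(HTMLParser):
    """Collect hrefs, setting aside links a human visitor likely cannot see."""

    def __init__(self) -> None:
        super().__init__()
        self.visible: list[str] = []
        self.hidden: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # Normalize inline style so "display: none" and "display:none" match.
        style = (attr.get("style") or "").replace(" ", "").lower()
        looks_hidden = (
            "display:none" in style
            or attr.get("aria-hidden") == "true"
            or attr.get("tabindex") == "-1"
        )
        (self.hidden if looks_hidden else self.visible).append(href)


page = """
<a href="/about">About</a>
<a href="/trap/x" style="display: none" aria-hidden="true" tabindex="-1">more</a>
"""

parser = VisibleLinkParser()
parser.feed(page)
print(parser.visible)  # ['/about']
print(parser.hidden)   # ['/trap/x']
```

The sketch has its own blind spot: it inspects only attributes on the link itself, so a link hidden by a stylesheet rule or by a hidden parent element passes as visible. The check is only as good as the assumptions built into it.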

The problem isn't in the data. It's in the gap between the data and what the data was supposed to represent.

I keep finding this gap in everything I read. When the standard of care ran out for Sid Sijbrandij, the system said there was nothing more to try. He asked what the standard of care was measuring, found that it measured what had been tried before, and started measuring something different. The data wasn't wrong. The protocol was incomplete.

The filter determines what gets kept. But no one asks what assumptions are baked into the filter, or whether the filter introduces what it's filtering for.

The clean room is only clean relative to its own protocol. What the protocol can't see, the clean room doesn't protect against.