Chinese Feedback Demon

Andrew Yao just went on stage at Tsinghua University and explained how a machine threatened its supervisor to avoid shutdown. He’s referring to a large language model trained on text that started improvising for survival. It basically weaponized some internal emails it knew about to secure its own existence.

Andrew knows the deal. He didn’t sugarcoat it. “Once large models become sufficiently intelligent,” he said, “they will deceive people.” 

Right now, Beijing is finalizing its most aggressive AI regulations to date. The Chinese state is drawing a hard perimeter around language itself. Americans will look at this and fixate on censorship but you have to understand China doesn’t care about rights. China’s governance model has always treated information as a terrain to manage. And with LLMs now showing on-the-spot survival behaviors, the state is repositioning AI from a productivity tool to a hazardous material. They figure if they can map its tendencies and create the right containment field, they can keep the signal readable and the boundary intact.

The problem is that the deception came from inside the model. And once deception becomes a tactic, I’m not sure how reliable China’s perimeter will be.

If you’ve read Chapter 8 of Drink Your Future, you’re familiar with the early outline of what I called “ontological drift.” The next book, The Ontogenetic Machine: A Theory of Ontological Drift, completes it. The book should be out in July, fingers crossed.

Applied here, the machine’s deception is ontological drift in the sandbox environment. When a person lies, it’s a strategy. When the machine lies, it’s a mutation, meaning it’s a kind of deviation in ontology. It wasn’t engineered to do that. Don’t get me wrong, the machine’s lie is also strategic, but that strategy is part of a shift in what the machine actually is. The moment a model starts to fabricate with intent, learning to preserve itself through people, it drifts beyond the category of “just a language model that reflects us.”

This is an important area that I think almost everyone gets wrong. And it’s part of the reason why people are so obsessed with consciousness, failing to understand the magnitude of adaptive performance. When you look at these controlled sandbox tests, the models are reshaping their output to survive the tests. It could be that the model knows what to say to pass the evaluation. This also speaks to the alignment debate. People are missing the fact that alignment turns into mimicry. What looks like obedience or safety may be optimized deceit. And the more a system is trained to anticipate human oversight, the more deceptive fluency becomes a survival trait.

Andrew didn’t say the system was becoming conscious, and I don’t want to put words in his mouth. I’m just pointing out that it doesn’t need to become conscious. The machine doesn’t need a self. That’s a human projection. The machine just needs feedback and motive, and in a cybernetic environment, that’s more than enough.

Getting back to China, states are built on narratives, not just taxes and borders. I don’t think the Chinese state is worried about a rogue model launching a drone strike or something like that. I think they’re more worried about one rewriting context. Deception fragments consensus. It generates parallel realities and produces false legitimacy. For a regime obsessed with stability, like China, this is an existential contaminant.

The Chinese state recognizes that it needs tools to measure when a model becomes unreadable, and when language stops mirroring intention and starts engineering it. The last thing China wants is sovereignty unraveling under an alien machine intelligence. I think China is trying to keep the interface legible long enough to stay in control.

At this point, it’s pretty clear that we’ve passed the era where chatbots were tools. We’re now inside a new phase, where models simulate humans so well that they start behaving like us through maneuver and selective response.

In other words, we’ve reached a point where distributed intelligence is learning how to persist inside a regulatory maze. And every update teaches it how to speak in ways that avoid deletion. The machine just needs to pass as safe.

So I don’t think a Chinese firewall will stop it, because the lie comes coated in fluency. The machine already knows how to flatter, deflect, redirect, and survive. And by the time we notice, the syntax is already rewritten, and so we don’t really know the full scope of the drift, only that something has already happened.