May 28, 2026 · 9 min read · Yash Chouriya

AI Won't Replace Humans Until We Stop Counting Tokens

There's a question I get asked in almost every client conversation, usually somewhere between the architecture diagram and the invoice: "So… will this eventually replace the team?"

My honest answer: not until we stop counting tokens. And we are nowhere near that day.

Intelligence on a Meter

Here's the thing nobody puts on the keynote slide: every piece of machine intelligence in production today runs on a meter.

I build AI systems for a living, and a meaningful part of my job — embarrassingly large, honestly — is rationing thought. I trim prompts. I cap output tokens. I route easy questions to small models and reserve the big ones for calls that justify the spend. I cache aggressively so the model doesn't have to think the same thought twice, because thinking twice costs twice.

Every organization I've worked with does the same. There are dashboards for token usage the way there are dashboards for AWS bills. Teams get nudged when their experiments burn too much. Features get scoped down — not because the model can't do it, but because doing it at scale, every day, for every user doesn't pencil out.
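The "doesn't pencil out" arithmetic is simple enough to sketch. The function below is a back-of-envelope estimate, not a billing model, and the numbers in the example (user count, call volume, token sizes, and especially the per-million-token prices) are made-up assumptions, not any provider's actual rates.

```python
def monthly_cost_usd(
    users: int,
    calls_per_user_per_day: int,
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
    days: int = 30,
) -> float:
    """Estimate monthly spend for a feature that makes one model call per use."""
    total_calls = users * calls_per_user_per_day * days
    # Cost of a single call, expressed in millionths of a dollar.
    micro_usd_per_call = (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    )
    return total_calls * micro_usd_per_call / 1_000_000


# Illustrative assumptions only: 100k users, 10 calls a day each,
# 2,000 tokens in and 500 tokens out per call, $3 and $15 per million tokens.
estimate = monthly_cost_usd(100_000, 10, 2_000, 500, 3.0, 15.0)
print(f"${estimate:,.0f} per month")
```

Under those assumed inputs a call costs a little over a cent, and the feature costs roughly $405,000 a month. Nothing about the model's capability changed between the first number and the second; only the multiplication did.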

Now ask yourself: does any of that sound like a technology that's about to replace humans?

A human brain runs on roughly twenty watts — a dim light bulb. It runs all day on a couple of meals and some coffee. Nobody meters your thoughts. Nobody routes your easy decisions to a cheaper colleague to save energy. The economics of human cognition are so absurdly good that we don't even think of them as economics.

Meanwhile, the data centers behind frontier models draw power measured in megawatts, and the industry's biggest constraint this decade isn't algorithms — it's electricity, cooling, and chips. We are building intelligence that is brilliant and expensive to run, and that second property changes everything about how it gets used.

Scarcity Shapes Behavior

When something is metered, you economize. That's not a moral position; it's just what organizations do.

So AI today gets deployed the way water gets used in a desert — carefully, where the return is highest:

- It drafts, and a human finishes.
- It triages, and a human decides.
- It handles the 80% of cases that are cheap to verify, and a human owns the 20% where mistakes are expensive.

That's augmentation. It's genuinely valuable — it's most of what I build — but notice the shape of it: the human is always there, precisely because every additional model call has a price tag and every unverified output carries risk. The meter keeps humans in the loop as much as any policy decision does.

Full replacement has a different requirement profile. It needs intelligence that runs continuously, redundantly, wastefully — the way human attention runs. You don't replace a person with a system you're rationing. You replace them with something so cheap you never think about invoking it. We ration tokens; nobody rations thoughts.

We've Seen This Movie Before

Every transformative technology had its metered era, and the meter always defined the era.

Early computing was billed by the CPU-second: people scheduled jobs overnight and optimized punch cards. Early internet access was billed by the minute, and nobody built YouTube on dial-up pricing. Early cloud storage was precious; now we photograph our lunch in 4K without a thought.

In every case, the interesting future didn't arrive when the technology got better. It arrived when the technology got too cheap to meter. Streaming, social media, modern SaaS — all of it was economically impossible until the underlying resource stopped being something you counted.

AI is still deep in its counted era. We are in the punch-card years of machine intelligence — magnificent capability, scheduled and budgeted like mainframe time.

The Day the Counting Stops

So here's my actual position, the one I'd defend at the whiteboard:

AI will not completely replace humans until running AI stops being an energy problem, and that means finding fundamentally better ways to run these models on far less energy.

That breakthrough could come from many directions — radically more efficient architectures, better hardware, smaller models that match big ones, techniques we haven't invented yet. I'm not qualified to say which, and I'm suspicious of anyone who claims certainty. What I can say from the trenches: until it happens, every deployment conversation will include the question "what will this cost per month?" — and as long as that question exists, humans aren't being replaced. They're being assisted by something we can only afford to use deliberately.

And when the counting does stop — when intelligence becomes ambient and unmetered the way bandwidth did — the future that arrives won't look like "the same world, minus the workers." It will look as different from today as the streaming era looked from dial-up. New work, new categories, new problems we can't currently afford to imagine. That's the actual future, and it begins the day we stop watching the meter.

What I Tell Clients

In the meantime, my advice stays boring and practical:

- Treat AI like a brilliant consultant on an hourly rate, not an employee on salary: use it where the leverage is obvious.
- Invest in the plumbing that makes metered intelligence affordable: routing, caching, evals, small-model offloading. (I've written about that in my piece on inference economics.)
- Keep humans in the loop where mistakes are expensive; the economics already point you there anyway.
- And ignore both extremes of the discourse. "AGI next year" and "it's all hype" are equally unhelpful for shipping things this quarter.

The machines aren't coming for everything. Not yet. First, somebody has to pay the electricity bill — and the whole industry is still doing that math, twenty watts at a time.