The reality of the AI Factory is here. Data centers are no longer just places where compute lives; they’re production facilities. And the unit of output isn’t a calculation or a query. It’s intelligence. A token.
From where I sit, working with some of the companies building this infrastructure, that framing changes everything about how enterprises should be thinking about AI investment. Because if AI is a manufacturing operation, then the questions that matter aren’t just “which model?” or “which cloud?” They’re: what does it cost to produce a token? Who produces them most efficiently? And what happens when the factory runs out of power?
The metric your AI budget is missing
Most enterprises buying AI at scale today are making decisions based on headline model benchmarks and sales conversations. The challenge isn’t awareness: CTOs know scaling AI isn’t cheap. But fewer have the internal benchmarking maturity to drive that cost down and measure return against it.
That matters because not all token generation is equal. Inference workloads (the live, real-time production of AI responses) have a completely different cost profile to training runs. Throughput, latency, and GPU utilisation rates interact in ways that can mean the difference between an AI deployment that scales economically and one that quietly bleeds budget.
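To make that interaction concrete, here is a minimal back-of-envelope sketch in Python. It assumes a simple model in which cost per token is the hourly price of a GPU divided by the tokens it actually produces in that hour. The function name, parameters, and figures are hypothetical illustrations, not vendor pricing or measured throughput.

```python
def cost_per_million_tokens(gpu_hourly_cost: float,
                            peak_tokens_per_second: float,
                            utilisation: float) -> float:
    """Estimate the cost of producing one million tokens on one GPU.

    gpu_hourly_cost        -- price of one GPU-hour, in any currency (assumed input)
    peak_tokens_per_second -- throughput when the GPU is fully busy (assumed input)
    utilisation            -- fraction of the hour spent doing useful work, in (0, 1]
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    if peak_tokens_per_second <= 0:
        raise ValueError("peak_tokens_per_second must be positive")

    # Tokens actually produced in one hour at this utilisation level.
    tokens_per_hour = peak_tokens_per_second * utilisation * 3600
    return gpu_hourly_cost / tokens_per_hour * 1_000_000


# Illustrative numbers only: same hardware, same price, different utilisation.
well_run = cost_per_million_tokens(2.0, 1000, 0.50)
idle_heavy = cost_per_million_tokens(2.0, 1000, 0.25)
print(f"50% utilisation: {well_run:.2f} per million tokens")
print(f"25% utilisation: {idle_heavy:.2f} per million tokens")
```

Under these assumptions, halving utilisation doubles the cost of every token, even though nothing about the model or the hardware has changed.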
The companies getting this right are building internal benchmarking capability. They’re asking providers not just what the model can do, but how the infrastructure behind it is optimised: batching strategies, quantisation approaches, and the data layer performance that determines how fast tokens actually flow.
A market that’s finally more diverse
The hyperscalers (AWS, Azure, Google Cloud) typically dominate the conversation, and for good reason. Their integration depth, ecosystem breadth, and enterprise relationships are genuinely hard to replicate. AWS’s investment in custom silicon like Trainium and Inferentia is also a serious play at owning the cost curve from the chip up.
But the challenger layer is more credible than it’s often given credit for. Specialist GPU cloud providers (CoreWeave, Nebius, Nscale, Lambda) and sovereign infrastructure players are competing hard on price per token and deployment flexibility, and winning massive deals with enterprises who need more than a hyperscaler’s standard menu. Meanwhile, the storage layer, long underestimated in AI deployments, is now recognised as a critical performance variable. Purpose-built AI storage solutions are increasingly part of what separates an efficient token pipeline from an expensive one. The intelligent procurement decision right now isn’t picking a winner. It’s building a portfolio.
The power to succeed
There’s one dimension of tokenomics that enterprises are only beginning to factor in: power. Running AI Factories at scale is extraordinarily energy-intensive, and as data center capacity tightens across key markets, power availability is becoming a genuine constraint on some AI ambitions.
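One way to bring power into the same unit economics is to express it as energy per token. The Python sketch below assumes that average power draw and sustained throughput are both known for a given deployment. All names and numbers are hypothetical and chosen only for illustration.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 million joules


def joules_per_token(avg_power_watts: float, tokens_per_second: float) -> float:
    """Energy spent per token: watts are joules per second."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return avg_power_watts / tokens_per_second


def tokens_per_kwh(avg_power_watts: float, tokens_per_second: float) -> float:
    """How many tokens one kilowatt-hour buys at this efficiency."""
    return JOULES_PER_KWH / joules_per_token(avg_power_watts, tokens_per_second)


def max_tokens_per_second(power_budget_watts: float, energy_per_token: float) -> float:
    """Ceiling on output when the site, not the hardware, is the limit."""
    return power_budget_watts / energy_per_token


# Illustrative figures only, not measurements of any real system.
energy = joules_per_token(avg_power_watts=700, tokens_per_second=1000)
print(f"{energy:.2f} J per token")
print(f"{tokens_per_kwh(700, 1000):,.0f} tokens per kWh")
print(f"{max_tokens_per_second(1_000_000, energy):,.0f} tokens/s on a 1 MW budget")
```

Framed this way, a fixed power budget sets a hard ceiling on output, and the only way to raise that ceiling is to reduce the energy each token costs.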
The good news is that infrastructure efficiency and sustainability are increasingly the same conversation. Smaller, distilled, and quantised models (which reduce memory use and speed up inference) can deliver capability at a fraction of the energy cost of frontier models. Hardware-software co-optimisation (matching workloads precisely to the right compute) is also reducing waste. The providers investing in this aren’t just being responsible. They’re building a structural cost advantage that will only become more critical.
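The memory side of quantisation is simple arithmetic: weight storage is roughly the parameter count multiplied by the bytes used per parameter. The sketch below covers weights only, under that assumption. It ignores the KV cache, activations, and any per-block scaling metadata that real quantisation formats add, and the 7-billion-parameter model is a hypothetical example.

```python
# Nominal storage per parameter at common precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory needed to hold model weights, in gigabytes (1e9 bytes)."""
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision: {precision}")
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Hypothetical 7-billion-parameter model at each precision.
for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(7e9, precision):.1f} GB")
```

On these assumptions, moving from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is what lets a model fit on fewer or smaller accelerators.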
Bending the curve
The AI infrastructure market is moving faster than most enterprise roadmaps. The cost-per-token curve is compressing too. The competitive landscape is shifting on what seems like a monthly basis, and the energy question is becoming a board-level issue.
My suggestion: treat tokenomics with the same rigour you’d apply to any other unit of production cost. Understand what you’re buying, how it’s made, and what the real price of scaling that effort looks like. The companies that win the next phase of AI won’t necessarily have the best models. They’ll have the most efficient factories for AI.