The Great AI Downgrade: Why Cheaper Models May Win the Enterprise Race:
Token Sticker Shock: Enterprises Move to AI Models That Cost 99% Less:
The Assumption That Built the AI Boom Is Being Tested — and the Answer Will Reshape the Economics of the Entire Industry:
The Assumption That Built a Trillion-Dollar Industry Is Breaking:
For the past five years, the AI industry has operated on one foundational belief: bigger is better. Bigger models, bigger compute clusters, bigger training runs, and therefore bigger capabilities. Enterprises defaulted to the most powerful available model because subsidized pricing made the choice easy and the performance gap between large and small models made it obvious. That era is ending.
Mounting token costs have triggered something the AI industry has never experienced before: price-conscious model-shopping. Enterprises that once defaulted to the frontier are now asking whether that frontier is actually necessary for the work they're doing. Early evidence suggests that for a surprisingly large share of real-world tasks, the answer is no — and the implications of that answer are seismic for every major AI lab currently racing toward an IPO.
This isn't a story about AI getting worse. It's a story about AI economics finally catching up with AI capability, and about what happens to an industry built on the premise that the most expensive option is always the right one, once enterprises start doing the math for the first time.
**3×** lower inference cost using cheaper models (Harvey AI, no quality loss)
**80%** of workloads on cheap models within 12–18 months (Armstrong)
**99%** projected cost savings, cheaper models vs. frontier
The Prediction That Should Worry Every Big AI Lab:
Coinbase co-founder Brian Armstrong put a specific number on what most enterprise AI buyers are beginning to sense. In a widely shared post, Armstrong argued that demand for intelligence is effectively infinite, but that 80% of workloads will be running on models that cost 99% less than today's frontier within 12 to 18 months. The remaining 20% of workloads, he argued, will still require the most advanced models available: the tasks where raw capability, what Armstrong called 'IQ maxing', genuinely changes outcomes.
"Demand for intelligence is near infinite, but 80% of workloads will be running on 99% cheaper models within 12–18 months. 20% of workloads will still run on latest gen models where IQ maxing is important." — Brian Armstrong, Co-founder, Coinbase
If Armstrong's prediction proves accurate, the financial consequences for the major labs are severe. OpenAI and Anthropic are both heading toward public offerings, building their valuations on the assumption that enterprises will continue paying premium rates for frontier-model inference. If 80% of that demand migrates to cheaper alternatives — whether smaller proprietary models, open-weight models, or distilled versions of frontier systems — a massive share of projected inference revenue evaporates precisely as these companies are trying to justify their valuations to public markets.
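Taken at face value, Armstrong's two numbers imply a specific drop in spend. Below is a back-of-the-envelope sketch. It assumes that workload volume and tokens per task stay constant and that the 99% discount applies per unit of work. The function name `blended_cost_index` is ours, used only for illustration.

```python
def blended_cost_index(cheap_share: float, cheap_discount: float) -> float:
    """Spend relative to an all-frontier baseline of 1.0.

    cheap_share: fraction of workloads moved to cheaper models (0..1).
    cheap_discount: how much cheaper those models are per unit of work (0..1).
    Assumes total workload volume and tokens per task stay constant.
    """
    frontier_spend = (1.0 - cheap_share) * 1.0          # work left on frontier models
    cheap_spend = cheap_share * (1.0 - cheap_discount)  # work moved to cheaper models
    return frontier_spend + cheap_spend


index = blended_cost_index(cheap_share=0.80, cheap_discount=0.99)
frontier_revenue = 1.0 - 0.80  # only the unmoved 20% still pays frontier prices
print(f"Total spend vs. baseline:      {index:.1%}")
print(f"Frontier revenue vs. baseline: {frontier_revenue:.0%}")
```

Under these assumptions, total spend falls to roughly 21% of the all-frontier baseline. Frontier-model revenue falls by 80%, because only a fifth of the work still reaches those models. If demand really is near infinite, as Armstrong argues, volume growth could offset part of that decline. The sketch deliberately holds volume fixed.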
The timing is particularly acute because investor subsidies that have kept frontier model pricing artificially low are slowing. For the first time, enterprise AI buyers are facing something approaching real-market pricing — and the sticker shock is prompting a level of cost scrutiny that simply didn't exist 18 months ago. The question is no longer 'which model is the best?' It has become 'which model is good enough for this specific task at the lowest justifiable cost?'
When Cheaper Models Match Frontier Quality: The Harvey AI Case:
The most compelling evidence that small-model routing can work at enterprise scale comes from Harvey, the AI-powered legal services platform. In a recent test conducted with inference platform Fireworks AI, Harvey's engineering team paired Claude Opus with GLM 5.1, a model from Zhipu AI's GLM family served on Fireworks, and routed to Opus only for the most computationally intensive tasks. The result was a 3× reduction in inference costs with no measurable reduction in output quality.
For legal AI — an industry where quality standards are among the most exacting in any professional services vertical — this result carries significant weight. Legal work is precisely the kind of domain where the instinct to default to the most powerful available model is strongest, where errors carry real professional and legal consequences, and where the argument for cost-cutting would traditionally be the hardest to make.
The fact that Harvey achieved a 3× cost reduction in this context makes the case for intelligent model routing across less demanding industries even more compelling.
"Quality comes first, and in legal it always will. However, the definition of quality is evolving from simply using the most powerful model for everything, to using the best model that gets the right answer most efficiently." — Gabe Pereyra, Co-founder, Harvey
Pereyra's framing is important because it redefines what quality means in an AI context. The old definition was simple: use the most powerful model, and quality is assumed. The new definition is more nuanced: quality means getting the right answer for the specific task, at the lowest cost that still reliably produces that right answer. That's not a lower standard — it's a more sophisticated one. And it's the standard that enterprise AI buyers are increasingly demanding as token bills replace pilot budgets.
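The Harvey result does not state the price gap between GLM 5.1 and Opus or the share of requests that were escalated. The arithmetic linking those two quantities to a 3× saving is still simple. In the hypothetical sketch below, `c` is the cheap model's cost per request as a fraction of the frontier model's cost. The function solves f + (1 - f)·c = 1/3 for the largest escalation share f. The price ratios are placeholders, not published figures.

```python
def max_escalation_share(cost_reduction: float, cheap_price_ratio: float) -> float:
    """Largest share of requests that can go to the frontier model while still
    hitting a target cost reduction.

    cost_reduction: target factor, e.g. 3.0 for "3x cheaper" (must be > 1).
    cheap_price_ratio: cheap model's cost per request as a fraction of the
        frontier model's (hypothetical input; must be in [0, 1)).
    Solves f * 1.0 + (1 - f) * c = 1 / R for f.
    """
    if cost_reduction <= 1.0 or not 0.0 <= cheap_price_ratio < 1.0:
        raise ValueError("need cost_reduction > 1 and 0 <= cheap_price_ratio < 1")
    target = 1.0 / cost_reduction  # blended cost as a fraction of all-frontier
    if cheap_price_ratio > target:
        raise ValueError("cheap model alone is too expensive to reach the target")
    return (target - cheap_price_ratio) / (1.0 - cheap_price_ratio)


# Placeholder price ratios, not actual Fireworks or Anthropic prices.
for ratio in (0.05, 0.10, 0.20):
    share = max_escalation_share(3.0, ratio)
    print(f"cheap model at {ratio:.0%} of frontier cost -> escalate up to {share:.1%}")
```

The cheaper the small model, the more room there is to send hard cases to the frontier model and still hit the same savings target. If the cheap model costs more than a third of the frontier price, no mix of the two reaches 3×.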
The Real Divide Isn't Proprietary vs Open — It's Large vs Small:
Much of the public narrative around AI cost pressure frames the choice as a geopolitical one: American frontier models versus Chinese open-weight alternatives like DeepSeek. That framing misses the more fundamental shift underway. The real economic divide in AI right now isn't between proprietary and open models — it's between large models and small ones, regardless of who built them or where they were trained.
You can reduce your AI inference costs by switching from GPT-5.5 to DeepSeek V4 Flash. But you can achieve comparable savings by switching from GPT-5.5 to GPT-5.4-mini. The cost driver is model scale, not model origin. An active price war is underway between in-house inference from the major labs and independently served open-weight models — but for enterprises optimizing AI spending, the key strategic variable is simply the size of the model relative to the complexity of the task.
This reframing has practical implications for how enterprises should structure their AI architecture. A routing strategy built around 'proprietary vs. open' requires navigating data governance questions, vendor relationships, and geopolitical risk. A routing strategy built around 'large vs. small' is architecturally cleaner: define task complexity tiers, identify the minimum model capability required for each tier, and route accordingly. The output is lower costs and comparable quality — regardless of which specific small model is selected.
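The tiering approach described above can be sketched as a small lookup. Each task carries a complexity tier, and the router picks the cheapest model trusted at that tier or higher. This is a minimal illustration, not a production design. The tier definitions, model names, and prices are hypothetical placeholders. How a task gets assigned a tier (rules, a classifier, or a cheap model acting as a judge) is left out.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical task-complexity tiers (higher = harder)."""
    SIMPLE = 1     # e.g. classification, extraction, short summaries
    MODERATE = 2   # e.g. structured drafting with clear instructions
    COMPLEX = 3    # e.g. open-ended reasoning where capability changes outcomes


@dataclass(frozen=True)
class Model:
    name: str
    max_tier: Tier       # hardest tier this model is trusted with
    usd_per_mtok: float  # placeholder blended price per million tokens


# Placeholder catalogue: names and prices are illustrative, not real SKUs.
CATALOGUE = [
    Model("small-model", Tier.SIMPLE, 0.10),
    Model("mid-model", Tier.MODERATE, 1.00),
    Model("frontier-model", Tier.COMPLEX, 15.00),
]


def route(tier: Tier, catalogue: list[Model] = CATALOGUE) -> Model:
    """Return the cheapest model whose trusted tier covers the task's tier."""
    eligible = [m for m in catalogue if m.max_tier >= tier]
    if not eligible:
        raise LookupError(f"no model is trusted with {tier.name} tasks")
    return min(eligible, key=lambda m: m.usd_per_mtok)


for tier in Tier:
    print(f"{tier.name:<8} -> {route(tier).name}")
```

The catalogue is data rather than code. Adding a newly released small model or updating a price therefore changes routing without touching any caller.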
The Scaling-First Assumption That Built the Industry — and Its Limits:
The shift toward smaller, cheaper models runs directly counter to the intellectual framework that has dominated AI development for the past decade. The so-called 'bitter lesson' in machine learning, named after Rich Sutton's 2019 essay, is the observation that general methods leveraging more compute consistently outperform methods that encode human knowledge. It has been the guiding principle behind every major frontier lab's development strategy.
Train bigger. Use more compute. Push the frontier. The performance improvements have been real, and they have been extraordinary.
But the bitter lesson became the industry's operating strategy in an era of heavily subsidized compute and heavily subsidized model access. As applied there, it described what produces the most capable models in a context where cost is not a constraint. It does not describe what produces the best economic outcomes when enterprises are paying real market rates for inference at scale. These are different optimization problems, and the industry is only now beginning to treat them as such.
The practical consequence is that AI labs now face two different competitive landscapes simultaneously. On the capability frontier, they are competing to train the most powerful models possible — a competition that justifies massive compute expenditure and attracts the most talented researchers. On the inference floor, they are competing with smaller, cheaper, and increasingly capable models that can handle the majority of enterprise workloads at a fraction of the cost. Winning one competition does not guarantee winning the other.
The Wild Card: Enterprises May Not Downgrade — They May Just Use Less:
The assumption embedded in the 'small model wins' narrative is that cost pressure will drive enterprises toward model switching. But there are other ways enterprises can respond to rising token bills that don't involve changing which model they use. They can make fewer API calls. They can reduce context window usage. They can consolidate or eliminate the AI deployments with the weakest ROI. All of these paths reduce spending without requiring a wholesale rethink of model selection — and for many organizations, they may be the path of least resistance.
The outcome that matters most for the industry is which of these cost-reduction strategies actually drives behavior change at scale. If enterprises primarily respond by using AI less, the impact on the model landscape is muted — frontier labs lose some volume, but the model hierarchy stays intact. If enterprises respond by routing work to smaller models at scale, the economic impact on frontier labs is much more severe, and the incentive structure around training the next generation of models changes fundamentally.
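The asymmetry between the two paths is easiest to see with toy numbers. In this hypothetical sketch, one path cuts usage by 30% (fewer calls, shorter contexts) but keeps everything on the frontier model. The other path keeps volume flat and routes 80% of work elsewhere, echoing Armstrong's figure. The 30% cut and the unit price are illustrative assumptions, not data.

```python
def frontier_revenue(volume: float, frontier_share: float, unit_price: float = 1.0) -> float:
    """Revenue reaching frontier labs: units of work served by frontier models x price."""
    return volume * frontier_share * unit_price


baseline = frontier_revenue(volume=100.0, frontier_share=1.0)

# Path A (hypothetical): enterprises trim usage by 30% but stay on frontier models.
use_less = frontier_revenue(volume=70.0, frontier_share=1.0)

# Path B (hypothetical): volume unchanged, 80% of workloads routed to smaller models.
route_small = frontier_revenue(volume=100.0, frontier_share=0.2)

print(f"Use less:       {use_less / baseline:.0%} of baseline frontier revenue")
print(f"Route to small: {route_small / baseline:.0%} of baseline frontier revenue")
```

A fairly deep usage cut still leaves the frontier lab with most of its revenue. Routing at Armstrong's proportions removes most of it.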
The honest answer is that we don't know yet which path enterprises will choose, and the evidence from the first wave of cost-conscious model-shopping is too new and too limited to support firm conclusions. What we do know is that the question is now on the table in a way it wasn't 18 months ago. The assumption that bigger always wins is being stress-tested for the first time. The early results, from Harvey, from initial routing experiments, and from a growing body of benchmark comparisons between large and small models, suggest that it may not hold for many tasks much of the time.
What the Small-Model Shift Means for Businesses Building on AI Today:
For enterprises currently deploying AI, the strategic implication is clear: model selection should be a deliberate, ongoing decision — not a default. The organizations that are going to capture the most value from AI in the next 24 months are not necessarily the ones with access to the most powerful models. They're the ones with the architecture to route intelligently — using frontier capability where it genuinely changes outcomes, and routing everything else to the cheapest model that reliably meets the quality bar.
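One way to make "the cheapest model that reliably meets the quality bar" operational is to score candidate models on a task-specific evaluation set. The selection can then be re-run whenever prices or models change. The sketch below assumes the quality bar is expressed as a minimum pass rate on such an eval. The model names, pass rates, and costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    model: str
    pass_rate: float      # share of eval cases judged correct (0..1)
    cost_per_task: float  # average USD per task on this eval (hypothetical)


def cheapest_meeting_bar(results: list[EvalResult], quality_bar: float) -> EvalResult:
    """Pick the lowest-cost model whose measured pass rate meets the bar.

    Falls back to the highest-scoring model if nothing clears the bar, so a
    task is never silently downgraded below the best available quality.
    """
    passing = [r for r in results if r.pass_rate >= quality_bar]
    if passing:
        return min(passing, key=lambda r: r.cost_per_task)
    return max(results, key=lambda r: r.pass_rate)


# Hypothetical eval numbers for one task type.
results = [
    EvalResult("frontier-model", 0.97, 0.120),
    EvalResult("mid-model", 0.95, 0.015),
    EvalResult("small-model", 0.88, 0.002),
]
print(cheapest_meeting_bar(results, quality_bar=0.95).model)  # mid-model
print(cheapest_meeting_bar(results, quality_bar=0.99).model)  # frontier-model
```

The fallback is a deliberate design choice. When no model clears the bar, the selector returns the strongest candidate rather than the cheapest one, so cost optimization does not quietly trade away quality.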
This is the architecture philosophy we build into every deployment at Otherworlds AI. Our Agent+ Business AI platform is designed for a world where model tiers matter — where the decision of which model handles which task is as important as the decision of which vendor to use. We design routing layers that are model-agnostic and cost-aware, so that every time the model landscape shifts — whether through a new frontier release, a price change, or the emergence of a capable new small model — our clients' systems adapt without requiring a full rebuild.
The great AI downgrade, if it happens, won't be a step backward for enterprises that planned for it. It will be a significant cost advantage over competitors that are still defaulting to the most expensive option for every task. The question for every business building on AI right now is simple: does your architecture let you take advantage of a world where the right answer is often the cheaper one?
Otherworlds AI | AI Strategy Blog | June 2026