AI vendor lock-in is the deepest in the history of enterprise software. Here's why a deliberate multi-model AI strategy is the platform decision of the decade, why "just use a gateway" is not the answer people think it is, and why it's harder to implement than it looks.
A platform decision hiding inside a default API key
About 15 years ago, I co-founded a company called CloudEndure, which we later sold to AWS. It was a critical time for cloud: many companies that had mostly ignored it, or had tried to build it in-house, were realizing there was a better way.
Indeed, every technology wave has a moment where a handful of architectural choices quietly decide who wins and who gets left behind. The companies that treated the public cloud as a strategy became Amazon, Netflix, Airbnb. The ones that treated mobile as a feature instead of a platform were plowed over by companies that didn't. The decision looked technical but was actually existential.
The choice of how your company depends on AI models is that decision, happening now, and on a faster clock than either prior wave. Enterprise AI adoption has gone from under 5% in 2023 to over 80% by 2026, and enterprise spend on generative AI hit $37B in 2025, more than tripling year over year. Running more than one model is already the norm: in a16z's survey of 100 enterprise CIOs, 37% now run five or more models in production, up from 29% a year earlier.
And yet most organizations are making the dependency decision by accident, defaulting to whichever model their first prototype used. They wire it into the core of the product and only later discover that they have bet the company on a single vendor they cannot leave.
The deepest lock-in we have seen
"We can always move later, it's just a migration cost."
I heard this a lot during my time at CloudEndure, and while those migrations were costly and risky, the claim wasn't untrue. It's worth being precise about why this wave is different, because the instinct from the cloud era is dangerously wrong here.
Fundamentally, the public cloud was an easier way to run software you could already run. It removed the need to own hardware and data centers by letting you consume databases, queues, and storage as services. But the escape hatch was always there. You could move from one cloud to another, or even back to on-prem, whenever you liked. It was painful and expensive, but it was a question of cost, as the workload itself was portable.
AI breaks that. For complex, mission-critical AI workflows, you have little choice but to use frontier models such as those offered by OpenAI or Anthropic. If AI sits at the core of your SaaS product, even switching between those two is extremely hard, and self-hosting the Chinese open-source models simply doesn't give you the same functionality as top-of-the-line commercial models, which keep getting better. That is lock-in of the highest degree: once AI is at the core of your product, there is practically no choice left, which is very different from what we saw in cloud.
The market is starting to name the same fear. 94% of IT leaders now cite vendor lock-in as a material concern, and switching costs are not falling as the technology matures. They're rising. a16z's CIO survey found that the move to agentic workflows has made models harder, not easier, to swap: "all the prompts have been tuned for OpenAI… changing models is now a task that can take a lot of engineering time." The strategist Daniel Amundson puts the board-level version bluntly: "AI vendor concentration is the new single-cloud dependency." Yet most boards built explicit multi-cloud policies years ago and have no equivalent for AI.
And the "just self-host an open model" escape valve is a pipe dream. The open-weight gap has closed dramatically: it's now single-digit percentage points on everyday work and roughly 6–9 months behind the frontier, with DeepSeek V4 and Qwen 3.7 Max genuinely competitive on many tasks. But "close on benchmarks" is not "behaves identically in your product," and the operational burden of self-hosting frontier-class models is real. Keeping self-hosted models up to date with the latest versions is extremely time-consuming, and most organizations can't afford to do it, so the gap between what they run and best of breed keeps widening over time.
So the exit exists on paper and narrows exactly where you need it most: the hardest, most defining work at the core of the product.
Are models interchangeable? The debate and the synthesis
There is a genuine, current argument about this, and getting it right is the whole game.
One camp says models have commoditized. In a widely shared essay from December, Jens Eriksvik argued that "AI models may have reached their disk-drive moment. The fantasy that a single model will protect your business like a medieval moat has officially expired." His conclusion is sharp: "The real question isn't 'which LLM should we bet on,' it is 'how fast can we swap it out.' When models become commodity components, strategy shifts from ownership to orchestration." Echoing it, others argue that context, not the model, is the moat, and that the durable advantage is the architecture around the model rather than the model itself. The economics back the commoditization pressure: the cost of inference for a given level of capability has been falling roughly 10x per year, a trend a16z calls "LLMflation", with Epoch AI measuring a median ~50x annual price drop across benchmarks.
The other camp (and the hard data) says not so fast. a16z's CIO survey reaches the opposite conclusion at the layer that matters: "it's clear that the enterprise model layer has not become commoditized," and "model differentiation by use case is the main reason enterprises buy models from multiple vendors." Enterprises report Anthropic is stronger for coding and writing, OpenAI for complex question-answering, and Gemini for cost. Differences in tone, reasoning style, and edge-case reliability don't show up in a headline benchmark but absolutely show up in production.
My take is that models may be commoditized, but they still behave significantly differently from one another, so models from different vendors are not easily interchangeable. For simple use cases such as AI-assisted coding or a chatbot, it's not the end of the world if you suddenly switch to a different model and it behaves differently. But when a core function of your AI-based product, one whose prompts, harness, and logic you carefully tuned, starts behaving differently, it can be a disaster.
And even when AI isn't powering your product, switching costs can be high as your teams build more and more around a particular tool. Here it is more akin to the cost of moving to the cloud: ultimately a cost question. But individuals' and teams' growing reliance on specific tools, be it the particular mix of models the creative team uses to produce videos or a tool inside Word that the legal team depends on, means the shift isn't limited to how IT operates; it affects the entire company.
So models are hot-swappable commodities for low-stakes, well-defined tasks where any capable model clears the bar. But they are not interchangeable, especially not for the core functionality that defines your product, where behavior is the experience.
Why a swap breaks things: the model changes under you
Here is what turns "not interchangeable" from a benchmark footnote into an operational risk. Even if you never decide to switch, the vendor switches for you.
A production prompt, as one widely cited 2026 migration guide put it, is really a contract: "A set of assumptions about how a specific model version interprets language, structures output, handles ambiguity, and manages edge cases. When the model changes, the contract is voided, even if the words in the prompt haven't moved." And models change constantly:
- Silent versioning is routine, and it's an unmanaged change-control problem. Providers continuously update weights, swap quantization, and change inference engines behind a stable endpoint name. On April 25, 2025, OpenAI pushed a silent GPT-4o behavior change with no announcement, and JSON-extraction prompts started failing; the breakage was discovered only when customers felt it. For a CISO, that's the alarming part: a third party altered production behavior with no change ticket, no rollback, and no audit trail, breaking the most basic assumption of change management.
- Deprecation is relentless. Anthropic deprecated Claude 3.5 Sonnet in August 2025 with roughly two months' warning; OpenAI's GPT-5 line iterated through five distinct versions by early 2026; Google blocked new access to Gemini 2.0 Flash in March 2026. The model you validated against may simply not be available next quarter.
- The degradation is often invisible until it's expensive. One enterprise saw factual correctness fall 52% over four months with zero prompt changes.
- When the change hits a core experience, users revolt. When OpenAI replaced GPT-4o with GPT-5 in August 2025, users who had built emotionally meaningful relationships with the older model reacted as though a partner had died and been replaced by a stranger; a "keep GPT-4o alive" campaign forced a reinstatement within ~24 hours. The drama is beside the point; the lesson is that a vendor changed the model and a core experience broke, and the people who built on it had no control and no warning.
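One way to stop discovering these changes from customers is to record, per use case, the exact model snapshot you validated and the golden-set pass rate it achieved, then check both on a schedule. The sketch below assumes the provider reports which model served a request (many APIs return a model identifier, but treat that as an assumption for your vendor), and the names `ModelPin` and `check_for_drift` are hypothetical. Note that a silent weight update behind an unchanged name will not trip the identifier check, which is exactly why the pass-rate comparison is there too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPin:
    """The exact model snapshot a use case was validated against (hypothetical structure)."""
    use_case: str
    model_id: str              # a dated snapshot identifier, not a floating alias
    baseline_pass_rate: float  # golden-set pass rate recorded at validation time


def check_for_drift(pin: ModelPin, served_model_id: str, current_pass_rate: float,
                    tolerance: float = 0.05) -> list[str]:
    """Return human-readable alerts; an empty list means no change was detected."""
    alerts = []
    # Catches alias re-pointing and unannounced snapshot swaps that change the reported id.
    if served_model_id != pin.model_id:
        alerts.append(f"{pin.use_case}: served model {served_model_id!r} != pinned {pin.model_id!r}")
    # Catches silent behavior changes behind an unchanged name.
    if current_pass_rate < pin.baseline_pass_rate - tolerance:
        alerts.append(f"{pin.use_case}: pass rate {current_pass_rate:.0%} "
                      f"below baseline {pin.baseline_pass_rate:.0%}")
    return alerts


pin = ModelPin("invoice-extraction", "vendor-model-2025-01-15", 0.96)
print(check_for_drift(pin, "vendor-model-2025-01-15", 0.70))
```

The tolerance is a judgment call per use case: a core product path might alert on any drop, while a low-stakes summarizer can tolerate more noise.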
As one practitioner summary of the 2024–2026 build cycle put it: "demos work, production breaks", and the failure is rarely the model in isolation. It's the eval harness, the orchestration, and the observability nobody built because the demo didn't need them.
This is the paradox at the heart of the strategy: the lock-in is deepest precisely because the models aren't interchangeable, and their non-interchangeability is exactly why you can't naively rely on swapping them. You are locked in. You cannot safely treat the alternatives as drop-in replacements. Both are true at once and any honest multi-AI strategy has to hold both.
"Multi-AI" is not what most people think it is
When people hear "multi-AI strategy," they usually picture optionality on tap: adopt an LLM gateway like OpenRouter, LiteLLM, or Portkey and flip to whichever model is cheapest that week and can still perform the task. That capability is real and every serious AI org should have it.
But sold as freedom, it's a half-truth.
A gateway gives you a single API and a kill-switch; it does not give you the thing you actually need, which is confidence that a different model will behave correctly in your specific context.
A real multi-AI strategy is not "we can switch instantly." It is something more disciplined: we have deliberately reduced the cost and risk of our model dependency, and we route work across models on purpose with the evaluation, governance, and fallbacks to do it safely. As Eriksvik frames the upside, strategy shifts "from ownership to orchestration". But the goal isn't frictionless switching. It's that no single vendor's roadmap, price change, outage, or silent update can take your business hostage.
This reframes multi-AI from a cost play into a resilience play, which is why it belongs on the CISO's desk and in the boardroom as much as the CTO's. 41% of enterprises now deliberately run multiple agent platforms specifically to avoid concentration. It's the same logic the rest of the enterprise already applies to critical suppliers: you don't single-source a component your product can't function without.
What a real multi-AI strategy actually looks like
Five moves, in order: abstract it, evaluate it, route it, govern it, hedge it.
1. Abstract: make model choice a configuration. Put a gateway or internal abstraction layer between your application and every provider, so the model behind any call is a setting, not hardcoded plumbing. Enterprises that built abstraction layers into their first deployment report adding or switching providers with materially less migration effort than those wired directly to one vendor's API. It won't make models interchangeable, but it removes the engineering cost of switching so that only the behavioral cost remains, which is what the next move addresses. (Relatedly: a16z found that avoiding heavy fine-tuning also reduces lock-in, because prompts port across models far more easily than fine-tuned weights.)
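A minimal sketch of what "model choice as configuration" can look like in application code: call sites ask for a use case, and a config table decides which adapter answers. Everything here is illustrative. `ChatModel`, `EchoModel`, the registry, and the `provider-a/model-x` style names are hypothetical placeholders, and a real adapter would wrap a vendor SDK or a gateway endpoint rather than echo the prompt.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Anything that can turn a prompt into text; each vendor gets a thin adapter."""
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in adapter for illustration only; a real one would call a vendor API."""
    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Model choice lives in configuration, keyed by use case, never in call sites.
CONFIG: dict[str, str] = {
    "summarize": "provider-a/model-x",
    "classify": "provider-b/model-y",
}

REGISTRY: dict[str, ChatModel] = {
    "provider-a/model-x": EchoModel("provider-a/model-x"),
    "provider-b/model-y": EchoModel("provider-b/model-y"),
}


def run(use_case: str, prompt: str, config: dict[str, str] = CONFIG) -> str:
    """Resolve the configured model for this use case and call it."""
    model = REGISTRY[config[use_case]]
    return model.complete(prompt)


# Switching providers is a config change, not a code change.
print(run("summarize", "Q3 report"))
print(run("summarize", "Q3 report", {"summarize": "provider-b/model-y"}))
```

This removes the plumbing cost of a switch; whether the new model behaves acceptably is still an open question until it passes your evals.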
2. Evaluate continuously: own a regression suite for model behavior. The single highest-leverage investment is a golden test set: a battery of real, representative cases for each critical use case, with known-good outputs, run against every candidate model and every version of your current one. This is your defense against silent versioning and your prerequisite for ever swapping safely. Enterprises increasingly use external benchmarks as a Gartner-style first filter, but the buyers themselves stress that internal golden datasets and real trials are what actually decide it.
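A golden-set regression run can start very small. The sketch below scores each candidate model against the same cases and flags which ones clear a per-use-case bar. It is a simplification under stated assumptions: models are plain callables, `exact_match` stands in for whatever scoring your use case really needs (rubrics, schema checks, or a model-graded judge), and the 0.95 threshold is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Callable

Model = Callable[[str], str]


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str  # known-good output for this representative case


def exact_match(output: str, expected: str) -> bool:
    """Simplest possible matcher; real suites usually need something richer."""
    return output.strip().lower() == expected.strip().lower()


def evaluate(model: Model, cases: list[GoldenCase],
             matches: Callable[[str, str], bool] = exact_match) -> float:
    """Fraction of golden cases the model gets right under the given matcher."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(1 for c in cases if matches(model(c.prompt), c.expected))
    return passed / len(cases)


def regression_report(candidates: dict[str, Model], cases: list[GoldenCase],
                      threshold: float = 0.95) -> dict[str, tuple[float, bool]]:
    """Score every candidate (or every version of the current model) against one bar."""
    report = {}
    for name, model in candidates.items():
        score = evaluate(model, cases)
        report[name] = (score, score >= threshold)
    return report
```

Running the same report on every new version of your current model, not just on would-be replacements, is what turns it into a defense against silent versioning.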
3. Route deliberately: run a model portfolio. With abstraction and evals in place, send each kind of work to the right model. Frontier models (Claude Opus 4.8, GPT-5.5, Gemini 3 Pro) for the hardest reasoning; cheaper or open-weight models (Qwen 3.7 Max at ~$3.75/M, MiniMax 3 at ~$0.28/$0.42) for high-volume, well-defined tasks; self-hosted open weights for the most sensitive data. With inference costs falling ~10x a year, the savings from routing the low-stakes layer to commodity models compound fast. This is the barbell of AI spend expressed as a model portfolio: bet big where capability decides the outcome, and hold the line everywhere else.
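The routing rule and its cost effect can be sketched in a few lines. The tiers, model names, and per-million-token prices below are hypothetical placeholders, not quotes from any vendor, and the two-rule policy (sensitive data goes to self-hosted weights, core or complex work goes to the frontier, everything else goes to the commodity tier) is one simple assumption among many possible policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    model: str
    usd_per_m_tokens: float  # illustrative blended price, not a real quote


# Hypothetical portfolio; names and prices are placeholders.
PORTFOLIO = {
    "frontier": Tier("frontier-model", 15.00),
    "commodity": Tier("open-weight-model", 0.50),
    "private": Tier("self-hosted-model", 1.00),
}

FRONTIER_TASKS = {"complex_reasoning", "core_product"}


def route(task_kind: str, sensitive: bool) -> Tier:
    """Sensitive data stays in-house; capability-critical work gets the frontier."""
    if sensitive:
        return PORTFOLIO["private"]
    if task_kind in FRONTIER_TASKS:
        return PORTFOLIO["frontier"]
    return PORTFOLIO["commodity"]


def monthly_cost(workload: list[tuple[str, bool, float]]) -> float:
    """Workload items are (task_kind, sensitive, millions_of_tokens)."""
    return sum(route(kind, sens).usd_per_m_tokens * mtok for kind, sens, mtok in workload)


# 10M tokens of core work plus 90M tokens of routine classification.
workload = [("core_product", False, 10.0), ("classification", False, 90.0)]
print(monthly_cost(workload))  # 195.0, versus 1500.0 if all 100M went to the frontier tier
```

Under these made-up prices, routing the routine 90% of volume away from the frontier tier cuts the bill from 1,500 to 195 per month while the core path keeps the best model.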
4. Govern in-path: consistent controls across every model. The moment you run more than one model, two questions get hard and stay hard: which model handled this request? And are my safety, privacy, and quality controls applied consistently regardless of which one did? You cannot answer either from six vendor dashboards, let alone from the hundreds or thousands of models enterprises are likely to be using in the very near future. You need an in-path control layer in front of all of them that provides visibility and accountability (which model, which version, on which data) while enforcing the same guardrails, redaction, and validation everywhere. This is the philosophy behind our view of data governance: you don't make AI safe by blocking it, you make it safe by governing how it's used. Multi-model makes that layer non-optional, because it's the only place the consistency can live.
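To make "in-path" concrete, here is a minimal sketch of a control layer that every model call passes through: it redacts the input, validates the output, and writes an audit record naming the model and version, whichever vendor answered. The `ControlLayer` and `AuditRecord` names are hypothetical, the email regex is a deliberately crude stand-in for real PII detection, and this is an illustration of the pattern, not Verax's implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Crude illustrative PII pattern; production redaction needs far more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


@dataclass
class AuditRecord:
    model: str
    version: str
    redactions: int
    output_ok: bool


@dataclass
class ControlLayer:
    """Same guardrails for every model: redact input, validate output, log who answered."""
    validate: Callable[[str], bool]
    audit_log: list[AuditRecord] = field(default_factory=list)

    def call(self, model_name: str, version: str,
             model: Callable[[str], str], prompt: str) -> str:
        # The model only ever sees the redacted prompt.
        clean, n = EMAIL.subn("[REDACTED_EMAIL]", prompt)
        output = model(clean)
        ok = self.validate(output)
        # Every call is logged, including failures, so there is always an audit trail.
        self.audit_log.append(AuditRecord(model_name, version, n, ok))
        if not ok:
            raise ValueError(f"output from {model_name}@{version} failed validation")
        return output
```

Because the controls live in one place rather than in each vendor's console, swapping the model behind a use case does not change which guardrails apply.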
5. Hedge: treat models like a supply chain. Apply the procurement discipline you already use everywhere else: never single-source a critical input. Maintain a primary, a viable secondary, and an open-weight fallback for the paths your business can't lose, and keep at least two providers genuinely tested, so switching is practiced, not theoretical. Own your data, embeddings, and fine-tuning pipelines in portable formats so the IP travels with you rather than living inside one vendor's infrastructure. Run a periodic risk assessment across the models and tools actually in use, including the ones that arrived through shadow adoption, so you know your real exposure before a price hike or deprecation forces the question.
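The primary / secondary / open-weight arrangement can be sketched as an ordered fallback chain. This is a minimal illustration under an important assumption: every model in the chain has already passed the same golden set for this path, because failing over to an untested model just trades an outage for a silent quality break. `complete_with_fallback` and `AllProvidersFailed` are hypothetical names, and catching every exception is a simplification; a real implementation would distinguish retryable errors (timeouts, rate limits, deprecations) from bugs.

```python
from typing import Callable


class AllProvidersFailed(Exception):
    """Raised when every model in the chain fails for a request."""


def complete_with_fallback(prompt: str,
                           chain: list[tuple[str, Callable[[str], str]]]) -> tuple[str, str]:
    """Try primary, then secondary, then open-weight fallback; return (provider, output)."""
    errors = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # outage, rate limit, deprecated model, etc. (simplified)
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary_down(prompt: str) -> str:
    raise TimeoutError("provider outage")


chain = [
    ("primary", primary_down),
    ("secondary", lambda p: f"secondary answer to {p}"),
    ("open-weight", lambda p: f"fallback answer to {p}"),
]
print(complete_with_fallback("renew contract?", chain))
```

Returning the provider name alongside the output keeps the audit question ("which model handled this request?") answerable even when a failover happens.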
The honest limit
A multi-AI strategy reduces your dependency risk. It does not eliminate it, and anyone selling "total model freedom" is selling you a fantasy. For the core, capability-defining functionality of your product, some stickiness is irreducible. The best model is the best model, and you will want it. What a real strategy buys is leverage and survivability: you are no longer captive, you can route on purpose, you can switch the parts that are switchable, and you'll see a forced change coming instead of waking up to a broken product.
Further reading from Verax
- The Hidden Costs of Internal AI — what AI actually costs once you look below the invoice.
- Why We Believe in DLE: When blocking data stops working — governing how data and AI are used, in context and in-path, instead of bluntly blocking.
- Launching the Verax AI Risk Assessment — mapping the models and tools actually in use across the enterprise.
Sources
- Leonid (Leo) Feinberg, Verax AI — internal commentary on AI lock-in and model non-interchangeability.
- How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 — a16z (multi-model norm, 37% run 5+ models, "not commoditized," rising switching costs, benchmarks-as-filters).
- 2025: The State of Generative AI in the Enterprise — Menlo Ventures ($37B enterprise spend; provider market share).
- What Your Board Doesn't Know About AI Vendor Concentration Risk — Amundson Strategic (Substack) (94% fear lock-in; 41% multi-platform; "the new single-cloud dependency").
- The Architecture Is the Moat — Jens Eriksvik (Medium); Context Is the Moat, and LLMs Are Commodities — Pallavi Thakur (Medium).
- Your Prompts Are Technical Debt — Rajasekar Venkatesan (Medium) ("a prompt is a contract"); Why Do LLM Applications Fail in Production? — The GenAcademy (Substack).
- LLMflation — a16z; LLM inference price trends — Epoch AI.
- The Silent Versioning Problem in AI Inference — DigitalOcean; LLM drift monitoring & deprecation timeline — Galileo.
- Why GPT-4o's sudden shutdown left people grieving — MIT Technology Review; #keep4o campaign — TechRadar.
- LLM gateways: OpenRouter, LiteLLM alternatives — Eden AI, Portkey vs LiteLLM vs OpenRouter — ToolHalla.
- LLM enterprise adoption statistics — Index.dev.


