Why Multi-AI Is Harder Than It Looks

AI vendor lock-in is the deepest in the history of enterprise software. Here's why a deliberate multi-model AI strategy is the platform decision of the decade, why "just use a gateway" is not the answer people think it is, and why it's harder than it looks to implement.

A platform decision hiding inside a default API key

About 15 years ago, I co-founded a company called CloudEndure, which we later sold to AWS. It was a critical time in cloud: many companies that had mostly ignored it, or had tried to build their own in-house, were realizing that there was a better way.

Indeed, every technology wave has a moment where a handful of architectural choices quietly decide who wins and who gets left behind. The companies that treated the public cloud as a strategy became Amazon, Netflix, Airbnb. The ones that treated mobile as a feature instead of a platform were plowed over by companies that didn't. The decision looked technical but was actually existential.

The choice of how your company depends on AI models is that decision, happening now, and on a faster clock than either prior wave. Enterprise AI adoption has gone from under 5% in 2023 to over 80% by 2026, and enterprise spend on generative AI hit $37B in 2025, more than tripling year over year. Running more than one model is already the norm: in a16z's survey of 100 enterprise CIOs, 37% now run five or more models in production, up from 29% a year earlier.

And yet most organizations are making the dependency decision by accident, defaulting to whichever model their first prototype used. They wire it into the core of the product and only later discover that they have bet the company on a single vendor they cannot leave.

The deepest lock-in we have seen

"We can always move later, it's just a migration cost." 

I heard this a lot back in my time at CloudEndure, and while moving was perhaps costly and risky, the claim wasn't untrue. It's worth being precise about why this wave is different, because the instinct from the cloud era is dangerously wrong here.

Fundamentally, the public cloud was an easier way to run software you could already run. It removed the need to own hardware and data centers by letting you consume databases, queues, and storage as services. But the escape hatch was always there. You could move from one cloud to another, or even back to on-prem, whenever you liked. It was painful and expensive, but it was a question of cost, as the workload itself was portable.

AI breaks that. With complex, mission-critical AI workflows, you have little choice but to use frontier models such as those offered by OpenAI or Anthropic. If AI sits at the core of your SaaS product, even switching between those two is extremely hard, and self-hosting the Chinese open-source models simply does not give you the same functionality as the top-of-the-line commercial models, which keep getting better. So this is lock-in of the highest degree. Once you've incorporated AI at the core of your product, there is very little choice left, which is very different from what we saw in cloud.

The market is starting to name the same fear. 94% of IT leaders now cite vendor lock-in as a material concern, and switching costs are not falling as the technology matures. They're rising. a16z's CIO survey found that the move to agentic workflows has made models harder, not easier, to swap: "all the prompts have been tuned for OpenAI… changing models is now a task that can take a lot of engineering time." The strategist Daniel Amundson puts the board-level version bluntly: "AI vendor concentration is the new single-cloud dependency", except that most boards built explicit multi-cloud policies years ago and have no equivalent for AI.

And the "just self-host an open model" escape valve is a pipe dream. The open-weight gap has closed dramatically: it's now single-digit percentage points on everyday work and roughly 6–9 months behind the frontier, with DeepSeek V4 and Qwen 3.7 Max genuinely competitive on many tasks. But "behind on benchmarks" is not "behaves identically in your product," and the operational burden of self-hosting frontier-class models is real. Keeping self-hosted models up to date with the latest versions is extremely time-consuming, and most organizations can't afford to do it, so the gap between what they run and best of breed widens as time passes.

So the exit exists on paper and narrows exactly where you need it most: the hardest, most defining work at the core of the product.

Are models interchangeable? The debate and the synthesis

There is a genuine, current argument about this, and getting it right is the whole game.

One camp says models have commoditized. In a widely shared essay from December, Jens Eriksvik argued that "AI models may have reached their disk-drive moment. The fantasy that a single model will protect your business like a medieval moat has officially expired." His conclusion is sharp: "The real question isn't 'which LLM should we bet on,' it is 'how fast can we swap it out.' When models become commodity components, strategy shifts from ownership to orchestration." Echoing it, others argue that context, not the model, is the moat, and that the durable advantage is the architecture around the model, not the model itself. The economics back the commoditization pressure: the price of inference for a given level of capability has been falling roughly 10x per year, in what has been called "LLMflation", with Epoch AI measuring a median ~50x annual price drop across benchmarks.

The other camp (and the hard data) says not so fast. a16z's CIO survey reaches the opposite conclusion at the layer that matters: "it's clear that the enterprise model layer has not become commoditized," and "model differentiation by use case is the main reason enterprises buy models from multiple vendors." Enterprises report Anthropic is stronger for coding and writing, OpenAI for complex question-answering, and Gemini for cost. Differences in tone, reasoning style, and edge-case reliability don't show up in a headline benchmark but absolutely show up in production.

My take is that models may be commoditized, but they still behave significantly differently from each other. Models from different vendors are not easily interchangeable. For simple use cases such as AI-assisted coding or a chatbot, it's not the end of the world if you suddenly switch to a different model and it behaves differently. But when core functionality of your AI-based product, for which you carefully tuned the prompts, the harness, and the logic, starts behaving differently, it can be a disaster.

And even when it’s not powering your product, the cost of AI switching can be quite high as your teams build more and more around a particular tool. In this case, it seems more akin to the cost of moving to the cloud, where it’s ultimately a cost question. But an individual or team’s growing reliance on specific tools, be it a particular mix of models that the creative team uses to produce videos or a tool that lives within Word for the legal team, will mean this shift isn’t limited to how the IT team operates but will affect the entire company. 

So models are hot-swappable commodities for the low-stakes, well-defined tasks where any capable model clears the bar. But inference platforms are not interchangeable, especially not for the core functionality that defines your product, where behavior is the experience. 

Why a swap breaks things: the model changes under you

Here is what turns "not interchangeable" from a benchmark footnote into an operational risk. Even if you never decide to switch, the vendor switches for you.

A production prompt, as one widely cited 2026 migration guide put it, is really a contract: "A set of assumptions about how a specific model version interprets language, structures output, handles ambiguity, and manages edge cases. When the model changes, the contract is voided, even if the words in the prompt haven't moved." And models change constantly.

As one practitioner summary of the 2024–2026 build cycle put it: "demos work, production breaks", and the failure is rarely the model in isolation. It's the eval harness, the orchestration, and the observability nobody built because the demo didn't need them.

This is the paradox at the heart of the strategy: the lock-in is deepest precisely because the models aren't interchangeable, and their non-interchangeability is exactly why you can't naively rely on swapping them. You are locked in. You cannot safely treat the alternatives as drop-in replacements. Both are true at once and any honest multi-AI strategy has to hold both.

"Multi-AI" is not what most people think it is

When people hear "multi-AI strategy," they usually picture optionality on tap. Start using an LLM gateway like OpenRouter, LiteLLM, or Portkey and flip to whichever model is cheapest that week and can still perform the task. That capability is real and every serious AI org should have it.

But sold as freedom, it's a half-truth. 

A gateway gives you a single API and a kill-switch; it does not give you the thing you actually need, which is confidence that a different model will behave correctly in your specific context.

A real multi-AI strategy is not "we can switch instantly." It is something more disciplined: we have deliberately reduced the cost and risk of our model dependency, and we route work across models on purpose with the evaluation, governance, and fallbacks to do it safely. As Eriksvik frames the upside, strategy shifts "from ownership to orchestration". But the goal isn't frictionless switching. It's that no single vendor's roadmap, price change, outage, or silent update can take your business hostage.
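
To make the gap between a kill-switch and a safe fallback concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the model functions are fakes standing in for whatever gateway client you use, and `is_valid` and `call_with_fallback` are hypothetical names, not part of any real gateway's API. The point is that each fallback is accepted only if its output passes a check specific to the use case, and that the caller learns which model actually answered.

```python
import json
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a gateway client: each model is a function that
# takes a prompt and returns raw text. Real providers would sit behind it.
ModelFn = Callable[[str], str]


def is_valid(raw: str) -> bool:
    """Use-case-specific check: output must be a JSON object with a known 'risk' value."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("risk") in ("low", "medium", "high")


def call_with_fallback(
    prompt: str, chain: List[Tuple[str, ModelFn]]
) -> Optional[Tuple[str, str]]:
    """Try each model in order and accept the first output that passes validation.

    Returns (model_name, raw_output) so the caller can record which model
    answered, or None if no model in the chain produced a valid output.
    """
    for name, fn in chain:
        try:
            raw = fn(prompt)
        except Exception:
            continue  # provider error or outage: move to the next model
        if is_valid(raw):
            return name, raw
        # Reachable but behaviorally wrong for this use case: also move on.
    return None


# Fake models, for illustration only.
def primary(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def secondary(prompt: str) -> str:
    return "The risk here looks low."  # fluent, but not the format the product expects


def tertiary(prompt: str) -> str:
    return '{"risk": "low"}'


result = call_with_fallback(
    "Classify the risk of this clause.",
    [("primary", primary), ("secondary", secondary), ("tertiary", tertiary)],
)
```

In this toy run the first model is down and the second is reachable but breaks the output contract, so a plain kill-switch would have "succeeded" with the second model and shipped a broken response. A real validator would be far richer than a JSON shape check, but the structure is the same.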

This reframes multi-AI from a cost play into a resilience play, which is why it belongs on the CISO's desk and in the boardroom as much as the CTO's. 41% of enterprises now deliberately run multiple agent platforms specifically to avoid concentration. It's the same logic the rest of the enterprise already applies to critical suppliers: you don't single-source a component your product can't function without.

What a real multi-AI strategy actually looks like

Five moves, in order. Abstract it, evaluate it, route it, govern it, hedge it.

1. Abstract: make model choice a configuration. Put a gateway or internal abstraction layer between your application and every provider, so the model behind any call is a setting, not hardcoded plumbing. Enterprises that built abstraction layers into their first deployment report adding or switching providers with materially less migration effort than those wired directly to one vendor's API. It won't make models interchangeable, but it removes the engineering cost of switching so that only the behavioral cost remains, which is what the next move addresses. (Relatedly: a16z found that avoiding heavy fine-tuning also reduces lock-in, because prompts port across models far more easily than fine-tuned weights.)

2. Evaluate continuously: own a regression suite for model behavior. The single highest-leverage investment is a golden test set: a battery of real, representative cases for each critical use case, with known-good outputs, run against every candidate model and every version of your current one. This is your defense against silent versioning and your prerequisite for ever swapping safely. Enterprises increasingly use external benchmarks as a Gartner-style first filter, but the buyers themselves stress that internal golden datasets and real trials are what actually decide it. 

3. Route deliberately: run a model portfolio. With abstraction and evals in place, send each kind of work to the right model. Frontier models (Claude Opus 4.8, GPT-5.5, Gemini 3 Pro) for the hardest reasoning; cheaper or open-weight models (Qwen 3.7 Max at ~$3.75/M, MiniMax 3 at ~$0.28/$0.42) for high-volume, well-defined tasks; self-hosted open weights for the most sensitive data. With inference costs falling ~10x a year, the savings from routing the low-stakes layer to commodity models compound fast. This is the barbell of AI spend expressed as a model portfolio. Bet big where capability decides the outcome, hold the line everywhere else.

4. Govern in-path: consistent controls across every model. The moment you run more than one model, two questions get hard and stay hard. Which model handled this request? And are my safety, privacy, and quality controls applied consistently regardless of which one did? You cannot answer either from six vendor dashboards, let alone the hundreds or thousands enterprises are likely to be using in the very near future. You need an in-path control layer in front of all of them that provides visibility and accountability (which model, which version, on which data) while also enforcing the same guardrails, redaction, and validation everywhere. This is the philosophy behind our view of data governance. You don't make AI safe by blocking it, you make it safe by governing how it's used. Multi-model makes that layer non-optional, because it's the only place the consistency can live.

5. Hedge: treat models like a supply chain. Apply the procurement discipline you already use everywhere else: never single-source a critical input. Maintain a primary, a viable secondary, and an open-weight fallback for the paths your business can't lose, and keep at least two providers genuinely tested, so switching is practiced, not theoretical. Own your data, embeddings, and fine-tuning pipelines in portable formats so the IP travels with you rather than living inside one vendor's infrastructure. Run a periodic risk assessment across the models and tools actually in use, including the ones that arrived through shadow adoption, so you know your real exposure before a price hike or deprecation forces the question.
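
The first three moves can be sketched together in a few lines of Python. This is a simplified illustration under stated assumptions, not a reference implementation: the model names, the `GOLDEN` and `ROUTES` tables, and the exact-match scoring are all hypothetical, and the model functions are fakes standing in for real provider calls. The idea is that model choice lives in configuration, and a model only stays on a route if it clears a pass-rate threshold on that task's golden set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

ModelFn = Callable[[str], str]  # hypothetical: prompt in, raw text out


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected: str  # known-good output for this use case


def pass_rate(model: ModelFn, cases: List[GoldenCase]) -> float:
    """Fraction of golden cases where the normalized output matches exactly.

    Exact match is a deliberate simplification; real suites usually need
    task-specific scoring.
    """
    if not cases:
        raise ValueError("golden set is empty")
    hits = sum(
        1
        for case in cases
        if model(case.prompt).strip().lower() == case.expected.strip().lower()
    )
    return hits / len(cases)


def approved_models(
    preference: List[str],
    models: Dict[str, ModelFn],
    cases: List[GoldenCase],
    threshold: float,
) -> List[str]:
    """Keep the configured preference order, dropping models below the threshold."""
    return [name for name in preference if pass_rate(models[name], cases) >= threshold]


# Illustrative configuration: the golden set and route order per task.
GOLDEN = {
    "ticket_tagging": [
        GoldenCase("Refund not received", "billing"),
        GoldenCase("App crashes on login", "bug"),
    ],
}
ROUTES = {"ticket_tagging": ["cheap-model", "frontier-model"]}  # cheapest first


# Fake models, for illustration only.
def cheap_model(prompt: str) -> str:
    return "billing"  # right on one golden case, wrong on the other


def frontier_model(prompt: str) -> str:
    return "bug" if "crash" in prompt.lower() else "billing"


MODELS = {"cheap-model": cheap_model, "frontier-model": frontier_model}

approved = {
    task: approved_models(preference, MODELS, GOLDEN[task], threshold=0.9)
    for task, preference in ROUTES.items()
}
```

Rerunning the same check whenever a vendor ships a new version, or whenever you add a candidate to a route, is what turns the golden set into a regression suite rather than a one-off bake-off.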

The honest limit

A multi-AI strategy reduces your dependency risk. It does not eliminate it, and anyone selling "total model freedom" is selling you a fantasy. For the core, capability-defining functionality of your product, some stickiness is irreducible. The best model is the best model, and you will want it. What a real strategy buys is leverage and survivability. You are no longer captive, you can route on purpose, you can switch the parts that are switchable, and you'll see a forced change coming instead of waking up to a broken product.
