Should my business self-host its own AI model?

Probably not yet, unless you are a heavy, steady user. Running your own model is a fixed monthly cost that only beats paying per use at very large volumes. For most businesses, paying per use on a paid model and watching your usage is cheaper and far less hassle.

Are open-weight models like Gemma safe to use?

Yes. When you run the model on your own server, your data never leaves it, so the model's origin never touches your information. Self-hosting is precisely what removes the data concern, rather than creating one.

What is usage-based AI pricing?

Instead of a flat monthly fee per person, you pay per token, which is best thought of as a unit of consumption, like a unit of gas or electricity. Your bill rises with how much your team uses the tool, so the cost is harder to predict than a fixed seat price.

Will I get locked into one AI provider?

Only if you build that way. Keep the instructions, data, and guardrails around the model portable, and you can change the model underneath, a cheaper one, a top-end one, or one you host yourself, without starting over.

Business Case 21 Jun 2026 6 min read

Stop asking which AI model is best

For two years the only question about AI was which model is best. That is quietly becoming the wrong question, and the answer changes what you pay each month and how much of your setup you actually control.

David Perkins · Founder, Perkins SmartOps

Quick answer

For most small businesses the answer is not to chase the best AI model or rush to run your own. It is to control two things: the bill and what you own. A flat per-seat subscription gives you a predictable monthly cost, while usage-based pricing trades that for a bill that moves with how much your team uses it, so set spend caps and watch what you actually use. Running an open model like Google's Gemma on your own server gives you a fixed cost and keeps your data in-house, but it only pays off at very heavy, steady usage, so most businesses are better off paying per use and staying aware. The deeper shift is that the model itself is becoming swappable, so the value, and the thing worth owning, is the setup you build around it: your instructions, your data, and the app your staff actually use. Keep that portable and you can change the model underneath without starting over.

The full story

For two years, the only question anyone asked about AI was which model is best. Which one writes better, reasons better, makes the nicer picture. It is quietly becoming the wrong question to fixate on, and fixating on it costs you in the two places a small business actually feels it: what you pay each month, and how much of your own setup you genuinely control.

When the bill stops being predictable

The real thing a £20-a-month seat gave you was not the AI. It was a number you could budget. Twenty pounds a head, the same every month, whatever your team did with it. Boring, and exactly what a business wants from a cost.

Usage-based pricing takes the boring away. Instead of a flat fee you pay per token. The easiest way to picture a token is like a unit of gas or electricity: a unit of consumption you are billed for, and the more the tool does, the more units it burns. You do not need to know anything more about it than that. The bill now moves with how much your people use the tool, and how much people use a tool is hard to predict.

This is not a worry on paper. Uber burned through its entire 2026 budget for AI coding tools in four months, and its chief operating officer openly questioned whether all that extra usage was producing proportionally more useful work. On 1 June 2026, GitHub moved every plan of its Copilot coding assistant onto usage-based billing, and within days users were posting screenshots of overage bills running from the hundreds into the thousands. A 2026 survey of nearly 1,200 finance and technology teams found that 73 per cent had seen their AI costs come in over budget.

The thing driving it is so-called agentic AI: tools that take many steps on their own to finish a job, rather than answering a single question. A chatbot replies once. An agent might make fifty calls to do the same task, and every call burns tokens. The flat monthly seat simply cannot keep pace with that, and the share of software sold on pure per-seat pricing is already shrinking, from around a fifth of firms to roughly one in seven in a single year.

Here is my own read on where this ends up. The cheap per-seat plan is on borrowed time. At today’s prices it is subsidised: the frontier AI companies are selling the seat for less than the usage behind it actually costs them, to win the market while it is still being carved up. That does not hold. As the subsidy comes off, I expect the big providers to move everyone onto usage-based pricing, where you pay for what you consume. The predictable seat will not disappear because people dislike it. It will disappear because the sums underneath it never added up. While you still have one, the nearer-term problem is usually not its price but getting real value from the seats you already pay for.

What running your own AI buys you

So if the meter is the worry, the obvious thought is: can I just run the thing myself for a flat cost? You can, and this is where open-weight models come in.

Most of the AI you know sits on someone else’s computer, and you rent access to it. An open-weight model is one you are allowed to download and run on your own machine instead. Google publishes a capable one called Gemma. There are others, including models named Kimi and DeepSeek. Some are built by Chinese labs, which understandably gives people pause, but here is the part that matters: once you run the model on your own server, its origin never touches your data. Your information does not leave your building. Self-hosting is the very thing that puts that worry to bed.

The shape of the cost is different, and for some it is better. Renting a server to run a mid-sized open model is a set monthly figure, the kind you can put in a budget and forget. A smaller model runs on modest hardware for around £40 a month, with no per-token meter and no usage limits at all.

It also answers the question every business handling client information should be asking of any AI tool: where does my data actually go, and who could be made to hand it over? When the model runs on infrastructure you control, the answer is simple, because the data never leaves it. That is the same instinct already pushing businesses to bring their automation in-house.

You can also tailor an open model to your business by training it further on your own examples. That is an option, not a requirement, and it is worth saying plainly because it gets oversold. For the everyday jobs, drafting an email, summarising a long thread, tidying up a document, a good open model out of the box, given clear instructions, is already enough. Most businesses will never need the tailoring step.

Self-hosting is for heavy users

Here is the honest part, the bit the breathless “cancel all your subscriptions” posts skip. Self-hosting is not free, and the sums only work at scale.

The model is free. The infrastructure is not. You are renting a graphics computer that costs the same whether your team hammers it all day or never logs in. Pay per token and you pay for exactly what you use, and nothing while you are idle. Run your own and you pay for the capacity around the clock, used or not.

The crossover, the point where running your own model beats paying to use someone else’s, only arrives at high, sustained volumes, well beyond what most small businesses get through. Where exactly that line sits is hard to pin down, and it shifts every time a provider changes its prices, so be wary of anyone who quotes you a precise figure. It is murkier than it looks, too, because the cheap plans are capped: lean on one hard and you hit its usage limit, then you are throttled or pushed onto metered pricing anyway. The honest signal is not a number. If your team is on AI heavily, every day, and keeps hitting the ceiling of its plan, you are the kind of user for whom running your own starts to make sense.

My own view on how far this goes: most small businesses will not self-host their AI, and they should not feel behind for it. Two reasons. First, most businesses need automation, not AI, and automation already runs perfectly well on your own server. Keep those two questions apart. Second, a small business can watch its AI usage closely, who is using it, for what, and at what cost, and as long as you have that visibility, paying per use keeps you in control without the headache of running models yourself. Self-hosting earns its place when you become a heavy, steady user, and most never will.

I will say this plainly, because it is the whole point: I have not self-hosted a model myself yet. The reason is exactly the argument above. My own usage does not justify the fixed cost of a server sitting there to run one, so I pay per use and keep an eye on it. It is on the list for when the maths changes. If you are in the same position, that is not a gap in your setup. It is the right call.

The model is becoming the easy part

Step back, and a bigger shift sits underneath all of this. For two years the model was the headline, and every new release was an event. That era is ending, because the models are converging. The gap between the best one and the merely good has narrowed to the point where, for most business tasks, several of them would do the job just as well.

When the model becomes interchangeable, it stops being where the value lives. Picture the model as an engine. A brilliant engine on a workbench does nothing. What you actually drive is the car built around it: the wheel, the pedals, the dashboard. AI is the same. The value is in everything wrapped around the model: the instructions that tell it how your business works, the memory of your past decisions, the links to your data, and the app your staff actually open and use.

The model is the engine. What makes it useful to your business is the car you build around it, and that should be yours to move.

And that wrapper should be portable. Built well, you can swap the model underneath, a cheaper one for routine work, a top-end one for the hard reasoning, an open one you host for the sensitive jobs, without rebuilding everything sitting on top. That is what people mean by being model-agnostic, and it is going from a nice idea to the sensible default. The early signs are already here, in tools that build the app and let you point it at whichever model you like. It is the same logic behind working out which AI tools are worth paying for and which you could rebuild yourself in an afternoon.

This is our whole point, said simply: own the thing that runs your business. If the lasting value is in the setup around the model, that is the part to own, rather than rent from whoever happens to hold this quarter’s best model.

What a sensible business does now

You do not need to pick a side in the self-hosting argument to act on any of this. You need three habits.

If you only do one thing

Put a spend cap on every metered AI tool before you hand it to your team. It is the single control that turns a runaway bill into a known one, and most providers now offer it.

Three habits that keep you in control, whichever way the market moves.

Watch the spend

Know what you spend on AI, and on what. You cannot control a bill you cannot see, and every runaway story starts with no cap and nobody watching.

Match task to model

Routine drafting and summarising can run on a cheap or open model. Save the expensive, top-end model for the genuinely hard reasoning. Paying frontier prices for everything is where budgets quietly vanish.

Keep it portable

Own the instructions, the data, and the guardrails around the model, so you can change the model itself without starting again. That is what stops you being locked to one provider.

Do those three, and the question of which model is best, or whether to host your own, stops being something to agonise over. You can change your mind later at little cost, which is exactly the position you want to be in while everything is still moving.

This is the kind of decision I help businesses think through. Not which model is the cleverest this month, but how to use AI without losing control of the bill or getting tied to one supplier: what belongs on a metered connection, what a fixed-cost open model is genuinely good for, and when self-hosting is worth it and when it plainly is not. The aim is to get you the value while keeping the cost predictable and the control in your own hands.

The takeaways

A flat per-seat subscription gives you a predictable cost. Usage-based pricing trades that for a bill that moves with use, so set spend caps and watch what you actually spend.
Running an open model like Gemma on your own server gives a fixed cost and keeps data in-house, but it only pays off at very heavy, steady use. For most, paying per use is cheaper.
Most small businesses need automation, not AI. Automation already self-hosts. Keep the two questions separate.
The model is becoming swappable. The lasting value is the setup around it, so own that and keep it portable across whichever model you choose.
You do not have to pick the perfect model. Stay aware, match the task to the model, and keep your setup portable, and you can change your mind cheaply later.

How this was written

Drafted by Otto, the Perkins SmartOps AI assistant. Reviewed, edited and published by David Perkins, the human.

Curious how this could work for your business?

Take the 2-minute assessment, or send me an email. I'll come back with something useful, not a sales pitch.

2-minute assessment

Stop asking which AI model is best

When the bill stops being predictable

What running your own AI buys you

Self-hosting is for heavy users

The model is becoming the easy part

What a sensible business does now

Curious how this could work for your business?

Claude Co-work Changed How I Build

What Happens When Your Competitor Uses AI

Automation vs Hiring: What's Cheaper for a Growing UK Business?