By Juan Orlandini / 30 Jun 2026 / Topics: Artificial Intelligence (AI), Modern workplace, Generative AI

A few years ago, I had a water leak. Nothing dramatic, no geyser in the front yard or flooded basement. Just a quiet leak in the pipe running from the street to my house, somewhere underground where I couldn’t see it.
I found out the way most people do: The bill showed up — $1,500. I managed to talk the city out of the sewage portion, but I still owed for the water itself. Then I paid a contractor a few thousand more to dig up the line and fix it. And because the leak straddled a billing cycle, the city hit me with another $700 the next month for water that had already drained into the dirt.
Annoying, expensive, over. Or so I thought.
A couple of years later, we discovered that all that water had quietly carved out a cavity under my driveway. Repairing that cost me more than thirty thousand dollars, more than twenty times the water bill that started the whole thing. And I’ll never get that money back.
I’ve been thinking about that leak a lot lately, because it’s a near-perfect picture of what’s happening with AI right now.
For two years, tokens were billed like an unlimited tap. Twenty dollars a month, use as much as you like. So we did. Teams built habits around it. Engineers reached for a frontier model for every little thing. Products, contracts, and roadmaps all got built on the quiet assumption that the tokens were basically free.
Then the frontier labs flipped the model. Caps, quotas, metered pricing, overage charges. And suddenly everyone’s staring at a surprise bill, the same way I stared at mine.
With my leak, the bill was the part I could see, and it turned out to be the smallest cost of the whole episode. AI is in the same spot right now. The invoice everyone is staring at is real, but it’s only the water above ground.
The real cost is underneath, and almost nobody is talking about it.
Those cheap tokens were a subsidy, and the subsidy wasn’t an accident; it was the point. The frontier labs were buying a market, and the economics didn’t need to make sense yet. Flat subscriptions, generous limits, capability that cost them far more to deliver than they charged for it. They wanted us to drink from the open tap, and we did.
That era is over. Every major lab has moved from flat-rate subscriptions to metered, token-based pricing, with weekly caps and overage charges now the default. In August 2025, Anthropic capped its Pro tier at roughly 40 to 80 hours of use a week, with overages billed at API rates. OpenAI began rate-limiting Plus and added a $200 Pro tier above it. Google capped its Advanced tier, locking heavy users out mid-session.
The spread between models is staggering. The gap from the cheapest to the most expensive frontier token today runs about 4,500x. The same task, routed two different ways, can differ in cost by orders of magnitude, and most organizations have no idea which tap they’re drinking from.
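To make that spread concrete, here is a minimal sketch of the arithmetic. The per-million-token prices are hypothetical placeholders, picked only so their ratio lands on the roughly 4,500x figure above; they are not any vendor's actual rates, and the token counts are an invented example task.

```python
def task_cost(input_tokens: int, output_tokens: int,
              price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one task, given prices in USD per million tokens."""
    total = input_tokens * price_in_per_m + output_tokens * price_out_per_m
    return total / 1_000_000


# Hypothetical prices (USD per million tokens). Not real vendor rates.
CHEAP = {"price_in_per_m": 0.02, "price_out_per_m": 0.04}
FRONTIER = {"price_in_per_m": 90.0, "price_out_per_m": 180.0}

# The same illustrative task (2,000 tokens in, 500 out), routed two ways.
cheap_cost = task_cost(2_000, 500, **CHEAP)
frontier_cost = task_cost(2_000, 500, **FRONTIER)
ratio = frontier_cost / cheap_cost

print(f"cheap: ${cheap_cost:.6f}  frontier: ${frontier_cost:.2f}  ratio: {ratio:,.0f}x")
```

A fraction of a cent versus a quarter looks harmless per call. Multiplied by millions of calls, it is the difference between a rounding error and a budget line.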
The surprise bills are real, and they’re big. When Uber handed Claude Code to 5,000 engineers in December 2025, it burned through its entire annual AI budget by April, just four months later. One healthcare enterprise ran up more than $6 million in annualized unplanned spend after consuming around a trillion tokens in six months.
This is the conversation the whole market is having, and it’s a manageable one because it stops at the meter. The bill is visible. You can see it, add caching, route to smaller models, and set quotas to bring spend back under control. It’s painful but recoverable, bounded by what you actually spent.
So pay the bill. Just don’t mistake it for the problem.
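Of the meter-level fixes mentioned above, caching is the easiest to picture. What follows is a rough sketch, not a production design: `ResponseCache` and the `call_model` callable are hypothetical names standing in for whatever client your stack actually uses, and this is an exact-match cache in your own application, which only pays off when the same prompt really does repeat.

```python
import hashlib
from typing import Callable


class ResponseCache:
    """Exact-match cache in front of a model call (illustrative only)."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        # call_model(model, prompt) -> response text; a stand-in for a real client.
        self._call_model = call_model
        self._store = {}
        self.hits = 0
        self.misses = 0

    def complete(self, model: str, prompt: str) -> str:
        # Key on both model and prompt so different models never share answers.
        key = hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._call_model(model, prompt)
        self._store[key] = result
        return result
```

The hit and miss counters matter as much as the cache itself: they tell you whether the repeat traffic you assumed actually exists.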
Here’s what the invoice doesn’t show you. For two years, we spent money on cheap tokens and, more importantly, we made decisions on them. Structural decisions. Those assumptions are now baked into four foundations underneath the organization, where the water has been running the whole time:
The bill hit all four at once. The damage shows up later, one foundation at a time.
This looks like an engineering problem, contained to the people who write the prompts. It runs much wider than that.
Developers are the biggest consumers of tokens today. But the value of AI is reaching an increasingly wide range of people, including marketers, analysts, support reps, operations, and finance, and that group is growing far faster than the engineering org. Those newer users are also often the least equipped to understand the token implications of what they’re doing. The meter is invisible to them. They may not know that one phrasing of a request can cost a fraction of another, or that the “summarize this whole folder” button just spun up an expensive job.
So the controls can’t live only in engineering. If cost-awareness stops at the dev team, the leak simply moves to where nobody is watching the pipe. These guardrails need to cover developers and everyone else reaching for the tap.
If you want to see where this gets dangerous fast, look at agents.
Agentic tasks consume orders of magnitude more tokens than a simple chat, and that’s a certainty rather than a guess. A single agentic run can burn around 1,000x the tokens of one chat prompt, and the cost can vary by as much as 30x from one run to the next, depending on the path the agent takes. Code that was cheap to run in a demo can be ruinous at production scale.
And here’s the part that might keep you up at night: We’re building agentic systems into the foundation right now, at speed, on the same “tokens are cheap” assumption that’s already proven false. We’re pouring more and more water through pipes we already suspect are leaking.
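When cost varies that much from one run to the next, one plausible guardrail is a hard per-run token ceiling. This is a minimal sketch under stated assumptions: `TokenBudget`, `BudgetExceeded`, and `run_agent` are hypothetical names, and each "step" is just a token count standing in for a real model call.

```python
from typing import Iterable


class BudgetExceeded(RuntimeError):
    """Raised when a run would go past its token ceiling."""


class TokenBudget:
    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        # Refuse the step before spending, so the ceiling is never crossed.
        if self.used + tokens > self.max_tokens:
            raise BudgetExceeded(
                f"step needs {tokens} tokens, only {self.max_tokens - self.used} left"
            )
        self.used += tokens


def run_agent(step_token_counts: Iterable[int], budget: TokenBudget) -> int:
    """Run steps until done or until the budget refuses one.

    Returns the number of completed steps. Each item is a token count
    standing in for one real model call (an assumption of this sketch).
    """
    completed = 0
    for tokens in step_token_counts:
        budget.charge(tokens)
        completed += 1
    return completed
```

A ceiling like this doesn't make an agent cheaper. It turns an unbounded run into a bounded one, which is the property a demo never needs and production always does.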
Put the two costs side by side.
The bill is visible and negotiable, painful but bounded by what you actually spent.
The foundation is the opposite, and far harder to reverse: skill gaps, misaligned pricing models, aging systems, and plans that no longer match current conditions. It doesn’t surface on this month’s invoice. It shows up cycles later, when a feature quietly stops making economic sense, and by then the money and the time are gone.
My driveway cost twenty times the water bill that revealed it. Foundational damage almost always dwarfs the invoice that points to it, which is why paying the bill and walking away is the most expensive move you can make.
The good news about my leak is that the cavity was findable. But looking sooner would have caught it while it was still damp soil instead of a $30,000 hole. The same approach holds here. Foundational damage is detectable if you ask the uncomfortable questions while a leak is still just a leak.
Five questions to find the water before it finds your foundation:
If those questions make you uncomfortable, good. That discomfort is far cheaper than the driveway.
Finding the water is the start. From there, here’s the order of operations I’d run.
Identify the leak. Instrument token consumption end to end. Most organizations genuinely don’t see where their tokens go, and you can’t manage what you can’t see. Make the flow visible before it pools somewhere expensive.
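As a rough illustration of what "instrument end to end" can mean at its simplest, here is a sketch of a usage ledger. The names (`UsageEvent`, `UsageLedger`) and the choice of team, feature, and model as dimensions are my assumptions for the example, not a standard schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    team: str
    feature: str
    model: str
    input_tokens: int
    output_tokens: int


class UsageLedger:
    """Aggregates token usage by (team, feature, model). Illustrative only."""

    def __init__(self) -> None:
        self._totals = defaultdict(int)

    def record(self, event: UsageEvent) -> None:
        key = (event.team, event.feature, event.model)
        self._totals[key] += event.input_tokens + event.output_tokens

    def top(self, n: int = 5):
        # Largest consumers first: this is where the water is pooling.
        return sorted(self._totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

The point of tagging every call with a team and a feature is that the first question after "how much?" is always "who, and for what?"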
Stop the bleed. This part is recoverable, and it works. FinOps controls, caching, and right-sized routing routinely cut bills while adding the cost back-pressure your architecture never had. The playbook I keep coming back to is roughly 80/20: Send the bulk of inference to open or self-hosted models, and reserve the frontier models for the work that genuinely needs them. A weather lookup doesn’t need your most expensive model; a complex reasoning task might. Plenty of organizations have that ratio backwards right now, and correcting it is the fastest money they’ll save.
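The routing idea above can be sketched in a few lines. Everything named here is hypothetical: the task labels, the two tier names, and the rule that only `complex_reasoning` earns the frontier tier. A real router would classify requests with far more care.

```python
from typing import Sequence

# Assumption: tasks are already labeled with a kind. Only these kinds
# are allowed to reach the expensive tier.
FRONTIER_TASKS = {"complex_reasoning"}


def route(task_kind: str) -> str:
    """Pick a tier for a task: 'frontier' or 'self_hosted'."""
    return "frontier" if task_kind in FRONTIER_TASKS else "self_hosted"


def frontier_share(task_kinds: Sequence[str]) -> float:
    """Fraction of tasks sent to the frontier tier (0.0 for no tasks)."""
    if not task_kinds:
        return 0.0
    routed = [route(kind) for kind in task_kinds]
    return routed.count("frontier") / len(routed)
```

Tracking `frontier_share` over time is a cheap way to check whether the ratio is drifting back toward the expensive tap.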
Repair the structure. This is the step most people skip, and the one that matters most. Rebuild the assumptions underneath: Re-price products against real inference cost, reset velocity expectations for a metered world, and invest in the team skills you’ll need when the tap tightens. None of it holds without a real operating layer underneath, with FinOps discipline, quota and identity controls, and governed data access. Without that layer, spend quietly routes back to unmanaged usage and the new discipline never sticks.
Lay durable foundations. Build so the next pricing shift never reaches the foundation. Past roughly a billion tokens a month, owned or hybrid infrastructure starts to beat frontier APIs outright — and owning that layer turns the next lab pricing change into a line item rather than a structural event.
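The break-even logic behind a threshold like that is simple enough to sketch. This assumes a deliberately crude cost model: owned infrastructure has a fixed monthly cost plus a per-token marginal cost, and the API has a flat per-token price. The dollar figures in the usage example are hypothetical, chosen only so the result lands on a billion tokens a month; your own numbers will move the line.

```python
from typing import Optional


def break_even_tokens_per_month(fixed_monthly_cost: float,
                                api_price_per_m: float,
                                owned_marginal_per_m: float) -> Optional[float]:
    """Monthly token volume above which owned infrastructure is cheaper.

    Prices are USD per million tokens. Returns None when owned capacity
    never wins, i.e. its marginal cost is not below the API price.
    """
    saving_per_m = api_price_per_m - owned_marginal_per_m
    if saving_per_m <= 0:
        return None
    return fixed_monthly_cost / saving_per_m * 1_000_000


# Hypothetical inputs: $9,000/month fixed, $10 vs $1 per million tokens.
threshold = break_even_tokens_per_month(9_000, 10.0, 1.0)
print(f"break-even: {threshold:,.0f} tokens/month")
```

The useful part is less the number than the habit: rerun the calculation whenever a lab changes its prices, and the change shows up as a moved threshold rather than a surprise.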
The labs changed the economics. That part is settled, with no going back. The bill on your desk is the easy half of the story: visible, finite, and recoverable.
The worst part of a water leak is never the part you can see. It’s the part you find out about later, once it’s already done its damage. So pay the bill, then go check the foundation and ask the uncomfortable questions while the leak is still just a leak.
None of this is an argument to use less AI. I’m not shutting off the water. The value running through that pipe is real, and the teams that pull back now will fall behind the ones that keep building. The shift that matters is quieter: Stop treating the tap as free, and build on ground you’ve actually inspected. Do that, and you can use AI as hard as you want, on a foundation that holds.
If you’re wondering where the water has been collecting in your own organization, that’s a conversation worth having before the driveway gives out.