The Jagged AI Problem

Your team switched to a better LLM. Everything got worse. Here's why better models don't guarantee better results, and what actually separates winning teams from the rest.

Dellon S.

May 13, 2026 · 8 min read

The Upgrade Trap

Your team switched from GPT-4 to Claude Opus. Everything should be better, right? Wrong.

Yesterday's content calendar came out 30% worse. Your copywriter noticed your email subject lines stopped converting. The ad copy got too verbose. Customer support ticket resolution tanked.

This is the jagged AI problem: upgrading your model doesn't guarantee better results. Sometimes it guarantees worse ones. The reason isn't that the new model is bad. It's that you're pushing every task, simple or complex, through a single, undifferentiated tool. Better reasoning doesn't fix bad architecture.

10x: Response latency increase
92%: Cost savings via routing
8%: CTR drop from overthinking

The Competence Cliff

Reasoning models like OpenAI's o1 and newer Claude variants can solve harder problems than their predecessors. They think longer, check their work, and catch edge cases you'd miss.

They can also take 10x longer to respond (bad for real-time tasks), cost 5-10x more per request, and overthink simple tasks. Your customer doesn't want the model's reasoning process. They want a subject line that converts.

The best model for the job isn't the newest or most capable. It's the right one for the specific task.

The Routing Problem Nobody Talks About

Here's what separates good AI implementations from bad ones: routing.

✕ Bad: "We have Claude, so Claude does everything."

✓ Good: "We use Claude for strategy, GPT-4 for speed, and route each request accordingly."

A fintech company discovered it was spending $8,000/month on o1 to write database queries. By routing those queries to smaller models, it cut costs by 92% and got faster responses. Many marketing teams make the same mistake: they send subject lines through their most expensive model.
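Here is a minimal sketch of what routing can look like in code, assuming each request already carries a task label. The task labels, the model names ("reasoning-model", "fast-cheap-model"), and the latency budgets are placeholders invented for illustration, not real API identifiers or recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str            # placeholder name, not a real API model identifier
    max_latency_s: float  # hypothetical latency budget for this kind of task


# Hypothetical routing table: one entry per task type your team actually runs.
ROUTES = {
    "strategy": Route(model="reasoning-model", max_latency_s=60.0),
    "subject_line": Route(model="fast-cheap-model", max_latency_s=2.0),
    "ad_copy": Route(model="fast-cheap-model", max_latency_s=2.0),
    "support_reply": Route(model="fast-cheap-model", max_latency_s=5.0),
}

# Unlabeled or unknown tasks fall back to the cheap tier by default.
DEFAULT_ROUTE = Route(model="fast-cheap-model", max_latency_s=5.0)


def pick_route(task_type: str) -> Route:
    """Return the route for a task type, falling back to the cheap default."""
    return ROUTES.get(task_type, DEFAULT_ROUTE)


print(pick_route("strategy").model)
print(pick_route("subject_line").model)
```

The lookup table keeps the routing decision in one reviewable place instead of scattering it across prompts and scripts.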

What Breaks When You Don't Route Right

Speed penalty: Reasoning models are slow. An email automation that waits 40 seconds for a response loses time-sensitive conversions.
Cost penalty: Paying $0.15 per 1K tokens for a task that a $0.001-per-1K-token model can handle hollows out your marketing budget.
Quality penalty: Overthinking waters down punchy copy. A subject line that takes 60 seconds to generate isn't a feature; it's overhead.
Consistency penalty: Bounce between models and your voice drifts. Customers notice.
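To make the cost penalty concrete, here is a quick back-of-the-envelope calculation using only figures quoted in this article: the two token prices from the cost penalty and the fintech example's $8,000/month bill with a 92% cut. The function names are made up for this sketch.

```python
def cost_ratio(expensive_price: float, cheap_price: float) -> float:
    """How many times more the expensive model costs (same unit for both prices)."""
    return expensive_price / cheap_price


def spend_after_cut(monthly_spend: float, cut_fraction: float) -> float:
    """Monthly spend remaining after cutting costs by cut_fraction (0..1)."""
    return monthly_spend * (1.0 - cut_fraction)


# Figures taken from the article's own examples.
ratio = cost_ratio(0.15, 0.001)
remaining = spend_after_cut(8000.0, 0.92)

print(f"Price ratio: {ratio:.0f}x")
print(f"Monthly spend after routing: ${remaining:,.0f}")
```

That works out to a roughly 150x price gap, and a monthly bill of about $640 instead of $8,000.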

The Real Problem

The fundamental mistake is thinking models are fungible. They're not.

A SaaS company tested Claude Opus for product copy (worked great) and ad copy (CTR dropped 8%). The same model that improved one task made another worse. They reverted ad copy to GPT-4. Cost per conversion went down. Quality went up.

Most teams don't measure this. They just say "Claude is better" and move on. Then they wonder why performance tanked.

How to Fix It

Audit current model usage. Where is Claude going? Where is GPT-4? Where are open-source models? Most teams have no idea.
Measure outcome quality by task. Email response rate, conversion rate, time-to-completion. Don't assume. Measure.
Map cost to outcome. If Claude costs 50x more but improves performance by only 2%, that's a bad tradeoff.
Route intentionally. Strategic tasks to reasoning models. Tactical tasks to fast, cheap models. Real-time to local models.
Version by use case. Your marketing team benefits from different models than customer support. Lock it in per workflow.
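The "measure" and "map cost to outcome" steps above can be sketched as a per-task comparison between your current model and a candidate. Everything here is an assumption for illustration: the sample numbers are invented to mirror the 50x cost, 2% improvement scenario, and the threshold of 5 cost-multiples per percentage point of lift is an arbitrary cutoff you would set yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    task: str
    model: str               # placeholder model name
    cost_per_request: float  # dollars
    conversion_rate: float   # fraction between 0 and 1


def compare(baseline: TaskResult, candidate: TaskResult) -> tuple[float, float]:
    """Return (cost multiple, relative lift in percent) of candidate vs. baseline."""
    cost_multiple = candidate.cost_per_request / baseline.cost_per_request
    lift_pct = 100.0 * (
        candidate.conversion_rate - baseline.conversion_rate
    ) / baseline.conversion_rate
    return cost_multiple, lift_pct


def worth_it(cost_multiple: float, lift_pct: float,
             max_multiple_per_point: float = 5.0) -> bool:
    """Hypothetical rule: accept only if each point of lift is cheap enough.

    A candidate with no lift is rejected outright; a cheaper candidate with
    equal quality would need its own rule.
    """
    if lift_pct <= 0:
        return False
    return cost_multiple / lift_pct <= max_multiple_per_point


# Invented measurements for one workflow (ad copy), not real benchmark data.
baseline = TaskResult("ad_copy", "cheap-model", 0.001, 0.100)
candidate = TaskResult("ad_copy", "expensive-model", 0.050, 0.102)

cost_multiple, lift_pct = compare(baseline, candidate)
keep = worth_it(cost_multiple, lift_pct)
print(f"{cost_multiple:.0f}x cost for {lift_pct:.1f}% lift -> switch: {keep}")
```

Run one comparison per workflow, not one for the whole stack. The same candidate can pass for product copy and fail for ad copy.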

The Bottom Line

Your team probably upgraded to a better model and didn't measure whether it actually improved anything. You just assumed it would. It probably made some things better and others worse. You noticed the worse parts.

The teams winning right now aren't upgrading to bigger models. They're building smarter routing. They're using the right tool for the job instead of trying to hammer every nail with their shiniest hammer.
