The Jagged AI Problem
Your team switched to a better LLM. Everything got worse. Here's why better models don't guarantee better results, and what actually separates winning teams from the rest.
The Upgrade Trap
Your team switched from GPT-4 to Claude Opus. Everything should be better, right? Wrong.
Yesterday's content calendar came out 30% worse. Your copywriter noticed your email subject lines stopped converting. The ad copy got too verbose. Customer support ticket resolution tanked.
This is the jagged AI problem: upgrading your model doesn't guarantee better results. Sometimes it guarantees worse ones. The reason isn't that the new model is bad. It's that you're trying to solve a complex task with a single, undifferentiated tool. Better reasoning doesn't fix bad architecture.
The Competence Cliff
Reasoning models like OpenAI's o1 and newer Claude variants can solve harder problems than their predecessors. They think longer, check their work, and catch edge cases you'd miss.
They can also take 10x longer to respond (bad for real-time tasks), cost 5-10x more per request, and overthink simple tasks. Your customer doesn't want to see the model's reasoning process. They want a subject line that converts.
The best model for the job isn't the newest or most capable. It's the right one for the specific task.
The Routing Problem Nobody Talks About
Here's what separates good AI implementations from bad ones: routing.
A fintech company discovered it was spending $8,000/month on o1 to write database queries. By routing those queries to smaller models, it cut costs by 92% and got faster responses. Marketing teams make the same mistake when they send subject lines through their most expensive model.
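To put that figure in concrete terms, here is a minimal sketch of the arithmetic. It uses only the two numbers quoted above; the function name is made up for illustration.

```python
def monthly_cost_after_savings(monthly_cost: float, savings_pct: float) -> float:
    """Return the monthly spend that remains after a percentage cut."""
    return monthly_cost * (1 - savings_pct / 100)


# Figures from the fintech example: $8,000/month, cut by 92%.
remaining = monthly_cost_after_savings(8000, 92)
print(f"Remaining spend: ${remaining:,.0f}/month")
print(f"Saved: ${8000 - remaining:,.0f}/month")
```

In other words, a 92% cut leaves roughly $640 of the original $8,000 monthly bill.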
What Breaks When You Don't Route Right
- Speed penalty: Reasoning models are slow. Email automation waiting 40 seconds loses time-sensitive conversions.
- Cost penalty: Paying $0.15/token for a task that costs $0.001/token hollows out your marketing budget.
- Quality penalty: Overthinking hurts short-form copy. A subject line that takes 60 seconds to generate isn't a feature; it's overhead.
- Consistency penalty: Bounce between models and your voice drifts. Customers notice.
The Real Problem
The fundamental mistake is thinking models are fungible. They're not.
A SaaS company tested Claude Opus for product copy (worked great) and ad copy (CTR dropped 8%). The same model that improved one task made another worse. They reverted ad copy to GPT-4. Cost per conversion went down. Quality went up.
Most teams don't measure this. They just say "Claude is better" and move on. Then they wonder why performance tanked.
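Measuring this doesn't have to be elaborate. The sketch below assumes you can export rows of (task, model, impressions, clicks) from your analytics; the row layout, the labels model_a and model_b, and every number in it are placeholders for illustration, not real results.

```python
from collections import defaultdict

# Hypothetical export rows: (task, model, impressions, clicks).
# All numbers are placeholders for illustration only.
ROWS = [
    ("ad_copy", "model_a", 10_000, 250),
    ("ad_copy", "model_b", 10_000, 230),
    ("product_copy", "model_a", 5_000, 100),
    ("product_copy", "model_b", 5_000, 120),
]


def ctr_by_task_and_model(rows):
    """Aggregate click-through rate for every (task, model) pair."""
    totals = defaultdict(lambda: [0, 0])  # (task, model) -> [impressions, clicks]
    for task, model, impressions, clicks in rows:
        totals[(task, model)][0] += impressions
        totals[(task, model)][1] += clicks
    return {
        key: clicks / impressions
        for key, (impressions, clicks) in totals.items()
        if impressions
    }


def relative_change(old: float, new: float) -> float:
    """Relative change from old to new as a fraction (negative means worse)."""
    return (new - old) / old


rates = ctr_by_task_and_model(ROWS)
for task in sorted({task for task, _ in rates}):
    change = relative_change(rates[(task, "model_a")], rates[(task, "model_b")])
    print(f"{task}: {change:+.1%} CTR when switching from model_a to model_b")
```

The point of splitting by task is that a single blended number would hide exactly the pattern described here: one task improving while another gets worse.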
How to Fix It
- Audit current model usage. Where is Claude going? Where is GPT-4? Where are open-source models? Most teams have no idea.
- Measure outcome quality by task. Email response rate, conversion rate, time-to-completion. Don't assume. Measure.
- Map cost to outcome. If Claude costs 50x more but only improves performance 2%, that's a bad tradeoff.
- Route intentionally. Strategic tasks to reasoning models. Tactical tasks to fast, cheap models. Real-time to local models.
- Version by use case. Your marketing team benefits from different models than customer support. Lock it in per workflow.
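The steps above can be sketched as a small routing table built from your own measurements. This is one possible approach, not a definitive implementation: the model labels, the quality and cost numbers, and the 2% tolerance are all assumptions chosen to mirror the "50x more for a 2% gain" tradeoff described in the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    model: str            # hypothetical label, not a real product name
    quality: float        # measured outcome for the task, e.g. conversion rate
    cost_per_call: float  # average dollars per request


def pick_model(measurements, tolerance=0.02):
    """Pick the cheapest model whose quality is within `tolerance`
    (relative) of the best measured quality for the task."""
    best = max(m.quality for m in measurements)
    good_enough = [m for m in measurements if m.quality >= best * (1 - tolerance)]
    return min(good_enough, key=lambda m: m.cost_per_call).model


def build_routes(measured_by_task, tolerance=0.02):
    """Lock in one model per task, based on the measurements."""
    return {task: pick_model(ms, tolerance) for task, ms in measured_by_task.items()}


# Placeholder measurements for illustration only.
MEASURED = {
    "subject_lines": [
        Measurement("large_reasoning", quality=0.051, cost_per_call=0.50),
        Measurement("small_fast", quality=0.050, cost_per_call=0.01),
    ],
    "quarterly_strategy": [
        Measurement("large_reasoning", quality=0.80, cost_per_call=0.50),
        Measurement("small_fast", quality=0.55, cost_per_call=0.01),
    ],
}

ROUTES = build_routes(MEASURED)


def route(task, default="small_fast"):
    """Return the model for a task; unmeasured tasks fall back to the default."""
    return ROUTES.get(task, default)


for task in MEASURED:
    print(f"{task} -> {route(task)}")
```

With these placeholder numbers, the expensive model is only 2% better at subject lines while costing 50x more, so that task goes to the cheap model; the strategy task shows a large quality gap, so it stays on the expensive one. Re-running the measurement step periodically keeps the table honest as models change.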
The Bottom Line
Your team probably upgraded to a better model and didn't measure whether it actually improved anything. You just assumed it would. It probably made some things better and others worse. You noticed the worse parts.
The teams winning right now aren't upgrading to bigger models. They're building smarter routing. They're using the right tool for the job instead of trying to hammer every nail with their shiniest hammer.