AI Strategy

Reasoning Models Don't Reason

OpenAI o1, DeepSeek, chain-of-thought inference. The hype promises strategic intelligence. The reality is slower, pricier, and no more accurate than the faster models you already have.

Dellon S.

May 25, 2026 · 9 min read

30s+: o1 avg. response time

15x: cost premium over GPT-4o

200ms: paid search auction window

$47K: extra annual API cost per 1K agentic tasks

The Promise That Does Not Deliver

The hype is deafening. OpenAI's o1. DeepSeek's reasoning variants. “Chain-of-thought” and “extended inference” and “agentic reasoning.” All of it promising to finally crack the hard problems: genuine insight, nuanced customer understanding, strategy-level decision-making.

The problem is simple: reasoning models cannot reason. Not in marketing. Not for the problems you actually need to solve.

What they can do is take 10x longer to produce the same mediocre output. And in a business where speed determines ROI, mediocrity that arrives three hours later is worthless.

The Reasoning Illusion

Here is what a reasoning model actually does: it thinks step-by-step. Sounds revolutionary until you realize that “thinking step-by-step” is just code for producing longer, more verbose outputs that use up more tokens.

OpenAI's o1 spends 30+ seconds thinking about a problem that GPT-4o solves in 2 seconds. The output is marginally better. Sometimes. For certain tasks like advanced math or competitive programming, the difference is real, because there you are solving novel problems in domains where there is a clear right answer.

Marketing problems have no clear right answer. An audience segmentation strategy is not provably correct. A campaign narrative is not mathematically derivable. A product positioning is not discoverable through chain-of-thought inference.

Reasoning models will tell you they are thinking harder about these problems. They are not. They are just generating more tokens. Token output is not insight.

Image: close-up of hands typing on a keyboard with LLM cost comparison charts visible on monitors in background. Caption: The token cost compounds fast in agentic workflows. What looks like a model upgrade becomes a budget line item.

Why It Feels Like Reasoning

The magic trick: a reasoning model writes out its thinking process, and humans interpret that output as evidence of actual reasoning.

“The model considered three different approaches and rejected two of them because...” That is not reasoning. That is pattern completion trained on text where humans explained their reasoning. The model learned to mimic the linguistic signature of reasoning, not the actual cognitive process.

It is indistinguishable from reasoning if you squint. But it breaks immediately under pressure.

Test case: ask a reasoning model to do basic numerical counting in a large dataset. Standard o1 fails. Ask it to track a narrative across 100K tokens. It hallucinates. Ask it to understand why a specific customer cohort behaves differently. It makes up plausible-sounding stories.

Ask a faster model (GPT-4o, Claude 3.5 Sonnet, Gemini 2.0) the same questions. Same failure rate. But 10x faster and cheaper. The reasoning overhead buys you length, not accuracy.

The Cost Paradox

An o1 API call costs roughly 10-15x more per token than GPT-4o. And it is slower. You are paying more to wait longer for the same likelihood of being wrong.

In agentic marketing workflows where a system runs 100 decision loops per day, that math explodes. Running one thousand tasks on o1 instead of GPT-4o costs $50,000 instead of $3,000, a $47,000 difference. The output quality difference is noise.
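The arithmetic behind that comparison can be laid out as a small sketch. The two totals are the figures quoted above; the function name `cost_gap` and the derived per-task costs are illustrative assumptions, not a pricing calculator for any real API.

```python
# Totals quoted in the paragraph above; everything derived from them is
# simple arithmetic, not real API pricing.
TASKS = 1_000
O1_TOTAL_USD = 50_000
GPT4O_TOTAL_USD = 3_000


def cost_gap(tasks: int, reasoning_total: float, standard_total: float) -> dict:
    """Compare two total bills for the same number of tasks."""
    if tasks <= 0 or standard_total <= 0:
        raise ValueError("tasks and standard_total must be positive")
    return {
        "reasoning_per_task": reasoning_total / tasks,
        "standard_per_task": standard_total / tasks,
        "extra_spend": reasoning_total - standard_total,
        "cost_multiple": reasoning_total / standard_total,
    }


if __name__ == "__main__":
    gap = cost_gap(TASKS, O1_TOTAL_USD, GPT4O_TOTAL_USD)
    print(f"extra spend: ${gap['extra_spend']:,.0f}")
    print(f"cost multiple: {gap['cost_multiple']:.1f}x")
```

With these inputs the extra spend comes to $47,000 and the multiple to roughly 16.7x, in the same range as the per-token premium quoted earlier.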

Worse: a reasoning model's slowness makes it unusable for real-time decision-making. Paid search auctions close in 200ms. Programmatic display bids in 50ms. Email send windows last 2 hours. A reasoning model that takes 30 seconds to “think” about whether to place a bid is already 29.8 seconds too late.

Most marketing decisions have real-time or near-real-time constraints. Reasoning models violate every one.
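The latency argument is a plain budget check, sketched below. The windows and response times are the figures quoted in this section, converted to milliseconds; the names `overshoot_ms` and `report` are hypothetical helpers, and real response times vary by prompt and load.

```python
# Figures quoted in this section, converted to milliseconds.
WINDOWS_MS = {
    "paid search auction": 200,
    "programmatic display bid": 50,
    "email send window": 2 * 60 * 60 * 1000,
}
LATENCY_MS = {"o1": 30_000, "GPT-4o": 2_000}


def overshoot_ms(latency_ms: int, window_ms: int) -> int:
    """Milliseconds by which a response misses the window (0 if it fits)."""
    return max(0, latency_ms - window_ms)


def report(latencies: dict, windows: dict) -> list:
    """List (model, window name, overshoot in ms) for every pairing."""
    rows = []
    for model, latency in latencies.items():
        for name, window in windows.items():
            rows.append((model, name, overshoot_ms(latency, window)))
    return rows


if __name__ == "__main__":
    for model, name, miss in report(LATENCY_MS, WINDOWS_MS):
        status = "fits" if miss == 0 else f"late by {miss / 1000:.1f}s"
        print(f"{model:7} {name:25} {status}")
```

For o1 against the paid search window the overshoot is 29,800 ms, the 29.8 seconds cited above. By the same arithmetic a 2-second response also overshoots both auction windows, which is consistent with reserving real-time decisions for the smallest, fastest models.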

Marketer on their phone scrolling through AI model pricing at a coffee shop, looking frustrated — The sticker shock hits differently when your AI stack bill arrives at $47K more than last quarter for the same output quality.

When Reasoning Models Actually Help

There are two legitimate use cases, and neither is your marketing stack:

Strategic planning with time to spare. Once per quarter, write a white-paper strategy document. You have a week. A reasoning model can produce a more thorough 10,000-word analysis than a standard model. But you could also brief a human strategist in an hour. The reasoning model is slower and less actionable.

Novel problem-solving in non-marketing domains. Pure R&D. Scientific research. Algorithm design. Competitive programming. These are domains where right answers exist and can be verified. Reasoning models win here. Your ad campaign is not one of these domains.

What's Actually Happening in Your Stack

Your LLM usage in marketing falls into three buckets, and reasoning models fail all three:

Copy generation

Write an ad, email, product description. Reasoning model: overkill. GPT-4o: perfect.

Pattern detection in existing data

Find high-intent signals in customer behavior, analyze past campaign performance, suggest optimizations. Reasoning models: no better than standard models. Both hallucinate. Both make up patterns that do not exist.

Decision-making under uncertainty

Should we bid on this keyword? Pause this ad set? Adjust this audience? Reasoning models are worse here. They take too long. The decision window closes.

The dirty secret: your best LLM results come from simple pattern completion at scale, not sophisticated reasoning. You run 1,000 variants. LLMs write copy for all 1,000. The winner wins by ~12%. A reasoning model would write 100 variants, very slowly. Same 12% lift. Three weeks late.

“Token output is not insight. Reasoning overhead buys you length, not accuracy. Your marketing decisions cannot afford the wait.”

The Honest Assessment

Reasoning models are a tax on marketing teams who believe in AI magic. They look impressive. They produce verbose, confident-sounding output. They use a lot of tokens, which makes vendors rich. But in the specific domain of marketing decision-making, they are slower, more expensive, and no more accurate than their faster counterparts.

Your AI model cost stack should be tiered by decision speed, not by model prestige:

Real-time decisions: GPT-4o mini, Claude 3.5 Haiku, Gemini 2.0 Flash

Batch decision-making: GPT-4o, Claude 3.5 Sonnet, Gemini 2.0

Reasoning models: Not in your stack
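Tiering by decision speed could be wired up roughly as follows. The model names come from the tiers above; the 1-second cutoff, the `Tier` class, and `pick_tier` are assumptions made for illustration, so the boundary should be set from your own latency measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    models: tuple


# Model lists copied from the tiers above.
REAL_TIME = Tier("real-time", ("GPT-4o mini", "Claude 3.5 Haiku", "Gemini 2.0 Flash"))
BATCH = Tier("batch", ("GPT-4o", "Claude 3.5 Sonnet", "Gemini 2.0"))

# Assumption: deadlines under one second count as real-time.
REAL_TIME_CUTOFF_MS = 1_000


def pick_tier(deadline_ms: int) -> Tier:
    """Choose a tier from how long the decision can wait, not from model prestige."""
    if deadline_ms <= 0:
        raise ValueError("deadline_ms must be positive")
    return REAL_TIME if deadline_ms < REAL_TIME_CUTOFF_MS else BATCH


if __name__ == "__main__":
    print(pick_tier(200).name)                 # paid search auction window
    print(pick_tier(2 * 60 * 60 * 1000).name)  # email send window
```

The router has no branch for reasoning models, which mirrors the third tier.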

Take the time you would have spent waiting for o1 to think and spend it talking to your customers. That is still something no model can replace.