The Yandex Leak: What the Code Exposed and What Still Applies
In January 2023, a former employee leaked 44GB of Yandex source code, including 17,853 ranking factors. This is what we learned, and what still holds up three years later.
April 23, 2026 · 9 min read

- January 2023: a former Yandex employee leaked 44GB of source code, exposing 17,853 ranking factors
- Yandex confirmed authenticity. SEOs worldwide spent weeks reverse-engineering the signal list
- Key findings: user behavior is a real ranking signal, domain age matters, ads hurt rankings, backlinks are less important than believed
- The Google API leak (2024) validated several Yandex findings, making the original leak far more credible in hindsight
- In 2026, the playbook from this data is still directly applicable, especially on clicks, freshness, and site architecture
January 27, 2023. A disgruntled former Yandex employee uploaded 44 gigabytes of source code to a Russian developer forum. In the file: the most complete picture of a live search ranking algorithm the industry had ever seen. The SEO world went sideways for about two weeks.
Yandex, the fourth-largest search engine on the planet, confirmed the leak was real. What was initially reported as 1,922 ranking factors turned out to be 17,853 once researchers dug into the full file structure. Not all of them were active (988 were deprecated and 244 were categorized as unused), but even the live subset was more transparency than Google had ever offered.
The obvious pushback was: "Yandex isn't Google. Why does this matter?" Fair question. Here's why it matters anyway: Yandex employs hundreds of ex-Googlers. Its search results overlap with Google's by roughly 70%. It uses PageRank. It uses BM25. It was engineered, in many ways, by people who built the same systems in Mountain View. The companies are competitors, not strangers.
More importantly, when Google's own internal API documentation leaked in May 2024, it confirmed several of the same signals Yandex had exposed 16 months earlier. That second leak gave the first one its teeth.
What the Code Actually Showed

The three categories of Yandex ranking signals exposed in the leak
Yandex's ranking factors fell into three buckets: static factors tied to your site, dynamic factors tied to the query, and user behavior factors. That third category is where things got interesting.
User behavior is a real signal
For years, Google publicly denied using click data as a direct ranking factor. Yandex had no reason to make the same denial. The leak confirmed that CTR, session duration, return visits within a month, and "last destination" behavior (whether a user stopped searching after visiting a page) all influenced rankings.
"Last destination" is the most underrated concept in the leak. Yandex rewarded pages that ended the search journey , pages so good the user never came back to hit refine.
Domain age and site history matter
New domains face a ranking ceiling until they've built trust signals. Content older than 10 years gets a freshness penalty. Sites with a long, consistent publishing history get a baseline boost. This was widely speculated before; the leak made it structural.
Ads kill rankings
Ad density was the highest-weighted negative factor in the model. Not just a little negative. The highest. Number of ad placements, background clickable ads, ad-to-content ratio: all penalized. Pages optimized for ad revenue at the expense of user experience were systematically demoted.
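The code exposed which ad factors existed, not their coefficients. For a sense of how a composite penalty might combine them, here's a toy proxy with invented weights:

```python
def ad_density_penalty(ad_blocks, word_count, clickable_background=False):
    """Toy ad-density penalty: higher is worse.

    ad_blocks: number of ad placements on the page
    word_count: words of actual content
    clickable_background: whether the background itself is an ad,
        a factor the leaked code called out by name

    The 0.1 and 1.0 weights are invented for illustration; the leak
    exposed the factors, not their coefficients.
    """
    ads_per_1k_words = ad_blocks / max(word_count / 1000.0, 0.001)
    penalty = 0.1 * ads_per_1k_words
    if clickable_background:
        penalty += 1.0
    return penalty

print(ad_density_penalty(ad_blocks=8, word_count=1200))  # ~0.67
```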
Traffic diversity signals quality
Pages that received traffic from only one source (say, entirely organic) were treated with suspicion. Yandex expected real pages to attract visitors from a mix of sources: organic, direct, social, referral. A page with no direct traffic looked like a ghost site that existed only for search bots.
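If you want to put a number on your own traffic mix, normalized Shannon entropy over source shares is one reasonable stand-in. The leak named the signal, not the math; this is just a way to measure the pattern it describes:

```python
import math

def traffic_diversity(source_counts):
    """Normalized Shannon entropy over traffic sources (0..1).

    0.0 = everything from a single source (the pattern the leak
    flagged as suspicious); 1.0 = a perfectly even mix.
    """
    total = sum(source_counts.values())
    shares = [c / total for c in source_counts.values() if c > 0]
    if len(shares) <= 1:
        return 0.0
    entropy = -sum(p * math.log(p) for p in shares)
    return entropy / math.log(len(shares))

print(traffic_diversity({"organic": 950, "direct": 30, "social": 15, "referral": 5}))  # ~0.18
```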
Content must be cited to rank
One of the subtler signals: Yandex rewarded text that had been cited or linked by external domains, meaning the content was considered worth referencing, not just worth publishing. It also demoted pages when poor-quality content elsewhere on the same site dragged down the site-wide quality score.
- User clicks and CTR: confirmed direct ranking factor
- Session duration and "last destination": rewarded pages that ended the search journey
- Domain age: newer domains face a trust ceiling; content older than 10 years gets penalized
- Ad density: highest-weighted negative factor in the model
- Traffic diversity: pages with single-source traffic flagged as suspicious
- Content citations: external references to your content boost its authority score
- Crawl depth: important pages must be within two clicks of the homepage
- User reviews: pages featuring them got prioritized in results

Click patterns, session depth, and return visits: all confirmed ranking signals in the Yandex code
Backlinks: Still There, Just Not the Point
Backlinks showed up throughout the Yandex data; Yandex absolutely uses them. But the weighting told a different story than the one the SEO industry had built its career around. Referral chains that artificially inflate popularity were penalized. Paid links and low-quality inbound links triggered demotions. And compared to user behavior signals, link equity carried noticeably less weight.
This didn't mean backlinks were dead. It meant the link-first worldview that had dominated SEO for two decades was overdue for a correction. A well-linked page with poor user signals still ranked below a moderately-linked page that kept people on it.
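To make that ordering concrete, here's a toy score. The weights are invented; the leak confirmed the direction (behavior outweighing links), not the numbers:

```python
# Invented weights that match the leak's direction (behavior > links),
# not its actual coefficients.
WEIGHTS = {"behavior": 0.6, "links": 0.25, "content": 0.15}

def toy_rank_score(signals):
    return sum(WEIGHTS[key] * signals[key] for key in WEIGHTS)

well_linked = {"behavior": 0.3, "links": 0.9, "content": 0.6}  # strong links, weak engagement
well_used   = {"behavior": 0.8, "links": 0.4, "content": 0.6}  # moderate links, strong engagement

print(toy_rank_score(well_linked))  # ~0.495
print(toy_rank_score(well_used))    # ~0.67
```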
Then the Google Leak Happened
In May 2024, thousands of pages of internal Google API documentation leaked. The details differed: it was a different company, a different codebase, a different audience. But several core themes matched what Yandex had exposed 16 months earlier:
- User engagement metrics influencing rankings
- Ranking systems (RankBrain, NavBoost) incorporating behavioral signals
- Content quality and topical authority weighted heavily
- Backlinks present but diminishing in relative weight
- Localization and personalization as first-class ranking inputs
When two separate companies with separate codebases expose similar internal logic, you stop treating it as coincidence. The Yandex leak stopped being a curiosity about a Russian search engine and became a working model for how modern search actually operates.
What's Still Actionable in 2026
The code was from July 2022. Both companies have shipped significant algorithm updates since. So the question isn't "is this document current?" It isn't. The question is: which signals are structural enough to have survived the updates?
Structural signals don't flip overnight. The things that mattered in 2022 (user satisfaction, content depth, domain trust) are the same things that are harder to game in 2026.
Here's what holds up:
Build pages that end the search journey. If someone Googles a question and your page gives them a complete answer, they don't come back to refine the query. That behavioral signal (query satisfaction) was explicit in Yandex and implied throughout the Google leak. Thin content that bounces users back to the SERP is not just bad UX. It's a ranking signal pointing in the wrong direction.
Freshness matters, but only if the update is real. Slapping a new date on a three-year-old page without updating the substance is detectable. Freshness signals reward content that was genuinely updated: it links to current data, cites recent events, and answers what people are actually searching for right now.
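And "detectable" is not hand-waving. A minimal sketch: fingerprint the page body with the dateline stripped out, then compare snapshots. If the date moved but the hash didn't, the update was cosmetic (the date regex here is an assumption; adjust it to your own format):

```python
import hashlib
import re

# Matches visible datelines like "April 23, 2026"; adjust to your format.
DATE_PATTERN = re.compile(
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4}"
)

def content_fingerprint(page_text):
    """Hash the page body with visible dates stripped out."""
    return hashlib.sha256(DATE_PATTERN.sub("", page_text).encode("utf-8")).hexdigest()

def is_cosmetic_update(old_snapshot, new_snapshot):
    """True when the dateline moved but the substance didn't."""
    dates_changed = DATE_PATTERN.findall(old_snapshot) != DATE_PATTERN.findall(new_snapshot)
    return dates_changed and content_fingerprint(old_snapshot) == content_fingerprint(new_snapshot)
```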
Get traffic from more than one source. If your site exists only in search (no social mentions, no direct visitors, no referrals from other sites), it looks thin to algorithmic quality assessment. Brand-building and answer engine optimization aren't separate from SEO. They feed the same trust signals.
Crawl depth is architecture, not just SEO. The leak was explicit: important pages should be within two clicks of the homepage. That's basic information architecture, and it pays dividends in both discoverability and ranking weight.
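Auditing the two-click rule is a plain breadth-first search over your internal link graph. A minimal sketch, assuming you've already exported internal links into a dict (from a crawler or your CMS):

```python
from collections import deque

def pages_deeper_than(link_graph, home="/", max_depth=2):
    """Return pages more than max_depth clicks from the homepage.

    link_graph: {url: [internal urls it links to]}. Pages never
    reached at all come back with a depth of None.
    """
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)

    all_pages = set(link_graph).union(*link_graph.values())
    return {p: depth.get(p) for p in all_pages
            if depth.get(p) is None or depth[p] > max_depth}
```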
Ad density is a real cost. If you're running a content site with heavy ad placements, you're paying a ranking penalty that the data now confirms. The highest negative weight in the Yandex model wasn't spammy links or thin content. It was ads.
Build something worth citing. Original data, proprietary research, strong takes on industry topics: these are the things other people link to organically. The leak confirmed that citations signal authority in ways that paid link profiles don't.
- Write pages that fully answer the query; don't make people go back to Google
- Update content with substance, not just timestamps
- Diversify traffic sources: social, direct, and referral matter algorithmically
- Keep important pages within two clicks of your homepage
- Cut ad density on pages you want to rank
- Create content worth citing, not just worth publishing

Links still matter, but their relative weight has been declining as behavioral signals rise
What the Leak Doesn't Settle
A few things remain genuinely unclear. The code was from 2022, and both Yandex and Google have released significant core updates since. How much the relative weights have shifted is unknown. Whether Google's incorporation of AI Overviews and answer engine behavior has changed the user signal model is speculative.
What's also unresolved: the extent to which AI-generated content has changed what "content quality" means in the model. The Yandex leak predates the generative AI explosion. The signals it described were designed for a world where humans wrote everything. That world ended in late 2022, and neither company has been transparent about how their systems adapted.
The leak gave us a map. But the terrain has shifted since it was drawn. The smart move is to treat the structural signals as durable (user satisfaction, content authority, site architecture) and stay skeptical about applying the exact weightings to systems that have been updated a dozen times since.
Sources: Search Engine Land Yandex Leak Analysis · Search Engine Journal deep dive · SISTRIX ranking factor breakdown