Yandex’s source code was leaked, so we now have a rare look at the most important ranking signals this search engine uses. I won’t link the documents here, but a quick search on Google or social media should turn them up.
Even if you don’t care about Yandex (primarily used in Russia), this news is significant. It’s direct insight into the inner workings of a fully-fledged Google competitor.
Let’s see what we can learn from this leak about how to do better SEO. I’ll discuss some of the most exciting variables I found and how they can inform our thinking about search.
Yandex collects user information
The job of a search engine like Google, Yandex, or Bing is to answer a user’s query.
But to answer that query, it has to be understood. And the user’s specific intent must be inferred from everything the search engine knows about the user.
That’s why search engines collect as much user information as possible, such as previous searches, location, or device.
Yandex is no different, and we find evidence for it in the leaked data. For instance, Yandex collects the FI_REQUEST_IS_FROM_IOS variable, which checks if a given user is on an iOS device.
Yandex collects tons of website data
Yandex, just like Google and Bing, has an index of pages that can potentially answer its users’ needs. But to find the pages best suited to help its users, it must analyze them thoroughly.
The leak surfaces tons of page- and domain-related variables used by Yandex as ranking signals.
Below are some of the examples I found most exciting or surprising:
- Yandex checks if a page has a map service embedded (FI_PAGE_HAS_MAPS_API),
- Yandex judges the quality of a given page using the overall quality of its host, the website as a whole (FI_PAGE_QUALITY_HOST),
- Yandex checks whether a page contains NSFW content, including text, images, and videos,
- Yandex checks if a document contains user feedback or comments,
- Yandex judges a page by its last modification date and the number of known duplicates,
- Yandex pays attention to social posts from verified accounts that link to a given page.
There are over 18,000 factors in the Yandex leak. They are worth studying, but that’s beyond the scope of this post. Here I only want to draw your attention to the vast range of analyses Yandex performs to categorize the pages in its index.
Yandex is using user behavior metrics
Like Bing, Yandex uses behavioral metrics as signals of page quality.
Time spent on site matters:
- FI_BROWSER_HOST_CNT_DWELL_TIME_LOG checks the average time spent by a user on a specific website – this data is segmented per localization and country,
- FI_MORE_90_SEC_VISITS_SHARE checks the percentage of visits longer than 90 seconds,
- FI_MORE_160_SEC_VISITS_SHARE checks the percentage of visits longer than 160 seconds.
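To make these dwell-time metrics concrete, here is a minimal sketch of how features like them could be computed from a list of visit durations. The variable names come from the leak, but the formulas are my assumptions: in particular, I assume the “_LOG” suffix means a log transform of the average dwell time, and I ignore the per-country and per-locale segmentation Yandex reportedly applies. The function name `dwell_time_features` is hypothetical.

```python
import math
from statistics import mean


def dwell_time_features(durations_sec):
    """Illustrative dwell-time features from visit durations in seconds.

    Names mirror the leaked variables, but these formulas are assumptions,
    not Yandex's actual definitions.
    """
    if not durations_sec:
        return {"dwell_time_log": 0.0, "more_90_sec_share": 0.0, "more_160_sec_share": 0.0}
    n = len(durations_sec)
    return {
        # Assumption: "_LOG" means log(1 + average dwell time).
        "dwell_time_log": math.log1p(mean(durations_sec)),
        # Share of visits strictly longer than 90 seconds.
        "more_90_sec_share": sum(d > 90 for d in durations_sec) / n,
        # Share of visits strictly longer than 160 seconds.
        "more_160_sec_share": sum(d > 160 for d in durations_sec) / n,
    }


visits = [30, 95, 200, 45, 170]  # made-up sample durations
print(dwell_time_features(visits))
```

With these five sample visits, three last longer than 90 seconds (a share of 0.6) and two longer than 160 seconds (0.4). Thresholded shares like these are less sensitive to a single very long session than a plain average, which may be one reason to track both.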
Yandex also uses immediate popularity as a ranking factor. It measures the average number of visits within three hours.
They also consider how deeply the average user interacts with the website (average session depth).
This points to similarities between Yandex and Bing.
Let me quote Bing’s documentation:
“Bing also considers how users interact with search results. To determine user engagement, Bing asks questions like: Did users click through to search results for a given query, and if so, which results? Did users spend time on these search results they clicked through or quickly return to Bing? Did the user adjust or reformulate their query?”
Yandex is using algorithms similar to Google’s
The leak also shows several factors that directly or indirectly correspond to some of the mechanisms we know Google is using.
- Both Google and Yandex use BERT.
- Both Google and Yandex use sitewide quality signals instead of only page-level signals (such as FI_PAGE_QUALITY_HOST).
- Both Google and Yandex use PageRank.
- Yandex also has rules for specific websites. For instance, it treats Wikipedia links differently, and a factor named FI_DSSM_SUNHOME_POPULARITY estimates the probability that sunhome.ru is a popular host for a given query.
- Both Google and Yandex have a notion of YMYL (Your Money or Your Life) pages. Yandex has a specific algorithm to detect host quality for medical websites (FI_MEDICAL_HOST_QUALITY_METRIC). It also has neural models to detect content quality for financial and legal topics (FI_FIN_LAW_URL_QUALITY).
- Both search engines can annotate different parts of the content (so they understand page layout). We know Google uses a centerpiece annotation mechanism to differentiate between main content, supplementary content, and ads.
- Both Google and Yandex share some basic ranking factors, such as mobile friendliness, which Yandex measures with the FI_IS_MOBILE_BEAUTY_HOST variable.
How to use the Yandex leak to be a better SEO
When you know the quality and relevance signals a search engine uses to surface the best content, it’s pretty easy to improve your rankings.
First, you check which ranking factors have the highest impact (they don’t all have equal weight in the ranking algorithm). Then you select the factors that are actionable for you and easy to improve on your end. Focus on improving these factors on your website and measure the impact.
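The prioritization steps above can be sketched as a simple filter-and-sort. This is a hypothetical helper, not a tool from the leak: the `impact` and `effort` numbers below are placeholders I made up for illustration, not values taken from Yandex’s code. In practice you would replace them with your own estimates.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    name: str
    impact: float      # placeholder estimate of weight in the ranking model (0-1)
    effort: float      # placeholder effort to improve on your site (0-1, higher = harder)
    actionable: bool   # can you influence this factor at all?


def prioritize(factors, max_effort=0.5):
    """Keep actionable factors within the effort budget, highest impact first."""
    candidates = [f for f in factors if f.actionable and f.effort <= max_effort]
    return sorted(candidates, key=lambda f: f.impact, reverse=True)


# Made-up numbers for illustration only.
factors = [
    Factor("FI_IS_MOBILE_BEAUTY_HOST", impact=0.6, effort=0.3, actionable=True),
    Factor("FI_PAGE_HAS_MAPS_API", impact=0.2, effort=0.1, actionable=True),
    Factor("FI_PAGE_QUALITY_HOST", impact=0.8, effort=0.9, actionable=True),
    Factor("FI_REQUEST_IS_FROM_IOS", impact=0.4, effort=0.0, actionable=False),
]

for f in prioritize(factors):
    print(f.name)
```

Here the host-quality factor is dropped despite its high placeholder impact because it exceeds the effort budget, and the iOS factor is dropped because it describes the user rather than your site. Raising `max_effort` widens the shortlist; the last step, measuring the impact after each change, still has to happen outside the code.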
I don’t expect Yandex to rewrite its codebase to stop people from gaming its algorithm. So if you want to improve your Yandex rankings, it’s now easier than ever, technically speaking.
But when it comes to Google, things aren’t this easy.
If you compare the search results for the same queries between Google, Yandex, and Bing, you’ll quickly notice significant differences. This points to the fact that even if they use similar ranking signals, they weigh them differently or use them for different query types.
But the Yandex leak is a tremendous opportunity to reverse-engineer how the people running one of the most successful search engines in the world think. Study these documents to understand how a search engine sees your business and what you can do to improve your search visibility.
Lesson 1: Ranking signals, not ranking factors
SEOs keep debating what is and isn’t a ranking factor.
We must change our thinking to reflect that we’re in the machine-learning era.
Let me discuss two examples: grammar errors and word count. Google officially denies that either is a ranking factor.
And strictly speaking, they aren’t ranking factors. But they may still contribute to your SEO success.
Google published a research paper about detecting high-quality content. The sample was enormous: 500M documents. The algorithm it describes takes into account features like word count and grammatical correctness. Surprised?
Word count is not a ranking factor in the sense that documents with a higher word count will get a better position.
But it can obviously be used as a ranking signal. Depending on the query and user, the ranking algorithm may or may not use it as a factor in sorting the search results.
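The difference between a signal and a factor can be illustrated with a toy scoring model. This is a minimal sketch under loud assumptions: a hand-written linear model with made-up, query-type-dependent weights stands in for the learned models real search engines use, and the query types, weights, and documents are all hypothetical. The point is only that the same feature (word count) can matter for one kind of query and be ignored for another.

```python
import math

# Hypothetical weights per query type. A real engine would learn such
# interactions from data (e.g. with gradient-boosted trees); these are invented.
WEIGHTS = {
    "informational": {"relevance": 1.0, "log_word_count": 0.3},
    "navigational": {"relevance": 1.0, "log_word_count": 0.0},
}


def score(doc, query_type):
    """Weighted sum of features; word count enters as log(1 + words)."""
    w = WEIGHTS[query_type]
    features = {
        "relevance": doc["relevance"],
        "log_word_count": math.log1p(doc["word_count"]),
    }
    return sum(w[name] * value for name, value in features.items())


def rank(docs, query_type):
    """Return URLs sorted by descending score for the given query type."""
    ordered = sorted(docs, key=lambda d: score(d, query_type), reverse=True)
    return [d["url"] for d in ordered]


docs = [
    {"url": "short-page", "relevance": 2.0, "word_count": 300},
    {"url": "long-guide", "relevance": 1.5, "word_count": 4000},
]
print(rank(docs, "informational"))  # ['long-guide', 'short-page']
print(rank(docs, "navigational"))   # ['short-page', 'long-guide']
```

For the informational query, the longer guide edges ahead (about 3.99 vs. 3.71); for the navigational query, word count carries zero weight and the more relevant short page wins. Word count never guarantees a higher position, yet it still shapes the ranking in some contexts.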
Lesson 2: Search is more complex than we think
We chase after specific, measurable ranking factors. And we keep looking for straightforward answers to simple questions like “Is word count a ranking factor?”
According to the leak, Yandex uses over 18,000 different ranking signals. Like Bing and Google, it’s a state-of-the-art search engine.
Do you really expect Google or Bing to use just 200 ranking factors? Or any single Google employee to even remember them all?
Chasing a handful of measurable metrics probably won’t make you successful.
Instead, think about how to be a good SEO. On the road to self-improvement, you can’t focus on just one thing. Adopt the search engine’s perspective to understand the place your pages can and should occupy on the SERPs. Then make those pages unforgettable, so that they don’t just attract traffic but also make it work toward your ultimate business goals.