Googler’s Deposition Offers View Of Google’s Ranking Systems
A Google engineer’s redacted testimony, published online by the U.S. Justice Department, offers a look inside Google’s ranking systems, giving an idea of how Google’s quality scores work and introducing a mysterious popularity signal that uses Chrome data.
The document offers a high-level and very general view of ranking signals, providing a sense of what the algorithms do but not the specifics.
Hand-Crafted Signals
For example, it begins with a section about the “hand crafting” of signals, which describes the general process of taking data from quality raters, clicks, and so on and applying mathematical and statistical formulas to generate a ranking score from three kinds of signals. Hand-crafted means algorithms that are built and tuned at scale by search engineers; it doesn’t mean that Google is manually ranking websites.
Google’s ABC Signals
The DOJ document lists three kinds of signals that are referred to as ABC Signals and correspond to the following:
- A – Anchors (pages linking to the target pages),
- B – Body (search query terms in the document),
- C – Clicks (user dwell time before returning to the SERP)
The statement about the ABC signals is a generalization of one part of the ranking process. Ranking search results is far more complex and involves hundreds if not thousands of additional algorithms at every step, including indexing, link analysis, anti-spam processes, personalization, and re-ranking. For example, Liz Reid has discussed Core Topicality Systems as part of the ranking algorithm and Martin Splitt has discussed annotations as a part of understanding web pages.
This is what the document says about the ABC signals:
“ABC signals are the key components of topicality (or a base score), which is Google’s determination of how the document is relevant to the query.
T* (Topicality) effectively combines (at least) these three signals in a relatively hand-crafted way. Google uses to judge the relevance of the document based on the query terms.”
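The deposition doesn’t give the actual formula, so the following is only a minimal sketch of what a hand-crafted combination of anchor (A), body (B), and click (C) signals into a topicality score could look like. The weights and function name are invented for illustration and are not from the document.

```python
# Hypothetical sketch of a hand-crafted signal combination. None of these
# weights come from the deposition; they are placeholders showing how anchor
# (A), body (B), and click (C) signals might be blended into one score.

def topicality_score(anchor_score: float, body_score: float, click_score: float) -> float:
    """Combine three per-document signals into a single topicality score.

    The deposition says T* combines the ABC signals "in a relatively
    hand-crafted way"; the weights below are assumptions for illustration.
    """
    weights = {"anchors": 0.4, "body": 0.4, "clicks": 0.2}  # assumed, not Google's
    return (
        weights["anchors"] * anchor_score
        + weights["body"] * body_score
        + weights["clicks"] * click_score
    )

# Example: strong anchor text, decent body match, modest click signal.
print(topicality_score(anchor_score=0.9, body_score=0.7, click_score=0.5))
```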
The document offers an idea of the complexity of ranking web pages:
“Ranking development (especially topicality) involves solving many complex mathematical problems. For topicality, there might be a team of engineers working continuously on these hard problems within a given project.
The reason why the vast majority of signals are hand-crafted is that if anything breaks Google knows what to fix. Google wants their signals to be fully transparent so they can trouble-shoot them and improve upon them.”
The document compares Google’s hand-crafted approach to Microsoft’s automated approach, saying that when something breaks at Bing it’s far more difficult to troubleshoot than it is with Google’s system.
Interplay Between Page Quality And Relevance
An interesting point revealed by the search engineer is that page quality is largely independent of the query. If a page is determined to be high quality and trustworthy, it’s regarded as trustworthy across all related queries, which is what is meant by the word static: it’s not dynamically recalculated for each query. However, relevance-related signals derived from the query are still combined with that static score to calculate the final rankings, which shows how relevance plays a decisive role in determining what gets ranked.
This is what they said:
“Quality
Generally static across multiple queries and not connected to a specific query. However, in some cases Quality signal incorporates information from the query in addition to the static signal. For example, a site may have high quality but general information so a query interpreted as seeking very narrow/technical information may be used to direct to a quality site that is more technical.
Q* (page quality (i.e., the notion of trustworthiness)) is incredibly important. If competitors see the logs, then they have a notion of “authority” for a given site.
Quality score is hugely important even today. Page quality is something people complain about the most…”
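Based on that description, here is a minimal, hypothetical sketch of how a static, site-level quality score might be blended with a query-dependent relevance score. The sites, scores, and equal weighting are assumptions, not anything stated in the deposition.

```python
# Illustrative only: the deposition describes Q* as largely static (tied to the
# site, not the query) and combined with query-dependent relevance. The blend
# below is a hypothetical composition, not Google's formula.

STATIC_QUALITY = {            # assumed pre-computed, query-independent scores
    "example-encyclopedia.com": 0.92,
    "random-blog.example": 0.41,
}

def final_score(site: str, relevance: float) -> float:
    """Blend a static, site-level quality score with per-query relevance."""
    quality = STATIC_QUALITY.get(site, 0.5)   # fallback for unknown sites
    return 0.5 * quality + 0.5 * relevance    # equal weighting is an assumption

# The same site keeps its quality score across queries; only relevance changes.
print(final_score("example-encyclopedia.com", relevance=0.3))
print(final_score("example-encyclopedia.com", relevance=0.9))
```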
AI Gives Cause For Complaints Against Google
The engineer states that people complain about quality and says that AI makes the situation worse.
He says about page quality:
“Nowadays, people still complain about the quality and AI makes it worse.
This was and continues to be a lot of work but could be easily reverse engineered because Q is largely static and largely related to the site rather than the query.”
eDeepRank – A Way To Understand LLM Rankings
The Googler lists other ranking signals, including one called eDeepRank, an LLM-based system that uses BERT, a language model.
He explains:
“eDeepRank is an LLM system that uses BERT, transformers. Essentially, eDeepRank tries to take LLM-based signals and decompose them into components to make them more transparent. “
That part about decomposing LLM signals into components seems to be a reference to making the LLM-based ranking signals more transparent so that search engineers can understand why the LLM is ranking something.
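The deposition doesn’t describe how that decomposition works, but the general idea of breaking an opaque model score into named components can be illustrated with a small, hypothetical sketch. The component names and values below are invented.

```python
# Conceptual sketch of "decomposing" an opaque model score into named
# components so engineers can see what drove it. eDeepRank's real internals
# are not described in the deposition; this is purely illustrative.

from typing import Dict

def decompose(contributions: Dict[str, float]) -> None:
    """Print each component's share of the total score, largest first."""
    total = sum(contributions.values())
    for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
        share = value / total if total else 0.0
        print(f"{name:>16}: {value:.3f} ({share:.0%} of total)")
    print(f"{'total':>16}: {total:.3f}")

# Hypothetical per-component contributions to a single document's score.
decompose({
    "title_match": 0.35,
    "passage_match": 0.25,
    "anchor_context": 0.15,
})
```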
PageRank Linked To Distance Ranking Algorithms
PageRank is Google’s original ranking innovation and it has since been updated. I wrote about this kind of algorithm six years ago. Link distance algorithms calculate the distance from authoritative websites for a given topic (called seed sites) to other websites in the same topic. These algorithms start with a seed set of authoritative sites in a given topic, and sites that are further away from their respective seed site are determined to be less trustworthy. Sites that are closer to the seed set are likelier to be more authoritative and trustworthy.
This is what the Googler said about PageRank:
“PageRank. This is a single signal relating to distance from a known good source, and it is used as an input to the Quality score.”
Read about this kind of link ranking algorithm: Link Distance Ranking Algorithms
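To make the idea concrete, here is a minimal sketch of a link-distance calculation: a breadth-first search outward from a hand-picked seed set, where a larger hop count stands in for weaker presumed trust. The toy graph and seed set are assumptions for illustration, not Google’s implementation.

```python
# Minimal link-distance sketch: measure how many link hops separate each site
# from a trusted seed set. The graph, the seed set, and the "closer is more
# trustworthy" interpretation are assumptions, not Google's algorithm.

from collections import deque

def link_distances(graph: dict, seeds: set) -> dict:
    """Breadth-first search from the seed set; returns hop counts per site."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        for target in graph.get(site, []):
            if target not in dist:          # first visit = shortest distance
                dist[target] = dist[site] + 1
                queue.append(target)
    return dist

# Toy link graph: seed.example links to a.example, which links to b.example.
graph = {
    "seed.example": ["a.example"],
    "a.example": ["b.example"],
}
print(link_distances(graph, seeds={"seed.example"}))
# {'seed.example': 0, 'a.example': 1, 'b.example': 2} – a larger distance
# implies weaker presumed trust in this sketch.
```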
Cryptic Chrome-Based Popularity Signal
There is another signal whose name is redacted that’s related to popularity.
Here’s the cryptic description:
“[redacted] (popularity) signal that uses Chrome data.”
A plausible claim can be made that this confirms that the Chrome API leak is about actual ranking factors. However, many SEOs, myself included, believe that those APIs are developer-facing tools used by Chrome to show performance metrics like Core Web Vitals within the Chrome DevTools interface.
I suspect that this is a reference to a popularity signal that we might not know about.
The Google engineer does refer to another leak of documents that reference actual “components of Google’s ranking system,” but says those documents don’t contain enough information to reverse engineer the algorithm.
They explain:
“There was a leak of Google documents which named certain components of Google’s ranking system, but the documents don’t go into specifics of the curves and thresholds.
For example
The documents alone do not give you enough details to figure it out, but the data likely does.”
Takeaway
The newly released document summarizes a U.S. Justice Department deposition of a Google engineer that offers a general outline of parts of Google’s search ranking systems. It discusses hand-crafted signal design, the role of static page quality scores, and a mysterious popularity signal derived from Chrome data.
It provides a rare look into how signals like topicality, trustworthiness, click behavior, and LLM-based transparency are engineered and offers a different perspective on how Google ranks websites.
Featured Image by Shutterstock/fran_kie