Google’s Internal Search Documents Leaked
By Eric Maas on 31 May, 2024
A bombshell report came out this week: Google accidentally leaked internal documentation for a Search API called the Content Warehouse API. The repository contained detailed information on how Google’s search systems operate. This is a big one for search engine optimization professionals, and the leak has significant implications because it reveals much about Google’s search algorithms and internal processes. Before we go further, we should note that, as of this writing, Google has verified the documents and responded to comments.
We have had three days to look over the data and there is a lot to unpack, but the big takeaway for Fuelist Digital, our clients, SEO agencies, and digital marketing as a whole is that it really does not change much. It does, however, confirm a lot of our internal working theories on search optimization, and it helps us fill in a lot of the mechanical blanks. It should be noted that this is not a search algorithm leak; it is API documentation, but it is still significant. In some areas it is clear how factors are weighted, but in many others the weighting is still unknown. So think of this leak as someone leaking the ingredients to Coca-Cola: you still do not know the quantities, cook times, cook temperatures, or any other process specifics that make up the recipe. That said, several things are worth noting.
- This validates a lot of what we at Fuelist Digital have believed for a long time. Some of our more controversial theories have been confirmed. Here are a few of them:
  - Building sites for Google’s experience as well as the end user’s
  - Starting with page speed optimization
  - Leaning into structured data
  - Forums and personal/hobby sites being downranked, most likely due to their low potential for ad purchasing
- Google uses a lot of factors to rank search results. The documentation lists 14,014 attributes, which can be considered ranking signals, spread across 2,596 modules. These signals include various metrics related to content quality, user interactions, backlinks, and more. The extensive list of attributes provides a detailed look at the factors Google considers when determining search rankings.
- Internal documents revealed how some aspects of Google’s algorithms are weighted and processed, offering a clearer picture of what influences search rankings.
- Our long-term strategy of “spoon-feeding” data to Google has proven to be an accurate way to manage site data for search optimization.
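For readers who want to see what “leaning into structured data” can look like in practice, here is a minimal sketch in Python that builds schema.org Article markup (JSON-LD) for a post like this one. The helper names `build_article_jsonld` and `to_script_tag` are our own illustrations, and listing Fuelist Digital as the publisher is an assumption made for the example. Nothing in the leaked documentation tells us how Google weighs this markup, so treat it as one way to hand Google clean data, not as a confirmed ranking lever.

```python
import json


def build_article_jsonld(headline, author, date_published, publisher):
    """Return a schema.org Article description as a plain dict.

    Hypothetical helper for illustration; the field names come from
    the public schema.org vocabulary.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,  # ISO 8601 date string
        "publisher": {"@type": "Organization", "name": publisher},
    }


def to_script_tag(data):
    """Wrap the JSON-LD in the script tag that is placed in the page's HTML."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Example values taken from this post; the publisher is an assumption.
article = build_article_jsonld(
    headline="Google’s Internal Search Documents Leaked",
    author="Eric Maas",
    date_published="2024-05-31",
    publisher="Fuelist Digital",
)
print(to_script_tag(article))
```

The printed script tag can be pasted into a page template, so the headline, author, and publish date are stated explicitly and do not have to be inferred from the page layout.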
Take What Google Says with a Grain of Salt
We have been saying for over a decade now to take Google’s messaging around its search ranking factors with a grain of salt. These documents really hammer that point home. While this is not much of a surprise to most SEO professionals, the extent of the misdirection is significant and much larger than we think most were expecting. Here are the more significant examples:
- The leaked documents revealed that data from Google Chrome is indeed used as part of Google’s search ranking algorithms, contrary to previous public statements denying this practice.
- Despite Google’s denial, the leak confirmed the existence of a “sandbox” where new or less trusted sites are segregated based on specific criteria.
- Contrary to previous public statements, the leaked documents confirmed that Google uses click data and user interactions to adjust rankings.
- Google representatives have repeatedly stated that they do not use “domain authority” as a ranking factor. The leaked documents revealed the existence of a metric called “siteAuthority” that Google uses internally to influence rankings.
- The internal documents contain detailed descriptions and methodologies that contradict the simplified explanations provided to the public, suggesting selective transparency.
- Google has positioned the disavow tool as a significant mechanism for webmasters to address bad backlinks. The leaked documents did not show direct integration of disavow data in the core ranking systems, suggesting its actual impact might be limited or used differently than publicly stated.
- Google has often dismissed the significance of specific user behavior metrics like dwell time and click-through rate (CTR). The documents confirmed that Google tracks and uses various user behavior metrics extensively to refine search results and rankings.