Google’s Internal Search Documents Leaked

By Eric Maas on 31 May, 2024

A bombshell report came out this week: Google accidentally leaked internal documentation for its search API, called the Content Warehouse API. The repository contained detailed instructions on how Google’s search systems operate. This is a big one for search engine optimization professionals, and the leak has significant implications because it reveals much about Google’s search algorithms and internal processes. Before we go further, we should note that, as of this writing, Google has verified the documents and responded to comments.

We have had three days to look over the data, and there is a lot to unpack. However, the big takeaway for Fuelist Digital, our clients, SEO agencies, and digital marketing as a whole is that it really does not change much. What it does do is confirm a lot of our internal working theories on search optimization and help us fill in a lot of the mechanical blanks.

It should be noted that this is not a search algorithm leak. It is API documentation, but it is still significant. While it is clear in some areas how aspects are weighted, much remains unknown. Think of this leak as someone leaking the ingredients of Coca-Cola: you still do not know the quantities, cook times, cook temperatures, or any other process specifics that make the recipe. That said, several things are worth noting.
  1. This validates a lot of what we at Fuelist Digital have believed for a long time. Some of our more controversial theories have been confirmed. Here are a few of them:
    • Building sites for Google’s experience as well as the end user’s
    • Starting with page speed optimization
    • Leaning into structured data
    • Forums and personal/hobby sites being down-ranked, most likely because of their low potential for ad purchasing
  2. Google uses a lot of factors to rank search results. The documentation lists 14,014 attributes, which can be considered ranking signals, spread across 2,596 modules. These signals include various metrics related to content quality, user interactions, backlinks, and more. The extensive list of attributes provides a detailed look at the factors Google considers when determining search rankings.
  3. Internal documents revealed how some aspects of Google’s algorithms are weighted and processed, offering a clearer picture of what influences search rankings.
  4. Our long-term strategy of “spoon-feeding” data to Google has proven to be an accurate way to manage site data for search optimization.
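
To make “leaning into structured data” and “spoon-feeding” concrete, here is a minimal sketch of one common way to hand a search engine explicit facts about a page: schema.org Article markup embedded as JSON-LD. The leaked documentation does not tell us how this markup is consumed or weighted, so treat this as an illustration of the practice rather than a description of Google’s internals. The function name `article_jsonld` and the sample values are our own, chosen for illustration only.

```python
import json


def article_jsonld(headline: str, author: str, date_published: str) -> str:
    """Return a <script> tag carrying schema.org Article markup as JSON-LD.

    Hypothetical helper for illustration; not taken from the leaked documents.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,  # ISO 8601 date, e.g. "2024-05-31"
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value can never close the surrounding <script> tag early.
    # "\/" is a valid JSON escape for "/", so the data itself is unchanged.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(
        article_jsonld(
            "Google’s Internal Search Documents Leaked",
            "Eric Maas",
            "2024-05-31",
        )
    )
```

The resulting tag would typically be placed in the page’s HTML so that the headline, author, and publication date are stated explicitly instead of being left for a crawler to infer.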

Take What Google Says with a Grain of Salt

We have been saying for over a decade to take Google’s messaging around its search ranking factors with a grain of salt. These documents really hammer that point home. While this concept is not much of a surprise to most SEO professionals, the extent of the misdirection is significant and much larger than we think most were expecting. For one, the leaked documents revealed that data from Google Chrome is indeed used as part of Google’s search ranking algorithms, contrary to previous public statements denying this practice. Here are the other significant contradictions:
  1. Despite Google’s denial, the leak confirmed the existence of a “sandbox” where new or less trusted sites are segregated based on specific criteria.
  2. Contrary to previous public statements, the leaked documents confirmed that Google uses click data and user interactions to adjust rankings.
  3. Google representatives have repeatedly stated that they do not use “domain authority” as a ranking factor. The leaked documents revealed the existence of a metric called “siteAuthority” that Google uses internally to influence rankings.
  4. The internal documents contain detailed descriptions and methodologies that contradict the simplified explanations provided to the public, suggesting selective transparency.
  5. Google has positioned the disavow tool as a significant mechanism for webmasters to address bad backlinks. The leaked documents did not show direct integration of disavow data in the core ranking systems, suggesting its actual impact might be limited or used differently than publicly stated.
  6. Google has often dismissed the significance of specific user behavior metrics like dwell time and click-through rate (CTR). The documents confirmed that Google tracks and uses various user behavior metrics extensively to refine search results and rankings.

Holistic Approach 

Our practice has been built on a holistic approach to SEO, and what these documents show is that we are not going to change much, if anything, as this is still the best path forward. A holistic approach means treating a website as a product and SEO as product management. It integrates multiple aspects of website management, design, development, systems, content creation, and user experience to optimize search engine rankings effectively. This strategy starts with a technical SEO foundation, ensuring a clear site structure, mobile optimization, fast page speeds, robust security, and error-free functionality. High-quality content is critical to drive value, relevance, freshness, and natural keyword integration, with comprehensive coverage of topics to meet user intent. User engagement is prioritized by improving click-through rates (CTR), dwell time, and bounce rates, alongside tracking user interaction signals. Site authority is built through a strong backlink profile, enhanced domain authority, and demonstrated expertise. A user-friendly design, accessibility, and interactive elements contribute to a superior user experience. Local SEO is addressed by optimizing local listings and creating relevant local content. Data-driven optimization through analytics, A/B testing, and continuous refinement of site elements ensures performance improvements. Finally, ethical practices that prioritize transparency, honesty, and a user-centered approach are fundamental to preventing penalties and ensuring long-term success. This comprehensive strategy ensures that all aspects of a website work together harmoniously to achieve and maintain high search engine rankings.

So what is missing? 

We still do not know how all of these factors are weighted, and we do not know the exact business motivations behind them. At this point I think it is silly to believe that Google does not have a business interest in its search results, which leads us back to speculation, educated guesses, hypotheses, and testing to figure out what it is doing. The primary motivation for Google’s actions remains clear: return on investment (ROI). By improving site efficiency and enhancing user experience, websites can achieve higher rankings, which in turn benefits Google’s bottom line. Ultimately, more efficient sites lead to better search rankings, reinforcing the importance of optimizing every aspect of your site for both Google and end users.