How Bad Was Google’s Deindexing Bug?
On Friday, April 5, after many website owners and SEOs reported pages falling out of rankings, Google confirmed a bug that was causing pages to be deindexed:
MozCast showed a multi-day increase in temperatures, including a 105° spike on April 6. While deindexing would naturally cause ranking flux, as pages temporarily fell out of rankings and then reappeared, SERP-monitoring tools aren’t designed to separate the different causes of flux.
Can we isolate deindexing flux?
Google’s own tools can help us check whether a page is indexed, but doing this at scale is difficult, and once an event has passed, we no longer have good access to historical data. What if we could isolate a set of URLs, though, that we could reasonably expect to be stable over time? Could we use that set to detect unusual patterns?
Across the month of February, the MozCast 10K daily tracking set had 149,043 unique URLs ranking on page one. I reduced that to a subset of URLs with the following properties:
- They appeared on page one every day in February (28 total times)
- The query did not have expanded sitelinks (i.e., no clear dominant intent)
- The URL ranked at position #5 or better
Since MozCast only tracks page one, I wanted to reduce noise from a URL “falling off” from, say, position #9 to #11. Using these qualifiers, I was left with a set of 23,237 “stable” URLs. So, how did those URLs perform over time?
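As a rough sketch of that filtering step (not the actual MozCast pipeline), the three criteria can be written as one pass over daily ranking rows. The row layout `(day, url, position, has_sitelinks)` and the function name `stable_urls` are assumptions made for illustration. So is the reading that a URL needs a top-5, non-sitelinks appearance on every single day.

```python
from collections import defaultdict

def stable_urls(rows, days_required=28, max_position=5):
    """Return URLs with a qualifying page-one appearance on every tracked day.

    rows: iterable of (day, url, position, has_sitelinks) tuples.
    This layout is hypothetical, not the real MozCast schema.
    """
    qualifying_days = defaultdict(set)
    for day, url, position, has_sitelinks in rows:
        # Count a day only if the URL ranked high enough on a query
        # without sitelinks (a proxy for "no dominant intent").
        if position <= max_position and not has_sitelinks:
            qualifying_days[url].add(day)
    return {url for url, days in qualifying_days.items()
            if len(days) >= days_required}
```

Requiring position #5 or better leaves a buffer, so a URL has to fall at least six positions to leave page one, which is the noise reduction described above.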
Here’s the historical data from February 28, 2019 through April 10. This graph is the percentage of the 23,237 stable URLs that appeared in MozCast SERPs:
Since all of the URLs in the set were stable throughout February, we expect 100% of them to appear on February 28 (which the graph bears out). The change over time isn’t dramatic, but what we see is a steady drop-off of URLs (a natural occurrence of changing SERPs over time), with a distinct drop on Friday, April 5th, a recovery, and then a similar drop on Sunday, April 7th.
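The metric in that graph is simple set arithmetic. Here is a minimal sketch, assuming the stable URLs are held in a Python set and each day's page-one URLs in a dict keyed by ISO date string (both hypothetical structures):

```python
def retention_by_day(stable, daily_urls):
    """Percentage of the stable set that appears on page one each day.

    stable: set of stable URLs.
    daily_urls: dict mapping an ISO date string to the set of URLs
    seen on page one that day (hypothetical structure).
    """
    if not stable:
        raise ValueError("stable set is empty")
    return {
        day: 100.0 * len(stable & urls) / len(stable)
        for day, urls in sorted(daily_urls.items())
    }
```

Any day inside the window used to build the stable set will come out at 100% by construction, which is why the series starts there.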
Could you zoom in for us old folks?
Having just switched to multifocal contacts, I feel your pain. Let’s zoom that Y-axis a bit (I wanted to show you the unvarnished truth first) and add a trendline. Here’s that zoomed-in graph:
The trendline is in purple. The departure from trend on April 5th and 7th is pretty easy to see in the zoomed-in version. The day-over-day drop on April 5th was 4.0%, followed by a recovery, and then a second, very similar 4.4% drop on April 7th.
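To make "departure from trend" concrete, here is a small sketch that fits a trend to the daily percentages and reports how far each day sits from it. It assumes an ordinary least-squares straight line, which may not match the trendline in the graph. The day-over-day figures here are differences in percentage points, and that is also an assumption about how the drops were measured.

```python
def linear_trend(values):
    """Least-squares slope and intercept, using 0, 1, 2, ... as x values."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def departures(values):
    """Observed value minus fitted trend value for each day."""
    slope, intercept = linear_trend(values)
    return [y - (intercept + slope * x) for x, y in enumerate(values)]

def day_over_day(values):
    """Change from the previous day, in percentage points."""
    return [b - a for a, b in zip(values, values[1:])]
```

One caveat: fitting the line to a series that includes the anomalous days pulls the trend toward them slightly, so a stricter version would fit only on the days before the suspected event.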
Note that this metric moved very little during March’s algorithm flux, including the March “core” update. We can’t prove definitively that the stable URL drop cleanly represents deindexing, but the metric does not appear to be affected much by typical Google algorithm updates.
What about dominant intent?
I purposely removed queries with expanded sitelinks from the analysis, since those are highly correlated with dominant intent. I hypothesized that dominant intent might mask some of the effects, as Google is highly invested in surfacing specific sites for those queries. Here’s the same analysis just for the queries with expanded sitelinks (this yielded a smaller set of 5,064 stable URLs):
Other than minor variations, the pattern for dominant-intent URLs appears to be very similar to the previous analysis. It appears that the impact of deindexing was widespread.
Was it random or systematic?
It’s difficult to determine whether this bug was random, affecting all sites somewhat equally, or was systematic in some way. It’s possible that restricting our analysis to “stable” URLs is skewing the results. On the other hand, trying to measure the instability of inherently unstable URLs is a bit nonsensical. I should also note that the MozCast data set is skewed toward so-called “head” terms. It doesn’t contain many queries in the very long tail, including natural-language questions.
One question we can answer is whether large sites were impacted by the bug. The graph below isolates our “Big 3” in MozCast: Wikipedia, Amazon, and Facebook. This reduced the set to 2,454 stable URLs. Unfortunately, the deeper we dive, the smaller the data set gets:
At the same 90–100% zoomed-in scale, you can see that the impact was smaller than across all stable URLs, but there’s still a clear pair of April 5th and April 7th dips. It doesn’t appear that these mega-sites were immune.
Looking at the day-over-day data from April 4th to 5th, it appears that the losses were widely distributed across many domains. Of domains that had 10 or more stable URLs on April 4th, roughly half saw some loss of ranking URLs. The only domains that experienced 100% day-over-day loss were those that had 3 or fewer stable URLs in our data set. It does not appear from our data that deindexing systematically targeted specific sites.
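A per-domain breakdown like this one can be sketched as follows. The sketch assumes two sets of full URLs, with scheme: the stable URLs ranking on the earlier day and those still ranking on the later day. It groups by hostname, so `www.example.com` and `example.com` count as separate domains, and the function name `domain_loss` is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

def domain_loss(ranked_before, ranked_after, min_urls=10):
    """Fraction of each domain's URLs that dropped out between two days.

    ranked_before / ranked_after: sets of full URLs (with scheme).
    Only domains with at least min_urls URLs on the earlier day are kept.
    """
    before = Counter(urlsplit(u).hostname for u in ranked_before)
    # Counter returns 0 for domains that lost nothing.
    lost = Counter(urlsplit(u).hostname for u in ranked_before - ranked_after)
    return {domain: lost[domain] / count
            for domain, count in before.items() if count >= min_urls}
```

The share of domains with any loss is then `sum(f > 0 for f in losses.values()) / len(losses)`, where `losses` is the returned dict. A concentration of values near 1.0 among larger domains would hint at systematic targeting.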
Is this over, and what’s next?
As one of my favorite movie quotes says: “There are no happy endings because nothing ever ends.” For now, indexing rates appear to have returned to normal, and I suspect that the worst is over, but I can’t predict the future. If you suspect your URLs have been deindexed, it’s worth manually reindexing in Google Search Console. Note that this is a fairly tedious process, and there are daily limits in place, so focus on critical pages.
The impact of the deindexing bug does appear to be measurable, although we can argue about how “big” 4% is. For something as consequential as sites falling out of Google rankings, 4% is quite a bit, but the long-term impact for most sites should be minimal. For now, there’s not much we can do to adapt — Google is telling us that this was a true bug and not a deliberate change.