
How to fix "Discovered – currently not indexed"

“Discovered – Currently not indexed” is a common indexing issue in Google Search Console. It means Google knows your page exists, but hasn’t visited it yet. 

This article explains why this happens and provides solutions.

Four reasons for “Discovered – currently not indexed”

There are four reasons for “Discovered – currently not indexed”: 

  1. Your website is slow and Google limits crawling to avoid overloading your servers. 
  2. The page is very new so Google hasn’t had a chance to visit and index it yet. Google may crawl and index it soon. 
  3. Your entire website is not a priority for Google. 
  4. One page of your website is not a priority for Google. 

Let’s explore the reasons and solutions for this problem.

Reason: your website is slow

Google tries to limit crawling on slow websites to avoid overloading servers. 

As a result, many pages can be classified as: “Discovered – currently not indexed”. 

Why does this happen? As Google’s Webmaster Trends Analyst, Gary Illyes, pointed out, Google aims to be a “good citizen of the web.” Faster websites allow for more crawling and indexing. 

The rule is simple: the faster your website responds, the more Google is willing to crawl and index. 

How to analyze website performance 

To analyze performance, we recommend using ZipTie.dev. As you can see on the screenshot below, this website has massive problems with performance – many metrics, including response time, are very poor, much worse than other websites in our database.

This is something that can negatively affect indexing. As a result, Google may classify many pages as “Discovered – currently not indexed”.
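If you want a quick, do-it-yourself signal before running a full performance audit, you can spot-check server response times with a few lines of Python. This is only a minimal sketch with placeholder URLs, not a substitute for a proper performance report:

```python
# Minimal sketch: spot-check server response times for a handful of URLs.
# The URLs are hypothetical placeholders - replace them with your own pages.
import statistics

import requests

URLS = [
    "https://www.example.com/",
    "https://www.example.com/category/widgets",
    "https://www.example.com/products/widget-1",
]

timings = []
for url in URLS:
    resp = requests.get(url, timeout=30)
    # `elapsed` covers the time until the response headers arrive - a rough
    # proxy for how quickly the server responds to Googlebot's requests.
    seconds = resp.elapsed.total_seconds()
    timings.append(seconds)
    print(f"{url} -> {resp.status_code} in {seconds:.2f}s")

print(f"Median response time: {statistics.median(timings):.2f}s")
```

If response times are consistently high, that is a signal worth investigating before anything else.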

Google’s documentation is wrong

Google’s Page Indexing Report documentation states that the typical cause of the “Discovered – currently not indexed” status is Google not wanting to overload the site. 

Based on my observations, this is extremely inaccurate. In fact, it’s just one of many factors. 

On many occasions, I’ve seen websites with fast servers that were still struggling with “Discovered – currently not indexed” errors.

Why? Because either crawl demand was extremely low, or Google was busy crawling other pages. 

Reason: your entire website is not a priority for Google 

Your website competes with the entire internet for Googlebot’s attention. This applies to both new and established websites. For new websites, Google needs time to collect signals about your site. However, even well-established websites can struggle with crawl demand.

Which signals make Google crawl more? Google is not willing to release the details of its secret sauce. However, based on public information provided by Google, we can compile a list of such signals.

So you should find a way to optimize your website for these signals. 

Reason: your page is not a priority for Google

It may also be the case that while the crawl budget for your overall website is high, Google might choose not to crawl specific pages due to either a lack of link signals, predictions of low quality, or because of duplicate content.

Further explanations for each of these causes are provided below:

  1. A page lacks the required signals to convince Google that it is important, for instance, if there are no links pointing to the page.
  2. Google’s algorithms predict that a page won’t be helpful for users. That is, it matches the pattern of low-quality pages, so Google decides not to crawl it. 
  3. A page is a duplicate. Google recognizes patterns of duplicative URLs and tries to avoid crawling and indexing that type of content. As Google’s Search Advocate, John Mueller, stated: “[…] if we’ve discovered a lot of these duplicate URLs, we might think we don’t actually need to crawl all of these duplicates because we have some variation of this page already in there.” 

What you should do

Step 1: Assess the Severity

The first priority is to check if the problem with “Discovered – currently not indexed” is severe. 

Go to Search Console, open Indexing -> Pages, and then scroll down to the “Why pages aren’t indexed” section. Then check how many pages are affected. Is it just a small percentage of your website, or a massive part of it? 

Step 2: Review Affected Pages

Then review a sample of affected pages. You can do this by clicking on “Discovered – currently not indexed”. 

If there are just a few important pages classified as “Discovered – currently not indexed”, make sure that there are internal links pointing to them and that these pages are included in your sitemaps. This easy fix should work in most cases. 
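If you want to script this check instead of doing it by hand, the sketch below (with hypothetical URLs) verifies whether a page is listed in your XML sitemap and whether selected hub pages link to it:

```python
# Minimal sketch: check whether a page is in the sitemap and whether hub pages
# link to it internally. All URLs are hypothetical placeholders.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

SITEMAP_URL = "https://www.example.com/sitemap.xml"
HUB_PAGES = ["https://www.example.com/", "https://www.example.com/blog/"]
TARGET_URL = "https://www.example.com/blog/my-new-article/"

# 1) Sitemap check: does the target appear among the <loc> entries?
sitemap_xml = requests.get(SITEMAP_URL, timeout=30).text
print(f"In sitemap: {TARGET_URL in sitemap_xml}")

# 2) Internal link check: does any hub page link to the target?
for hub in HUB_PAGES:
    html = requests.get(hub, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    links = {urljoin(hub, a["href"]) for a in soup.find_all("a", href=True)}
    print(f"{hub} links to target: {TARGET_URL in links}")
```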

Step 3: Address the Issue Thoroughly

If there are many important pages classified as “Discovered – currently not indexed”, that means the issue is broader. Here’s your action plan: 

  1. Ensure your server is healthy.
  2. Focus on improving your website’s quality. Enhance the existing content and ensure that Google indexes only high-quality content. As explained in The Hidden Risk of Low-Quality Content, Google assesses quality based on indexed pages. To maintain a good reputation, only allow the indexing of high-quality content.
  3. Review your crawl budget. Make sure Google isn’t wasting resources on crawling low-quality pages.

How to analyze the crawl budget using ZipTie.dev? 

To analyze your crawl budget you can use ZipTie.dev. In this case just 1% of URLs are indexable, meaning 99% of the URLs Google can visit are not intended for indexing. It’s a total waste of crawl budget: for every 1 indexable URL, Google has to visit 99 non-indexable URLs. 
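To see how such a ratio comes about, here is a minimal sketch of the kind of per-URL indexability check a crawler performs (HTTP status, meta robots, canonical). The URL is a hypothetical placeholder, and the robots.txt check is omitted for brevity:

```python
# Minimal sketch: classify a single URL as indexable or not, based on status
# code, X-Robots-Tag / meta robots noindex, and the declared canonical.
# A crawler applies this to every discovered URL to compute the ratio.
import requests
from bs4 import BeautifulSoup


def is_indexable(url):
    resp = requests.get(url, timeout=30)
    if resp.status_code != 200:
        return False, f"status code {resp.status_code}"
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        return False, "noindex in X-Robots-Tag header"
    soup = BeautifulSoup(resp.text, "html.parser")
    robots_meta = soup.find("meta", attrs={"name": "robots"})
    if robots_meta and "noindex" in robots_meta.get("content", "").lower():
        return False, "noindex in meta robots"
    canonical = soup.find("link", rel="canonical")
    if canonical and canonical.get("href", "").rstrip("/") not in ("", url.rstrip("/")):
        return False, f"canonicalized to {canonical['href']}"
    return True, "indexable"


print(is_indexable("https://example.com/products?refid=45454"))
```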

Finally, show Google that a page is important by creating internal links that point to it.

FAQ

What is “Discovered Currently Not Indexed”?

“Discovered – currently not indexed” means Google knows about your page (through sitemaps or internal links) but hasn’t crawled or indexed it yet. This is different from “Crawled – currently not indexed”, where the page has already been crawled but Google decided not to add it to its index.

What causes pages to be “Discovered Currently Not Indexed”?

Common causes include low-quality or duplicate content that Google chooses not to index, limited crawl priority for new or less authoritative sites, technical issues preventing proper indexing, and content that doesn’t meet Google’s quality thresholds.

How can I check which pages are affected?

In Google Search Console, go to Indexing -> Pages, then find “Discovered – currently not indexed” under “Why pages aren’t indexed”. This shows a sample of affected URLs (up to 1,000).

Why isn’t Google indexing my important pages?

Google might not index pages if they’re similar to existing indexed content, the content quality is deemed insufficient, technical issues make indexing difficult, or the site’s overall authority is still developing.

How can I fix this issue?

Focus on these key areas: create unique, high-quality content, ensure a proper technical setup (clean URLs, proper internal linking), build site authority through quality backlinks, remove or fix low-quality and duplicate content, and use tools like Google PageSpeed Insights to check technical health.

When should I be concerned?

Be concerned if many important pages remain unindexed, the issue persists for several weeks, your competitors’ similar content is being indexed, or you’re losing traffic due to non-indexed pages.

Wrapping up

By following these steps, you can address the “Discovered – Currently not indexed” issue in Google Search Console and improve your website’s visibility in the search results. Keep monitoring your website’s performance in the Search Console and make necessary adjustments to maintain optimal visibility.

How to fix Crawled – currently not indexed

Crawled – currently not indexed is one of the most common reasons that a page isn’t indexed. Let’s explore how to fix this issue on your website!

When Google shows “Crawled – currently not indexed” in Search Console, it means Google has seen your page but decided not to include it in search results. This typically happens when Google finds issues with your content quality or technical setup. You can fix this by making your content better, solving any technical problems, and linking to the page from other pages on your site. Tools like Google Search Console and ZipTie.dev can help you track and improve how well Google indexes your pages.

How to check if a page is classified as “Crawled – currently not indexed” 

To find affected pages, go to Google Search Console and click on Indexing -> Pages. 

Click on “Crawled – currently not indexed” to see a sample of up to 1,000 URLs. 

Three most common reasons for “Crawled – Currently not Indexed”

These are the three most common reasons:

  1. Google considers the page low-quality.
  2. There are problems caused by JavaScript SEO issues (i.e. Google doesn’t see content generated by JavaScript).  Yes, it still happens in 2024!
  3. The most surprising reason: Google isn’t convinced about the website overall.

Let’s look deeper into these three reasons.

Let’s start with the most surprising reason: Google isn’t convinced about the website overall.

It might be surprising, but a page might not be indexed because Google isn’t convinced about the overall quality of the whole website. 

As John Mueller from Google explains, when Google’s algorithms aren’t sure about the overall quality of a website, they might crawl a URL but decide not to index it. 

As ZipTie.dev shows, in the case of Growly.io the vast majority of pages are indexed, so we assume this website is not suffering from quality issues.

2. Website suffers from JavaScript SEO issues

Sometimes a page might be high quality, but Google can’t see its value because it can’t render the website properly. This could be due to JavaScript SEO issues. 

So let’s imagine your main content is generated by JavaScript. If Google cannot render your JavaScript content, then your main content won’t be visible to Google and Google will wrongly(!) judge your pages as low quality. 

This can eventually lead to both indexing and ranking problems. 
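A rough way to approximate this check yourself is to fetch the raw HTML (without rendering) and look for phrases that should be part of the main content. The URL and phrases below are hypothetical placeholders; if the phrases are missing from the initial HTML, they are most likely injected by JavaScript, and Google has to render the page to see them:

```python
# Minimal sketch: is the key content present in the raw, unrendered HTML?
import requests
from bs4 import BeautifulSoup

URL = "https://www.example.com/products/widget-1"
KEY_PHRASES = ["Widget 1 technical specification", "Add to cart"]

raw_html = requests.get(URL, timeout=30).text
visible_text = BeautifulSoup(raw_html, "html.parser").get_text(" ", strip=True)

for phrase in KEY_PHRASES:
    found = phrase.lower() in visible_text.lower()
    print(f"{phrase!r}: {'present in raw HTML' if found else 'MISSING - likely JS-injected'}")
```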

Step 1: Use ZipTie.dev to check JavaScript dependencies 

You can use ZipTie to fully check JavaScript dependencies. This way you will see a list of pages with the highest JavaScript dependencies. 

Then, if you check them in Google Search Console and see they are classified as “Crawled – currently not indexed”, it’s a sign the problem may be caused by JavaScript SEO issues. 

Let’s take it step by step. 

When you create an audit, enable the option for JavaScript rendering. 

Then you can see the average JS dependency:

In this case, JavaScript has a medium impact on the website which indicates the website may have indexing problems due to JavaScript. 

Then we can navigate to the list of pages with the highest JS dependency: 

Step 2: Check if Google can properly render your page. 

Now that we know which pages rely on JavaScript the most, it’s time to check if Google can properly render them. 

I explain this in my article: How to Check if Google properly renders your JavaScript content. 

3. Reason: A page is unhelpful or low-quality

Google aims to provide users with relevant and high-quality content. If a page is low quality or outdated, it might not be indexed. 

To objectively assess your content’s quality, use Google’s list of Content and Quality questions.

What I do quite often is visit the Content Analysis section of ZipTie.dev.

And then I scroll through this report to see the differences between pages that are indexed and pages that are not indexed. 

As we can see on the screenshot above, it seems the average number of paragraphs for pages that aren’t indexed is 13, which is 44% lower than for pages that are indexed. We can then formulate the hypothesis that, on this website, Google is not willing to index pages with less content. 

Using ZipTie to find patterns of unindexed pages 

Also, GSC shows limited data – up to 1,000 URLs. What I like to do is use ZipTie and set the filters “Indexable: yes” and “Indexed: no”. 

Then I analyze patterns by looking at the URLs. As we can see on the screenshot below, the pages that tend not to be indexed are: 

Another option is to sort by main content word count. Pages with the lowest number of main content words are likely to be not indexed in Google (Google is not willing to index pages with little to no content).
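If you prefer to run this comparison on exported data, the sketch below computes average main-content word counts for indexed vs. non-indexed pages. It assumes a hypothetical CSV export with “url”, “indexed” and “main_content_word_count” columns – adjust the column names to whatever your export actually uses:

```python
# Minimal sketch: compare main-content word counts for indexed vs. non-indexed
# pages, using a hypothetical crawl export (crawl_export.csv).
import csv
from statistics import mean

indexed, not_indexed = [], []

with open("crawl_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        words = int(row["main_content_word_count"])
        (indexed if row["indexed"].lower() == "yes" else not_indexed).append(words)

print(f"Average main-content words (indexed):     {mean(indexed):.0f}")
print(f"Average main-content words (not indexed): {mean(not_indexed):.0f}")
```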

If you want to make them indexed, follow the workflow I present in the latter part of the article. 

The two-step workflow to fix “Crawled – Currently not Indexed”

At a glance, the workflow to fix “Crawled – but not Indexed” is very easy: 

  1. Identify the reason why the page isn’t indexed (e.g., low-quality content, JavaScript issues, or website quality). To perform this step, I use a mix of ZipTie and Google Search Console. 
  2. Address the specific issue (e.g., improve the content, fix JavaScript problems, or enhance overall website quality).

All the steps are presented on the diagram below: 

First, you need to check if your content is good enough and if there are any technical problems. Make your content better by adding unique, helpful information. Also check if JavaScript is causing any issues. Once you’ve made these fixes, go to Google Search Console and ask Google to index your page.

Take control of your deindexed pages

In this article, I explained that one of the reasons for the status “Crawled – but not Indexed” is when a previously indexed page gets removed from the index. This occurrence is quite common, particularly during core updates.

Luckily, ZipTie.dev provides an Indexing Monitoring module that can help you with this. By using ZipTie, you can easily identify the specific URLs that have been deindexed by Google.

Don’t hesitate to give it a try and see the benefits for yourself! Check out our 14-day trial FREE OF CHARGE. 

The Pitfalls of Dynamic Rendering: How Disqus.com Lost 90% of Traffic

A few years ago, in 2018, I noticed a significant decline in traffic for Disqus.com, a website for a popular commenting plugin. Intriguingly, this drop coincided with Google’s March 2018 update. They haven’t recovered since then.

However, they could implement a simple fix to get back on track.

A Misguided Recovery Effort

You might assume that the site was impacted by the core update and that following Google’s recommendations for such situations would be the way forward: Google advises focusing on content quality after a core update.

The natural response would be for Disqus.com to invest in high-quality copywriters and content editors. 

However, even if they hired the best in the business, it would not have made a significant difference. 

Why? Because their problem was technical, not content related.

The technical challenge

Disqus.com uses a process called dynamic rendering. This means they present different versions of their site to Googlebot and to users. Googlebot gets a simplified, prerendered version, while users see a fully-featured version.

Unfortunately, the prerendering process failed, resulting in Googlebot receiving empty content. The HTML page body lacked any substance. This can be verified using the Mobile-Friendly Tester.

The Root Cause of the Problem

The root of the problem lies in Disqus.com’s use of dynamic rendering. It is likely that the mechanism that generates a static version for Googlebot malfunctioned. I’ve seen similar cases before, which resulted in a 50-90% decrease in organic traffic. The causes usually involved exceeding available RAM or errors during the rendering process.

A Possible Solution

Given that Disqus.com is a relatively small but popular website, the simplest solution might be to remove the dynamic rendering and observe how Google renders the content. Currently, their broken dynamic rendering system is sabotaging their Google traffic.

We contacted Disqus about this very issue but they didn’t respond. 

Google’s Perspective on Dynamic Rendering

In 2019, Google recommended dynamic rendering as a temporary fix for rendering problems. However, they recently updated their documentation to suggest that dynamic rendering is not a long-term solution. Instead, they now advocate server-side rendering.

Lessons to Learn from Disqus.com‘s Mistakes

As SEO professionals, we often rely on SEO crawlers to identify issues. While these tools are certainly useful, we sometimes overlook the basics, such as viewing websites from Google’s perspective.

A simple, quick check can be performed as follows:
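One way to approximate this check in code is to request the same URL with a regular browser user agent and with a Googlebot user agent and compare how much text comes back. This is only a rough sketch with a placeholder URL – many prerendering setups verify Googlebot by IP address, so the URL Inspection tool in Search Console remains the authoritative check – but a large gap is a strong hint that dynamic rendering is misbehaving:

```python
# Minimal sketch: compare what a browser and a Googlebot user agent receive.
import requests
from bs4 import BeautifulSoup

URL = "https://www.example.com/"
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for name, user_agent in USER_AGENTS.items():
    html = requests.get(URL, headers={"User-Agent": user_agent}, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    print(f"{name}: {len(text)} characters of visible text")
```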

This straightforward check can quickly help identify potential problems with how Google sees your content and could aid in resolving some indexing issues. Stay tuned for more insights!

Fixing “Discovered – currently not indexed” in Google Search Console

“Discovered – Currently not indexed” is a common issue in Google Search Console. It means Google knows about your page through links or via a sitemap, but hasn’t visited it yet. 

This article explains why this happens and provides solutions.

Four reasons for “Discovered – currently not indexed”

There are four reasons for “Discovered – currently not indexed”: 

  1. Your website is new. 
  2. The page is very new so Google hasn’t had a chance to visit and index it yet. Google may crawl and index it soon. 
  3. Your website is slow and Google limits crawling to avoid overloading your servers. 
  4. Your website is not a priority for Google. 

Let’s explore the reasons and solutions for this problem.

Your website is slow

Google limits crawling on slow websites to avoid overloading servers. 

As a result, many pages can be classified as: “Discovered – currently not indexed”. 

Why does this happen? As Google’s Webmaster Trends Analyst, Gary Illyes, pointed out, Google aims to be a “good citizen of the web.” Faster websites allow for more crawling and indexing. 

The rule is simple: the faster your website responds, the more Google is willing to crawl and index. 

Website speed is just one of the factors

Google’s Page Indexing Report documentation states that the typical cause of the “Discovered – currently not indexed” status is Google not wanting to overload the site. 

Based on my observations, this is extremely inaccurate. On many occasions, I’ve seen websites with fast servers that were still struggling with “Discovered – currently not indexed” errors.

Why? Because either crawl demand was extremely low, or Google was busy crawling other pages. 

Your website is not a priority for Google 

Your website competes with the entire internet for Googlebot’s attention. This applies to both new and established websites. For new websites, Google needs time to collect signals about your site. However, even well-established websites can struggle with crawl demand.

Which signals make Google crawl more? Google is not willing to release the details of its secret sauce. However, based on public information provided by Google, we can compile a list of such signals.

Your page is not a priority for Google

It may also be the case that while the crawl budget for your overall website is high, Google might choose not to crawl specific pages due to either a lack of link signals, predictions of low quality, or because of duplicate content.

Further explanations for each of these causes are provided below:

  1. A page lacks the required signals to convince Google that the page is important, for instance, if there are no links pointing to the page.
  2. Google’s algorithms predict that a page won’t be helpful for users. That is, it matches the pattern of low-quality pages, so Google decides not to crawl it. 
  3. A page is a duplicate. Google recognizes patterns of duplicative URLs and tries to avoid crawling and indexing that type of content. As Google’s Search Advocate, John Mueller, stated: “[…] if we’ve discovered a lot of these duplicate URLs, we might think we don’t actually need to crawl all of these duplicates because we have some variation of this page already in there.” 

We can simplify all of this detail with the below calculation:

What you should do

Step 1: Assess the Severity

The first priority is to check if the problem with “Discovered – currently not indexed” is severe. 

Go to Search Console, open Indexing -> Pages, and then scroll down to the “Why pages aren’t indexed” section. Then check how many pages are affected. Is it just a small percentage of your website, or a massive part of it? 

Step 2: Review Affected Pages

Then review a sample of affected pages. You can do this by clicking on “Discovered – currently not indexed”. 

If there are just a few important pages classified as “Discovered – currently not indexed”, make sure that there are internal links pointing to them and that these pages are included in your sitemaps. This easy fix should work in most cases. 

Step 3: Address the Issue Thoroughly

If there are many important pages classified as “Discovered – currently not indexed”, that means the issue is broader. Here’s your action plan: 

  1. Ensure your server is healthy.
  2. Focus on improving your website’s quality. Enhance the existing content and ensure that Google indexes only high-quality content. As explained in The Hidden Risk of Low-Quality Content, Google assesses quality based on indexed pages. To maintain a good reputation, only allow the indexing of high-quality content.
  3. Review your crawl budget. Make sure Google isn’t wasting resources on crawling low-quality pages.
  4. Show Google that a page is important by creating internal links that point to the page.

By following these steps, you can address the “Discovered – Currently not indexed” issue in Google Search Console and improve your website’s visibility in the search results. Keep monitoring your website’s performance in the Search Console and make necessary adjustments to maintain optimal visibility.

Google Search Console is not enough to solve your indexing problems

People frequently ask us about the differences between ZipTie.dev and the indexing reports provided by Google Search Console (GSC). While GSC is a useful tool, it has limitations that can make it challenging to use for certain tasks. Here are five ways in which GSC might fall short:

1. GSC Doesn’t Show all Unindexed URLs

GSC’s indexing reports such as “Crawled currently not indexed” or “Discovered currently not indexed” only display up to 1000 URLs. 

This might be enough for small websites, but for larger ones dealing with thousands or even millions of unindexed URLs, it’s insufficient.

In contrast, ZipTie.dev provides a comprehensive list of all unindexed URLs. This allows you to:

2. Lack of Detailed Information in GSC

GSC provides scant information about a particular URL; it doesn’t even display the title. Picture this: Google de-indexes 30% of your pages. When you go to GSC to pull up a sample of pages, you’re uncertain about their business relevance. You have to download a sample and run it through your SEO crawler, such as Screaming Frog, to glean more information. This process is time-consuming and inconvenient.

In contrast, ZipTie.dev provides detailed information about URLs. If you need more insights, you can easily export a detailed CSV file or click on ‘See Details’ to view things like:

3. GSC Doesn’t Specify which URLs were Recently Deindexed

Consider this scenario: your business is experiencing a drop in traffic, and you notice that Google has deindexed 30% of your URLs. However, GSC doesn’t specify which URLs were deindexed due to its 1000 URL limit.

But there’s a solution! ZipTie provides Index Monitoring, allowing you to easily compare data between two indexing checks and identify the exact list of URLs that Google has deindexed. This knowledge is the first step towards addressing indexing issues.

Below you can see screenshots of how it works. As you can see, the website encountered a massive indexing drop near the date of the Google Spam update. 

Then you can see the exact list of URLs that were deindexed by Google, as presented in the screenshot below: 

Knowing which URLs got deindexed is the first step to fixing indexing problems. 

4. GSC Reports many Irrelevant URLs

It’s frustrating when you see that 30% of your URLs aren’t indexed, but you don’t know how many of these are actually valuable to your business. GSC tends to report URLs with parameters and random URLs that Google has discovered, which aren’t always useful.

With ZipTie, you can precisely check the indexing status of your URLs. You can export a full list of unindexed URLs and then: 

Random URLs Google Search Console reports:
  example.com/products?refid=45454
  example.com/products?refid=43434343
  example.com/products?refid=5454445
  example.com/products?refid=6636
  example.com/products?refid=45454
  example.com/products?refid=45454

ZipTie’s check:
  example.com/products/product1
  example.com/products/product2
  example.com/products/product3
  example.com/products/product4
  example.com/products/product5
  example.com/products/product6
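Once you have a full export of unindexed URLs, grouping them by path pattern makes the parameter noise obvious. The sketch below (with a hypothetical URL list standing in for the export) strips query parameters and counts how many unindexed URLs fall under each pattern:

```python
# Minimal sketch: group unindexed URLs by path, collapsing query parameters,
# so parameter noise and genuinely valuable pages are easy to tell apart.
from collections import Counter
from urllib.parse import urlsplit

unindexed_urls = [
    "https://example.com/products?refid=45454",
    "https://example.com/products?refid=43434343",
    "https://example.com/products/product1",
    "https://example.com/products/product2",
]

patterns = Counter()
for url in unindexed_urls:
    parts = urlsplit(url)
    patterns[parts.path + ("?<params>" if parts.query else "")] += 1

for pattern, count in patterns.most_common():
    print(f"{count:>5}  {pattern}")
```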

5. GSC Provides Outdated Information

Often, when we inspect URLs reported as “not indexed” in GSC, we find that they are, in fact, indexed. This is because GSC sometimes shows outdated information in their Index Coverage section.

Let me show you an example from Onely.com.  Here, Google Search Console reported the following URL: https://www.onely.com/blog/difference-between-featured-snippets-and-rich-results-explained/  as “crawled – currently not indexed”.

When I clicked on “URL inspection” it turned out that the URL is indexed. 

I double-checked it using the “site” command – it was clear that the page is indexed in Google.

Google was just showing outdated information in their Index Coverage section. 
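If you need to double-check the index status of more than a handful of URLs, the Search Console URL Inspection API returns the live status programmatically. The sketch below is only an outline: it assumes you have already created a service account with access to the property (the credentials file path and site URL are placeholders), and the API has a daily per-property quota, so it suits samples rather than full crawls:

```python
# Minimal sketch: check the live index status of a URL via the Search Console
# URL Inspection API. Requires google-api-python-client and credentials that
# have been granted access to the Search Console property.
from google.oauth2 import service_account
from googleapiclient.discovery import build

creds = service_account.Credentials.from_service_account_file(
    "service_account.json",  # placeholder path to your credentials file
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

response = service.urlInspection().index().inspect(
    body={
        # siteUrl must match your property exactly (use "sc-domain:example.com"
        # for domain properties).
        "siteUrl": "https://www.example.com/",
        "inspectionUrl": "https://www.example.com/blog/my-article/",
    }
).execute()

status = response["inspectionResult"]["indexStatusResult"]
print(status.get("verdict"), "-", status.get("coverageState"))
```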

Wrapping up

Indexing issues are becoming increasingly common and affect all kinds of websites, from small personal blogs to large e-commerce platforms. I’ve highlighted five main reasons why relying solely on GSC isn’t enough to address these issues.

Don’t waste time trying to tweak suboptimal indexing in GSC. Use ZipTie to take control of your indexing, save time, enhance SEO, and help you effectively solve your indexing problems.

Try our 14-day free trial today!

“Crawled – Currently not Indexed” Finally Explained

Crawled but not indexed is a common reason that a page isn’t indexed. Let’s explore how to fix this issue for your website!

What do we know about Crawled – Currently not Indexed? This could mean one of two things:

  1. The page was indexed beforehand, but later, it was deindexed. This is still quite common.
  2. The page has never been indexed. It was crawled, but Google chose not to include it in its index.

Three most common reasons for “Crawled – Currently not Indexed”

There are three most common reasons:

  1. Google considers the page irrelevant or low-quality.
  2. There are problems caused by JavaScript SEO issues (i.e. Google doesn’t see content generated by JavaScript). 
  3. The most surprising reason: Google isn’t convinced about the website overall.

Let’s look deeper into these three reasons

1. Google isn’t convinced about the website overall

It might be surprising, but a page might not be indexed because Google isn’t convinced about the overall quality of the website. As John Mueller from Google explains, when Google’s algorithms aren’t sure about the overall quality of a website, they might crawl the URL but decide not to index it.

2. Website suffers from JavaScript SEO issues

Sometimes a page might be high quality, but Google can’t see its value because it can’t render the website properly. This could be due to JavaScript issues. 

To check if this is the problem, use the URL Inspection tool in Google Search Console to see how Google renders your content. Make sure important elements are visible and accessible.

To illustrate the problem I like to use the Disqus.com case. Users get a fully-featured website. 

And Google is getting an empty page. This is something that you will notice by using the URL Inspection tool in Google. 

How users see the page vs. how Googlebot sees the page (screenshot comparison)

If you see your page as high quality, but still the page is classified as “Crawled – currently not indexed”, check in the URL Inspection Tool to see how Google renders your content.

3. A page is unhelpful or low quality

Google aims to provide users with relevant and high-quality content. If a page is low quality or outdated, it might not be indexed. 

To objectively assess your content’s quality, use Google’s list of Content and Quality questions.

How to check if a page is classified as “Crawled – currently not indexed”? 

To find affected pages, go to Google Search Console and click on Indexing -> Pages. 

Click on “Crawled – currently not indexed” to see a sample of up to 1,000 URLs. 

Remember that the report might show outdated information, so double-check using the URL Inspection tool.

The workflow to fix “Crawled – Currently not Indexed”

Here’s a workflow to fix “Crawled – but not Indexed” issues:

  1. Identify the reason why the page isn’t indexed (e.g., low-quality content, JavaScript issues, or website quality).
  2. Address the specific issue (e.g., improve the content, fix JavaScript problems, or enhance overall website quality).

Take control of your deindexed pages

In this article, I explained that one reason for the status “Crawled – but not Indexed” is when a previously indexed page gets removed from the index. This occurrence is quite common, particularly during core updates.

Luckily, ZipTie.dev provides an Indexing Monitoring module that can help you with this. By using ZipTie, you can easily identify the specific URLs that have been deindexed by Google. Don’t hesitate to give it a try and see the benefits for yourself! Check out our 14-day trial!

How to deal with “Duplicate, Google chose different canonical than user”

In this article, we will simplify a commonly seen Google-indexing problem: “Duplicate, Google chose different canonical than user.” 

In simple terms, this status means that Google didn’t agree with your canonical hints and chose a different page as the canonical version. As a result, the page is not in Google’s index.

Page: Page A
Canonical proposed by the website: Page B
Page selected by Google as canonical: Page C

There are three common reasons for this status. Two of them are less obvious to most SEOs, yet they occur quite frequently. 

How to navigate “Duplicate, Google chose different canonical than user”

First, I will show you some basics on how to navigate the report, how to check the number of affected pages, as well as how to check a sample of the affected pages.  

Then we will discuss how to fix this indexing problem.  

Finding Out How Many Pages Are Affected

First, you need to find out how many pages are affected.  To do this, go to Google Search Console and click on  Indexing -> Pages. 

In this case, almost half a million pages are affected: 

Looking at Examples of Affected Pages

When you click on “Duplicate, Google chose different canonical than user”, you can view a sample of affected pages (up to 1k URLs).

Figuring Out Which Pages Google Chose as the Canonical One

To figure out which page Google thinks is the canonical one, you need to use the “Inspect URL” feature.

To do this, hover over the URL and click on the icon of the magnifying glass.

Then you will get information that a page is not on Google:

Finally, scroll down to the ‘Page indexing’ section to see which URL Google treats as the main one. 
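Before comparing your canonical with Google’s choice, it’s worth confirming what canonical your pages actually declare – misconfigured templates sometimes output a different canonical than you expect. Here is a minimal sketch (with a placeholder URL) that reads the canonical from both the HTTP Link header and the HTML:

```python
# Minimal sketch: extract the canonical URL a page declares, from the HTTP
# Link header or the <link rel="canonical"> tag.
import requests
from bs4 import BeautifulSoup


def declared_canonical(url):
    resp = requests.get(url, timeout=30)
    link_header = resp.headers.get("Link", "")
    if 'rel="canonical"' in link_header:
        # Naive parsing: assumes the canonical link is the first entry.
        return link_header.split("<", 1)[1].split(">", 1)[0]
    tag = BeautifulSoup(resp.text, "html.parser").find("link", rel="canonical")
    return tag.get("href") if tag else None


page = "https://www.example.com/products/widget-1?color=red"
print(f"{page} declares canonical: {declared_canonical(page)}")
```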

How to handle pages classified as “Duplicate, Google chose different canonical than user”

Once you’ve figured out which page Google considers to be the main one, compare it to your chosen page.

Case 1: The Pages Look Similar

If the pages look pretty much the same, it makes sense that Google chose one of them as a duplicate; Google tries to keep its index clean and simple. 

If you want both pages to show up in Google, you will need to make them clearly different.

Case 2: The Pages Don’t Look Similar

Sometimes, Google gets it wrong and thinks two totally different pages are duplicates. 

For example, as I explained in the article: “Google’s duplicate content detection is wrong”, Google once thought that e-commerce pages offering an iPhone and a JBL speaker were the same.

So why is it even possible that Google might classify two totally unrelated pages as duplicates? 

This could be due to:

The Problem Might Be JavaScript

Gary Illyes from Google has warned that websites using a lot of JavaScript might run into duplicate-content issues. 

As Gary explained, in the case of JavaScript-heavy websites, Google usually can’t render the pages properly. 

As a result, Google won’t see any content – that can obviously make Google think these pages are in fact duplicates. 

This issue with JavaScript can cause other problems too, like ranking issues and problems with “soft 404” errors. 

As part of every audit, you need to make sure that Google can correctly understand your JavaScript content. We will shortly publish an article about how to audit JavaScript SEO, so stay tuned! 

Patterns of Duplicate Content 

We’ve already talked about two possible reasons for duplicate content: problems with JavaScript and pages that look too similar. 

But there’s another possible reason: Google’s predictive approach. 

As John Mueller from Google explains, once Google has seen many duplicate URLs following the same pattern, it may predict that similar URLs are duplicates as well and skip crawling them. 

Fixing duplicate-content issues caused by Google’s pattern learning is a bit more complicated and depends on your website and its URL pattern. If you think this is happening to your website, reach out to us.

Wrapping up

“Duplicate, Google chose different canonical than user” can cause serious damage to the visibility of your business in Google, especially when Google makes a mistake by wrongly classifying a page as duplicate content. 

You should regularly check Google Search Console to see which pages are affected by this very issue. Knowing which pages are affected is the first step to solving the indexing issues.

How to fix Soft 404 errors using ZipTie.dev

Soft 404 errors might not be as well known as other indexing statuses, such as “Crawled – but Currently Not Indexed” or “Discovered – Currently Not Indexed”, but they can significantly harm your website’s visibility on Google.

In this article I will show you how to fix them. 

The Impact of Soft 404 Errors

If a page is flagged as a soft 404, it means it isn’t indexed in Google. 

The problem is that quite often even high-quality pages are mistakenly identified by Google as soft 404s. 

In this article, I show you how to identify such problems.

Here are the three most common reasons for Soft 404:

  1. Low-Quality Content: This should be obvious – Google doesn’t want to index content that doesn’t meet its quality standards. 
  2. Technical Problems with Your Website: Surprisingly, even high-quality, relevant content can be mistaken as a soft 404 due to technical issues, such as problems with JavaScript SEO.
  3. Google’s Misinterpretation: Sometimes, Google can deindex thousands of pages because of some weird issues I discuss later in the article. 

Case Study 1: An E-commerce Website’s Deindexing (50% drop) 

For example, one of our e-commerce clients implemented a layout change. They thought it was just a small change and didn’t need to be reviewed by the SEO team. That’s when most SEO horror stories start 🙂 

As a result, Google deindexed 800k URLs, totaling almost half of the website. The website lost around 40% of its traffic. 

We started our SEO analysis and found a short sentence in the code that wasn’t included in the old layout.  

There was a tiny, innocent, boilerplate phrase (“no product available”) in their code. The phrase wasn’t even visible on the screen, yet was included in EVERY product page. 

Easy fix

Getting rid of this boilerplate sentence helped our client get back on track (see the chart around August 2022). It took a while for Google to fully process the changes, but eventually, we got rid of these issues after Google’s March Update.

Case Study 2: 

Another client noticed that Google couldn’t index any of their blog post pages – Google Search Console was reporting them as soft 404s. The client thought it was a Google bug and checked the pages in ZipTie.dev. 

ZipTie also flagged all of these pages as soft 404s, confirming Google Search Console’s findings. 

The client was totally surprised that both ZipTie and GSC showed the Soft 404 errors. 

Then I checked the source code, and there was a fragment of JavaScript code that contained the text “Page not found”. 

Once the client deleted this fragment of code, Google started indexing the website properly. 

These two cases show that even high-quality pages can have problems with indexing due to a very tiny thing. 
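If you suspect something similar on your site, a quick scan of a sample of pages for error-like boilerplate phrases can save a lot of guesswork. The sketch below uses hypothetical URLs and a small phrase list – extend both to match your templates:

```python
# Minimal sketch: scan pages for error-like boilerplate ("page not found",
# "no product available", ...) that may trip Google's soft 404 detection,
# even when the text is not visible to users.
import requests

SUSPICIOUS_PHRASES = ["page not found", "no product available", "nothing found", "error 404"]
URLS = [
    "https://www.example.com/products/widget-1",
    "https://www.example.com/blog/some-article/",
]

for url in URLS:
    html = requests.get(url, timeout=30).text.lower()
    hits = [phrase for phrase in SUSPICIOUS_PHRASES if phrase in html]
    if hits:
        print(f"{url}: contains {hits} - check whether this text is really needed")
    else:
        print(f"{url}: no suspicious boilerplate found")
```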

Google’s Explanation

Google’s Martin Splitt shed light on this issue, noting that sometimes Google’s error page handling system can misinterpret a valuable page based on the keywords used, classifying it as a Soft 404 error page.

“This can also lead to very funny bugs, I would say, where, for example, you are writing an article about error pages in general, and you can’t, for your life, get it indexed.  And that’s sometimes because our error page handling systems misdetect your article, based on the keywords that you use, as a soft error page.” 

And that’s the reason why two ZipTie.dev clients had soft 404 issues even though their website quality was high. Fortunately, in both cases ZipTie spotted the issue, so the websites could fix their indexing problems.

If they hadn’t used ZipTie, they would probably still have this issue and continue losing traffic. 

The Role of JavaScript in Soft 404 Errors

JavaScript can sometimes cause issues with Google’s ability to see your content and eventually may cause your pages to be treated as a soft 404. 

Despite 80% of popular websites using JavaScript to generate essential content, Google still struggles with rendering JavaScript content in 2023.

Google’s ability to render JavaScript content isn’t an exact science. However, based on observations, small, popular websites should generally be fine. Larger websites, especially those with numerous JavaScript files per page, may encounter crawling and indexing issues. 

On top of that, Gary Illyes of Google highlighted in December 2022, that Google still struggles with rendering content on JavaScript-heavy websites. What’s more, Google’s recently updated documents on Dynamic Rendering also suggest avoiding client-side JavaScript rendering.

How ZipTie can help in diagnosing JavaScript SEO issues

ZipTie can help you diagnose JavaScript SEO issues. 

First, it will show you average content JS dependency. 

Then, it will show you pages with the highest dependency. 

What you can do is check a sample of pages with the highest JavaScript dependency in Google Search Console. Analyze whether Google classifies them as soft 404. If so, it’s very likely that this is caused by JavaScript. 

If you suspect that JavaScript causes your soft 404 problems, you can analyze how Google renders your content, by using the URL Inspection tool.

I wrote an article showing you step-by-step how to detect JavaScript SEO issues.

Soft 404s can be fixed

Soft 404 problems are definitely fixable. Here are just three examples from some of our clients: 

Client 1:

Client 2:

Client 3:

Here’s the basic logic for fixing Soft 404 errors 

Step 1: Check the quality of the affected pages 

First, you need to go to Indexing -> Pages in Google Search Console.

You’ll see the number of affected pages. In my case, it’s 1107. 

Then, to view a sample of affected pages, you need to click on “Soft 404”.

Choose a sample of important pages identified as soft 404s and try to objectively judge their quality.

Consider improving page quality

If the pages are low quality, consider improving them. However, if you’re certain that a page is of high quality and it’s still classified as “Soft 404”, you need to:

If your website struggles with indexing, we strongly recommend using ZipTie.dev. 

We also offer consulting services to help you detect and solve indexing issues to restore your traffic back to normal. Questions? Comments? We’d love to hear from you. Feel free to reach out to us!
