Indexing has been a popular topic in the SEO industry for a while.

Seeing so many posts, articles, and forum discussions about indexing made me realize that many aspects of indexing SEO are still confusing for webmasters.

This made me think about ways to help them understand their indexing issues and look for solutions.

This article is a list of the most common reasons your pages aren’t indexed.

If you struggle to get your valuable pages indexed and don’t know which site optimization aspects to focus on or where to start, this article is for you.

You will learn how to identify the issues causing your pages to not be indexed, why they occur, and what recommendations I have for fixing them.

Let’s start with the basics.

1. Your pages are non-indexable.

Google won’t index a given page if you clearly instruct it that a page shouldn’t be indexed. There are many ways to do that, some providing stronger signals to Google than others.

One way to make a page non-indexable is by adding a “noindex” meta tag to it – it would look like this: <meta name="robots" content="noindex">

Google won’t index it. Period. 

Unfortunately, it’s common for webmasters to add “noindex” tags by mistake.

To make sure this isn’t the case for you, check the list of all pages with a “noindex” tag to ensure the tags are only placed on pages that really shouldn’t be indexed.   

Use a crawler like OnCrawl or Screaming Frog. After crawling the site, you will be able to see any “noindex” directives added to your URLs. You can export the crawl data and go through the URLs with “noindex” to see if they were mistakenly added to any valuable pages.
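
If you want a quick, scripted spot-check instead of a full crawl, here is a minimal sketch in Python (using the requests and beautifulsoup4 libraries; the URL list is just a placeholder you’d swap for your own pages). It flags URLs that send a noindex signal either through the meta robots tag or through the X-Robots-Tag HTTP header, which is another way of passing the same directive:

    # Spot-check a list of URLs for "noindex" signals.
    # Requires: pip install requests beautifulsoup4
    import requests
    from bs4 import BeautifulSoup

    urls = [
        "https://example.com/",        # placeholder - replace with your own URLs
        "https://example.com/blog/",
    ]

    for url in urls:
        response = requests.get(url, timeout=10)
        noindex_sources = []

        # 1. X-Robots-Tag HTTP header
        if "noindex" in response.headers.get("X-Robots-Tag", "").lower():
            noindex_sources.append("X-Robots-Tag header")

        # 2. <meta name="robots" content="noindex"> in the HTML
        soup = BeautifulSoup(response.text, "html.parser")
        for meta in soup.find_all("meta", attrs={"name": "robots"}):
            if "noindex" in meta.get("content", "").lower():
                noindex_sources.append("meta robots tag")

        if noindex_sources:
            print(f"{url} -> noindex via {', '.join(noindex_sources)}")
        else:
            print(f"{url} -> no noindex signal found")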

But there are other signals that tell Google your pages shouldn’t be indexed. However, these signals aren’t definitive, and in some situations, Google could still index such pages.

Your pages may not be indexed if they are:

  • Blocked from crawling in robots.txt,
  • Canonicalized to another URL,
  • Redirected to a different page,
  • Returning a 404 status code.

Note that it may have been your intention to make these pages non-indexable – but if some of your pages aren’t indexed, and it feels like a mistake, ensure it’s not because of the issues mentioned above.

Look at the Excluded section of the Index Coverage report. Pay attention to URLs appearing with the following statuses indicating that the specified pages can’t be indexed:

  • Blocked by robots.txt,
  • Excluded by the ‘noindex’ tag,
  • Alternate page with proper canonical tag,
  • Page with redirect,
  • Not found (404).

2. There is a JavaScript SEO issue on your page.

As Bartosz Góralewicz showed, Google had tremendous issues with rendering JavaScript in the past. 

The process of downloading, parsing, and executing JavaScript is time-consuming and resource-heavy for Google.

Over the years, Google did an excellent job improving its rendering, but there is still a risk that Google won’t index your JavaScript content. 

Here is when Google might not index your JavaScript-based content:

  • If it doesn’t have enough crawl budget for your site,
  • If it doesn’t view the JavaScript elements as crucial to the main content on the page, 
  • If your JavaScript files are blocked in robots.txt,
  • If Google experiences errors or timeouts during rendering,
  • If Googlebot needs to scroll or click to view some content – Googlebot doesn’t interact with pages the way users do. For example, if you implement infinite scroll, the page won’t load additional content until a user scrolls down the screen. But Googlebot will likely only crawl and index what it first sees on the page. 

What exactly could happen if JavaScript is not rendered and crucial content on a page relies on it?

Here is an example of Angular.io: if JavaScript isn’t rendered, the only content Google will see is: “This website requires JavaScript.”

Another example of a site hurt by its JavaScript implementation is disqus.com. Disqus uses dynamic rendering in the form of prerendering, which is meant to present Google with a static version of the page.

This solution is generally recommended by Google but, in this case, the page doesn’t get rendered correctly, likely due to faulty implementation.

The result? Googlebot gets an empty page.

To mitigate JavaScript SEO issues, ensure your essential content can be accessed by Googlebot with JavaScript enabled and disabled. If Google has issues with JavaScript on your site and JavaScript is used to generate your key content, your JavaScript-heavy pages may not be indexed.
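
A quick way to check whether key content depends on JavaScript is to compare the raw, server-delivered HTML with what you see in the browser. Here is a minimal sketch in Python (using requests; the URL and the key phrase are placeholders) that fetches a page without executing any JavaScript and checks whether an important piece of content is already there:

    # Check whether key content exists in the raw HTML,
    # i.e. without executing any JavaScript.
    # Requires: pip install requests
    import requests

    url = "https://example.com/product/123"   # placeholder URL
    key_phrase = "Add to cart"                # placeholder: content that must be indexable

    raw_html = requests.get(url, timeout=10).text

    if key_phrase.lower() in raw_html.lower():
        print("Key content is present in the server-delivered HTML.")
    else:
        print("Key content is missing from the raw HTML - it likely relies on JavaScript rendering.")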

Usually, URLs with a JavaScript-related problem will be classified by Google’s Index Coverage as: 

Be sure to familiarize yourself with Google’s JavaScript SEO best practices.

3. Page is classified by Google as soft 404.

Google uses many tools to ensure the web pages it shows in search results are of the highest quality and provide a positive user experience.

One of the tools that Google utilizes is a soft 404 detector. If a page is detected as soft 404, it won’t get into Google’s index.

A soft 404 is not an official HTTP response code. A soft 404 page returns a 200 (OK) status code, but its content makes it look like an error page, e.g., because it’s empty or contains thin content – or so Google thinks.

Barry Adams shared a very interesting example where a website’s pages were interpreted as soft 404s: 

https://twitter.com/badams/status/1476603401840611330

As you can see, a soft 404 detector, like every mechanism, is prone to false positives. It means that your pages may be wrongly classified.

Google could wrongly classify your pages in a few cases: 

  1. Google can’t properly render your JavaScript content. Ensure you’re not blocking JavaScript in robots.txt and that Googlebot can render your crucial resources.
  2. Google found some words it typically associates with soft 404 pages, such as: “page not found” or “product unavailable.” In this case, adjust your copy. Depending on the situation, you may want to redirect such pages or make them 404s. 
  3. The page should be a 404 page but mistakenly responds with a 200 status code. This can happen if you create a custom 404 page without configuring your server to return a 404 status code (you can verify this with the quick check after this list).
  4. A redirect has been implemented, but the target page isn’t thematically connected to the origin page. Redirect it to the closest matching alternative.
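
To test the third case, request a URL that certainly doesn’t exist on your site and check what status code your server returns. A minimal sketch in Python (using requests; the domain and path are placeholders):

    # Verify that non-existent URLs return a real 404, not a 200 with an error page.
    # Requires: pip install requests
    import requests

    # A made-up path that should not exist on your site
    test_url = "https://example.com/this-page-should-not-exist-12345"

    response = requests.get(test_url, timeout=10, allow_redirects=False)

    if response.status_code == 404:
        print("OK: the server returns a real 404 for missing pages.")
    elif response.status_code == 200:
        print("Warning: missing pages return 200 - Google may treat them as soft 404s.")
    else:
        print(f"The server responded with {response.status_code} - review your error handling.")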

Google’s soft 404 detection is closely tied to Caffeine, its indexing system – here is how Gary Illyes explained it:

“Basically, we have very large […] corpora of error pages, and then we try to match text against those. This can also lead to very funny bugs, I would say, where, for example, you are writing an article about error pages in general, and you can’t, for your life, get it indexed. And that’s sometimes because our error page handling systems misdetect your article, based on the keywords that you use, as a soft error page. And, basically, it prompts Caffeine to stop processing those pages.”

4. Your page is of low quality.

One of the most important ranking signals for Google is content quality. 

Over the years, Google introduced many algorithm changes to highlight how crucial it is for websites to create content that is: 

  • unique, 
  • comprehensive, 
  • relevant, 
  • truthful, and 
  • valuable to the user.

That’s why we shouldn’t expect Google to index content that doesn’t follow these guidelines.

Moreover, if Google sees some of your low-quality content, it may view the whole website as low-quality and, subsequently, limit its crawling and indexing.

Usually, a page with low-quality content will be classified as: 

There are a few ways to tackle low-quality content issues on your site – consider:

  • Consolidating a few pieces of content, categories, or pages into one,
  • Rewriting and updating articles containing outdated or insufficient information,
  • Making low-quality content non-indexable, e.g., by adding noindex tags, or blocking Googlebot from crawling it in robots.txt (keep in mind that robots.txt blocks crawling rather than indexing, so noindex is the stronger signal).

5. The page has duplicate content.

This is related to the previous point about low-quality content, but this issue refers to multiple pages containing the same or very similar content.

A page with duplicate content likely won’t be indexed in Google.  

The main dangers of having a lot of duplicate content on your site include:

  • You don’t know which page Google chooses to index and show in SERPs,
  • You give Google many more pages to crawl,
  • Ranking signals can be split between a few pages. 

Some examples of duplicate content include:

  • Generic product descriptions copied from other pages,
  • Pages created by filters with added parameters,
  • Different URL structures for the same content, e.g., www and non-www versions.

Duplicate content is a common indexing issue on eCommerce or other large websites, and it’s particularly severe for them. 

Usually, duplicate content will be classified by Google as:

  • Alternate page with proper canonical tag – this URL is a duplicate of a canonical page marked by the correct tag, and it points to the canonical page. Usually, you don’t need to do anything,
  • Duplicate without user-selected canonical – you haven’t selected the canonical version for a page, so make sure you choose it,
  • Duplicate, Google chose different canonical than user – you have selected a canonical page, but Google chose a different one. This could occur if Google views another page as more representative of the content or hasn’t found enough signals pointing to the URL you selected.

The two most common solutions to duplicate content issues are canonical tags, which point Google to the main version of a page, and redirects, which send both users and crawlers to the preferred URL.
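
To see how your duplicate URL variants resolve in practice, you can check what each variant returns: a redirect, a canonical tag pointing elsewhere, or a standalone 200 page. Here is a minimal sketch in Python (using requests and beautifulsoup4; the variant URLs are placeholders):

    # Check how URL variants of the same content resolve:
    # do they redirect, declare a canonical, or serve a standalone 200 page?
    # Requires: pip install requests beautifulsoup4
    import requests
    from bs4 import BeautifulSoup

    variants = [
        "https://www.example.com/page",       # placeholder variants of one piece of content
        "https://example.com/page",
        "https://example.com/page?ref=nav",
    ]

    for url in variants:
        response = requests.get(url, timeout=10, allow_redirects=False)

        if response.status_code in (301, 302, 307, 308):
            print(f"{url} -> redirects ({response.status_code}) to {response.headers.get('Location')}")
            continue

        soup = BeautifulSoup(response.text, "html.parser")
        canonical = soup.find("link", rel="canonical")
        target = canonical["href"] if canonical and canonical.has_attr("href") else "none declared"
        print(f"{url} -> status {response.status_code}, canonical: {target}")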

6. Your pages are slow.

Having a slow website can negatively impact user experience, but it could also lead to indexing issues.

Let me elaborate: 

  • If your website is slow because of your web hosting, Google may crawl less and thus index fewer pages. 
  • When rendering your website is slow, it can negatively affect your crawl speed. As we read in Google’s documentation, “making your pages faster to render will also increase the crawl speed.” 

The critical aspect of improving your site’s performance with Google’s crawling and indexing processes in mind is optimizing your server. 

If your website is visibly slow for users who interact with it – for example, it fails the Core Web Vitals assessment – it’s still a problem that requires your attention. 

But what you should focus on is whether your server can handle Google’s crawl requests. For example, when you add new content and Google’s crawling increases, you may find that this content isn’t indexed because the server slowed down.

You need to make sure your website can handle traffic spikes from Google to be crawled and indexed at high rates.
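
A simple way to keep an eye on this is to time the server’s responses for a sample of URLs and flag the slow ones. Here is a minimal sketch in Python (using requests; the URLs and the one-second threshold are placeholder assumptions, not a Google benchmark):

    # Time server responses for a sample of URLs to spot pages
    # that could slow Googlebot's crawling down.
    # Requires: pip install requests
    import requests

    urls = [
        "https://example.com/",              # placeholder URLs
        "https://example.com/category/",
        "https://example.com/product/123",
    ]

    for url in urls:
        response = requests.get(url, timeout=30)
        seconds = response.elapsed.total_seconds()   # time until response headers arrived
        flag = "  <-- slow" if seconds > 1.0 else ""
        print(f"{url}: {seconds:.2f}s (status {response.status_code}){flag}")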

7. There is an indexing bug on Google’s side.

Google Search is probably one of the most advanced software systems in the world, and it has been actively (and successfully) maintained for over 20 years now.

However, all software has bugs. And some bugs on Google’s end can cause your pages not to be indexed, or to be wrongly reported as not indexed.

A widely noticed example of a Google indexing bug happened in October 2020.

It took Google two weeks to fix that bug, and similar bugs happen from time to time. 

Here are some tweets detailing another internal Google error with reporting indexing issues:

Recently, I spotted another Google bug concerning URLs that should be indexed and have been visited by Google but remain stuck with the “Discovered – currently not indexed” status.

8. Your page or website is too new.

No content is indexed immediately. In many situations, your pages will end up being indexed, but it will take some time.  

As John Mueller stated:

“When a new page is published on a website, it can take anywhere from several hours to several weeks for it to be indexed. In practice, I suspect most good content is picked up and indexed within about a week.”

Two factors cause such indexing delays: 

  1. It takes time for Google to discover a new page.
  2. It takes time for a page to get to the top of Google’s crawling queue.  

Usually, a URL in Google’s crawling queue will be classified as Discovered – currently not indexed.

You may also experience delays in crawling and indexing content if you publish it on a new website.

In the last couple of months, I’ve seen a lot of posts from SEOs saying that Google is unwilling to index content on new websites. 

The pattern was so clear that I decided to ask other SEOs whether they were facing similar issues.

Here is what Google’s documentation tells us:

“If your site or page is new, it might not be in our index because we haven’t had a chance to crawl or index it yet. It takes some time after you post a new page before we crawl it, and more time after that to index it. The total time can be anywhere from a day or two to a few weeks, typically, depending on many factors.”

In some cases, it’s specific sections of your site that aren’t indexed. This can occur if Google visited a few URLs from a section, assessed them as low-quality content, and deprioritized the whole section in its crawling queue. 

So, what can you do if your page is still not indexed after a few weeks?

If you have a new website, make sure you implement internal linking to show Googlebot which URLs are the most valuable. The links should reflect the importance of each page and how they relate to each other.

Also, don’t forget to ensure all your valuable URLs have been added to your sitemaps. And, though you’ve probably heard it enough: prioritize the quality and uniqueness of your content.
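
To verify that your valuable URLs are actually listed in your sitemap, you can parse the sitemap and compare it against your list of priority pages. A minimal sketch in Python (standard library only; the sitemap URL and the priority URLs are placeholders, and it assumes a single sitemap file rather than a sitemap index):

    # Check that important URLs are listed in the XML sitemap.
    # Uses only the Python standard library.
    import urllib.request
    import xml.etree.ElementTree as ET

    sitemap_url = "https://example.com/sitemap.xml"   # placeholder
    important_urls = {                                # placeholder priority pages
        "https://example.com/",
        "https://example.com/key-category/",
        "https://example.com/key-article/",
    }

    with urllib.request.urlopen(sitemap_url) as response:
        tree = ET.parse(response)

    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    sitemap_urls = {loc.text.strip() for loc in tree.iterfind(".//sm:loc", ns) if loc.text}

    missing = important_urls - sitemap_urls
    if missing:
        print("Valuable URLs missing from the sitemap:")
        for url in sorted(missing):
            print(f"  {url}")
    else:
        print("All priority URLs are present in the sitemap.")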

9. Google refused to visit the page.

Google sometimes refuses to visit a page because it decides the page isn’t worth crawling and indexing.

This may be the result of two things:

  1. Google isn’t convinced it should visit a specific page because the page lacks relevant signals. For instance, if no links point to a given page, Google likely won’t visit and index it. Another missing signal would be the page’s absence from your sitemap.
  2. Google isn’t convinced it should visit those URLs because they fall into a specific URL pattern. Google may recognize a given pattern as matching previously visited pages – for example, pages with duplicate content, or author and user profiles. If other pages appear to follow the same pattern, Google doesn’t want to waste time and resources crawling them.

Such URLs can be classified as Discovered – currently not indexed.

Quoting Sam Marsden’s notes from Google Office Hours on May 30th, 2017:

“Google tries to establish URL patterns to focus on crawling important pages and choose which ones to ignore when crawling larger sites. This is done on a per-site basis and they don’t have platform-specific rules because they can be customised with different behaviour.”

Your next steps here revolve around solutions I’ve mentioned in other sections:

  • Make sure your indexable content that Google sees is of high quality,
  • Implement well-planned internal linking with a focus on your most important pages (a quick check follows this list),
  • Optimize your sitemaps to only contain valuable URLs.
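
For the internal linking point, a rough check is to crawl a few hub pages (for example, the homepage and main category pages) and count how often each priority URL is linked from them. Here is a minimal sketch in Python (using requests and beautifulsoup4; every URL in it is a placeholder):

    # Count internal links pointing to priority URLs from a few hub pages.
    # Requires: pip install requests beautifulsoup4
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin
    from collections import Counter

    hub_pages = [                          # placeholder pages to inspect
        "https://example.com/",
        "https://example.com/blog/",
    ]
    priority_urls = {                      # placeholder pages that should receive links
        "https://example.com/key-article/",
        "https://example.com/key-category/",
    }

    link_counts = Counter()
    for page in hub_pages:
        html = requests.get(page, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        for a in soup.find_all("a", href=True):
            absolute = urljoin(page, a["href"]).split("#")[0]
            if absolute in priority_urls:
                link_counts[absolute] += 1

    for url in sorted(priority_urls):
        count = link_counts.get(url, 0)
        status = "no internal links found on the hub pages" if count == 0 else f"{count} internal link(s)"
        print(f"{url}: {status}")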

Wrapping up

You can now see that some indexing issues may have little to do with your website and more to do with Google’s limited resources, bugs, or errors. 

However, in most cases, your pages may be lacking quality or sufficient signals to get indexed. It’s also possible that you are preventing Googlebot from accessing some pages that should be indexed.

 Always remember to:

  • Maintain a sitemap containing only valuable URLs,
  • Know which URLs shouldn’t be crawled (disallow them in robots.txt) or indexed (make them non-indexable with noindex tags) – for more information, check out this guide on creating an indexing strategy,
  • Add correct canonical tags to specify main versions of pages and manage duplicate content,
  • Optimize your site architecture and create an informative internal linking structure.

Thank you for your time. What can you do next?

  1. Read more on our blog, where we share our R&D findings and tips.
  2. Sign up and use Ziptie for 30 days for free.
  3. Contact us if you would like to discuss this topic in detail.
  4. Be a good SEO colleague and share this article on Facebook, LinkedIn, or Twitter.