SEO

Table of contents
  1. Resources
  2. Tools
  3. Minimum requirements
    1. URL Inspection tool
  4. URL best practices
    1. Issues that lead to a high number of URLs

Resources

Tools

Minimum requirements

For Googlebot to index your site, the following requirements must be met:

  1. Googlebot must not be blocked. If a page is private, i.e. requires logging in to view, Googlebot will not crawl it.
  2. Only pages that return an HTTP 200 status code are indexed.
  3. The page content must be of an indexable type and must not violate spam policies.
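
A quick way to sanity-check the first two requirements is to ask the site's robots.txt whether Googlebot may fetch a URL and then confirm that the page returns HTTP 200. A minimal sketch using Python's standard library (the URL is a placeholder, and this does not detect login-gated pages):

    import urllib.error
    import urllib.request
    import urllib.robotparser

    url = "https://example.com/lebensmittel/pfefferminz"  # placeholder URL

    # Requirement 1: robots.txt must not block Googlebot.
    robots = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
    robots.read()
    print("Googlebot allowed:", robots.can_fetch("Googlebot", url))

    # Requirement 2: only pages returning HTTP 200 are indexed.
    try:
        with urllib.request.urlopen(url) as response:
            print("Status:", response.status)
    except urllib.error.HTTPError as err:
        print("Status:", err.code)  # 4xx/5xx pages are not indexed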

URL Inspection tool

URL best practices

RFC 3986 defines the generic URI syntax that URLs follow.

  • Use simple, descriptive words in the audience’s language. Example of a German URL:
    https://example.com/lebensmittel/pfefferminz
    
  • Use UTF-8 encoding as necessary; see the encoding sketch at the end of this list.

    Example of Arabic characters percent-encoded as UTF-8:

    https://example.com/%D9%86%D8%B9%D9%86%D8%A7%D8%B9/%D8%A8%D9%82%D8%A7%D9%84%D8%A9
    
  • Do not use unencoded non-ASCII characters in the URL.
  • Do not add unreadable, long ID numbers to the URL (this includes session IDs). Do not do this:
    https://example.com/index.php?id_sezione=360&sid=3a5ebc944f41daa6f849f730f1
    
  • Do not use URL fragments to change the content of the page, as Google generally does not support URL fragments.

    Do not do this:

    https://example.com/#/potatoes
    
  • For multi-regional websites, use locale-specific URLs:
    https://example.de
    https://example.com/de/
    
  • Use hyphens to separate words in URLs; this helps users identify concepts in the URL more easily.
  • Do not use underscores to separate words.
  • Do not run words together in the URL; separate them with hyphens.
  • In query strings, use = to separate keys from values and & to add additional parameters. To specify multiple values for a key, separate them with commas (see the sketch below).
    https://example.com/category?category=dresses&color=purple,pink,salmon&sort=low-to-high&sid=789
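
The UTF-8 and query-parameter rules above can be applied programmatically. A minimal sketch using Python's standard library, reusing the Arabic and dress-category examples from this list:

    from urllib.parse import quote, urlencode

    # Percent-encode non-ASCII path segments; quote() uses UTF-8 by default.
    path = "/" + quote("نعناع") + "/" + quote("بقالة")
    print("https://example.com" + path)
    # https://example.com/%D9%86%D8%B9%D9%86%D8%A7%D8%B9/%D8%A8%D9%82%D8%A7%D9%84%D8%A9

    # = separates key and value, & separates parameters, and multiple values
    # for one key are joined with commas (kept literal via safe=",").
    params = {
        "category": "dresses",
        "color": ",".join(["purple", "pink", "salmon"]),
        "sort": "low-to-high",
    }
    print("https://example.com/category?" + urlencode(params, safe=","))
    # https://example.com/category?category=dresses&color=purple,pink,salmon&sort=low-to-high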
    

Issues that lead to a high number of URLs

The following issues can cause a high number of URLs to be generated, which can prevent Googlebot from completely indexing all the content on your site.

  • Additive filtering of a set of items. For example, some sites let you provide latitude and longitude as search parameters to get more refined results; each filter combination yields a new URL (see the sketch after this list).
  • Dynamic generation of documents, where small variations such as timestamps or advertisements produce distinct pages.
  • Problematic parameters in the URL, like Session IDs.
  • Sorting parameters in the URL.
  • Irrelevant parameters in the URL, such as referral parameters.
  • Infinite calendars, which link to future and previous dates with no restrictions on start or end dates.
  • Broken relative links, which can often create infinite spaces (like the infinite calendar).
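
To see how quickly additive filters multiply URLs, here is a small sketch (with hypothetical filter values) that counts the distinct URLs generated by every combination of three optional parameters:

    from itertools import product

    # Hypothetical optional filters; None means the parameter is absent.
    colors = [None, "purple", "pink", "salmon"]
    sorts = [None, "low-to-high", "high-to-low"]
    sids = [None, "789"]  # session IDs multiply this further per visitor

    base = "https://example.com/category"
    urls = set()
    for color, sort, sid in product(colors, sorts, sids):
        pairs = [f"{key}={value}" for key, value in
                 (("color", color), ("sort", sort), ("sid", sid))
                 if value is not None]
        urls.add(base + "?" + "&".join(pairs) if pairs else base)

    print(len(urls))  # 4 * 3 * 2 = 24 URLs for a single category page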

Steps to resolve

  • Use robots.txt to block crawling of these URLs (see the example below).
  • Shorten URLs by trimming unnecessary parameters.
  • For infinite calendars, add a rel="nofollow" attribute to links to dynamically created future calendar pages (see the example below).
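
For example, a robots.txt that blocks crawling of session-ID URLs and a calendar space might look like this (the paths and patterns are placeholders; Google supports the * wildcard in robots.txt rules):

    User-agent: *
    Disallow: /*?*sid=
    Disallow: /calendar/

And a dynamically generated future-calendar link with the nofollow attribute (the href is hypothetical):

    <a href="/calendar/2025-06" rel="nofollow">Next month</a>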