Crawling is the process by which a program or bot (such as a search engine bot) browses the web, starting from known websites, in search of new or updated content. The content can range from web pages to videos, images, and more, but regardless of type, bots discover it through links.
Googlebot, for instance, starts from a seed list – a list of trusted sites that link to many other websites – and finds new URLs by following the links on those pages. As it follows these links, it discovers new content and adds it to Google's index, to be retrieved later when a user searches for related information.
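The discovery process described above is essentially a breadth-first traversal of the web's link graph. Here is a minimal sketch of that idea, using a small hypothetical in-memory link graph (the URLs and graph are made up for illustration) in place of real network fetches:

```python
from collections import deque

# Hypothetical in-memory link graph standing in for the web:
# each URL maps to the list of URLs it links to.
LINK_GRAPH = {
    "https://seed.example": ["https://a.example", "https://b.example"],
    "https://a.example":    ["https://c.example"],
    "https://b.example":    ["https://a.example"],
    "https://c.example":    [],
}

def crawl(seeds):
    """Breadth-first discovery: start from seed URLs, follow links,
    and record every page found in an index."""
    index = []                    # pages discovered, in crawl order
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        index.append(url)         # "fetch" the page and index it
        for link in LINK_GRAPH.get(url, []):
            if link not in seen:  # only enqueue newly discovered URLs
                seen.add(link)
                queue.append(link)
    return index
```

Starting from the single seed, every reachable page ends up in the index exactly once, even though some pages (like `a.example`) are linked from more than one place.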
Crawling the web is a never-ending job for search engines: they have to find newly published pages as well as older pages that have been updated. At the same time, they don't want to spend time and resources on pages that aren't good enough to appear in search results, so they prioritize which pages to crawl.
Google prioritizes this process based on:
- Page popularity (how often it is linked)
- High-quality content
- Frequent updates
Newly published pages with high-quality content are given higher priority.
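One way to picture this prioritization is as a priority queue over candidate pages. The sketch below uses a max-heap and a toy scoring formula combining the three signals above; the weights, URLs, and numbers are all hypothetical, not Google's actual ranking:

```python
import heapq

def crawl_order(pages):
    """Return URLs in crawl-priority order. The score is a toy
    combination of inlink count, a content-quality score (0-1),
    and updates per week -- all hypothetical signals."""
    heap = []
    for url, (inlinks, quality, updates_per_week) in pages.items():
        score = inlinks + 10 * quality + updates_per_week
        heapq.heappush(heap, (-score, url))  # negate: heapq is a min-heap
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

PAGES = {
    "https://popular.example": (120, 0.9, 3),  # heavily linked
    "https://fresh.example":   (5, 0.8, 20),   # updated daily
    "https://stale.example":   (2, 0.3, 0),    # rarely linked or updated
}
```

With these numbers, the heavily linked page is crawled first, the frequently updated one next, and the stale page last.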