{"id":1804,"date":"2019-12-11T13:51:52","date_gmt":"2019-12-11T13:51:52","guid":{"rendered":"https:\/\/www.seotesteronline.com\/?p=1804"},"modified":"2021-05-19T13:05:41","modified_gmt":"2021-05-19T13:05:41","slug":"crawling","status":"publish","type":"post","link":"https:\/\/www.seotesteronline.com\/blog\/seo-basics\/crawling\/","title":{"rendered":"What is crawling and why is it crucial for SEO?"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">To understand SEO and its dynamics, it is crucial to know how a search engine <strong>analyzes<\/strong> and <strong>organizes<\/strong> the information it collects.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">One of the fundamental processes that allow search engines to index content is the so-called\u00a0<\/span><strong>crawling<\/strong><i><span style=\"font-weight: 400;\">.\u00a0<\/span><\/i><span style=\"font-weight: 400;\">By this term, we mean the work the\u00a0<\/span><i><span style=\"font-weight: 400;\">bot\u00a0<\/span><\/i><span style=\"font-weight: 400;\">(also called\u00a0<\/span><i><span style=\"font-weight: 400;\">spider<\/span><\/i><span style=\"font-weight: 400;\">) does when it scans a webpage.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">How crawling works<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Search engines use <strong>crawling<\/strong> to discover, access, and scan pages around the web.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When they explore a website, they visit the links contained in it and follow the instructions included in the\u00a0<\/span><i><span style=\"font-weight: 400;\">robots.txt <\/span><\/i><span style=\"font-weight: 400;\">file. 
In this file, you can find the <em>directions<\/em> for the search engine on how it should &#8220;crawl&#8221; the website.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Through the robots.txt file, we can ask the search engine to ignore particular resources within our website. Through the <strong>sitemap<\/strong> (i.e., the list of the site URLs), instead, we can help the crawler navigate our website by providing it with a map of its resources.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Crawlers use <strong>algorithms<\/strong> to establish the <strong>frequency<\/strong> with which they scan a specific page and how many pages of the website they must scan.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">These algorithms help crawlers tell a frequently updated page from one that doesn&#8217;t change over time: the crawler scans the former more regularly. A key concept, from this point of view, is the crawl budget.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Crawling of images, audio and video files<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Usually, search engines don&#8217;t scan and index every URL that they encounter.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We are not saying that crawlers are not capable of interpreting content other than text files. They are, and they&#8217;re getting better at it. 
Still, it&#8217;s better to use <strong>filenames<\/strong> and <strong>metadata<\/strong> to help search engines read, index, and rank the content on the SERP.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Crawlers on links, sitemaps, and submitted pages<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Crawlers discover new pages by scanning existing ones and extracting the links to other pages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The crawler adds these addresses to the list of pages yet to be analyzed, and the bot then downloads them.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In this way, search engines keep finding new webpages that, in turn, link to other pages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Another way search engines find new pages is by scanning sitemaps. As we said before, a sitemap is a list of scannable URLs.<\/span><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-2914\" src=\"https:\/\/www.seotesteronline.com\/wp-content\/uploads\/2020\/01\/sitemap-xml-example-seo-tester-online.png\" alt=\"Sitemap xml example\" width=\"569\" height=\"257\" srcset=\"https:\/\/www.seotesteronline.com\/wp-content\/uploads\/2020\/01\/sitemap-xml-example-seo-tester-online.png 569w, https:\/\/www.seotesteronline.com\/wp-content\/uploads\/2020\/01\/sitemap-xml-example-seo-tester-online-300x136.png 300w\" sizes=\"(max-width: 569px) 100vw, 569px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">A third way is to send the URLs to search engines manually. 
This lets us tell Google that we have new content without waiting for the next scheduled scan.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We can ask for a scan via <a href=\"https:\/\/search.google.com\/search-console\/\" target=\"_blank\" rel=\"noopener noreferrer\">Google Search Console<\/a>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, you should resort to it only when you want Google to scan a few pages (imagine submitting a whole site one page at a time). For large URL volumes, Google prefers XML sitemaps.<\/span><\/p>\n<p>How do Search Engines work?<\/p>\n<p>Search Engines are certainly fascinating. Their algorithms grow more complex every day, and it&#8217;s not easy (sometimes even impossible) to fully understand how they work.<\/p>\n<p>If you want to know more, we suggest you read our article about the <a href=\"https:\/\/www.seotesteronline.com\/blog\/seo-basics\/crawling-indexing-ranking\/\">Crawling, Indexing and Ranking<\/a> phases of Search Engines.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Crawling of a website. What&#8217;s this? How does it work? 
Find out now in this short article!<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[131],"tags":[],"acf":[],"lang":"en","translations":{"en":1804,"it":1758},"pll_sync_post":[],"_links":{"self":[{"href":"https:\/\/www.seotesteronline.com\/wp-json\/wp\/v2\/posts\/1804"}],"collection":[{"href":"https:\/\/www.seotesteronline.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.seotesteronline.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.seotesteronline.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.seotesteronline.com\/wp-json\/wp\/v2\/comments?post=1804"}],"version-history":[{"count":14,"href":"https:\/\/www.seotesteronline.com\/wp-json\/wp\/v2\/posts\/1804\/revisions"}],"predecessor-version":[{"id":4546,"href":"https:\/\/www.seotesteronline.com\/wp-json\/wp\/v2\/posts\/1804\/revisions\/4546"}],"wp:attachment":[{"href":"https:\/\/www.seotesteronline.com\/wp-json\/wp\/v2\/media?parent=1804"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.seotesteronline.com\/wp-json\/wp\/v2\/categories?post=1804"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.seotesteronline.com\/wp-json\/wp\/v2\/tags?post=1804"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}