{"id":12922,"date":"2022-07-06T09:15:52","date_gmt":"2022-07-06T09:15:52","guid":{"rendered":"https:\/\/kwebby.com\/blog\/?p=12922"},"modified":"2025-01-03T10:49:48","modified_gmt":"2025-01-03T10:49:48","slug":"robotstxt-for-seo","status":"publish","type":"post","link":"https:\/\/kwebby.com\/blog\/robotstxt-for-seo\/","title":{"rendered":"Robots.txt for SEO in 2025 \u2013 A Comprehensive Guide"},"content":{"rendered":"\n<p>Well-structured robots.txt files help to direct those search engine bots to make their work smoother. Having control over the search engine can manipulate who crawls and indexes your sites.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">If you want your site to rank in the top lists of search engine results then you\u2019ll have to make it easy for search engine bots to explore it efficiently. You can take control of your search engine and yes it is possible!<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">This is one of the simplest files for websites, but if you did it correctly then it will be a boon, if you messed up with it, will cause chaos on your SEO and disturb all your search engines from accessing important content of your website.&nbsp;&nbsp;&nbsp;<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">To make it stable in condition, the robots.txt file helps you out. The crawling behaviour is sometimes known as <a href=\"https:\/\/www.merriam-webster.com\/dictionary\/spidering\" rel=\"doFollow noopener\" target=\"_blank\">\u2018spidering\u2019<\/a> which may sound funny!&nbsp;&nbsp;<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Robots.txt is a text file which resides within your root directory that informs robots, dispatched by search engines, which pages to crawl and which to just overlook.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">By this, you have a little idea about how powerful this tool is! 
If you utilise it correctly, it can increase <a href=\"https:\/\/www.wordstream.com\/crawl-frequency\" target=\"_blank\" rel=\"noreferrer noopener\">crawl frequency<\/a>, which benefits your SEO efforts.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">So, how do you use this file? What should you avoid? How does it work? This blog answers those questions.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is a robots.txt file?<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"612\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-13.57.16@2x-1024x612.png\" alt=\"\" class=\"wp-image-12925\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-13.57.16@2x-1024x612.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-13.57.16@2x-300x179.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-13.57.16@2x-768x459.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-13.57.16@2x.png 1184w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">When you create a new website, search engines send their bots to crawl through it and build a map, much like a <a href=\"https:\/\/developers.google.com\/search\/docs\/advanced\/sitemaps\/build-sitemap\" rel=\"noreferrer noopener\" target=\"_blank\">sitemap<\/a>, of what each page contains. 
This way, they know exactly which pages to show when someone searches for related keywords.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The <a href=\"https:\/\/www.bruceclay.com\/wp-content\/uploads\/2022\/03\/bruceclayinc-robots-exclusion-guide-2022.pdf\" rel=\"noreferrer noopener\" target=\"_blank\">Robot exclusion protocol<\/a> (REP) is a set of web standards that regulates how robots crawl the web, index content and serve that content to users, and robots.txt is a part of it. Robots.txt is an implementation of this protocol that sets out the guidelines every robot should follow, including Google bots.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Be careful when making changes to the robots.txt file: it has the potential to make big parts of your <a href=\"https:\/\/kwebby.com\/blog\/wordpress-sitemap\/\" data-wpil-monitor-id=\"1135\">website inaccessible to search engines<\/a>. A robots.txt file is only valid for a full domain, including the <a href=\"https:\/\/www.merriam-webster.com\/dictionary\/protocol\" rel=\"noreferrer noopener\" target=\"_blank\">protocol<\/a>.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">You can also take a quick sneak peek at any website\u2019s file by taking its URL and adding \/robots.txt at the end. Simple.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"478\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-13.57.50.gif\" alt=\"\" class=\"wp-image-12926\" title=\"\"><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">What does the robots.txt file do?<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/robots.txt-1024x1024.png\" alt=\"\" class=\"wp-image-12927\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/robots.txt-1024x1024.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/robots.txt-300x300.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/robots.txt-150x150.png 150w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/robots.txt-768x768.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/robots.txt-24x24.png 24w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/robots.txt-48x48.png 48w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/robots.txt-96x96.png 96w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/robots.txt.png 1080w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The robots.txt file keeps unnecessary crawlers away from certain areas of your site. Be warned, though: disallowing <a href=\"https:\/\/developers.google.com\/search\/docs\/advanced\/crawling\/googlebot\" rel=\"noreferrer noopener\" target=\"_blank\">Google bots<\/a> from crawling your site can be extremely dangerous for your rankings.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Some of its uses are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Preventing duplicate content from appearing in <a href=\"https:\/\/www.wordstream.com\/serp\" target=\"_blank\" rel=\"noreferrer noopener\">SERPs<\/a> (meta robots tags are often better suited for this)<\/li>\n\n\n\n<li>Specifying the location of sitemaps<\/li>\n\n\n\n<li>Keeping entire sections of a site private<\/li>\n\n\n\n<li>Preventing search engines from crawling certain files on your site<\/li>\n\n\n\n<li>Keeping your internal search results pages from showing to the public<\/li>\n\n\n\n<li>Telling search engine crawlers which URLs they can access on your site<\/li>\n<\/ul>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">If there is no area of your site where you want to control user agents&#8217; access, then you may not need a robots.txt 
file.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Where should I put or find my robots.txt file?<\/h2>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Robots.txt is a plain text file, so it is super easy to create. Simply open a text editor such as Notepad and save the file as robots.txt.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Then log in to your cPanel and locate the public_html folder to access your site&#8217;s root directory. After that, upload the file into it.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"669\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.11.48@2x-1024x669.png\" alt=\"\" class=\"wp-image-12928\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.11.48@2x-1024x669.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.11.48@2x-300x196.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.11.48@2x-768x501.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.11.48@2x.png 1452w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Then ensure the file has the correct permissions: as the owner you need read and write access, while everyone else gets read-only access, so no one else can take control of it.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The file should display the \u201c644\u201d permission code. 
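<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">If you manage the server with scripts rather than through cPanel, the same permission can be set programmatically; this is a minimal sketch using Python\u2019s standard library, with a temporary directory standing in for your server\u2019s root:<\/p>

```python
import os
import stat
import tempfile

# Create a sample robots.txt in a scratch directory (illustrative path,
# not your real document root)
path = os.path.join(tempfile.mkdtemp(), "robots.txt")
with open(path, "w") as f:
    f.write("User-agent: *\nDisallow: /cgi-bin/\n")

# 0o644 = owner read/write, group and others read-only
os.chmod(path, 0o644)

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))
```

<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">On a Unix-like server this prints the familiar 644 code.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">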
If it shows a different code, you\u2019ll have to change it through the file permissions setting.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"669\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.12.43@2x-1024x669.png\" alt=\"\" class=\"wp-image-12929\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.12.43@2x-1024x669.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.12.43@2x-300x196.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.12.43@2x-768x501.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.12.43@2x.png 1452w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Now your robots.txt file is ready!<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Finding an existing robots.txt file is another question. 
For that, open the cPanel file manager or an FTP client and look in your site&#8217;s public_html directory.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Once you open the file, you\u2019ll be greeted by its contents, which will look something like this.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How do big sites use robots.txt?<\/h2>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Some popular sites use robots.txt to tell search engine crawlers what they should and should not do.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">One example is Facebook.com; if you open their <a href=\"https:\/\/facebook.com\/robots.txt\" rel=\"noreferrer noopener\" target=\"_blank\">robots.txt <\/a>file, you will see how they direct different bots to navigate their website; for the most part they disallow various sections such as &#8220;share&#8221; and &#8220;plugins&#8221;.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"600\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.13.49@2x-1024x600.png\" alt=\"\" class=\"wp-image-12930\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.13.49@2x-1024x600.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.13.49@2x-300x176.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.13.49@2x-768x450.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.13.49@2x-1536x900.png 1536w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.13.49@2x-2048x1199.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Another example is Amazon.com; if you <a href=\"https:\/\/www.amazon.com\/robots.txt\" rel=\"noreferrer noopener\" target=\"_blank\">access their robots.txt<\/a> file, you will see a different pattern from Facebook&#8217;s: they use the global wildcard, i.e. *, to address all bots at once and direct all of them to disallow the directories listed in the file.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"599\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.14.03@2x-1024x599.png\" alt=\"\" class=\"wp-image-12931\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.14.03@2x-1024x599.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.14.03@2x-300x176.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.14.03@2x-768x449.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.14.03@2x-1536x899.png 1536w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.14.03@2x-2048x1198.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Pros and cons of using robots.txt<\/h2>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">There are some pros and cons to using the robots.txt file. 
Pros:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It helps to prevent duplicate content.<\/li>\n\n\n\n<li>If you are reworking your website, you can use the robots.txt file to hide unfinished pages from being indexed.<\/li>\n\n\n\n<li>It tells search engines where your sitemap is located.<\/li>\n\n\n\n<li>It helps manage which pages are crawled and which are ignored by search engines.<\/li>\n\n\n\n<li>You can block pages with no SEO value and point crawlers to your sitemap URL.<\/li>\n<\/ul>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Cons:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>It may not be supported by all search engines.<\/li>\n\n\n\n<li>It can give an attacker the location of the site\u2019s private directories and data.<\/li>\n\n\n\n<li>If you configure it incorrectly, it can cause search engines to drop all your indexed pages.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Robots.txt syntax<\/h2>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">A robots.txt file is made up of blocks of directives, and each block begins with a user-agent line. <\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Different search engines interpret these directives differently. For most crawlers, the first matching directive wins; Google and Bing, however, follow the most specific (longest) matching directive.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">We can say robots.txt <a href=\"https:\/\/www.merriam-webster.com\/dictionary\/syntax\" target=\"_blank\" rel=\"noreferrer noopener\">syntax<\/a> is the language of the robots.txt file. There are five common terms you may come across &#8211; user-agent, allow, disallow, crawl-delay and sitemap. 
<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The Sitemap directive tells search engines where to find an <a href=\"https:\/\/yoast.com\/what-is-an-xml-sitemap-and-why-should-you-have-one\/\" target=\"_blank\" rel=\"noreferrer noopener\">XML sitemap<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The user-agents<\/h2>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Each <a href=\"https:\/\/kwebby.com\/blog\/privacy-focused-search-engines\/\" data-wpil-monitor-id=\"413\">search engine identifies itself by a user agent: Google<\/a> identifies as Googlebot, Yahoo as Slurp, Bing as Bingbot, and the list goes on. A user-agent line names the specific crawler bot you\u2019re instructing, such as a search engine\u2019s crawler.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Every program on the internet has a &#8216;user agent&#8217;, or assigned name. For human users, it contains information like browser type and operating system version, but no personal information.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The first line in each block of a robots.txt file is the user-agent line. It will pinpoint specific bots. 
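<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">For instance, one robots.txt file can address several crawlers in separate blocks; in this sketch (the directory names are purely illustrative), Googlebot and Bingbot each get their own rule while every other bot falls back to the * block:<\/p>

```
User-agent: Googlebot
Disallow: /not-for-google/

User-agent: Bingbot
Disallow: /not-for-bing/

User-agent: *
Disallow: /private/
```

<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Under Google\u2019s rules, a crawler follows only the single most specific group that matches its name and ignores the rest.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">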
User agents identify which crawler the rules apply to.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The most common user agents for search engine spiders<\/h3>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">There are thousands of <a href=\"https:\/\/www.cloudflare.com\/learning\/bots\/what-is-a-web-crawler\/\" rel=\"noreferrer noopener\" target=\"_blank\">web crawlers<\/a> and user agents wandering the internet, but these are some of the most common ones:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Googlebot<\/strong>: one of the most popular ones, it gathers the web page information used to supply Google&#8217;s SERPs.<\/li>\n\n\n\n<li><strong>Bingbot<\/strong>: created by Microsoft in 2010 to supply information to the Bing search engine.<\/li>\n\n\n\n<li><strong>Slurp Bot<\/strong>:\u00a0Yahoo search results come from the Yahoo web crawler Slurp, which also collects content from partner sites<\/li>\n\n\n\n<li><strong>DuckDuckBot<\/strong>: the web crawler for DuckDuckGo, a search engine known for privacy<\/li>\n\n\n\n<li><strong>Baiduspider<\/strong>: the Chinese search engine Baidu\u2019s web-crawling spider; it crawls web pages and feeds updates to the Baidu index<\/li>\n\n\n\n<li><strong>Yandex Bot<\/strong>: the crawler of Yandex, the largest Russian search engine; it can present many different user-agent strings, <a href=\"https:\/\/yandex.com\/support\/search\/robots\/user-agent.html\" target=\"_blank\" rel=\"noreferrer noopener\">\u00a0tap here<\/a><\/li>\n\n\n\n<li><strong>Sogou Spider<\/strong>: the spider of Sogou, a leading Chinese search engine launched in 2004<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">The disallow directive<\/h3>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The second line in a block of directives is typically a <a href=\"https:\/\/www.deepcrawl.com\/knowledge\/hangout-library\/disallow\/\" rel=\"noreferrer noopener\" target=\"_blank\">Disallow directive<\/a>. 
You can use it to specify URLs that user agents are not allowed to crawl.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Here&#8217;s an example of a robots.txt file with Disallow directives:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>User-agent: *<\/p>\n\n\n\n<p>Disallow: \/cgi-bin\/<\/p>\n\n\n\n<p>Disallow: \/admin\/<\/p>\n\n\n\n<p>Disallow: \/private-directory\/<\/p>\n<\/blockquote>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">It means these particular areas are off-limits to bots. You can simply tell search engines not to access certain areas, files or pages.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The allow directive<\/h3>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Some leading crawlers support an Allow directive, which can counteract a Disallow directive. You can use it to specify that user agents are allowed to crawl a URL inside an otherwise disallowed directory. Among the major search engines, it is supported by <a href=\"https:\/\/raddinteractive.com\/2828-2\/\" rel=\"noreferrer noopener\" target=\"_blank\">Google bots<\/a> and Bing.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Here&#8217;s an example of a robots.txt file with an Allow directive:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>User-agent: *<\/p>\n\n\n\n<p>Disallow: \/cgi-bin\/<\/p>\n\n\n\n<p>Allow: \/cgi-bin\/forum\/<\/p>\n<\/blockquote>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">It is used to carve an exception out of a disallowed directory. 
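<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">You can sanity-check how such rules behave with Python\u2019s standard-library robotparser. One caveat: unlike Google\u2019s longest-match rule, Python\u2019s parser applies rules in file order (first match wins), so the Allow line is placed first in this sketch; the paths are illustrative:<\/p>

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block /cgi-bin/ but carve out /cgi-bin/forum/.
# Python's parser uses first-match order, so Allow comes first.
rules = """User-agent: *
Allow: /cgi-bin/forum/
Disallow: /cgi-bin/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "/cgi-bin/private/"))     # False
print(rp.can_fetch("*", "/cgi-bin/forum/topic"))  # True
print(rp.can_fetch("*", "/blog/post"))            # True
```

<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">With the Allow line first, both first-match and longest-match parsers agree on these answers.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">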
Like Google, Bing applies whichever of the Allow and Disallow directives is more specific, based on the length of the path.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The noindex directive<\/h3>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The noindex directive is not part of the robots.txt standard, although <a href=\"https:\/\/kwebby.com\/blog\/web-3-0-affect-business\/\" data-wpil-monitor-id=\"414\">webmasters<\/a> used it widely for years. It is properly a meta tag placed in the HTML head of a page to instruct search engines not to index that page; note that Google stopped honouring Noindex lines inside robots.txt in 2019.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Here&#8217;s an example of the (now unsupported) Noindex line in a robots.txt file:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>User-agent: *<\/p>\n\n\n\n<p>Disallow: \/cgi-bin\/<\/p>\n\n\n\n<p>Noindex: \/private.html<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">The nofollow directive<\/h3>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The robots <a href=\"https:\/\/kwebby.com\/blog\/meta-tags-seo\/\" data-wpil-monitor-id=\"1136\">META Tag<\/a> has an attribute called &#8220;nofollow&#8221; that instructs some search engines that a hyperlink should not influence the link target&#8217;s search engine ranking. Like noindex, a Nofollow line in robots.txt is not part of the standard, and major search engines ignore it; use the meta tag or link attribute instead.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Here&#8217;s an example of a Nofollow line in a robots.txt file:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>User-agent: *<\/p>\n\n\n\n<p>Disallow: \/cgi-bin\/<\/p>\n\n\n\n<p>Nofollow: \/private.html<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">The sitemap directive<\/h3>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The Sitemap directive is not part of the original robots.txt standard, but it&#8217;s widely supported by all major search engines. 
It allows webmasters to include a sitemap location in the robots.txt file.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Here&#8217;s an example of a robots.txt file with a Sitemap directive:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>User-agent: *<\/p>\n\n\n\n<p>Disallow: \/cgi-bin\/<\/p>\n\n\n\n<p>Sitemap: http:\/\/www.example.com\/sitemap.xml<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">The host directive<\/h3>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The Host directive is not part of the robots.txt standard either; it was historically supported by Yandex, while most other search engines ignore it. It allows webmasters to specify the preferred domain name for a site that has multiple domain names.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Here&#8217;s an example of a robots.txt file with a Host directive:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>User-agent: *<\/p>\n\n\n\n<p>Disallow: \/cgi-bin\/<\/p>\n\n\n\n<p>Host: www.example.com<\/p>\n<\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">The crawl-delay directive<\/h3>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The Crawl-delay directive specifies how many seconds a crawler should wait between requests while loading and crawling page content. 
Yahoo, Yandex and Bing respect <a href=\"https:\/\/www.contentkingapp.com\/academy\/robotstxt\/faq\/crawl-delay-10\/\" rel=\"noreferrer noopener\" target=\"_blank\">crawl-delay<\/a> directives.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Here&#8217;s an example of a robots.txt file with a crawl-delay directive:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>User-agent: *<\/p>\n\n\n\n<p>Disallow: \/cgi-bin\/<\/p>\n\n\n\n<p>Crawl-delay: 10<\/p>\n<\/blockquote>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">This makes a search engine wait 10 seconds between crawl requests, or 10 seconds before re-accessing your site; the meaning is essentially the same, with slight differences depending on the search engine.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Googlebot does not respond to this directive, but the crawl rate can be managed in <a href=\"https:\/\/support.google.com\/webmasters\/answer\/9128668?hl=en\" rel=\"noreferrer noopener\" target=\"_blank\">Google Search Console<\/a>; Google recommends avoiding the crawl-delay directive.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How to use wildcard\/ regular expressions<\/h2>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">So now we understand what a robots.txt file is, where to put it, and the user agents it addresses. But you might still have a question like: \u201cI have an eCommerce website and I would like to disallow all the pages which contain question marks or other unwanted characters in their URLs.\u201d<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">This is where wildcards come to the rescue. There are a couple of things to consider. 
First, you don&#8217;t need to append wildcards to every string in your robots.txt file; second, know that there are two types of wildcards:<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\"><strong>*wildcard<\/strong>: The * wildcard matches any sequence of characters. This type of wildcard is great for URLs that share the same pattern.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Example of using the *wildcard:<\/p>\n\n\n\n<p>User-agent: *<\/p>\n\n\n\n<p>Disallow: \/*?<\/p>\n<\/blockquote>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The robots.txt above disallows every page whose URL contains a question mark. If your website has a lot of parameterised URLs, this single rule covers all of them, while pages without a question mark remain crawlable.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\"><strong>$wildcard:<\/strong> The $ wildcard marks the end of a URL. If you want to stop bots from accessing the PDFs on your site, this wildcard comes in handy.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Example of using the $wildcard:<\/p>\n\n\n\n<p>User-agent: *<\/p>\n\n\n\n<p>Disallow: \/*.pdf$<\/p>\n<\/blockquote>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">This robots.txt disallows bots from accessing all the PDFs on your website. This is how you can use robots.txt to control what search engine bots can and cannot access on your site, and used well it can improve your website&#8217;s SEO.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Wildcards are not just used for defining user agents; they can also be used to match URLs.&nbsp;Wildcards are supported by Google, Yahoo and Ask. 
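<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Python\u2019s built-in robots.txt parser does not understand these wildcards, but their matching logic is easy to sketch by translating a rule into a regular expression; this is a toy illustration of the idea, not Google\u2019s actual matcher:<\/p>

```python
import re

def rule_to_regex(rule: str):
    """Translate a robots.txt path rule into a regex:
    '*' matches any run of characters; a trailing '$' anchors the end."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile(pattern)

# Disallow: /*?  -- matches any URL path containing a question mark
q = rule_to_regex("/*?")
print(bool(q.match("/products/shirt?color=red")))  # True
print(bool(q.match("/products/shirt")))            # False

# Disallow: /*.pdf$  -- matches only URLs that end in .pdf
pdf = rule_to_regex("/*.pdf$")
print(bool(pdf.match("/docs/guide.pdf")))          # True
print(bool(pdf.match("/docs/guide.pdf.html")))     # False
```

<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Without the trailing $, the .pdf rule would also block URLs that merely contain &#8220;.pdf&#8221; somewhere in the path.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">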
Remember, it&#8217;s always wise to double-check your robots.txt wildcards before publishing them.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Validate your robots.txt<\/h2>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">If you use robots.txt, test it in a robots.txt tester to see whether it is blocking Google&#8217;s crawlers from any URLs on your site. <\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">You can use the <a href=\"https:\/\/www.google.com\/webmasters\/tools\/robots-testing-tool\" target=\"_blank\">robots.txt tester tool <\/a>to check whether the Googlebot-Image crawler can still reach a file you want to block from Google Image Search.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"469\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.19.02@2x-1024x469.png\" alt=\"\" class=\"wp-image-12933\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.19.02@2x-1024x469.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.19.02@2x-300x138.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.19.02@2x-768x352.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.19.02@2x-1536x704.png 1536w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.19.02@2x-2048x939.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">You can use any program that produces a text file, or simply use an available option such as <a href=\"https:\/\/www.bigcommerce.com\/ecommerce-answers\/what-google-webmaster-tools\/\" target=\"_blank\" rel=\"noreferrer noopener\">Google Webmaster<\/a>. 
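<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Besides Google\u2019s tester, you can check a rule set programmatically with Python\u2019s urllib.robotparser, which can also read a Crawl-delay value; the rules and URLs below are illustrative:<\/p>

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: keep Google's image crawler out of /photos/
# and ask all other bots to wait 10 seconds between requests.
sample = """User-agent: Googlebot-Image
Disallow: /photos/

User-agent: *
Crawl-delay: 10
""".splitlines()

rp = RobotFileParser()
rp.parse(sample)

print(rp.can_fetch("Googlebot-Image", "/photos/holiday.jpg"))  # False
print(rp.can_fetch("Googlebot", "/photos/holiday.jpg"))        # True
print(rp.crawl_delay("*"))                                     # 10
```

<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">As noted above, Googlebot itself ignores Crawl-delay, so treat that value as advisory.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">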
Once you create your robots.txt file, add it to the top-level directory of your server. After that, make sure you set the correct permissions so that visitors can read it.<\/p>\n\n\n\n<p>You can also test whether your URL is blocked for Googlebot by using the form just below the tool;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"311\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.21.58@2x-1024x311.png\" alt=\"\" class=\"wp-image-12934\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.21.58@2x-1024x311.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.21.58@2x-300x91.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.21.58@2x-768x233.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.21.58@2x-1536x467.png 1536w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.21.58@2x-2048x622.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Final code<\/h2>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The robots.txt file is important for all websites, especially eCommerce sites, which have a lot of products and categories. If you don&#8217;t have a robots.txt file, or if it&#8217;s not correctly configured, search engines might crawl and index all your pages, which can result in lower rankings because of duplicate content. A robots.txt file should be placed in the root directory of your website. 
The Robots.txt file is a text file and must be named &#8220;robots.txt&#8221;.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Here&#8217;s an example of a Robots.txt file:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>User-agent: *<\/p>\n\n\n\n<p>Disallow: \/cgi-bin\/<\/p>\n\n\n\n<p>Disallow: \/tmp\/<\/p>\n\n\n\n<p>Disallow: \/~joe<\/p>\n\n\n\n<p>\/<\/p>\n\n\n\n<p>Allow: \/~joe\/public_html\/<\/p>\n\n\n\n<p>Sitemap: http:\/\/www.example.com\/sitemap.xml<\/p>\n<\/blockquote>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">The above Robots.txt file will block all bots from the \/cgi-bin\/, \/tmp\/ and \/~joe\/ directories, and it will allow the bot to crawl the pages in the \/~joe\/public_html\/ directory. It will also tell the bots where to find your sitemap.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">If you want to learn more about Robots.txt or if you need help creating a Robots.txt file for your website, we recommend that you contact a professional SEO company.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Generate robots.txt file using our tool Kwebby Robot TXT Generator<\/h2>\n\n\n\n<p>You can generate your Robots.txt file using our very own <a href=\"https:\/\/kwebby.com\/robots-txt-generator\">Robots.txt File Generator tool<\/a>;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"548\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.25.15@2x-1024x548.png\" alt=\"\" class=\"wp-image-12935\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.25.15@2x-1024x548.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.25.15@2x-300x160.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.25.15@2x-768x411.png 768w, 
https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.25.15@2x-1536x822.png 1536w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.25.15@2x-2048x1096.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Here are the options you need to consider;<\/p>\n\n\n\n<p><strong>Option 1<\/strong>: All robots are either allowed or refused. If you have a private app whose information shouldn&#8217;t be available to the public, select &#8220;Refused&#8221;; otherwise, allow them;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"452\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.31.21@2x-1024x452.png\" alt=\"\" class=\"wp-image-12936\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.31.21@2x-1024x452.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.31.21@2x-300x132.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.31.21@2x-768x339.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.31.21@2x-1536x677.png 1536w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.31.21@2x.png 1914w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\"><strong>Option 2<\/strong>: The Crawl Delay option lets you delay bots for a set interval to reduce server load. If your site is big and gets lots of traffic, you can enable it at 10 seconds; otherwise there&#8217;s no point in delaying crawls on starter or medium-sized websites;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"467\" 
src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.33.22@2x-1024x467.png\" alt=\"\" class=\"wp-image-12937\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.33.22@2x-1024x467.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.33.22@2x-300x137.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.33.22@2x-768x350.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.33.22@2x-1536x700.png 1536w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.33.22@2x.png 2018w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>Option 3<\/strong>: This option allows you to include Sitemap URL of your website to let the robot knows where your sitemap is and crawl only the necessary URLs which is mentioned in the sitemap itself.<\/p>\n\n\n\n<p>Generally, Your sitemap file is situated at root folder i.e. 
example.com\/sitemap.xml or example.com\/sitemap_xml; copy that URL and paste it into the option;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"464\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.36.00@2x-1024x464.png\" alt=\"\" class=\"wp-image-12938\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.36.00@2x-1024x464.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.36.00@2x-300x136.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.36.00@2x-768x348.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.36.00@2x-1536x696.png 1536w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.36.00@2x.png 1774w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>Option 4<\/strong>: In this section, you can allow or disallow specific search robots that you don&#8217;t want to crawl your website. 
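<\/p>\n\n\n\n<p>Under the hood, a generator like this simply emits one User-agent block per decision. The following is a hypothetical Python sketch of that idea (the bot names and the example.com sitemap URL are placeholders): an empty Disallow line allows everything, while &#8220;Disallow: \/&#8221; refuses the bot entirely:<\/p>\n\n\n\n

```python
# Hypothetical sketch of what a robots.txt generator emits: one
# User-agent block per bot decision, plus the sitemap location.
# An empty "Disallow:" allows everything; "Disallow: /" refuses the bot.
def build_robots_txt(bot_rules, sitemap_url):
    lines = []
    for bot, allowed in bot_rules.items():
        lines.append(f"User-agent: {bot}")
        lines.append("Disallow:" if allowed else "Disallow: /")
        lines.append("")  # blank line between blocks
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines)

# Placeholder choices: allow Googlebot, refuse SemrushBot.
print(build_robots_txt(
    {"Googlebot": True, "SemrushBot": False},
    "https://www.example.com/sitemap.xml",
))
```

\n\n\n\n<p>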
Just look at the list of bots and select either &#8220;allow&#8221; or &#8220;refused&#8221; to enable or disable crawling for each of them;<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"464\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.37.01@2x-1024x464.png\" alt=\"\" class=\"wp-image-12939\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.37.01@2x-1024x464.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.37.01@2x-300x136.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.37.01@2x-768x348.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.37.01@2x-1536x696.png 1536w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.37.01@2x.png 1774w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>Option 5<\/strong>: Restricted directories let you disallow directories that you don&#8217;t want bots to crawl, generally file URLs or private administrative directories, i.e. 
&#8220;\/admin&#8221;, &#8220;\/login&#8221; etc.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"421\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.05@2x-1024x421.png\" alt=\"\" class=\"wp-image-12940\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.05@2x-1024x421.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.05@2x-300x123.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.05@2x-768x316.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.05@2x-1536x632.png 1536w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.05@2x.png 1712w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Now you can create or save the robots.txt file and upload it to your root folder.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"177\" src=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.56@2x-1024x177.png\" alt=\"\" class=\"wp-image-12941\" title=\"\" srcset=\"https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.56@2x-1024x177.png 1024w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.56@2x-300x52.png 300w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.56@2x-768x133.png 768w, https:\/\/kwebby.com\/blog\/wp-content\/uploads\/2022\/07\/CleanShot-2022-07-06-at-14.41.56@2x.png 1224w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Why is robots.txt important for SEO?<\/h2>\n\n\n\n<p 
id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Robots.txt file plays a vital role in SEO. A robots.txt file tells the search engine crawls which <a href=\"https:\/\/www.deepcrawl.com\/blog\/best-practice\/getting-urls-crawled\/\" rel=\"noreferrer noopener\" target=\"_blank\">URLs crawler<\/a> can have access on your site. This factor is used to avoid overloading your site. If you want to keep the page out of Google, then block indexing with no index or password to protect your page.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">A big part of SEO is sending the right signals at the right time to <a href=\"https:\/\/www.reliablesoft.net\/top-10-search-engines-in-the-world\/\" rel=\"noreferrer noopener\" target=\"_blank\">search engines<\/a>, and robots.txt is the best way to communicate your crawling preference to search engines.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">For SEO&#8217;s best results, make sure you are not <a href=\"https:\/\/kwebby.com\/blog\/block-content-from-google-news\/\" data-wpil-monitor-id=\"415\">blocking any of the content<\/a> that you want to be crawled. Search engines will cache the robots.txt content but usually update that content once a day. If you want this processor to get faster then you can submit your robots.txt URL to Google.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">Do not use <a href=\"https:\/\/moz.com\/learn\/seo\/robotstxt\" rel=\"noreferrer noopener\" target=\"_blank\">robots.txt<\/a> for any private data, because others can get direct access to your information, it may still get indexed. For that use some page passwords to protect your data. 
The robots.txt filename is case sensitive, which is why the file must be named in lower case as robots.txt, not ROBOTS.TXT.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Other tips for robots.txt<\/h2>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">We have covered every major function of&nbsp;robots.txt; now let&#8217;s dig a little deeper and understand how each can turn into an SEO disaster if not used properly.&nbsp;<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">You mustn&#8217;t block good content that you want presented publicly, whether with robots.txt or a <a href=\"https:\/\/www.botify.com\/learn\/basics\/noindex\" rel=\"noreferrer noopener\" target=\"_blank\">noindex tag<\/a>; doing so by mistake will hurt your SEO. And as we said earlier, don&#8217;t overuse crawl-delay, as it limits how many of your pages the bots can crawl.<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">This can be fine for some websites, but if you have a huge website you are asking for trouble by losing solid traffic. The robots.txt file is case sensitive, as mentioned earlier: you have to call it &#8216;robots.txt&#8217; in lower case, otherwise it won&#8217;t work!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Frequently Asked Questions (FAQs)<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Why Is the Robots.txt file shown as a Soft 404 Error in Google Search Console?<\/h3>\n\n\n\n<p>This question was asked in the <a href=\"https:\/\/www.youtube.com\/watch?v=Fj42gKDQxYI&amp;t=765s\" target=\"_blank\" data-type=\"link\" data-id=\"https:\/\/www.youtube.com\/watch?v=Fj42gKDQxYI&amp;t=765s\" rel=\"noreferrer noopener\">SEO Work Hours August 2024 Edition<\/a>. 
A user noticed that the <a href=\"https:\/\/kwebby.com\/blog\/where-to-place-robots-txt-google-suggests-this\/\" data-type=\"post\" data-id=\"22090\">robots.txt <\/a>URL was being shown as a <a href=\"https:\/\/kwebby.com\/blog\/google-soft-404-errors\/\" data-type=\"post\" data-id=\"22108\">Soft 404 error<\/a> in Google Search Console. <\/p>\n\n\n\n<p>John Mueller answered this;<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>This one&#8217;s easy. That&#8217;s fine. You don&#8217;t need to do anything. The robots.txt file generally doesn&#8217;t need to be indexed. It&#8217;s fine to have it be seen as a soft 404.<\/p>\n<\/blockquote>\n\n\n\n<p>Therefore, if you see such an error, don&#8217;t worry; it&#8217;s fine for robots.txt to show as a soft 404. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can I Hide My Admin Login Page Using Robots.txt file from Search Crawlers?<\/h3>\n\n\n\n<p>You can do this effortlessly with robots.txt: hide your login page from search crawlers and stop them from crawling it. 
<\/p>\n\n\n\n<p>For example, if your admin login page is located at &#8220;www.example.com\/admin\/login&#8221;, you can add the following line in your Robots.txt file:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"Disallow: \/admin\/login\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span 
class=\"line\"><span style=\"color: #D8DEE9FF\">Disallow<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">\/<\/span><span style=\"color: #D8DEE9\">admin<\/span><span style=\"color: #81A1C1\">\/<\/span><span style=\"color: #D8DEE9\">login<\/span><\/span><\/code><\/pre><\/div>\n\n\n\n<p>This will prevent search engine crawlers from accessing and indexing your admin login page.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">So this is all you need to understand. To increase your site&#8217;s exposure, you have to ensure that search engines are crawling the most relevant data. Here, we see how a well-structured robots.txt file will enable you to direct how bots interact with your site. These are the ultimate guides for your robots.txt that we hope you master now!<\/p>\n\n\n\n<p id=\"cc256b63-6547-4343-b47a-b3de9c41a077\">&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Well-structured robots.txt files help to direct those search engine bots to make their work smoother. 
Having control over the search engine can manipulate who crawls&hellip;<\/p>\n","protected":false},"author":1,"featured_media":12942,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[277,100,3,4],"tags":[],"class_list":["post-12922","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-advanced-seo-techniques","category-blog","category-seo","category-tutorials"],"_links":{"self":[{"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/posts\/12922","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/comments?post=12922"}],"version-history":[{"count":4,"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/posts\/12922\/revisions"}],"predecessor-version":[{"id":22736,"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/posts\/12922\/revisions\/22736"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/media\/12942"}],"wp:attachment":[{"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/media?parent=12922"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/categories?post=12922"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kwebby.com\/blog\/wp-json\/wp\/v2\/tags?post=12922"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}