How to create a robots.txt file for Yandex. Yandex robots

Hello everyone! Today I would like to tell you about the robots.txt file. A lot has been written about it on the Internet, but to be honest, for a long time I myself could not figure out how to create the right robots.txt. In the end I made one, and it is installed on all my blogs. I have not seen any problems; robots.txt works just fine.

Robots.txt for WordPress

Do you really need robots.txt? The answer is the same as always: creating robots.txt is one part of the internal optimization of a site (by the way, there will soon be a lesson dedicated entirely to the internal optimization of a WordPress site, so don't forget to subscribe to the RSS feed so you don't miss that material).

One of the functions of this file is to block unnecessary pages of the site from indexing. It is also where the sitemap address is specified and the main mirror of the site is declared (the site with www or without www).

Note: for search engines, the same site with www and without www are two completely different sites. But once they realize that the content of these sites is identical, search engines glue them together. That is why it is important to declare the main mirror of the site in robots.txt. To find out which version is the main one (with www or without www), simply type your site's address in the browser, for example with www: if you are automatically redirected to the same site without www, then the main mirror of your site is the version without www. I hope I explained that correctly.

So here is what, in my opinion, the correct robots.txt for WordPress looks like; you will find it just below.

Correct Robots.txt for WordPress

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: */trackback
Disallow: */*/trackback
Disallow: */*/feed/*/
Disallow: */feed
Disallow: /*?*
Disallow: /tag

User-agent: Yandex
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: */trackback
Disallow: */*/trackback
Disallow: */*/feed/*/
Disallow: */feed
Disallow: /*?*
Disallow: /tag
Host: website
Sitemap: https://site/sitemap.xml.gz
Sitemap: https://site/sitemap.xml

Everything given above needs to be copied into a text document with the .txt extension, so that the file is named robots.txt. You can create this text document with any plain text editor. Just don't forget to change the addresses in the last three lines to the address of your own website. The robots.txt file must be located in the root of the blog, in the same folder as wp-content, wp-admin and so on.

For those who are too lazy to create this text file themselves: you can simply download a ready-made robots.txt and change those same 3 lines in it.

I want to point out that you don't need to get too bogged down in the technical parts discussed below. I bring them up "for general knowledge", so to speak, to broaden our horizons and to know why all of this is needed.

So, the line:

User-agent

Sets the rules for a particular search engine: for example, "*" (asterisk) means that the rules apply to all search engines, while the line below

User-agent: Yandex

means that these rules apply only to Yandex.

Disallow
Here you list the sections that do NOT need to be indexed by search engines. For example, the page https://site/tag/seo duplicates (repeats) the content of the main articles, and duplicate pages have a negative effect on search engine ranking, so it is highly desirable to close such sections from indexing, which is exactly what this rule does:

Disallow: /tag

So, in the robots.txt given above, all the unnecessary sections of a WordPress site are closed from indexing, so you can simply copy it as it is.

Host

Here we specify the main mirror of the site, which I talked about a little higher up.

Sitemap

In the last two lines we specify the addresses of up to two sitemaps created with the help of a sitemap plugin.

Possible problems

And here is the line in robots.txt because of which the posts of my site stopped being indexed:

Disallow: /*?*

The thing is, this very line of robots.txt blocks the indexing of articles, which of course we don't need. To fix it, you just need to delete these 2 lines (both in the rules for all search engines and in the rules for Yandex), and the final correct robots.txt for a WordPress site without CNC (pretty permalinks) will look like this:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: */trackback
Disallow: */*/trackback
Disallow: */*/feed/*/
Disallow: */feed
Disallow: /tag

User-agent: Yandex
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: */trackback
Disallow: */*/trackback
Disallow: */*/feed/*/
Disallow: */feed
Disallow: /tag
Host: website
Sitemap: https://site/sitemap.xml

To check whether we have composed the robots.txt file correctly, I recommend using the Yandex.Webmaster service (I have already described how to register with this service).

Go to the section Indexing settings -> robots.txt analysis:

Once there, click the "Load robots.txt from the site" button, and then click the "Check" button:

If you see a message like the following, it means you have the correct robots.txt for Yandex:


Today's reality is that in Runet no self-respecting site can do without a file called robots.txt: even if you have nothing to block from indexing (although practically every site has technical pages and duplicate content that should be closed from indexing), at the very least it is worth specifying for Yandex which version, with www or without www, is the main one. That is exactly what the rules for writing robots.txt discussed below are for.

What is robots.txt?

The file with this name dates back to 1994, when a standard was agreed on so that sites could give search engines indexing instructions.

A file with this name must be placed in the root directory of the site; placing it in any other folder is not allowed.

The file performs the following functions:

  1. blocks any pages or groups of pages from indexing
  2. allows any pages or groups of pages to be indexed
  3. tells the Yandex robot which mirror of the site is the main one (with www or without www)
  4. shows the location of the sitemap file

All these points are extremely important for the search engine optimization of the site. Blocking indexing lets you close pages that contain duplicate content: tag pages, archives, search results, print versions of pages, and so on. The presence of duplicate content (when the same text, even just a few sentences long, is present on two or more pages) is a minus for the site in search engine ranking, so there should be as few duplicates as possible.
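
For illustration, here is a minimal sketch of what such blocking rules might look like on a typical blog (the /tag/, /archive/, /search/ and /print/ paths are assumptions; use the sections that actually exist on your site):

User-agent: *
Disallow: /tag/       # tag pages duplicate article text
Disallow: /archive/   # archive listings
Disallow: /search/    # internal search results
Disallow: /print/     # print versions of pages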

The allow directive has no independent meaning on its own, since all pages are open for indexing by default. It works in combination with disallow: for example, when a section is completely closed from search engines but you want to open one particular page inside it.

Specifying the main mirror of the site is also one of the most important elements of optimization: search engines treat the sites www.yoursite.ru and yoursite.ru as two different resources unless you tell them otherwise directly. The result is duplication of content, dilution of the weight of external links (external links may point either to the version with www or to the version without www), and all of this can lead to lower rankings in the search results.

For Google, the main mirror is set in the Webmaster tools (http://www.google.ru/webmasters/), whereas for Yandex this instruction can only be written in that very robots.txt.

Pointing to an xml file with the sitemap (for example, sitemap.xml) allows search engines to find that file.

Rules for specifying User-agent

The user-agent here means the search engine. When writing the instructions, you need to indicate whether they apply to all search engines (in which case an asterisk, *, is specified) or whether they are intended for one particular search engine, for example Yandex or Google.

To set a User-agent covering all robots, write the following line in your file:

User-agent: *

For Yandex:

User-agent: Yandex

For Google:

User-agent: GoogleBot

Rules for specifying disallow and allow

First of all, note that the robots.txt file must contain at least one disallow directive in order to be valid. Now let's look at the use of these directives with specific examples.

With this code you allow indexing of all pages of the site:

User-agent: *
Disallow:

And with this code, on the contrary, all pages will be closed:

User-agent: *
Disallow: /

To block indexing of a specific directory named folder, specify:

User-agent: *
Disallow: /folder

You can also use asterisks to stand in for an arbitrary name:

User-agent: *
Disallow: *.php

Important: the asterisk replaces the file name completely, which means you cannot specify file*.php, only *.php (but then all pages whose addresses contain .php will be blocked; to avoid that, specify the address of a concrete page instead).
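
For example, a minimal sketch of the difference (the old-page.php name is invented):

Disallow: *.php           # blocks every URL that contains .php
Disallow: /old-page.php   # blocks only this one page (and paths that start with it)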

The allow directive, as mentioned above, is used to create exceptions to disallow (otherwise it makes no sense, since pages are open by default anyway).

For example, let's block the pages in the archive folder from indexing, but open the index.html page from that directory:

Allow: /archive/index.html
Disallow: /archive/

Specifying the host and the sitemap

The host is the main mirror of the site (that is, the domain name with www or the domain name without this prefix). The host is specified only for the Yandex robot (and at least one disallow command must be present as well).

To specify the host, robots.txt must contain the following entry:

User-agent: Yandex
Disallow:
Host: www.yoursite.ru

As for the sitemap, it is specified in robots.txt by simply writing the full path to the corresponding file, including the domain name:

Sitemap: http://yoursite.ru/sitemap.xml

How to create a sitemap for WordPress has been described separately.

Example robots.txt for WordPress

For WordPress, the instructions need to be specified so as to close all technical directories (wp-admin, wp-includes, etc.) from indexing, as well as the duplicate pages created by tags, RSS feeds, comments and search.

As an example of robots.txt for WordPress, you can take the file from our website:

User-agent: Yandex
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /search
Disallow: */trackback
Disallow: */feed/
Disallow: */feed
Disallow: */comments/
Disallow: /?feed=
Disallow: /?s=
Disallow: */page/*
Disallow: */comment
Disallow: */tag/*
Disallow: */attachment/*
Allow: /wp-content/uploads/
Host: www.site

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /search
Disallow: */trackback
Disallow: */feed/
Disallow: */feed
Disallow: */comments/
Disallow: /?feed=
Disallow: /?s=
Disallow: */page/*
Disallow: */comment
Disallow: */tag/*
Disallow: */attachment/*
Allow: /wp-content/uploads/

Sitemap: https://www.site/sitemap.xml

You can download the robots.txt file from our site using the link.

If you still have questions after reading this article, ask them in the comments!

Before a search bot starts crawling your site, it first reads the robots.txt file. What is this file? It is a set of instructions for the search engine.

It is a text file with the txt extension, located in the root directory of the site. This set of instructions tells the search robot which pages and files of the site to index and which not to. It also specifies the main mirror of the site and tells where to look for the sitemap.

Why do you need the robots.txt file? For proper indexing of your website: so that the search results do not contain duplicate pages, various service pages and documents. Once you correctly configure the directives in robots, you will save your site from many problems with indexing and site mirroring.

How to create the correct robots.txt

Creating the robots.txt file is easy: you can make a text document using the standard Windows notepad. We write the directives for search engines in this file. Then we save the file under the name robots with the txt extension. That's it, now it can be uploaded to the hosting, into the root folder of the site. Note that only one "robots" document can be created per site. If this file is missing from the site, the bot automatically assumes that everything may be indexed.

Since there is only one such file, instructions for all search engines are written in it. Moreover, you can write both a separate set of instructions for each search engine and one common set for all of them at once. The instructions for different search robots are separated by means of the User-agent directive. We will talk about that in more detail below.

Robots.txt directives

The "robots" file can contain the following indexing directives: User-agent, Disallow, Allow, Sitemap, Host, Crawl-delay, Clean-param. Let's look at each instruction in detail.

User-agent directive

The User-agent directive indicates which search engine (more precisely, which particular robot) the instructions are intended for. If it contains "*", the instructions apply to all robots. If the instructions are addressed to a specific bot, such as Googlebot, they apply only to the main Google indexing robot. Moreover, if there are both instructions for Googlebot and instructions for other agents, Google will read only its own instructions and ignore the general ones. The Yandex robot behaves in the same way. Look at the examples of how the directive is written:

User-agent: YandexBot - instructions only for the main Yandex indexing bot
User-agent: Yandex - instructions for all Yandex bots
User-agent: * — instructions for all robots
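
For example, a minimal sketch (the /tmp/ and /private/ paths are made up) of how a robot chooses its section: if the file contains both a Googlebot section and a general section, Googlebot follows only its own rules and ignores the rest, while every other robot uses the * section.

User-agent: Googlebot
Disallow: /tmp/

User-agent: *
Disallow: /private/

Here Googlebot is blocked only from /tmp/ and may crawl /private/, while all other robots are blocked only from /private/.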

Directives Disallow and Allow

The Disallow and Allow directives give commands about what to index and what not to. Disallow gives the command not to index a page or an entire section of the site. Allow, on the contrary, indicates what needs to be indexed.

Disallow: / - blocks indexing of the entire site
Disallow: /papka/ - prevents indexing of the entire folder
Disallow: /files.php – blocks indexing of the file files.php

Allow: /cgi-bin – allows cgi-bin pages to be indexed

With the Disallow and Allow directives it is often simply necessary to use special characters. They are needed for writing regular expressions.

The special character * replaces any sequence of characters. It is appended by default to the end of every rule; even if you have not written it, the search engine adds it itself. Usage example:

Disallow: /cgi-bin/*.aspx – prevents all files with the .aspx extension from being indexed
Disallow: /*foto - blocks indexing of files and folders whose paths contain the word foto

The special character $ cancels the effect of the special character "*" at the end of a rule. For example:

Disallow: /example$ - disallows indexing '/example', but does not disallow '/example.html'

And if you write it without the special symbol $, then the instructions will work differently:

Disallow: /example - blocks both '/example' and '/example.html'

Sitemap Directive

The Sitemap directive is intended to tell the search engine robot where the sitemap is located on the hosting. The sitemap format must be sitemaps.xml. A sitemap is needed for faster and more complete indexing of the site. Moreover, the sitemap is not necessarily a single file; there may be several of them. Directive entry format:

Sitemap: http://site/sitemaps1.xml
Sitemap: http://site/sitemaps2.xml

Host directive

The Host directive tells the robot the main mirror of the site. If the site has mirrors, this directive must always be specified. If you do not specify it, the Yandex robot will index at least two versions of the site, with www and without, and until the robot glues the mirrors together there will be duplicates in the index. Example entry:

Host: www.site
Host: site

In the first variant the robot will index the version with www, in the second, without. Only one Host directive may be specified in the robots.txt file. If you write several of them, the robot will process and take into account only the first one.

A correct Host directive should contain the following data (see the example right after this list):
- the connection protocol (HTTP or HTTPS);
- a correctly written domain name (an IP address cannot be used);
- the port number, if needed (for example, Host: site.com:8080).
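
For example, a minimal sketch of a correctly formed value for a hypothetical site served over HTTPS on a non-standard port:

Host: https://site.com:8080

For an ordinary site reachable over plain HTTP on the standard port, a bare domain such as Host: site.com is enough.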

Incorrectly issued directives will simply be ignored.

Crawl-delay directive

The Crawl-delay directive allows you to reduce the load on the server. It is needed in case your site starts to buckle under the onslaught of various bots. The Crawl-delay directive tells the search bot how long to wait between finishing the download of one page of the site and starting the download of the next. The directive must come directly after the entries of the Disallow and/or Allow directives. The Yandex search robot can also read fractional values, for example: 1.5 (seconds).
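
A minimal sketch of how it might be used (the 2-second value and the /search path are arbitrary; pick whatever your server can handle):

User-agent: Yandex
Disallow: /search
Crawl-delay: 2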

Clean-param directive

The Clean-param directive is needed for sites whose page addresses contain dynamic parameters that do not affect the content: session identifiers, user identifiers, referrers, and so on. This directive is used so that there are no duplicates; it tells the search engine not to re-download the repeated information, which reduces the load on the server and the time the robot spends crawling the site.

Clean-param: s /forum/showthread.php

This entry tells the search engine that the s parameter should be considered insignificant for all URLs that begin with /forum/showthread.php. The maximum entry length is 500 characters.
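
For example, with the rule above the robot would treat the following two addresses (invented for illustration) as one and the same page and download it only once:

/forum/showthread.php?s=123456&t=8243
/forum/showthread.php?s=654321&t=8243

Both collapse to /forum/showthread.php?t=8243, so the s value no longer multiplies the URLs the robot has to crawl.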

We have covered the directives, now let's move on to setting up our robots file.

Setting up robots.txt

Let's get straight to setting up the robots.txt file. It must contain at least two entries:

User-agent: - indicates which search engine the instructions below are intended for.
Disallow: - specifies which part of the site should not be indexed. You can close from indexing either a single page of the site or entire sections.

Moreover, you can indicate that these directives apply to all search engines or to one specific one. This is specified in the User-agent directive. If you want all bots to read the instructions, put an "asterisk":

User-agent: *

If you want to write instructions for a specific robot, you need to specify its name:

User-agent: YandexBot

A simple example of a correctly composed robots file looks like this:

User-agent: *
Disallow: /files.php
Disallow: /section/
Host: website

Where * indicates that the instructions are intended for all search engines;
Disallow: /files.php - blocks the file files.php from indexing;
Disallow: /section/ - blocks the entire "section" directory with all attached files from indexing;
Host: website - tells robots which mirror of the site to index.

If you don’t have any pages on your site that need to be closed for indexing, your robots.txt file might look like this:

User-agent: *
Disallow:
Host: website

Robots.txt for Yandex

To indicate that these instructions are intended for the Yandex search engine, you need to specify User-agent: Yandex in the directive. Moreover, if we write Yandex, the site will be crawled by all Yandex robots, while if we specify YandexBot, it will be a command only for the main indexing robot.

It is also necessary to write the "Host" directive, specifying the main mirror of the site. As I wrote above, this is done to avoid duplicate pages. Your correct robots.txt for Yandex will then look like this:
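
For example, a minimal sketch of such a file, built from the same example rules as above (the paths and the "website" placeholder are illustrative; substitute your own):

User-agent: Yandex
Disallow: /files.php
Disallow: /section/
Host: website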

File creation

Robots.txt is a file with instructions for search robots. It is placed in the root of the site. You can create it right on your desktop using ordinary Notepad, the same way you create any text file.

To do this, right-click on an empty space, choose Create - Text document (not Word). It will open in ordinary notepad. Name it robots; its extension must be txt. That is all there is to creating the file.

How to compose robots.txt

Now all that is left is to fill the file with the necessary instructions. Frankly, commands for robots have a much simpler syntax than any programming language. There are two ways to fill the file:

Look at another site, copy it and adapt it to the structure of your project.

Write it yourself

I have already written about the first method. It works when the sites have the same engines and no significant difference in functionality. For example, all WordPress sites have the same structure, but there can be various extensions, such as forums, online stores and many additional directories. If you want to know how to modify robots.txt, read this article; you can also read the previous one, but quite a lot will be said here too.

For example, suppose you have a /source directory on your website where the source files of the articles you write on the blog are stored, while another webmaster has no such directory. And you want to close the source folder from indexing. If you copy robots.txt from another resource, that command will not be there. You will have to add your own instructions and delete what you don't need.

So in any case it is useful to know the basic syntax of instructions for robots, which we will now go through.

How to write your instructions to robots?

The first thing the file begins with is an indication of which search engine the instructions are addressed to. It is written like this:

User-agent: Yandex

or

User-agent: Googlebot

No periods or semicolons are needed at the end of the lines; this isn't programming, after all). In short, in the first case only the Yandex bot will read the instructions, in the second, only Google. If the commands should be carried out by all robots, write: User-agent: *

Wonderful. We have figured out how to address the robots; that is not difficult. You can picture it with a simple example. You have three younger brothers, Vasya, Dima and Petya, and you are the eldest. Your parents have gone out and told you to keep an eye on them.

All three are asking you for something. Imagine that you have to answer them the same way you would write instructions for search robots. It would look something like this:

User-agent: Vasya
Allow: go to the football

User-agent: Dima
Disallow: go to the football (Dima misbehaved last time, so he is being punished)

User-agent: Petya
Allow: go to the cinema (Petya is older, but he still has to ask, so okay, let him go)

So Vasya happily laces up his sneakers, Dima hangs his head and stares out the window at his brother, who is already counting how many goals he will score today (Dima received the disallow command, a ban), and Petya goes off to his movie.

From this example it is easy to understand that Allow is permission and Disallow is a ban. But in robots.txt we give commands not to people but to robots, so instead of specific instructions we write there the addresses of the pages and directories that should be allowed or blocked from indexing.

For example, suppose I have the website site.ru and it runs on WordPress. I start writing the instructions:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Allow: /wp-content/uploads/
Disallow: /source/

...and so on.

First, I addressed all robots at once. Second, I blocked indexing of the engine's own folders, but allowed the robot access to the folder with uploads. All the pictures are usually stored there, and it is undesirable to close them from indexing if you plan to get traffic from image search.

And remember, earlier in the article I said that you may have additional directories? You can create them yourself for various purposes. For example, on one of my sites there is a flash folder where I put flash games so that they can then be run on the site. Or source - this folder can store files that are available for visitors to download.

The name of the folder is completely unimportant. If it needs to be closed, specify the path to it in the Disallow command.

The Allow command is needed in order to open certain parts of closed sections. After all, by default, if you have no robots.txt file at all, the entire site is available for indexing. That is both good (you definitely won't close anything important by mistake) and bad at the same time (files and folders that should not be in the search results will be open).

To better understand this point, I suggest looking at this fragment once more:

Disallow: /wp-content/
Allow: /wp-content/uploads/

As you can see, we first block indexing of the entire wp-content directory. All your templates and plugins are stored there, but so are the pictures. Obviously, those can be opened, and that is what the Allow command is for.

Additional parameters

These are far from the only commands that can be specified in the file. There is also Host - it indicates the main mirror of the site. For those who didn't know, any site has two ways of writing its domain name: domain.com and www.domain.com.

To avoid problems, you need to specify exactly one of them as the main one. This can be done both in the webmaster tools and in the robots.txt file. For this we write: Host: domain.com

What does this give? If someone tries to reach your site like this: www.domain.com, they will automatically be redirected to the version without www, because it will be recognized as the main mirror.

Another directive is sitemap. I think you have already understood that it specifies the path to the sitemap in xml format. Example: http://domain.com/sitemap.xml

Again, you can add the map in Yandex.Webmaster, and you can also specify it in robots.txt, so that the robot reads this line and understands exactly where to look for the sitemap. For a robot a sitemap is as important as the ball is for Vasya when he goes to football. It is just like him asking you (as the older brother) where the ball is, and you telling him:

look behind the sofa

Now you know how to properly configure and change robots.txt for Yandex and any other search engine to suit your needs.

What does configuring the file give you?

I have already talked about this, but I'll say it again. Thanks to a clearly configured file with commands for robots, you can sleep more soundly, knowing that the robot will not wander into an unnecessary section and will not pull unnecessary pages into the index.

I also said that configuring robots.txt does not solve everything. In particular, it does not protect you from duplicates, which arise because the engines themselves are imperfect. Just like people. You allowed Vasya to go to the football, but it is no guarantee that he won't get up to the same thing there as Dima. It is the same with duplicates: you can give the command, but you cannot be sure that something extra won't slip into the index and spoil your positions.

There is no need to be terrified of duplicates, though. For example, Yandex treats sites with serious technical flaws more or less tolerantly. But if you neglect things, you really can lose a significant share of traffic. However, our section dedicated to SEO will soon have an article about duplicates, and then we will fight them.

How do I get a normal robots.txt if I don't understand any of this?

After all, composing robots.txt is not the same as building a site. To put it simply, you can just copy the file from any more or less successful blogger - provided, of course, that your site runs on WordPress. If you use a different engine, then you need to look for sites on the same cms. I already said how to view the file on someone else's site: Domain.com/robots.txt

Summary

I don't think there is much more to say here, because you should not spend ages composing instructions for robots. It is a task that even a beginner can handle in 30-60 minutes, and a professional can do it all in just a couple of minutes. Everything will work out for you, don't doubt it.

And to learn other useful and important tips for developing and promoting your blog, you can take a look at our unique course. If you apply 50-100% of its recommendations, you will be able to successfully promote any sites in the future.

This article gives an example of the optimal, in my opinion, code for the robots.txt file for WordPress, which you can use on your sites.

To start, let's recall why robots.txt is needed at all - the robots.txt file is needed exclusively for search robots, to "tell" them which sections/pages of the site to visit and which they must not visit. Pages that are closed from visiting will not get into the index of the search engines (Yandex, Google, etc.).

Option 1: optimal robots.txt code for WordPress

User-agent: *
Disallow: /cgi-bin          # classic...
Disallow: /?                # all query parameters on the main page
Disallow: /wp-              # all WP files: /wp-json/, /wp-includes, /wp-content/plugins
Disallow: *?s=              # search
Disallow: *&s=              # search
Disallow: /search           # search
Disallow: /author/          # author archives
Disallow: */embed           # all embeds
Disallow: */page/           # all kinds of pagination
Allow: */uploads            # open uploads
Allow: /*/*.js              # inside /wp- (/*/ - for priority)
Allow: /*/*.css             # inside /wp- (/*/ - for priority)
Allow: /wp-*.png            # images in plugins, cache folder, etc.
Allow: /wp-*.jpg            # images in plugins, cache folder, etc.
Allow: /wp-*.jpeg           # images in plugins, cache folder, etc.
Allow: /wp-*.gif            # images in plugins, cache folder, etc.
Allow: /wp-*.svg            # images in plugins, cache folder, etc.
Allow: /wp-*.pdf            # files in plugins, cache folder, etc.
Allow: /wp-admin/admin-ajax.php
#Disallow: /wp/             # when WP is installed in the wp subdirectory

Sitemap: http://example.com/sitemap.xml
Sitemap: http://example.com/sitemap2.xml      # another file
#Sitemap: http://example.com/sitemap.xml.gz   # compressed version (.gz)

# Code version: 1.1
# Don't forget to change `site.ru` to your site.

Code breakdown:

    In the line User-agent: * we indicate that all the rules below apply to all search robots. If the rules need to apply to only one specific robot, then instead of * we specify its name (User-agent: Yandex, User-agent: Googlebot).

    In the line Allow: */uploads we deliberately allow indexing of pages whose addresses contain /uploads. This is mandatory, because above we block indexing of pages that begin with /wp-, and /wp- is part of /wp-content/uploads. Therefore, to override the Disallow: /wp- rule, the line Allow: */uploads is needed, since for addresses of the form /wp-content/uploads/... there may be pictures that should be indexed, and there may also be some downloaded files there that there is no reason to hide. Allow: can go "before" or "after" Disallow:.

    The other lines block robots from "walking" through addresses that begin with:

    • Disallow: /cgi-bin - closes the script directory on the server
    • Disallow: /feed - closes the blog's RSS feed
    • Disallow: /trackback - closes trackback notifications
    • Disallow: ?s= or Disallow: *?s= - closes search pages
    • Disallow: */page/ - closes all kinds of pagination

    The rule Sitemap: http://example.com/sitemap.xml points the robot to the file with the sitemap in XML format. If you have such a file on your site, write the full path to it. There may be several such files; in that case we specify the path to each of them separately.

    In the line Host: site.ru we specify the main mirror of the site. If the site has mirrors (copies of the site on other domains), then for Yandex to index them all identically, you need to specify the main mirror. The Host directive is understood only by Yandex; Google does not understand it! If the site runs over the https protocol, the protocol must be specified in Host: Host: https://example.com

    From the Yandex documentation: "Host is an independent directive and works anywhere in the file (cross-sectional)." So we put it at the top or at the very end of the file, separated by an empty line.

Feeds are left open because open feeds are required, for example, for Yandex Zen, when you need to connect a site to a channel (thanks to the commenter "Digital"). Open feeds may be needed elsewhere as well.

At the same time, feeds have their own format in the response headers, thanks to which search engines understand that it is not an HTML page but a feed and, obviously, process it differently.

The Host directive is no longer needed for Yandex

Yandex has completely abandoned the Host directive; it has been replaced by a 301 redirect. Host can safely be removed from robots.txt. However, it is important that all mirrors of the site have a 301 redirect to the main site (the main mirror).

This is important: the rules are sorted before they are processed

Yandex and Google process the Allow and Disallow directives not in the order in which they are specified, but first sort them from the shortest rule to the longest, and then process the last matching rule:

User-agent: *
Allow: */uploads
Disallow: /wp-

will be read as:

User-agent: *
Disallow: /wp-
Allow: */uploads

To quickly understand and apply this sorting feature, remember the following rule: "the longer the rule in robots.txt, the higher its priority. If two rules are the same length, priority is given to the Allow directive."
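
A minimal sketch of the tie-break case (the /blog path is hypothetical):

User-agent: *
Disallow: /blog
Allow: /blog

Both rules are the same length, so the Allow directive wins and pages under /blog remain open for crawling.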

Option 2: standard robots.txt for WordPress

I don't know about you, but I'm for the first option! Because it is more logical: there is no need to duplicate an entire section just to specify the Host directive for Yandex, which is cross-sectional (understood by the robot anywhere in the template, without indicating which robot it refers to). As for the non-standard Allow directive, it works for Yandex and Google, and if it does not open the uploads folder for other robots that don't understand it, then in 99% of cases nothing bad will come of it. I have not yet noticed the first variant of robots not working as it should.

The standard code, however, is slightly incorrect. Thanks to the commenter for pointing out the incorrectness, although I had to figure out what exactly it was myself. This is what I came up with (I may be wrong):

  1. Some robots (other than Yandex and Google) do not understand more than 2 directives: User-agent: and Disallow:

  2. The Yandex directive Host: should be placed after Disallow:, because some robots (other than Yandex and Google) may not understand it and reject the robots.txt altogether. As for Yandex itself, judging by the documentation, it makes absolutely no difference where and how you use Host:, even if you create a robots.txt with the single line Host: www.site.ru just to glue all the site mirrors together.

  3. Sitemap: is a cross-sectional directive for Yandex and Google, and probably for many other robots as well, so we write it at the end, separated by an empty line, and it will apply to all robots at once.

Based on these amendments, the correct code might look like this:

User-agent: Yandex
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-json/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: */embed
Disallow: */page/
Disallow: /cgi-bin
Disallow: *?s=
Allow: /wp-admin/admin-ajax.php
Host: site.ru

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-json/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: */embed
Disallow: */page/
Disallow: /cgi-bin
Disallow: *?s=
Allow: /wp-admin/admin-ajax.php

Sitemap: http://example.com/sitemap.xml

Adding your own rules

If you need to block any other pages or groups of pages, you can add a Disallow: rule (directive) below. For example, say we need to close all entries in the category news from indexing; then before Sitemap: we add the rule:

Disallow: /news

It blocks robots from following links such as:

  • http://example.com/news
  • http://example.com/news/drugoe-nazvanie/

And if you need to close any occurrence of /news in the address, then we write:

Disallow: */news

It will block robots from following links such as:

  • http://example.com/news
  • http://example.com/my/news/drugoe-nazvanie/
  • http://example.com/category/newsletter-nazvanie.html

You can also study the robots.txt directives on the Yandex help page (but keep in mind that not all the rules described there work for Google).

Checking robots.txt and documentation

You can check whether the written rules work correctly at the following links:

  • Yandex: http://webmaster.yandex.ru/robots.xml.
  • For Google this is done in Search Console. You need to be authorized and the site must be added to the webmaster panel...
  • Service for creating a robots.txt file: http://pr-cy.ru/robots/
  • Service for creating and checking robots.txt: https://seolib.ru/tools/generate/robots/

I asked Yandex...

At one point I asked a question to Yandex support about the cross-sectional use of the Host and Sitemap directives:

Question:

Hello!
I am writing an article about robots.txt on my blog. I would like to get an answer to this question (I did not find a definitive "yes" in the documentation):

I need to glue all the mirrors together, and for this I use the Host directive at the very beginning of the robots.txt file:

Host: site.ru
User-agent: *
Disallow: /asd

Will Host: site.ru work correctly in this example? Will it tell the robots that site.ru is the main mirror, given that the directive is used not inside a section but on its own (at the beginning of the file), without indicating which User-agent it refers to?

I would also like to know whether the Sitemap directive must be used inside a section or whether it can be placed outside of one: for example, separated by an empty line, after a section?

User-agent: Yandex
Disallow: /asd

User-agent: *
Disallow: /asd

Sitemap: http://example.com/sitemap.xml

Will the robot understand the Sitemap directive in this example?

I hope to receive from you an answer that will put an end to my doubts.

Answer:

Hello!

The Host and Sitemap directives are cross-sectional, so the robot will take them into account regardless of the place in the robots.txt file where they are specified.

--
With respect, Platon Shchukin
Yandex support service

Conclusion

It is important to remember that changes in robots.txt on an already working site will only become noticeable after several months (2-3 months).

There are rumors that Google can sometimes ignore the rules in robots.txt and take a page into the index if it considers that the page is very unique and useful and simply must be in the index. However, other rumors refute this hypothesis by pointing out that inexperienced optimizers can specify the rules in robots.txt incorrectly and thereby close the needed pages from indexing while leaving the unneeded ones open. I lean more towards the second assumption.

Dynamic robots.txt

WordPress handles the request for the robots.txt file dynamically, and it is not at all necessary to physically create a robots.txt file in the root of the site; moreover, it is not recommended, because with that approach it will be very difficult for plugins to change this file when needed.

You can read about how the dynamic creation of the robots.txt file works in the description of the corresponding WordPress function, and below I will give an example of how the content of this file can be changed on the fly, through a hook.

To do this, add the following code to the functions.php file:

Add_action("do_robotstxt", "my_robotstxt"); function my_robotstxt())( $lines = [ "User-agent: *", "Disallow: /wp-admin/", "Disallow: /wp-includes/", "", ]; echo implode("\r\n ", $lines); die; // cut off to the PHP robot)

User-agent: * Disallow: /wp-admin/ Disallow: /wp-includes/

Crawl-delay - a timeout for overly aggressive robots (not taken into account since 2018)

Yandex

Having analyzed the letters to our support service over the last two years regarding indexing questions, we found out that one of the main reasons for the slow downloading of documents is an incorrectly configured Crawl-delay directive in robots.txt [...] So that site owners no longer have to worry about this, and so that all the really necessary pages of sites appear and are updated in the search quickly, we decided to stop taking the Crawl-delay directive into account.

When the Yandex robot crawls the site like crazy, it creates an excessive load on the server. You can ask the robot to "slow down".

To do this you need to use the Crawl-delay directive. It specifies the time in seconds that the robot must wait (pause) between scanning each page of the site.

For compatibility with robots that do not fully follow the robots.txt standard, Crawl-delay must be specified in the group (in the User-agent section) immediately after Disallow and Allow.

The Yandex robot understands fractional values, for example, 0.5 (half a second). This does not guarantee that the search robot will visit your site every half second, but it allows you to speed up the crawling of the site.

User-agent: Yandex
Disallow: /wp-admin
Disallow: /wp-includes
Crawl-delay: 1.5   # timeout of 1.5 seconds

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Allow: /wp-*.gif
Crawl-delay: 2     # timeout of 2 seconds

Google

Googlebot does not understand the Crawl-delay directive. The timeout for its robots can be specified in the webmaster panel.
