Training: WWW-Robots (HTTP, Training)


Howy_why


Category: WeChall

Article link: https://blog.csdn.net/weixin_43753319/article/details/94667049


Challenge description
In this little training challenge, you are going to learn about the Robots_exclusion_standard.
The robots.txt file is used by web crawlers to check if they are allowed to crawl and index your website or only parts of it.
Sometimes these files reveal the directory structure instead of protecting the content from being crawled.
Enjoy!
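
To make the description concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt, using Python's standard `urllib.robotparser`. The rules, the `/secret/` path, and the `example.com` host are made-up placeholders for illustration, not the challenge site's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, for illustration only).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /secret/
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network request is needed.
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching each URL.
allowed_home = parser.can_fetch("*", "https://example.com/index.html")
allowed_secret = parser.can_fetch("*", "https://example.com/secret/page.html")

print(allowed_home)
print(allowed_secret)
```

Note that the file only asks crawlers to stay away; nothing in it enforces access control, which is why a `Disallow` line can end up advertising a path instead of hiding it.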

Solution:

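First, find out what the Robots_exclusion_standard is. Clicking the Robots_exclusion_standard text in the challenge description takes you straight to the Wikipedia article (accessing it may require a proxy).
The key takeaway from the article:
Some may even use robots.txt as a guide to find disallowed links and go straight to them.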
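
The observation above can be sketched in a few lines of Python: read a robots.txt file, collect every `Disallow` path, and turn each into a URL worth visiting by hand. The sample content, the `/hidden_dir/` path, the `example.com` base URL, and the helper name `disallowed_paths` are all hypothetical; in practice you would first download the real file from the site's `/robots.txt`.

```python
from urllib.parse import urljoin


def disallowed_paths(robots_txt):
    """Return the non-empty Disallow paths found in robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # Field names are case-insensitive; an empty Disallow blocks nothing.
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt content (an assumption, not the real file).
SAMPLE = """\
# sample file
User-agent: *
Disallow: /hidden_dir/
Disallow:
"""

BASE_URL = "https://example.com/"  # placeholder host
candidates = [urljoin(BASE_URL, path) for path in disallowed_paths(SAMPLE)]
print(candidates)
```

Each resulting URL is a candidate to open in the browser; whether it leads anywhere depends entirely on what the real file lists.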