robots.txt Disallow
I develop customer websites on a publicly accessible web server so that my customers can check the progress of their website at any time. I could use .htaccess to require a username and password for each site, but then I'd constantly need to remind customers what their password is. My big concern is preventing search engines from finding their way to my development server. Luckily I can add a robots.txt file to my development server websites that tells search engines not to crawl or index them.
The Robots.txt
User-agent: *
Disallow: /
The above directive prevents the search engines from indexing any pages or files on the website. Say, however, that you simply want to keep search engines out of the folder that contains your administrative control panel. You'd code:
User-agent: *
Disallow: /administration/
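You can also stack several Disallow lines under a single User-agent group to block more than one folder at once (a sketch; the /staging/ folder here is just a hypothetical example):

```
User-agent: *
Disallow: /administration/
Disallow: /staging/
```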
Or if you wanted to allow in all spiders except Google's GoogleBot, you'd code:
User-agent: googlebot
Disallow: /
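If you want to confirm how a well-behaved crawler would read these rules, Python's standard-library urllib.robotparser can evaluate them. A quick sketch (the URL path and the Bingbot name are just illustrative):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Parse the googlebot-only rules from above.
rp.parse([
    "User-agent: googlebot",
    "Disallow: /",
])

# googlebot is shut out of everything...
print(rp.can_fetch("googlebot", "/index.html"))  # False
# ...while spiders that match no group are unrestricted.
print(rp.can_fetch("Bingbot", "/index.html"))    # True
```

Keep in mind that robots.txt is purely advisory: compliant crawlers honor it, but it offers no actual access control.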
What would you prevent the search engines from seeing?