Magento robots.txt Setup

Before a site officially goes live, it is usually tested or populated with data on a publicly reachable production environment. During this stage, search engine crawlers should be blocked from the site:

User-agent: *
Disallow: /
Allow: /.well-known
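
To sanity-check these staging rules, the short sketch below runs them through Python's built-in urllib.robotparser (the domain example.com is just a placeholder). Note that the standard-library parser applies the first matching rule, whereas Google uses the most specific match, so it will report the Allow: /.well-known path as blocked even though Googlebot itself would honour it.

import urllib.robotparser

# Staging rules from above; example.com is a placeholder domain.
staging_rules = """\
User-agent: *
Disallow: /
Allow: /.well-known
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(staging_rules.splitlines())

# Every normal page should be reported as blocked for any crawler.
print(rp.can_fetch("Googlebot", "https://example.com/some-product.html"))         # False

# Caveat: urllib.robotparser uses first-match semantics, so this also prints False,
# even though Google's longest-match rule would allow the /.well-known path.
print(rp.can_fetch("Googlebot", "https://example.com/.well-known/security.txt"))  # False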

Once the site officially launches, crawlers need to be allowed. For cross-border (export-oriented) stores, it is also worth blocking domestic Chinese spiders to save bandwidth and avoid unnecessary ad-click costs:

# robots.txt for Magento 1.9.x

#
# Google crawler setup - having crawler-specific sections makes those bots ignore the generic (User-agent: *) rules

User-agent: Googlebot
Disallow:

User-agent: AdsBot-Google
Disallow:

User-agent: Googlebot-Image
Disallow:

#
# Yandex tends to be rather aggressive, so it may be worth keeping it at arm's length
User-agent: YandexBot
Crawl-delay: 20
# Problem is mostly related to layered nav and query params, allow only paging
Allow: /*?p=
Disallow: /*?p=*&
Disallow: /*?

#
# Crawlers Setup
User-agent: *
#
# Allow paging (unless paging inside a listing with more params, as disallowed below)
Allow: /*?p=
#
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /magento/
#Disallow: /media/
Disallow: /media/captcha/
#Disallow: /media/catalog/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
#Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/
#
# Paths (if a store code is used in the URL, prefix with * as here, or copy the rules for each store)
Disallow: */index.php/
Disallow: */catalog/product_compare/
Disallow: */catalog/category/view/
Disallow: */catalog/product/view/
Disallow: */catalog/product/gallery/
Disallow: */catalogsearch/
Disallow: */control/
Disallow: */contacts/
Disallow: */customer/
Disallow: */customize/
Disallow: */newsletter/
Disallow: */poll/
Disallow: */review/
Disallow: */sendfriend/
Disallow: */tag/
Disallow: */wishlist/
Disallow: */checkout/
Disallow: */onestepcheckout/
#
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
#
# Do not crawl sub category pages that are sorted or filtered.
# The catch-all below would be very broad and could hurt SEO, so it is left commented out.
# Disallow: /*?*
#
# These are more specific, pick what you need - and do not forget to add your custom filters!
Disallow: /*?dir*
Disallow: /*?limit*
Disallow: /*?mode*
Disallow: /*?___from_store=*
Disallow: /*?___store=*
Disallow: /*?cat=*
Disallow: /*?q=*
Disallow: /*?price=*
Disallow: /*?availability=*
Disallow: /*?brand=*
#
# Paths that can be safely ignored (no clean URLs)
Disallow: /*?p=*&
Disallow: /*.php$
Disallow: /*?SID=
#
# Per-bot rules: allow Pinterest previews, block unwanted and bandwidth-wasting bots

# Pinterest (full UA: Pinterest/0.2 (+http://www.pinterest.com/))
User-agent: Pinterest
Allow: /

User-agent: almaden
Disallow: /

User-agent: ASPSeek
Disallow: /

User-agent: Axmo
Disallow: /

User-agent: BaiduSpider
Disallow: /

User-agent: booch
Disallow: /

User-agent: DTS Agent
Disallow: /

User-agent: Downloader
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: Expired Domain Sleuth
Disallow: /

User-agent: Franklin Locator
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: grub
Disallow: /

User-agent: HughCrawler
Disallow: /

User-agent: iaea.org
Disallow: /

User-agent: lcabotAccept
Disallow: /

User-agent: IconSurf
Disallow: /

User-agent: Iltrovatore-Setaccio
Disallow: /

User-agent: Indy Library
Disallow: /

User-agent: IUPUI
Disallow: /

User-agent: Kittiecentral
Disallow: /

User-agent: larbin
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: MetaTagRobot
Disallow: /

User-agent: Missigua Locator
Disallow: /

User-agent: NetResearchServer
Disallow: /

User-agent: NextGenSearch
Disallow: /

User-agent: NPbot
Disallow: /

User-agent: Nutch
Disallow: /

User-agent: ObjectsSearch
Disallow: /

User-agent: Oracle Ultra Search
Disallow: /

User-agent: PEERbot
Disallow: /

User-agent: PictureOfInternet
Disallow: /

User-agent: PlantyNet
Disallow: /

User-agent: QuepasaCreep
Disallow: /

User-agent: ScSpider
Disallow: /

User-agent: SOFT411
Disallow: /

User-agent: spider.acont.de
Disallow: /

User-agent: Sqworm
Disallow: /

User-agent: SSM Agent
Disallow: /

User-agent: TAMU
Disallow: /

User-agent: TheUsefulbot
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: Tutorial Crawler
Disallow: /

User-agent: TutorGig
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: ZipppBot
Disallow: /

User-agent: Xenu
Disallow: /

User-agent: Wotbox
Disallow: /

User-agent: Wget
Disallow: /

User-agent: NaverBot
Disallow: /

User-agent: mozDex
Disallow: /

User-agent: Sosospider
Disallow: /

User-agent: Sogou web spider
Disallow: /

User-agent: sogou spider
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: AhrefsBot
Disallow: /
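
Once this production robots.txt is deployed, the same standard-library parser gives a quick smoke test. This is only a minimal sketch under a few assumptions: example.com is a placeholder for your own store, and urllib.robotparser does not understand wildcard rules such as /*?dir* or the $ anchor, so only the simple prefix and per-bot rules are checked here.

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()

# Googlebot has its own section with an empty Disallow, so clean URLs stay crawlable.
print(rp.can_fetch("Googlebot", "https://example.com/some-category.html"))    # expect True

# Domestic spiders and SEO-tool bots are blocked site-wide.
print(rp.can_fetch("BaiduSpider", "https://example.com/some-category.html"))  # expect False
print(rp.can_fetch("AhrefsBot", "https://example.com/"))                      # expect False

# YandexBot is throttled rather than blocked.
print(rp.crawl_delay("YandexBot"))                                            # expect 20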

 
