Before a site officially launches, it is common to test it or load data in a live, publicly reachable environment. During this phase, search-engine crawlers should be blocked from the site:
User-agent: *
Disallow: /
Allow: /.well-known
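The staging rules can be sanity-checked with Python's standard-library parser. One caveat: `urllib.robotparser` evaluates rules in file order (first match wins), unlike Google's longest-match semantics, so in this sketch the `/.well-known` exception is listed before the blanket `Disallow` (the sample URLs are hypothetical):

```python
# Sanity-check the staging robots.txt with Python's stdlib parser.
# urllib.robotparser matches rules in file order (first match wins),
# so the Allow exception is placed before the blanket Disallow here;
# Google itself uses longest-match, where the original order also works.
from urllib.robotparser import RobotFileParser

staging_rules = """\
User-agent: *
Allow: /.well-known
Disallow: /
"""

rp = RobotFileParser()
rp.parse(staging_rules.splitlines())

print(rp.can_fetch("Googlebot", "/"))                          # False
print(rp.can_fetch("Googlebot", "/catalog/product/view/1"))    # False
print(rp.can_fetch("Googlebot", "/.well-known/security.txt"))  # True
```

Running a check like this before and after launch catches the classic mistake of shipping the staging robots.txt to production.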
Once the site goes live, crawlers must be allowed in. For cross-border stores, it is also worth blocking domestic Chinese spiders to save bandwidth and avoid unnecessary ad-click costs:
# robots.txt for Magento 1.9.x
#
# Google crawler setup - a bot that finds its own user-agent section ignores the generic * section
User-agent: Googlebot
Disallow:
User-agent: AdsBot-Google
Disallow:
User-agent: Googlebot-Image
Disallow:
#
# Yandex tends to be rather aggressive, may be worth keeping them at arm's length
User-agent: YandexBot
Crawl-delay: 20
# The problem is mostly layered navigation and query params; allow only paging
Allow: /*?p=
Disallow: /*?p=*&
Disallow: /*?
#
# Crawlers Setup
User-agent: *
#
# Allow paging (unless paging inside a listing with more params, as disallowed below)
Allow: /*?p=
#
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /magento/
#Disallow: /media/
Disallow: /media/captcha/
#Disallow: /media/catalog/
Disallow: /media/customer/
Disallow: /media/dhl/
Disallow: /media/downloadable/
Disallow: /media/import/
Disallow: /media/pdf/
Disallow: /media/sales/
Disallow: /media/tmp/
#Disallow: /media/wysiwyg/
Disallow: /media/xmlconnect/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
#Disallow: /skin/
Disallow: /stats/
Disallow: /var/
#
# Paths (if the shop id appears in the URL, prefix each path with * or copy it for each store)
Disallow: */index.php/
Disallow: */catalog/product_compare/
Disallow: */catalog/category/view/
Disallow: */catalog/product/view/
Disallow: */catalog/product/gallery/
Disallow: */catalogsearch/
Disallow: */control/
Disallow: */contacts/
Disallow: */customer/
Disallow: */customize/
Disallow: */newsletter/
Disallow: */poll/
Disallow: */review/
Disallow: */sendfriend/
Disallow: */tag/
Disallow: */wishlist/
Disallow: */checkout/
Disallow: */onestepcheckout/
#
# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt
#
# Do not crawl sub-category pages that are sorted or filtered.
# Blocking every query string would be very broad and could hurt SEO, so it stays commented out:
# Disallow: /*?*
#
# These are more specific, pick what you need - and do not forget to add your custom filters!
Disallow: /*?dir*
Disallow: /*?limit*
Disallow: /*?mode*
Disallow: /*?___from_store=*
Disallow: /*?___store=*
Disallow: /*?cat=*
Disallow: /*?q=*
Disallow: /*?price=*
Disallow: /*?availability=*
Disallow: /*?brand=*
#
# Paths that can be safely ignored (no clean URLs)
Disallow: /*?p=*&
Disallow: /*.php$
Disallow: /*?SID=
#
#
User-Agent: Pinterest/0.2 (+http://www.pinterest.com/)
Allow: /
User-Agent: almaden
Disallow: /
User-Agent: ASPSeek
Disallow: /
User-Agent: Axmo
Disallow: /
User-Agent: BaiduSpider
Disallow: /
User-Agent: booch
Disallow: /
User-Agent: DTS Agent
Disallow: /
User-Agent: Downloader
Disallow: /
User-Agent: EmailCollector
Disallow: /
User-Agent: EmailSiphon
Disallow: /
User-Agent: EmailWolf
Disallow: /
User-Agent: Expired Domain Sleuth
Disallow: /
User-Agent: Franklin Locator
Disallow: /
User-Agent: Gaisbot
Disallow: /
User-Agent: grub
Disallow: /
User-Agent: HughCrawler
Disallow: /
User-Agent: iaea.org
Disallow: /
User-Agent: lcabotAccept
Disallow: /
User-Agent: IconSurf
Disallow: /
User-Agent: Iltrovatore-Setaccio
Disallow: /
User-Agent: Indy Library
Disallow: /
User-Agent: IUPUI
Disallow: /
User-Agent: Kittiecentral
Disallow: /
User-Agent: larbin
Disallow: /
User-Agent: lwp-trivial
Disallow: /
User-Agent: MetaTagRobot
Disallow: /
User-Agent: Missigua Locator
Disallow: /
User-Agent: NetResearchServer
Disallow: /
User-Agent: NextGenSearch
Disallow: /
User-Agent: NPbot
Disallow: /
User-Agent: Nutch
Disallow: /
User-Agent: ObjectsSearch
Disallow: /
User-Agent: Oracle Ultra Search
Disallow: /
User-Agent: PEERbot
Disallow: /
User-Agent: PictureOfInternet
Disallow: /
User-Agent: PlantyNet
Disallow: /
User-Agent: QuepasaCreep
Disallow: /
User-Agent: ScSpider
Disallow: /
User-Agent: SOFT411
Disallow: /
User-Agent: spider.acont.de
Disallow: /
User-Agent: Sqworm
Disallow: /
User-Agent: SSM Agent
Disallow: /
User-Agent: TAMU
Disallow: /
User-Agent: TheUsefulbot
Disallow: /
User-Agent: TurnitinBot
Disallow: /
User-Agent: Tutorial Crawler
Disallow: /
User-Agent: TutorGig
Disallow: /
User-Agent: WebCopier
Disallow: /
User-Agent: WebZIP
Disallow: /
User-Agent: ZipppBot
Disallow: /
User-Agent: Xenu
Disallow: /
User-Agent: Wotbox
Disallow: /
User-Agent: Wget
Disallow: /
User-Agent: NaverBot
Disallow: /
User-Agent: mozDex
Disallow: /
User-Agent: Sosospider
Disallow: /
User-agent: Sogou web spider
Disallow: /
User-agent: sogou spider
Disallow: /
User-agent: BLEXBot
Disallow: /
User-agent: AhrefsBot
Disallow: /
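Python's built-in parser does not understand the `*` and `$` wildcards used in the rules above, so verifying them requires Google-style longest-match evaluation (RFC 9309). A minimal sketch, with hypothetical sample URLs and only a subset of the rules, might look like this:

```python
import re

def rule_regex(path):
    """Translate a robots.txt path pattern to a regex:
    '*' matches any character sequence, a trailing '$' anchors the end."""
    pattern = re.escape(path).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile("^" + pattern)

def allowed(url_path, rules):
    """Longest-match evaluation per RFC 9309: the matching rule with the
    most octets wins; on a tie, Allow beats Disallow; no match = allowed."""
    best_len, best_allow = -1, True
    for kind, path in rules:
        if rule_regex(path).match(url_path):
            if len(path) > best_len or (len(path) == best_len and kind == "allow"):
                best_len, best_allow = len(path), kind == "allow"
    return best_allow

# A few rules from the file above, under Google's longest-match model:
rules = [
    ("allow",    "/*?p="),
    ("disallow", "/*?p=*&"),
    ("disallow", "/*?"),
    ("disallow", "/*.php$"),
]

print(allowed("/shoes.html?p=2", rules))            # True  (plain paging)
print(allowed("/shoes.html?p=2&color=red", rules))  # False (paging plus filter)
print(allowed("/cron.php", rules))                  # False ($-anchored match)
print(allowed("/shoes.html", rules))                # True  (no rule matches)
```

This shows why the ordering of `Allow: /*?p=` and `Disallow: /*?p=*&` works for Google: the longer disallow pattern wins only when a paging URL also carries extra parameters.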