1查看爬取规则
1、查看robots.txt
https://www.zhihu.com/robots.txt
User-agent: *
Crawl-delay: 10 无论使用哪种用户代理, 都应该在两次下载请求之间给出10秒的抓取延
不允许爬取
Disallow: /login
Disallow: /logout
Disallow: /resetpassword
Disallow: /terms
Disallow: /search
Disallow: /notifications
Disallow: /settings
Disallow: /inbox
Disallow: /admin_inbox
Disallow: /?guide
Disallow: /people/---