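Some websites inspect the request headers and reject requests that do not look like they come from a browser; in that case the response's status_code will be something other than 200. To crawl such a site normally, create a dictionary kv = {'user-agent': 'Mozilla/5.0'} (note the colon: it must be a key-value pair, not a comma-separated set) and pass it to requests.get() as headers=kv.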
Here is the actual code:
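import requests

url = "https://item.jd.com/12029500.html?cpdad=1DLSUE"
# headers must be a dict (key: value); {'user-agent', 'Mozilla/5.0'} would be a set
kv = {'user-agent': 'Mozilla/5.0'}
try:
    # timeout keeps the call from hanging indefinitely on an unresponsive server
    r = requests.get(url, headers=kv, timeout=10)
    r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    r.encoding = r.apparent_encoding  # guess the encoding from the body
    print(r.text[:1000])
    print("Crawl succeeded")
except requests.RequestException:
    print("Crawl failed")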
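To see why the header matters, it can help to look at what requests would send without it. The following is a minimal sketch that builds the request without touching the network, using Session.prepare_request(); the variable names session, default_req, and custom_req are only illustrative and not part of the original code.

```python
import requests

url = "https://item.jd.com/12029500.html?cpdad=1DLSUE"
kv = {'user-agent': 'Mozilla/5.0'}

# prepare_request() merges the session's default headers with ours
# and returns a PreparedRequest; nothing is sent over the network.
session = requests.Session()
default_req = session.prepare_request(requests.Request('GET', url))
custom_req = session.prepare_request(requests.Request('GET', url, headers=kv))

# By default requests identifies itself as "python-requests/<version>",
# which a header-checking site can easily recognize and reject.
print(default_req.headers['User-Agent'])

# Header names are case-insensitive, so 'user-agent' overrides the default.
print(custom_req.headers['User-Agent'])
```

After a real request, the same check can be made on the response with r.request.headers, which holds the headers that were actually sent.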