Example from the MOOC course on scraping a JD.com product page:
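import requests

url = "https://item.jd.com/100002795959.html"
try:
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # raise an exception on 4xx/5xx status codes
    r.encoding = r.apparent_encoding  # guess the encoding from the response body
    print(r.text[:1000])
except requests.RequestException:
    print("Scraping failed")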
Running it prints a script that redirects to the JD login page instead of the product page:
<script>window.location.href='https://passport.jd.com/uc/login?ReturnUrl=http://item.jd.com/100007413580.html'</script>
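Note that this response passed `r.raise_for_status()`: the body was printed, so the login redirect arrived with a successful status code and the `except` branch never ran. A minimal sketch of an explicit check for this case follows. The helper name `is_login_redirect` and the regular expression are illustrative assumptions based only on the output shown above, not part of the MOOC example, and JD may change the redirect format.

```python
import re

# Hypothetical pattern, derived from the redirect script shown above:
# a JavaScript assignment that sends the browser to passport.jd.com.
LOGIN_REDIRECT = re.compile(
    r"window\.location\.href\s*=\s*['\"]https?://passport\.jd\.com/",
    re.IGNORECASE,
)


def is_login_redirect(html: str) -> bool:
    """Return True if the start of the body looks like a redirect to the JD login page."""
    return bool(LOGIN_REDIRECT.search(html[:2000]))


if __name__ == "__main__":
    sample = (
        "<script>window.location.href='https://passport.jd.com/uc/login"
        "?ReturnUrl=http://item.jd.com/100007413580.html'</script>"
    )
    print(is_login_redirect(sample))
```

In the scraping code, calling `is_login_redirect(r.text)` after `r.raise_for_status()` would let the script report the redirect instead of silently printing it.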
Add a statement that sets the User-Agent request header:
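import requests

url = "https://item.jd.com/100002795959.html"
try:
    kv = {'user-agent': 'Mozilla/5.0'}  # present the request as coming from a browser
    r = requests.get(url, headers=kv, timeout=10)
    r.raise_for_status()  # raise an exception on 4xx/5xx status codes
    r.encoding = r.apparent_encoding  # guess the encoding from the response body
    print(r.text[:1000])
except requests.RequestException:
    print("Scraping failed")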
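To see what the `headers` argument changes, the request can be built without sending it. The sketch below compares the User-Agent that `requests` sends by default with the one sent after the override. That JD redirects requests carrying the default value is an inference from the two runs in this note, not documented behavior.

```python
import requests

url = "https://item.jd.com/100002795959.html"
kv = {"user-agent": "Mozilla/5.0"}

# User-Agent that requests uses when no header is supplied,
# of the form "python-requests/<version>".
default_ua = requests.utils.default_user_agent()

# Prepare the request through a Session so its default headers are merged in,
# but do not send it: no network access is needed.
session = requests.Session()
prepared = session.prepare_request(requests.Request("GET", url, headers=kv))
sent_ua = prepared.headers["User-Agent"]  # header lookup is case-insensitive

print("default:   ", default_ua)
print("overridden:", sent_ua)
```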
With the header set, the scrape succeeds and the product page is returned:
<!DOCTYPE HTML>
<html lang="zh-CN">
<head>
<!-- shouji -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>【华为P30 Pro】华为 HUAWEI P30 Pro 超感光徕卡四摄10倍混合变焦麒麟980芯片屏内指纹 8GB+128GB极光色全网通版双4G手机【行情 报价 价格 评测】-京东</title>
<meta name="keywords" content="HUAWEIP30 Pro,华为P30 Pro,华为P30 Pro报价,HUAWEIP30 Pro报价"/>
<meta name="description" content="【华为P30 Pro】京东JD.COM提供华为P30 Pro正品行货,并包括HUAWEIP30 Pro网购指南,以及华为P30 Pro图片、P30 Pro参数、P30 Pro评论、P30 Pro心得、P30 Pro技巧等信息,网购华为P30 Pro上京东,放心又轻松" />
<meta name="format-detection" content="telephone=no">
<meta http-equiv="mobile-agent" content="format=xhtml; url=//item.m.jd.com/product/100002795959.html">
<meta http-equiv="mobile-agent" content="format=html5; url=//item.m.jd.com/product/100002795959.html">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<link rel="canonical" href="//item.jd.com/100002795959.html"/>
<link rel="dns-prefetch" href="//misc.360buyimg.com"/>
<link rel="dns-prefet