1.京东商品页面的爬取
https://item.jd.com/2967929.html
代码:
import requests
url = "https://item.jd.com/2967929.html"
try:
r = requests.get(url)
r.raise_for_status()
r.encodint = r.apparent_encoding
print(r.text[:1000])
except:
print("爬取失败")
"""
结果如下:
<!-- shouji -->
<!DOCTYPE HTML>
<html lang="zh-CN">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gbk" />
<title>【华为荣耀8】荣耀8 4GB+64GB 全网通4G手机 魅海蓝【行情 报价 价格 评测】-京东</title>
<meta name="keywords" content="HUAWEI荣耀8,华为荣耀8,华为荣耀8报价,HUAWEI荣耀8报价"/>
<meta name="description" content="【华为荣耀8】京东JD.COM提供华为荣耀8正品行货,并包括HUAWEI荣耀8网购指南,以及华为荣耀8图片、荣耀8参数、荣耀8评论、荣耀8心得、荣耀8技巧等信息,网购华为荣耀8上京东,放心又轻松" />
<meta name="format-detection" content="telephone=no">
<meta http-equiv="mobile-agent" content="format=xhtml; url=//item.m.jd.com/product/2967929.html">
<meta http-equiv="mobile-agent" content="format=html5; url=//item.m.jd.com/product/2967929.html">