Simple scraping of web pages and images with Python's requests library
Installing the requests library
Open cmd in administrator mode
Enter pip install requests
Wait until pip reports "Successfully installed"
And that's it
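To double-check the install from Python itself, a small sketch like the following can help. The helper name installed_version is made up for this illustration; it only relies on the standard library's importlib.metadata (Python 3.8+).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install worked, or None if requests is missing
print(installed_version("requests"))
```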
Scraping an arbitrary product page on jd.com
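import requests

url = "https://item.jd.com/2967929.html"
try:
    r = requests.get(url)
    r.raise_for_status()  # raise an HTTPError for 4xx/5xx status codes
    r.encoding = r.apparent_encoding  # use the encoding detected from the body
    print(r.text[:1000])  # show the first 1000 characters
except requests.RequestException:
    print("Scrape failed")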
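The same pattern can be wrapped in a reusable function. This is only a sketch: the name get_html_text and the 10-second timeout are assumptions, not something the snippet above prescribes. The timeout is worth adding because requests waits indefinitely by default, and splitting the except clauses makes it easier to tell a bad status code from a connection problem.

```python
import requests


def get_html_text(url, timeout=10):
    """Return the decoded page text, or None if the request fails."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()  # 4xx/5xx responses become HTTPError
        r.encoding = r.apparent_encoding
        return r.text
    except requests.HTTPError as e:
        print(f"HTTP error: {e}")
    except requests.RequestException as e:
        # Covers timeouts, connection errors, malformed URLs, and so on
        print(f"Request failed: {e}")
    return None
```

A call such as get_html_text("https://item.jd.com/2967929.html") would then return the page text, or None after printing the reason for the failure.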
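Scraping an Amazon page. Amazon protects its pages, so we need to imitate a browser visit to get the data; a plain request like the one below goes out with the default requests headers.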
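import requests

r = requests.get("http://www.amazon.cn/gp/product/B01M8L5Z3Y")
print(r.status_code)  # HTTP status of the reply
print(r.encoding)  # encoding taken from the response headers
r.encoding = r.apparent_encoding  # switch to the encoding detected from the body
print(r.text)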
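Imitating a browser usually means sending a browser-like User-Agent header. Here is a minimal sketch, assuming the generic value "Mozilla/5.0" is enough; the header value, the function name fetch_as_browser, and the timeout are all illustrative choices, and the site may still reject the request.

```python
import requests
from requests.utils import default_user_agent

# Assumption: an illustrative browser-like User-Agent value
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_as_browser(url, timeout=10):
    """GET `url` with a browser-like User-Agent; return the text, or None on failure."""
    try:
        r = requests.get(url, headers=BROWSER_HEADERS, timeout=timeout)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return None


# What requests announces by default: "python-requests/<version>"
print(default_user_agent())

# Inspect the header that would be sent, without touching the network
prepared = requests.Request(
    "GET", "http://www.amazon.cn/gp/product/B01M8L5Z3Y", headers=BROWSER_HEADERS
).prepare()
print(prepared.headers["User-Agent"])  # Mozilla/5.0
```

After a real request, r.request.headers shows the headers that were actually sent, which is a quick way to confirm the override took effect.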
Scraping and saving an image
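import requests
import os

#path = "D:/abc.jpg"
url = "http://imgsize.ph.126.net/?enlarge=true&imgurl=http://edu-image.nosdn.127.net/3321D6673EB82C94D08E1B80E8344166.jpg?imageView&thumbnail=426y240&quality=100_230x130x1x95.png"
root = "D:/pics/"
# Take the last URL segment and drop the query string, since "?" is not allowed in Windows file names
path = root + url.split('/')[-1].split('?')[0]
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        r.raise_for_status()  # do not save an error page as an image
        with open(path, 'wb') as f:  # the with block closes the file automatically
            f.write(r.content)
        print("save success")
    else:
        print("file exists")
except (requests.RequestException, OSError):
    print("Scrape failed")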
# r = requests.get(url)
# r.status_code
# with open(path, 'wb') as f:
#     f.write(r.content)
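The image download can also be split into two small helpers. This is a sketch under stated assumptions: the names filename_from_url and save_image, the fallback name download.bin, the timeout, and the chunk size are all made up for illustration. The first helper strips the query string and characters Windows rejects in file names; the second streams the body to disk in chunks instead of holding it all in memory.

```python
import os
import re
import requests


def filename_from_url(url, default="download.bin"):
    """Derive a file name from the last URL segment, without query string or fragment."""
    last = url.split("/")[-1]
    name = re.split(r"[?#]", last, maxsplit=1)[0]
    # Replace characters that Windows does not allow in file names
    name = re.sub(r'[<>:"\\|*]', "_", name)
    return name or default


def save_image(url, root, timeout=10):
    """Stream `url` into the folder `root`; return the saved path, or None on failure."""
    path = os.path.join(root, filename_from_url(url))
    if os.path.exists(path):
        return path  # already downloaded
    try:
        os.makedirs(root, exist_ok=True)
        with requests.get(url, stream=True, timeout=timeout) as r:
            r.raise_for_status()
            with open(path, "wb") as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
        return path
    except (requests.RequestException, OSError):
        return None


url = "http://imgsize.ph.126.net/?enlarge=true&imgurl=http://edu-image.nosdn.127.net/3321D6673EB82C94D08E1B80E8344166.jpg?imageView&thumbnail=426y240&quality=100_230x130x1x95.png"
print(filename_from_url(url))  # 3321D6673EB82C94D08E1B80E8344166.jpg
```

Calling save_image(url, "D:/pics") would then store the file under that cleaned-up name.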