1: Our first crawl:
import requests

url = "http://www.ip138.com/ips138.asp?ip="
try:
    r = requests.get(url + "202.204.80.112")
    r.raise_for_status()  # raises an exception unless the status code is 200 (success)
    r.encoding = r.apparent_encoding
    print(r.text[-500:])
except requests.RequestException:
    print("Crawl failed")
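The error handling above hinges on r.raise_for_status(). As a minimal offline sketch of what it does (using a hand-built Response object instead of a live request, so no network is needed):

```python
import requests

# raise_for_status() inspects the status code and raises requests.HTTPError
# for 4xx/5xx responses, which is what the except branch above catches.
resp = requests.Response()

resp.status_code = 200
resp.raise_for_status()      # 200: returns quietly, no exception

resp.status_code = 404
try:
    resp.raise_for_status()  # 404: raises requests.HTTPError
except requests.HTTPError as e:
    print("caught:", e)
```

This is why the tutorial calls raise_for_status() right after every get(): it turns a failed HTTP request into an exception that the surrounding try/except can handle.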
2: Crawling a search page with a keyword
import requests

keyword = "python"
url = "http://www.baidu.com"
try:
    kv = {'wd': keyword}  # Baidu passes the search term in the 'wd' query parameter
    r = requests.get(url, params=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text)
except requests.RequestException:
    print("Crawl failed")
import requests

keyword = "python"
url = "http://www.so.com"
try:
    kv = {"q": keyword}  # 360 Search uses 'q' for the search term instead
    r = requests.get(url, params=kv)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text)
except requests.RequestException:
    print("Crawl failed")
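Both examples rely on params to build the query string. You can inspect the final URL requests would send, without any network traffic, by preparing the request by hand:

```python
import requests

# PreparedRequest assembles the full URL (including the encoded query
# string) exactly as requests.get() would before sending it.
req = requests.Request("GET", "http://www.baidu.com", params={"wd": "python"})
prepared = req.prepare()
print(prepared.url)  # http://www.baidu.com/?wd=python
```

This is handy for debugging: if a search page ignores your keyword, printing r.url (or the prepared URL as above) shows exactly what query string was actually sent.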
3: Downloading an image
import os
import requests

path = "D:\\abc\\"
url = "http://img.lanrentuku.com/img/allimg/1403/13962495222859.jpg"
# url = "http://image.nationalgeographic.com.cn/2017/0211/20170211061910157.jpg"
root = path + url.split('/')[-1]
try:
    if not os.path.exists(path):
        os.mkdir(path)  # create the folder if it does not already exist
    if not os.path.exists(root):
        r = requests.get(url)
        r.raise_for_status()
        with open(root, 'wb') as f:  # open the target file for binary writing
            f.write(r.content)      # the with block closes the file automatically
        print("File saved")
    else:
        print("File already exists")
except (requests.RequestException, OSError):
    print("Failed to download the image")
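The save path above is derived with url.split('/')[-1], which breaks if the URL carries a query string (the filename would include "?x=1"). A slightly more robust sketch, using a hypothetical helper local_name (not from the original) that parses the URL first:

```python
import os
import posixpath
from urllib.parse import urlparse

def local_name(url, folder):
    # urlparse(url).path drops any query string or fragment, and
    # posixpath.basename takes the last URL path segment as the filename.
    name = posixpath.basename(urlparse(url).path)
    return os.path.join(folder, name)

print(local_name("http://img.lanrentuku.com/img/allimg/1403/13962495222859.jpg", "D:\\abc"))
print(local_name("http://example.com/pic.jpg?size=large", "D:\\abc"))  # still ends in pic.jpg
```

os.path.join also spares you from hand-appending the trailing backslash that the original path variable needs.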