Summary: engineering mindset; how to use requests
Anti-crawling measures:
1. robots.txt tells you which pages may be crawled
2. Checking the request header to block crawlers (just set your own header and you're done)
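As a sketch of point 1, Python's standard-library urllib.robotparser can answer whether a given path may be crawled. The rules below are made-up examples for illustration, not taken from any real site; in practice you would call rp.set_url(...) and rp.read() to fetch the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules (illustrative only)
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(rules)  # parse() accepts the file's lines directly

# Allowed: not under a Disallow rule
print(rp.can_fetch("Mozilla/5.0", "https://example.com/index.html"))  # True
# Forbidden: matches "Disallow: /private/"
print(rp.can_fetch("Mozilla/5.0", "https://example.com/private/x"))   # False
```

Note that robots.txt is advisory: it declares the site's wishes, and it is the header check in point 2 that sites actually enforce.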
Crawling a web page:
Summary: engineer it so the program never errors out, no matter how it is run
import requests

url = "https://uland.taobao.com/sem/tbsearch?refpid=mm_26632258_3504122_32538762&keyword=%E6%B7%98%E5%AE%9D&clk1=c0c6b48b740856939e06f9ee54e480fb&upsid=c0c6b48b740856939e06f9ee54e480fb"
try:
    kv = {'user-agent': 'Mozilla/5.0'}   # spoof a browser header to get past the header check
    r = requests.get(url, headers=kv)
    r.raise_for_status()                 # raise an exception on 4xx/5xx status codes
    r.encoding = r.apparent_encoding     # guess the real encoding from the page content
    print(r.text)
except requests.RequestException:        # was a bare except; catch only requests errors
    print("error")
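The try/except pattern above is often packaged as a reusable fetch function, which is one way to read the "never errors out" summary. A minimal sketch; the name getHTMLText, the timeout value, and the fallback string are my own choices, not from the notes:

```python
import requests

def getHTMLText(url, timeout=30):
    """Fetch a page and return its text, or a fallback string on any failure."""
    try:
        kv = {'user-agent': 'Mozilla/5.0'}            # spoof a browser header
        r = requests.get(url, headers=kv, timeout=timeout)
        r.raise_for_status()                          # raise on 4xx/5xx status codes
        r.encoding = r.apparent_encoding              # guess encoding from the content
        return r.text
    except requests.RequestException:
        return "fetch failed"

# A malformed URL fails inside requests before any network access,
# so the function still returns the fallback instead of crashing:
print(getHTMLText("not-a-valid-url"))  # → fetch failed
```

Callers then only ever deal with a string, never with an exception.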
Crawling an image:
Summary: read the binary data and save it to a file
import requests
import os

url = "https://img.alicdn.com/imgextra/i4/43784971/O1CN01gqcYoz1malOLa63F4_!!0-saturn_solar.jpg_220x220.jpg_.webp"
path = "C://Users//Administrator//Desktop//aaa//abc.jpg"
os.makedirs(os.path.dirname(path), exist_ok=True)  # make sure the target folder exists
kv = {'user-agent': 'Mozilla/5.0'}
r = requests.get(url, headers=kv)
r.raise_for_status()            # fail loudly instead of saving an error page as an image
with open(path, 'wb') as f:     # 'wb': the image is binary data, not text
    f.write(r.content)          # r.content is the raw response body in bytes
# no f.close() needed: the with statement closes the file automatically