I was recently poking around Baidu Images looking for pictures of Jing Tian, so I wrote a small crawler. Baidu Images turns out to be a bit sneaky: on the first page you won't find any request that actually contains the image data. After scrolling down a few pages, I found JSON data inside the asynchronously loaded XHR responses, and the image URLs are stored in there, so the crawler just needs to pull them out. In the request parameters, queryWord is the search keyword, rn is the page size, and pn is the offset. For example:
page 1 is pn=0, page 2 is pn=30, page 3 is pn=60, and so on. I only crawl one page here; to crawl more pages, wrap the request in a loop like `for i in range(start_page, end_page):`. The code is below.
############### Jing Tian images ############### Jing Tian images ###############
import os
import random
import re
import time
from urllib.parse import urlencode

import requests

# A browser-like User-Agent so Baidu serves the normal JSON response
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

base_url = 'https://image.baidu.com/search/acjson?'
params = {
    'tn': 'resultjson_com',
    'ipn': 'rj',
    'ct': '201326592',
    'queryWord': '景甜',
    'cl': '2',
    'lm': '-1',
    'ie': 'utf-8',
    'oe': 'utf-8',
    'st': '-1',
    'word': '景甜',
    'face': '',
    'istype': '2',
    'nc': '1',
    'pn': 90,    # offset: 30 per page, so this is page 4
    'rn': '30',  # results per page
    'gsm': '1e',
}
url = base_url + urlencode(params)
# first_url = 'http://image.baidu.com/search/index?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&word=%E6%99%AF%E7%94%9C'
print(url)

response = requests.get(url, headers=headers)
response.encoding = response.apparent_encoding
html = response.text.replace('\\/', '/')  # unescape the JSON forward slashes
result = re.findall('"thumbURL":"(.*?)"', html)
print(result)

save_dir = r'D:\py\spider\py\sssp'  # raw string so backslashes are not escapes
os.makedirs(save_dir, exist_ok=True)
ab = 0
for img in result:
    print(img)
    if img.strip():  # skip empty URLs
        image = requests.get(img, headers=headers)
        ab += 1
        time.sleep(random.randint(1, 6))  # random delay to avoid getting blocked
        with open(os.path.join(save_dir, '%sa.jpg' % ab), 'wb') as f:
            f.write(image.content)
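As mentioned above, crawling multiple pages just means stepping pn by rn for each page. A minimal sketch of the URL-building part (the `page_url` helper and its page range are my own illustration, not from the original code):

```python
from urllib.parse import urlencode

base_url = 'https://image.baidu.com/search/acjson?'

def page_url(page, rn=30, word='景甜'):
    """Build the request URL for a given page: page 1 -> pn=0, page 2 -> pn=30, ..."""
    params = {
        'tn': 'resultjson_com',
        'ipn': 'rj',
        'queryWord': word,
        'word': word,
        'ie': 'utf-8',
        'oe': 'utf-8',
        'pn': (page - 1) * rn,  # offset grows by rn for each page
        'rn': rn,
    }
    return base_url + urlencode(params)

# crawl pages 1 through 3
for page in range(1, 4):
    print(page_url(page))
```

Each URL returned here can then be fed into the same requests.get / re.findall / download loop shown above.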