This is not an advertisement for KFC; I just happened to pick this site to scrape QAQ
Target URL: http://www.kfc.com.cn/kfccda/storelist/index.aspx
Typing text into the search box and clicking "查询" (Query) does not change the URL, which tells us the results are loaded by an AJAX request.
Open the browser's developer tools, switch to the XHR filter on the Network panel, and click Query again to capture the request. The captured Headers show that the real URL is http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword; it is a POST request with five form parameters, and the declared response type is text. Knowing this, the following code is not hard to write.
"""
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests
import json
if __name__ == "__main__":
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
city = input('输入城市:')
data = {
'cname': '',
'pid': '',
'keyword': city,
'pageIndex': '1',
'pageSize': '40'
}
# 发送请求
# get就是get ,post就是post,并且要注意,参数有变化的!
response = requests.post(url=url,data=data,headers=headers)
# 5.获取响应数据:json()方法返回的是obj(如果确认响应数据是json类型的,才可以使用json())
dic_obj = response.json()
# 持久化存储
fileName = city+'.json'
fp = open(fileName, 'w', encoding='utf-8')
json.dump(dic_obj,fp=fp,ensure_ascii=False)
"""
"""
import requests
url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
city = input('输入城市:')
data = {
'cname': '',
'pid': '',
'keyword': city,
'pageIndex': '1',
'pageSize': '40'
}
response = requests.post(url, data=data, headers=headers)
print(type(response))
response = response.json()
print(type(response))
for i in response['Table1']:
store = i['storeName']
address = i['addressDetail']
print('store:' + store, 'address:' + address + '\n')
"""
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import requests

if __name__ == "__main__":
    url = 'http://www.kfc.com.cn/kfccda/ashx/GetStoreList.ashx?op=keyword'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}
    city = input('Enter a city: ')
    data = {
        'cname': '',
        'pid': '',
        'keyword': city,
        'pageIndex': '1',
        'pageSize': '40'
    }
    # Send the request: use get for GET and post for POST,
    # and note that the keyword arguments differ between them!
    response = requests.post(url=url, data=data, headers=headers)
    # Pick the accessor by the Content-Type of the response:
    # .text for text, .json() for JSON
    page_text = response.text
    # Persist the result
    fileName = city + '.html'
    with open(fileName, 'w', encoding='utf-8') as fp:
        fp.write(page_text)
    print(fileName, 'saved successfully!')
Note: although treating the returned data as JSON would not change the final result here, I still recommend checking the Content-Type field in the Response Headers before choosing an accessor.
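To make that check concrete, here is a minimal sketch of choosing between `response.text` and `response.json()` based on the Content-Type value. The helper name `pick_accessor` and the header strings are my own illustrations, not part of the original code; in a real script you would pass in `response.headers.get('Content-Type', '')`.

```python
def pick_accessor(content_type):
    # The media type is everything before the first ';'
    # (parameters such as charset follow it).
    media_type = content_type.split(';')[0].strip().lower()
    # Use json() only when the server explicitly says the body is JSON;
    # fall back to .text otherwise.
    return 'json()' if media_type == 'application/json' else 'text'

print(pick_accessor('application/json; charset=utf-8'))  # json()
print(pick_accessor('text/html; charset=utf-8'))         # text
```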
Also, all three snippets run.
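Since the endpoint pages its results through pageIndex and pageSize, a natural extension is to fetch every page by incrementing pageIndex until a page comes back empty. Below is a sketch of the payload-building half of that loop; the function name `build_payload` is my own, and the commented-out fetch loop assumes the `Table1` key seen in the second snippet.

```python
def build_payload(keyword, page_index, page_size=40):
    # Same five form fields observed in the captured POST request;
    # the server expects string values.
    return {
        'cname': '',
        'pid': '',
        'keyword': keyword,
        'pageIndex': str(page_index),
        'pageSize': str(page_size),
    }

# In the real crawler you would loop over pages, e.g.:
#   page = 1
#   while True:
#       rows = requests.post(url, data=build_payload(city, page),
#                            headers=headers).json().get('Table1', [])
#       if not rows:
#           break
#       page += 1
print(build_payload('北京', 2))
```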
Finally, more complete notes and files are available here (for learning and exchange only):
https://github.com/jiayoudangdang/python_note_chapter_two_requests_module