Attempting to scrape information from the website:
import requests

if __name__ == '__main__':
    # Set the URL
    url = 'http://scxk.nmpa.gov.cn:81/xk/itownet/portalAction.do?hKHnQfLv=5bUBHzTZfdHBscc3NttBv1xFCxaTK055mUzabozd0yC.oGYS8nrQauzGim2nByx8aafZjqZNSXfAF4D.ZxmWaQTkkXBpoIOf6vUzNk3fAQ3jHQ78zgJI50BpqofukwWd1qbHnF8khLQpGVldizDx4PbT7jnx_9R5pdCNdKufcGAbCA7BakpubjDBtfq.f8bNbszGcV7IkybGJABjF5KA8NJ.AqiBfgDlpYj2RXo0q9CFfFyhjanK8xeI4xllT3bHyXhA8WXYIluMvi6kqb48MpviT6fJU_ONrSVzxyvcFaSW&8X7Yi61c=4AeLHBygDjuo2HJRa61X61T06Xi6qFy5POVhNoqFkOxAfCvK89fN6niA69jRlFtQIpse010dgDBg_FTQD6U41CAjQGaAceO0ubzycTxvs1qEsXu21jZ1LIv7FSNHrcmMd'
    # Spoof the User-Agent so the request looks like it comes from a browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.54'
    }
    # Set the form parameters
    data = {
        'on': 'true',
        'page': '1',
        'pageSize': '15',
        'xkType': '2',
        'productName': '',
        'conditionType': '1',
        'applyname': '',
        'applysn': ''
    }
    # Send the request
    response = requests.post(url=url, data=data, headers=headers)
    json_ids = response.json()
    id_list = []  # Stores the company IDs
    for dic in json_ids['list']:
        id_list.append(dic['ID'])
    print(id_list)
Result: the script raised an error.
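To narrow down this kind of failure, it can help to check whether the response body is the expected JSON before indexing into it. The following is only a minimal sketch: `extract_ids` is a hypothetical helper, not part of the original script, and it assumes the payload has the shape the script expects (a top-level `list` of objects with an `ID` key).

```python
import json


def extract_ids(body_text):
    """Return the IDs found in a JSON body, or None if the body is not the expected JSON.

    Hypothetical helper: assumes the shape {"list": [{"ID": ...}, ...]}.
    """
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # The server sent something else, e.g. an HTML page instead of JSON.
        return None
    if not isinstance(payload, dict) or not isinstance(payload.get('list'), list):
        return None
    return [item['ID'] for item in payload['list']
            if isinstance(item, dict) and 'ID' in item]
```

In the script above, this could be called as `extract_ids(response.text)`. If it returns `None`, printing `response.status_code` and the first few hundred characters of `response.text` should show what the server actually sent back.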
After the error, checking the code turned up nothing. On trying again, I found that the site's URL had changed.
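Cause: the site has an anti-scraping mechanism that adds a time-dependent parameter to the URL (the "Paused in debugger" message shown in the browser supports this).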
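One way to confirm which parts of the URL change between visits is to parse two captured URLs and compare their query parameters. This is a rough sketch using the standard library; `changed_params` is a hypothetical helper and the URLs in the usage note are placeholders, not real captures from the site.

```python
from urllib.parse import urlsplit, parse_qs


def changed_params(url_a, url_b):
    """Return the sorted names of query parameters whose values differ between two URLs."""
    query_a = parse_qs(urlsplit(url_a).query, keep_blank_values=True)
    query_b = parse_qs(urlsplit(url_b).query, keep_blank_values=True)
    # A parameter counts as changed if it is missing on one side or has a different value.
    names = set(query_a) | set(query_b)
    return sorted(name for name in names if query_a.get(name) != query_b.get(name))
```

For example, `changed_params('http://example.com/api?token=abc&page=1', 'http://example.com/api?token=xyz&page=1')` returns `['token']`. Any parameter that shows up in the result for two visits made a short time apart is a candidate for a dynamically generated value.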
Solution: unknown.