Three modules are needed: requests, pandas, re
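I. Find the request URL:
1. Open the Eastmoney (东方财富网) page, press F12 (or Fn+F12) to open the developer tools, click the Network tab, and refresh the page.
2. Type a keyword into the Network search box, click the request that returns the data, and the URL is shown under Headers.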
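A minimal sketch of reusing the copied URL for other pages, under the assumption that the pn query parameter is the page number. The helper name with_page and the shortened URL are hypothetical and only for illustration; nothing here contacts the server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Return a copy of url whose pn query parameter is set to page."""
    parts = urlsplit(url)
    # Keep every parameter in its original order and replace only pn.
    query = [
        (key, str(page) if key == "pn" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Shortened, hypothetical version of the URL copied from the Headers tab.
base_url = "https://28.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=20&po=1&np=1&fid=f3"
page_5_url = with_page(base_url, 5)
print(page_5_url)
```

With the full URL, urlencode percent-encodes characters such as ':' and ','. The server is expected to decode them back to the same values, which is an assumption here.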
II. Disguise the request as a browser
Scroll to the bottom of the Headers panel from step I and copy the user-agent value: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0
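A small sketch of how the copied user-agent string ends up as a request header. To keep it checkable without sending anything, it uses the standard library's urllib.request.Request instead of requests. With requests, the same dictionary would be passed as requests.get(url, headers=headers).

```python
from urllib.request import Request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0"
)
headers = {"User-Agent": USER_AGENT}

# Building the Request object does not contact the server.
req = Request("https://28.push2.eastmoney.com/api/qt/clist/get", headers=headers)

# urllib stores header names capitalized, hence "User-agent".
sent_ua = req.get_header("User-agent")
print(sent_ua)
```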
III. Request the data:
import requests
import pandas
import re
for page in range(1, 282):
    # URL copied from the Network tab; pn (taken to be the page number) is filled in for each page
    url = f'https://28.push2.eastmoney.com/api/qt/clist/get?cb=jQuery112401625394594154861_1723723110567&pn={page}&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&dect=1&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1723723110568'
    wz = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 Edg/127.0.0.0'}
    # headers= is required: the second positional argument of requests.get is params, not headers
    res = requests.get(url, headers=wz)
    #print(res)
    #print(res.text)
    # select the data: code (f12), name (f14), price (f2)
    codelist = re.findall('"f12":(.*?),"f13"', res.text)
    namelist = re.findall('"f14":(.*?),"f15"', res.text)
    pricelist = re.findall('"f2":(.*?),"f3"', res.text)
    #print(namelist)
    # combine the data
    for i in range(0, len(codelist)):
        newlist = [codelist[i], namelist[i], pricelist[i]]
        print(newlist)
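The script imports pandas but only prints each row. Below is a rough sketch of how the matched values could be collected into a table instead. SAMPLE is made-up text in the shape the regular expressions expect, not real market data. The names parse_rows and to_frame and the column names are hypothetical.

```python
import re

# Made-up response fragment for illustration only (not real data).
SAMPLE = (
    'jQuery1124_1({"data":{"diff":['
    '{"f1":2,"f2":10.5,"f3":1.2,"f12":"000001","f13":0,"f14":"Name A","f15":10.8},'
    '{"f1":2,"f2":"-","f3":"-","f12":"000002","f13":0,"f14":"Name B","f15":"-"}'
    ']}});'
)


def parse_rows(text):
    """Extract (code, name, price) tuples using the same patterns as the script."""
    codes = re.findall(r'"f12":(.*?),"f13"', text)
    names = re.findall(r'"f14":(.*?),"f15"', text)
    prices = re.findall(r'"f2":(.*?),"f3"', text)
    # The captured values keep their JSON quotes, so strip them.
    return [
        (code.strip('"'), name.strip('"'), price.strip('"'))
        for code, name, price in zip(codes, names, prices)
    ]


def to_frame(rows):
    """Build a DataFrame from the rows; non-numeric prices such as '-' become NaN."""
    import pandas as pd  # imported here so parse_rows works without pandas installed

    df = pd.DataFrame(rows, columns=["code", "name", "price"])
    df["price"] = pd.to_numeric(df["price"], errors="coerce")
    return df


rows = parse_rows(SAMPLE)
print(rows)
```

In the page loop, the rows from every page could be appended to one list and passed to to_frame once at the end. to_frame(all_rows).to_csv('stocks.csv', index=False) would then save the result, where stocks.csv is an arbitrary file name.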