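1. Analyze the page structure
The page URL is: https://v6.bang.weibo.com/czv/domainlist?date=202103&period_type=month
The ranking information is stored in the share-data attribute of the button tags and can be extracted with a CSS selector: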
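import json

import requests
from bs4 import BeautifulSoup

date = "202103"  # month in YYYYMM form, as in the URL above
data_top_100 = []

r = requests.get("https://v6.bang.weibo.com/czv/domainlist?date=%s&period_type=month" % date)
soup = BeautifulSoup(r.text, "html.parser")
items = soup.select("button.top-follow-btn.following-btn")
for item in items:
    # Skip buttons that carry a data-type attribute; only the others are parsed.
    if "data-type" in item.attrs:
        continue
    # share-data holds a JSON string describing one ranking entry.
    data_json = json.loads(item.attrs["share-data"])
    dic = {}
    dic["rank"] = data_json["rank"]
    dic["uid"] = data_json["uid"]
    dic["screen_name"] = data_json["screen_name"]
    data_top_100.append(dic)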
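The attribute-extraction step can be tried offline before hitting the site. The sketch below is only an illustration: the sample HTML is made up (the real page markup may differ), the names SAMPLE_HTML, ShareDataParser and extract_rows are hypothetical, and it uses the standard-library html.parser instead of BeautifulSoup so that it runs without extra packages.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup, invented for illustration; the real page may differ.
SAMPLE_HTML = (
    '<button class="top-follow-btn following-btn" '
    'share-data=\'{"rank": 1, "uid": "1000001", "screen_name": "example_a"}\'>Follow</button>'
    '<button class="top-follow-btn following-btn" data-type="other" '
    'share-data=\'{"rank": 0, "uid": "0", "screen_name": "skipped"}\'>Follow</button>'
    '<button class="top-follow-btn following-btn" '
    'share-data=\'{"rank": 2, "uid": "1000002", "screen_name": "example_b"}\'>Follow</button>'
)


class ShareDataParser(HTMLParser):
    """Collects rank, uid and screen_name from matching button tags."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag != "button":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # Same condition as the selector button.top-follow-btn.following-btn
        if "top-follow-btn" not in classes or "following-btn" not in classes:
            return
        # Buttons with a data-type attribute are skipped, as in the scraper.
        if "data-type" in attr or "share-data" not in attr:
            return
        payload = json.loads(attr["share-data"])
        self.rows.append({key: payload[key] for key in ("rank", "uid", "screen_name")})


def extract_rows(html):
    """Return one dict per button that passes the filters above."""
    parser = ShareDataParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
```

Here rows contains two dictionaries, since the button carrying data-type is filtered out.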
Entries after the first 20 are loaded through an AJAX POST request. The form data is as follows:
url = "https://v6.bang.weibo.com/aj/newczv/rank"
for j in range(2, 6):
    # Pages 2 to 5; show_rank works out to (j - 1) * 20, i.e. 20, 40, 60, 80.
    data = {}
    data['page'] = str(j)
    data['show_rank'] = str(j * 20 - 20)
    data['period_type'] = 'month'
    data['field_id'] = '1001'
    data['dt'] = '202