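Install the requests and bs4 libraries beforehand (the code below also uses lxml as the parser):
# A crawler can parse HTML text in two common ways: BeautifulSoup from bs4, or etree from lxml
# This article covers the first approach, BeautifulSoup from bs4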
import requests
from bs4 import BeautifulSoup
# URL of the page to scrape
url="https://tophub.today/n/KqndgxeLl9"
# Disguise the request as a browser by sending a User-Agent (to find yours, enter chrome://version in Chrome)
header={'user-Agent':"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"}
# Fetch the page
response = requests.get(url,headers=header)
res=response.text  # the response body as a string
soup=BeautifulSoup(res,features="lxml")
# Select every <td> cell inside a table row and print its text
heat=soup.select('tr td')
for h in heat:
    print(h.get_text())
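# Build the CSV text: keep the first three cells of each row and start a new line at the fourth
# (each data line therefore ends with a trailing comma)
s='num,title,heat\n'
for i in range(len(heat)):
    if (i+1)%4!=0:
        s+=heat[i].get_text()+","
    else:
        s+="\n"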
# Write the CSV text to 1.csv
with open('1.csv','w',newline='',encoding='utf8') as fw:
    fw.write(s)
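As an aside, the loop above builds the CSV by string concatenation, so a title that itself contains a comma would be split across columns. A minimal sketch of one alternative follows, using the standard-library csv module to handle quoting. The helper names `cells_to_rows` and `rows_to_csv` and the sample cells are hypothetical and not part of the original script; the sketch also assumes that each table row has exactly four cells, and it drops an incomplete trailing row.

```python
import csv
import io


def cells_to_rows(cells, per_row=4, keep=3):
    """Group a flat list of cell texts into rows.

    Assumes every table row contributes `per_row` cells and only the
    first `keep` of them are wanted. An incomplete trailing group is dropped.
    """
    rows = []
    for start in range(0, len(cells) - per_row + 1, per_row):
        rows.append(cells[start:start + keep])
    return rows


def rows_to_csv(rows, header=("num", "title", "heat")):
    """Serialize rows to CSV text; csv.writer quotes fields containing commas."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample cells standing in for [h.get_text() for h in heat]
sample_cells = ["1", "Hello, world", "100", "", "2", "Second", "50", ""]
print(rows_to_csv(cells_to_rows(sample_cells)))
```

With the scraped data, the list passed in would be `[h.get_text() for h in heat]`, and the returned text could be written to a file opened with `newline=''` as in the script.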
Running the script gives the following result: