Scraping the Weibo Hot Search List with BeautifulSoup and Writing It to a CSV File

武师叔

Published 2022-04-01 18:40:50


Article link: https://blog.csdn.net/wushibo123/article/details/123904922


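Install the requests and bs4 libraries beforehand. The script below also uses lxml as the parser, so lxml must be installed as well.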
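The packages are usually installed with `pip install requests beautifulsoup4 lxml`. As a quick check before running the scraper, the sketch below reports which of them cannot be imported. `missing_packages` is a hypothetical helper written for this illustration, and lxml is included because the script passes `features="lxml"` to BeautifulSoup.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by the scraper (bs4 is the import name of beautifulsoup4)
missing = missing_packages(["requests", "bs4", "lxml"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```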

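# A scraper can parse HTML text in two common ways: BeautifulSoup from bs4, or etree from lxml.
# This article covers the first approach, BeautifulSoup from bs4.
import requests
from bs4 import BeautifulSoup

# URL of the page to scrape
url = "https://tophub.today/n/KqndgxeLl9"
# Pretend to be a browser by sending a User-Agent header
# (to find yours, enter chrome://version in Chrome's address bar)
header = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36"}
# Fetch the page
response = requests.get(url, headers=header)
res = response.text  # the response body as a string
soup = BeautifulSoup(res, features="lxml")
# Select every <td> cell that sits inside a <tr>; the result is one flat list
heat = soup.select('tr td')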

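# Print the text of every cell
for h in heat:
    print(h.get_text())

# CSV header row
s = 'num,title,heat\n'

# The code assumes each table row has four cells:
# keep the first three as num, title, heat and end the line at the fourth
for i in range(len(heat)):
    position = (i + 1) % 4
    if position == 0:
        s += "\n"  # fourth cell: skip it and start a new line
    elif position == 3:
        s += heat[i].get_text()  # last kept field, so no trailing comma
    else:
        s += heat[i].get_text() + ","

with open('1.csv', 'w', newline='', encoding='utf8') as fw:
    fw.write(s)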
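Joining fields by hand works as long as no title contains a comma or a quote character. As an alternative, the sketch below groups the flat list of cell texts into rows and lets the standard csv module handle quoting. This is not the article's method: `cells_to_csv` and its parameters are hypothetical names introduced here, and the sample cells are made up, not real scraped data.

```python
import csv
import io


def cells_to_csv(cells, cells_per_row=4, keep=3):
    """Group a flat list of cell texts into rows and return them as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["num", "title", "heat"])
    for start in range(0, len(cells), cells_per_row):
        row = cells[start:start + cells_per_row]
        writer.writerow(row[:keep])  # drop the trailing cell of each row
    return buffer.getvalue()


# Made-up sample cells; the second title contains a comma on purpose
sample = ["1", "Example topic", "100", "", "2", "Topic, with comma", "90", ""]
print(cells_to_csv(sample))
```

With the scraper, the cell texts would come from `[h.get_text() for h in heat]`, and the returned text can be written to 1.csv the same way as before.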

The result: the script prints the text of every table cell and saves the rows to 1.csv.
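To inspect the saved file without opening a spreadsheet, it can be read back with the standard csv module. A minimal sketch, assuming 1.csv was produced by the script above and starts with the `num,title,heat` header; `read_rows` is a hypothetical helper, not part of the original script.

```python
import csv
from pathlib import Path


def read_rows(path):
    """Read a CSV file with a header row and return one dict per data row."""
    with open(path, newline="", encoding="utf8") as f:
        return list(csv.DictReader(f))


csv_path = Path("1.csv")
if csv_path.exists():  # only print when the scraper has already been run
    for row in read_rows(csv_path):
        print(row["num"], row["title"], row["heat"])
```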
