When I first used BeautifulSoup, something always felt off: either the page wouldn't download completely, or there was so much junk data that cleaning it up was a chore.
import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0'}
htmlmoban = "https://xxxxxxx/whole.html"
requests_html = requests.get(htmlmoban, headers=headers)
requests_html.encoding = 'gbk'  # the page is GBK-encoded; tell requests before reading .text
#print(requests_html)
soup = BeautifulSoup(requests_html.text, "lxml")
html_list = soup.find_all("div", {"class": "novellist"})  # select every element with class="novellist"
#print(html_list)
html_list1 = str(html_list)
#print(html_list1)
with open('file/01.txt', 'w', encoding='utf-8') as fw1:
    fw1.write(html_list1)  # the with block closes the file automatically; no explicit close() needed
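Instead of dumping `str(html_list)` to disk and cleaning it later, it is usually tidier to iterate over the result set and keep only the text and links you want. A minimal sketch of that idea, using a hypothetical inline HTML snippet in place of the downloaded page so it runs without the network:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page
html = """
<div class="novellist"><ul>
  <li><a href="/book/1.html">Book One</a></li>
  <li><a href="/book/2.html">Book Two</a></li>
</ul></div>
<div class="other">junk to ignore</div>
"""

soup = BeautifulSoup(html, "html.parser")  # html.parser works if lxml is not installed
lines = []
for div in soup.find_all("div", {"class": "novellist"}):
    for a in div.find_all("a"):
        # keep "title<TAB>href" per line instead of the raw tag markup
        lines.append(f"{a.get_text(strip=True)}\t{a.get('href')}")

print(lines)
```

Writing `"\n".join(lines)` to the file then gives a clean, ready-to-use list rather than raw HTML that still needs scrubbing.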
After looking it up, I found that find_all() takes an attrs argument, which lets you filter by class and other attributes.
The key line is: soup.find_all("div", {"class": "novellist"})
With it, only the data inside <div class="novellist"></div> elements is returned.
Still a beginner, feeling my way through...
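For reference, the attrs dict is one of several equivalent ways BeautifulSoup offers to match on class; a small sketch comparing them (the sample HTML here is made up for illustration):

```python
from bs4 import BeautifulSoup

html = '<div class="novellist">A</div><div class="misc">B</div>'
soup = BeautifulSoup(html, "html.parser")

# Three equivalent ways to match class="novellist":
by_attrs = soup.find_all("div", attrs={"class": "novellist"})  # explicit attrs dict
by_kw = soup.find_all("div", class_="novellist")               # class_ keyword (class is reserved in Python)
by_css = soup.select("div.novellist")                          # CSS selector

print([t.get_text() for t in by_attrs])  # only the "novellist" div matches
```

All three return the same tags, so which one to use is mostly a matter of taste; class_ is the shortest when you only filter by class.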