python_crawler_progress_exercise_1

Latest recommended article published on 2024-05-19 23:31:32

万恶的罪孽，深渊的凝视

Article tags: python, web scraping

Article link: https://blog.csdn.net/qaq10086pap/article/details/130173117

Scrape the names of all the mouse cursor themes listed in the cursor section of a desktop-theming website (zhutix.com):

from bs4 import BeautifulSoup
import requests

for i in range(16):
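    # shubiao_yuandaima = requests.get(f"https://zhutix.com/tag/cursors/page/{i + 1}")
    # print(f"Is the page accessible: {shubiao_yuandaima.status_code}")  # a status code of 200 means the request succeeded
    # timeout=10 keeps the script from hanging forever if the site does not respond
    shubiao_yuandaima = requests.get(f"https://zhutix.com/tag/cursors/page/{i + 1}", timeout=10).text
    # print(shubiao_yuandaima)  # inspect the page source

    jiexi = BeautifulSoup(shubiao_yuandaima, "html.parser")  # parse the page source and get a BeautifulSoup instance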

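    # Use find_all() to collect the matching elements and assign the result
    need_elements = jiexi.find_all("div", attrs={"class": "post-info"})
    # print(need_elements)  # inspect the matches; find_all() returns them as a list
    print(f"Page {i + 1}: ")  # print the page number
    b = 1  # running number for the cursors on this page
    for need_element in need_elements:  # need_elements is a list of the matching div tags
        # print(need_element)  # inspect each item in need_elements
        need_el = need_element.find_all("h2")  # call find_all() again on each div to pull out its h2 headings
        # print(need_el)  # inspect the extracted headings
        for need_el_1 in need_el:
            # get_text() still works when the h2 wraps nested tags, where .string could be None
            print(f"{b}.{need_el_1.get_text(strip=True)}")
            b += 1  # increment
    print()

# Summary: first call requests.get() and read .text on the response to get the page's HTML.
#          Then call BeautifulSoup(): the first argument is the HTML text, the second names the parser
#          (BeautifulSoup supports several parsers for web content).
#          Then call find_all() on the instance BeautifulSoup returns to find the content you want:
#          the first argument is the tag name, the second is a dict with the tag's class.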
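
The parsing part of the script above can be tried without touching the network. Below is a minimal sketch that runs the same two-level lookup (a `div` with class `post-info`, then the `h2` inside it) on a small hand-written HTML string. `SAMPLE_HTML` and `extract_titles` are hypothetical names made up for this illustration, and the sample markup only imitates the structure the script searches for; the real pages on zhutix.com may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that imitates the structure the script looks for.
# It is an assumption for illustration, not a copy of the real site.
SAMPLE_HTML = """
<div class="post-info"><h2><a href="#">Cursor A</a></h2></div>
<div class="post-info"><h2><a href="#">Cursor B</a></h2></div>
<div class="sidebar"><h2>Not a cursor</h2></div>
"""


def extract_titles(html):
    """Return the text of every <h2> found inside a div with class post-info."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for info in soup.find_all("div", attrs={"class": "post-info"}):
        for heading in info.find_all("h2"):
            # strip=True trims surrounding whitespace from the heading text
            titles.append(heading.get_text(strip=True))
    return titles


if __name__ == "__main__":
    # Number the titles the same way the script does with its counter b
    for number, title in enumerate(extract_titles(SAMPLE_HTML), start=1):
        print(f"{number}.{title}")
```

Keeping the parsing in its own function makes it easy to check the selectors on saved HTML first, and only then plug in the text returned by `requests.get()`.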