Scraping page content with Python and counting occurrences of a specified field

Author: python_Ssane

Category: Testing | Tags: python, web crawler, BeautifulSoup, split, data statistics

Article link: https://blog.csdn.net/weixin_44530778/article/details/88038767

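Overall approach:

1. Get the URL of the page you want to scrape.

2. Use the requests and BeautifulSoup libraries to fetch the page content, work out which tags the field to be counted appears in, and save the matching elements to a local file as XML-style markup.

3. Read the content of the saved local file back in.

4. Use split on that content to get the number of occurrences of the specified field.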
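Step 4 works because splitting a string on a substring yields exactly one more piece than there are (non-overlapping) occurrences of that substring. A minimal sketch, assuming the field is a plain substring; the function name `count_by_split` and the sample string are hypothetical, for illustration only:

```python
def count_by_split(text: str, field: str) -> int:
    """Count non-overlapping occurrences of `field` in `text` using str.split."""
    if not field:
        # str.split("") raises ValueError anyway; fail with a clearer message
        raise ValueError("field must be a non-empty string")
    # Splitting on `field` produces (number of occurrences + 1) pieces
    return len(text.split(field)) - 1


sample = "<li>testin</li><p>testin cloud</p><li>other</li>"
print(count_by_split(sample, "testin"))  # 2
print(count_by_split(sample, "<li>"))    # 2
print(sample.count("testin"))            # 2, str.count gives the same result
```

`str.count` returns the same number directly; the split form is shown because it is the technique named in step 4.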

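```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

# Path of the local file where the scraped content is stored
fileName = 'E:/003spider/testlist2.txt'

def get_html_content():
    open_url = 'https://new.ztestin.com/help/helpinfo/index?id=70'
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
    }
    res1 = requests.get(open_url, headers=headers)
    # Build a BeautifulSoup object, used later to search the page content
    # (the 'lxml' parser requires the lxml package to be installed)
    soup1 = BeautifulSoup(res1.text, 'lxml')

    # Get every element for the given tag pair; find_all returns a list (ResultSet)
    xml1 = soup1.find_all('li')
    xml2 = soup1.find_all('p')
    # Loop over the lists, print each item and save it to the local file; the counting step uses this file later
```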
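The listing stops at the comment describing the loop, so the remaining steps (save, read back, count) are not shown. The following is a minimal sketch of how they could look, not the author's code. Assumptions: one element per line, UTF-8 encoding, and the field being a plain substring. The helper names `save_elements` and `count_field_in_file` and the demo data are hypothetical.

```python
import os
import tempfile


def save_elements(elements, path):
    """Print each element and write its markup on its own line.

    str() works both for BeautifulSoup Tag objects and for plain strings.
    """
    with open(path, "w", encoding="utf-8") as f:
        for el in elements:
            text = str(el)
            print(text)           # echo to the console, as the original comment describes
            f.write(text + "\n")  # one element per line


def count_field_in_file(path, field):
    """Read the saved file back and count occurrences of `field` via split."""
    with open(path, encoding="utf-8") as f:
        content = f.read()
    # n occurrences of `field` split the content into n + 1 pieces
    return len(content.split(field)) - 1


if __name__ == "__main__":
    # Hypothetical stand-in data; in the original script this would be xml1 + xml2
    elements = ["<li>crash</li>", "<p>app crash on launch</p>", "<li>ok</li>"]
    path = os.path.join(tempfile.gettempdir(), "testlist2_demo.txt")
    save_elements(elements, path)
    print(count_field_in_file(path, "crash"))  # 2
```

Inside `get_html_content()` this would be used as `save_elements(xml1 + xml2, fileName)`, followed by `count_field_in_file(fileName, field)` once the field to count is chosen. Keeping fetching, saving and counting in separate functions means the counting step can be re-run on the saved file without hitting the site again.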
