Python Crawler Basics (2): Scraping a Simple Web Page and Saving It Locally

Latest recommended article published 2024-05-03 14:23:24

一心要爆肝的浩浩

Tags: Python crawler basics, wrapping functions in class methods, scraping a simple website

Article link: https://blog.csdn.net/weixin_42336559/article/details/80777760

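This article covers the basic steps of a Python crawler: sending a request to a web page to fetch its source, matching data with regular expressions, and saving the results. The data is parsed by a classmethod on the DataParserTool class.

Summary generated by CSDN using AI.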
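The three steps named in the summary can be sketched roughly as follows. This is only an illustration, not the author's code: the function names (fetch_html, extract_items, save_items), the User-Agent header, the HTML markup, and the regular expression are all assumptions made up for the example, and the data is saved to a local text file rather than a database.

```python
import re
from urllib.request import Request, urlopen


def fetch_html(url):
    """Step 1: request the page and return its source as text."""
    # The browser-like User-Agent is an assumption; some sites reject the default one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Hypothetical markup, invented for this sketch.
ITEM_PATTERN = re.compile(
    r'<div class="item">\s*<h2>(.*?)</h2>\s*<p>(.*?)</p>\s*</div>',
    re.S,  # let "." match newlines, since real pages span many lines
)


def extract_items(html):
    """Step 2: return a list of (title, info) tuples matched by the pattern."""
    return ITEM_PATTERN.findall(html)


def save_items(items, path):
    """Step 3: write one tab-separated record per line to a local file."""
    with open(path, "w", encoding="utf-8") as handle:
        for title, info in items:
            handle.write(f"{title.strip()}\t{info.strip()}\n")


SAMPLE_HTML = """
<div class="item">
  <h2> First post </h2>
  <p> Short summary </p>
</div>
<div class="item">
  <h2>Second post</h2>
  <p>Another summary</p>
</div>
"""

if __name__ == "__main__":
    # Uses the inline sample so the sketch runs offline.
    print(extract_items(SAMPLE_HTML))
```

For a real page, the result of fetch_html(url) would be passed to extract_items in place of SAMPLE_HTML, and save_items(items, "items.txt") would write the records to disk. The pattern has to be adapted to the actual markup of the target site.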

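import re
from urllib.request import Request, urlopen

# The three basic steps of a crawler:
# 1. send a request to the page and fetch its source (static page code only);
# 2. match the data with regular expressions;
# 3. save the results to a database.


class DataParserTool(object):

    # Class method: receives the class itself as cls
    @classmethod
    def parser_data(cls, data):
        data_list = []
        for title, info, name, time, read_num, comment_num in data:
            title = title.strip()  # strip() removes whitespace from both ends; a very common data-cleaning step

            res = info.replace('&amp;#13;', '')  # replace() swaps every occurrence of the first argument for the second
            res1 = res.replace('\n', '')
            info = res1.strip()

            name = name.strip()
            time = time.strip()

            data_list.append((title, info, name, time, read_num, comment_num))  # note: the appended item is a tuple
        # Return the cleaned records; a bare return would discard them
        return data_list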
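To make the effect of the cleaning steps concrete, the sketch below repeats them in a standalone function and runs it on one invented record. The name clean_rows and the sample values are hypothetical; the tuple layout (title, info, name, time, read_num, comment_num) follows the loop in parser_data.

```python
def clean_rows(rows):
    """Apply the same cleaning steps as parser_data to 6-field tuples."""
    cleaned = []
    for title, info, name, time, read_num, comment_num in rows:
        # Drop the entity residue and line breaks, then trim both ends.
        info = info.replace('&amp;#13;', '').replace('\n', '').strip()
        cleaned.append(
            (title.strip(), info, name.strip(), time.strip(), read_num, comment_num)
        )
    return cleaned


# Invented sample shaped like one re.findall() match with six groups.
sample = [
    ("  Hello  ", "\n line one&amp;#13;\n line two \n", " alice ", " 2018-06-22 ", "12", "3"),
]

print(clean_rows(sample))
# [('Hello', 'line one line two', 'alice', '2018-06-22', '12', '3')]
```

The read and comment counts are passed through untouched, as in the original method; they remain strings unless converted with int() later.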
