[Python3 Web Crawler] 3.0 - Data Storage

3.1 Files

3.1.1 TXT Text

Store text in a file with the .txt extension:

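# open() arguments: the first is the file to read or write (absolute or relative path),
# the second is the mode ('a' appends to the end of the file),
# and encoding sets the file's character encoding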
def write_to_txt(doc):
    with open('result.txt', 'a', encoding='utf-8') as file:
        file.write(doc)
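Because write_to_txt opens the file in 'a' (append) mode and write() adds no newline, repeated calls run the text together. A minimal sketch that writes one item per line and reads the file back; the helper names write_lines/read_lines and the temporary path are illustrative assumptions, not part of the original code:

```python
import os
import tempfile


def write_lines(path, items):
    """Append each item to the file at `path`, one item per line."""
    # 'a' appends to the end of the file; 'w' would overwrite it instead
    with open(path, 'a', encoding='utf-8') as file:
        for item in items:
            file.write(item + '\n')  # write() does not add a newline itself


def read_lines(path):
    """Return the file's lines without their trailing newline."""
    # Read with the same encoding that was used for writing
    with open(path, encoding='utf-8') as file:
        return [line.rstrip('\n') for line in file]


path = os.path.join(tempfile.mkdtemp(), 'result.txt')
write_lines(path, ['hello', '你好'])
write_lines(path, ['world'])    # a second call appends, it does not overwrite
print(read_lines(path))         # ['hello', '你好', 'world']
```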

3.1.2 JSON Files

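The json library's loads() method parses a JSON string into a Python object (typically a dict or list), and dumps() serializes a Python object into a JSON string. Saving to a JSON file therefore means converting the object to a JSON string and writing that string to a file with the .json extension.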

import json


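def write_to_json(json_obj):
    # ensure_ascii=False keeps Chinese text readable; indent pretty-prints the output
    data = json.dumps(json_obj, ensure_ascii=False, indent=2)
    with open('result.json', 'w', encoding='utf-8') as file:
        file.write(data)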
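A short sketch of dumps()/loads() and of the file round trip, using a made-up sample record. By default dumps() escapes non-ASCII characters, so Chinese text would appear as \uXXXX sequences in the saved file; ensure_ascii=False keeps it as-is. json.dump()/json.load() do the same conversion directly on a file object:

```python
import json
import os
import tempfile

item = {'name': 'Alex', 'city': '北京'}

# dumps(): Python object -> JSON string; non-ASCII characters are escaped by default
print(json.dumps(item))                      # {"name": "Alex", "city": "\u5317\u4eac"}
# ensure_ascii=False keeps the characters readable
print(json.dumps(item, ensure_ascii=False))  # {"name": "Alex", "city": "北京"}

# loads(): JSON string -> Python object (here a list containing one dict)
data = json.loads('[{"name": "Alex", "age": 25}]')
print(data[0]['age'])                        # 25

# dump()/load() write to and read from a file object directly
path = os.path.join(tempfile.mkdtemp(), 'result.json')
with open(path, 'w', encoding='utf-8') as file:
    json.dump(data, file, ensure_ascii=False, indent=2)
with open(path, encoding='utf-8') as file:
    loaded = json.load(file)
print(loaded == data)                        # True
```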

3.1.3 CSV Files

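CSV (Comma-Separated Values) is a widely used plain-text format for one- and two-dimensional data: each line holds one record (a one-dimensional row), fields are separated by commas, there are no blank lines, and files use the .csv extension.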

Use the csv module to read and write CSV files:

import csv
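A minimal sketch of writing dicts with csv.DictWriter and reading them back with csv.DictReader; the file path, field names and rows are made up for illustration. It also shows why the csv module is preferable to joining strings with commas: a field that itself contains a comma is quoted automatically.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'result.csv')
rows = [
    {'name': 'Alex', 'age': 25, 'city': 'Beijing, China'},
    {'name': 'Bob', 'age': 24, 'city': 'Shanghai'},
]

# newline='' lets the csv module control line endings (avoids extra blank lines on Windows)
with open(path, 'w', encoding='utf-8', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=['name', 'age', 'city'])
    writer.writeheader()    # first line: name,age,city
    writer.writerows(rows)  # a field containing a comma is wrapped in quotes

with open(path, encoding='utf-8', newline='') as file:
    print(file.read().splitlines()[1])  # Alex,25,"Beijing, China"

# DictReader uses the header line as keys; every value comes back as a str
with open(path, encoding='utf-8', newline='') as file:
    loaded = list(csv.DictReader(file))
print(loaded[0]['city'], loaded[0]['age'])  # Beijing, China 25
```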


3.2 Databases

3.2.1 MySQL

Use the pymysql module to operate on a MySQL database. The create, retrieve, update and delete functions are as follows:

import pymysql


# Open a connection (adjust host/user/password to your own MySQL server);
# charset='utf8mb4' lets the connection send and receive full Unicode, including Chinese
def get_connection(db):
    return pymysql.connect(host='localhost',
                           user='root',
                           password='123456',
                           database=db,
                           charset='utf8mb4')


# Create: insert one record given as a dict {column: value}.
# Only values can be bound with %s; the table and column names are formatted
# into the SQL string, so they must come from trusted code, never user input.
def create(db, table, record):
    connection = get_connection(db)
    try:
        with connection.cursor() as cursor:
            keys = ','.join(record.keys())
            values = ','.join(['%s'] * len(record))
            sql = 'INSERT INTO {table}({keys}) VALUES ({values})'.format(table=table, keys=keys, values=values)
            cursor.execute(sql, tuple(record.values()))
        connection.commit()
    except Exception:
        # Undo the unfinished transaction before re-raising the error
        connection.rollback()
        raise
    finally:
        connection.close()


# Retrieve: fetch the first row whose name matches
def retrieve(db, table, name):
    connection = get_connection(db)
    try:
        with connection.cursor() as cursor:
            sql = 'SELECT * FROM {table} WHERE name=%s'.format(table=table)
            cursor.execute(sql, (name,))
            return cursor.fetchone()
    finally:
        connection.close()


# Update: set the age of the records with the given name
def update(db, table, name, age):
    connection = get_connection(db)
    try:
        with connection.cursor() as cursor:
            sql = 'UPDATE {table} SET age=%s WHERE name=%s'.format(table=table)
            cursor.execute(sql, (age, name))
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        connection.close()


# Delete: remove every record with the given age
def delete(db, table, age):
    connection = get_connection(db)
    try:
        with connection.cursor() as cursor:
            sql = 'DELETE FROM {table} WHERE age=%s'.format(table=table)
            cursor.execute(sql, (age,))
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        connection.close()
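The INSERT in create() derives its column list and %s placeholders from the dict keys, so one function serves any table. A minimal sketch that factors this idea into pure functions, so the generated SQL can be checked without a MySQL server; build_insert/build_update and the students table are hypothetical names. The returned pair is meant to be passed on as cursor.execute(sql, params):

```python
def build_insert(table, record):
    """Return (sql, params) for inserting the dict `record` into `table`."""
    # Table and column names are formatted in, so they must be trusted; only values are bound
    columns = ', '.join(record)                     # dict keys -> column list
    placeholders = ', '.join(['%s'] * len(record))  # one %s per value
    sql = 'INSERT INTO {}({}) VALUES ({})'.format(table, columns, placeholders)
    return sql, tuple(record.values())


def build_update(table, changes, where):
    """Return (sql, params) for UPDATE ... SET ... WHERE ..., conditions joined with AND."""
    set_clause = ', '.join('{}=%s'.format(column) for column in changes)
    where_clause = ' AND '.join('{}=%s'.format(column) for column in where)
    sql = 'UPDATE {} SET {} WHERE {}'.format(table, set_clause, where_clause)
    # Parameter order must match placeholder order: SET values first, then WHERE values
    return sql, tuple(changes.values()) + tuple(where.values())


sql, params = build_insert('students', {'name': 'Alex', 'age': 25})
print(sql)     # INSERT INTO students(name, age) VALUES (%s, %s)
print(params)  # ('Alex', 25)
```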

3.2.2 MongoDB

Use the pymongo module to operate on a MongoDB database:

import pymongo
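A minimal sketch of the create/retrieve/update/delete operations with pymongo, mirroring the MySQL functions above and assuming a MongoDB server on localhost:27017. The helper names and the database/collection names are hypothetical. The helpers take a Collection object so they stay independent of how the connection is made, which is why pymongo is imported inside get_collection:

```python
def get_collection(db_name, collection_name, host='localhost', port=27017):
    """Return a pymongo Collection; assumes a MongoDB server on host:port."""
    import pymongo  # only this function needs pymongo; the helpers accept any Collection
    client = pymongo.MongoClient(host=host, port=port)
    return client[db_name][collection_name]


def insert_record(collection, record):
    # insert_one() returns an InsertOneResult; inserted_id is the new document's _id
    return collection.insert_one(record).inserted_id


def find_record(collection, name):
    # find_one() returns the first matching document as a dict, or None
    return collection.find_one({'name': name})


def update_record(collection, name, age):
    # $set changes only the listed fields; modified_count is 0 or 1 for update_one()
    return collection.update_one({'name': name}, {'$set': {'age': age}}).modified_count


def delete_records(collection, age):
    # delete_many() removes every document that matches the filter
    return collection.delete_many({'age': age}).deleted_count
```

For example, students = get_collection('spider', 'students') followed by insert_record(students, {'name': 'Alex', 'age': 25}). Note that insert_one() adds an _id key to the dict it is given.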


3.2.3 Redis

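Use the redis-py module to operate on a Redis database. redis-py 3.0 dropped the legacy "Redis" class: "StrictRedis" was renamed to "Redis", and an alias named "StrictRedis" is still provided for existing code. The "Redis" class implements almost all of the official commands and follows the official command syntax, with a few exceptions: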

  • SELECT: Not implemented, because a client may be shared between threads and SELECT would leave pooled connections pointing at a different database. Use one client (and, if needed, one connection pool) per database, choosing it with the db argument; see the Thread Safety section of the redis-py documentation.
  • DEL: 'del' is a reserved keyword in Python, so redis-py uses 'delete' instead.
  • MULTI/EXEC: These are implemented as part of the Pipeline class. When a pipeline is executed it is wrapped in MULTI and EXEC by default; this can be disabled by specifying transaction=False. See the Pipelines section of the redis-py documentation.
  • SUBSCRIBE/LISTEN: Similar to pipelines, PubSub is implemented as a separate class, because it puts the underlying connection in a state where it cannot execute non-pubsub commands. Calling the pubsub method on the Redis client returns a PubSub instance through which you can subscribe to channels and listen for messages. PUBLISH can only be called from the Redis client (see the comment on redis-py issue #151 for details).
  • SCAN/SSCAN/HSCAN/ZSCAN: The *SCAN commands are implemented as they exist in the Redis documentation. In addition, each command has an equivalent iterator method. These are purely for convenience so the user doesn’t have to keep track of the cursor while iterating. Use the scan_iter/sscan_iter/hscan_iter/zscan_iter methods for this behavior.

Below is a partial example; see the official documentation for more Redis commands:

import redis


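# Connect to the database
r = redis.Redis(host='localhost', port=6379, db=0)
# Or connect through a connection pool
pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
r = redis.Redis(connection_pool=pool)

# SET command
r.set('name', 'Alex')
# GET command; the value comes back as bytes (b'Alex') unless the client
# was created with decode_responses=True
print(r.get('name'))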
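The exceptions listed above show up in everyday code: delete() stands in for DEL, pipeline() provides MULTI/EXEC, and scan_iter() hides the SCAN cursor. A minimal sketch, assuming a client r created as above; the save_items/delete_items names and the item:<n> key scheme are hypothetical:

```python
def save_items(r, items, prefix='item'):
    """Store each item under '<prefix>:<index>' in a single MULTI/EXEC transaction."""
    # pipeline() queues commands; with the default transaction=True, execute()
    # sends them wrapped in MULTI ... EXEC and returns one result per command
    with r.pipeline() as pipe:
        for index, value in enumerate(items):
            pipe.set('{}:{}'.format(prefix, index), value)
        return pipe.execute()


def delete_items(r, prefix='item'):
    """Delete every '<prefix>:*' key and return how many were removed."""
    # scan_iter() follows the SCAN cursor internally and yields matching keys
    keys = list(r.scan_iter(match='{}:*'.format(prefix)))
    # DEL is exposed as delete(); DEL needs at least one key, hence the guard
    return r.delete(*keys) if keys else 0
```

For example, save_items(r, ['Alex', 'Bob']) stores item:0 and item:1 in one transaction, and delete_items(r) removes them again and returns the number of keys deleted.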