Python Data Storage

八块腹肌怎么练

Category: python. Tags: pickle, python, open(), json, torch.save(), csv

Article link: https://blog.csdn.net/qyhaill/article/details/103106370

This post summarizes the modules Python commonly uses for storing data.

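1 Related built-in functions

Python provides the built-in function open(), together with a set of file-object methods, for reading and writing files on disk.

1.1 The `open()` function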

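**Syntax:**
```python
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
```
**Parameters:**
- file (str): a string containing the path and name of the file to access;
- mode (str): the mode in which the file is opened (read-only, write, append, and so on); the possible values are listed in the table below, and the default is read-only ('r');
- the remaining parameters are used less often and are not covered here.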

**`open()` returns a file object.**

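| mode | Description |
| --- | --- |
| `r` | Opens the file for reading only. The file pointer is placed at the beginning of the file. This is the default mode. |
| `rb` | Opens the file in binary format for reading only. The file pointer is placed at the beginning of the file. |
| `r+` | Opens the file for reading and writing. The file pointer is placed at the beginning of the file. |
| `rb+` | Opens the file in binary format for reading and writing. The file pointer is placed at the beginning of the file. |
| `w` | Opens the file for writing only. If the file exists, it is truncated, so its previous content is deleted. If it does not exist, a new file is created. |
| `wb` | Opens the file in binary format for writing only. If the file exists, it is truncated, so its previous content is deleted. If it does not exist, a new file is created. |
| `w+` | Opens the file for reading and writing. If the file exists, it is truncated, so its previous content is deleted. If it does not exist, a new file is created. |
| `wb+` | Opens the file in binary format for reading and writing. If the file exists, it is truncated, so its previous content is deleted. If it does not exist, a new file is created. |
| `a` | Opens the file for appending. If the file exists, the file pointer is placed at the end, so new content is written after the existing content. If it does not exist, a new file is created for writing. |
| `ab` | Opens the file in binary format for appending. If the file exists, the file pointer is placed at the end, so new content is written after the existing content. If it does not exist, a new file is created for writing. |
| `a+` | Opens the file for reading and appending. If the file exists, the file pointer is placed at the end and writes are appended. If it does not exist, a new file is created for reading and writing. |
| `ab+` | Opens the file in binary format for reading and appending. If the file exists, the file pointer is placed at the end. If it does not exist, a new file is created for reading and writing. |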
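To make the difference between the most common modes concrete, here is a minimal sketch. The scratch file name `modes_demo.txt` and the temporary directory are assumptions made for illustration only; any writable path behaves the same way.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (illustration only).
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'w' creates the file, or truncates it if it already exists.
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\n")

# 'a' keeps the existing content and writes after it.
with open(path, "a", encoding="utf-8") as f:
    f.write("second line\n")

# 'r' is the default mode, so it can be omitted; read() returns a str.
with open(path, encoding="utf-8") as f:
    text = f.read()

# 'rb' reads the same file as raw bytes instead of str.
with open(path, "rb") as f:
    raw = f.read()

print(text)
print(type(raw))
```

The `with` statement closes the file automatically when the block ends, so no explicit `close()` call is needed.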

1.2 File objects

A file object is created with open(). The table below lists its most commonly used methods:

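| Method | Description |
| --- | --- |
| `file.close()` | Closes the file. After closing, the file can no longer be read or written. |
| `file.read([size])` | Reads at most size characters from the file (bytes in binary mode). If size is omitted or negative, reads everything. |
| `file.readline([size])` | If size is omitted or negative, reads one line. If size is given and is 0 or greater, reads at most size characters (bytes in binary mode) of that line. |
| `file.readlines([size])` | Reads lines from the file and returns them as a list. If size is omitted or not positive, reads all lines; otherwise size is a hint, and reading stops once the total length of the lines read so far reaches it. |
| `file.write(str)` | Writes the string to the file and returns the number of characters written. |
| `file.writelines(sequence)` | Writes a sequence of strings to the file. No line separators are added, so include the newline in each string if you need it. |
| `file.tell()` | Returns the current position of the file pointer. |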
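The methods in the table can be tried out together in a short sketch. The file name `methods_demo.txt` and the temporary directory are assumptions for illustration, not part of the original post.

```python
import os
import tempfile

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "methods_demo.txt")

with open(path, "w", encoding="utf-8") as f:
    written = f.write("alpha\n")          # returns the number of characters written
    f.writelines(["beta\n", "gamma\n"])   # newlines must be supplied by the caller

with open(path, encoding="utf-8") as f:
    first = f.readline()    # one line, including its trailing newline
    position = f.tell()     # current position, just after the first line
    rest = f.readlines()    # the remaining lines as a list

with open(path, encoding="utf-8") as f:
    head = f.read(3)        # at most three characters
    tail = f.read()         # everything that is left

print(written, repr(first), rest, repr(head))
```

In text mode the value returned by `tell()` is best treated as an opaque number that is only meaningful when passed back to `seek()`.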

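2 The json module

2.1 Concepts

First, a short introduction to JSON (JavaScript Object Notation):

- JSON is a language-independent data-interchange format that most languages can parse; Python objects convert to and from JSON values according to the tables below (other languages work similarly);
- JSON data can be stored in JSON files, which makes it easy for programs written in different languages to share data;
- in Python, encoded JSON data is simply a string.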

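Conversion from Python types to JSON types (Python 3):

| Python type | JSON type |
| --- | --- |
| dict | object |
| list, tuple | array |
| int, float | number |
| str | string |
| True | true |
| False | false |
| None | null |

Conversion from JSON types to Python types (Python 3):

| JSON type | Python type |
| --- | --- |
| object | dict |
| array | list |
| string | str |
| number (int) | int |
| number (real) | float |
| true | True |
| false | False |
| null | None |

To use JSON in Python, import the json module: import json.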

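2.2 The `dumps()` and `loads()` functions

json.dumps() encodes a Python object as a JSON string; json.loads() decodes a JSON string into a Python object.
Compared with json.dump() and json.load(), introduced below, the trailing s in these names can be read as "string".

The following example converts a Python dict to JSON and back; other types work the same way: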

>>> import json
>>> stus = {'xiaojun': '1234', 'xiaohei': '2345', 'xiaoming': '3456'}
>>> stus
{'xiaojun': '1234', 'xiaohei': '2345', 'xiaoming': '3456'}
>>> stus_json = json.dumps(stus)
>>> stus_json
'{"xiaojun": "1234", "xiaohei": "2345", "xiaoming": "3456"}'
>>> type(stus_json) # the encoded JSON data is just a string
<class 'str'>
>>> stus_reverse = json.loads(stus_json)
>>> stus_reverse
{'xiaojun': '1234', 'xiaohei': '2345', 'xiaoming': '3456'}
>>> stus_reverse == stus # note: if stus were a tuple, the value decoded from JSON would be a list and would not compare equal; see the two tables in 2.1
True
>>> type(stus_reverse)
<class 'dict'>
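The round trip is not always lossless. As a small sketch of the point made in the comment above, the variables `point` and `scores` below are made-up examples: a tuple comes back as a list, and non-string dict keys come back as strings.

```python
import json

point = (1, 2)               # a tuple is encoded as a JSON array
scores = {1: "a", 2: "b"}    # JSON object keys are always strings

point_back = json.loads(json.dumps(point))
scores_back = json.loads(json.dumps(scores))

print(point_back)             # [1, 2]
print(point_back == point)    # False: a list never equals a tuple
print(scores_back)            # {'1': 'a', '2': 'b'}
```

If the original types matter, convert them back explicitly after loading, for example with `tuple(point_back)`.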

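2.3 The `dump()` and `load()` functions

json.dump() encodes a Python object as JSON and writes it to a file; json.load() reads JSON from a file and decodes it into a Python object.

In the following example, number_writer.py writes data to a JSON file and number_reader.py reads it back: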

# number_writer.py
import json

numbers = [2, 3, 5, 7, 11, 13]

with open('numbers.json', 'w') as f_obj:
    json.dump(numbers, f_obj)  # encode the numbers list as JSON and write it to f_obj

Running the program creates a file named numbers.json with the following content:

[2, 3, 5, 7, 11, 13]

The following program reads the file back:

# number_reader.py
import json

with open('numbers.json') as f_obj:
    numbers = json.load(f_obj)

print('Decoded JSON file content:', numbers)

Output:

Decoded JSON file content: [2, 3, 5, 7, 11, 13]

Note:
Calling dump twice on the same file writes both values back to back on the same line, and load then fails. For example, rewrite number_writer.py as follows:

# number_writer.py
import json

numbers = [2, 3, 5, 7, 11, 13]

stus = {"xiaoming": 1234, "xiaohei": 2345, "xiaolong": 3456}

with open("numbers.json", 'w') as f_obj:
    json.dump(numbers, f_obj)
    json.dump(stus, f_obj)

Running it creates numbers.json with the following content:

[2, 3, 5, 7, 11, 13]{"xiaoming": 1234, "xiaohei": 2345, "xiaolong": 3456}

Both dumps end up on the same line.

Now rewrite number_reader.py as follows:

# number_reader.py
import json

with open('numbers.json') as f_obj:
    numbers = json.load(f_obj)
    stus = json.load(f_obj)

print("First loaded object:", numbers)
print("Second loaded object:", stus)

Running it prints:

Traceback (most recent call last):
  File "number_reader.py", line 5, in <module>
    numbers = json.load(f_obj)
  File "/usr/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 21 (char 20)

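The very first json.load() call raises JSONDecodeError ("Extra data"): it reads the whole file, parses the first value, and then finds more content after it at character 20. A JSON file must contain exactly one top-level value.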
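One common way around this, sketched below, is to write each object on its own line with json.dumps() and decode the file line by line with json.loads() (a layout often called JSON Lines). This is a suggested workaround rather than something the json module does for you, and the file name `records.jsonl` is an assumption for illustration.

```python
import json
import os
import tempfile

numbers = [2, 3, 5, 7, 11, 13]
stus = {"xiaoming": 1234, "xiaohei": 2345, "xiaolong": 3456}

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "records.jsonl")

# Write one JSON value per line.
with open(path, "w", encoding="utf-8") as f_obj:
    for obj in (numbers, stus):
        f_obj.write(json.dumps(obj) + "\n")

# Decode each non-empty line separately.
with open(path, encoding="utf-8") as f_obj:
    loaded = [json.loads(line) for line in f_obj if line.strip()]

print("First loaded object:", loaded[0])
print("Second loaded object:", loaded[1])
```

This works because json.dumps() never emits a raw newline with its default settings. An alternative is to put both objects into a single list or dict and dump that once.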

2.4 Other names in the json module

The json module exposes the following names (not all of them are functions):

>>> dir(json)
['JSONDecodeError', 'JSONDecoder', 'JSONEncoder', '__all__', '__author__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_default_decoder', '_default_encoder', 'codecs', 'decoder', 'detect_encoding', 'dump', 'dumps', 'encoder', 'load', 'loads', 'scanner']

Apart from the functions covered above, the rest are used less often and are not covered here.

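3 The pickle module

The pickle module is very similar to the json module: it also provides dumps, loads, dump, and load, with similar behavior.

The differences are:

- pickle requires the file to be opened in binary mode, i.e. 'wb' or 'rb';
- pickle allows several dump calls on the same file, followed by the same number of load calls.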
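The binary nature of pickle can be seen directly with the in-memory functions. The sketch below uses a made-up `record` value; it shows that pickle.dumps() returns bytes and preserves Python-specific types, while json.dumps() rejects a value it cannot represent.

```python
import json
import pickle

# Made-up sample data containing a tuple and a set.
record = {"point": (1, 2), "tags": {"a", "b"}}

blob = pickle.dumps(record)      # bytes, not str
restored = pickle.loads(blob)    # tuple and set survive the round trip

try:
    json.dumps(record)           # a set has no JSON equivalent
    json_error = None
except TypeError as exc:
    json_error = exc

print(type(blob))
print(restored == record)
print(type(json_error).__name__)
```

Because the encoded form is bytes, any file used with pickle has to be opened in 'wb' or 'rb' mode.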

The examples below use dump and load; dumps and loads work the same way on bytes objects instead of files.
Rewrite number_writer.py as:

# number_writer.py
import pickle

numbers = [2, 3, 5, 7, 11, 13]

stus = {"xiaoming": 1234, "xiaohei": 2345, "xiaolong": 3456}

with open("numbers.pkl", 'wb') as f_obj:
    # the file must be opened in binary mode
    pickle.dump(numbers, f_obj)
    pickle.dump(stus, f_obj)

Running it creates a file named numbers.pkl. Opened in a text editor, it looks like gibberish, because pickle writes a binary format.

Rewrite number_reader.py as:

# number_reader.py
import pickle

with open('numbers.pkl', 'rb') as f_obj:
    # the file must be opened in binary mode
    numbers = pickle.load(f_obj)
    stus = pickle.load(f_obj)

print("First loaded object:", numbers)
print("Second loaded object:", stus)

Running it prints:

First loaded object: [2, 3, 5, 7, 11, 13]
Second loaded object: {'xiaoming': 1234, 'xiaohei': 2345, 'xiaolong': 3456}
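When the number of pickled objects in a file is not known in advance, one option is to keep calling pickle.load() until it raises EOFError. The helper `load_all` below is a hypothetical name introduced for this sketch, and the file is written to a temporary directory for illustration.

```python
import os
import pickle
import tempfile


def load_all(path):
    """Yield every object pickled into the file at path, in order."""
    with open(path, "rb") as f_obj:
        while True:
            try:
                yield pickle.load(f_obj)
            except EOFError:
                # pickle.load() raises EOFError once the file is exhausted.
                return


numbers = [2, 3, 5, 7, 11, 13]
stus = {"xiaoming": 1234, "xiaohei": 2345, "xiaolong": 3456}

# Hypothetical scratch file (illustration only).
path = os.path.join(tempfile.mkdtemp(), "numbers.pkl")
with open(path, "wb") as f_obj:
    pickle.dump(numbers, f_obj)
    pickle.dump(stus, f_obj)

objects = list(load_all(path))
print(objects)
```

The objects come back in the same order in which they were dumped.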

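4 CSV files

The simplest way to store data in a text file is to write it as a series of **comma-separated values (CSV)**; such a file is called a CSV file.

4.1 Writing CSV files

Writing CSV files from Python requires the csv module: import csv. There are two ways to write: csv.writer and csv.DictWriter.

Writing with csv.writer: first create a csv.writer object by passing in a file object, then call its writerow() method (write one row) or writerows() method (write several rows) to write data to that file object. For example: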

# csv_writer.py
import csv

headers = ['class', 'name', 'sex', 'height', 'year']  # header row

rows = [
        [1, 'xiaoming', 'male', 168, 23],
        [1, 'xiaohong', 'female', 170, 18],
        [3, 'longlong', 'male', 172, 21],
        [4, 'lingling', 'female', 172, 22]
        ]

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('test.csv', 'w', newline='') as f:
    f_csv = csv.writer(f)    # create the csv.writer object
    f_csv.writerow(headers)  # write one row (the header)
    f_csv.writerows(rows)    # write several rows

After running the script, the newly created test.csv opens in Excel as a table with the header row followed by the four data rows.

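Writing with csv.DictWriter: csv.DictWriter is meant for writing a sequence of dicts.

- Creating a csv.DictWriter object takes two arguments: the file object (f_obj) and the field names (fieldnames);
- to write the header, call the object's writeheader() method;
- to write data, call its writerow() (one row) or writerows() (several rows).

For example: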

# csv_DictWriter.py
import csv

headers = ['class', 'name', 'sex', 'height', 'year']  # header row

rows = [
        {'class': 1, 'name': 'xiaoming', 'sex': 'male', 'height': 168, 'year': 23},
        {'class': 1, 'name': 'xiaohong', 'sex': 'female', 'height': 170, 'year': 18},
        {'class': 3, 'name': 'longlong', 'sex': 'male', 'height': 172, 'year': 21},
        {'class': 4, 'name': 'lingling', 'sex': 'female', 'height': 172, 'year': 22}
        ]

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('test.csv', 'w', newline='') as f:
    f_csv = csv.DictWriter(f, headers)  # create the csv.DictWriter object
    f_csv.writeheader()                 # write the header
    f_csv.writerows(rows)               # write several rows
    # writing a single row with writerow() is not shown in this script

After running the script, test.csv opened in Excel has exactly the same content as in the previous example.
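For the single-row case, here is a minimal sketch that writes to an in-memory io.StringIO buffer instead of a real file, so the resulting text can be inspected directly. The buffer and the shortened two-column header are assumptions made for illustration.

```python
import csv
import io

headers = ["class", "name"]    # shortened header, for illustration only
buffer = io.StringIO()         # in-memory stand-in for an opened file

writer = csv.DictWriter(buffer, fieldnames=headers)
writer.writeheader()
writer.writerow({"class": 1, "name": "xiaoming"})   # one dict, one row
writer.writerow({"name": "xiaohong"})               # a missing key is written as an empty field

csv_text = buffer.getvalue()
print(repr(csv_text))
```

By default the csv module ends each row with `\r\n`, which is why real files should be opened with `newline=''` so that the line endings are not translated a second time.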

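4.2 Reading CSV files

There are several ways to read a CSV file. Two are covered here: one provided by the csv module and one provided by pandas.

4.2.1 Reading CSV files with the csv module

The csv module provides csv.reader and csv.DictReader for reading the content of CSV files.

Reading with csv.reader: first create a csv.reader object by passing in the file object of a CSV file. The reader is an iterator, so it can be traversed with a for loop or with next(). For example: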

# csv_reader.py
import csv

with open('test.csv', newline='') as f:
    f_csv = csv.reader(f)
    print('The header of the table:', next(f_csv))  # read the first row with next()
    for row in f_csv:  # iterate over the remaining rows with a for loop
        print('The name of the student: %s' % row[1])
        print('The info of the student:', row)

Output:

The header of the table: ['class', 'name', 'sex', 'height', 'year']
The name of the student: xiaoming
The info of the student: ['1', 'xiaoming', 'male', '168', '23']
The name of the student: xiaohong
The info of the student: ['1', 'xiaohong', 'female', '170', '18']
The name of the student: longlong
The info of the student: ['3', 'longlong', 'male', '172', '21']
The name of the student: lingling
The info of the student: ['4', 'lingling', 'female', '172', '22']

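Reading with csv.DictReader, which returns each row as a dict:

- first create a csv.DictReader object by passing in the file object of a CSV file; it is an iterator and can be traversed with a for loop or with next();
- csv.DictReader automatically treats the first row of the CSV file as the header, i.e. the keys of each dict;
- each row from the second to the last supplies the values of one dict.

For example: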

# csv_reader.py
import csv

with open('test.csv', newline='') as f:
    f_csv = csv.DictReader(f)
    print('The header of the table:', f_csv.fieldnames)  # the fieldnames attribute holds the header
    for row in f_csv:  # iterate over the csv.DictReader object with a for loop
        print('The name of the student: %s' % row['name'])
        print('The info of the student:', row)

Output (from Python 3.6 or 3.7; since Python 3.8 each row is a plain dict rather than an OrderedDict):

The header of the table: ['class', 'name', 'sex', 'height', 'year']
The name of the student: xiaoming
The info of the student: OrderedDict([('class', '1'), ('name', 'xiaoming'), ('sex', 'male'), ('height', '168'), ('year', '23')])
The name of the student: xiaohong
The info of the student: OrderedDict([('class', '1'), ('name', 'xiaohong'), ('sex', 'female'), ('height', '170'), ('year', '18')])
The name of the student: longlong
The info of the student: OrderedDict([('class', '3'), ('name', 'longlong'), ('sex', 'male'), ('height', '172'), ('year', '21')])
The name of the student: lingling
The info of the student: OrderedDict([('class', '4'), ('name', 'lingling'), ('sex', 'female'), ('height', '172'), ('year', '22')])
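As the output shows, the csv module returns every field as a string. A minimal sketch of converting the numeric columns follows; the inline `csv_text` sample and the choice of `int_fields` are assumptions for illustration, and an io.StringIO buffer stands in for the opened file.

```python
import csv
import io

# Inline sample with the same header as test.csv (illustration only).
csv_text = (
    "class,name,sex,height,year\n"
    "1,xiaoming,male,168,23\n"
    "1,xiaohong,female,170,18\n"
)
int_fields = ("class", "height", "year")   # assumption: these columns hold integers

students = []
for row in csv.DictReader(io.StringIO(csv_text)):
    record = dict(row)                     # copy the row into a plain dict
    for field in int_fields:
        record[field] = int(record[field]) # convert '168' to 168, and so on
    students.append(record)

average_height = sum(s["height"] for s in students) / len(students)
print(students[0])
print(average_height)
```

Doing the conversion once, right after reading, keeps later arithmetic free of repeated `int()` calls.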

4.2.2 Reading CSV files with pandas

pandas provides a more convenient function, pandas.read_csv(), for reading CSV files.
For example:

>>> import pandas as pd
>>> data = pd.read_csv("test.csv")          # read the CSV file
>>> print(data.head(2))                     # print the first two rows
   class      name     sex  height  year
0      1  xiaoming    male     168    23
1      1  xiaohong  female     170    18
>>> print(data.columns)                     # header (column names)
Index(['class', 'name', 'sex', 'height', 'year'], dtype='object')
>>> print(data.shape)                       # shape of the table: (rows, columns)
(4, 5)
>>> print(data.loc[1:3])                    # rows by label; the end label is included
   class      name     sex  height  year
1      1  xiaohong  female     170    18
2      3  longlong    male     172    21
3      4  lingling  female     172    22
>>> print(data[1:3])                        # rows by position; the end position is excluded
   class      name     sex  height  year
1      1  xiaohong  female     170    18
2      3  longlong    male     172    21
>>> print(data['name'])                     # a single column
0    xiaoming
1    xiaohong
2    longlong
3    lingling
Name: name, dtype: object
>>> print(data['name'][1:3])                # selected rows of one column
1    xiaohong
2    longlong
Name: name, dtype: object
>>> print(data.loc[1:2, ['name', 'sex']])   # selected columns of selected rows
       name     sex
1  xiaohong  female
2  longlong    male