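NumPy provides the following functions for reading and writing files. (In practice, the pandas library is used more often to read data files.)
Reading files
The data file read by the functions below was downloaded from kaggle.com/datasets.
① The genfromtxt function. It can handle more complex cases, for example filling in missing values.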
genfromtxt(fname, dtype=float, comments='#', delimiter=None,
skip_header=0, skip_footer=0, converters=None,
missing_values=None, filling_values=None, usecols=None,
names=None, excludelist=None,
deletechars=''.join(sorted(NameValidator.defaultdeletechars)),
replace_space='_', autostrip=False, case_sensitive=True,
defaultfmt="f%i", unpack=None, usemask=False, loose=True,
invalid_raise=True, max_rows=None, encoding='bytes')
delimiter: the string that separates values. By default, any run of consecutive whitespace acts as the delimiter.
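import numpy as np

# Read every field as a string so the header row and the text columns are kept.
data = np.genfromtxt("data/cholera_data.csv", delimiter=',', dtype=str)
print("data.shape:", data.shape, sep="\n")
print("First 5 rows:", data[:5], sep="\n")
# data.shape:
# (2493, 6)
# First 5 rows: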
# [['Country' 'Year' 'Number of reported cases of cholera'
# 'Number of reported deaths from cholera' 'Cholera case fatality rate'
# 'WHO Region']
# ['Afghanistan' '2016' '677' '5' '0.7' 'Eastern Mediterranean']
# ['Afghanistan' '2015' '58064' '8' '0.01' 'Eastern Mediterranean']
# ['Afghanistan' '2014' '45481' '4' '0.0' 'Eastern Mediterranean']
# ['Afghanistan' '2013' '3957' '14' '0.35' 'Eastern Mediterranean']]
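The filling behaviour mentioned for genfromtxt can be sketched as follows. This is a minimal illustration, not taken from the Kaggle file: the in-memory CSV text `csv_text` and its column names are made up for the example, and one field is deliberately left empty.

```python
import io

import numpy as np

# Hypothetical CSV content with one empty field (row 2015, column "cases").
csv_text = "year,cases,deaths\n2014,100,4\n2015,,8\n2016,677,5\n"

# With the default dtype=float, an empty field is read as nan.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# filling_values supplies the value to use wherever a field is missing.
filled = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skip_header=1,
    filling_values=0,
)

print(raw)
print(filled)
```

Here skip_header=1 drops the header line so that the remaining fields can all be parsed as floats.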
② The loadtxt function
loadtxt(fname, dtype=float, comments='#', delimiter=None,
converters=None, skiprows=0, usecols=None, unpack=False,
ndmin=0, encoding='bytes', max_rows=None)
print("-" * 32, "loadtxt", "-" * 32)
data = np.loadtxt("data/cholera_data.csv", delimiter=',',dtype=str)
print("data.shape:", data.shape, sep="\n")
print("First 5 rows:", data[:5], sep="\n")
# data.shape:
# (2493, 6)
# First 5 rows:
# [['Country' 'Year' 'Number of reported cases of cholera'
# 'Number of reported deaths from cholera' 'Cholera case fatality rate'
# 'WHO Region']
# ['Afghanistan' '2016' '677' '5' '0.7' 'Eastern Mediterranean']
# ['Afghanistan' '2015' '58064' '8' '0.01' 'Eastern Mediterranean']
# ['Afghanistan' '2014' '45481' '4' '0.0' 'Eastern Mediterranean']
# ['Afghanistan' '2013' '3957' '14' '0.35' 'Eastern Mediterranean']]
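When only the numeric columns are needed, the skiprows and usecols parameters from the signature above can be combined so that loadtxt returns a float array directly. The sketch below assumes a small made-up CSV string (`csv_text`) shaped like the first three columns of the file used above; it does not read the real dataset.

```python
import io

import numpy as np

# Hypothetical sample shaped like the first three columns of the dataset.
csv_text = (
    "Country,Year,Cases\n"
    "Afghanistan,2016,677\n"
    "Afghanistan,2015,58064\n"
)

# skiprows=1 skips the header line; usecols=(1, 2) keeps only the numeric
# columns, so the text column "Country" is never converted to float.
values = np.loadtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skiprows=1,
    usecols=(1, 2),
)

print(values.shape)
print(values)
```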
Saving files
Files are usually saved in txt or csv format. Use the savetxt function to store data in a file.
savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='',
footer='', comments='# ', encoding=None)
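fname: path of the output file
X: the array holding the data to save
fmt: format specifier applied to each value
delimiter: the string that separates columns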
data = np.arange(15).reshape(3, 5)
np.savetxt("data/savetxt.csv", data, fmt='%s', delimiter=',')
# File contents
# 0,1,2,3,4
# 5,6,7,8,9
# 10,11,12,13,14
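As a rough sketch of how the header and comments parameters interact with fmt and delimiter, the example below writes the same array with a header line and reads it back with loadtxt. The temporary path, the file name savetxt_demo.csv, and the column names a to e are assumptions made for the illustration only.

```python
import os
import tempfile

import numpy as np

data = np.arange(15).reshape(3, 5)

# Write to a temporary directory (hypothetical location for this sketch).
path = os.path.join(tempfile.mkdtemp(), "savetxt_demo.csv")

# header adds a first line; comments="" stops savetxt from prefixing it
# with the default "# ". fmt="%d" writes the values as integers.
np.savetxt(path, data, fmt="%d", delimiter=",", header="a,b,c,d,e", comments="")

with open(path) as f:
    lines = f.read().splitlines()

# Read the file back, skipping the header line written above.
loaded = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int)

print(lines)
print(loaded)
```

Leaving comments at its default would write the header as "# a,b,c,d,e", which loadtxt then treats as a comment line and ignores without needing skiprows.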