How to save files in Python 3.6: Python 3.6 data analysis, data loading, storage, and file formats

1. Data loading and storage

1.1. np.save,np.load

In [78]: a = np.arange(10)

In [79]: a

Out[79]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [80]: np.save('some_array',a)

In [83]: np.load('some_array.npy')

Out[83]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
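The session above saves to 'some_array' but loads from 'some_array.npy'. A minimal sketch of why, assuming a temporary directory is used so no files are left behind (the directory and variable names are illustrative, not from the original session):

```python
import os
import tempfile

import numpy as np

a = np.arange(10)

# Work in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    base = os.path.join(tmp_dir, 'some_array')  # no extension given
    np.save(base, a)                            # NumPy appends '.npy' itself
    saved_names = os.listdir(tmp_dir)           # what actually landed on disk
    restored = np.load(base + '.npy')           # load needs the full file name

print(saved_names)
print(np.array_equal(a, restored))
```

np.save adds the .npy extension when the given name lacks one, which is why np.load must be called with the extension included.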

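1.2. In everyday work, the pd.read_* functions and the DataFrame to_* methods cover nearly everything; recent pandas versions can read almost any common format.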
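As a rough illustration of the read_/to_ pairing, here is a round trip through CSV text. It is only a sketch: the small frame is made up for the example, and an in-memory io.StringIO buffer stands in for a file on disk.

```python
import io

import pandas as pd

# A small hypothetical frame, just for the round trip.
frame = pd.DataFrame({'a': [1, 5, 9], 'message': ['hello', 'world', 'foo']})

# to_csv accepts a path or any file-like object; a buffer avoids touching disk.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)   # index=False: do not write the row labels
csv_text = buffer.getvalue()

# read_csv is the matching reader.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(frame))
```

Most formats follow the same pattern, for example read_json with to_json, or read_excel with to_excel.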

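2. CSV and txt formats

To read a .csv file, read_csv needs no separator argument because its default separator is a comma; read_table defaults to a tab, so the separator has to be specified (sep=',').

To print a file from the command line, Linux uses cat, as everyone knows, while Windows uses type, and Windows paths use a backslash where Linux uses a forward slash.

CSV is convenient: read it directly, then choose parameters such as header and index_col.

a) Example 1: a CSV file can be read with read_csv or read_table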

# windows system

# ex1, csv and text values

In [3]: !type ch06\ex1.csv

a,b,c,d,message

1,2,3,4,hello

5,6,7,8,world

9,10,11,12,foo

In [10]: df = pd.read_csv('ch06/ex1.csv')

In [11]: df

Out[11]:

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo

In [12]: df1 = pd.read_table('ch06/ex1.csv')

In [13]: df1

Out[13]:

  a,b,c,d,message
0   1,2,3,4,hello
1   5,6,7,8,world
2  9,10,11,12,foo

In [14]: df1 = pd.read_table('ch06/ex1.csv',sep=',')

In [15]: df1

Out[15]:

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
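The difference between the two readers comes down to the default separator. A self-contained sketch, assuming the same content as ex1.csv but supplied through io.StringIO so that it runs without the ch06 folder:

```python
import io

import pandas as pd

# Same content as ex1.csv, kept in memory instead of on disk.
CSV_TEXT = (
    "a,b,c,d,message\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)

df_csv = pd.read_csv(io.StringIO(CSV_TEXT))                   # default sep=','
df_tab = pd.read_table(io.StringIO(CSV_TEXT))                 # default sep='\t'
df_tab_comma = pd.read_table(io.StringIO(CSV_TEXT), sep=',')  # explicit comma

print(df_csv.shape)      # (3, 5): five real columns
print(df_tab.shape)      # (3, 1): each line became one string column
print(df_tab_comma.equals(df_csv))
```

Without sep=',', read_table finds no tabs and treats each whole line as a single field, which is the one-column result shown in Out[13].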

b) Example 2: setting the header and index_col parameters for a CSV file

# ex2 csv and header,index_col

In [48]: pd.read_csv('ch06/ex2.csv',header=None)

Out[48]:

   0   1   2   3      4
0  1   2   3   4  hello
1  5   6   7   8  world
2  9  10  11  12    foo

In [49]: pd.read_csv('ch06/ex2.csv',names=['a','b','c','d','message'])

Out[49]:

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo

In [53]: pd.read_csv('ch06/ex2.csv',names=['a','b','c','d','message'],index_col='message')

Out[53]:

         a   b   c   d
message
hello    1   2   3   4
world    5   6   7   8
foo      9  10  11  12

# csv_mindex.csv

In [57]: !type ch06\csv_mindex.csv

key1,key2,value1,value2

one,a,1,2

one,b,3,4

one,c,5,6

one,d,7,8

two,a,9,10

two,b,11,12

two,c,13,14

two,d,15,16

In [60]: parsed = pd.read_csv('ch06/csv_mindex.csv',index_col=['key1','key2'])

In [61]: parsed

Out[61]:

           value1  value2
key1 key2
one  a          1       2
     b          3       4
     c          5       6
     d          7       8
two  a          9      10
     b         11      12
     c         13      14
     d         15      16
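The three options from this example (header=None, names, index_col) can be tried side by side. This is a sketch under the assumption that ex2.csv is simply ex1.csv without its header row; the data is fed in through io.StringIO, and the multi-index sample is a shortened version of csv_mindex.csv.

```python
import io

import pandas as pd

# Assumed content of ex2.csv: the same rows as ex1.csv, but no header line.
NO_HEADER = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"
names = ['a', 'b', 'c', 'd', 'message']

# header=None: pandas numbers the columns 0..4.
default_cols = pd.read_csv(io.StringIO(NO_HEADER), header=None)

# names + index_col: supply column names and promote one column to the index.
by_message = pd.read_csv(io.StringIO(NO_HEADER), names=names, index_col='message')

# A list passed to index_col builds a hierarchical (multi-level) index.
MINDEX = (
    "key1,key2,value1,value2\n"
    "one,a,1,2\n"
    "one,b,3,4\n"
    "two,a,9,10\n"
    "two,b,11,12\n"
)
parsed = pd.read_csv(io.StringIO(MINDEX), index_col=['key1', 'key2'])

# Rows are then addressed with a tuple of the two keys.
value = parsed.loc[('two', 'a'), 'value1']
print(value)
```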

c) Example 3: when fields are separated by a variable number of spaces, use the regular expression \s+

In [62]: list(open('ch06/ex3.txt'))

Out[62]:

[' A B C\n',

'aaa -0.264438 -1.026059 -0.619500\n',

'bbb 0.927272 0.302904 -0.032399\n',

'ccc -0.264273 -0.386314 -0.217601\n',

'ddd -0.871858 -0.348382 1.100491\n']

In [63]: result = pd.read_table('ch06/ex3.txt',sep=r'\s+')

In [64]: result

Out[64]:

            A         B         C
aaa -0.264438 -1.026059 -0.619500
bbb  0.927272  0.302904 -0.032399
ccc -0.264273 -0.386314 -0.217601
ddd -0.871858 -0.348382  1.100491
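Below is a small sketch of the whitespace-separated case, assuming the first two data rows of ex3.txt with their padding restored and io.StringIO in place of the file. The pattern is written as a raw string (r'\s+') so that Python does not try to interpret the backslash itself.

```python
import io

import pandas as pd

# Columns are padded with a varying number of spaces.
TEXT = (
    "            A         B         C\n"
    "aaa -0.264438 -1.026059 -0.619500\n"
    "bbb  0.927272  0.302904 -0.032399\n"
)

# r'\s+' matches one or more whitespace characters between fields.
result = pd.read_table(io.StringIO(TEXT), sep=r'\s+')

print(list(result.columns))
print(list(result.index))
```

The header line has one field fewer than the data lines, so pandas infers that the first column holds the row labels and uses it as the index.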

d) Example 4: skipping malformed rows and handling missing values

In [65]: !type ch06\ex4.csv

# hey!

a,b,c,d,message

# just wanted to make things more difficult for you

# who reads CSV files with computers, anyway?

1,2,3,4,hello

5,6,7,8,world

9,10,11,12,foo

In [66]: pd.read_csv('ch06/ex4.csv',skiprows=[0,2,3])

Out[66]:

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
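skiprows works when the positions of the bad lines are known in advance. Since every unwanted line in ex4.csv starts with '#', the comment parameter of read_csv is another option. A sketch comparing the two, with the file content held in io.StringIO (the comment-based variant is an alternative added here, not something the original session used):

```python
import io

import pandas as pd

# Same content as ex4.csv.
TEXT = (
    "# hey!\n"
    "a,b,c,d,message\n"
    "# just wanted to make things more difficult for you\n"
    "# who reads CSV files with computers, anyway?\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)

# Option 1: skip lines by position (0-based line numbers).
by_rows = pd.read_csv(io.StringIO(TEXT), skiprows=[0, 2, 3])

# Option 2: treat everything after '#' as a comment; fully commented
# lines are ignored, so the positions do not need to be known.
by_comment = pd.read_csv(io.StringIO(TEXT), comment='#')

print(by_rows.equals(by_comment))
```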

In [67]: !type ch06\ex5.csv

something,a,b,c,d,message

one,1,2,3,4,NA

two,5,6,,8,world

three,9,10,11,12,foo

In [68]: pd.read_csv('ch06/ex5.csv',na_values='Null')

Out[68]:

  something  a   b     c   d message
0       one  1   2   3.0   4     NaN
1       two  5   6   NaN   8   world
2     three  9  10  11.0  12     foo

In [69]: setNAvalues = {'message':['foo','NA'],'something':['two']}

In [70]: pd.read_csv('ch06/ex5.csv',na_values=setNAvalues)

Out[70]:

  something  a   b     c   d message
0       one  1   2   3.0   4     NaN
1       NaN  5   6   NaN   8   world
2     three  9  10  11.0  12     NaN
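To see what na_values adds on top of the defaults, it may help to count the missing cells with and without it. A sketch using the content of ex5.csv through io.StringIO; the name sentinels is just an illustrative variable name.

```python
import io

import pandas as pd

# Same content as ex5.csv.
TEXT = (
    "something,a,b,c,d,message\n"
    "one,1,2,3,4,NA\n"
    "two,5,6,,8,world\n"
    "three,9,10,11,12,foo\n"
)

# Defaults only: 'NA' and the empty field are already treated as missing.
default = pd.read_csv(io.StringIO(TEXT))

# A dict adds extra per-column markers on top of the defaults.
sentinels = {'message': ['foo', 'NA'], 'something': ['two']}
custom = pd.read_csv(io.StringIO(TEXT), na_values=sentinels)

missing_default = int(default.isnull().sum().sum())
missing_custom = int(custom.isnull().sum().sum())
print(missing_default, missing_custom)   # 2 4
```

The two extra missing cells are 'foo' in the message column and 'two' in the something column, matching Out[70].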

JSON format

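Use the built-in json package: json.load reads from a file object and json.loads parses a string. The free online textbook py4e covers this.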
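A minimal sketch of the json package in use. The sample records are hypothetical, made up only for this example.

```python
import json

import pandas as pd

# Hypothetical JSON text; in practice this would come from a file or a web API.
TEXT = '[{"name": "Ann", "age": 30}, {"name": "Bob", "age": 25}]'

records = json.loads(TEXT)        # JSON text -> Python list of dicts
frame = pd.DataFrame(records)     # list of dicts -> DataFrame, keys become columns
round_trip = json.dumps(records)  # Python objects -> JSON text

print(records[0]['name'])
print(frame.shape)
```

For a file on disk, json.load(open_file) plays the role of json.loads here.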

XML tree

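Python 3.6 ships with xml.etree.ElementTree in the standard library; once the data is read out, process it as usual. The py4e textbook covers this as well.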
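A short sketch of reading XML with ElementTree and turning it into plain Python rows. The XML snippet and its tag names (people, person, name, age) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML; ET.parse(path).getroot() would do the same for a file.
XML_TEXT = """<people>
  <person id="1"><name>Ann</name><age>30</age></person>
  <person id="2"><name>Bob</name><age>25</age></person>
</people>"""

root = ET.fromstring(XML_TEXT)

# Walk the <person> elements; attributes come from get(), child text from findtext().
rows = [
    {
        'id': int(person.get('id')),
        'name': person.findtext('name'),
        'age': int(person.findtext('age')),
    }
    for person in root.findall('person')
]

print(rows)
```

A list of dicts like rows can be passed straight to pd.DataFrame for the usual processing.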

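Binary formats

See the official documentation.

HDF5 files

HDF5 stands for Hierarchical Data Format version 5. Despite the similar-sounding name, it is not a Hadoop file format (HDFS is Hadoop's file system); it is a format for storing large amounts of array data in a single file. pandas's HDFStore relies on the PyTables package. I will continue with this part once I get started with big data.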

In [39]: store = pd.HDFStore('mydata.h5')

In [41]: frame

Out[41]:

   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo

In [42]: store['obj1'] = frame

In [43]: store

Out[43]:

File path: mydata.h5

/obj1 frame (shape->[3,5])

In [44]: store['obj1_col'] = frame['a']

In [45]: store

Out[45]:

File path: mydata.h5

/obj1 frame (shape->[3,5])

/obj1_col series (shape->[3])

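Excel

There is no need to work with the Excel libraries by hand as the book does; pandas can now read the file directly with pd.read_excel('ch06/test.xls'). pandas still needs an Excel engine installed behind the scenes (xlrd for .xls files, openpyxl for .xlsx), which distributions such as Anaconda typically include.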

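Using HTML and Web APIs

For fetching data from web pages, so far I have only used urllib and socket...

See the py4e site: Networked programs

The requests library seems to be the higher-level approach; to do.

Databases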

For simple SQL, the built-in sqlite3 module is enough.
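A minimal sketch of sqlite3 together with pandas, assuming an in-memory database so nothing is written to disk. The table name test and its columns are made up for the example.

```python
import sqlite3

import pandas as pd

# ':memory:' creates a throwaway database that lives only in RAM.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE test (a INTEGER, message TEXT)')

# Parameterized inserts: the ? placeholders are filled from each tuple.
con.executemany(
    'INSERT INTO test VALUES (?, ?)',
    [(1, 'hello'), (5, 'world'), (9, 'foo')],
)
con.commit()

# pandas can run a query on the open connection and return a DataFrame.
frame = pd.read_sql('SELECT a, message FROM test ORDER BY a', con)
con.close()

print(frame)
```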

MongoDB

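This is a NoSQL database. I have not installed it yet and will set it up later together with Hadoop...

2018.7.2x: big-data file formats will wait until I get started with them. I have been won over to using the requests library for web pages.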
