[' A B C\n',
'aaa -0.264438 -1.026059 -0.619500\n',
'bbb 0.927272 0.302904 -0.032399\n',
'ccc -0.264273 -0.386314 -0.217601\n',
'ddd -0.871858 -0.348382 1.100491\n']
result = pd.read_table('examples/ex3.txt', sep=r'\s+')
result
            A         B         C
aaa -0.264438 -1.026059 -0.619500
bbb  0.927272  0.302904 -0.032399
ccc -0.264273 -0.386314 -0.217601
ddd -0.871858 -0.348382  1.100491
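The header line has one fewer field than the data rows, so read_table treats the first column as the row index. Below is a minimal sketch of the same behaviour. It assumes the file contents shown above are inlined through io.StringIO rather than read from examples/ex3.txt.

```python
import io

import pandas as pd

# Same layout as the ex3.txt contents shown above: runs of spaces between
# fields, and a header with one fewer field than the data rows.
text = """            A         B         C
aaa -0.264438 -1.026059 -0.619500
bbb  0.927272  0.302904 -0.032399
ccc -0.264273 -0.386314 -0.217601
ddd -0.871858 -0.348382  1.100491
"""

# r'\s+' splits on any run of whitespace; the raw string avoids an
# invalid-escape warning in recent Python versions.
result = pd.read_table(io.StringIO(text), sep=r'\s+')

print(result.index.tolist())    # row labels come from the first column
print(result.columns.tolist())  # only the three named columns remain
```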
result = pd.read_csv('examples/ex5.csv')
result
  something a  b    c  d message
0       one 1  2  3.0  4     NaN
1       two 5  6  NaN  8   world
2     three 9 10 11.0 12     foo
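The NaN entries above are missing values detected by read_csv. The real contents of examples/ex5.csv are not shown here, so the sketch below uses a hypothetical CSV with the same shape: one empty field and one literal "NA". It also demonstrates the na_values parameter, which adds extra sentinel strings for each column.

```python
import io

import pandas as pd

# Hypothetical CSV shaped like the output above (assumed, not the real file).
text = """something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo
"""

result = pd.read_csv(io.StringIO(text))
# By default both the empty field and the string "NA" become NaN;
# column c is upcast to float64 because it now holds a NaN.
print(result.isna().sum().to_dict())

# na_values adds sentinels; a dict applies them per column.
sentinels = {'message': ['foo'], 'something': ['two']}
result2 = pd.read_csv(io.StringIO(text), na_values=sentinels)
print(result2['message'].isna().tolist())
print(result2['something'].isna().tolist())
```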
Reading Parts of a File
1. When a file has many rows, only 10 of them are displayed (the first five and the last five)
result = pd.read_csv('examples/ex6.csv')
result
           one       two     three      four key
0     0.467976 -0.038649 -0.295344 -1.824726   L
1    -0.358893  1.404453  0.704965 -0.200638   B
2    -0.501840  0.659254 -0.421691 -0.057688   G
3     0.204886  1.074134  1.388361 -0.982404   R
4     0.354628 -0.133116  0.283763 -0.837063   Q
...        ...       ...       ...       ... ...
9995  2.311896 -0.417070 -1.409599 -0.515821   L
9996 -0.479893 -0.650419  0.745152 -0.646038   E
9997  0.523331  0.787112  0.486066  1.093156   K
9998 -0.362559  0.598894 -1.843201  0.887292   G
9999 -0.096376 -1.012999 -0.657431 -0.573315   0

10000 rows × 5 columns
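The truncation comes from pandas display options, not from read_csv, because the whole file is still loaded. The sketch below demonstrates this with a synthetic 10,000-row frame that stands in for examples/ex6.csv. The random data is an assumption and does not reproduce the file's contents.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for examples/ex6.csv: four float columns plus a key.
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.standard_normal((10000, 4)),
                     columns=['one', 'two', 'three', 'four'])
frame['key'] = rng.choice(list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'), size=10000)

# With more rows than display.max_rows, pandas prints display.min_rows
# rows (head and tail) followed by a dimensions footer.
with pd.option_context('display.max_rows', 10, 'display.min_rows', 10):
    text = repr(frame)

print(frame.shape)             # all rows are in memory
print(text.splitlines()[-1])   # the dimensions footer
```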
2. Read only the first five rows
pd.read_csv('examples/ex6.csv', nrows=5)
        one       two     three      four key
0  0.467976 -0.038649 -0.295344 -1.824726   L
1 -0.358893  1.404453  0.704965 -0.200638   B
2 -0.501840  0.659254 -0.421691 -0.057688   G
3  0.204886  1.074134  1.388361 -0.982404   R
4  0.354628 -0.133116  0.283763 -0.837063   Q
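The nrows parameter stops parsing after the given number of data rows, and the header line does not count toward that limit. Below is a small sketch that uses a hypothetical 100-row CSV with the same column names as ex6.csv. The generated values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV with ex6.csv's column names and 100 generated data rows.
lines = ['one,two,three,four,key']
lines += [f'{i},{i * 2},{i * 3},{i * 4},K{i % 7}' for i in range(100)]
text = '\n'.join(lines) + '\n'

# Only the first five data rows are parsed; the rest of the input is ignored.
head = pd.read_csv(io.StringIO(text), nrows=5)
print(head.shape)
print(head['one'].tolist())
```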
3. Process a large CSV file in chunks with chunksize
chunker = pd.read_csv('examples/ex6.csv', chunksize=1000)
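# Running total of key counts across all chunks
tot = pd.Series([], dtype='float64')
for piece in chunker:
    # fill_value=0 treats a key missing from either side as a zero count
    tot = tot.add(piece['key'].value_counts(), fill_value=0)

tot = tot.sort_values(ascending=False)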
Show the ten most frequent keys:
tot[:10]
E 368.0
X 364.0
L 346.0
O 343.0
Q 340.0
M 338.0
J 337.0
F 335.0
K 334.0
H 330.0
dtype: float64
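Adding the per-chunk value counts gives the same totals as counting the whole column at once, while only one chunk is held in memory at a time. The sketch below checks this on hypothetical generated data that is inlined with io.StringIO instead of read from ex6.csv.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: 10,000 rows with a categorical 'key' column.
rng = np.random.default_rng(42)
keys = rng.choice(list('ABCDEFGHIJ'), size=10_000)
text = 'key,value\n' + ''.join(f'{k},{i}\n' for i, k in enumerate(keys))

# Accumulate counts chunk by chunk, as in the loop above.
tot = pd.Series([], dtype='float64')
for piece in pd.read_csv(io.StringIO(text), chunksize=1000):
    tot = tot.add(piece['key'].value_counts(), fill_value=0)
tot = tot.sort_values(ascending=False)

# Reference result: count the whole column in a single pass.
full = pd.read_csv(io.StringIO(text))['key'].value_counts()
print(tot.astype('int64').to_dict() == full.to_dict())
```

The running total is float64 because it starts from an empty float Series. Casting it back to int64 is safe once all chunks have been added.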