pandas Quick Tutorial

 


Object Creation

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [1]:

 

import pandas as pd
import numpy as np
s = pd.Series([1,3,5,np.nan,6,8])
s

Out[1]:

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns:

In [4]:

 

dates = pd.date_range('20130101', periods=6)
dates

Out[4]:

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [15]:

 

df = pd.DataFrame(np.arange(24).reshape(6,4), index=dates, columns=list('ABCD'))
df

Out[15]:

             A   B   C   D
2013-01-01   0   1   2   3
2013-01-02   4   5   6   7
2013-01-03   8   9  10  11
2013-01-04  12  13  14  15
2013-01-05  16  17  18  19
2013-01-06  20  21  22  23

In [16]:

 

df2 = pd.DataFrame({ 'A' : 1.,
                    'B' : pd.Timestamp('20130102'),
                    'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
                    'D' : np.array([3] * 4,dtype='int32'),
                    'E' : pd.Categorical(["test","train","test","train"]),
                    'F' : 'foo' })
df2

Out[16]:

     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

The columns of the resulting DataFrame have different dtypes.

In [17]:

 

df2.dtypes

Out[17]:

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

Viewing Data

Here is how to view the top and bottom rows of the frame:

In [18]:

 

df.head()

Out[18]:

             A   B   C   D
2013-01-01   0   1   2   3
2013-01-02   4   5   6   7
2013-01-03   8   9  10  11
2013-01-04  12  13  14  15
2013-01-05  16  17  18  19

In [19]:

 

df.tail(3)

Out[19]:

             A   B   C   D
2013-01-04  12  13  14  15
2013-01-05  16  17  18  19
2013-01-06  20  21  22  23

Display the index, the columns, and the underlying NumPy data:

In [20]:

 

df.index

Out[20]:

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [21]:

 

df.columns

Out[21]:

Index(['A', 'B', 'C', 'D'], dtype='object')

In [22]:

 

df.values

Out[22]:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

A quick statistic summary of your data:

In [23]:

 

df.describe()

Out[23]:

               A          B          C          D
count   6.000000   6.000000   6.000000   6.000000
mean   10.000000  11.000000  12.000000  13.000000
std     7.483315   7.483315   7.483315   7.483315
min     0.000000   1.000000   2.000000   3.000000
25%     5.000000   6.000000   7.000000   8.000000
50%    10.000000  11.000000  12.000000  13.000000
75%    15.000000  16.000000  17.000000  18.000000
max    20.000000  21.000000  22.000000  23.000000

Transposing your data:

In [24]:

 

df.T

Out[24]:

   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
A           0           4           8          12          16          20
B           1           5           9          13          17          21
C           2           6          10          14          18          22
D           3           7          11          15          19          23

Sorting by an axis:

In [25]:

 

df.sort_index(axis=1, ascending=False)

Out[25]:

             D   C   B   A
2013-01-01   3   2   1   0
2013-01-02   7   6   5   4
2013-01-03  11  10   9   8
2013-01-04  15  14  13  12
2013-01-05  19  18  17  16
2013-01-06  23  22  21  20

Sorting by values:

In [27]:

 

df.sort_values(by='B',ascending=False)

Out[27]:

             A   B   C   D
2013-01-06  20  21  22  23
2013-01-05  16  17  18  19
2013-01-04  12  13  14  15
2013-01-03   8   9  10  11
2013-01-02   4   5   6   7
2013-01-01   0   1   2   3

Selection

Selecting a single column, which yields a Series, equivalent to df.A:

In [28]:

 

df['A']

Out[28]:

2013-01-01     0
2013-01-02     4
2013-01-03     8
2013-01-04    12
2013-01-05    16
2013-01-06    20
Freq: D, Name: A, dtype: int32

In [29]:

 

df[0:3]

Out[29]:

            A  B   C   D
2013-01-01  0  1   2   3
2013-01-02  4  5   6   7
2013-01-03  8  9  10  11

In [30]:

 

df['20130102':'20130104']

Out[30]:

             A   B   C   D
2013-01-02   4   5   6   7
2013-01-03   8   9  10  11
2013-01-04  12  13  14  15

Selection by Label

Getting a cross section using a label:

In [31]:

 

df.loc[dates[0]]

Out[31]:

A    0
B    1
C    2
D    3
Name: 2013-01-01 00:00:00, dtype: int32

In [32]:

 

df.loc[:,['A','B']]

Out[32]:

             A   B
2013-01-01   0   1
2013-01-02   4   5
2013-01-03   8   9
2013-01-04  12  13
2013-01-05  16  17
2013-01-06  20  21

In [33]:

 

df.loc['20130102':'20130104',['A','B']]

Out[33]:

             A   B
2013-01-02   4   5
2013-01-03   8   9
2013-01-04  12  13

In [34]:

 

df.loc['20130102',['A','B']]

Out[34]:

A    4
B    5
Name: 2013-01-02 00:00:00, dtype: int32

In [35]:

 

df.loc[dates[0],'A']

Out[35]:

0

For getting fast access to a scalar (equivalent to the prior method):

In [36]:

 

df.at[dates[0],'A']

Out[36]:

0

Selection by Position

Select via the position of the passed integers:

In [37]:

 

df.iloc[3]

Out[37]:

A    12
B    13
C    14
D    15
Name: 2013-01-04 00:00:00, dtype: int32

In [38]:

 

df.iloc[3:5,0:2]

Out[38]:

             A   B
2013-01-04  12  13
2013-01-05  16  17

In [39]:

 

df.iloc[[1,2,4],[0,2]]

Out[39]:

             A   C
2013-01-02   4   6
2013-01-03   8  10
2013-01-05  16  18

In [40]:

 

df.iloc[1:3,:]

Out[40]:

            A  B   C   D
2013-01-02  4  5   6   7
2013-01-03  8  9  10  11

In [41]:

 

df.iloc[:,1:3]

Out[41]:

             B   C
2013-01-01   1   2
2013-01-02   5   6
2013-01-03   9  10
2013-01-04  13  14
2013-01-05  17  18
2013-01-06  21  22

In [42]:

 

df.iloc[1,1]

Out[42]:

5

For getting fast access to a scalar (equivalent to the prior method):

In [43]:

 

df.iat[1,1]

Out[43]:

5

Boolean Indexing

Using a single column's values to select data:

In [44]:

 

df[df.A > 0]

Out[44]:

             A   B   C   D
2013-01-02   4   5   6   7
2013-01-03   8   9  10  11
2013-01-04  12  13  14  15
2013-01-05  16  17  18  19
2013-01-06  20  21  22  23

In [45]:

 

df[df > 0]

Out[45]:

               A   B   C   D
2013-01-01   NaN   1   2   3
2013-01-02   4.0   5   6   7
2013-01-03   8.0   9  10  11
2013-01-04  12.0  13  14  15
2013-01-05  16.0  17  18  19
2013-01-06  20.0  21  22  23
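Boolean conditions can also be combined. A minimal sketch, not part of the original notebook, reusing the tutorial's df; each comparison needs its own parentheses because & binds more tightly than the comparisons:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

# Combine element-wise conditions with & (and) / | (or),
# wrapping each comparison in its own parentheses
subset = df[(df.A > 0) & (df.B < 20)]
subset
```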

Using the isin() method for filtering:

In [47]:

 

df2 = df.copy()
df2['E'] = ['one', 'one','two','three','four','three']
df2

Out[47]:

             A   B   C   D      E
2013-01-01   0   1   2   3    one
2013-01-02   4   5   6   7    one
2013-01-03   8   9  10  11    two
2013-01-04  12  13  14  15  three
2013-01-05  16  17  18  19   four
2013-01-06  20  21  22  23  three

In [48]:

 

df2[df2['E'].isin(['two','four'])]

Out[48]:

             A   B   C   D     E
2013-01-03   8   9  10  11   two
2013-01-05  16  17  18  19  four

Setting

Setting a new column automatically aligns the data by the indexes.

In [49]:

 

s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20130102', periods=6))
s1

Out[49]:

2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64

In [50]:

 

df['F'] = s1
df

Out[50]:

             A   B   C   D    F
2013-01-01   0   1   2   3  NaN
2013-01-02   4   5   6   7  1.0
2013-01-03   8   9  10  11  2.0
2013-01-04  12  13  14  15  3.0
2013-01-05  16  17  18  19  4.0
2013-01-06  20  21  22  23  5.0

In [51]:

 

df.at[dates[0],'A'] = 0
df.iat[0,1] = 0
df

Out[51]:

             A   B   C   D    F
2013-01-01   0   0   2   3  NaN
2013-01-02   4   5   6   7  1.0
2013-01-03   8   9  10  11  2.0
2013-01-04  12  13  14  15  3.0
2013-01-05  16  17  18  19  4.0
2013-01-06  20  21  22  23  5.0

In [52]:

 

df.loc[:,'D'] = np.array([5] * len(df))
df

Out[52]:

             A   B   C  D    F
2013-01-01   0   0   2  5  NaN
2013-01-02   4   5   6  5  1.0
2013-01-03   8   9  10  5  2.0
2013-01-04  12  13  14  5  3.0
2013-01-05  16  17  18  5  4.0
2013-01-06  20  21  22  5  5.0

In [53]:

 

df2 = df.copy()
df2[df2 > 0] = -df2
df2

Out[53]:

              A    B    C  D    F
2013-01-01    0    0   -2 -5  NaN
2013-01-02   -4   -5   -6 -5 -1.0
2013-01-03   -8   -9  -10 -5 -2.0
2013-01-04  -12  -13  -14 -5 -3.0
2013-01-05  -16  -17  -18 -5 -4.0
2013-01-06  -20  -21  -22 -5 -5.0

Missing Data

pandas primarily uses the value np.nan to represent missing data. It is by default not included in computations. Reindexing allows you to change/add/delete the index on a specified axis; this returns a copy of the data.

In [54]:

 

df

Out[54]:

             A   B   C  D    F
2013-01-01   0   0   2  5  NaN
2013-01-02   4   5   6  5  1.0
2013-01-03   8   9  10  5  2.0
2013-01-04  12  13  14  5  3.0
2013-01-05  16  17  18  5  4.0
2013-01-06  20  21  22  5  5.0

In [55]:

 

df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
df1.loc[dates[0]:dates[1],'E'] = 1
df1

Out[55]:

             A   B   C  D    F    E
2013-01-01   0   0   2  5  NaN  1.0
2013-01-02   4   5   6  5  1.0  1.0
2013-01-03   8   9  10  5  2.0  NaN
2013-01-04  12  13  14  5  3.0  NaN

In [56]:

 

df1.dropna(how='any')             #To drop any rows that have missing data.

Out[56]:

            A  B  C  D    F    E
2013-01-02  4  5  6  5  1.0  1.0

In [57]:

 

df1.fillna(value=9)               #Filling missing data.

Out[57]:

             A   B   C  D    F    E
2013-01-01   0   0   2  5  9.0  1.0
2013-01-02   4   5   6  5  1.0  1.0
2013-01-03   8   9  10  5  2.0  9.0
2013-01-04  12  13  14  5  3.0  9.0

In [58]:

 

pd.isna(df1)                     #To get the boolean mask where values are nan.

Out[58]:

                A      B      C      D      F      E
2013-01-01  False  False  False  False   True  False
2013-01-02  False  False  False  False  False  False
2013-01-03  False  False  False  False  False   True
2013-01-04  False  False  False  False  False   True

Operations

Operations in general exclude missing data. Performing a descriptive statistic:

In [59]:

 

df.mean()

Out[59]:

A    10.000000
B    10.833333
C    12.000000
D     5.000000
F     3.000000
dtype: float64

In [60]:

 

df.mean(1)                    #Same operation on the other axis:

Out[60]:

2013-01-01     1.75
2013-01-02     4.20
2013-01-03     6.80
2013-01-04     9.40
2013-01-05    12.00
2013-01-06    14.60
Freq: D, dtype: float64

Operating with objects that have different dimensionality and need alignment. In addition, pandas automatically broadcasts along the specified dimension.

In [64]:

 

s = pd.Series([1,3,5,np.nan,6,8], index=dates).shift(-2)
s

Out[64]:

2013-01-01    5.0
2013-01-02    NaN
2013-01-03    6.0
2013-01-04    8.0
2013-01-05    NaN
2013-01-06    NaN
Freq: D, dtype: float64
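The alignment and broadcasting described above can be sketched as follows; this example is an addition that reuses the tutorial's df and the shifted Series. Subtracting the Series row-wise aligns on the date index and propagates NaN:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(-2)

# Rows align on the date index; s is subtracted from every column,
# and rows where s is NaN become all-NaN
result = df.sub(s, axis='index')
result
```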

Apply

Applying functions to the data:

In [65]:

 

df

Out[65]:

             A   B   C  D    F
2013-01-01   0   0   2  5  NaN
2013-01-02   4   5   6  5  1.0
2013-01-03   8   9  10  5  2.0
2013-01-04  12  13  14  5  3.0
2013-01-05  16  17  18  5  4.0
2013-01-06  20  21  22  5  5.0

In [66]:

 

df.apply(np.cumsum)

Out[66]:

             A   B   C   D     F
2013-01-01   0   0   2   5   NaN
2013-01-02   4   5   8  10   1.0
2013-01-03  12  14  18  15   3.0
2013-01-04  24  27  32  20   6.0
2013-01-05  40  44  50  25  10.0
2013-01-06  60  65  72  30  15.0

In [67]:

 

df.apply(lambda x: x.max() - x.min())

Out[67]:

A    20.0
B    21.0
C    20.0
D     0.0
F     4.0
dtype: float64

Histogramming

In [68]:

 

s = pd.Series(np.random.randint(0, 7, size=10))
s

Out[68]:

0    2
1    3
2    5
3    0
4    1
5    1
6    0
7    5
8    2
9    6
dtype: int32

In [69]:

 

s.value_counts()

Out[69]:

5    2
2    2
1    2
0    2
6    1
3    1
dtype: int64

String Methods

Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern-matching in str generally uses regular expressions by default.

In [70]:

 

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
s.str.lower()

Out[70]:

0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object
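To illustrate the regex default mentioned above, a small sketch (this example is an addition, not from the original tutorial):

```python
import numpy as np
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

# str.contains interprets the pattern as a regular expression by default;
# na=False treats the missing entry as a non-match
mask = s.str.contains('^[AB]', na=False)
s[mask]
```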

Merge

pandas provides various facilities for easily combining together Series, DataFrame, and Panel objects with various kinds of set logic for the indexes and relational algebra functionality in the case of join/merge-type operations.
Concatenating pandas objects together with concat():

In [76]:

 

df = pd.DataFrame(np.arange(10,50).reshape(10,4))
df

Out[76]:

    0   1   2   3
0  10  11  12  13
1  14  15  16  17
2  18  19  20  21
3  22  23  24  25
4  26  27  28  29
5  30  31  32  33
6  34  35  36  37
7  38  39  40  41
8  42  43  44  45
9  46  47  48  49

In [77]:

 

# break it into pieces
pieces = [df[:3], df[3:7], df[7:]]
pieces

Out[77]:

[    0   1   2   3
 0  10  11  12  13
 1  14  15  16  17
 2  18  19  20  21,     0   1   2   3
 3  22  23  24  25
 4  26  27  28  29
 5  30  31  32  33
 6  34  35  36  37,     0   1   2   3
 7  38  39  40  41
 8  42  43  44  45
 9  46  47  48  49]

In [79]:

 

len(pieces),type(pieces[0])

Out[79]:

(3, pandas.core.frame.DataFrame)

In [80]:

 

pd.concat(pieces)

Out[80]:

    0   1   2   3
0  10  11  12  13
1  14  15  16  17
2  18  19  20  21
3  22  23  24  25
4  26  27  28  29
5  30  31  32  33
6  34  35  36  37
7  38  39  40  41
8  42  43  44  45
9  46  47  48  49

Join

SQL-style merges.

In [81]:

 

left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
left

Out[81]:

   key  lval
0  foo     1
1  foo     2

In [82]:

 

right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
right

Out[82]:

   key  rval
0  foo     4
1  foo     5

In [83]:

 

pd.merge(left, right, on='key')

Out[83]:

   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5

In [84]:

 

left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
left

Out[84]:

   key  lval
0  foo     1
1  bar     2

In [85]:

 

right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})
right

Out[85]:

   key  rval
0  foo     4
1  bar     5

In [86]:

 

pd.merge(left, right, on='key')

Out[86]:

   key  lval  rval
0  foo     1     4
1  bar     2     5

Append

Appending rows to a DataFrame. See the Appending section of the docs.

In [87]:

 

df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
df

Out[87]:

          A         B         C         D
0  0.476360 -0.142900  1.306915  0.886458
1  0.685598  0.673836  0.181896  0.446095
2 -0.240541 -1.670222 -1.086695  0.024977
3  0.433192  1.776994 -0.226416 -0.376136
4  0.955323  1.655059  0.500006 -1.382097
5  0.969208  1.964340 -1.249788  1.448280
6 -1.078558 -0.989534 -1.044546  0.529323
7  1.851316  1.235936 -0.437019  0.546799

In [88]:

 

s = df.iloc[3]
df.append(s, ignore_index=True)

Out[88]:

          A         B         C         D
0  0.476360 -0.142900  1.306915  0.886458
1  0.685598  0.673836  0.181896  0.446095
2 -0.240541 -1.670222 -1.086695  0.024977
3  0.433192  1.776994 -0.226416 -0.376136
4  0.955323  1.655059  0.500006 -1.382097
5  0.969208  1.964340 -1.249788  1.448280
6 -1.078558 -0.989534 -1.044546  0.529323
7  1.851316  1.235936 -0.437019  0.546799
8  0.433192  1.776994 -0.226416 -0.376136
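Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions, pd.concat gives the same result. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
s = df.iloc[3]

# Equivalent to the removed df.append(s, ignore_index=True):
# turn the row back into a one-row frame and concatenate
out = pd.concat([df, s.to_frame().T], ignore_index=True)
out
```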

Grouping

By "group by" we are referring to a process involving one or more of the following steps:
1. Splitting the data into groups based on some criteria
2. Applying a function to each group independently
3. Combining the results into a data structure

In [89]:

 

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})
df

Out[89]:

     A      B         C         D
0  foo    one -1.026284  2.063183
1  bar    one  0.512460 -0.363375
2  foo    two -0.484939  0.982710
3  bar  three  2.073508 -0.393659
4  foo    two -0.197030  2.044099
5  bar    two -1.369176  2.172274
6  foo    one  0.445250  1.503858
7  foo  three -1.675851  0.207408

In [90]:

 

#Grouping and then applying the sum() function to the resulting groups.
df.groupby('A').sum()

Out[90]:

            C         D
A
bar  1.216791  1.415240
foo -2.938853  6.801258

In [91]:

 

#Grouping by multiple columns forms a hierarchical index, and again we can apply the sum() function.
df.groupby(['A','B']).sum()

Out[91]:

                  C         D
A   B
bar one    0.512460 -0.363375
    three  2.073508 -0.393659
    two   -1.369176  2.172274
foo one   -0.581034  3.567041
    three -1.675851  0.207408
    two   -0.681968  3.026809

Reshaping

In [92]:

 

tuples = list(zip(*[['bar', 'bar', 'baz', 'baz','foo', 'foo', 'qux', 'qux'],
                    ['one', 'two', 'one', 'two','one', 'two', 'one', 'two']]))
tuples

Out[92]:

[('bar', 'one'),
 ('bar', 'two'),
 ('baz', 'one'),
 ('baz', 'two'),
 ('foo', 'one'),
 ('foo', 'two'),
 ('qux', 'one'),
 ('qux', 'two')]

In [93]:

 

index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index

Out[93]:

MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
           names=['first', 'second'])

In [95]:

 

df = pd.DataFrame(np.arange(10,26).reshape(8,2), index=index, columns=['A', 'B'])
df

Out[95]:

               A   B
first second
bar   one     10  11
      two     12  13
baz   one     14  15
      two     16  17
foo   one     18  19
      two     20  21
qux   one     22  23
      two     24  25

In [96]:

 

df2 = df[:4]
df2

Out[96]:

               A   B
first second
bar   one     10  11
      two     12  13
baz   one     14  15
      two     16  17

In [97]:

 

#The stack() method “compresses” a level in the DataFrame’s columns.
stacked = df2.stack()
stacked

Out[97]:

first  second   
bar    one     A    10
               B    11
       two     A    12
               B    13
baz    one     A    14
               B    15
       two     A    16
               B    17
dtype: int32

With a "stacked" DataFrame or Series (having a MultiIndex as the index), the inverse operation of stack() is unstack(), which by default unstacks the last level:

In [98]:

 

stacked.unstack()

Out[98]:

               A   B
first second
bar   one     10  11
      two     12  13
baz   one     14  15
      two     16  17

In [99]:

 

stacked.unstack(1)

Out[99]:

second   one  two
first
bar   A   10   12
      B   11   13
baz   A   14   16
      B   15   17

In [100]:

 

stacked.unstack(0)

Out[100]:

first    bar  baz
second
one   A   10   14
      B   11   15
two   A   12   16
      B   13   17

Pivot Tables

In [104]:

 

df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 3,
                   'B' : ['A', 'B', 'C'] * 4,
                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
                   'D' : np.arange(12),
                   'E' : np.arange(10,22)})
df

Out[104]:

        A  B    C   D   E
0     one  A  foo   0  10
1     one  B  foo   1  11
2     two  C  foo   2  12
3   three  A  bar   3  13
4     one  B  bar   4  14
5     one  C  bar   5  15
6     two  A  foo   6  16
7   three  B  foo   7  17
8     one  C  foo   8  18
9     one  A  bar   9  19
10    two  B  bar  10  20
11  three  C  bar  11  21

In [105]:

 

pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])

Out[105]:

C          bar  foo
A     B
one   A    9.0  0.0
      B    4.0  1.0
      C    5.0  8.0
three A    3.0  NaN
      B    NaN  7.0
      C   11.0  NaN
two   A    NaN  6.0
      B   10.0  NaN
      C    NaN  2.0

Time Series

pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (for example, converting secondly data into 5-minutely data). This is extremely common in, but not limited to, financial applications.

In [107]:

 

rng = pd.date_range('1/1/2018', periods=10, freq='S')
rng

Out[107]:

DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 00:00:01',
               '2018-01-01 00:00:02', '2018-01-01 00:00:03',
               '2018-01-01 00:00:04', '2018-01-01 00:00:05',
               '2018-01-01 00:00:06', '2018-01-01 00:00:07',
               '2018-01-01 00:00:08', '2018-01-01 00:00:09'],
              dtype='datetime64[ns]', freq='S')

In [108]:

 

ts = pd.Series(np.random.randint(0, 10, len(rng)), index=rng)
ts

Out[108]:

2018-01-01 00:00:00    2
2018-01-01 00:00:01    9
2018-01-01 00:00:02    4
2018-01-01 00:00:03    1
2018-01-01 00:00:04    5
2018-01-01 00:00:05    3
2018-01-01 00:00:06    7
2018-01-01 00:00:07    2
2018-01-01 00:00:08    5
2018-01-01 00:00:09    2
Freq: S, dtype: int32

In [112]:

 

ts.resample('5S').sum()

Out[112]:

2018-01-01 00:00:00    21
2018-01-01 00:00:05    19
Freq: 5S, dtype: int32

Converting between time span representations:

In [114]:

 

rng = pd.date_range('1/1/2018', periods=5, freq='M')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

Out[114]:

2018-01-31   -0.488761
2018-02-28   -0.407915
2018-03-31    0.428891
2018-04-30   -0.306055
2018-05-31   -0.805451
Freq: M, dtype: float64

In [115]:

 

ps = ts.to_period()
ps

Out[115]:

2018-01   -0.488761
2018-02   -0.407915
2018-03    0.428891
2018-04   -0.306055
2018-05   -0.805451
Freq: M, dtype: float64

In [116]:

 

ps.to_timestamp()

Out[116]:

2018-01-01   -0.488761
2018-02-01   -0.407915
2018-03-01    0.428891
2018-04-01   -0.306055
2018-05-01   -0.805451
Freq: MS, dtype: float64

Categoricals

pandas can include categorical data in a DataFrame.

In [117]:

 

df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})
df

Out[117]:

   id raw_grade
0   1         a
1   2         b
2   3         b
3   4         a
4   5         a
5   6         e

In [118]:

 

#Convert the raw grades to a categorical data type.
df["grade"] = df["raw_grade"].astype("category")
df["grade"]

Out[118]:

0    a
1    b
2    b
3    a
4    a
5    e
Name: grade, dtype: category
Categories (3, object): [a, b, e]

In [119]:

 

#Rename the categories to more meaningful names (assigning to Series.cat.categories is inplace!).
df["grade"].cat.categories = ["very good", "good", "very bad"]
df

Out[119]:

   id raw_grade      grade
0   1         a  very good
1   2         b       good
2   3         b       good
3   4         a  very good
4   5         a  very good
5   6         e   very bad

In [120]:

 

#Reorder the categories and simultaneously add the missing categories (methods under Series .cat return a new Series by default).
df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
df["grade"]

Out[120]:

0    very good
1         good
2         good
3    very good
4    very good
5     very bad
Name: grade, dtype: category
Categories (5, object): [very bad, bad, medium, good, very good]

In [121]:

 

df.sort_values(by="grade")

Out[121]:

   id raw_grade      grade
5   6         e   very bad
1   2         b       good
2   3         b       good
0   1         a  very good
3   4         a  very good
4   5         a  very good

In [122]:

 

#Grouping by a categorical column also shows empty categories.
df.groupby("grade").size()

Out[122]:

grade
very bad     1
bad          0
medium       0
good         2
very good    3
dtype: int64

Plotting

In [126]:

 

%matplotlib inline
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()
ts.plot()

Out[126]:

<matplotlib.axes._subplots.AxesSubplot at 0xacef320>

In [127]:

 

import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,columns=['A', 'B', 'C', 'D'])
df = df.cumsum()
plt.figure(); df.plot(); plt.legend(loc='best')

Out[127]:

<matplotlib.legend.Legend at 0xada7160>
<matplotlib.figure.Figure at 0x97fc240>

Getting Data In/Out

Writing to a CSV file.

In [ ]:

 

df.to_csv('foo.csv')

In [ ]:

 

#Reading from a csv file.
pd.read_csv('foo.csv')
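Since to_csv writes the index as the first column, reading the file back as-is adds an extra "Unnamed: 0" column; passing index_col=0 restores the saved index instead. A small round-trip sketch (the file name 'foo.csv' follows the tutorial):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 2), columns=['A', 'B'])
df.to_csv('foo.csv')

# index_col=0 turns the first CSV column back into the index
back = pd.read_csv('foo.csv', index_col=0)
back
```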

Excel

Reading and writing to Excel.

In [ ]:

 

df.to_excel('foo.xlsx', sheet_name='Sheet1')

In [ ]:

 

pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])