pandas (1)

pandas

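Unlike numpy, which behaves like a list, pandas behaves more like a dictionary.
Like a dictionary attaching a key to each entry, it attaches an index label to every value.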

import pandas as pd
import numpy as np

s = pd.Series([1,2,3,6,np.nan,44,1])
print(s)

#result
0     1.0
1     2.0
2     3.0
3     6.0
4     NaN
5    44.0
6     1.0
dtype: float64
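
The dictionary comparison can be made concrete with a small sketch. The labels 'x', 'y', 'z' and the variable names are made up for illustration and do not appear in the original notes.

```python
import pandas as pd

# A Series with explicit labels behaves much like a dictionary:
# each value is stored under a key (its index label).
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

by_label = s['y']      # look up a value by its label, like d['y']
has_x = 'x' in s       # membership tests check the labels, not the values
as_dict = s.to_dict()  # convert back to a plain dictionary

print(by_label)
print(has_x)
print(as_dict)
```

When no index is passed, as in the example above with seven values, pandas falls back to the integer labels 0, 1, 2, and so on.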

A DataFrame is printed like a table, with row and column labels.

import pandas as pd
import numpy as np

dates = pd.date_range('20160101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=['a','b','c','d'])
print(df)
print(df.columns)
#result
                   a         b         c         d
2016-01-01 -1.430430 -0.421334  1.219365 -0.480058
2016-01-02  0.612175  1.118887  0.678260 -0.689190
2016-01-03  0.983088  3.028284  0.020579 -0.251909
2016-01-04 -0.669926 -0.019545  1.813316  1.129999
2016-01-05  0.436789 -0.832122 -0.713937  1.164483
2016-01-06  0.430476  2.399838  0.299447  0.523971
Index(['a', 'b', 'c', 'd'], dtype='object')

#the row index comes from dates
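
To see where the labels live, the pieces of a DataFrame can be inspected separately. This is a minimal sketch on a smaller, deterministic frame (np.arange instead of random numbers, which is an assumption made here so the values are reproducible).

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20160101', periods=3)
df = pd.DataFrame(np.arange(6).reshape((3, 2)), index=dates, columns=['a', 'b'])

row_labels = df.index          # DatetimeIndex built from dates
col_labels = list(df.columns)  # column labels as a plain list
shape = df.shape               # (number of rows, number of columns)

print(row_labels)
print(col_labels)
print(shape)
```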

Use describe to summarize the statistics of the table

print(df.describe()) #summary statistics for each numeric column

#result (from a separate run, so the random values differ from the table above)
             a         b         c         d
count  6.000000  6.000000  6.000000  6.000000
mean   0.319191  0.191965  0.536882 -0.213288
std    0.802585  2.000006  0.635241  0.982923
min   -1.037829 -2.291519  0.007462 -1.677567
25%   -0.053965 -1.172490  0.161186 -0.757976
50%    0.650846  0.066436  0.330561 -0.139355
75%    0.876507  1.334585  0.611195  0.515499
max    0.988458  3.138598  1.743238  0.906949
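
The rows of the describe output can be checked against the individual statistics methods. The following is a sketch on a tiny hand-written column (the values 1 to 4 are an assumption chosen so the numbers are easy to verify by hand).

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0]})
summary = df.describe()

# describe() reports the sample standard deviation (ddof=1),
# the same value that Series.std() returns by default.
manual_mean = df['a'].mean()
manual_std = df['a'].std()
manual_median = df['a'].median()  # corresponds to the 50% row

print(summary)
```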

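Use sort_index to sort by labels

axis selects what is sorted: the row index (axis=0) or the column labels (axis=1)
ascending selects ascending (True) or descending (False) order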

print(df.sort_index(axis=1,ascending=False))

                  d         c         b         a
2016-01-01 -0.656583  1.181235 -1.499221 -0.951707
2016-01-02  0.142750  0.854452  1.219795  0.876144
2016-01-03  1.213446  0.362272 -1.255121 -0.876265
2016-01-04  0.667960 -0.696974 -0.162850  0.005028
2016-01-05 -2.494620 -1.073663  0.380002 -1.473647
2016-01-06  1.118285  0.134734  1.144273  0.048522
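
Since sort_index only reorders labels, a small deterministic frame makes the effect of each argument easier to see. This sketch also uses sort_values, a related method that the notes above do not cover, to contrast sorting by labels with sorting by data; the frame itself is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'b': [3, 1, 2], 'a': [9, 8, 7]}, index=[2, 0, 1])

# Rows reordered by index label: 0, 1, 2
by_rows = df.sort_index(axis=0)
# Columns reordered by label in descending order: b, a
by_cols_desc = df.sort_index(axis=1, ascending=False)
# Rows reordered by the values stored in column a
by_value = df.sort_values(by='a')

print(by_rows)
print(by_cols_desc)
print(by_value)
```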

Selecting specific data

  • loc : select by label
  • iloc : select by integer position
  • df[df.x < value] : select by condition (boolean indexing)
import pandas as pd
import numpy as np

dates = pd.date_range('20160101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['a','b','c','d'])

print(df.loc['20160101'])   #select by label: loc
print(df.loc['20160101',['a','b']])  #select by label, restricted to columns a and b

print(df.iloc[3:5,1]) #select by position: iloc

print(df[df.a>8]) #boolean indexing

#result
a    0
b    1
c    2
d    3
Name: 2016-01-01 00:00:00, dtype: int32
a    0
b    1
Name: 2016-01-01 00:00:00, dtype: int32
2016-01-04    13
2016-01-05    17
Freq: D, Name: b, dtype: int32
             a   b   c   d
2016-01-04  12  13  14  15
2016-01-05  16  17  18  19
2016-01-06  20  21  22  23
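
One difference between loc and iloc that is easy to miss is how slices treat their endpoints. The sketch below reuses the same frame; the particular slice bounds and the combined condition are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20160101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# Label slices include both endpoints.
by_label = df.loc['20160102':'20160104', 'a':'b']
# Position slices exclude the stop position, as in plain Python.
by_position = df.iloc[1:3, 0:2]
# Conditions are combined with & and |; each one needs its own parentheses.
by_condition = df[(df.a > 8) & (df.d < 23)]

print(by_label)
print(by_position)
print(by_condition)
```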

Reassigning values

import pandas as pd
import numpy as np

dates = pd.date_range('20160101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['a','b','c','d'])

df.iloc[2,2] = 111  #use the position to change the value
df.loc['20160101','b'] = 222  #use the label to change the value
df.loc[df.a>4,'a'] = 0 #boolean change; .loc avoids chained assignment, which may not modify df

df['f'] = pd.Series([1,2,3,4,5,6],index=pd.date_range('20160101',periods=6))
#add a new column f, aligned on the date index
print(df)

# result
            a    b    c   d  f
2016-01-01  0  222    2   3  1
2016-01-02  4    5    6   7  2
2016-01-03  0    9  111  11  3
2016-01-04  0   13   14  15  4
2016-01-05  0   17   18  19  5
2016-01-06  0   21   22  23  6
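
When a Series is assigned as a new column, pandas matches it to the rows by index label rather than by order. The sketch below shows what happens when the Series covers only part of the index; the column name g and the three-value Series are hypothetical and not part of the original example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20160101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# Conditional assignment through .loc modifies df itself.
df.loc[df.a > 4, 'a'] = 0

# A Series is aligned on its index when assigned as a column;
# rows with no matching label receive NaN.
df['g'] = pd.Series([1, 2, 3], index=dates[:3])

print(df)
```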

Handling missing data

import pandas as pd
import numpy as np

dates = pd.date_range('20160101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['a','b','c','d'])

df.iloc[0,1] = np.nan
df.iloc[1,2] = np.nan
print(df.dropna(axis=0,how='any')) #how='any': drop a row if it contains any NaN; how='all': only if every value is NaN

print(df.fillna(value=0)) #fill every NaN with 0

print(df.isnull()) #check which entries are NaN

#result
            a     b     c   d
2016-01-03   8   9.0  10.0  11
2016-01-04  12  13.0  14.0  15
2016-01-05  16  17.0  18.0  19
2016-01-06  20  21.0  22.0  23

             a     b     c   d
2016-01-01   0   0.0   2.0   3
2016-01-02   4   5.0   0.0   7
2016-01-03   8   9.0  10.0  11
2016-01-04  12  13.0  14.0  15
2016-01-05  16  17.0  18.0  19
2016-01-06  20  21.0  22.0  23

               a      b      c      d
2016-01-01  False   True  False  False
2016-01-02  False  False   True  False
2016-01-03  False  False  False  False
2016-01-04  False  False  False  False
2016-01-05  False  False  False  False
2016-01-06  False  False  False  False
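
A few variations on the same three methods are sketched below. The frame is built with float values here (an assumption, so that NaN can be stored without changing the column type), and filling with the column mean is one common choice rather than something the notes above prescribe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12, dtype=float).reshape((3, 4)), columns=['a', 'b', 'c', 'd'])
df.iloc[0, 1] = np.nan
df.iloc[1, 2] = np.nan

has_nan = df.isnull().any().any()            # True if at least one entry is NaN
nan_per_column = df.isnull().sum()           # number of NaN entries in each column
dropped_cols = df.dropna(axis=1, how='any')  # drop columns (not rows) that contain NaN
filled = df.fillna(df.mean())                # replace each NaN with the mean of its column

print(has_nan)
print(nan_per_column)
print(dropped_cols)
print(filled)
```

None of these calls changes df in place; each returns a new object, which is why the results are assigned to separate names.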