pandas (1)

Sxyinn

Article link: https://blog.csdn.net/weixin_43956102/article/details/105600030

pandas

Unlike numpy, whose arrays behave like lists, pandas behaves more like a dictionary.
Like a dictionary, it attaches an index label to every value.

import pandas as pd
import numpy as np

s = pd.Series([1,2,3,6,np.nan,44,1])
print(s)

#result
0     1.0
1     2.0
2     3.0
3     6.0
4     NaN
5    44.0
6     1.0
dtype: float64
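
To make the dictionary analogy concrete, here is a minimal sketch. The labels 'x', 'y', 'z' and the variable names are made up for illustration and do not come from the example above.

```python
import pandas as pd

# Build a Series from a dict: the keys become the index labels.
prices = pd.Series({'x': 1.5, 'y': 2.5, 'z': 4.0})

# Look up a value by its label, much like a dict lookup.
y_value = prices['y']

# Membership tests check the index labels, not the values.
has_x = 'x' in prices

# Unlike a dict, a Series also supports vectorized arithmetic.
doubled = prices * 2

print(y_value)
print(has_x)
print(doubled.to_dict())
```

When no index is given, as in the example above, pandas falls back to the integer labels 0, 1, 2, and so on.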

A DataFrame prints like a table:

import pandas as pd
import numpy as np

dates = pd.date_range('20160101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=['a','b','c','d'])
print(df)
print(df.columns)
#result
                   a         b         c         d
2016-01-01 -1.430430 -0.421334  1.219365 -0.480058
2016-01-02  0.612175  1.118887  0.678260 -0.689190
2016-01-03  0.983088  3.028284  0.020579 -0.251909
2016-01-04 -0.669926 -0.019545  1.813316  1.129999
2016-01-05  0.436789 -0.832122 -0.713937  1.164483
2016-01-06  0.430476  2.399838  0.299447  0.523971
Index(['a', 'b', 'c', 'd'], dtype='object')

# the row index comes from dates
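
Besides columns, a DataFrame exposes its row labels, shape, and raw values. The sketch below uses a small deterministic frame (the name `frame` and its 3x2 contents are assumptions for illustration) so the results are reproducible, unlike the random data above.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20160101', periods=3)
frame = pd.DataFrame(np.arange(6).reshape((3, 2)), index=dates, columns=['a', 'b'])

row_labels = frame.index          # DatetimeIndex built by date_range
col_labels = list(frame.columns)  # column names as a plain list
shape = frame.shape               # (number of rows, number of columns)
raw = frame.to_numpy()            # the values as a NumPy array, without labels

print(row_labels)
print(col_labels)
print(shape)
print(raw)
```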

Use describe to summarize the statistics of the table

print(df.describe()) # summary statistics for each numeric column

# result (from a separate run, so the random values differ from the table above)
              a         b         c         d
count  6.000000  6.000000  6.000000  6.000000
mean   0.319191  0.191965  0.536882 -0.213288
std    0.802585  2.000006  0.635241  0.982923
min   -1.037829 -2.291519  0.007462 -1.677567
25%   -0.053965 -1.172490  0.161186 -0.757976
50%    0.650846  0.066436  0.330561 -0.139355
75%    0.876507  1.334585  0.611195  0.515499
max    0.988458  3.138598  1.743238  0.906949
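
Because the data above is random, the numbers cannot be checked by hand. A small sketch with fixed values (the frame `scores` is a made-up example) shows what each row of describe means.

```python
import pandas as pd

scores = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0]})
summary = scores.describe()

# Each statistic is a row label of the result, so loc can pick it out.
count = summary.loc['count', 'a']   # number of non-NaN values
mean = summary.loc['mean', 'a']     # arithmetic mean
median = summary.loc['50%', 'a']    # the 50% quantile is the median
std = summary.loc['std', 'a']       # sample standard deviation (ddof=1)

print(summary)
```

Note that std is the sample standard deviation, so it divides by n-1 rather than n.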

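Use sort_index to sort by labels

axis selects which labels to sort: 0 sorts the row index, 1 sorts the column names
ascending selects ascending (True) or descending (False) order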

print(df.sort_index(axis=1,ascending=False))

                   d         c         b         a
2016-01-01 -0.656583  1.181235 -1.499221 -0.951707
2016-01-02  0.142750  0.854452  1.219795  0.876144
2016-01-03  1.213446  0.362272 -1.255121 -0.876265
2016-01-04  0.667960 -0.696974 -0.162850  0.005028
2016-01-05 -2.494620 -1.073663  0.380002 -1.473647
2016-01-06  1.118285  0.134734  1.144273  0.048522
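
The example above only sorts the columns. As a sketch of both axes side by side, the following uses a small deterministic frame (an assumption for illustration) and also shows that sort_index returns a new object instead of changing the original.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20160101', periods=3)
frame = pd.DataFrame(np.arange(6).reshape((3, 2)), index=dates, columns=['a', 'b'])

# axis=0 sorts by the row labels (here: dates, newest first).
by_rows_desc = frame.sort_index(axis=0, ascending=False)

# axis=1 sorts by the column names (here: 'b' before 'a').
by_cols_desc = frame.sort_index(axis=1, ascending=False)

print(by_rows_desc)
print(by_cols_desc)
print(frame)  # the original frame keeps its order
```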

Selecting specific data

loc : select by label
iloc : select by integer position
df[df.x < value] : select by boolean condition

import pandas as pd
import numpy as np

dates = pd.date_range('20160101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['a','b','c','d'])

print(df.loc['20160101'])   #select by label:loc
print(df.loc['20160101',['a','b']])  # select columns a and b of one row by label

print(df.iloc[3:5,1]) #select by position: iloc

print(df[df.a>8]) #boolean indexing

a    0
b    1
c    2
d    3
Name: 2016-01-01 00:00:00, dtype: int32
a    0
b    1
Name: 2016-01-01 00:00:00, dtype: int32
2016-01-04    13
2016-01-05    17
Freq: D, Name: b, dtype: int32
             a   b   c   d
2016-01-04  12  13  14  15
2016-01-05  16  17  18  19
2016-01-06  20  21  22  23
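
One detail that is easy to miss: a loc slice includes its end label, while an iloc slice excludes its end position, as in normal Python slicing. The sketch below reuses the same 6x4 frame under the name `frame` (the variable names are illustrative) and also combines two conditions with &.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20160101', periods=6)
frame = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['a', 'b', 'c', 'd'])

# Label slice: both '20160102' and '20160104' are included (3 rows).
by_label = frame.loc['20160102':'20160104', 'a']

# Position slice: rows 1, 2, 3 are included, row 4 is not (3 rows).
by_position = frame.iloc[1:4, 0]

# Combine boolean conditions with & and wrap each one in parentheses.
filtered = frame[(frame['a'] > 8) & (frame['d'] < 23)]

print(by_label)
print(by_position)
print(filtered)
```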

Reassigning values

import pandas as pd
import numpy as np

dates = pd.date_range('20160101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['a','b','c','d'])

df.iloc[2,2] = 111  # change a value by position
df.loc['20160101','b'] = 222  # change a value by label
df.loc[df.a>4,'a'] = 0  # boolean mask: set a to 0 where a > 4 (avoids chained assignment)

# add a new column; the Series is aligned to df by its index
df['f'] = pd.Series([1,2,3,4,5,6],index=pd.date_range('20160101',periods=6))
print(df)

# result
            a    b    c   d  f
2016-01-01  0  222    2   3  1
2016-01-02  4    5    6   7  2
2016-01-03  0    9  111  11  3
2016-01-04  0   13   14  15  4
2016-01-05  0   17   18  19  5
2016-01-06  0   21   22  23  6
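
Two points from this section are worth a closer look: assigning through a single loc call with a boolean mask, and how a new column is aligned by index rather than by position. This is a minimal sketch on a smaller frame; the names `frame` and `partial` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20160101', periods=3)
frame = pd.DataFrame(np.arange(6).reshape((3, 2)), index=dates, columns=['a', 'b'])

# One loc call selects rows by mask and the column by label, then assigns.
frame.loc[frame['a'] > 1, 'a'] = 0

# This Series only covers the first two dates.
partial = pd.Series([10, 20], index=dates[:2])

# Assignment aligns on the index, so the third row gets NaN.
frame['f'] = partial

print(frame)
```

Because of this alignment, a Series with a different index does not raise an error; rows without a matching label are simply filled with NaN.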

Handling missing data

import pandas as pd
import numpy as np

dates = pd.date_range('20160101',periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)),index=dates,columns=['a','b','c','d'])

df.iloc[0,1] = np.nan
df.iloc[1,2] = np.nan
print(df.dropna(axis=0,how='any')) # drop every row that contains at least one NaN (how='all' drops only all-NaN rows)

print(df.fillna(value=0)) # replace NaN with 0

print(df.isnull()) # check which entries are NaN

             a     b     c   d
2016-01-03   8   9.0  10.0  11
2016-01-04  12  13.0  14.0  15
2016-01-05  16  17.0  18.0  19
2016-01-06  20  21.0  22.0  23

             a     b     c   d
2016-01-01   0   0.0   2.0   3
2016-01-02   4   5.0   0.0   7
2016-01-03   8   9.0  10.0  11
2016-01-04  12  13.0  14.0  15
2016-01-05  16  17.0  18.0  19
2016-01-06  20  21.0  22.0  23

                a      b      c      d
2016-01-01  False   True  False  False
2016-01-02  False  False   True  False
2016-01-03  False  False  False  False
2016-01-04  False  False  False  False
2016-01-05  False  False  False  False
2016-01-06  False  False  False  False
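
The same functions can also work column-wise and can be reduced to a quick summary. The following sketch assumes a small float frame named `frame` (floats are used so NaN can be stored without changing the column dtype); it is an illustration, not part of the example above.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12, dtype=float).reshape((3, 4)), columns=['a', 'b', 'c', 'd'])
frame.iloc[0, 1] = np.nan
frame.iloc[1, 2] = np.nan

# axis=1 drops columns instead of rows: b and c each contain a NaN.
cols_without_nan = frame.dropna(axis=1, how='any')

# how='all' only drops rows where every value is NaN, so nothing is dropped here.
rows_not_all_nan = frame.dropna(axis=0, how='all')

# Count the NaN values per column, and check whether any NaN exists at all.
nan_per_column = frame.isnull().sum()
any_nan = bool(frame.isnull().values.any())

print(cols_without_nan)
print(nan_per_column)
print(any_nan)
```

On a large table, printing the full isnull result is hard to read, so a per-column count or a single True/False is usually the more practical check.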
