pandas study notes

A DataFrame is a collection of Series, one per column; pandas is roughly the Excel of Python.

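1. In pandas, the index of a Series or a DataFrame may contain duplicate labels, and so may the columns of a DataFrame, as the repeated 'year' column below shows.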

import numpy as np
import pandas as pd

date={'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada','Nevada'],
 'year': [2000, 2001, 2002, 2001, 2002, 2003],
 'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
df1=pd.DataFrame(date,columns=['year','year','pop','debt'],index=['one','two','three','four','five','six'])
df1
       year  year  pop debt
one    2000  2000  1.5  NaN
two    2001  2001  1.7  NaN
three  2002  2002  3.6  NaN
four   2001  2001  2.4  NaN
five   2002  2002  2.9  NaN
six    2003  2003  3.2  NaN
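
The example above repeats a column label; the same is allowed for row labels. A minimal sketch of that case follows (the names s, df, dup, single and rows are made up for illustration):

```python
import pandas as pd

# A Series whose index repeats the label 'a'.
s = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

print(s.index.is_unique)   # the index reports that it is not unique
dup = s['a']               # a repeated label returns every matching entry, as a Series
single = s['b']            # a unique label returns a scalar
print(dup)
print(single)

# The same holds for DataFrame row labels.
df = pd.DataFrame({'x': [10, 20, 30]}, index=['r1', 'r1', 'r2'])
rows = df.loc['r1']        # both rows labelled 'r1' come back as a DataFrame
print(rows)
```

Because a repeated label can return either a scalar or a Series, code that relies on one of the two is safer if it checks index.is_unique first.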

2. Selecting values from a Series and a DataFrame

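# A Series can be read by index label or by integer position (0, 1, 2, ...).
# Use .iloc for positions: plain pd2[2] on a string-labelled Series is deprecated in recent pandas.
d={'name':'zy','sex':'female','age':12}
pd2=pd.Series(d) # current pandas keeps the dict's insertion order (older versions sorted the keys)
pd2.iloc[1]
'female'
pd2['age']
12
# select several values
pd2[['age','sex']]
age        12
sex    female
dtype: object

pd2.iloc[[0,2]]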
name    zy
age     12
dtype: object

#DataFrame
df1
       year  year  pop debt
one    2000  2000  1.5  NaN
two    2001  2001  1.7  NaN
three  2002  2002  3.6  NaN
four   2001  2001  2.4  NaN
five   2002  2002  2.9  NaN
six    2003  2003  3.2  NaN
df1['year'] # a column; both columns labelled 'year' come back
       year  year
one    2000  2000
two    2001  2001
three  2002  2002
four   2001  2001
five   2002  2002
six    2003  2003
df1[['year','pop']] # several columns
       year  year  pop
one    2000  2000  1.5
two    2001  2001  1.7
three  2002  2002  3.6
four   2001  2001  2.4
five   2002  2002  2.9
six    2003  2003  3.2
df1.loc['one'] # a row, by index label
year    2000
year    2000
pop      1.5
debt     NaN
Name: one, dtype: object
df1.loc[['one','two']] # several rows
     year  year  pop debt
one  2000  2000  1.5  NaN
two  2001  2001  1.7  NaN

df1[:4] # first four rows; df1[1] does not select a row (an integer there is looked up as a column label)


df3=pd.DataFrame(np.arange(16).reshape(4,4),index=['a','b','c','d'],columns=['one','two','three','four'])
df3[df3['three']>2] # a boolean mask also works on the index, i.e. it filters rows
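
The notes give no output for the boolean filter, so here is a small self-contained sketch of it. It rebuilds df3 as defined above; the names mask and filtered are added here for illustration.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=['a', 'b', 'c', 'd'],
                   columns=['one', 'two', 'three', 'four'])

mask = df3['three'] > 2   # boolean Series aligned on the row index
filtered = df3[mask]      # keeps only the rows where the mask is True
print(mask)
print(filtered)
```

Column 'three' holds 2, 6, 10, 14, so only row 'a' is filtered out.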

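# reindex() is another option; labels that do not exist yet come back as rows of NaN
# (row 'one' prints as 1 in the outputs below, presumably because it was overwritten in a step these notes do not show)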
df1.reindex(['one','four','seven'])
         year    year  pop debt
one       1.0     1.0  1.0    1
four   2001.0  2001.0  2.4  NaN
seven     NaN     NaN  NaN  NaN


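# selecting columns
df2=df1[['pop','debt']] # df1 has duplicate column labels, so reindex(columns=...) cannot be called on it directly; take a duplicate-free subset first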
df2.reindex(columns=['debt','pop'])
      debt  pop
one      1  1.0
two    NaN  1.7
three  NaN  3.6
four   NaN  2.4
five   NaN  2.9
six    NaN  3.2



3. Slicing by label differs from ordinary Python slicing: the end label is included.

df4=pd.Series(np.arange(4),index=['a','b','c','d'])
df4
a    0
b    1
c    2
d    3

df4.loc['a':'c']
a    0
b    1
c    2
dtype: int64

df4[1:3]
b    1
c    2
dtype: int64

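# The next two outputs come from a Series with the default integer index, not from df4 (df5 is a name added here)
df5=pd.Series(np.arange(4))
df5.iloc[:2] # slice by integer position, end excluded
0    0
1    1
dtype: int64
df5.loc[:2] # slice by index label, end included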
0    0
1    1
2    2
dtype: int64

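4. value_counts is not limited to counting how often each value occurs in a Series; it can count any array or sequence as long as it is one-dimensional, such as a Series, a list, or a 1-D ndarray. In recent pandas, wrap such data in a Series first, since the top-level pd.value_counts function that used to accept it is deprecated.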
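
A short sketch of counting values in a list and in a 1-D ndarray by wrapping them in a Series first. The sample data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Count the items of a plain Python list.
letters = ['a', 'b', 'a', 'c', 'a', 'b']
letter_counts = pd.Series(letters).value_counts()
print(letter_counts)

# Count the items of a one-dimensional ndarray.
numbers = np.array([1, 1, 2, 3, 3, 3])
number_counts = pd.Series(numbers).value_counts()
print(number_counts)
```

The result is itself a Series, indexed by the distinct values and sorted by count in descending order.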

5. Filtering with isin()

#Series
pd1=pd.Series(np.random.randint(0,10,(6,)),index=list('abcdef'))
pd1
a    5
b    3
c    9
d    8
e    6
f    8
dtype: int64
pd1.isin([1,2,3,4,5])
a     True
b     True
c    False
d    False
e    False
f    False
dtype: bool
pd1[pd1.isin([1,2,3,4,5])]
a    5
b    3
dtype: int64
# DataFrame

pd2=pd.DataFrame(np.random.randint(0,10,(4,4)),columns=list('abcd'),index=list('abcd'))
pd2
   a  b  c  d
a  1  7  6  1
b  3  1  5  7
c  2  8  4  0
d  8  7  7  0
pd2.isin([1,2,3]) # values given as a list
       a      b      c      d
a   True  False  False   True
b   True   True  False  False
c   True  False  False  False
d  False  False  False  False
# indexing with the boolean frame leaves NaN wherever the mask is False
pd2[pd2.isin([1,2,3])]
     a    b   c    d
a  1.0  NaN NaN  1.0
b  3.0  1.0 NaN  NaN
c  2.0  NaN NaN  NaN
d  NaN  NaN NaN  NaN
# values given as a dict: each key is matched against the column of that name
value1={'a':[1,2,3],'b':[1,2,3]}
pd2.isin(value1)
       a      b      c      d
a   True  False  False  False
b   True   True  False  False
c   True  False  False  False
d  False  False  False  False

# values given as a DataFrame: matched element by element, so both index and column labels must line up (two-dimensional)
value2=pd.DataFrame(value1)
pd2.isin(value2)
       a      b      c      d
a  False  False  False  False
b  False  False  False  False
c  False  False  False  False
d  False  False  False  False
value2
   a  b
0  1  1
1  2  2
2  3  3
# the index has to match as well
value2=pd.DataFrame(value1,index=list('abc'))
pd2.isin(value2)
       a      b      c      d
a   True  False  False  False
b  False  False  False  False
c  False  False  False  False
d  False  False  False  False

6. df2.drop(labels, axis=...) removes rows or columns from a DataFrame. Note that axis=1 drops a column and axis=0 drops a row.
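
A minimal sketch of both directions of drop, using a small made-up frame (df and the result names are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=['a', 'b', 'c'],
                  columns=['x', 'y', 'z'])

no_row_b = df.drop('b', axis=0)          # remove the row labelled 'b'
no_col_y = df.drop('y', axis=1)          # remove the column labelled 'y'
same_as_above = df.drop(columns=['y'])   # keyword form of the column drop
print(no_row_b)
print(no_col_y)

# drop returns a new object; df itself still has all three rows and columns.
print(df.shape)
```

The columns= and index= keywords say the same thing as axis=1 and axis=0 and are harder to mix up.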

7. The missing value in pandas is np.nan, which is a float, so a Series that contains it, e.g. pd1=pd.Series([1,np.nan,3]), has dtype float64.

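dropna() drops rows that contain a missing value by default; pass axis=1 to drop columns instead. With the default setting a single NaN is enough to remove the whole row or column, while dropna(how='all') removes only rows or columns in which every value is NaN. fillna() replaces NaN with another value.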
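
The differences between these calls are easier to see on a small frame. The sketch below uses made-up data; df and the result names are illustrative only.

```python
import numpy as np
import pandas as pd

# np.nan is a plain float, so a Series that holds it becomes float64.
pd1 = pd.Series([1, np.nan, 3])
print(type(np.nan), pd1.dtype)

df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, np.nan, 6.0],
                   'c': [7.0, np.nan, 9.0]})

rows_any = df.dropna()            # drop every row that has at least one NaN
rows_all = df.dropna(how='all')   # drop only rows that are entirely NaN
cols_any = df.dropna(axis=1)      # drop every column that has at least one NaN
filled = df.fillna(0)             # replace NaN with 0 instead of dropping

print(rows_any)
print(rows_all)
print(cols_any)
print(filled)
```

Here only row 2 is free of NaN, row 1 is entirely NaN, and every column contains a NaN, so cols_any ends up with no columns at all.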

8. df.loc can select both rows and columns: whatever comes before the comma selects rows, whatever comes after it selects columns.
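
A short sketch of the row, column form of df.loc on a made-up frame (all names here are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['w', 'x', 'y', 'z'])

cell = df.loc['b', 'x']               # one row label, one column label: a scalar
row = df.loc['b']                     # rows only: the whole row as a Series
col = df.loc[:, 'x']                  # all rows, one column
block = df.loc['a':'b', ['w', 'z']]   # label slice of rows (end included), list of columns

print(cell)
print(row)
print(col)
print(block)
```

The row slice 'a':'b' includes row 'b', in line with the label-slicing rule from point 3.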
