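Day two of studying. I plan to check in every day and keep at it. If you find this helpful, feel free to leave a like!
# 1. Fill missing values with the average of the values above and below
1) Import the libraries and build a DataFrame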
import pandas as pd
import numpy as np
data={"course":["A","B","C","D","E",np.nan,"F","G"],"grade":[22,34,45,45,67,np.nan,53,23]}
df=pd.DataFrame(data)
df
| | course | grade |
|---|---|---|
| 0 | A | 22.0 |
| 1 | B | 34.0 |
| 2 | C | 45.0 |
| 3 | D | 45.0 |
| 4 | E | 67.0 |
| 5 | NaN | NaN |
| 6 | F | 53.0 |
| 7 | G | 23.0 |
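2) Fill the missing value. interpolate() does linear interpolation by default, which for a single gap is exactly the average of the values above and below.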
df["grade"]=df["grade"].fillna(df["grade"].interpolate())
df
| | course | grade |
|---|---|---|
| 0 | A | 22.0 |
| 1 | B | 34.0 |
| 2 | C | 45.0 |
| 3 | D | 45.0 |
| 4 | E | 67.0 |
| 5 | NaN | 60.0 |
| 6 | F | 53.0 |
| 7 | G | 23.0 |
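Here is a small sketch of why interpolate() lands on the neighbour average in this case, and where the two ideas part ways. The names neighbor_mean, linear and gap are mine, just for illustration, and the second Series is made-up data.

```python
import numpy as np
import pandas as pd

grade = pd.Series([22, 34, 45, 45, 67, np.nan, 53, 23], dtype="float64")

# Average of the nearest valid value above (ffill) and below (bfill).
# For rows that are not missing, both fills return the value itself.
neighbor_mean = (grade.ffill() + grade.bfill()) / 2

# Default linear interpolation, as used in the fill step above.
linear = grade.interpolate()

print(neighbor_mean[5], linear[5])  # both equal (67 + 53) / 2

# With two missing values in a row the two approaches no longer agree:
gap = pd.Series([10.0, np.nan, np.nan, 40.0])
print(gap.interpolate().tolist())                  # evenly spaced between 10 and 40
print(((gap.ffill() + gap.bfill()) / 2).tolist())  # both gaps get the same midpoint
```

So if you really want "the average of the value above and the value below" for longer gaps, the ffill/bfill version is closer to that wording, while interpolate() draws a straight line through the gap.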
# 2. Drop duplicate values based on the grade column
df.drop_duplicates(["grade"])
| | course | grade |
|---|---|---|
| 0 | A | 22.0 |
| 1 | B | 34.0 |
| 2 | C | 45.0 |
| 4 | E | 67.0 |
| 5 | NaN | 60.0 |
| 6 | F | 53.0 |
| 7 | G | 23.0 |
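In the table above, row 3 (course D) is the one that disappears, because drop_duplicates keeps the first occurrence by default. A minimal sketch of the keep parameter on a shortened copy of the data (the variable names first, last and none are my own):

```python
import pandas as pd

df = pd.DataFrame({
    "course": ["A", "B", "C", "D", "E"],
    "grade": [22.0, 34.0, 45.0, 45.0, 67.0],
})

first = df.drop_duplicates(["grade"])              # default keep="first": keeps C, drops D
last = df.drop_duplicates(["grade"], keep="last")  # keeps D, drops C
none = df.drop_duplicates(["grade"], keep=False)   # drops every row whose grade repeats

print(first["course"].tolist())
print(last["course"].tolist())
print(none["course"].tolist())
print(len(df))  # still 5: drop_duplicates returns a new DataFrame
```

Since the result is not assigned back, df itself is unchanged, which is why the list in the next step still contains 45.0 twice.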
# 3. Convert the grade column to a list
df["grade"].to_list()
[22.0, 34.0, 45.0, 45.0, 67.0, 60.0, 53.0, 23.0]
# 4. Compute the mean of the grade column
df["grade"].mean()
43.625
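One thing worth knowing: mean() skips NaN by default, so the result depends on whether the gap was filled first. A quick sketch on the raw grades (raw is just a name I picked for the unfilled column):

```python
import numpy as np
import pandas as pd

raw = pd.Series([22, 34, 45, 45, 67, np.nan, 53, 23], dtype="float64")

print(raw.mean())                # NaN is skipped: mean of the 7 valid values
print(raw.mean(skipna=False))    # NaN propagates into the result
print(raw.interpolate().mean())  # mean after filling the gap, as in this post
```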
# 5. Select the grade and course columns
1) First method: attribute access
df.grade
0    22.0
1    34.0
2    45.0
3    45.0
4    67.0
5    60.0
6    53.0
7    23.0
Name: grade, dtype: float64
2) Second method: bracket indexing
df["grade"]
0    22.0
1    34.0
2    45.0
3    45.0
4    67.0
5    60.0
6    53.0
7    23.0
Name: grade, dtype: float64
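Both methods above return a single column as a Series. To get grade and course together, pass a list of column names. The sketch below also shows one case where attribute access does not give you the column; the demo frame with a count column is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({"course": ["A", "B", "C"], "grade": [22.0, 34.0, 45.0]})

both = df[["grade", "course"]]  # list of names -> DataFrame with those columns
one = df["grade"]               # single name -> Series

print(type(both).__name__, both.columns.tolist())
print(type(one).__name__)

# Attribute access fails quietly when a column name clashes with a method name.
demo = pd.DataFrame({"count": [1, 2, 3]})
print(callable(demo.count))    # this is the DataFrame.count method, not the column
print(demo["count"].tolist())  # bracket indexing always reaches the column
```

That is the main reason I would lean on the bracket form in real code, and keep the attribute form for quick exploration.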
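Some readers have messaged me asking how to remember all of this, so here is my summary:
Note 1:
Commonly used methods (they generally end with '()'), taking an already created object df2 as the example:
## 1. Sampling and viewing data: head, tail, sample, take
df2.head()
df2.tail()
df2.loc[:,'pop']
df2.iloc[:,1]
df2.sample(n=3)
sample_idx = np.random.permutation(3)
df2.take(sample_idx)
## 2. Dropping rows/columns: drop
data = pd.DataFrame(np.arange(16).reshape((4,4)),index=['Ohio', 'Colorado', 'Utah', 'New York'], columns=['one','two','three','four'])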
data.drop(['New York'],inplace=False) # inplace=False: the original object is left untouched and a new object is returned
data.drop(['Ohio'],inplace=True) # inplace=True: the original object is modified and None is returned
data.drop(['one'],axis=1,inplace=True) # drop a column
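To make the inplace difference concrete, here is a small sketch that checks what each call returns. The names smaller, result and trimmed are mine; drop(columns=...) is an equivalent spelling of axis=1.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Default (inplace=False): data keeps all 4 rows, the result has 3.
smaller = data.drop(["New York"])
print(data.shape, smaller.shape)

# inplace=True: data itself loses the row and the call returns None.
result = data.drop(["Ohio"], inplace=True)
print(result, data.shape)

# Dropping a column without touching data.
trimmed = data.drop(columns=["one"])
print(trimmed.columns.tolist())
```

A classic slip is writing data = data.drop(..., inplace=True), which replaces data with None.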
## 3. Applying and mapping functions: applymap, apply
data = pd.DataFrame(np.random.randn(3,4), columns=list('abcd'))
np.abs(data)
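### 1) Apply a function to every element: format to two decimal places
fmt = lambda x: '%.2f' % x  # named fmt so the built-in format is not shadowed
display(data.applymap(fmt))  # display comes from IPython/Jupyter; pandas 2.1+ deprecates applymap in favor of DataFrame.map
display(data.a.map(fmt))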
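Because the name of the element-wise DataFrame method has changed across pandas versions, here is a sketch that avoids it entirely by mapping each column with Series.map. The fixed numbers are made up so the output is reproducible, and fmt is just my name for the formatter.

```python
import pandas as pd

data = pd.DataFrame({"a": [1.0, 3.14159], "b": [-0.5, 10.0]})

fmt = lambda x: "%.2f" % x

# Element-wise on one column: Series.map
col_a = data["a"].map(fmt)

# Element-wise on the whole frame: apply Series.map to every column
formatted = data.apply(lambda col: col.map(fmt))

print(col_a.tolist())
print(formatted)
```

Note that the result holds strings, not numbers, so this is for display only. If you want to keep working with the values, data.round(2) is the better fit.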
### 2) Apply a function to each row or column: the result is a scalar or an array
f = lambda x:x.max() - x.mean()
data.apply(f,axis=0)
def f1(x):
    return pd.Series([x.min(),x.max()],index=['min','max'])
data.apply(f1)
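Since the random data above changes on every run, here is the same idea on a tiny fixed frame so you can follow the numbers by hand. The names spread, min_max, by_col, by_row and summary are mine.

```python
import pandas as pd

data = pd.DataFrame({"a": [1.0, 2.0, 6.0], "b": [4.0, 4.0, 4.0]})

spread = lambda x: x.max() - x.mean()

by_col = data.apply(spread, axis=0)  # one scalar per column -> Series indexed by a, b
by_row = data.apply(spread, axis=1)  # one scalar per row -> Series indexed by 0, 1, 2

def min_max(x):
    # Returning a Series makes apply build a DataFrame instead of a Series.
    return pd.Series([x.min(), x.max()], index=["min", "max"])

summary = data.apply(min_max)  # rows: min, max; columns: a, b

print(by_col)
print(by_row)
print(summary)
```

axis=0 hands each column to the function, axis=1 hands it each row. When the function returns a scalar you get a Series back; when it returns a Series you get a DataFrame.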
Note 2: names without parentheses are attributes, e.g. df2.index, df2.values, df2.columns.
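A quick way to tell the two apart when you are unsure is callable(). The df2 below is a made-up stand-in, not the df2 from the notes above.

```python
import pandas as pd

df2 = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Attributes: just data, no parentheses.
print(df2.shape)
print(df2.columns.tolist())
print(df2.index.tolist())
print(df2.values.tolist())

# Methods: callable objects, so they need parentheses to run.
print(callable(df2.head), callable(df2.shape))
print(df2.head(1))
```

Forgetting the parentheses on a method does not raise an error; it just prints something like a bound method description, which is a good hint that '()' is missing.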