# References: https://github.com/jackfrued/Python-100-Days/tree/master/Day76-90/code
https://blog.csdn.net/zutsoft/article/details/51498026 and the book "Python for Data Analysis" (《利用python进行数据分析》)
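Machine learning basics, using Pandas, using NumPy and SciPy
Matplotlib and data visualization, k-nearest neighbors (KNN) classification, decision trees, Bayesian classification
Support vector machines (SVM), k-means clustering, regression analysis, introduction to big data analysis
Advanced big data analysis, introduction to TensorFlow, TensorFlow in practice, recommender systems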
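1-pandas入门.ipynb (pandas basics)
2-pandas-索引.ipynb (indexing)
3-pandas数据清洗之空数据.ipynb (data cleaning: missing values)
4-pandas多层索引.ipynb (MultiIndex)
5-pandas多层索引计算.ipynb (computing with a MultiIndex)
6-pandas数据集成.ipynb (data integration)
7-pandas数据集成merge.ipynb (data integration with merge)
8-pandas分组聚合操作.ipynb (grouping and aggregation)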
Indexing: loc (label-based), iloc (position-based)
Random numbers and random sampling: col = np.random.choice(cols); np.random.randint(a,b,size); np.random.randn()
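A minimal sketch of the difference between loc and iloc, plus the random helpers listed above. The column names, index labels, and fixed seed are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed for reproducibility (an assumption, not in the notes)

cols = ['Python', 'En', 'Math']
df = pd.DataFrame(np.random.randint(0, 150, size=(4, 3)),
                  columns=cols, index=list('abcd'))

by_label = df.loc['b', 'En']      # label-based lookup
by_position = df.iloc[1, 1]       # position-based lookup of the same cell

label_slice = df.loc['a':'c']     # label slices include the end label
position_slice = df.iloc[0:2]     # position slices exclude the end position

col = np.random.choice(cols)      # pick one column name at random
noise = np.random.randn(3)        # three draws from the standard normal distribution
```

Note that a label slice keeps its end label while a position slice drops its end position, so the two slices above have different lengths.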
(1) Indexing and data cleaning: missing values
Other functions: copy; fillna; ffill; bfill; values; np.arange(); unique; value_counts, e.g. df['Python'].value_counts().head(8); head; isnull(); any; all (e.g. df.isnull().any() / df.isnull().all()); df.notnull().any()
ax = DataFrame(df['Python'].unique()) ; ax.notnull().sum()
df2 = df2.add_suffix('_mean') # add_suffix returns a new DataFrame
count()  # counts the non-null values
df['Python'].unique() # unique returns a NumPy array; Series(df['Python'].unique()).notnull().sum() counts the distinct non-null values
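The functions above can be tried together on a small frame. This is only a sketch; the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Python': [90.0, np.nan, 75.0, 90.0, np.nan],
                   'En': [60.0, 70.0, np.nan, 80.0, 85.0]})

has_nan = df.isnull().any()           # per column: is at least one value missing?
all_nan = df.isnull().all()           # per column: are all values missing?
non_null = df.count()                 # count() skips NaN

filled_const = df.fillna(0)           # replace NaN with a constant
filled_fwd = df.ffill()               # carry the previous valid value forward
filled_bwd = df.bfill()               # pull the next valid value backward

uniques = df['Python'].unique()       # NumPy array; NaN is kept as one of the values
n_distinct = pd.Series(uniques).notnull().sum()   # distinct non-null values
counts = df['Python'].value_counts()  # frequencies, NaN excluded by default
```

bfill cannot fill a trailing NaN because there is no later value to pull from, so the last 'Python' entry stays missing in filled_bwd.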
# Compute the mode of each DataFrame column and collect the results in a list (or Series)
zhongshu = []  # zhongshu = "mode"
for col in df.columns:
    zhongshu.append(df[col].value_counts().index[0])
zhongshu
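For comparison, pandas also has a built-in DataFrame.mode(). The sketch below runs the value_counts loop next to it on made-up data; the frame and the name modes are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Python': [90, 75, 90, 60],
                   'En': [70, 70, 80, 70]})

# Loop version from the notes: the most frequent value of each column.
zhongshu = [df[col].value_counts().index[0] for col in df.columns]

# Built-in alternative: mode() returns a DataFrame with one row per tied mode;
# iloc[0] takes the first mode of every column as a Series.
modes = df.mode().iloc[0]
```

When a column has several equally frequent values, mode() returns all of them in extra rows, whereas the loop keeps only one.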
DataFrame and Series: s = Series(np.random.randint(0,150,size = 6),index=list('abcdef'))
(3) MultiIndex:
- df3 = DataFrame(np.random.randint(0,150,size = (12,3)), columns = ['Python','En', 'Math'],index = pd.MultiIndex.from_product([['张三','李四','王五'],['期中','期末'],['A','B']])) ; df.stack(level); unstack ; stack moves a column level into the row index, and by default it uses the innermost level (level=-1)
- # Select the column first, then the rows: df['Python']['张三']['期中']
- # Select the rows first, then the column: df = DataFrame(np.random.randint(0,150,size = (6,3)),columns = ['Python','En', 'Math'], index = pd.MultiIndex.from_product([['张三','李四','王五'],['期中','期末']])) ; df.loc['张三'].loc['期中']['Python']
- df.mean(axis = 1,level = 1).round(1)  # the level argument was removed in pandas 2.0; group on the level instead, e.g. df.T.groupby(level=1).mean().T.round(1)
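A small sketch of stack, unstack, and level-wise means on the two-level frame from the notes, assuming pandas 2.x where mean(level=...) is no longer available. The seed and the names wide, long, exam_mean, and subject_mean are illustrative assumptions.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed for reproducibility (an assumption)
index = pd.MultiIndex.from_product([['张三', '李四', '王五'], ['期中', '期末']])
df = pd.DataFrame(np.random.randint(0, 150, size=(6, 3)),
                  columns=['Python', 'En', 'Math'], index=index)

wide = df.unstack()        # innermost row level (期中/期末) moves into the columns
long = df.stack()          # column labels move into the innermost row level
roundtrip = wide.stack()   # stacking the exam level back restores the 6 x 3 shape

# Mean per exam across students: group on row level 1.
exam_mean = df.groupby(level=1).mean().round(1)

# Column-wise counterpart of the removed df.mean(axis=1, level=...):
# transpose, group on a column level, transpose back.
subject_mean = wide.T.groupby(level=0).mean().T.round(1)
```

Here subject_mean averages the two exams for each student and subject, which is the kind of result the axis=1 call in the notes produces on a frame with multi-level columns.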
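(4) Functions and methods for data integration: df1.append(df2, sort = False) (DataFrame.append was removed in pandas 2.0; use pd.concat([df1, df2], sort=False) instead), merge(), concat(), df4.merge(df5,left_index=True,right_index=True)
[pandas][3] DataFrame data merging and joining (merge, join, concat)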
https://blog.csdn.net/zutsoft/article/details/51498026
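The integration methods can be compared on two tiny frames. This is a sketch with made-up data; df1, df2, and the 'name' key are assumptions, while df4 and df5 mirror the names used in the notes.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['a', 'b', 'c'], 'Python': [90, 80, 70]})
df2 = pd.DataFrame({'name': ['b', 'c', 'd'], 'En': [60, 65, 75]})

# concat stacks rows; columns missing on one side are filled with NaN.
stacked = pd.concat([df1, df2], sort=False, ignore_index=True)

# merge joins on the shared 'name' column; the default is an inner join.
inner = df1.merge(df2, on='name')
outer = df1.merge(df2, on='name', how='outer')

# Index-on-index join, as in df4.merge(df5, left_index=True, right_index=True).
df4 = df1.set_index('name')
df5 = df2.set_index('name')
by_index = df4.merge(df5, left_index=True, right_index=True)
```

The inner join keeps only the names present in both frames, and the outer join keeps every name and fills the gaps with NaN.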
(6) Grouping and aggregation: groupby
df = DataFrame({'Hand':['right','left','left','right','right','right','right','right','left','right'],
'Smoke':['yes','yes','no','no','yes','no','no','no','no','yes'],
'sex':['male','female','female','male','male','male','female','female','male','female'],
'weight':[80,50,48,75,68,100,40,90,88,76],
'IQ':[100,120,90,130,140,80,94,110,100,160]})
#df.groupby(by = ['Hand'])[['weight']].apply(np.mean).round(1) # a single [] around weight is enough, and by does not need a list either
df.groupby(by = 'Hand')['weight'].apply(np.mean).round(1) # single brackets return a Series, double brackets return a DataFrame
ay = df.groupby(by = ['Hand','sex'])[['IQ']].max()
ay.unstack()
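A short sketch of the bracket rule and the unstack step, using the Hand, sex, and IQ columns from the frame above together with the built-in mean(). The names as_series, as_frame, and wide are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Hand': ['right', 'left', 'left', 'right', 'right',
                            'right', 'right', 'right', 'left', 'right'],
                   'sex': ['male', 'female', 'female', 'male', 'male',
                           'male', 'female', 'female', 'male', 'female'],
                   'weight': [80, 50, 48, 75, 68, 100, 40, 90, 88, 76],
                   'IQ': [100, 120, 90, 130, 140, 80, 94, 110, 100, 160]})

as_series = df.groupby('Hand')['weight'].mean().round(1)    # single brackets
as_frame = df.groupby('Hand')[['weight']].mean().round(1)   # double brackets

# Two grouping keys give a two-level row index; unstack moves 'sex' into the columns.
ay = df.groupby(['Hand', 'sex'])[['IQ']].max()
wide = ay.unstack()
```

After unstack the result has one row per Hand value and one (IQ, sex) column per sex.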
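# transform works on both Series and DataFrame groupby objects; it returns one value per original row (aligned with the input index), whereas apply with a reducing function returns one value per group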
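The difference is easiest to see on a handful of rows. This is a minimal sketch; the four-row frame and the new column names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Hand': ['right', 'left', 'left', 'right'],
                   'weight': [80, 50, 48, 75]})

grouped = df.groupby('Hand')['weight']

per_group = grouped.mean()             # reduces: one value per group
per_row = grouped.transform('mean')    # broadcasts: one value per original row

# transform also accepts a DataFrame selection (double brackets).
frame_version = df.groupby('Hand')[['weight']].transform('mean')

# Because per_row is aligned with df, it can be assigned straight back.
df['weight_mean'] = per_row
df['weight_diff'] = df['weight'] - df['weight_mean']
```

This alignment is what makes transform convenient for per-group normalization, since the group statistic sits next to every original row.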
# Using agg
data = df.groupby(by = ['Hand'])[['IQ','weight']]  # select several columns with a list; the tuple form ['IQ','weight'] fails in pandas 2.0+
data.agg(['max','mean']).round(1);data.agg({'IQ':'max','weight':'mean'}).round(1)
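The two agg call styles can be checked against the sample data from the notes. The sketch below also shows named aggregation as a third option; the output names iq_max and weight_mean are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Hand': ['right', 'left', 'left', 'right', 'right',
                            'right', 'right', 'right', 'left', 'right'],
                   'weight': [80, 50, 48, 75, 68, 100, 40, 90, 88, 76],
                   'IQ': [100, 120, 90, 130, 140, 80, 94, 110, 100, 160]})

data = df.groupby('Hand')[['IQ', 'weight']]

# A list of functions applies every function to every selected column;
# the resulting columns form a two-level index (column, function).
multi = data.agg(['max', 'mean']).round(1)

# A dict picks one function per column.
per_col = data.agg({'IQ': 'max', 'weight': 'mean'}).round(1)

# Named aggregation gives flat, explicit output column names.
named = df.groupby('Hand').agg(iq_max=('IQ', 'max'),
                               weight_mean=('weight', 'mean')).round(1)
```

The dict and named forms compute the same numbers here; the named form mainly helps when the default column labels would be ambiguous.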