python \ pandas

"Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications", Chapter 2: first steps in data processing with pandas

Contents:
1. Reading data
2. Selecting data
3. Filtering data
4. Cleaning data
5. Manipulating data
6. Sorting
7. Grouping
8. Plotting

Key points

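1. Reading data

pd.read_csv('PycharmProjects/newcode/book_review/educ_figdp_1_Data.csv', na_values=':', usecols=['TIME', 'GEO', 'Value'])
The first argument is the file path, na_values lists the strings to treat as missing, and usecols keeps only the named columns.
Other readers: pd.read_excel(), pd.read_hdf(), pd.read_table(), pd.read_clipboard()
a.head() and a.tail() show the first and the last five rows.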
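
A minimal sketch of the calls above. The CSV text is a made-up stand-in for the real file (the Flag column and all values are assumptions), read from memory so the example runs anywhere:

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; ':' marks a missing value.
csv_text = """TIME,GEO,Value,Flag
2005,Spain,4.2,x
2006,Spain,:,x
2005,France,5.6,x
2006,France,5.5,x
2007,France,5.4,x
2008,France,:,x
"""

# na_values turns ':' into NaN; usecols drops the unwanted Flag column.
a = pd.read_csv(io.StringIO(csv_text), na_values=":", usecols=["TIME", "GEO", "Value"])

print(a.head())   # first five rows (the default)
print(a.tail(2))  # last two rows
```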

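2. Selecting data

a['value'] selects one column and returns a Series (index on the left, values on the right).
a[10:20] selects rows by position, from row 10 up to but not including row 20.
a.loc[10:20, ['time', 'value']] selects the rows labelled 10 through 20 (both ends included) and only the listed columns. Older code writes this as a.ix[10:20, ['time', 'value']]; .ix was removed in pandas 1.0, and .loc is the label-based replacement.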
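
The difference between the three selections is easiest to see on a small table. This is only a sketch; the 30-row frame below is invented for illustration:

```python
import pandas as pd

# Hypothetical 30-row frame with a default integer index 0..29.
a = pd.DataFrame({
    "time": range(2000, 2030),
    "geo": ["Spain", "France", "Italy"] * 10,
    "value": [float(i) for i in range(30)],
})

col = a["value"]                         # one column, returned as a Series
rows = a[10:20]                          # positional slice: rows 10..19
block = a.loc[10:20, ["time", "value"]]  # label slice: rows 10..20 inclusive

print(len(rows), len(block))
```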

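3. Filtering data
a[a['value'] >= 5] keeps the rows where the condition is true. Other comparison operators such as != and <= work the same way.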
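
A short sketch of boolean filtering on invented data. The comparison builds a True/False Series, and indexing with it keeps the True rows; a missing value compares as False:

```python
import pandas as pd

# Hypothetical data; None becomes NaN in the float column.
a = pd.DataFrame({
    "geo": ["Spain", "France", "Italy", "Malta"],
    "value": [4.2, 5.6, None, 7.9],
})

mask = a["value"] >= 5           # boolean Series, NaN compares as False
high = a[mask]                   # rows where value >= 5
not_spain = a[a["geo"] != "Spain"]

# Combine conditions with & (and) or | (or); each needs its own parentheses.
both = a[(a["value"] >= 5) & (a["geo"] != "Malta")]
print(both)
```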

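4. Cleaning data
a[a['value'].isnull()].head() shows the first rows whose value is missing.
a.dropna(how='any', subset=['value']) drops the rows with a missing value in the listed columns.
a.fillna(value={'value': 0}) fills missing entries per column instead of dropping them.
a.rename(index={'china m m m m m mm mm ': 'China'}) replaces an awkward index label with a short one.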
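
The four cleaning calls, sketched on a tiny invented frame. Each call returns a new DataFrame and leaves a unchanged:

```python
import numpy as np
import pandas as pd

long_name = "Germany (until 1990 former territory of the FRG)"

# Hypothetical values, indexed by country name.
a = pd.DataFrame({"value": [4.2, np.nan, 5.6]},
                 index=["Spain", long_name, "France"])

missing = a[a["value"].isnull()]                 # rows with a missing value
dropped = a.dropna(how="any", subset=["value"])  # remove those rows
filled = a.fillna(value={"value": 0})            # or fill them with 0
renamed = a.rename(index={long_name: "Germany"}) # shorten an index label

print(renamed)
```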

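5. Manipulating data
Add a column: a['valuenorm'] = a['value'] / a['value'].max()
Add or remove rows and columns: pd.concat([a, b]) appends rows (a.append() was removed in pandas 2.0); a.drop() removes rows or columns.

Aggregations: count(), describe(), prod(), std(), var()
Element-wise operations: a['value'] / 10000, a['value'].apply(np.sqrt)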
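
A sketch of these operations on invented numbers. pd.concat stands in for the removed a.append(), and the extra frame is a made-up row to append:

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen so the square roots come out exact.
a = pd.DataFrame({"geo": ["Spain", "France", "Italy"],
                  "value": [4.0, 16.0, 25.0]})

# New column: each value divided by the column maximum.
a["valuenorm"] = a["value"] / a["value"].max()

# Append a row; columns missing from `extra` are filled with NaN.
extra = pd.DataFrame({"geo": ["Malta"], "value": [9.0]})
longer = pd.concat([a, extra], ignore_index=True)

shorter = longer.drop(index=3)         # remove a row by label
no_norm = a.drop(columns="valuenorm")  # remove a column

print(a["value"].describe())           # count, mean, std, min, quartiles, max
roots = a["value"].apply(np.sqrt)      # apply a function to every element
```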
6. Sorting
a.sort_values(by='value', ascending=False, inplace=True) sorts by a column, largest first.
a.sort_index(axis=0, ascending=True, inplace=True) restores the order of the row index.
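
A small sketch on invented data. Note that with inplace=True the call modifies a and returns None, so the result should not be assigned:

```python
import pandas as pd

# Hypothetical data with the default index 0, 1, 2.
a = pd.DataFrame({"geo": ["France", "Spain", "Malta"],
                  "value": [5.6, 4.2, 7.9]})

# Without inplace, a sorted copy is returned and `a` is untouched.
by_value = a.sort_values(by="value", ascending=False)

# With inplace=True, `a` itself is reordered and the call returns None.
result = a.sort_values(by="value", ascending=False, inplace=True)
order_after_sort = list(a.index)

# Sorting on the index puts the rows back in their original order.
a.sort_index(axis=0, ascending=True, inplace=True)
print(a)
```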

7. Grouping
a[['geo', 'value']].groupby('geo').mean() gives the mean value for each country.
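
A sketch of the group-and-average step on invented data; selecting the two columns first keeps the mean from being computed over time as well:

```python
import pandas as pd

# Hypothetical data: two observations per country.
a = pd.DataFrame({
    "geo": ["Spain", "Spain", "France", "France"],
    "time": [2005, 2006, 2005, 2006],
    "value": [4.0, 5.0, 5.5, 6.5],
})

# One row per country, indexed by geo, holding the mean of value.
means = a[["geo", "value"]].groupby("geo").mean()
print(means)
```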
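8. Plotting

a.plot(kind='bar', style='b', alpha=0.4, title='ABC')
kind is 'bar' for vertical bars or 'barh' for horizontal ones, style takes a color code such as 'r', 'y' or 'b', alpha is the opacity between 0 and 1, and title sets the plot title.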
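
A minimal plotting sketch with invented values. It uses color= rather than style= for the bar color and the non-interactive Agg backend so it runs without a display; both are choices made for this example, not something the notes prescribe:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values indexed by country.
s = pd.Series([4.2, 5.6, 7.9], index=["Spain", "France", "Malta"])

# Two axes side by side so the plots do not overwrite each other.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

s.plot(kind="bar", color="b", alpha=0.4, title="ABC", ax=ax1)    # vertical bars
s.plot(kind="barh", color="r", alpha=0.4, title="ABC (barh)", ax=ax2)  # horizontal

fig.tight_layout()
```

Calling plt.show() instead of relying on Agg displays the figure in an interactive session.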

Code demo

    import pandas as pd
    import matplotlib.pyplot as plt

    # Read only the three columns of interest; ':' marks missing values in this file.
    edu = pd.read_csv('C:/Users/yangk/PycharmProjects/newcode/book_review/educ_figdp_1_Data.csv',
                      na_values=':', usecols=['TIME', 'GEO', 'Value'])

    # Keep the years after 2005 and reshape to one row per country, one column per year.
    filtered_data = edu[edu['TIME'] > 2005]
    pivedu = pd.pivot_table(filtered_data, values='Value', index=['GEO'], columns=['TIME'])

    # Drop the aggregate rows, shorten Germany's label, discard countries with gaps.
    pivedu = pivedu.drop(['Euro area (13 countries)', 'Euro area (15 countries)',
                          'Euro area (17 countries)', 'Euro area (18 countries)',
                          'European Union (25 countries)', 'European Union (27 countries)',
                          'European Union (28 countries)'], axis=0)
    pivedu = pivedu.rename(index={'Germany (until 1990 former territory of the FRG)': 'Germany'})
    pivedu = pivedu.dropna()

    # Rank the countries within each year (1 = highest); the result is not stored.
    pivedu.rank(ascending=False, method='first')

    # Total per country over all years, largest first, as a bar chart.
    totalSum = pivedu.sum(axis=1).sort_values(ascending=False)
    totalSum.plot(kind='bar', style='b', alpha=0.4, title='Total Values for Country')

    # Stacked horizontal bars, one color per column, legend placed outside the axes.
    my_colors = ['b', 'r', 'g', 'y', 'm', 'c']
    ax = pivedu.plot(kind='barh', stacked=True, color=my_colors)
    ax.legend(loc='center left', bbox_to_anchor=(1, .5))
    plt.show()
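
The demo relies on pivot_table and rank, which the key points do not cover. A sketch of what they do, on a tiny invented table in the same long format (all numbers are made up):

```python
import pandas as pd

# Hypothetical long-format data: one row per (year, country) pair.
long_df = pd.DataFrame({
    "TIME": [2006, 2007, 2006, 2007, 2006, 2007],
    "GEO": ["Spain", "Spain", "France", "France", "Italy", "Italy"],
    "Value": [4.0, 4.5, 5.5, 5.6, 4.5, 4.2],
})

# Reshape: one row per country, one column per year.
piv = pd.pivot_table(long_df, values="Value", index=["GEO"], columns=["TIME"])

# Rank within each year; ascending=False gives rank 1 to the largest value.
ranks = piv.rank(ascending=False, method="first")

# Row totals across the years, largest first.
total = piv.sum(axis=1).sort_values(ascending=False)

print(piv)
print(ranks)
```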

Plots

[Image: figure produced by the code demo]
