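[Python Basics] Case Study: GDP Analysis

GDP Analysis

1 Analysis Process and Goals

  • Data sources
  • Getting familiar with the data
  • Analysis process
  • Presenting the results
  • Techniques used and code implementation

1.1 Data Sources

  • Data collected inside the company: web front ends, mini programs, Android or iOS apps, smart devices (smart electricity meters, temperature sensors, etc.)
  • Open data platforms: the National Bureau of Statistics, World Bank data, etc.
  • Third-party datasets: competition platforms such as Kaggle
  • Third-party data collected with web crawlers
  • The data may be combined from several sources

1.2 Getting Familiar with the Data

  • Display the data with a tool
  • Inspect the data fields
  • Look across multiple data sources and at the relationships between them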
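
As a rough illustration of these three steps, the sketch below inspects a tiny hand-made table with pandas. The frame `df` and its values are hypothetical stand-ins, not the real GDP file.

```python
import pandas as pd

# Hypothetical miniature of a wide GDP table (values are made up for illustration)
df = pd.DataFrame({
    "Country Name": ["A", "B", "C"],
    "Country Code": ["AAA", "BBB", "CCC"],
    "1960": [1.0, None, 3.0],
    "1961": [1.5, 2.5, None],
})

print(df.head(2))          # display the first rows
print(df.shape)            # (number of rows, number of columns)
print(list(df.columns))    # field names
missing = df.isna().sum()  # count of missing values per column
print(missing)
```

Counting missing values early is useful here because many countries have no GDP figures for the early years.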

2 GDP Data Analysis for Countries and Regions

import pandas as pd
import numpy as np
# Jupyter/IPython magic: render plots inline in the notebook
%matplotlib inline
# Read the CSV file
fpath = r'data\GDP.csv'
with open(fpath) as f:
    pdata = pd.read_csv(f)
pdata
     Country Name  Country Code  Indicator Name     Indicator Code  1960          1961          1962          1963          1964          1965          ...  2008          2009          2010          2011          2012          2013          2014          2015          2016          2017
0    Aruba         ABW           GDP (current US$)  NY.GDP.MKTP.CD  NaN           NaN           NaN           NaN           NaN           NaN           ...  2.791961e+09  2.498933e+09  2.467704e+09  2.584464e+09  NaN           NaN           NaN           NaN           NaN           NaN
1    Afghanistan   AFG           GDP (current US$)  NY.GDP.MKTP.CD  5.377778e+08  5.488889e+08  5.466667e+08  7.511112e+08  8.000000e+08  1.006667e+09  ...  1.019053e+10  1.248694e+10  1.593680e+10  1.793024e+10  2.053654e+10  2.004633e+10  2.005019e+10  1.921556e+10  1.946902e+10  NaN
2    Angola        AGO           GDP (current US$)  NY.GDP.MKTP.CD  NaN           NaN           NaN           NaN           NaN           NaN           ...  8.417803e+10  7.549238e+10  8.247091e+10  1.041160e+11  1.153980e+11  1.249120e+11  1.267770e+11  1.029620e+11  9.533511e+10  NaN
3    Albania       ALB           GDP (current US$)  NY.GDP.MKTP.CD  NaN           NaN           NaN           NaN           NaN           NaN           ...  1.288135e+10  1.204421e+10  1.192695e+10  1.289087e+10  1.231978e+10  1.277628e+10  1.322824e+10  1.133526e+10  1.186387e+10  NaN
4    Andorra       AND           GDP (current US$)  NY.GDP.MKTP.CD  NaN           NaN           NaN           NaN           NaN           NaN           ...  4.007353e+09  3.660531e+09  3.355695e+09  3.442063e+09  3.164615e+09  3.281585e+09  3.350736e+09  2.811489e+09  2.858518e+09  NaN
..   ...           ...           ...                ...             ...           ...           ...           ...           ...           ...           ...  ...           ...           ...           ...           ...           ...           ...           ...           ...           ...
259  Kosovo        XKX           GDP (current US$)  NY.GDP.MKTP.CD  NaN           NaN           NaN           NaN           NaN           NaN           ...  5.687488e+09  5.653793e+09  5.829934e+09  6.649291e+09  6.473725e+09  7.072092e+09  7.386891e+09  6.440501e+09  6.649889e+09  NaN
260  Yemen, Rep.   YEM           GDP (current US$)  NY.GDP.MKTP.CD  NaN           NaN           NaN           NaN           NaN           NaN           ...  2.691085e+10  2.513027e+10  3.090675e+10  3.272642e+10  3.539315e+10  4.041523e+10  4.322858e+10  3.773392e+10  2.731761e+10  NaN
261  South Africa  ZAF           GDP (current US$)  NY.GDP.MKTP.CD  7.575248e+09  7.972841e+09  8.497830e+09  9.423212e+09  1.037379e+10  1.133417e+10  ...  2.871000e+11  2.972170e+11  3.752980e+11  4.168780e+11  3.963330e+11  3.668100e+11  3.511190e+11  3.176110e+11  2.954560e+11  NaN
262  Zambia        ZMB           GDP (current US$)  NY.GDP.MKTP.CD  7.130000e+08  6.962857e+08  6.931429e+08  7.187143e+08  8.394286e+08  1.082857e+09  ...  1.791086e+10  1.532834e+10  2.026556e+10  2.346010e+10  2.550337e+10  2.804546e+10  2.715063e+10  2.115439e+10  2.106399e+10  NaN
263  Zimbabwe      ZWE           GDP (current US$)  NY.GDP.MKTP.CD  1.052990e+09  1.096647e+09  1.117602e+09  1.159512e+09  1.217138e+09  1.311436e+09  ...  4.415703e+09  8.621574e+09  1.014186e+10  1.209845e+10  1.424249e+10  1.545177e+10  1.589105e+10  1.630467e+10  1.661996e+10  NaN

264 rows × 62 columns

2.2 Cleaning the Data

Inspect the data and drop the columns that are not needed.

pdata.columns
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017'],
      dtype='object')
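# Drop the columns that are not needed
pdata = pdata.drop(['Country Code','Indicator Name', 'Indicator Code'], axis=1)
# Use the country name as the index
pdata = pdata.set_index('Country Name')
# Reshape from wide (one column per year) to long (one row per country and year)
pdata = pdata.stack()
pdata = pd.DataFrame(pdata)
pdata.columns = ['GDP']
pdata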
                             GDP
Country Name
Aruba         1994  1.330168e+09
              1995  1.320670e+09
              1996  1.379888e+09
              1997  1.531844e+09
              1998  1.665363e+09
...           ...            ...
Zimbabwe      2012  1.424249e+10
              2013  1.545177e+10
              2014  1.589105e+10
              2015  1.630467e+10
              2016  1.661996e+10

11507 rows × 1 columns
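
The reshape can be tried on a small table first. The sketch below is a minimal illustration with made-up values; `wide` and `long` are hypothetical names. The explicit `dropna()` is an assumption added so that missing years are removed regardless of how the installed pandas version handles NaN in `stack()`. The long table above has 11507 rows, fewer than 264 countries times 58 year columns, which is consistent with missing values having been dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: one row per country, one column per year
wide = pd.DataFrame(
    {"1960": [np.nan, 5.0], "1961": [2.0, 6.0]},
    index=pd.Index(["A", "B"], name="Country Name"),
)

# stack() moves the year columns into a second index level;
# dropna() removes country/year pairs without a value
long = wide.stack().dropna().to_frame("GDP")
print(long)

# Selecting one country returns a table indexed by year
print(long.loc["A"])
```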

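2.3 Setting the Analysis Goals

  • GDP changes in the major countries
  • GDP changes in the major countries from 1990 onward
  • China's GDP growth and cumulative GDP from 1990 onward
  • Years from 1990 onward in which China's GDP grew by more than 10%
  • The five consecutive years in which China's GDP growth was highest

3 GDP Analysis of the Major Countries

Selected countries: ['China', 'Japan','United States', 'Germany', 'France', 'United Kingdom']

3.1 GDP Trends of the Major Countries

Question: which chart type best shows the trend in the data? A line chart.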

import matplotlib
import matplotlib.pyplot as plt
# Font settings, only needed when labels contain Chinese text
matplotlib.rcParams['font.sans-serif'] = ['SimHei']
matplotlib.rcParams['axes.unicode_minus']=False
# Set the figure size
plt.figure(1,figsize=(15, 4))
countrys = ['China', 'Japan','United States', 'Germany', 'France', 'United Kingdom']
for c in countrys:
    # One line per country, indexed by year
    plt.plot(pdata.loc[c])
plt.title("GDP trends of the major countries")
plt.legend(countrys)
_ = plt.xticks(rotation=90)

[Figure: line chart of GDP trends for the six selected countries]

3.2 GDP Comparison from 1990 Onward

import matplotlib.pyplot as plt
plt.figure(1,figsize=(15, 4))
countrys = ['China', 'Japan','United States', 'Germany', 'France', 'United Kingdom']
for c in countrys:
    # Select the country, then slice the years from 1990 onward
    plt.plot(pdata.loc[c]['1990':])
plt.title("GDP trends of the major countries, 1990-2016")
plt.legend(countrys)
_ = plt.xticks(rotation=90)

[Figure: line chart of GDP trends for the six selected countries, 1990-2016]
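
The year slice works because the years are stored as string labels in sorted order. A minimal sketch of label-based slicing, using a hypothetical series `s` with made-up values and `.loc` for explicitness:

```python
import pandas as pd

# Hypothetical series indexed by year strings
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=["1988", "1989", "1990", "1991"])

# Open-ended label slice: everything from "1990" onward
from_1990 = s.loc["1990":]

# Label slices include both end points, unlike positional slices
window = s.loc["1989":"1990"]

print(from_1990)
print(window)
```

Comparing years as strings is safe here only because every label has four digits.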

4 China GDP Analysis

4.1 GDP Change from 1990 Onward

import matplotlib.pyplot as plt
# Font settings, only needed when labels contain Chinese text
plt.rcParams['font.family'] = ['sans-serif']
plt.rcParams['font.sans-serif'] = ['SimHei']
_ = plt.figure(1,figsize=(15, 4))
# Country to plot
countrys = ['China']
# Scale factor: 10000*10000*10 = 1e9, so values are plotted in billions of US$
base1 = 10000*10000*10
for c in countrys:
    # China, from 1990 onward
    data = pdata.loc[c]['1990':]
    plt.plot(data/base1, label='Annual GDP')
    # Running total
    plt.plot(data.cumsum()/base1,label='Cumulative GDP', color='r')
plt.legend()
_ = plt.xticks(rotation=90)

[Figure: China's annual and cumulative GDP from 1990 onward, in billions of US$]
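
The cumulative line is a running total of the annual values. A small sketch with made-up numbers (the series `annual` is hypothetical) shows what `cumsum()` returns:

```python
import pandas as pd

# Hypothetical annual values indexed by year
annual = pd.Series([2.0, 3.0, 5.0], index=["1990", "1991", "1992"])

# Each entry is the sum of all values up to and including that year
cumulative = annual.cumsum()
print(cumulative)
```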

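4.2 Years in Which China's GDP Grew by More Than 10%

Question: which technique can be used to compute the growth rate for each year?

# copy() so that a column can be added later without modifying a view of pdata
gdp = pdata.loc['China']['1990':].copy()
gdp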
               GDP
1990  3.608580e+11
1991  3.833730e+11
1992  4.269160e+11
1993  4.447310e+11
1994  5.643250e+11
1995  7.345480e+11
1996  8.637470e+11
1997  9.616040e+11
1998  1.029040e+12
1999  1.094000e+12
2000  1.211350e+12
2001  1.339400e+12
2002  1.470550e+12
2003  1.660290e+12
2004  1.955350e+12
2005  2.285970e+12
2006  2.752130e+12
2007  3.552180e+12
2008  4.598210e+12
2009  5.109950e+12
2010  6.100620e+12
2011  7.572550e+12
2012  8.560550e+12
2013  9.607220e+12
2014  1.048240e+13
2015  1.106470e+13
2016  1.119910e+13

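4.2.1 Approaches to Computing the Yearly Growth Rate

Approach 1: loop over the years and compute (next year - previous year) / previous year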
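
A minimal sketch of the loop approach in plain Python follows. The function name `yearly_growth` and the sample values are hypothetical; the first year is given 0 because it has no previous year to compare against.

```python
def yearly_growth(values):
    """Return the growth rate in percent for each year; the first year gets 0.0."""
    rates = [0.0]
    # Pair each year with the one that follows it
    for prev, curr in zip(values, values[1:]):
        rates.append((curr - prev) / prev * 100)
    return rates


sample = [100.0, 150.0, 75.0]  # made-up values
rates = yearly_growth(sample)
print(rates)
```

Note the parentheses around the difference: without them the division would be applied before the subtraction.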

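Approach 2: vectorized computation with numpy

  • Previous-year data tmp1: [start : end-1]
  • Next-year data tmp2: [second year : end]
  • Result: (tmp2 - tmp1) / tmp1 * 100

# Take the previous-year and next-year series, in millions of US$
tmp1 = gdp.loc[:'2015']['GDP']/1000000
tmp2 = gdp.loc['1991':]['GDP']/1000000
# Convert to integers (this truncates the fractional part)
tmp1 = tmp1.astype('i')
tmp2 = tmp2.astype('i')
# Growth rate in percent
index = (tmp2.values-tmp1.values)/tmp1.values*100
# The first year has no previous year, so insert 0 for it
grow = np.insert(index, 0, 0)
# Add the result as a new column
gdp['grow'] = grow
gdp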
               GDP        grow
1990  3.608580e+11    0.000000
1991  3.833730e+11    6.239296
1992  4.269160e+11   11.357868
1993  4.447310e+11    4.172952
1994  5.643250e+11   26.891312
1995  7.345480e+11   30.164001
1996  8.637470e+11   17.588912
1997  9.616040e+11   11.329359
1998  1.029040e+12    7.012866
1999  1.094000e+12    6.312680
2000  1.211350e+12   10.726691
2001  1.339400e+12   10.570851
2002  1.470550e+12    9.791698
2003  1.660290e+12   12.902655
2004  1.955350e+12   17.771594
2005  2.285970e+12   16.908482
2006  2.752130e+12   20.392219
2007  3.552180e+12   29.070211
2008  4.598210e+12   29.447551
2009  5.109950e+12   11.129113
2010  6.100620e+12   19.387078
2011  7.572550e+12   24.127548
2012  8.560550e+12   13.047124
2013  9.607220e+12   12.226668
2014  1.048240e+13    9.109607
2015  1.106470e+13    5.555026
2016  1.119910e+13    1.214674
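
As an alternative to slicing two shifted arrays, pandas offers `pct_change()`, which computes the same year-over-year ratio in one call. This is a swapped-in technique rather than the approach used in this article, and `gdp_demo` with its values is hypothetical. Because the article's code truncates to whole millions before dividing, its results may differ slightly from an untruncated calculation like this one.

```python
import pandas as pd

# Hypothetical GDP column indexed by year
gdp_demo = pd.DataFrame({"GDP": [100.0, 150.0, 75.0]}, index=["1990", "1991", "1992"])

# pct_change() returns (current / previous) - 1; the first row is NaN,
# so it is filled with 0 before converting to percent
gdp_demo["grow"] = gdp_demo["GDP"].pct_change().fillna(0) * 100
print(gdp_demo)
```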

4.2.2 Selecting the Years with Growth Above 10%

vals = gdp[gdp.grow > 10]
vals
               GDP        grow
1992  4.269160e+11   11.357868
1994  5.643250e+11   26.891312
1995  7.345480e+11   30.164001
1996  8.637470e+11   17.588912
1997  9.616040e+11   11.329359
2000  1.211350e+12   10.726691
2001  1.339400e+12   10.570851
2003  1.660290e+12   12.902655
2004  1.955350e+12   17.771594
2005  2.285970e+12   16.908482
2006  2.752130e+12   20.392219
2007  3.552180e+12   29.070211
2008  4.598210e+12   29.447551
2009  5.109950e+12   11.129113
2010  6.100620e+12   19.387078
2011  7.572550e+12   24.127548
2012  8.560550e+12   13.047124
2013  9.607220e+12   12.226668

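4.3 The Five-Year Window with the Highest Cumulative Growth Rate

Problem: cumulative growth over five consecutive years = year 1 + year 2 + year 3 + ... + year 5

How can this be computed, and how do we find the maximum?

4.3.1 rolling: the Moving-Window Method

Use cases: finance, stock prices, statistics, and so on.

# Maximum of every 2 consecutive values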
tmp = pd.Series([1,2,3,1,1,0])
tmp.rolling(2).max()
0    NaN
1    2.0
2    3.0
3    3.0
4    1.0
5    1.0
dtype: float64
tmp = pd.Series([1,2,3,1,1,0])
print(tmp.rolling(2).min_periods)
None
tmp = gdp.rolling(5).sum()
tmp
               GDP        grow
1990           NaN         NaN
1991           NaN         NaN
1992           NaN         NaN
1993           NaN         NaN
1994  2.180203e+12   48.661428
1995  2.553893e+12   78.825430
1996  3.034267e+12   90.175045
1997  3.568955e+12   90.146536
1998  4.153264e+12   92.986450
1999  4.682939e+12   72.407818
2000  5.159741e+12   52.970508
2001  5.635394e+12   45.952447
2002  6.144340e+12   44.414785
2003  6.775590e+12   50.304575
2004  7.636940e+12   61.763489
2005  8.711560e+12   67.945280
2006  1.012429e+13   77.766648
2007  1.220592e+13   97.045161
2008  1.514384e+13  113.590056
2009  1.829844e+13  106.947575
2010  2.211309e+13  109.426172
2011  2.693351e+13  113.161501
2012  3.194188e+13   97.138414
2013  3.695089e+13   79.917531
2014  4.232334e+13   77.898025
2015  4.728742e+13   64.065972
2016  5.091397e+13   41.153098
tmp[tmp['grow']==tmp.grow.max()]
               GDP        grow
2008  1.514384e+13  113.590056
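
Each row of a rolling sum covers the window that ends at that row, so the 2008 row above stands for the years 2004 to 2008. The sketch below shows the same pattern on a short made-up series with a window of 3, and uses `idxmax()` as one possible way to read off the label of the best window directly. The names `grow_demo`, `window_sum`, `best_end` and `best_total` are hypothetical.

```python
import pandas as pd

# Hypothetical yearly growth rates in percent
grow_demo = pd.Series(
    [0.0, 5.0, 20.0, 10.0, 1.0],
    index=["2000", "2001", "2002", "2003", "2004"],
)

# Sum over each window of 3 consecutive years; the first two rows are NaN
# because fewer than 3 values are available there
window_sum = grow_demo.rolling(3).sum()

# idxmax() skips NaN and returns the label of the last year in the best window
best_end = window_sum.idxmax()
best_total = window_sum.max()
print(window_sum)
print(best_end, best_total)
```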