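2016 US Presidential Election Poll Data: Statistics and Analysis

Overview

Data source: https://www.kaggle.com/fivethirtyeight/2016-election-polls
Downloading the data requires registering and logging in, which is a bit of a hassle, so for convenience I have exported the table needed for this analysis.
Link: https://pan.baidu.com/s/1IasBj6DcqXvFkJox4Zg2VQ?pwd=7ctn
Extraction code: 7ctn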

Signature of the function used to read the CSV file:

loadtxt(fname, dtype = float, comments = '#', delimiter = None, converters = None, skiprows = 0,
        usecols = None, unpack = False, ndmin = 0, encoding = 'bytes')

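Main parameters:

Parameter    Description
fname        Name of the CSV file to read
dtype        Data type of the result; defaults to float
comments     Marker that starts a comment
delimiter    Separator between values; defaults to whitespace
converters   Functions that convert element types
skiprows     Number of leading rows to skip; defaults to 0 and must be an int
usecols      Which columns to read; 0 is the first column
unpack       If True, the columns are returned as separate arrays
ndmin        Minimum number of dimensions of the returned array
encoding     Encoding to use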
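
To make the parameters above concrete, here is a minimal sketch that reads a tiny in-memory CSV. The column names and values are made up for illustration and are not taken from the election data set.

```python
import io
import numpy as np

# A tiny in-memory CSV standing in for a real file (made-up values).
sample = io.StringIO(
    "name,day,score\n"
    "a,1,10.5\n"
    "b,2,20.0\n"
    "c,3,30.25\n"
)

# skiprows=1 skips the header row; usecols picks columns 1 and 2 (0-based);
# unpack=True returns one array per column instead of a single 2-D array.
day, score = np.loadtxt(sample, delimiter=",", skiprows=1,
                        usecols=(1, 2), unpack=True)

print(day)
print(score)
```

Because dtype is left at its default, both columns come back as float arrays. Reading text columns as well would need dtype=str, which is what the steps below do.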

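Task

Using what we have learned about NumPy, compute statistics on the 2016 US election data: sum the adjusted poll figures for Clinton and Trump by month and print the monthly totals from 2015-11 through 2016-08.

1: Import modules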

import datetime as dt
import pandas as pd
import numpy as np
import csv

2: Load the data

Read the three columns we need: the end date, Clinton's poll figures, and Trump's poll figures.

data = np.loadtxt('presidential_polls.csv',dtype=str,usecols=(7,17,18),delimiter=',')

Inspect the data

print(data)
[['enddate' 'adjpoll_clinton' 'adjpoll_trump']
 ['10/31/2016' '42.6414' '40.86509']
 ['10/30/2016' '43.29659' '44.72984']
 ...
 ['9/22/2016' '45.9713' '39.97518']
 ['6/21/2016' '45.2939' '46.66175']
 ['8/18/2016' '31.62721' '44.65947']]
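
As an alternative, the same three columns could be selected by header name instead of by position (7, 17, 18) using the standard csv module. This is only a sketch on an in-memory sample: the state column and its values are invented for illustration, while the other three header names are the ones shown in the output above.

```python
import csv
import io

# In-memory stand-in for the CSV file. The 'state' column is hypothetical;
# note the quoted field that contains a comma.
sample = io.StringIO(
    'state,enddate,adjpoll_clinton,adjpoll_trump\n'
    'U.S.,10/31/2016,42.6414,40.86509\n'
    '"Springfield, XX",10/30/2016,43.29659,44.72984\n'
)

# DictReader maps each row to its header names, so columns are picked by name.
rows = [
    [r['enddate'], r['adjpoll_clinton'], r['adjpoll_trump']]
    for r in csv.DictReader(sample)
]

for row in rows:
    print(row)
```

Selecting by name keeps working if the column order changes, and csv.DictReader also copes with quoted fields that contain commas.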

3: Data processing

Convert the data to a list and drop the header row.

data_poll = data.tolist()[1:]

Inspect the data

i = 0
for x in data_poll:
    i += 1
    print(x, end = ' ')
    if i % 3 == 0:
        print()
['10/31/2016', '42.6414', '40.86509'] ['10/30/2016', '43.29659', '44.72984'] ['10/30/2016', '46.29779', '40.72604'] 
['10/24/2016', '46.35931', '45.30585'] ['10/25/2016', '45.32744', '42.20888'] ['10/25/2016', '44.6508', '42.26663'] 
['10/31/2016', '46.21834', '43.56017'] ['10/30/2016', '46.89049', '43.50333'] ['10/27/2016', '41.22576', '37.24948'] 
['10/31/2016', '42.21983', '41.6954'] ['10/31/2016', '44.53217', '43.84845'] ['10/27/2016', '41.81832', '47.92262'] 
['10/23/2016', '55.68839', '29.50605'] ['10/30/2016', '43.31551', '40.34972'] ['10/26/2016', '45.20793', '42.01937'] 
['10/26/2016', '43.19458', '45.07725'] ['10/24/2016', '50.18283', '39.33826'] ['10/24/2016', '42.67789', '46.11255'] 
['10/28/2016', '47.77047', '39.80679'] ['10/25/2016', '45.74354', '41.34735'] ['10/17/2016', '46.84417', '39.99571'] 
['10/28/2016', '38.51061', '50.7572'] ['10/28/2016', '41.75385', '38.87231'] ['10/26/2016', '45.63602', '41.55637'] 
['10/28/2016', '41.76', '43.84806'] ['10/26/2016', '45.78602', '45.0337'] ['10/28/2016', '47.77576', '44.78595'] 
['10/31/2016', '44.50363', '44.1804'] ['10/30/2016', '45.66489', '40.41809'] ['10/25/2016', '38.42823', '49.47709'] 
['10/24/2016', '43.04313', '38.24964'] ['10/30/2016', '44.7114', '41.14791'] ['10/30/2016', '42.7114', '46.14791'] 
['10/30/2016', '44.7114', '45.14791'] ['10/30/2016', '45.7114', '40.14791'] ['10/30/2016', '46.38828', '44.13978'] 
['10/23/2016', '45.74498', '41.64333'] ['10/24/2016', '43.73338', '39.62985'] ['10/24/2016', '45.73579', '46.35058'] 
['10/30/2016', '46.7114', '41.14791'] ['10/26/2016', '44.08772', '44.58124'] ['10/30/2016', '48.63733', '43.2056'] 
['10/26/2016', '47.3517', '39.01773'] ['10/18/2016', '47.20443', '41.15833'] ['10/22/2016', '42.63636', '46.87188'] 
['10/24/2016', '46.95812', '39.36292'] ['10/25/2016', '43.94143', '42.24168'] ['10/30/2016', '43.7114', '46.14791'] 
['10/30/2016', '45.63095', '40.2258'] ['10/31/2016', '42.73449', '45.10929'] ['10/24/2016', '41.09849', '44.92952'] 
['10/30/2016', '45.78382', '40.53563'] ['10/29/2016', '39.79761', '50.76878'] ['10/30/2016', '48.5308', '39.71922'] 
......
['6/12/2016', '46.06344', '38.65057'] ['10/19/2016', '31.53417', '29.49314'] ['11/15/2015', '47.57453', '37.87221'] 
['2/21/2016', '46.96003', '39.42957'] ['10/13/2016', '38.10209', '53.95455'] ['7/19/2016', '50.60115', '33.0715'] 
['7/11/2016', '43.29751', '41.88533'] ['8/16/2016', '29.94538', '36.82408'] ['9/22/2016', '30.45553', '47.80848'] 
['8/10/2016', '42.62525', '42.01089'] ['1/7/2016', '42.07473', '45.06726'] ['8/4/2016', '26.74404', '40.16534'] 
['7/11/2016', '40.33774', '41.5603'] ['10/13/2016', '37.30964', '54.76821'] ['10/6/2016', '49.13094', '39.41588'] 
['9/22/2016', '45.9713', '39.97518'] ['6/21/2016', '45.2939', '46.66175'] ['8/18/2016', '31.62721', '44.65947'] 

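1: Date processing

Convert the dates from mm/dd/yyyy to yyyy-mm format.
Key points:
(1) %y: two-digit year (00 - 99)
(2) %Y: four-digit year (0001 - 9999)
(3) %m: month (01 - 12)
(4) %d: day of the month (01 - 31)
1: Extract the dates with a list comprehension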

date = [i[0] for i in data_poll] 

2: Parse each date string into a datetime object, matching the month, day, and year with %m, %d, and %Y
date1 = [dt.datetime.strptime(d,'%m/%d/%Y') for d in date]

3: Keep only the year and month, formatted as yyyy-mm
date2 = [i.strftime('%Y-%m') for i in date1]
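
The parse-then-format round trip can be tried on a few values in isolation. This is a small sketch; the helper name to_year_month is introduced here for illustration, and the sample strings are dates that appear in the output above.

```python
import datetime as dt

def to_year_month(text):
    """Convert an 'mm/dd/yyyy' string to 'yyyy-mm'."""
    parsed = dt.datetime.strptime(text, '%m/%d/%Y')
    return parsed.strftime('%Y-%m')

samples = ['10/31/2016', '1/7/2016', '11/15/2015']
converted = [to_year_month(s) for s in samples]
print(converted)  # ['2016-10', '2016-01', '2015-11']
```

Because strftime zero-pads the month, the resulting strings sort chronologically when sorted as plain text, which is convenient for grouping by month later.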

2: Poll data processing

1: Process Clinton's poll data: loop over the values once, replace any empty entry with zero, then convert the data to floating point

Clinton_poll = [i[1] for i in data_poll]

for i in range(len(Clinton_poll)):
    if Clinton_poll[i] =='':
        Clinton_poll[i]='0'
Clinton_poll_arr = np.array( Clinton_poll,dtype=np.float64)

2: Process Trump's poll data the same way: loop over the values once, replace any empty entry with zero, then convert the data to floating point

Trump_poll = [i[2] for i in data_poll]
for i in range(len(Trump_poll)):
    if Trump_poll[i] =='':
        Trump_poll[i]='0'

Trump_poll_arr = np.array(Trump_poll,dtype=np.float64)
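
The replace-blank-then-convert step could also be written without an explicit loop. The sketch below uses np.where on a made-up three-element sample; it is an alternative formulation, not the code used in this post.

```python
import numpy as np

# Made-up sample with one blank entry.
raw = np.array(['42.6414', '', '40.86509'])

# Where the string is empty use '0', otherwise keep the original string.
filled = np.where(raw == '', '0', raw)

# Convert the cleaned strings to floats in one step.
values = filled.astype(np.float64)
print(values)
```

Filling blanks with zero leaves a sum unchanged, which is all the later steps need. It would distort an average, so a mean would call for a different treatment of missing values.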

3: Use a DataFrame to combine the dates, Clinton's poll data, and Trump's poll data into a single two-dimensional table

my_data = pd.DataFrame({'Date':date2, 'Clinton': Clinton_poll_arr, 'Trump':Trump_poll_arr},
                       columns=['Date','Clinton','Trump'])

4: Sum Clinton's data by month and print the figures for 2015-11 through 2016-08

sum_Clinton = my_data['Clinton'].groupby(my_data['Date']).sum()

print('Clinton poll data from 2015-11 to 2016-08:')
i = 0
for key_value in sum_Clinton.items():
    i += 1
    print(key_value)
    # the groups are sorted by date, so the first 10 cover 2015-11 to 2016-08
    if i == 10:
        break

5: Sum Trump's data by month and print the figures for 2015-11 through 2016-08

sum_Trump = my_data['Trump'].groupby(my_data['Date']).sum()
print('Trump poll data from 2015-11 to 2016-08:')
i = 0
for key_value in sum_Trump.items():
    i += 1
    print(key_value)
    # the groups are sorted by date, so the first 10 cover 2015-11 to 2016-08
    if i == 10:
        break
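
Stopping after the first 10 groups works only if the earliest month in the data is 2015-11. A possible alternative is to select the range by label, sketched below on a small made-up DataFrame with the same column names. The values are invented and are not the real poll data.

```python
import pandas as pd

# Made-up rows, including months outside the range of interest.
my_data = pd.DataFrame({
    'Date': ['2015-10', '2015-11', '2015-11', '2016-08', '2016-09'],
    'Clinton': [40.0, 45.0, 46.0, 44.0, 43.0],
    'Trump': [41.0, 40.0, 39.0, 42.0, 44.0],
})

# Sum both columns per month; groupby sorts the month labels by default.
monthly = my_data.groupby('Date')[['Clinton', 'Trump']].sum()

# Label-based slicing with .loc includes both end points.
selected = monthly.loc['2015-11':'2016-08']
print(selected)
```

Here both candidates are summed in one groupby call, and the slice keeps only the months between the two labels regardless of how many other months the file contains.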

Complete code

import numpy as np
import pandas as pd
import datetime as dt
import csv

data = np.loadtxt('presidential_polls.csv',dtype=str,usecols=(7,17,18),delimiter=',')
data_poll = data.tolist()[1:]

date = [i[0] for i in data_poll]
date1 = [dt.datetime.strptime(d,'%m/%d/%Y') for d in date]
date2 = [i.strftime('%Y-%m') for i in date1]

Clinton_poll = [i[1] for i in data_poll]

for i in range(len(Clinton_poll)):
    if Clinton_poll[i] =='':
        Clinton_poll[i]='0'
Clinton_poll_arr = np.array( Clinton_poll,dtype=np.float64)

Trump_poll = [i[2] for i in data_poll]
for i in range(len(Trump_poll)):
    if Trump_poll[i] =='':
        Trump_poll[i]='0'

Trump_poll_arr = np.array(Trump_poll,dtype=np.float64)

my_data = pd.DataFrame({'Date':date2, 'Clinton': Clinton_poll_arr, 'Trump':Trump_poll_arr},
                       columns=['Date','Clinton','Trump'])
sum_Clinton = my_data['Clinton'].groupby(my_data['Date']).sum()

print('Clinton poll data from 2015-11 to 2016-08:')
i = 0
for key_value in sum_Clinton.items():
    i += 1
    print(key_value)
    if i == 10:
        break

print('------------------------------------------')

sum_Trump = my_data['Trump'].groupby(my_data['Date']).sum()
print('Trump poll data from 2015-11 to 2016-08:')
i = 0
for key_value in sum_Trump.items():
    i += 1
    print(key_value)
    if i == 10:
        break



Clinton poll data from 2015-11 to 2016-08:
('2015-11', 1916.6980600000002)
('2015-12', 4637.256880000004)
('2016-01', 6585.16702)
('2016-02', 7946.2286100000065)
('2016-03', 11156.098239999998)
('2016-04', 11579.426779999998)
('2016-05', 12242.275380000008)
('2016-06', 19771.335760000005)
('2016-07', 23233.11167999999)
('2016-08', 67909.28209999984)
------------------------------------------
Trump poll data from 2015-11 to 2016-08:
('2015-11', 1937.3290100000002)
('2015-12', 4088.921899999999)
('2016-01', 6253.249349999999)
('2016-02', 7672.339800000001)
('2016-03', 9991.593580000008)
('2016-04', 9884.156190000002)
('2016-05', 12069.761289999995)
('2016-06', 18154.906229999993)
('2016-07', 22757.07327000001)
('2016-08', 66428.29714000005)

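Summary

The exercise did not go entirely smoothly. Getting the data from the website in particular took many different attempts before it worked, much like the line by Lu You:
"Hills upon hills and winding streams, no road ahead it seems; then shady willows, bright flowers, and another village appears."
After that, the rest of the process went fairly smoothly.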
