Data Analysis: The 2012 US Presidential Election as an Example

qq_28368825

Modified on 2022-03-06 12:55:58

Tags: python

First published on 2022-03-06 10:38:40

Article link: https://blog.csdn.net/qq_28368825/article/details/123306566

Import the libraries:

import numpy as np
import pandas as pd

Then load the data file:

td=pd.read_csv('./usa_election.txt')
td

Inspect the data:

td.info()

Summarize the numeric columns:

td.describe()
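The missing values can also be counted per column directly instead of being read off the info() output. A minimal sketch on a few made-up rows (the candidate labels are placeholders; only the column names come from the dataset):

```python
import numpy as np
import pandas as pd

# Made-up rows; 'Candidate A/B' are placeholder labels, not real data.
df = pd.DataFrame({
    'cand_nm': ['Candidate A', 'Candidate B', 'Candidate A'],
    'contbr_occupation': ['RETIRED', np.nan, 'DISABLED VETERAN'],
    'contb_receipt_amt': [250.0, 100.0, 50.0],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing = df.isnull().sum()
print(missing[missing > 0])  # only contbr_occupation, with 1 missing value
```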

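The td.info() output shows that some columns have fewer non-null entries than there are rows, i.e. they contain missing values. These gaps come from confidentiality or other reasons, so we fill them with the placeholder 'NOT PROVIDED'.

td.fillna(value='NOT PROVIDED',inplace=True)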
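Filling the whole DataFrame with a string also writes that string into any numeric column that has gaps, which turns the column into text. A minimal sketch, assuming only the non-numeric columns should receive the placeholder (the rows below are made up):

```python
import numpy as np
import pandas as pd

# Made-up rows: one missing occupation and one missing amount.
df = pd.DataFrame({
    'contbr_occupation': ['RETIRED', np.nan],
    'contb_receipt_amt': [250.0, np.nan],
})

# Pick only the non-numeric columns, so numeric columns keep their dtype.
text_cols = df.select_dtypes(exclude='number').columns
df[text_cols] = df[text_cols].fillna('NOT PROVIDED')

print(df)
```

The amount column stays float64 with its NaN intact, so it can still be summed or compared with numbers later.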

Inspecting the data shows that some values of 'contb_receipt_amt' (the contribution amount) are negative; these rows need to be removed.

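# Boolean mask: True where the contribution amount is negative
td['contb_receipt_amt']<0
# Show the rows with negative amounts
td.loc[td['contb_receipt_amt']<0]
# Collect their index labels and drop those rows in place
drop_indexs=td.loc[td['contb_receipt_amt']<0].index
td.drop(labels=drop_indexs,axis=0,inplace=True)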
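The mask-then-drop steps can also be written as a single boolean filter that keeps the non-negative rows. A minimal sketch on made-up amounts, checking that both forms give the same result:

```python
import pandas as pd

# Made-up amounts, including one negative value.
df = pd.DataFrame({'contb_receipt_amt': [250.0, -50.0, 100.0, 0.0]})

# Keep rows whose amount is non-negative.
cleaned = df[df['contb_receipt_amt'] >= 0]

# The drop-based version, for comparison.
dropped = df.drop(index=df.loc[df['contb_receipt_amt'] < 0].index)

print(cleaned.equals(dropped))  # True
```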

Convert the dates to yyyy-mm-dd format.
First, build a dictionary that maps month abbreviations to two-digit month numbers:

months={'JAN':'01',
        'FEB':'02',
        'MAR':'03',
        'APR':'04',
        'MAY':'05',
        'JUN':'06',
        'JUL':'07',
        'AUG':'08',
        'SEP':'09',
        'OCT':'10',
        'NOV':'11',
        'DEC':'12'}
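Instead of typing the twelve entries by hand, the same dictionary can be derived from the standard library. A minimal sketch, assuming the default C/English locale is active (calendar.month_abbr follows the current locale, so under another locale the keys would differ):

```python
import calendar

# calendar.month_abbr[1..12] holds abbreviated month names ('Jan', 'Feb', ...)
# for the current locale; index 0 is an empty string, so it is skipped.
# Assumption: the default C/English locale is active.
months = {calendar.month_abbr[i].upper(): f'{i:02d}' for i in range(1, 13)}

print(months['JUN'])  # 06
```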

Then apply the conversion to the date column:

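def transformData(f):
    # Raw values look like '20-JUN-11': day, month abbreviation, two-digit year
    day,month,year=f.split('-')
    month=months[month]
    # Assumes every year is in the 2000s, so '11' becomes '2011'
    return '20'+year+'-'+month+'-'+day
# Apply the conversion to every value in the date column
td['contb_receipt_dt']=td['contb_receipt_dt'].map(transformData)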
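pandas can also parse this layout directly with pd.to_datetime, which returns datetime values rather than strings (useful for later filtering or grouping by date). A minimal sketch, assuming the raw strings follow the same day-MON-two-digit-year layout that transformData expects; the sample values are made up:

```python
import pandas as pd

# Made-up raw values in the DD-MON-YY layout.
raw = pd.Series(['20-JUN-11', '05-JAN-12'])

# %d = day, %b = abbreviated month name (matched case-insensitively),
# %y = two-digit year (11 -> 2011).
dates = pd.to_datetime(raw, format='%d-%b-%y')

print(dates.dt.strftime('%Y-%m-%d').tolist())  # ['2011-06-20', '2012-01-05']
```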

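Which candidate received the most money from disabled veterans? Select the contributors whose occupation is 'DISABLED VETERAN' and total their contributions by candidate: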

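# Boolean mask: True for contributors whose occupation is 'DISABLED VETERAN'
td['contbr_occupation']=='DISABLED VETERAN'
# Keep only those rows
veteran=td.loc[td['contbr_occupation']=='DISABLED VETERAN']
# Total contribution amount per candidate
veteran.groupby(by='cand_nm')['contb_receipt_amt'].sum()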
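The grouped sum lists candidates in alphabetical order; sorting the totals makes the top recipient easier to spot. A minimal sketch on made-up rows (the candidate labels are placeholders, not real results):

```python
import pandas as pd

# Made-up rows; 'Candidate A/B/C' are placeholder labels, not real data.
df = pd.DataFrame({
    'cand_nm': ['Candidate A', 'Candidate B', 'Candidate A', 'Candidate C'],
    'contbr_occupation': ['DISABLED VETERAN', 'DISABLED VETERAN',
                          'DISABLED VETERAN', 'RETIRED'],
    'contb_receipt_amt': [100.0, 300.0, 150.0, 50.0],
})

veteran = df.loc[df['contbr_occupation'] == 'DISABLED VETERAN']

# Total per candidate, largest first.
totals = (veteran.groupby('cand_nm')['contb_receipt_amt']
          .sum()
          .sort_values(ascending=False))

print(totals)
print(totals.idxmax())  # Candidate B
```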