Data preparation
For the visualizations we mainly use the president county candidate table (./data/president_county_candidate.csv).
Data operations
Importing the data
import numpy as np
import pandas as pd
import plotly.express as px

usa_election = pd.read_csv("./data/president_county_candidate.csv")
usa_election
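From this data we can see that the US election table covers 51 states (state, i.e. the 50 states plus the District of Columbia), a large number of counties (county), the names of the candidates (candidate: Joe Biden, Donald Trump, Jo Jorgensen, Howie Hawkins), the party each candidate represents (party), the total number of votes received (total_votes), and whether the candidate won that county (won).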
usa_election.info()
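To check the structure described above without the real file, here is a minimal sketch that reads a few made-up rows with the same columns and summarizes them. SAMPLE_CSV and summarize_election are hypothetical names, and the numbers are invented for illustration only. Counties are counted as (state, county) pairs because the same county name can appear in more than one state.

```python
import io

import pandas as pd

# Made-up rows with the same columns as president_county_candidate.csv
# (illustrative only, not real election results).
SAMPLE_CSV = """state,county,candidate,party,total_votes,won
State A,County 1,Joe Biden,DEM,100,False
State A,County 1,Donald Trump,REP,150,True
State A,County 1,Jo Jorgensen,LIB,10,False
State A,County 2,Joe Biden,DEM,80,True
State A,County 2,Donald Trump,REP,60,False
State B,County 1,Joe Biden,DEM,30,False
State B,County 1,Donald Trump,REP,70,True
"""


def summarize_election(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count states and counties, list candidates."""
    return {
        "n_states": df["state"].nunique(),
        # A county name is only unique within its state, so count pairs
        "n_counties": len(df[["state", "county"]].drop_duplicates()),
        "candidates": sorted(df["candidate"].unique()),
        # read_csv parses the "True"/"False" strings into a boolean column
        "won_dtype": str(df["won"].dtype),
    }


sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(summarize_election(sample))
```

In this sample, "County 1" exists in both states, so counting county names alone would report 2 counties instead of 3.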
Define a function that takes a county name and a state name and plots how many votes each candidate received in that county.
def county_result(county, state):
    # Keep only the rows for the requested county within the requested state
    data = usa_election[(usa_election["county"] == county) & (usa_election["state"] == state)]
    fig = px.bar(data, x="candidate", y="total_votes")
    fig.update_layout(
        title={
            # f-string so the actual county and state names appear in the title
            "text": f"Election result in {county} ({state})",
            "y": 0.95,
            "x": 0.5,
        },
        xaxis_title="candidate",
        yaxis_title="total_votes",
    )
    fig.show()
county_result("Kent County","Delaware")
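A misspelled county or state name makes the boolean filter in county_result match no rows, and the result is an empty chart rather than an error. Below is a minimal sketch of the filtering step on its own that fails loudly instead; county_votes is a hypothetical helper and the rows are made up.

```python
import pandas as pd

# Made-up rows in the same layout as usa_election (not real results)
votes = pd.DataFrame(
    {
        "state": ["State A", "State A", "State B", "State B"],
        "county": ["County 1", "County 1", "County 1", "County 1"],
        "candidate": ["Joe Biden", "Donald Trump", "Joe Biden", "Donald Trump"],
        "total_votes": [100, 150, 30, 70],
    }
)


def county_votes(df: pd.DataFrame, county: str, state: str) -> pd.Series:
    """Return candidate -> total_votes for one county, largest first."""
    # Filter on both columns: the same county name can exist in several states
    mask = (df["county"] == county) & (df["state"] == state)
    result = (
        df.loc[mask]
        .set_index("candidate")["total_votes"]
        .sort_values(ascending=False)
    )
    if result.empty:
        raise ValueError(f"no rows for {county} ({state})")
    return result


print(county_votes(votes, "County 1", "State A"))
```

Keeping the selection separate from the plotting also makes it easy to inspect the numbers before drawing the bar chart.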
For data like this, we pull out DEM (Joe Biden's party) and REP (Donald Trump's party) separately and analyze them.
DEM data processing
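# Aggregate the Democratic (DEM) votes by state
dem_election = usa_election[usa_election["party"] == "DEM"]
# Sum only the numeric columns; "won" is boolean, so its sum counts the counties won
dem_election = dem_election.groupby("state")[["total_votes", "won"]].sum()
dem_election.rename(columns={"total_votes": "dem_votes", "won": "dem_won_counties"}, inplace=True)
dem_election.head(5)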
REP data processing
# Aggregate the Republican (REP) votes by state
rep_election = usa_election[usa_election["party"] == "REP"]
rep_election = rep_election.groupby("state")[["total_votes", "won"]].sum()
rep_election.rename(columns={"total_votes": "rep_votes", "won": "rep_won_counties"}, inplace=True)
rep_election.head(5)
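This gives us the datasets we need: for each state, how many counties DEM and REP won and exactly how many votes each received.
Plot the total votes for DEM and REP and their share of all votes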
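The two filter-and-group passes above can also be written as a single pivot_table call over both parties. A minimal sketch on made-up rows follows; party_totals_by_state is a hypothetical name. fill_value=0 covers a state where one party has no rows, and won is cast to int so that summing it counts the counties won.

```python
import pandas as pd

# Made-up rows in the same layout as usa_election (not real results)
rows = pd.DataFrame(
    {
        "state": ["State A"] * 5 + ["State B"] * 2,
        "county": ["County 1"] * 3 + ["County 2"] * 2 + ["County 1"] * 2,
        "party": ["DEM", "REP", "LIB", "DEM", "REP", "DEM", "REP"],
        "total_votes": [100, 150, 10, 80, 60, 30, 70],
        "won": [False, True, False, True, False, False, True],
    }
)


def party_totals_by_state(df: pd.DataFrame) -> pd.DataFrame:
    """DEM/REP vote totals and counties won, one row per state."""
    subset = df[df["party"].isin(["DEM", "REP"])].assign(
        won=lambda d: d["won"].astype(int)
    )
    table = subset.pivot_table(
        index="state",
        columns="party",
        values=["total_votes", "won"],
        aggfunc="sum",
        fill_value=0,
    )
    # Flatten the (value, party) column MultiIndex into the names used above
    return pd.DataFrame(
        {
            "dem_votes": table[("total_votes", "DEM")],
            "rep_votes": table[("total_votes", "REP")],
            "dem_won_counties": table[("won", "DEM")],
            "rep_won_counties": table[("won", "REP")],
        }
    )


print(party_totals_by_state(rows))
```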
def most_voted_candidate():
    total_votes_dem = dem_election["dem_votes"].sum()
    total_votes_rep = rep_election["rep_votes"].sum()
    # Votes cast for all candidates, including third parties
    total_vote = usa_election["total_votes"].sum()
    print(f"total votes for Joe Biden (DEM): {total_votes_dem}")
    print(f"total votes for Donald Trump (REP): {total_votes_rep}")

    # Absolute vote totals
    fig = px.bar(x=["DEM", "REP"], y=[total_votes_dem, total_votes_rep])
    fig.update_layout(
        title={"text": "total votes for dem and rep", "y": 0.95, "x": 0.5},
        xaxis_title="parties",
        yaxis_title="total votes",
    )
    fig.show()

    # Share of all votes cast, multiplied by 100 to match the "% votes" axis label
    fig = px.bar(
        x=["DEM", "REP"],
        y=[total_votes_dem / total_vote * 100, total_votes_rep / total_vote * 100],
    )
    fig.update_layout(
        title={"text": "vote share for dem and rep", "y": 0.95, "x": 0.5},
        xaxis_title="parties",
        yaxis_title="% votes",
    )
    fig.show()


most_voted_candidate()
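Because total_vote sums the votes of every candidate, including third parties such as Jo Jorgensen and Howie Hawkins, the DEM and REP shares do not add up to 100%. A small worked sketch on made-up numbers (party_share is a hypothetical helper):

```python
import pandas as pd


def party_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of all votes cast that went to each party."""
    totals = df.groupby("party")["total_votes"].sum()
    # Multiply before dividing so whole-number percentages stay exact
    return totals * 100 / totals.sum()


# Made-up totals: 210 + 280 + 10 = 500 votes in all (not real results)
sample = pd.DataFrame({"party": ["DEM", "REP", "LIB"], "total_votes": [210, 280, 10]})
share = party_share(sample)
print(share)  # DEM -> 42.0, REP -> 56.0, LIB -> 2.0 (DEM + REP = 98.0)
```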
Map analysis
Combine the data for the two parties and pull out the winner in each state
president_election = pd.concat([rep_election, dem_election], axis=1)
# Trump wins a state if he has more votes than Biden; every other row goes to Biden
president_election["winner"] = np.where(
    president_election.rep_votes > president_election.dem_votes, "Donald Trump", "Joe Biden"
)
president_election
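np.where(rep_votes > dem_votes, ...) sends every row where the condition is False to Biden, which includes exact ties and rows where either vote count is missing (comparisons with NaN are False). If those cases should stay visible, np.select can label them separately. A sketch with a hypothetical pick_winner helper and made-up numbers:

```python
import numpy as np
import pandas as pd


def pick_winner(df: pd.DataFrame) -> pd.Series:
    """Label each state's winner; ties and missing counts are not assigned."""
    conditions = [
        df["rep_votes"] > df["dem_votes"],
        df["dem_votes"] > df["rep_votes"],
    ]
    choices = ["Donald Trump", "Joe Biden"]
    # Rows matching neither condition (a tie or a NaN count) get the default
    return pd.Series(
        np.select(conditions, choices, default="undecided"), index=df.index
    )


# Made-up vote counts (not real results)
states = pd.DataFrame(
    {"rep_votes": [210, 30, 50, np.nan], "dem_votes": [180, 70, 50, 40]},
    index=["State A", "State B", "State C", "State D"],
)
print(pick_winner(states))
```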
Latitude and longitude table for each US state
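path_states_map = "./data/statelatlong.csv"
# Index by the City column, which in this file appears to hold the full state name,
# so the rows line up with the state index of president_election
states_lat_long = pd.read_csv(path_states_map, index_col="City")
states_lat_long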
Join the latitude/longitude table with the US election data (rows are aligned on the state name)
president_election = pd.concat([president_election,states_lat_long],axis=1)
president_election
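pd.concat(..., axis=1) aligns rows by index and keeps the union of both indexes, so a state name spelled differently in the two tables silently becomes two half-empty rows instead of an error. Below is a minimal sketch of a stricter alternative, assuming (as the code above does) that the coordinate table is indexed by full state name and has a State column with two-letter codes; attach_coordinates and the rows are hypothetical.

```python
import pandas as pd


def attach_coordinates(results: pd.DataFrame, coords: pd.DataFrame):
    """Left-join coordinates onto results and list states that found no match."""
    # how="left" keeps exactly the rows of results, in their original order
    merged = results.join(coords, how="left")
    missing = merged.index[merged["State"].isna()].tolist()
    return merged, missing


# Made-up tables: "State B" has no entry in the coordinate table
results = pd.DataFrame(
    {"winner": ["Donald Trump", "Joe Biden"]}, index=["State A", "State B"]
)
coords = pd.DataFrame(
    {"State": ["SA", "SC"], "Latitude": [1.0, 2.0], "Longitude": [3.0, 4.0]},
    index=["State A", "State C"],
)
merged, missing = attach_coordinates(results, coords)
print(missing)  # ['State B']
```

An empty missing list is a quick sanity check before drawing the map; any state listed there would have no code in the State column to place it on the map.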
Now we can draw the map we need
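fig = px.choropleth(
    president_election,
    locations="State",  # two-letter state codes from statelatlong.csv
    color="winner",
    # colors are assigned in the order in which the winner values first appear
    color_discrete_sequence=["red", "blue"],
    locationmode="USA-states",
    scope="usa",
    title="USA Presidential Votes coun",
)
fig.show()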
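OK, that wraps up this simple project.
If you'd like more small projects like this to practice with, follow the official account (公众号)
and send 【python_uaselect_001】 to get the dataset and the corresponding code.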