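Contents
5. Find the state/region values for which state is NaN, and deduplicate them
6. Fill in the correct state value for those state/region codes, so that the state column no longer contains any NaN
9. The area (sq. mi) column turns out to have missing data; find which rows are affected
13. Sort and find the five states with the highest population density: df.sort_values()
Requirements:
1. Load the files and inspect the raw data
Imports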
# -*-coding:utf-8-*-
import numpy as np
import pandas as pd
from pandas import DataFrame
Load the files and inspect the raw data
# Load the three CSV files and preview the first two rows of each
abb=pd.read_csv("./data/state-abbrevs.csv")
pop=pd.read_csv("./data/state-population.csv")
area=pd.read_csv("./data/state-areas.csv")
print("---------------------------------------------------")
print(abb.head(2))
print("---------------------------------------------------")
print(pop.head(2))
print("---------------------------------------------------")
print(area.head(2))
Output
E:\pythonProject5\venv\Scripts\python.exe E:/pythonProject5/人口分析案例.py
---------------------------------------------------
     state abbreviation
0  Alabama           AL
1   Alaska           AK
---------------------------------------------------
  state/region     ages  year  population
0           AL  under18  2012   1117489.0
1           AL    total  2012   4817528.0
---------------------------------------------------
     state  area (sq. mi)
0  Alabama          52423
1   Alaska         656425
Process finished with exit code 0
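head() only shows the first rows, so it cannot reveal gaps further down a file. A minimal sketch of a slightly broader first look follows. It does not read the real CSV files: the in-memory text below reuses the two population rows printed above and adds one made-up row (code "XX", empty population) purely for illustration. The names csv_text and pop_sample are hypothetical.

```python
import io

import pandas as pd

# Stand-in for ./data/state-population.csv: the two rows printed above plus
# one made-up row ("XX") with an empty population field. This is an
# assumption for illustration, not data from the real file.
csv_text = """state/region,ages,year,population
AL,under18,2012,1117489.0
AL,total,2012,4817528.0
XX,total,2012,
"""
pop_sample = pd.read_csv(io.StringIO(csv_text))

# Shape and dtypes give an overview that head() alone does not
sample_shape = pop_sample.shape
sample_dtypes = pop_sample.dtypes

# Number of missing values in each column
missing_per_column = pop_sample.isnull().sum()

print(sample_shape)
print(sample_dtypes)
print(missing_per_column)
```

Running the same three checks on abb, pop and area right after loading would show early on which columns are likely to need the clean-up steps listed in the contents.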
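2. Merge the population data with the state abbreviation data
# Merge the population data with the state abbreviations; how='outer' keeps rows that have no match on either side
abb_pop=pd.merge(abb,pop,how='outer',left_on="abbreviation",right_on="state/region")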
print(abb_pop.head())
Output
E:\pythonProject5\venv\Scripts\python.exe E:/pythonProject5/人口分析案例.py
     state abbreviation state/region     ages  year  population
0  Alabama           AL           AL  under18  2012   1117489.0
1  Alabama           AL           AL    total  2012   4817528.0
2  Alabama           AL           AL  under18  2010   1130966.0
3  Alabama           AL           AL    total  2010   4785570.0
4  Alabama           AL           AL  under18  2011   1125763.0
Process finished with exit code 0
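An outer merge keeps rows whose key appears in only one of the two tables and fills the other side with NaN, which is where the NaN values in the state column come from. The sketch below uses the indicator parameter of pd.merge to make that visible. The small frames abb_demo and pop_demo are hypothetical: the Alabama and Alaska entries mirror the output above, while the code "XX" and its population of 1000.0 are made up to stand for a region that has no entry in the abbreviation table.

```python
import pandas as pd

# Abbreviation table: the two rows shown in the output above
abb_demo = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "abbreviation": ["AL", "AK"],
})

# Population table: two real AL rows plus one made-up "XX" row (assumption)
pop_demo = pd.DataFrame({
    "state/region": ["AL", "AL", "XX"],
    "ages": ["under18", "total", "total"],
    "year": [2012, 2012, 2012],
    "population": [1117489.0, 4817528.0, 1000.0],
})

# indicator=True adds a "_merge" column with the values
# "both", "left_only" or "right_only" for every row
merged = pd.merge(
    abb_demo,
    pop_demo,
    how="outer",
    left_on="abbreviation",
    right_on="state/region",
    indicator=True,
)

# Codes that exist only in the population table end up with state == NaN
unmatched_codes = merged.loc[merged["_merge"] == "right_only", "state/region"].unique()
merge_counts = merged["_merge"].value_counts()

print(merged)
print(list(unmatched_codes))
```

Here AK has no population rows in the demo data, so it appears as left_only, and XX appears as right_only with a missing state. On the real data the same check would list the state/region codes that need a state name filled in later.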
3. Drop the redundant abbreviation column from the merged data
# abbreviation duplicates state/region after the merge, so drop it in place
abb_pop.drop(labels="abbreviation",axis=1,inplace=True)
print(abb_pop.head())
Output
     state state/region     ages  year  population
0  Alabama           AL  under18  2012   1117489.0
1  Alabama           AL    total  2012   4817528.0
2  Alabama           AL  under18  2010   1130966.0
3  Alabama           AL