Intermediate Python -- DataCamp
Advanced data manipulation (Dictionary & DataFrame)
List
Data listed on its own is a list.
pop = [30.55, 2.77, 39.21]
countries = ["afghanistan", "albania", "algeria"]
ind_alb = countries.index("albania")
ind_alb
1
pop[ind_alb]
2.77
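Lists are not very convenient for this: finding a country's population takes two steps, first looking up the index and then using it on the other list.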
Dictionary
Pairing up the data from the two lists gives a dictionary.
world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21}
world["albania"]
2.77
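The pairing of the two lists can be sketched with the built-in zip and dict. The name world_from_lists is introduced here only for illustration.

```python
# Two parallel lists, as in the List section
pop = [30.55, 2.77, 39.21]
countries = ["afghanistan", "albania", "algeria"]

# zip pairs each country with the population at the same position;
# dict turns those pairs into key:value entries
world_from_lists = dict(zip(countries, pop))

# One-step lookup by key, no index needed
print(world_from_lists["albania"])
```

This only works as intended when both lists have the same length and the same order.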
Adding or deleting values
world["sealand"] = 0.000028
world
{'afghanistan': 30.55, 'albania': 2.77, 'algeria': 39.21, 'sealand': 2.8e-05}
del world["sealand"]
world
{'afghanistan': 30.55, 'albania': 2.77, 'algeria': 39.21}
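A minimal sketch of the three basic dictionary edits (add, overwrite, delete), run as a script rather than in the interactive shell. The overwrite step is added here for illustration.

```python
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}

# Assigning to a new key adds an entry
world["sealand"] = 0.000028
print("sealand" in world)  # membership test checks the keys

# Assigning to an existing key overwrites its value
world["albania"] = 2.81

# del removes the key together with its value
del world["sealand"]
print(world)
```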
Nested dictionaries
Put simply, this is a dictionary nested inside a dictionary.
# Dictionary of dictionaries
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }
# Print out the capital of France
print(europe["france"]["capital"])
# Create sub-dictionary data
data = {"capital": "rome", "population": 59.83}
# Add data to europe under key 'italy'
europe["italy"] = data
# Print europe
print(europe)
paris
{'france': {'population': 66.03, 'capital': 'paris'}, 'italy': {'population': 59.83, 'capital': 'rome'}, 'germany': {'population': 80.62, 'capital': 'berlin'}, 'norway': {'population': 5.084, 'capital': 'oslo'}, 'spain': {'population': 46.77, 'capital': 'madrid'}}
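Reading from a nested dictionary can be sketched as follows, using a shortened copy of europe. The loop and the get() call are additions for illustration, not part of the course exercise.

```python
europe = {
    "spain": {"capital": "madrid", "population": 46.77},
    "france": {"capital": "paris", "population": 66.03},
}

# Chained brackets: the first key selects the inner dictionary,
# the second key selects a value inside it
print(europe["france"]["capital"])

# items() yields each country together with its inner dictionary
for country, info in europe.items():
    print(country, info["capital"], info["population"])

# get() returns None instead of raising KeyError when the key is missing
italy = europe.get("italy")
print(italy)
```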
DataFrame
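In data analysis, the structure used most often is actually the DataFrame. Working with DataFrames requires the pandas package.
Converting a dictionary to a DataFrame
Key syntax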
import xx as x
pd.DataFrame()
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
# Import pandas as pd
import pandas as pd
# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {"country": names, "drives_right": dr, "cars_per_cap": cpc}
# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)
# Print cars
print(cars)
   cars_per_cap        country  drives_right
0           809  United States          True
1           731      Australia         False
2           588          Japan         False
3            18          India         False
4           200         Russia          True
5            70        Morocco          True
6            45          Egypt          True
Setting row labels
cars.index=xxxxx
import pandas as pd
# Build cars DataFrame
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr = [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]
cars_dict = { 'country':names, 'drives_right':dr, 'cars_per_cap':cpc }
cars = pd.DataFrame(cars_dict)
print(cars)
# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']
# Specify row labels of cars
cars.index=row_labels
# Print cars again
print(cars)
   cars_per_cap        country  drives_right
0           809  United States          True
1           731      Australia         False
2           588          Japan         False
3            18          India         False
4           200         Russia          True
5            70        Morocco          True
6            45          Egypt          True
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
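As an alternative to assigning cars.index afterwards, the row labels can also be passed when the DataFrame is built. A minimal sketch, using only the first three countries:

```python
import pandas as pd

cars_dict = {
    "country": ["United States", "Australia", "Japan"],
    "drives_right": [True, False, False],
    "cars_per_cap": [809, 731, 588],
}

# index= sets the row labels at construction time
cars = pd.DataFrame(cars_dict, index=["US", "AUS", "JPN"])

print(cars.shape)          # (number of rows, number of columns)
print(list(cars.index))    # row labels
print(list(cars.columns))  # column names
```

The list passed as index must have exactly one label per row.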
Reading a CSV file
cars = pd.read_csv("cars.csv", index_col=0)
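What index_col=0 does can be sketched without the real file. The CSV text below is a made-up stand-in for cars.csv (its actual layout is an assumption), read through io.StringIO.

```python
import io
import pandas as pd

# Hypothetical stand-in for cars.csv: the first column holds the row labels
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
JPN,588,Japan,False
"""

# index_col=0 tells pandas to use the first column as the index
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(cars)

# Without index_col, the labels become an ordinary column
# and the rows get a default integer index
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())
```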
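Basic pandas operations
pandas will be covered in more detail later.
Selecting columns
The difference between [] and [[ ]]: single brackets return the column without a column header (a Series), while double brackets return it with the column header (a DataFrame).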
print(cars["country"])
US     United States
AUS        Australia
JPN            Japan
IN             India
RU            Russia
MOR          Morocco
EG             Egypt
Name: country, dtype: object
print(cars[["country"]])
           country
US   United States
AUS      Australia
JPN          Japan
IN           India
RU          Russia
MOR        Morocco
EG           Egypt
print(cars[["country","drives_right"]])
           country  drives_right
US   United States          True
AUS      Australia         False
JPN          Japan         False
IN           India         False
RU          Russia          True
MOR        Morocco          True
EG           Egypt          True
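The difference between the two bracket forms shows up in the type of the result. A minimal sketch on a two-row version of cars; the names single and double are introduced here for illustration.

```python
import pandas as pd

cars = pd.DataFrame(
    {"country": ["United States", "Australia"], "drives_right": [True, False]},
    index=["US", "AUS"],
)

single = cars["country"]    # one label -> pandas Series
double = cars[["country"]]  # list of labels -> pandas DataFrame

print(type(single).__name__)
print(type(double).__name__)
print(double.shape)  # still two-dimensional, with one column
```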
Selecting rows
Specific rows can be selected with a slice, much like in R.
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out first 3 observations
print(cars[0:3])
# Print out fourth, fifth and sixth observation
print(cars[3:6])
     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JPN           588          Japan         False
     cars_per_cap        country  drives_right
IN             18          India         False
RU            200         Russia          True
MOR            70        Morocco          True
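Row slices with [] count positions, not labels, and exclude the end position. A minimal sketch that rebuilds a two-column cars inline instead of reading cars.csv:

```python
import pandas as pd

cars = pd.DataFrame(
    {
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

first_three = cars[0:3]  # positions 0, 1, 2; position 3 is excluded
next_three = cars[3:6]   # positions 3, 4, 5

# An integer slice in [] selects the same rows as iloc with that slice
print(first_three.equals(cars.iloc[0:3]))
print(list(next_three.index))
```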
loc and iloc
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)
# Print out drives_right value of Morocco
print(cars.loc["MOR", "drives_right"])
# Print sub-DataFrame
print(cars.loc[["RU", "MOR"], ["country", "drives_right"]])
output:
True
       country  drives_right
RU      Russia          True
MOR    Morocco          True
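For comparison, loc selects by label while iloc selects by integer position. A minimal sketch of the same two selections done both ways, with cars rebuilt inline; the positions assume the column order given in this dictionary.

```python
import pandas as pd

cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JPN", "IN", "RU", "MOR", "EG"],
)

# loc: row label first, then column label
by_label = cars.loc["MOR", "drives_right"]

# iloc: Morocco is row position 5, drives_right is column position 2
by_position = cars.iloc[5, 2]
print(by_label, by_position)

# Lists of labels or positions return a sub-DataFrame
sub_label = cars.loc[["RU", "MOR"], ["country", "drives_right"]]
sub_position = cars.iloc[[4, 5], [1, 2]]
print(sub_label.equals(sub_position))
```

Note that the row and column selectors are two separate arguments inside one pair of brackets. Wrapping both in a single list would ask loc for two row labels instead.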