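Joins
I already learned this in SQL, and most of what follows mirrors SQL usage. There are four kinds in total: left join, right join, inner join and outer join, and all of them are covered here.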
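As a quick reminder of how the four join types differ, here is a minimal sketch. The tables `left` and `right` and their columns are made up for illustration and are not part of the dataset used in these notes.

```python
import pandas as pd

# Two tiny illustrative tables that share only the key 'b'
left = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
right = pd.DataFrame({'key': ['b', 'c'], 'y': [3, 4]})

# One merge per join type; only the `how` argument changes
results = {how: left.merge(right, on='key', how=how)
           for how in ['left', 'right', 'inner', 'outer']}

for how, res in results.items():
    # left keeps a,b / right keeps b,c / inner keeps b / outer keeps a,b,c
    print(how, sorted(res['key'].tolist()))
```

The left and right joins keep every key from one side, the inner join keeps only shared keys, and the outer join keeps keys from both sides, filling the gaps with NaN.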
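Of course, let's read the data first.
Nice: the data is a folder containing many tables, so the first thing to learn is how to read them in bulk.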
import pandas as pd
import numpy as np
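The date_range function generates a time index with a fixed frequency. Its periods argument is the number of periods to generate and takes an integer or None.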
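A small sketch of how date_range could be used to build file names. The start date, the number of periods, and the month-day-year name pattern are all assumptions for illustration; check them against the actual files in the folder.

```python
import pandas as pd

# Hypothetical start date and length, chosen only for illustration
idx = pd.date_range(start='2020-11-15', periods=3, freq='D')

# Assumed file-name pattern (month-day-year); adjust to the real files
date = idx.strftime('%m-%d-%Y').tolist()
print(date)  # ['11-15-2020', '11-16-2020', '11-17-2020']
```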
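# `date` is assumed to be a list of file-name strings, e.g. built with pd.date_range
for d in date:
    df=pd.read_csv('data/us_report/' + d + '.csv')  # each pass overwrites df
df.head()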
Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active FIPS Incident_Rate Total_Test_Results People_Hospitalized Case_Fatality_Ratio UID ISO3 Testing_Rate Hospitalization_Rate
0 Alabama US 2020-11-17 05:30:30 32.3182 -86.9023 219232 3249 88038.0 127945.0 1.0 4471.216158 1466603.0 NaN 1.481992 84000001 USA 29911.231169 NaN
1 Alaska US 2020-11-17 05:30:30 61.3707 -152.4044 24399 98 7165.0 17136.0 2.0 3335.269874 872347.0 NaN 0.401656 84000002 USA 119247.209673 NaN
2 American Samoa US 2020-11-17 05:30:30 -14.2710 -170.1320 0 0 NaN 0.0 60.0 0.000000 1768.0 NaN NaN 16 ASM 3177.512985 NaN
3 Arizona US 2020-11-17 05:30:30 33.7298 -111.4312 276912 6302 45737.0 224873.0 4.0 3804.406738 1987781.0 NaN 2.275813 84000004 USA 27309.496990 NaN
4 Arkansas US 2020-11-17 05:30:30 34.9697 -92.3731 134348 2225 115625.0 16498.0 5.0 4451.846442 1516866.0
That is quite a lot of columns.
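Coming back to the bulk-read problem: a loop that assigns to the same variable only keeps the last file. One possible way to keep every table is sketched below. To stay runnable it writes two tiny made-up CSV files into a temporary folder; the file names and values are hypothetical stand-ins, not the real reports.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Build a throwaway folder with two tiny CSV files (made-up data)
tmp = Path(tempfile.mkdtemp())
pd.DataFrame({'Province_State': ['Alabama'], 'Confirmed': [1]}).to_csv(tmp / 'day1.csv', index=False)
pd.DataFrame({'Province_State': ['Alabama'], 'Confirmed': [2]}).to_csv(tmp / 'day2.csv', index=False)

# Read every CSV in the folder into a dict keyed by the file stem
frames = {p.stem: pd.read_csv(p) for p in sorted(tmp.glob('*.csv'))}

# Stack the tables, keeping the file stem as an outer index level
all_days = pd.concat(frames, names=['file', 'row'])
print(all_days)
```

Keeping the file stem in the index makes it possible to tell which report each row came from after stacking.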
Joining on values
Here tables are joined on the values of one or more columns. We can build two small tables to try it out.
df1=pd.DataFrame({'name':['zhang san','li si'],
'age':[20,30]})
df2=pd.DataFrame({'name':['li si','wang wu'],
'gender':['m','f']})
df1.merge(df2,on='name',how="left")
name age gender
0 zhang san 20 NaN
1 li si 30 m
I always forget to type the commas here.
What if the information is the same but the column names differ?
df1=pd.DataFrame({'name1':['zhang san','li si'],
'age':[20,30]})
df2=pd.DataFrame({'name2':['li si','wang wu'],
'gender':['m','f']})
df1.merge(df2,left_on='name1',right_on='name2',how="left")
name1 age name2 gender
0 zhang san 20 NaN NaN
1 li si 30 li si m
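Note that left_on and right_on are not interchangeable. Swapping them raises an error, because each key must exist as a column in its own table: left_on in the left table and right_on in the right table.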
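The error can be reproduced with a small sketch that swaps the two arguments on purpose. The tables are the same as in the example above; the `error` variable is only there to capture the exception.

```python
import pandas as pd

df1 = pd.DataFrame({'name1': ['zhang san', 'li si'], 'age': [20, 30]})
df2 = pd.DataFrame({'name2': ['li si', 'wang wu'], 'gender': ['m', 'f']})

try:
    # Swapped on purpose: 'name2' is not a column of df1,
    # and 'name1' is not a column of df2
    df1.merge(df2, left_on='name2', right_on='name1', how='left')
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```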
What if different people share the same name?
df1=pd.DataFrame({'name':['zhang san','zhang san'],
'age':[20,30],
'class':["one","two"]})
df2=pd.DataFrame({'name':['zhang san','zhang san'],
'gender':['m','f'],
'class':["two","one"]})
df1.merge(df2,on=['name','class'],how="left")
name age class gender
0 zhang san 20 one f
1 zhang san 30 two m
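To see why the second key matters, the sketch below compares merging on name alone with merging on both columns. The validate argument is an optional extra check, added here only to show that name by itself is not a unique key.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['zhang san', 'zhang san'],
                    'age': [20, 30],
                    'class': ['one', 'two']})
df2 = pd.DataFrame({'name': ['zhang san', 'zhang san'],
                    'gender': ['m', 'f'],
                    'class': ['two', 'one']})

# Name alone is ambiguous: every left row matches every right row
by_name = df1.merge(df2, on='name', how='left')
print(len(by_name))  # 4

# Name plus class identifies one person per row
by_both = df1.merge(df2, on=['name', 'class'], how='left')
print(len(by_both))  # 2

# validate='1:1' makes pandas raise when the key is not unique
try:
    df1.merge(df2, on='name', how='left', validate='1:1')
    name_is_unique = True
except pd.errors.MergeError:
    name_is_unique = False
print(name_is_unique)  # False
```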
And what if the column names differ and different people share the same name? Haha, I could not work that one out.
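One possible answer to this question, as far as I can tell, is to pass lists to left_on and right_on. The column names name1, class1, name2 and class2 are invented for this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({'name1': ['zhang san', 'zhang san'],
                    'age': [20, 30],
                    'class1': ['one', 'two']})
df2 = pd.DataFrame({'name2': ['zhang san', 'zhang san'],
                    'gender': ['m', 'f'],
                    'class2': ['two', 'one']})

# Keys are paired by position: name1 with name2, class1 with class2
res = df1.merge(df2,
                left_on=['name1', 'class1'],
                right_on=['name2', 'class2'],
                how='left')
print(res)
```

Both sets of key columns are kept in the result, so the duplicated ones can be dropped afterwards if they are not needed.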
Joining on the index
In essence this is the same as above, except that the key is the index rather than a column.
df1 = pd.DataFrame({'Age':[20,30]}, index=pd.Series( ['San Zhang','Si Li'],name='Name'))
df2 = pd.DataFrame({'Gender':['F','M']},
index=pd.Series( ['Si Li','Wu Wang'],name='Name'))
df1.join(df2, how='left')
Age Gender
Name
San Zhang 20 NaN
Si Li 30 F
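To check the claim that an index join is essentially the same operation as a value join, the sketch below runs the join from above and then the same thing through merge with left_index and right_index.

```python
import pandas as pd

df1 = pd.DataFrame({'Age': [20, 30]},
                   index=pd.Series(['San Zhang', 'Si Li'], name='Name'))
df2 = pd.DataFrame({'Gender': ['F', 'M']},
                   index=pd.Series(['Si Li', 'Wu Wang'], name='Name'))

# join aligns on the index by default
joined = df1.join(df2, how='left')

# merge can do the same when told to use both indexes as keys
merged = df1.merge(df2, left_index=True, right_index=True, how='left')

print(joined.equals(merged))
```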
Horizontal and vertical concatenation
This uses the concat function.
df1 = pd.DataFrame({'Name':['San Zhang','Si Li'],'Age':[20,30]})
df2 = pd.DataFrame({'Name':['Wu Wang'], 'Age':[40]})
pd.concat([df1, df2])
Name Age
0 San Zhang 20
1 Si Li 30
0 Wu Wang 40
pd.concat([df1, df2], axis=1)
Name Age Name Age
0 San Zhang 20 Wu Wang 40.0
1 Si Li 30 NaN NaN
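That is roughly it. The 0 and 1 here read a little differently from elsewhere: the default axis=0 stacks the tables vertically (more rows), while axis=1 places them side by side (more columns) and aligns them on the index.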
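A short sketch of the two directions, using the same tables as above. The ignore_index and join arguments are optional extras shown here because they change the index and the alignment of the result.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['San Zhang', 'Si Li'], 'Age': [20, 30]})
df2 = pd.DataFrame({'Name': ['Wu Wang'], 'Age': [40]})

# axis=0: stack rows; ignore_index renumbers them 0, 1, 2
rows = pd.concat([df1, df2], axis=0, ignore_index=True)
print(rows.shape)  # (3, 2)

# axis=1: side by side, aligned on the index, missing cells become NaN
cols = pd.concat([df1, df2], axis=1)
print(cols.shape)  # (2, 4)

# join='inner' keeps only index labels present in every table
cols_inner = pd.concat([df1, df2], axis=1, join='inner')
print(cols_inner.shape)  # (1, 4)
```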
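You can also add a row at the end, which is what append does. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the pd.concat line gives the same result on newer versions.
s = pd.Series(['Wu Wang', 21], index = df1.columns)
# df1.append(s, ignore_index=True)  # works only on pandas < 2.0
pd.concat([df1, s.to_frame().T], ignore_index=True)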
Name Age
0 San Zhang 20
1 Si Li 30
2 Wu Wang 21
Join-like operations
To be honest, I probably won't need these much.