Data Science Intro by Michigan University - Week3

最新推荐文章于 2023-01-11 23:48:00 发布

Walker0319

最新推荐文章于 2023-01-11 23:48:00 发布

阅读量218

点赞数

分类专栏：数据分析笔记文章标签： python pandas

本文链接：https://blog.csdn.net/weixin_57901967/article/details/118183686

版权

1. venn diagram

full outer join (union)
inner join (intersection)

2. merge (join horizontally)

staff_df = pd.DataFrame([{
   'Name': 'Kelly', 'Role': 'HR'}, {
   'Name': 'Sally', 'Role': 'TA'}, {
   'Name': 'James', 'Role': 'GD'}])
staff_df = staff_df.set_index('Name')
stu_df = pd.DataFrame([{
   'Name': 'James', 'School': 'Com'}, {
   'Name': 'Mike', 'School': 'Bus'}, {
   'Name': 'Sally', 'School': 'Eng'}])
stu_df = studf.set_index('Name')

merge

use indices

union

pd.merge(staff_df, stu_df, how='outer', left_index=True, right_indtx=True)

intersection

pd.merge(staff_df, stu_df, how='inner', left_index=True, right_indtx=True)

left join

pd.merge(staff_df, stu_df, how='left', left_index=True, right_indtx=True)

Name	Role	School
Kelly	HR	NaN
Sally	TA	Eng
James	GD	Bus

right join (ommited)

use cloumns

pd.merge(staff_df, stu_df, how='right', on='Name')

when 2 dataframes have conflitc

staff_df = pd.DataFrame([{
   'Name': 'Kelly', 'Role': 'HR', 'Location': 'KKK'},
	{
   'Name': 'Sally', 'Role': 'TA',   'Location': 'SSS'}, 
	{
   'Name': 'James', 'Role': 'GD', 'Location': 'JJJ'}])
stu_df = pd.DataFrame([{
   'Name': 'James', 'School': 'Bus','Location': '1024jjj'},
	{
   'Name': 'Mike', 'School': 'Law', 'Location': 'mmm22'},
	{
   'Name': 'Sally', 'School': 'Eng', 'Location': '512sss'}])
pd.merge(staff_df, stu_df, how='left', on='Name')

Name 	Role 	Location_x 	School 	Location_y

multi-indexing

pd.merge(staff_df, stu_df, how='inner', on=['FName','LName'])

3. concat (join vertically)

3.1 cell magic “%%capture”

3.2 concat

frames = [df_2011, df_2012, df_2013]
pd.concat(frames, keys=['2011', '2-12', '2013'] ) 
#set an extra level of indices with 'keys'

Concatenation also has inner and outer method. If two dataframes do not have identical columns, and choose the outer method, some cells will be NaN. choose to do inner, then some observations will be dropped due to NaN values.

4. pandas idioms - pandorable

4.1 method chaining

def first_approach():
	global df
	return(df.where(df['SUMLEV']==50)
		.dropna()
		.set_index()['STNAEME','CTYNAME']
		.rename(columns={
   'ESTIMATESBASE2010':'Estimates Base 2010'}))
df = pd.read_csv('datasets/census.csv')
timeit.timeit(first_approac

最低0.47元/天解锁文章

Walker0319

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Data Science Intro by Michigan University - Week3

1. venn diagramfull outer join (union)inner join (intersection)2. merge (join horizontally)staff_df = pd.DataFrame([{'Name': 'Kelly', 'Role': 'HR'}, {'Name': 'Sally', 'Role': 'TA'}, {'Name': 'James', 'Role': 'GD'}])staff_df = staff_df.set_index('Name'
复制链接

扫一扫