Pandas
Q: inner join用什么?
# df1, df2
df1.merge(df2, on='xxx', suffixes=('_aaa', '_bbb'))
Q: 多个tables拼接如何?
# df1, df2, df3
df1.merge(df2, on=['xxx', 'yyy']) /
.merge(df3, on='zzz', suffixes=(...))
Q: left join 怎么搞?
df.merge(df2, on='xxx', how='left')
Q: right join又怎么搞?
df1.merge(df2, how='right',
left_on='l_id', right_on='r_id')
Q: outer join是个什么鬼?
df1.merge(df2, on='xxx', how='outer', suffixes=('_x', '_y'))
Q: 这个怎么搞?
#先outer
iron_1_and_2 = iron_1_actors.merge(iron_2_actors, on='id', how='outer', suffixes=('_1', '_2'))
#再找NaN,null
m = ((iron_1_and_2['name_1'].isnull()) |
(iron_1_and_2['name_2'].isnull()))
#filter出null的
print(iron_1_and_2[m].head())
Q: semi-join是啥子?anti-join 呢?
对比与inner join,table并没有合并。
anti-join
# Merge employees and top_cust
empl_cust = employees.merge(top_cust, on='srid',
how='left', indicator=True)
#这个indicator就会显示出该row是来自于left还是right还是both,有一个判别功能。column名为 _merge
# Select the srid column where _merge is left_only
#选择leftonly就是把没有拼接到table2的项目都找出来了。
srid_list = empl_cust.loc[empl_cust['_merge'] == 'left_only', 'srid']
# Get employees not working with top customers
print(employees[employees['srid'].isin(srid_list)])
semi-join
# 1st inner join first,找出共性
df_merge = df1.merge(df2, on='id')
# filter
df_common_list = df1['id'].isin(df_merg['id'])
# select
df1[df_common_list]
Q: 两个tables如何上下拼接?
# df1, df2, df3
df_concat = pd.concat([df1, df2, df3], ingore_index=True)
Q: 如何找correlation?
xxx.corr()
Q: merge_asof()怎么用
Q: 像使用sql语句的是什么?
xxx.query('name == "aaa" or (age>100 and sex=="male")')
Q:
id_vars==>是这两个columns不变的情况