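Our project team recently had to do a migration: the HDFS location path of every table needs to be moved (mv) to a specified directory. Each table has multiple partitions, so the question is how to expand those partitions into multiple rows. For example, the partition for table biads.ads_ starts out empty: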
db | tablename | partition |
biads | ads_ | |
A program then finds that the table currently has three partitions: ['pt_d=20190601', 'pt_d=20190602', 'pt_d=20190603']
What I want is:
db | tablename | partition |
biads | ads_ | pt_d=20190601 |
biads | ads_ | pt_d=20190602 |
biads | ads_ | pt_d=20190603 |
Here is the code, no time to explain:
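# -*- coding:utf-8 -*-
import pandas as pd

# One row per table; the partition is empty to start with
c = {
    'db': ['biads'],
    'tablename': ['ads_'],
    'partition': ['']
}
df = pd.DataFrame(c)
print(df)

# Partitions fetched for biads.ads_
s = ['pt_d=20190601', 'pt_d=20190602', 'pt_d=20190603']
print(s)

# Rows that belong to the target table
mask = (df['db'] == 'biads') & (df['tablename'] == 'ads_')

frame = []
for i in s:
    # Non-empty partition values already set for this table
    a = [x for x in df[mask]['partition'].tolist() if x != '']
    if not a:
        # First partition: fill the existing empty row in place
        df.loc[mask, ['partition']] = str(i)
    else:
        # Later partitions: copy the table's row and overwrite the partition
        l = df[mask].copy()
        l['partition'] = str(i)
        frame.append(l)

# Put the original row first so the partitions stay in order, and
# renumber the index, which would otherwise be 0 for every row
frame.insert(0, df)
r = pd.concat(frame, ignore_index=True)
print(r)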
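
The same one-row-to-many expansion can probably be written without an explicit loop. Below is a minimal sketch, assuming pandas 0.25 or later (where `DataFrame.explode` is available). The helper name `expand_partitions` and the `partitions_by_table` dict are hypothetical and not part of the original script; they only illustrate the idea of storing each table's partition list in one cell and then exploding it.

```python
import pandas as pd


def expand_partitions(df, partitions_by_table):
    """Return a copy of df with one row per (db, tablename, partition).

    partitions_by_table is a hypothetical dict mapping (db, tablename)
    to a list of partition strings; a table without an entry keeps a
    single row with an empty partition.
    """
    out = df.copy()
    # Store each table's partition list in a single cell ...
    out['partition'] = pd.Series(
        [partitions_by_table.get((db, tbl)) or ['']
         for db, tbl in zip(out['db'], out['tablename'])],
        index=out.index,
    )
    # ... then explode turns every list element into its own row.
    return out.explode('partition').reset_index(drop=True)


df = pd.DataFrame({'db': ['biads'], 'tablename': ['ads_'], 'partition': ['']})
partitions = {('biads', 'ads_'): ['pt_d=20190601', 'pt_d=20190602', 'pt_d=20190603']}
result = expand_partitions(df, partitions)
print(result)
```

With this shape, several tables can be handled in one call by adding more keys to the dict, and the partitions keep the order they have in each list.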