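There are two files, segments and segments_new. The task is to take the first column of segments_new as segment_id, take the speaker information embedded in the first column of segments, and join the two into speaker_id. The resulting file, labels, has two columns: segment_id and speaker_id.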
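def make_label(src_dir1, src_dir2):
    # src_dir1: segments_new (supplies segment_id)
    # src_dir2: segments (supplies the speaker field)
    with open(src_dir1, 'r') as f:
        lines = f.readlines()
    with open(src_dir2, 'r') as f:
        rows = f.readlines()
    # The first column of each line in segments_new is the segment_id
    data = [line.split(' ')[0] for line in lines]
    # The speaker is the fourth '_'-separated field of the first column in segments
    speaker = [row.split(' ')[0].split('_')[3] for row in rows]
    # Lines are paired by position; zip stops at the end of the shorter file.
    # Mode 'a' appends, so running the script twice duplicates the output lines.
    with open('./labels', 'a') as wf:
        for segment_id, speaker_id_str in zip(data, speaker):
            wf.write(segment_id + ' ' + segment_id + '_' + speaker_id_str + '\n')

make_label('./segments_new', './segments')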
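To make the string handling concrete, here is a minimal sketch of the same pairing logic applied to in-memory lines instead of files. The helper name `build_label_lines` and the sample lines are hypothetical, since the real file contents are not reproduced here. It carries over two assumptions from `make_label`: the first column of segments has at least four `_`-separated fields, and the two files line up row by row. Unlike the original, it uses `split()` with no argument, so tabs or repeated spaces between columns are also tolerated.

```python
def build_label_lines(new_lines, seg_lines):
    """Pair lines by position and build 'segment_id speaker_id' strings.

    Hypothetical helper: same logic as make_label, but without file I/O.
    """
    labels = []
    for new_line, seg_line in zip(new_lines, seg_lines):
        # First whitespace-separated column of a segments_new line
        segment_id = new_line.split()[0]
        # Fourth '_'-separated field of the first column of a segments line
        speaker = seg_line.split()[0].split('_')[3]
        labels.append(f"{segment_id} {segment_id}_{speaker}")
    return labels


# Made-up sample lines, for illustration only (not the real file contents)
sample_new = ["utt0001 rec01 0.00 1.50", "utt0002 rec01 1.50 3.20"]
sample_seg = [
    "rec01_a_b_spk1_0001 rec01 0.00 1.50",
    "rec01_a_b_spk2_0002 rec01 1.50 3.20",
]

for line in build_label_lines(sample_new, sample_seg):
    print(line)
```

With these made-up inputs the loop prints `utt0001 utt0001_spk1` and `utt0002 utt0002_spk2`. Keeping the string logic separate from the file handling makes it easy to check the field index against a few real lines before writing labels.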
Format of the segments file:
Format of the segments_new file:
Format of the generated labels file: