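Background
I needed to process a batch of TextGrid files, converting between the long and short formats (and, for experimental reasons, deleting one tier).
Because the text fields in these TextGrid files contain newline characters (\n) and the batch mixes long-format and short-format files, neither tgt nor praatio can read all of them directly.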
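To see which files in a batch are in which format, a rough check on the header can help. The sketch below assumes the usual Praat layout, where a long-format file writes `xmin = ...` on the first line after the header and a short-format file writes only the number. `detect_textgrid_format` is a hypothetical helper, not part of tgt or praatio, and it takes already-decoded text, so reading the file with the right encoding is left to the caller.

```python
def detect_textgrid_format(text):
    """Guess whether TextGrid content is in 'long' or 'short' format.

    Assumption: the first non-empty line after the two header lines
    (File type / Object class) is 'xmin = <number>' in long format
    and a bare number in short format.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Skip the header lines, which look the same in both formats.
    body = [l for l in lines if not l.startswith(('File type', 'Object class'))]
    if not body:
        raise ValueError('no content after the TextGrid header')
    first = body[0]
    if first.startswith('xmin'):
        return 'long'
    try:
        float(first)
    except ValueError:
        raise ValueError(f'unrecognized TextGrid line: {first!r}')
    return 'short'


long_sample = 'File type = "ooTextFile"\nObject class = "TextGrid"\n\nxmin = 0\nxmax = 1.5\n'
short_sample = 'File type = "ooTextFile"\nObject class = "TextGrid"\n\n0\n1.5\n'
print(detect_textgrid_format(long_sample))   # long
print(detect_textgrid_format(short_sample))  # short
```

This only inspects the first data line, so it says nothing about whether the labels further down contain newlines.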
Solution
In practice I found the following (Y = can read, N = cannot read):
Library | Reads long format with newlines | Reads short format with newlines
---|---|---
tgt | N | Y
praatio | Y | N
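Without modifying the source code of either library, I used a try/except fallback to batch-convert a set of TextGrid files in mixed long and short formats to the short format: each file is first handed to tgt, and if tgt raises an IndexError, the file is handed to praatio instead.
The code is as follows: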
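from praatio import textgrid as otg
import tgt
import glob
import loguru
import os


def process_by_praatio(path, new_path):
    # Second argument is includeEmptyIntervals.
    tg = otg.openTextgrid(path, True)
    tg.removeTier('name_of_tier_to_delete')  # placeholder: name of the tier to delete
    # Third argument is includeBlankSpaces.
    tg.save(new_path, "short_textgrid", True)
    loguru.logger.success(f'File {path} processed successfully by praatio')


def process_by_tgt(path, new_path):
    tg = tgt.read_textgrid(path)
    tg.delete_tier('name_of_tier_to_delete')  # placeholder: name of the tier to delete
    tgt.write_to_file(tg, new_path, format='short', encoding='utf-8')
    loguru.logger.success(f'File {path} processed successfully by tgt')


path = 'input_dir'  # placeholder: directory containing the TextGrid files to process
new_path = 'output_dir'  # placeholder: directory to write the converted files to

# Create the output directory if it does not exist yet.
os.makedirs(new_path, exist_ok=True)

files = glob.glob(os.path.join(path, '*.TextGrid'))
for file in files:
    loguru.logger.info(f'Processing file {file}')
    new_filepath = os.path.join(new_path, os.path.basename(file))
    try:
        process_by_tgt(file, new_filepath)
    except IndexError:
        # tgt fails with IndexError on the files it cannot parse; fall back to praatio.
        loguru.logger.error(f'File {file} failed to process by tgt; trying praatio')
        try:
            process_by_praatio(file, new_filepath)
        except Exception:
            loguru.logger.error(f'File {file} failed to process by praatio')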
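
The fallback logic in the loop can also be written as a small helper that tries a list of processors in order. This is only a sketch of the same idea: `convert_with_fallback` and the stand-in processors in the usage example are hypothetical names, and unlike the script above it falls back on any exception rather than only on IndexError.

```python
def convert_with_fallback(src, dst, processors):
    """Try each (name, func) pair in order; return the name that succeeded.

    Each func is called as func(src, dst). Any exception moves on to the
    next processor; if all of them fail, a RuntimeError lists every error.
    """
    errors = []
    for name, func in processors:
        try:
            func(src, dst)
            return name
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f'{name}: {exc!r}')
    raise RuntimeError(f'all processors failed for {src}: ' + '; '.join(errors))


# Usage with stand-in processors that imitate the behavior described in the table.
def fake_tgt(src, dst):
    raise IndexError('list index out of range')


def fake_praatio(src, dst):
    pass  # pretend the conversion succeeded


used = convert_with_fallback('a.TextGrid', 'out/a.TextGrid',
                             [('tgt', fake_tgt), ('praatio', fake_praatio)])
print(used)  # praatio
```

In the real script the list would be `[('tgt', process_by_tgt), ('praatio', process_by_praatio)]`, which keeps the order of attempts in one place if another reader is added later.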