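Description: When downloading subtitle files for TED talks from the ted2srt website, selecting both English and 中文 (Chinese) at the same time produces a subtitle file in the format shown below.
Each cue contains an extra timing line: the two lines are the timelines of the English and the Chinese subtitles respectively, and the duplicate causes problems during playback.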
1
00:00:12,368 --> 00:00:13,784
00:00:12,368 --> 00:00:13,758 # extra line, should be removed
Applying for jobs online
在线申请工作
2
00:00:13,808 --> 00:00:16,424
00:00:13,758 --> 00:00:16,398 # extra line, should be removed
is one of the worst digital experiences of our time.
是我们这个时代最糟糕的 数字化体验之一。
3
00:00:16,448 --> 00:00:19,144
00:00:16,398 --> 00:00:19,129 # extra line, should be removed
And applying for jobs in person really isn't much better.
面对面交谈也没好到哪儿去。
What? Remove the redundant second timing line from each cue. The target output is shown below.
1
00:00:12,368 --> 00:00:13,784
Applying for jobs online
在线申请工作
2
00:00:13,808 --> 00:00:16,424
is one of the worst digital experiences of our time.
是我们这个时代最糟糕的 数字化体验之一。
3
00:00:16,448 --> 00:00:19,144
And applying for jobs in person really isn't much better.
面对面交谈也没好到哪儿去。
Solution: regular expression
## Pattern to search for
(,\d+)\n0\d+.*?\d$
## Replacement pattern
\1
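
The search pattern and replacement above can also be applied outside a text editor. Below is a minimal Python sketch, assuming the file uses LF (`\n`) line endings and that the `# extra line` annotations shown in the sample are not part of the real file. The function name `drop_duplicate_timing` and the `sample` text are hypothetical and only serve as illustration. `re.MULTILINE` is needed so that `$` matches at the end of each line rather than only at the end of the whole text.

```python
import re

# Pattern and replacement copied from the note above.
# re.MULTILINE makes `$` match at the end of every line.
DUPLICATE_TIMING = re.compile(r"(,\d+)\n0\d+.*?\d$", re.MULTILINE)


def drop_duplicate_timing(srt_text: str) -> str:
    """Remove the second timing line from each bilingual cue (hypothetical helper)."""
    # `\1` keeps the milliseconds of the first timing line and drops
    # the newline plus the whole second timing line that follows it.
    return DUPLICATE_TIMING.sub(r"\1", srt_text)


# Two cues taken from the sample above, without the annotations.
sample = (
    "1\n"
    "00:00:12,368 --> 00:00:13,784\n"
    "00:00:12,368 --> 00:00:13,758\n"
    "Applying for jobs online\n"
    "在线申请工作\n"
    "2\n"
    "00:00:13,808 --> 00:00:16,424\n"
    "00:00:13,758 --> 00:00:16,398\n"
    "is one of the worst digital experiences of our time.\n"
    "是我们这个时代最糟糕的 数字化体验之一。\n"
)

cleaned = drop_duplicate_timing(sample)
print(cleaned)
```

Because the pattern requires `\n` directly after the milliseconds and a following line that starts with `0`, it will not match files with CRLF line endings or timestamps whose hour field does not begin with `0`; such files would need the pattern adjusted.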