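Subtitles downloaded from YouTube are in WebVTT format, and the media player I use, SPlayer (射手影音), cannot load them correctly, so I wrote a Python script that converts vtt subtitles to srt subtitles. I wrote it in IDLE (Python 3.7, 64-bit), downloaded from the PSF (Python Software Foundation) website (Welcome to Python.org); it can also be downloaded from https://python123.io/. Running the program in IDLE converts a vtt subtitle file into an srt subtitle file.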
The downloaded vtt subtitle file looks like this:
WEBVTT
Kind: captions
Language: en

00:00:00.960 --> 00:00:05.600
The mathematics we learn in school doesn’t
quite do the field of mathematics justice.

00:00:05.600 --> 00:00:09.490
We only get a glimpse at one corner of it,
but the mathematics as a whole is a huge and

00:00:09.490 --> 00:00:11.690
wonderfully diverse subject.

00:00:11.690 --> 00:00:15.730
My aim with this video is to show you all
that amazing stuff.

00:00:15.730 --> 00:00:18.340
We’ll start back at the very beginning.
After conversion the result looks like this:
1
00:00:00.960 --> 00:00:05.600
The mathematics we learn in school doesn't
quite do the field of mathematics justice.

2
00:00:05.600 --> 00:00:09.490
We only get a glimpse at one corner of it,
but the mathematics as a whole is a huge and

3
00:00:09.490 --> 00:00:11.690
wonderfully diverse subject.

4
00:00:11.690 --> 00:00:15.730
My aim with this video is to show you all
that amazing stuff.

5
00:00:15.730 --> 00:00:18.340
We'll start back at the very beginning.
The script is as follows:
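import os
# A WebVtt directory next to this script holds the WebVTT subtitle files (.vtt extension)
path = "./WebVtt/The Map of Mathematics.vtt"
# An Srt directory next to this script receives the converted srt subtitle files
srtpath = "./Srt"
# Read the whole vtt file into one string (UTF-8 is assumed here)
with open(path, encoding='utf-8') as vtt:
    filevtt = vtt.read()
# Split on blank lines (two newlines) into a list of cue blocks, skipping empty ones
# so that the last cue is kept whether or not the file ends with a blank line
listvtt = [block.strip('\n') for block in filevtt.split('\n\n') if block.strip()]
# Build the new string: number + newline + cue block + two newlines
srtstring = ''
for i, block in enumerate(listvtt, start=1):
    srtstring = srtstring + str(i) + '\n' + block + '\n\n'
#print(srtstring)
# Swap the file extension for .srt
srtname = os.path.splitext(os.path.basename(path))[0] + '.srt'
with open(os.path.join(srtpath, srtname), 'w', encoding='utf-8') as srtfile:
    srtfile.write(srtstring)
# Print a confirmation message
print(os.path.basename(path) + ' has been successfully converted to ' + srtname)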
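The script copies the timing lines unchanged, so the output keeps the WebVTT form with a period before the milliseconds. The srt format conventionally writes a comma there (00:00:00,960). The period form evidently loads in my player, so this step is optional, but a stricter player may want the comma. Below is a minimal sketch of that extra step; the helper name to_srt_timing is hypothetical and not part of the script above, and it only handles the HH:MM:SS.mmm form seen in the sample.

```python
import re

# HH:MM:SS.mmm, the timestamp form used in the sample file
_TIMESTAMP = re.compile(r'(\d{2}:\d{2}:\d{2})\.(\d{3})')

def to_srt_timing(block):
    """Swap '.' for ',' in the timing line of one cue block (hypothetical helper)."""
    lines = block.split('\n')
    # Only touch lines that contain the '-->' arrow, so subtitle text is left alone
    fixed = [
        _TIMESTAMP.sub(r'\1,\2', line) if '-->' in line else line
        for line in lines
    ]
    return '\n'.join(fixed)

print(to_srt_timing('00:00:00.960 --> 00:00:05.600\nPi is about 3.142.'))
```

It could be applied to each cue block just before the block is appended to srtstring.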
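Notes:
1. I deleted the first three lines of the vtt file by hand; they could of course be removed in the script instead.
2. Possibly because of an encoding issue, the original vtt text contains Chinese-style (curly) single quotes, which cause problems when the subtitles are loaded. For now, before converting, I use CTRL+H to replace the curly single quotes with straight English single quotes, and then everything works.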
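Both manual steps in the notes above could probably be moved into the script. The following is only a sketch under a few assumptions: the helper names strip_vtt_header and normalize_quotes are hypothetical, the header is taken to be everything before the first line containing "-->", and the file has already been read as UTF-8 text so the curly quotes arrive as the characters U+2018 and U+2019.

```python
# Curly quotes mapped to their straight ASCII counterparts
_QUOTES = str.maketrans({
    '\u2018': "'", '\u2019': "'",   # single quotes
    '\u201c': '"', '\u201d': '"',   # double quotes
})

def strip_vtt_header(text):
    """Drop everything before the first cue timing line (hypothetical helper)."""
    lines = text.split('\n')
    for i, line in enumerate(lines):
        if '-->' in line:
            return '\n'.join(lines[i:])
    return ''  # no timing line found, so there are no cues to keep

def normalize_quotes(text):
    """Replace curly quotes with straight ones (hypothetical helper)."""
    return text.translate(_QUOTES)

sample = ('WEBVTT\nKind: captions\nLanguage: en\n\n'
          '00:00:15.730 --> 00:00:18.340\n'
          'We\u2019ll start back at the very beginning.\n')
print(normalize_quotes(strip_vtt_header(sample)))
```

Calling both helpers on the file contents right after reading them would make the CTRL+H replacement and the manual header deletion unnecessary, provided the file really matches these assumptions.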