


Speech-to-text, eh? I wanted to convert episodes of my favorite podcast so their invaluable content is searchable. I'm only moderately excited about the results, but I'd like to document the effort nonetheless.

DeepSpeech

First thought: what open-source packages exist out there? Checking Wikipedia, I see a brand-new one from Mozilla: DeepSpeech. Intriguing.

Install

Installation wasn't painless, so let me drop my notes here... though it will probably get much better soon enough.


(There's an NPM package too, which I missed, but...) I saw there's a Python installer called pip, which I have installed on my laptop. Don't remember doing it, but it's there. So as the docs say:

$ pip install deepspeech

Didn't work. No such package. Turns out I have an old pip. So update it first:

$ sudo pip install --upgrade pip



Running pip install deepspeech again still didn't work, so sudo it is:

$ sudo pip install deepspeech

Audio

I grabbed the podcast MP3 (Episode 1), but DeepSpeech requires a special WAV (16-bit, mono, yadda-yadda), so ffmpeg to the rescue:

ffmpeg -i UBK_HFH_Ep_001_f.mp3 -acodec pcm_s16le -ac 1 -ar 16000 UBK_HFH_Ep_001_f.wav
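Before feeding the file to DeepSpeech, it may be worth confirming that the conversion really produced 16-bit, mono, 16 kHz audio. Here's a minimal sketch using Python's standard wave module. The function name check_wav and its default values are my own illustration (the defaults mirror the ffmpeg settings), not anything DeepSpeech ships:

```python
import wave


def check_wav(path, rate=16000, channels=1, sample_width=2):
    """Return a list of format problems; an empty list means the file matches.

    The defaults (16 kHz, mono, 2 bytes = 16 bits per sample) are assumptions
    based on the ffmpeg settings used for the conversion.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != channels:
            problems.append(
                "expected %d channel(s), found %d" % (channels, wav.getnchannels())
            )
        if wav.getsampwidth() != sample_width:
            problems.append(
                "expected %d-bit samples, found %d-bit"
                % (sample_width * 8, wav.getsampwidth() * 8)
            )
        if wav.getframerate() != rate:
            problems.append(
                "expected %d Hz, found %d Hz" % (rate, wav.getframerate())
            )
    return problems
```

Calling check_wav("UBK_HFH_Ep_001_f.wav") should return an empty list if the file is in the expected format; otherwise each string describes one mismatch.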


Docs say run:


$ deepspeech output_model.pb my_audio_file.wav alphabet.txt

They also say:


$ deepspeech output_model.pb my_audio_file.wav alphabet.txt lm.binary trie

Neither of those works, because output_model.pb, alphabet.txt and the rest are nowhere to be found on my system. Thanks to this discussion, there is a solution.

  1. Download the "models" zip from GitHub (warning: 1.3 GB)

  2. Unzip anywhere

  3. Navigate to the models/ folder

  4. Replace output_model.pb from the instructions with output_graph.pb found in the release package

  5. ...and succeed!

$ deepspeech output_graph.pb ../UBK_HFH_Ep_001_f.wav alphabet.txt lm.binary trie 
Loading model from file output_graph.pb
Loaded model in 1.336s.
Loading language model from files lm.binary trie
Loaded language model in 3.863s.
Running inference.
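With more than one episode to convert, typing those five positional arguments each time gets old. Here's a rough sketch of assembling the same command from Python so it can run from any directory. The helper name build_deepspeech_command is hypothetical, and the argument order simply copies the command above; other DeepSpeech versions may expect different arguments, so check the docs for whatever you have installed:

```python
import os


def build_deepspeech_command(models_dir, wav_path, use_language_model=True):
    """Build the argument list for the deepspeech command shown above.

    Assumes the unzipped models/ folder contains output_graph.pb,
    alphabet.txt, lm.binary and trie, as in the release package.
    """
    args = [
        "deepspeech",
        os.path.join(models_dir, "output_graph.pb"),
        wav_path,
        os.path.join(models_dir, "alphabet.txt"),
    ]
    if use_language_model:
        # The language model and trie are the two optional trailing arguments.
        args.append(os.path.join(models_dir, "lm.binary"))
        args.append(os.path.join(models_dir, "trie"))
    return args
```

The resulting list can be handed to subprocess.run(cmd, check=True), which avoids shell-quoting problems with odd file names.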


To be fair, there's music and two hosts, and they goof off, and there's music production jargon... so the results are far from perfect. But good enough for searching the content, don't you think? Here goes:

i tertoeworoocneatiiyouhadhaateponeormeoversrimoekayameeyourironhmanomyoumeyoninohaveyouevermixedstraightdupediami can easily say yes but i want to make a difficulty so what is ediimanymoreanythingwistoletrounthetinstandswawllthatrighttherewastaexabpleofwhat'sgalled ouseum guess noiid'tyuknowidon'tknowififdubstep or any of the breakbeatvaritiscountasethemigainstsoyupeopledancedtothatit'jitit'sinterestingandhaveyou'resenifvideosoputbedanceandubsstepb'ecausthesdoesbrothers lay it nouns you know its really fastenatingtispuckmahadsorhasbenunotlesteningyouumthewaenerlyavisablecourse of action ye er all those horelessmiglingrlyhothamata electronic dance music all the is it goes significantly beyond the pootstheres also batwatbowhooheboallthemassiv respect to all ourleconadansemuse brother out there loyo'detansatoenfact for many many years spunditwelve hundreds myself at a couple of crates of vinaliwasintominimalackhouseactually not the musinnaybebut what is a is challeninmustinisedyantract pay in the butmoneloy look heres some things i love the dance music but you aln'tedtelistent at im about to day if you make alectronacdancemeslicno many of you we'l have this already to some degree many of you will have a youre better than i am in every in every possible respect your better him being you look better you smell better but for those who dont smell quite a nice as in a less a music try to remember if you can what music thats plate thats played on instruments by people in a room notassarilyrecordingofertibisemainabandinarom they respond to one another and things haftentogether so what doesnt happen is the drumrdoesn'tstart playing a beat and then suddenly both thersitarand then eat bars laterbootthere'sasethat'sthat's not the way unfolds now theres theres some good reasons for doing that and i i trynobancemi'sexpecisiveyou'remakingfinaltat specifically and i mabialemdating myself now if youre making it 
lethegrdyomanginhistaaai'givyoutafoofrthetroloofeiaacatataaaatualyidigoresesorasnawithleteresalingaprupbucatococashadighoveresation friend about out of a friend i okay with someone i know a age thats better a thats believable about a contretmusiccontracts and how some of them still have a word phonographs really a a just thinking itll be fine i sat side for our anstrutyaaandfunnytouptekdosewith i will provide you if they phonograph m p three as you rherdioofyormakesyowazoathersreadyanackcataasyesinyoursoyeasoeyouegatathonityou'r mixingafar i understand like a sevaspinthetwelvehundredsiunderstandthatthere's a reason to have sounds come in on not metronomically like that but what im saying to you is you need in need more and i m not talking about more in a sense of eangofpaceormorestuffhappengallthtimeontalkingabouttransitionsohyeahagatit because the music is so programmed and because its so it generally is extraordinarily repetitive and then gether thats what its about as good a lot is like this bulifitis and whether it is a isintothe matter but the whinsitsitsgeneraly built a layer at a time generally by the same persons and thats as a couple of clodratorsbutgenerly onpersondoingthisat'sallcomingoutofthe same mind is coming out i in yearly this part and thisbrugsbrtimnotaraneyitflimerilylikethatgetdall your parts out but then move them around play with things first of forest and then once you got it this we here this is a i'monotdoisamixer so if you can get into this space as a composer your life will be so much easier or the delay the guy is mixing it ortorgawlnohevermaxs ing whence you got al your parts arranged and you got the general flow of the song then to pay attention to what is the focus of this a bar sex whats really carry in the groofitmihtnotbewhat you think i said just pushing pool levels around to make something really loud and see how that one than make some elsehorthetoutandturnotofthingriedowngetgetisenealli'ssectionsflow and then create 
transitions create things that avepeenthatsicnal that it change is coming as this is my first demondropthe's free o signal the change is coming do something to build and dianorndeverybodyin the world he is new its cheesy in a work are is doing as a tritedmore clever but if you listen to the pot and stuff on the on the radio theres always hovers and and sweepswitgooandandthethinghadstissreasontheyput that stuff in it cause its effective it lets you know somethings happening what is the really on happen boonsomehingjust happen in thentellesomesort of explosions at enerjecnesessarysiprosandoosmurassomethinghads that moment new sounds come in old sounds go away generally and something rings out so creattyure transitions like that give me as a mixer something to sink my teeth into as these parts come and go and then as your laringstuffugp when you arrange things have two things come in at once and always have time things coming and not just one to two things happen and makechomcandafdifferentandcontrasting or whatever and the instant you've got like five things hppening and that includes like drums are one thing and bases and other soon three things happen in otherwise if its tied to bring more things in other things got to go away is tis make them go away and it at'saveryselectmomentsinthessongyoucanreallykindopowl things up but othwisestuffkindagoawaywoenitsufcomesan ace keep the errandsockyotobefairetheseare things that every body as wepaintettenshuput bands peopeopl make a musicantharbedroomandbutitspindepoporwhatersiti'salaseverystlaftdlelrameaasolofthe someone on one of those online forums where people talk about things and and typing form responded actually with the very by things like well a lot of young kids these days our learning music on their lattochslariydmusicontheriyehads learning music in other ways and this simplestandfasteswaiyto get that in in some ways the cheat this way to do that as is start off to electronic based musical youve got a 
usalulysicanyourheadyeudon'thavetagotar round if youre a kid and you love music youre just going to find whatever tent whatever it takes to get that on recorded or down get the idea down somehow i they think edyimandhippopaviouslybackgonth a and still now those styles of music more people are getting on to that i think just because its a little bit more acessiya so you can jusasely get into it and ive noticed the same thing with hippoahalottoftimeslalgeothesehhippoptracks where ah if its a to track and vocal or if its everything'strackedoutamit'sjistthe same stuff through the verse and chorsand i had up getting hired to not only makes it but people'vefordmypreviousworkand iermebcaus they know i gonna troppiteuph to make it sound more interestingmammmdothegoodyaotoyoutlookhetogoodyotoanenlookapthemagrefac world i you at by al means support them endlessly thats the red calledbaktratfidsthebestswayfyoulisacydhaveyouherd about wickyharserebiwickyworsenobutyuvinatactypeinusanayowhithsoudoisyutwichyouorcecaza eople and then you got a referee and to any number of witnesses or whatnot but you have two pele and they start off on the same wikopediapige and wormhegindbelikeigkiecobeinnapenistayistho istory whatever the starting pages and then they say okay and howlheberygoboomand you have to work your way from starting you know with the founding fazsof the constitution to holly berry only through wickapedalinxan so you just clicknclicknclickanyoutratofagure which was on a get you closer to the mark rahdyoundtaewayandthese two people go head to head at this and who ever gets their first winds an thats what they call peckyorseat'ssveryentertaining to see that the past that people traveled to get from pointed a point by a is there a turning u with money i isthrisntthere probably will be i dont go to a who i am i was i i go i speaking of ladygarobretinspiartshe'suhadliningvegas now first shes guys saw that in the side of the buses like britningutacaseilactcoblesser 
i'carestosailonatlaskeserliim 't think she'llasttyercontrananoidiesii'mcurioustosyealong at last i have no thoughts on how long it will last im just curious to say and like because i can see it plan out where she just burned out like in three shows and i know a us overbuttedwithcokinerolling out of her yearballsumandticataalso see her riding up hefulcontrastandgetinworenowlandand ending up not retiring until shes like sixty seven years old i thats sad i i i see by your option b okay you disisgallnrecordherenonoh he articieho'vereotdanestateis the interwibsfok its not on a side of a bus i dolt know about it thats how my worldworkseisasaniowehhhna
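Since the whole point was searchability, here is a minimal sketch of how a transcript like this could be searched. Many words come out glued together, so splitting on spaces won't help much; a plain case-insensitive substring search still finds a keyword buried inside a run of letters. The function name find_in_transcript and the context size are my own choices for illustration:

```python
def find_in_transcript(transcript, keyword, context=30):
    """Return a snippet of surrounding text for each occurrence of keyword.

    Matching is a case-insensitive substring search, so a keyword is found
    even when the recognizer fused it with its neighbours.
    """
    haystack = transcript.lower()
    needle = keyword.lower()
    snippets = []
    start = haystack.find(needle)
    while start != -1:
        lo = max(0, start - context)
        hi = min(len(transcript), start + len(needle) + context)
        snippets.append(transcript[lo:hi])
        start = haystack.find(needle, start + 1)
    return snippets


# A short excerpt copied from the output above.
sample = "idon'tknowififdubstep or any of the breakbeatvaritiscountas"
hits = find_in_transcript(sample, "dubstep", context=10)
```

Here "dubstep" is found even though it is fused with the preceding letters. The limit is equally clear: a misrecognized word (the transcript has "vinali" where the hosts presumably said "vinyl") won't match, so exact substring search only goes so far.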

Perf

And yes, the process does take a while:


Inference took 753.507s for 648.908s audio file.
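To put that in perspective, dividing inference time by audio length gives the real-time factor. A tiny sketch using the two figures from the log line above (the function name is mine, not part of DeepSpeech):

```python
def real_time_factor(inference_seconds, audio_seconds):
    """Ratio of processing time to audio duration; above 1.0 is slower than real time."""
    return inference_seconds / audio_seconds


rtf = real_time_factor(753.507, 648.908)
print(f"{rtf:.2f}x real time")  # prints "1.16x real time"
```

So on my laptop, transcription ran about 16% slower than simply listening to the episode.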



Translated from: https://www.phpied.com/taking-mozillas-deepspeech-for-a-spin/




