egs2 recipe overview
Directory name | Corpus name | Task | Language | URL |
---|---|---|---|---|
aishell | AISHELL-ASR0009-OS1 Open Source Mandarin Speech Corpus | ASR | ZH | http://www.aishelltech.com/kysjcp |
ami | The AMI Meeting Corpus | ASR | EN | http://groups.inf.ed.ac.uk/ami/corpus/ |
an4 | CMU AN4 database | ASR/TTS | EN | http://www.speech.cs.cmu.edu/databases/an4/ |
babel | IARPA Babel corpus | ASR | ~20 Languages | https://www.iarpa.gov/index.php/research-programs/babel |
chime4 | The 4th CHiME Speech Separation and Recognition Challenge | ASR/Multichannel ASR | EN | http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/ |
commonvoice | The Mozilla Common Voice | ASR | 13 Languages | https://voice.mozilla.org/datasets |
csj | Corpus of Spontaneous Japanese | ASR | JP | https://pj.ninjal.ac.jp/corpus_center/csj/en/ |
csmsc | Chinese Standard Mandarin Speech Corpus | TTS | ZH | https://www.data-baker.com/open_source.html |
dirha_wsj | Distant-speech Interaction for Robust Home Applications | Multi-Array ASR | EN | https://dirha.fbk.eu/, https://github.com/SHINE-FBK/DIRHA_English_wsj |
how2 | How2: A Large-scale Dataset for Multimodal Language Understanding | ASR/Machine Translation/Speech Translation | EN->PT | https://github.com/srvk/how2-dataset |
jsss | JSSS: Japanese speech corpus for summarization and simplification | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jsss_corpus |
jsut | Japanese speech corpus of Saruwatari-lab., University of Tokyo | ASR/TTS | JP | https://sites.google.com/site/shinnosuketakamichi/publication/jsut |
jvs | JVS (Japanese versatile speech) corpus | TTS | JP | https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus |
laborotv | LaboroTVSpeech (A large-scale Japanese speech corpus on TV recordings) | ASR | JP | https://laboro.ai/column/eg-laboro-tv-corpus-jp |
librispeech | LibriSpeech ASR corpus | ASR | EN | http://www.openslr.org/12 |
ljspeech | The LJ Speech Dataset | TTS | EN | https://keithito.com/LJ-Speech-Dataset/ |
mini_an4 | Mini version of CMU AN4 database for the integration test | ASR/TTS | EN | http://www.speech.cs.cmu.edu/databases/an4/ |
nsc | National Speech Corpus | ASR | EN-SG | https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus |
mls | MLS (A large multilingual corpus derived from LibriVox audiobooks) | ASR | 8 languages | http://www.openslr.org/94/ |
open_li52 | Corpus combination with 52 languages (Common Voice + VoxForge) | Multilingual ASR | 52 languages | |
ru_open_stt | Russian Open Speech To Text (STT/ASR) Dataset | ASR | RU | https://github.com/snakers4/open_stt |
reverb | REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge | ASR | EN | https://reverb2014.dereverberation.com/ |
spgispeech | SPGISpeech 5k corpus | ASR | EN | https://datasets.kensho.com/datasets/scribe |
timit | TIMIT Acoustic-Phonetic Continuous Speech Corpus | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S1 |
vctk | English Multi-speaker Corpus for CSTR Voice Cloning Toolkit | TTS | EN | http://www.udialogue.org/download/cstr-vctk-corpus.html |
vivos | VIVOS (Vietnamese corpus for ASR) | ASR | VI | https://ailab.hcmus.edu.vn/vivos/ |
voxforge | VoxForge | ASR | 7 languages | http://www.voxforge.org/ |
wsj | CSR-I (WSJ0) Complete, CSR-II (WSJ1) Complete | ASR | EN | https://catalog.ldc.upenn.edu/LDC93S6A, https://catalog.ldc.upenn.edu/LDC94S13A |
wsj0_2mix | MERL WSJ0-mix multi-speaker dataset | ASR/SE | EN | http://www.merl.com/demos/deep-clustering |
wsj0_2mix_spatialized | MERL WSJ0-mix multi-speaker dataset (Spatialized version) | ASR/Multichannel ASR/SE | EN | http://www.merl.com/demos/deep-clustering |
yesno | The “yesno” corpus | ASR | HE | http://www.openslr.org/1 |
zeroth_korean | Zeroth-Korean | ASR | KR | http://www.openslr.org/40 |
Usage
See: https://espnet.github.io/espnet/espnet2_tutorial.html#recipes-using-espnet2
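
As a rough illustration of how a directory name from the table maps to a runnable recipe, here is a minimal sketch. It assumes a local clone of the ESPnet repository at `ESPNET_ROOT` (a hypothetical variable introduced here, defaulting to `$HOME/espnet`) and assumes the recipe has an `asr1` task subdirectory containing a `run.sh`; neither detail is stated in this document, so check the tutorial linked above for the layout and options of a given recipe.

```shell
# Assumption: ESPNET_ROOT is a hypothetical variable pointing at a local ESPnet clone.
ESPNET_ROOT="${ESPNET_ROOT:-$HOME/espnet}"

recipe="an4"   # a directory name taken from the table above
task="asr1"    # assumed task subdirectory; TTS recipes would likely use a different one

recipe_dir="${ESPNET_ROOT}/egs2/${recipe}/${task}"

# Run the recipe only if its entry script exists and is executable;
# otherwise report the path that was tried.
if [ -x "${recipe_dir}/run.sh" ]; then
    (cd "${recipe_dir}" && ./run.sh)
else
    echo "Recipe not found: ${recipe_dir}" >&2
fi
```

Swapping `recipe` for another directory name from the table (for example `mini_an4`, described above as the small version used for integration tests) should be enough to try a different corpus, provided that recipe has the assumed task subdirectory.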