2024 | |
LDC2024T02 | AIDA Scenario 1 Practice Topic Annotation |
LDC2024T06 | AIDA Scenario 2 Practice Topic Annotation |
LDC2024T04 | AIDA Scenario 2 Practice Topic Source Data |
LDC2024T05 | Automatic Content Extraction for Portuguese |
LDC2024S04 | BabyEars Affective Vocalizations |
LDC2024S05 | Call My Net 1 |
LDC2024S06 | Diaspora Tibetan Speech |
LDC2024S01 | KASET - Kurmanji and Sorani Kurdish Speech and Transcripts |
LDC2024T03 | LoReHLT Hausa Representative Language Pack |
LDC2024T01 | LORELEI Farsi Representative Language Pack |
LDC2024S03 | RATS Low Speech Density |
LDC2024S02 | Second Language University Speech Intelligibility Corpus |
2023 | |
LDC2023V01 | 2019 NIST Speaker Recognition Evaluation Test Set -- Audio-Visual |
LDC2023S03 | 2019 NIST Speaker Recognition Evaluation Test Set -- CTS Challenge |
LDC2023S06 | 2019 OpenSAT Public Safety Communications Simulation |
LDC2023T10 | AIDA Scenario 1 and 2 Reference Knowledge Base |
LDC2023T11 | AIDA Scenario 1 Practice Topic Source Data |
LDC2023S01 | AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts |
LDC2023S08 | CALLFRIEND Russian Speech |
LDC2023T09 | CALLFRIEND Russian Text |
LDC2023T04 | DEFT English Light and Rich ERE Annotation |
LDC2023S10 | Kasdi-Merbah (University) Emotional Database in Arabic Speech |
LDC2023S07 | LDC Spoken Language Sampler - Sixth Release |
LDC2023T07 | LORELEI Indonesian Representative Language Pack |
LDC2023T01 | LORELEI Swahili Representative Language Pack |
LDC2023T02 | LORELEI Tagalog Representative Language Pack |
LDC2023T03 | LORELEI Tamil Representative Language Pack |
LDC2023T08 | LORELEI Thai Representative Language Pack |
LDC2023T06 | LORELEI Zulu Representative Language Pack |
LDC2023S02 | Mixer 3 Speech |
LDC2023S04 | Mixer 7 Spanish Speech |
LDC2023L01 | Moroccan Arabic - English Lexical Database |
LDC2023T05 | Penn Korean Universal Dependency Treebank |
LDC2023S09 | REMIX Telephone Collection |
LDC2023S05 | Samrómur Queries Icelandic Speech 1.0 |
LDC2023T13 | TAC KBP Belief and Sentiment - Comprehensive Training and Evaluation Data 2016-2017 |
2022 | |
LDC2022S10 | 2017 NIST Language Recognition Evaluation Training and Development Sets |
LDC2022S01 | 2017 NIST OpenSAT Pilot - SSSF |
LDC2022T02 | AttImam |
LDC2022T06 | BOLT English Translation Treebank - Egyptian Arabic SMS/Chat |
LDC2022T07 | CAMIO Transcription Languages |
LDC2022S13 | Global TIMIT Thai |
LDC2022V01 | HAVIC MED Novel 1 Test -- Videos, Metadata and Annotation |
LDC2022V02 | HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation |
LDC2022T05 | LORELEI Bengali Representative Language Pack |
LDC2022T01 | LORELEI Kinyarwanda Incident Language Pack |
LDC2022T03 | LORELEI Wolof Representative Language Pack |
LDC2022S08 | MASRI Synthetic |
LDC2022S04 | NUBUC |
LDC2022T04 | Qatari Corpus of Argumentative Writing |
LDC2022L01 | Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon |
LDC2022S11 | Samrómur Children Icelandic Speech 1.0 |
LDC2022S05 | Samrómur Icelandic Speech 1.0 |
LDC2022S06 | Second DIHARD Challenge Evaluation - Eleven Sources |
LDC2022S07 | Second DIHARD Challenge Evaluation - SEEDLingS |
LDC2022S03 | Spoken Digits in Hindi and Indian English |
LDC2022S02 | The Child Subglottal Resonances Database |
LDC2022S12 | Third DIHARD Challenge Development |
LDC2022S14 | Third DIHARD Challenge Evaluation |
LDC2022S09 | Xi'an Guanzhong Object Naming |
2021 | |
LDC2021S01 | Althingi Parliamentary Speech |
LDC2021T04 | ATIS - Seven Languages |
LDC2021T07 | BOLT Chinese Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech |
LDC2021T11 | BOLT Chinese SMS/Chat Parallel Training Data |
LDC2021T14 | BOLT Egyptian Arabic Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech |
LDC2021T18 | BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech |
LDC2021T15 | BOLT Egyptian Arabic SMS/Chat Parallel Training Data |
LDC2021T12 | BOLT Egyptian Arabic Treebank - Conversational Telephone Speech |
LDC2021T17 | BOLT Egyptian Arabic Treebank - SMS/Chat |
LDC2021T19 | BOLT English Translation Treebank - Chinese SMS/Chat |
LDC2021T03 | BOLT English Treebank - SMS/Chat |
LDC2021T13 | Chinese Abstract Meaning Representation 2.0 |
LDC2021L01 | Classical Arabic Dictionary |
LDC2021S02 | Columbia Games Corpus |
LDC2021T16 | DiscAlign for Penn and RST Discourse Treebanks |
LDC2021T10 | ESPADA |
LDC2021S06 | Ethnobotanical Research and Language Documentation of Nahuatl |
LDC2021S03 | Global TIMIT Mandarin Chinese |
LDC2021V01 | HAVIC MED Training Data -- Videos, Metadata and Annotation |
LDC2021T02 | LORELEI Akan Representative Language Pack |
LDC2021S05 | MyST Children's Conversational Speech |
LDC2021T05 | Penn Discourse Treebank Version 2.0 - German Translation |
LDC2021S08 | RATS Speaker Identification |
LDC2021S10 | Second DIHARD Challenge Development - Eleven Sources |
LDC2021S11 | Second DIHARD Challenge Development - SEEDLingS |
LDC2021T08 | TAC KBP English Sentiment Slot Filling -- Comprehensive Training and Evaluation Data 2013-2014 |
LDC2021T06 | TAC KBP English Surprise Slot Filling -- Comprehensive Training and Evaluation Data 2010 |
LDC2021S04 | The SSNCE Database of Tamil Dysarthric Speech |
LDC2021S09 | UCLA Speaker Variability Database |
LDC2021S07 | Wikipedia Spanish Speech and Transcripts |
LDC2021T09 | X-SRL: Parallel Cross-lingual Semantic Role Labeling |
2020 | |
LDC2020S04 | 2018 NIST Speaker Recognition Evaluation Test Set |
LDC2020T02 | Abstract Meaning Representation (AMR) Annotation Release 3.0 |
LDC2020T07 | Abstract Meaning Representation 2.0 - Four Translations |
LDC2020T15 | BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training |
LDC2020T05 | BOLT Egyptian Arabic-English Word Alignment -- Conversational Telephone Speech Training |
LDC2020T20 | BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech |
LDC2020T21 | BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech |
LDC2020T09 | BOLT English Translation Treebank - Chinese Discussion Forum |
LDC2020S08 | CALLFRIEND American English-Southern Dialect Second Edition |
LDC2020S06 | CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition |
LDC2020T01 | Chinese CogBank |
LDC2020L02 | Chinese Lexical Resources for Gender, Number, Animacy |
LDC2020T23 | Corpus of Law, Academic, and News |
LDC2020L01 | Database of Word Level Statistics - Mandarin |
LDC2020T19 | DEFT Chinese Light and Rich ERE Annotation |
LDC2020T06 | EVALution |
LDC2020S11 | Global TIMIT Learner Simple English |
LDC2020S09 | Global TIMIT Learner Treebank English |
LDC2020S12 | Global TIMIT Mandarin Chinese-Guanzhong Dialect |
LDC2020S02 | IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b |
LDC2020S07 | IARPA Babel Javanese Language Pack IARPA-babel402b-v1.0b |
LDC2020S10 | IARPA Babel Mongolian Language Pack IARPA-babel401b-v2.0b |
LDC2020S01 | LibriVox Spanish |
LDC2020T10 | LORELEI Entity Detection and Linking Knowledge Base |
LDC2020T11 | LORELEI Oromo Incident Language Pack |
LDC2020T22 | LORELEI Tigrinya Incident Language Pack |
LDC2020T24 | LORELEI Ukrainian Representative Language Pack |
LDC2020T17 | LORELEI Vietnamese Representative Language Pack |
LDC2020T04 | Machine Reading Phase 1 IC Training Data |
LDC2020S03 | Mixer 4 and 5 Speech |
LDC2020S05 | Multi-Language Conversational Telephone Speech 2011 -- Mandarin Chinese |
LDC2020T16 | Penn Parsed Corpora of Historical English |
LDC2020S13 | Phonemes of Arabic |
LDC2020T12 | SemTransCNC |
LDC2020T14 | Speech Sentiment Annotations |
LDC2020T03 | TAC KBP English Event Argument - Training and Evaluation Data 2014-2015 |
LDC2020T13 | TAC KBP English Event Nugget Detection and Coreference - Comprehensive Training and Evaluation Data 2014-2015 |
LDC2020T08 | TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013 |
LDC2020T18 | TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017 |
2019 | |
LDC2019S20 | 2016 NIST Speaker Recognition Evaluation Test Set |
LDC2019T01 | BOLT Arabic Discussion Forum Parallel Training Data |
LDC2019T13 | BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training |
LDC2019T18 | BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training |
LDC2019T06 | BOLT Egyptian-English Word Alignment -- Discussion Forum Training |
LDC2019T15 | BOLT English Treebank - Discussion Forum |
LDC2019S21 | CALLFRIEND American English-Non-Southern Dialect Second Edition |
LDC2019S18 | CALLFRIEND Canadian French Second Edition |
LDC2019S04 | CALLFRIEND Egyptian Arabic Second Edition |
LDC2019T07 | Chinese Abstract Meaning Representation 1.0 |
LDC2019S07 | CIEMPIESS Experimentation |
LDC2019T11 | Corpus of Conversational Persian Transcripts |
LDC2019T03 | DEFT Chinese Committed Belief Annotation |
LDC2019T16 | DEFT English Committed Belief Annotation |
LDC2019T09 | DEFT Spanish Committed Belief Annotation |
LDC2019S09 | First DIHARD Challenge Development - Eight Sources |
LDC2019S10 | First DIHARD Challenge Development - SEEDLingS |
LDC2019S12 | First DIHARD Challenge Evaluation - Nine Sources |
LDC2019S13 | First DIHARD Challenge Evaluation - SEEDLingS |
LDC2019V01 | HAVIC MED Progress Test -- Videos, Metadata and Annotation |
LDC2019S22 | IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b |
LDC2019S08 | IARPA Babel Guarani Language Pack IARPA-babel305b-v1.0c |
LDC2019S16 | IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c |
LDC2019S03 | IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b |
LDC2019S17 | LDC Spoken Language Sampler - Fifth Release |
LDC2019T14 | Machine Reading Phase 1 NFL Scoring Training Data |
LDC2019S23 | Magic Data Chinese Mandarin Conversational Speech |
LDC2019S02 | Multi-Language Conversational Telephone Speech 2011 -- Arabic Group |
LDC2019S15 | Multi-Language Conversational Telephone Speech 2011 -- East Asian |
LDC2019S06 | Multi-Language Conversational Telephone Speech 2011 -- English Group |
LDC2019T04 | Multilingual ATIS |
LDC2019T05 | Penn Discourse Treebank Version 3.0 |
LDC2019T10 | Phrase Detectives Corpus Version 2 |
LDC2019S19 | Polish Speech Database |
LDC2019S01 | SRI Speech-Based Collaborative Learning Corpus |
LDC2019T08 | TAC KBP Chinese Regular Slot Filling - Comprehensive Training and Evaluation Data 2014 |
LDC2019T17 | TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017 |
LDC2019T19 | TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017 |
LDC2019T02 | TAC KBP Entity Discovery and Linking - Comprehensive Training and Evaluation Data 2014-2015 |
LDC2019T12 | TAC KBP Evaluation Source Corpora 2016-2017 |
LDC2019S14 | The DKU-JNU-EMA Electromagnetic Articulography Database |
LDC2019S11 | USC-SFI MALACH Interviews and Transcripts English – Speech Recognition Edition |
LDC2019S05 | VAST Chinese Speech and Transcripts |
2018 | |
LDC2018T08 | 2007 CoNLL Shared Task - Arabic & English |
LDC2018T06 | 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish |
LDC2018T07 | 2007 CoNLL Shared Task - Greek, Hungarian & Italian |
LDC2018S06 | 2011 NIST Language Recognition Evaluation Test Set |
LDC2018S14 | AISHELL-1 |
LDC2018S15 | Avatar Education Portuguese |
LDC2018T10 | BOLT Arabic Discussion Forums |
LDC2018T15 | BOLT Chinese SMS/Chat |
LDC2018T23 | BOLT Egyptian Arabic Treebank - Discussion Forum |
LDC2018T19 | BOLT English SMS/Chat |
LDC2018T18 | BOLT Information Retrieval Comprehensive Training and Evaluation |
LDC2018S09 | CALLFRIEND Mandarin Chinese-Mainland Dialect Second Edition |
LDC2018S11 | CIEMPIESS Balance |
LDC2018T20 | Concretely Annotated English Gigaword |
LDC2018T01 | DEFT Spanish Treebank |
LDC2018S01 | DIRHA English WSJ Audio |
LDC2018S05 | GALE Phase 4 Arabic Broadcast News Speech |
LDC2018T14 | GALE Phase 4 Arabic Broadcast News Transcripts |
LDC2018T05 | H2, E2, ERK1 Children's Writing |
LDC2018V01 | HAVIC MED Event E051-E060 -- Videos, Metadata and Annotation |
LDC2018S18 | HUB5 Mandarin Telephone Speech and Transcripts Second Edition |
LDC2018S07 | IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b |
LDC2018S13 | IARPA Babel Kazakh Language Pack IARPA-babel302b-v1.0a |
LDC2018S16 | IARPA Babel Telugu Language Pack IARPA-babel303b-v1.0a |
LDC2018S02 | IARPA Babel Tok Pisin Language Pack IARPA-babel207b-v1.0e |
LDC2018T04 | LORELEI Amharic Representative Language Pack - Monolingual and Parallel Text |
LDC2018T11 | LORELEI Somali Representative Language Pack - Monolingual and Parallel Text |
LDC2018S03 | Multi-Language Conversational Telephone Speech 2011 -- Central Asian |
LDC2018S08 | Multi-Language Conversational Telephone Speech 2011 -- Central European |
LDC2018S12 | Multi-Language Conversational Telephone Speech 2011 -- Spanish |
LDC2018S17 | Nautilus Speaker Characterization |
LDC2018S10 | RATS Language Identification |
LDC2018S04 | Rhythm and Pitch |
LDC2018T09 | SPADE |
LDC2018T03 | TAC KBP Comprehensive English Source Corpora 2009-2014 |
LDC2018T16 | TAC KBP English Entity Linking - Comprehensive Training and Evaluation Data 2009-2013 |
LDC2018T22 | TAC KBP English Regular Slot Filling - Comprehensive Training and Evaluation Data 2009-2014 |
LDC2018T24 | TAC Relation Extraction Dataset |
LDC2018T13 | TRAD Arabic-French Parallel Text -- Newsgroup |
LDC2018T21 | TRAD Arabic-French Parallel Text -- Newswire |
LDC2018T02 | TRAD Chinese-French Parallel Text -- Blog |
LDC2018T17 | TRAD Chinese-French Parallel Text -- Broadcast News |
2017 | |
LDC2017S06 | 2010 NIST Speaker Recognition Evaluation Test Set |
LDC2017T13 | 2015-2016 CoNLL Shared Task |
LDC2017T10 | Abstract Meaning Representation (AMR) Annotation Release 2.0 |
LDC2017T14 | Ancient Chinese Corpus |
LDC2017L01 | Arabic Speech Recognition Pronunciation Dictionary |
LDC2017S21 | ASpIRE Development and Development Test Sets |
LDC2017T05 | BOLT Chinese Discussion Forum Parallel Training Data |
LDC2017T07 | BOLT Egyptian Arabic SMS/Chat and Transliteration |
LDC2017T11 | BOLT English Discussion Forums |
LDC2017S07 | CHiME2 Grid |
LDC2017S10 | CHiME2 WSJ0 |
LDC2017S24 | CHiME3 |
LDC2017S23 | CIEMPIESS Light |
LDC2017T15 | English Web Treebank Propbank |
LDC2017T03 | First-Year Law Students' Court Memoranda |
LDC2017T06 | GALE English-Chinese Parallel Aligned Treebank -- Training |
LDC2017T02 | GALE Phase 3 and 4 Chinese Web Parallel Text |
LDC2017S02 | GALE Phase 3 Arabic Broadcast News Speech Part 2 |
LDC2017T04 | GALE Phase 3 Arabic Broadcast News Transcripts Part 2 |
LDC2017S15 | GALE Phase 4 Arabic Broadcast Conversation Speech |
LDC2017T12 | GALE Phase 4 Arabic Broadcast Conversation Transcripts |
LDC2017S25 | GALE Phase 4 Chinese Broadcast News Speech |
LDC2017T18 | GALE Phase 4 Chinese Broadcast News Transcripts |
LDC2017S03 | IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b |
LDC2017S22 | IARPA Babel Kurmanji Kurdish Language Pack IARPA-babel205b-v1.0a |
LDC2017S08 | IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a |
LDC2017S05 | IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d |
LDC2017S13 | IARPA Babel Tamil Language Pack IARPA-babel204b-v1.1b |
LDC2017S01 | IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 |
LDC2017S19 | IARPA Babel Zulu Language Pack IARPA-babel206b-v0.1e |
LDC2017S12 | KSUEmotions |
LDC2017S16 | LDC Spoken Language Sampler - Fourth Release |
LDC2017S11 | Metalogue Multi-Issue Bargaining Dialogue |
LDC2017S14 | Multi-Language Conversational Telephone Speech 2011 -- South Asian |
LDC2017S09 | Multi-Language Conversational Telephone Speech 2011 -- Turkish |
LDC2017T01 | MWE-Aware English Dependency Corpus |
LDC2017T16 | MWE-Aware English Dependency Corpus 2.0 |
LDC2017S04 | Noisy TIMIT Speech |
LDC2017T08 | Phrase Detectives Corpus |
LDC2017S20 | RATS Keyword Spotting |
LDC2017S18 | SRI-FRTIV |
LDC2017T17 | TAC KBP Chinese Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2011-2014 |
LDC2017T09 | The EventStatus Corpus |
LDC2017V01 | UCLA High-Speed Laryngeal Video and Audio |
LDC2017S17 | Vehicle City Voices Corpus – Part I |
2016 | |
LDC2016T02 | Arabic Treebank - Weblog |
LDC2016T18 | ARL Arabic Dependency Treebank |
LDC2016L01 | Bamanankan Lexicon |
LDC2016T05 | BOLT Chinese Discussion Forums |
LDC2016T19 | BOLT Chinese-English Word Alignment and Tagging -- Discussion Forum Training |
LDC2016T13 | Chinese Treebank 9.0 |
LDC2016T22 | Chinese-English Parallel Sentences Extracted from Patents |
LDC2016S04 | CHM150 |
LDC2016T07 | DEFT Narrative Text |
LDC2016S05 | Digital Archive of Southern Speech - NLP Version |
LDC2016T16 | English Speed Networking Conversational Transcripts |
LDC2016T08 | GALE Phase 3 and 4 Arabic Web Parallel Text |
LDC2016T09 | GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text |
LDC2016T15 | GALE Phase 3 and 4 Chinese Broadcast News Parallel Text |
LDC2016T25 | GALE Phase 3 and 4 Chinese Newswire Parallel Text |
LDC2016S01 | GALE Phase 3 Arabic Broadcast Conversation Speech Part 2 |
LDC2016T06 | GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 2 |
LDC2016S07 | GALE Phase 3 Arabic Broadcast News Speech Part 1 |
LDC2016T17 | GALE Phase 3 Arabic Broadcast News Transcripts Part 1 |
LDC2016T11 | GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences |
LDC2016T20 | GALE Phase 4 Arabic Broadcast News Parallel Sentences |
LDC2016T27 | GALE Phase 4 Arabic Newswire Parallel Sentences |
LDC2016T14 | GALE Phase 4 Arabic Weblog Parallel Sentences |
LDC2016S03 | GALE Phase 4 Chinese Broadcast Conversation Speech |
LDC2016T12 | GALE Phase 4 Chinese Broadcast Conversation Transcripts |
LDC2016T04 | GALE Phase 4 Chinese Weblog Parallel Sentences |
LDC2016T01 | H1 Children's Writing |
LDC2016V01 | HAVIC Pilot Transcription |
LDC2016S06 | IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a |
LDC2016S08 | IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b |
LDC2016S02 | IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c |
LDC2016S12 | IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a |
LDC2016S09 | IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY |
LDC2016S13 | IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g |
LDC2016S10 | IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5 |
LDC2016T24 | JANA: A Human-Human Dialogues Corpus for Egyptian Dialect |
LDC2016T21 | KAFD: Arabic Font Database |
LDC2016S11 | Multi-Language Conversational Telephone Speech 2011 -- Slavic Group |
LDC2016T03 | NewSoMe Corpus of Opinion in Blogs |
LDC2016T23 | Richer Event Description |
LDC2016T10 | SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing |
LDC2016T26 | TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014 |
2015 | |
LDC2015T12 | 2006 CoNLL Shared Task - Arabic & Czech |
LDC2015T11 | 2006 CoNLL Shared Task - Ten Languages |
LDC2015T20 | ACE 2007 Spanish DevTest - Pilot Evaluation |
LDC2015S10 | Arabic Learner Corpus |
LDC2015S12 | Articulation Index LSCP |
LDC2015T03 | Avocado Research Email Collection |
LDC2015S07 | CIEMPIESS |
LDC2015T08 | Coordination Annotation for the Penn Treebank |
LDC2015T13 | English News Text Treebank: Penn Treebank Revised |
LDC2015T06 | GALE Chinese-English Parallel Aligned Treebank -- Training |
LDC2015T04 | GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3 |
LDC2015T18 | GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 4 |
LDC2015S01 | GALE Phase 2 Arabic Broadcast News Speech Part 2 |
LDC2015T01 | GALE Phase 2 Arabic Broadcast News Transcripts Part 2 |
LDC2015T05 | GALE Phase 3 and 4 Arabic Broadcast Conversation Parallel Text |
LDC2015T07 | GALE Phase 3 and 4 Arabic Broadcast News Parallel Text |
LDC2015T19 | GALE Phase 3 and 4 Arabic Newswire Parallel Text |
LDC2015S11 | GALE Phase 3 Arabic Broadcast Conversation Speech Part 1 |
LDC2015T16 | GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 1 |
LDC2015S06 | GALE Phase 3 Chinese Broadcast Conversation Speech Part 2 |
LDC2015T09 | GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 2 |
LDC2015S13 | GALE Phase 3 Chinese Broadcast News Speech |
LDC2015T25 | GALE Phase 3 Chinese Broadcast News Transcripts |
LDC2015T14 | GALE Phase 4 Chinese Broadcast Conversation Parallel Sentences |
LDC2015T21 | GALE Phase 4 Chinese Broadcast News Parallel Sentences |
LDC2015T24 | GALE Phase 4 Chinese Newswire Parallel Sentences |
LDC2015T22 | Karlsruhe Children's Text |
LDC2015T23 | KHATT: Handwritten Arabic Text |
LDC2015S09 | LDC Spoken Language Sampler - Third Release |
LDC2015S05 | Mandarin Chinese Phonetic Segmentation and Tone |
LDC2015S04 | Mandarin-English Code-Switching in South-East Asia |
LDC2015T17 | NewSoMe Corpus of Opinion in News Reports |
LDC2015S02 | RATS Speech Activity Detection |
LDC2015T10 | RST Signalling Corpus |
LDC2015T02 | SenSem Databank |
LDC2015L01 | SenSem Lexicons |
LDC2015S03 | The Subglottal Resonances Database |
LDC2015S08 | The Walking Around Corpus |
LDC2015T15 | TS Wikipedia |
2014 | |
LDC2014S06 | 2009 NIST Language Recognition Evaluation Test Set |
LDC2014T12 | Abstract Meaning Representation (AMR) Annotation Release 1.0 |
LDC2014T18 | ACE 2007 Multilingual Training Corpus |
LDC2014T24 | Boulder Lies and Truth |
LDC2014S01 | CALLFRIEND Farsi Second Edition Speech |
LDC2014T01 | CALLFRIEND Farsi Second Edition Transcripts |
LDC2014T21 | Chinese Discourse Treebank 0.5 |
LDC2014T07 | Domain-Specific Hyponym Relations |
LDC2014T06 | ETS Corpus of Non-Native Written English |
LDC2014T23 | Fisher and CALLHOME Spanish--English Speech Translation |
LDC2014T03 | GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 2 |
LDC2014T08 | GALE Arabic-English Parallel Aligned Treebank -- Web Training |
LDC2014T19 | GALE Arabic-English Word Alignment -- Broadcast Training Part 1 |
LDC2014T22 | GALE Arabic-English Word Alignment -- Broadcast Training Part 2 |
LDC2014T05 | GALE Arabic-English Word Alignment Training Part 1 -- Newswire and Web |
LDC2014T10 | GALE Arabic-English Word Alignment Training Part 2 -- Newswire |
LDC2014T14 | GALE Arabic-English Word Alignment Training Part 3 -- Web |
LDC2014T25 | GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 2 |
LDC2014S07 | GALE Phase 2 Arabic Broadcast News Speech Part 1 |
LDC2014T17 | GALE Phase 2 Arabic Broadcast News Transcripts Part 1 |
LDC2014T04 | GALE Phase 2 Chinese Broadcast News Parallel Text Part 1 |
LDC2014T11 | GALE Phase 2 Chinese Broadcast News Parallel Text Part 2 |
LDC2014T15 | GALE Phase 2 Chinese Newswire Parallel Text Part 1 |
LDC2014T20 | GALE Phase 2 Chinese Newswire Parallel Text Part 2 |
LDC2014T26 | GALE Phase 2 Chinese Web Parallel Text |
LDC2014S09 | GALE Phase 3 Chinese Broadcast Conversation Speech Part 1 |
LDC2014T28 | GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 1 |
LDC2014S05 | Hispanic-English Database |
LDC2014T09 | HyTER Networks of Selected OpenMT08/09 Sentences |
LDC2014S02 | King Saud University Arabic Speech Database |
LDC2014T13 | MADCAT Chinese Pilot Training Set |
LDC2014S03 | Multi-Channel WSJ Audio |
LDC2014T02 | NIST 2012 Open Machine Translation (OpenMT) Progress Test Five Language Source |
LDC2014T16 | TAC KBP Reference Knowledge Base |
LDC2014S08 | United Nations Proceedings Speech |
LDC2014S04 | USC-SFI MALACH Interviews and Transcripts Czech |
2013 | |
LDC2013T06 | 1993-2007 United Nations Parallel Text |
LDC2013T13 | Chinese Proposition Bank 3.0 |
LDC2013T21 | Chinese Treebank 8.0 |
LDC2013T02 | Chinese-English Biology and Chemistry Abstract Parallel Text |
LDC2013S09 | CSC Deceptive Speech |
LDC2013T14 | GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 1 |
LDC2013T10 | GALE Arabic-English Parallel Aligned Treebank -- Newswire |
LDC2013T23 | GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 1 |
LDC2013T05 | GALE Chinese-English Word Alignment and Tagging Training Part 4 -- Web |
LDC2013S02 | GALE Phase 2 Arabic Broadcast Conversation Speech Part 1 |
LDC2013S07 | GALE Phase 2 Arabic Broadcast Conversation Speech Part 2 |
LDC2013T04 | GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1 |
LDC2013T17 | GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2 |
LDC2013T01 | GALE Phase 2 Arabic Web Parallel Text |
LDC2013T11 | GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 1 |
LDC2013T16 | GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 2 |
LDC2013S04 | GALE Phase 2 Chinese Broadcast Conversation Speech |
LDC2013T08 | GALE Phase 2 Chinese Broadcast Conversation Transcripts |
LDC2013S08 | GALE Phase 2 Chinese Broadcast News Speech |
LDC2013T20 | GALE Phase 2 Chinese Broadcast News Transcripts |
LDC2013S05 | Greybeard |
LDC2013S06 | LDC Spoken Language Sampler - Second Release |
LDC2013T09 | MADCAT Phase 2 Training Set |
LDC2013T15 | MADCAT Phase 3 Training Set |
LDC2013L01 | Maninkakan Lexicon |
LDC2013T12 | Manually Annotated Sub-Corpus Third Release |
LDC2013S03 | Mixer 6 Speech |
LDC2013T07 | NIST 2008-2012 Open Machine Translation (OpenMT) Progress Test Sets |
LDC2013T03 | NIST 2012 Open Machine Translation (OpenMT) Evaluation |
LDC2013T19 | OntoNotes Release 5.0 |
LDC2013T18 | Semantic Textual Similarity (STS) 2013 Machine Translation |
LDC2013T22 | The ARRAU Corpus of Anaphoric Information |
2012 | |
LDC2012V01 | 2005 NIST/USF Evaluation Resources for the VACE Program - Broadcast News |
LDC2012S01 | 2006 NIST Speaker Recognition Evaluation Test Set Part 2 |
LDC2012T03 | 2009 CoNLL Shared Task Part 1 |
LDC2012T04 | 2009 CoNLL Shared Task Part 2 |
LDC2012T11 | American English Nickname Collection |
LDC2012T21 | Annotated English Gigaword |
LDC2012T07 | Arabic Treebank - Broadcast News v1.0 |
LDC2012T09 | Arabic-Dialect/English Parallel Text |
LDC2012T10 | Catalan TimeBank 1.0 |
LDC2012T05 | Chinese Dependency Treebank 1.0 |
LDC2012T22 | Chinese-English Semiconductor Parallel Text |
LDC2012S03 | Digital Archive of Southern Speech |
LDC2012T02 | English Translation Treebank: An-Nahar Newswire |
LDC2012T13 | English Web Treebank |
LDC2012T16 | GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web |
LDC2012T20 | GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire |
LDC2012T24 | GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web |
LDC2012T06 | GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 1 |
LDC2012T14 | GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 2 |
LDC2012T18 | GALE Phase 2 Arabic Broadcast News Parallel Text |
LDC2012T17 | GALE Phase 2 Arabic Newswire Parallel Text |
LDC2012T15 | MADCAT Phase 1 Training Set |
LDC2012S04 | Malto Speech and Transcripts |
LDC2012T01 | ModeS TimeBank 1.0 |
LDC2012T08 | Prague Czech-English Dependency Treebank 2.0 |
LDC2012T23 | Russian-English Computer Security Parallel Text |
LDC2012T12 | Spanish TimeBank 1.0 |
LDC2012S02 | TORGO Database of Dysarthric Articulation |
LDC2012S06 | Turkish Broadcast News Speech and Transcripts |
LDC2012S05 | USC-SFI MALACH Interviews and Transcripts English |
2011 | |
LDC2011S04 | 2005 NIST Speaker Recognition Evaluation Test Data |
LDC2011S01 | 2005 NIST Speaker Recognition Evaluation Training Data |
LDC2011S06 | 2005 Spring NIST Rich Transcription (RT-05S) Evaluation Set |
LDC2011S10 | 2006 NIST Speaker Recognition Evaluation Test Set Part 1 |
LDC2011S09 | 2006 NIST Speaker Recognition Evaluation Training Set |
LDC2011S02 | 2006 NIST Spoken Term Detection Development Set |
LDC2011S03 | 2006 NIST Spoken Term Detection Evaluation Set |
LDC2011V05 | 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1 |
LDC2011V06 | 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2 |
LDC2011S11 | 2008 NIST Speaker Recognition Evaluation Supplemental Set |
LDC2011S08 | 2008 NIST Speaker Recognition Evaluation Test Set |
LDC2011S05 | 2008 NIST Speaker Recognition Evaluation Training Set Part 1 |
LDC2011S07 | 2008 NIST Speaker Recognition Evaluation Training Set Part 2 |
LDC2011T05 | 2008/2010 NIST Metrics for Machine Translation (MetricsMaTr) GALE Evaluation Set |
LDC2011T02 | ACE 2005 English SpatialML Annotations Version 2 |
LDC2011T11 | Arabic Gigaword Fifth Edition |
LDC2011T09 | Arabic Treebank: Part 2 v 3.1 |
LDC2011T06 | Broadcast News Lattices |
LDC2011T13 | Chinese Gigaword Fifth Edition |
LDC2011T08 | Datasets for Generic Relation Extraction (reACE) |
LDC2011T07 | English Gigaword Fifth Edition |
LDC2011T10 | French Gigaword Third Edition |
LDC2011T04 | Indian Language Part-of-Speech Tagset: Sanskrit |
LDC2011V03 | NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1 |
LDC2011V04 | NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2 |
LDC2011V01 | NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 1 |
LDC2011V02 | NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 2 |
LDC2011T03 | OntoNotes Release 4.0 |
LDC2011T01 | SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in Multiple Languages |
LDC2011T12 | Spanish Gigaword Third Edition |
2010 | |
LDC2010S03 | 2003 NIST Speaker Recognition Evaluation |
LDC2010T09 | ACE 2005 Mandarin SpatialML Annotations |
LDC2010T18 | ACE Time Normalization (TERN) 2004 English Evaluation Data V1.0 |
LDC2010T13 | Arabic Treebank: Part 1 v 4.1 |
LDC2010T08 | Arabic Treebank: Part 3 v 3.2 |
LDC2010S05 | Asian Elephant Vocalizations |
LDC2010S07 | Asian Spoken Language Sampler |
LDC2010T07 | Chinese Treebank 7.0 |
LDC2010T06 | Chinese Web 5-gram Version 1 |
LDC2010T02 | Czech Broadcast News MDE Transcripts |
LDC2010T04 | Fisher Spanish - Transcripts |
LDC2010S01 | Fisher Spanish Speech |
LDC2010T03 | GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2 |
LDC2010T16 | Indian Language Part-of-Speech Tagset: Bengali |
LDC2010T24 | Indian Language Part-of-Speech Tagset: Hindi |
LDC2010T19 | Korean Newswire Second Edition |
LDC2010L01 | LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1 |
LDC2010T22 | Manually Annotated Sub-Corpus First Release |
LDC2010T15 | Message Understanding Conference 7 Timed (MUC7_T) |
LDC2010T10 | NIST 2002 Open Machine Translation (OpenMT) Evaluation |
LDC2010T11 | NIST 2003 Open Machine Translation (OpenMT) Evaluation |
LDC2010T12 | NIST 2004 Open Machine Translation (OpenMT) Evaluation |
LDC2010T14 | NIST 2005 Open Machine Translation (OpenMT) Evaluation |
LDC2010T17 | NIST 2006 Open Machine Translation (OpenMT) Evaluation |
LDC2010T21 | NIST 2008 Open Machine Translation (OpenMT) Evaluation |
LDC2010T23 | NIST 2009 Open Machine Translation (OpenMT) Evaluation |
LDC2010T01 | NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations |
LDC2010T05 | NPS Internet Chatroom Conversations, Release 1.0 |
LDC2010V01 | TRECVID 2004 Keyframes & Transcripts |
LDC2010V02 | TRECVID 2006 Keyframes |
LDC2010S02 | WTIMIT 1.0 |
2009 | |
LDC2009S05 | 2007 NIST Language Recognition Evaluation Supplemental Training Set |
LDC2009S04 | 2007 NIST Language Recognition Evaluation Test Set |
LDC2009T12 | 2008 CoNLL Shared Task Data |
LDC2009T05 | 2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data |
LDC2009T29 | ACL Anthology Reference Corpus |
LDC2009L01 | An English Dictionary of the Tamil Verb Second Edition |
LDC2009T30 | Arabic Gigaword Fourth Edition |
LDC2009T22 | Arabic Newswire English Translation Collection |
LDC2009V01 | Audiovisual Database of Spoken American English |
LDC2009T04 | BioProp Version 1.0 |
LDC2009T27 | Chinese Gigaword Fourth Edition |
LDC2009S01 | CSLU: Numbers Version 1.3 |
LDC2009S03 | CSLU: S4X Release 1.2 |
LDC2009T20 | Czech Broadcast Conversation MDE Transcripts |
LDC2009S02 | Czech Broadcast Conversation Speech |
LDC2009T01 | English CTS Treebank with Structural Metadata |
LDC2009T13 | English Gigaword Fourth Edition |
LDC2009T23 | FactBank 1.0 |
LDC2009T28 | French Gigaword Second Edition |
LDC2009T03 | GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1 |
LDC2009T09 | GALE Phase 1 Arabic Newsgroup Parallel Text - Part 2 |
LDC2009T02 | GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1 |
LDC2009T06 | GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2 |
LDC2009T15 | GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1 |
LDC2009T08 | Japanese Web N-gram Version 1 |
LDC2009T10 | Language Understanding Annotation Corpus |
LDC2009T26 | NXT Switchboard Annotations |
LDC2009T24 | OntoNotes Release 3.0 |
LDC2009T11 | REFLEX Entity Translation Training/DevTest |
LDC2009T21 | Spanish Gigaword Second Edition |
LDC2009T14 | Tagged Chinese Gigaword Version 2.0 |
LDC2009T07 | Unified Linguistic Annotation Text Collection |
LDC2009T25 | Web 1T 5-gram, 10 European Languages Version 1 |
2008 | |
LDC2008S05 | 2005 NIST Language Recognition Evaluation |
LDC2008T03 | ACE 2005 English SpatialML Annotations |
LDC2008L01 | An English Dictionary of the Tamil Verb |
LDC2008T25 | AQUAINT-2 Information-Retrieval Text Research Collection |
LDC2008T13 | BLLIP North American News Text, Complete |
LDC2008T14 | BLLIP North American News Text, General Release |
LDC2008T17 | CALLHOME Mandarin Chinese Transcripts - XML version |
LDC2008S09 | CHAracterizing INdividual Speakers (CHAINS) |
LDC2008T07 | Chinese Proposition Bank 2.0 |
LDC2008T24 | COMNOM v 1.0 |
LDC2008S06 | CSLU: Alphadigit Version 1.3 |
LDC2008S07 | CSLU: ISOLET Spoken Letter Database Version 1.3 |
LDC2008S02 | CSLU: National Cellular Telephone Speech Release 2.3 |
LDC2008S01 | CSLU: Portland Cellular Telephone Speech Version 1.3 |
LDC2008T22 | Czech Academic Corpus 2.0 |
LDC2008T02 | GALE Phase 1 Arabic Blog Parallel Text |
LDC2008T09 | GALE Phase 1 Arabic Broadcast News Parallel Text - Part 2 |
LDC2008T06 | GALE Phase 1 Chinese Blog Parallel Text |
LDC2008T08 | GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2 |
LDC2008T18 | GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3 |
LDC2008L03 | Global Yoruba Lexical Database v. 1.0 |
LDC2008L02 | Hindi WordNet |
LDC2008T01 | Hungarian-English Parallel Text, Version 1.0 |
LDC2008S08 | LDC Spoken Language Sampler |
LDC2008T23 | NomBank v 1.0 |
LDC2008T15 | North American News Text, Complete |
LDC2008T16 | North American News Text, General Release |
LDC2008T04 | OntoNotes Release 2.0 |
LDC2008T05 | Penn Discourse Treebank Version 2.0 |
LDC2008T20 | PennBioIE CYP 1.0 |
LDC2008T21 | PennBioIE Oncology 1.0 |
LDC2008S03 | STC-TIMIT 1.0 |
LDC2008S04 | West Point Brazilian Portuguese Speech |
2007 | |
LDC2007T22 | 2001 Topic Annotated Enron Email Data Set |
LDC2007S10 | 2003 NIST Rich Transcription Evaluation Data |
LDC2007S12 | 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data |
LDC2007S11 | 2004 Spring NIST Rich Transcription (RT-04S) Development Data |
LDC2007T40 | Arabic Gigaword Third Edition |
LDC2007S03 | ARL Urdu Speech Database, Training Data |
LDC2007T38 | Chinese Gigaword Third Edition |
LDC2007T36 | Chinese Treebank 6.0 |
LDC2007S08 | CSLU: Foreign Accented English Release 1.2 |
LDC2007S18 | CSLU: Kids' Speech Version 1.1 |
LDC2007S13 | CSLU: Apple Words and Phrases |
LDC2007S05 | CSLU: Yes/No Version 1.2 |
LDC2007T02 | English Chinese Translation Treebank v 1.0 |
LDC2007T07 | English Gigaword Third Edition |
LDC2007S02 | Fisher Levantine Arabic Conversational Telephone Speech |
LDC2007T04 | Fisher Levantine Arabic Conversational Telephone Speech, Transcripts |
LDC2007T24 | GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1 |
LDC2007T23 | GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1 |
LDC2007T20 | GALE Phase 1 Distillation Training |
LDC2007T08 | ISI Arabic-English Automatically Extracted Parallel Text |
LDC2007T09 | ISI Chinese-English Automatically Extracted Parallel Text |
LDC2007S01 | Levantine Arabic Conversational Telephone Speech |
LDC2007T01 | Levantine Arabic Conversational Telephone Speech, Transcripts |
LDC2007S09 | Mandarin Affective Speech |
LDC2007T19 | MITRE 1997 Mandarin Broadcast News Speech Translations (HUB-4NE) |
LDC2007S15 | Nationwide Speech Project |
LDC2007T21 | OntoNotes Release 1.0 |
LDC2007T03 | Tagged Chinese Gigaword |
LDC2007V02 | TRECVID 2003 Keyframes & Transcripts |
LDC2007V01 | TRECVID 2005 Keyframes & Transcripts |
2006 | |
LDC2006S31 | 2003 NIST Language Recognition Evaluation |
LDC2006S44 | 2004 NIST Speaker Recognition Evaluation |
LDC2006T06 | ACE 2005 Multilingual Training Corpus |
LDC2006S46 | Arabic Broadcast News Speech |
LDC2006T20 | Arabic Broadcast News Transcripts |
LDC2006T02 | Arabic Gigaword Second Edition |
LDC2006S15 | CSLU: Spelled and Spoken Words |
LDC2006S14 | CSLU: Stories v 1.2 |
LDC2006S35 | CSLU: Multilanguage Telephone Speech Version 1.2 |
LDC2006S39 | CSLU: Names Release 1.3 |
LDC2006S26 | CSLU: Speaker Recognition Version 1.1 |
LDC2006S16 | CSLU: Spoltech Brazilian Portuguese Version 1.0 |
LDC2006S01 | CSLU: Voices |
LDC2006T10 | English-Arabic Treebank v 1.0 |
LDC2006T17 | French Gigaword First Edition |
LDC2006S43 | Gulf Arabic Conversational Telephone Speech |
LDC2006T15 | Gulf Arabic Conversational Telephone Speech, Transcripts |
LDC2006S45 | Iraqi Arabic Conversational Telephone Speech |
LDC2006T16 | Iraqi Arabic Conversational Telephone Speech, Transcripts |
LDC2006S42 | Korean Broadcast News Speech |
LDC2006T14 | Korean Broadcast News Transcripts |
LDC2006T03 | Korean Propbank |
LDC2006T09 | Korean Treebank Annotations Version 2.0 |
LDC2006S29 | Levantine Arabic QT Training Data Set 5, Speech |
LDC2006T07 | Levantine Arabic QT Training Data Set 5, Transcripts |
LDC2006S33 | Middle East Technical University Turkish Microphone Speech v 1.0 |
LDC2006T04 | Multiple-Translation Chinese (MTC) Part 4 |
LDC2006S13 | N4 NATO Native and Non-Native Speech |
LDC2006T01 | Prague Dependency Treebank 2.0 |
LDC2006S34 | Russian through Switched Telephone Network (RuSTeN) |
LDC2006T12 | Spanish Gigaword First Edition |
LDC2006S30 | Speech Controlled Computing |
LDC2006T18 | TDT5 Multilingual Text |
LDC2006T19 | TDT5 Topics and Annotations |
LDC2006T08 | TimeBank 1.2 |
LDC2006T13 | Web 1T 5-gram Version 1 |
LDC2006S37 | West Point Heroico Spanish Speech |
LDC2006S36 | West Point Korean Speech |
2005 | |
LDC2005T09 | ACE 2004 Multilingual Training Corpus |
LDC2005T07 | ACE Time Normalization (TERN) 2004 English Training Data v 1.0 |
LDC2005T35 | American National Corpus (ANC) Second Release |
LDC2005S07 | Arabic CTS Levantine Fisher Training Data Set 3, Speech |
LDC2005T03 | Arabic CTS Levantine Fisher Training Data Set 3, Transcripts |
LDC2005T02 | Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis) |
LDC2005T20 | Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis) |
LDC2005T30 | Arabic Treebank: Part 4 v 1.0 (MPG Annotation) |
LDC2005S22 | Articulation Index |
LDC2005T33 | BBN Pronoun Coreference and Entity Type Corpus |
LDC2005S08 | BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts |
LDC2005T13 | CCGbank |
LDC2005T34 | Chinese <-> English Named Entity Lists v 1.0 |
LDC2005T10 | Chinese English News Magazine Parallel Text |
LDC2005T14 | Chinese Gigaword Second Edition |
LDC2005T06 | Chinese News Translation Text Part 1 |
LDC2005T23 | Chinese Proposition Bank 1.0 |
LDC2005T01 | Chinese Treebank 5.0 |
LDC2005S26 | CSLU: 22 Languages Corpus |
LDC2005T08 | Discourse Graphbank |
LDC2005T12 | English Gigaword Second Edition |
LDC2005S13 | Fisher English Training Part 2, Speech |
LDC2005T19 | Fisher English Training Part 2, Transcripts |
LDC2005T28 | HARD 2004 Text |
LDC2005T29 | HARD 2004 Topics and Annotations |
LDC2005S15 | HKUST Mandarin Telephone Speech, Part 1 |
LDC2005T32 | HKUST Mandarin Telephone Transcript Data, Part 1 |
LDC2005S14 | Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) |
LDC2005L01 | Mawukakan Lexicon |
LDC2005T05 | Multiple-Translation Arabic (MTA) Part 2 |
LDC2005S16 | RT-04 MDE Training Data Speech |
LDC2005T24 | RT-04 MDE Training Data Text/Annotations |
LDC2005S25 | Santa Barbara Corpus of Spoken American English Part IV |
LDC2005S11 | TDT4 Multilingual Broadcast News Speech Corpus |
LDC2005T16 | TDT4 Multilingual Text and Annotations |
LDC2005S30 | West Point Company G3 American English Speech |
LDC2005S28 | West Point Croatian Speech |
2004 | |
LDC2004T15 | 2000 Communicator Dialogue Act Tagged |
LDC2004T16 | 2001 Communicator Dialogue Act Tagged |
LDC2004S04 | 2002 NIST Speaker Recognition Evaluation |
LDC2004S11 | 2002 Rich Transcription Broadcast News and Conversational Telephone Speech |
LDC2004T18 | Arabic English Parallel News Part 1 |
LDC2004T17 | Arabic News Translation Text Part 1 |
LDC2004T02 | Arabic Treebank: Part 2 v 2.0 |
LDC2004T11 | Arabic Treebank: Part 3 v 1.0 |
LDC2004L02 | Buckwalter Arabic Morphological Analyzer Version 2.0 |
LDC2004T05 | Chinese Treebank 4.0 |
LDC2004S01 | Czech Broadcast News Speech |
LDC2004T01 | Czech Broadcast News Transcripts |
LDC2004S13 | Fisher English Training Speech Part 1 Speech |
LDC2004T19 | Fisher English Training Speech Part 1 Transcripts |
LDC2004V01 | FORM1 Kinematic Gesture |
LDC2004T08 | Hong Kong Parallel Text |
LDC2004S02 | ICSI Meeting Speech |
LDC2004T04 | ICSI Meeting Transcripts |
LDC2004S05 | ISL Meeting Speech Part 1 |
LDC2004T10 | ISL Meeting Transcripts Part 1 |
LDC2004L01 | Klex: Finite-State Lexical Transducer for Korean |
LDC2004T03 | Morphologically Annotated Korean Text |
LDC2004T07 | Multiple-Translation Chinese (MTC) Part 3 |
LDC2004S09 | NIST Meeting Pilot Corpus Speech |
LDC2004T13 | NIST Meeting Pilot Corpus Transcripts and Metadata |
LDC2004T23 | Prague Arabic Dependency Treebank 1.0 |
LDC2004T25 | Prague Czech-English Dependency Treebank 1.0 |
LDC2004T14 | Proposition Bank I |
LDC2004S08 | RT-03 MDE Training Data Speech |
LDC2004T12 | RT-03 MDE Training Data Text and Annotations |
LDC2004S10 | Santa Barbara Corpus of Spoken American English Part III |
LDC2004S07 | Switchboard Cellular Part 2 Audio |
LDC2004S12 | TalkBank Ethology Data: Field Recordings of Vervet Monkey Calls |
LDC2004T09 | TIDES Extraction (ACE) 2003 Multilingual Training Data |
2003 | |
LDC2003T03 | 1997 HUB5 German Transcripts |
LDC2003T04 | 1997 HUB5 Spanish Transcripts |
LDC2003T02 | 1998 HUB5 English Transcripts |
LDC2003S01 | 2001 Communicator Evaluation |
LDC2003T01 | 2001 HUB5 Mandarin Transcripts |
LDC2003T11 | ACE-2 Version 1.0 |
LDC2003T12 | Arabic Gigaword |
LDC2003T07 | Arabic Treebank: Part 1 - 10K-word English Translation |
LDC2003T06 | Arabic Treebank: Part 1 v 2.0 |
LDC2003T09 | Chinese Gigaword |
LDC2003T05 | English Gigaword |
LDC2003V01 | FORM2 Kinematic Gesture |
LDC2003L01 | Grassfields Bantu Fieldwork: Dschang Lexicon |
LDC2003S02 | Grassfields Bantu Fieldwork: Dschang Tone Paradigms |
LDC2003S07 | Korean Telephone Conversations Complete Set |
LDC2003L02 | Korean Telephone Conversations Lexicon |
LDC2003S03 | Korean Telephone Conversations Speech |
LDC2003T08 | Korean Telephone Conversations Transcripts |
LDC2003T13 | Message Understanding Conference (MUC) 6 |
LDC2003T18 | Multiple-Translation Arabic (MTA) Part 1 |
LDC2003T17 | Multiple-Translation Chinese (MTC) Part 2 |
LDC2003T10 | SAID |
LDC2003S06 | Santa Barbara Corpus of Spoken American English Part II |
LDC2003T15 | SLX Corpus of Classic Sociolinguistic Interviews |
LDC2003T16 | SummBank 1.0 |
LDC2003S05 | West Point Russian Speech |
2002 | |
LDC2002S11 | 1997 HUB4 English Evaluation Speech and Transcripts |
LDC2002S22 | 1997 HUB5 Arabic Evaluation |
LDC2002T39 | 1997 HUB5 Arabic Transcripts |
LDC2002S23 | 1997 HUB5 English Evaluation |
LDC2002S24 | 1997 HUB5 German Evaluation |
LDC2003T03 | 1997 HUB5 German Transcripts |
LDC2002S25 | 1997 HUB5 Spanish Evaluation |
LDC2003T04 | 1997 HUB5 Spanish Transcripts |
LDC2002S10 | 1998 HUB5 English Evaluation |
LDC2003T02 | 1998 HUB5 English Transcripts |
LDC2002S56 | 2000 Communicator Evaluation |
LDC2002S09 | 2000 HUB5 English Evaluation Speech |
LDC2002T43 | 2000 HUB5 English Evaluation Transcripts |
LDC2002S13 | 2001 HUB5 English Evaluation |
LDC2002S12 | 2001 HUB5 Mandarin Evaluation |
LDC2003T01 | 2001 HUB5 Mandarin Transcripts |
LDC2002S34 | 2001 NIST Speaker Recognition Evaluation Corpus |
LDC2002L49 | Buckwalter Arabic Morphological Analyzer Version 1.0 |
LDC2002S37 | CALLHOME Egyptian Arabic Speech Supplement |
LDC2002T38 | CALLHOME Egyptian Arabic Transcripts Supplement |
LDC2002L27 | Chinese-English Translation Lexicon Version 3.0 |
LDC2002S28 | Emotional Prosody Speech and Transcripts |
LDC2001S16 | Grassfields Bantu Fieldwork: Ngomba Tone Paradigms |
LDC2002T26 | Korean English Treebank Annotations |
LDC2002T01 | Multiple-Translation Chinese Corpus |
LDC2002T07 | RST Discourse Treebank |
LDC2001S08 | Speech in Noisy Environments (SPINE2) Part 3 Audio |
LDC2001T09 | Speech in Noisy Environments (SPINE2) Part 3 Transcripts |
LDC2002S06 | Switchboard-2 Phase III Audio |
LDC2002T31 | The AQUAINT Corpus of English News Text |
LDC2002S04 | Translanguage English Database (TED) Speech |
LDC2002T03 | Translanguage English Database (TED) Transcripts |
LDC2002S35 | Voicemail Corpus Part II |
LDC2002S02 | West Point Arabic Speech |
2001 | |
LDC2001S91 | 1997 HUB4 Broadcast News Evaluation Non-English Test Material |
LDC2001S97 | 2000 NIST Speaker Recognition Evaluation |
LDC2001T55 | Arabic Newswire Part 1 |
LDC2001T61 | CALLHOME Spanish Dialogue Act Annotation |
LDC2001T62 | CETEMpublico |
LDC2001T11 | Chinese Treebank 2.0 |
LDC2001S16 | Grassfields Bantu Fieldwork: Ngomba Tone Paradigms |
LDC2001T02 | Message Understanding Conference (MUC) 7 |
LDC2001T10 | Prague Dependency Treebank 1.0 |
LDC2001S04 | Speech in Noisy Environments (SPINE2) Part 1 Audio |
LDC2001T05 | Speech in Noisy Environments (SPINE2) Part 1 Transcripts |
LDC2001S06 | Speech in Noisy Environments (SPINE2) Part 2 Audio |
LDC2001T07 | Speech in Noisy Environments (SPINE2) Part 2 Transcripts |
LDC2001S08 | Speech in Noisy Environments (SPINE2) Part 3 Audio |
LDC2001T09 | Speech in Noisy Environments (SPINE2) Part 3 Transcripts |
LDC2001S99 | Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio |
LDC2001S13 | Switchboard Cellular Part 1 Audio |
LDC2001S15 | Switchboard Cellular Part 1 Transcribed Audio |
LDC2001T14 | Switchboard Cellular Part 1 Transcription |
LDC2001T60 | Syllable-Final /s/ Lenition |
LDC2001S93 | TDT2 Mandarin Audio Corpus |
LDC2001T57 | TDT2 Multilanguage Text Version 4.0 |
LDC2001S94 | TDT3 English Audio |
LDC2001S95 | TDT3 Mandarin Audio |
LDC2001T58 | TDT3 Multilanguage Text Version 2.0 |
2000 | |
LDC2000S86 | 1998 HUB4 Broadcast News Evaluation English Test Material |
LDC2000S88 | 1999 HUB4 Broadcast News Evaluation English Test Material |
LDC2000T43 | BLLIP 1987-89 WSJ Corpus Release 1 |
LDC2000T50 | Hong Kong Hansards Parallel Text |
LDC2000T47 | Hong Kong Laws Parallel Text |
LDC2000T46 | Hong Kong News Parallel Text |
LDC2000T45 | Korean Newswire |
LDC2000S85 | Santa Barbara Corpus of Spoken American English Part I |
LDC2000S96 | Speech in Noisy Environments (SPINE) Evaluation Audio |
LDC2000T54 | Speech in Noisy Environments (SPINE) Evaluation Transcripts |
LDC2000S87 | Speech in Noisy Environments (SPINE) Training Audio |
LDC2000T49 | Speech in Noisy Environments (SPINE) Training Transcripts |
LDC2000S92 | TDT2 Careful Transcription Audio |
LDC2000T44 | TDT2 Careful Transcription Text |
LDC2000T52 | TREC Mandarin |
LDC2000T51 | TREC Spanish |
LDC2000S89 | Voice of America (VOA) Czech Broadcast News Audio |
LDC2000T53 | Voice of America (VOA) Czech Broadcast News Transcripts |
1999 | |
LDC99S80 | 1997 Speaker Recognition Benchmark |
LDC99S81 | 1999 Speaker Recognition Benchmark |
LDC99L23 | American English Spoken Lexicon |
LDC99L22 | Egyptian Colloquial Arabic Lexicon |
LDC99T34 | Japanese Business News Text Supplement |
LDC99T40 | Portuguese Newswire Text |
LDC99T41 | Spanish Newswire Text, Volume 2 |
LDC99S78 | SUSAS |
LDC99T33 | SUSAS Transcripts |
LDC99S79 | Switchboard-2 Phase II |
LDC99S83 | Tactical Speaker Identification Speech Corpus (TSID) |
LDC99S84 | TDT2 English Audio |
LDC99T42 | Treebank-3 |
LDC99S82 | USC Marketplace Broadcast News Speech |
LDC99T36 | USC Marketplace Broadcast News Transcripts |
1998 | |
LDC98T31 | 1996 CSR HUB4 Language Model |
LDC97S66 | 1996 English Broadcast News Dev and Eval (HUB4) |
LDC97S44 | 1996 English Broadcast News Speech (HUB4) |
LDC97T22 | 1996 English Broadcast News Transcripts (HUB4) |
LDC98S71 | 1997 English Broadcast News Speech (HUB4) |
LDC98T28 | 1997 English Broadcast News Transcripts (HUB4) |
LDC98S73 | 1997 Mandarin Broadcast News Speech (HUB4-NE) |
LDC98T24 | 1997 Mandarin Broadcast News Transcripts (HUB4-NE) |
LDC98S74 | 1997 Spanish Broadcast News Speech (HUB4-NE) |
LDC98T29 | 1997 Spanish Broadcast News Transcripts (HUB4-NE) |
LDC98S76 | 1998 Speaker Recognition Benchmark |
LDC98L21 | COMLEX English Syntax Lexicon |
LDC96T11 | COMLEX Syntax Text Corpus Version 2.0 |
LDC95S23 | CSR-III Speech |
LDC95T6 | CSR-III Text |
LDC98S67 | HTIMIT |
LDC98S69 | HUB5 Mandarin Telephone Speech Corpus |
LDC98T26 | HUB5 Mandarin Transcripts |
LDC98S70 | HUB5 Spanish Telephone Speech Corpus |
LDC98T27 | HUB5 Spanish Transcripts |
LDC98T32 | JURIS |
LDC95S22 | KING Speaker Verification |
LDC98S68 | LLHDB |
LDC98T30 | North American News Text Supplement |
LDC98S75 | Switchboard-2 Phase I |
LDC98S72 | Taiwanese Putonghua Speech and Transcripts |
LDC98T25 | TDT Pilot Study Corpus |
LDC98S77 | Voicemail Corpus Part I |
LDC94S16 | YOHO Speaker Verification |
1997 | |
LDC97S66 | 1996 English Broadcast News Dev and Eval (HUB4) |
LDC97S44 | 1996 English Broadcast News Speech (HUB4) |
LDC97T22 | 1996 English Broadcast News Transcripts (HUB4) |
LDC96S61 | 1996 Speaker Recognition Benchmark |
LDC94S14A | Air Traffic Control Complete |
LDC96S36 | Boston University Radio Speech Corpus |
LDC96S46 | CALLFRIEND American English-Non-Southern Dialect |
LDC96S47 | CALLFRIEND American English-Southern Dialect |
LDC96S48 | CALLFRIEND Canadian French |
LDC96S49 | CALLFRIEND Egyptian Arabic |
LDC96S50 | CALLFRIEND Farsi |
LDC96S51 | CALLFRIEND German |
LDC96S52 | CALLFRIEND Hindi |
LDC96S53 | CALLFRIEND Japanese |
LDC96S54 | CALLFRIEND Korean |
LDC96S55 | CALLFRIEND Mandarin Chinese-Mainland Dialect |
LDC96S56 | CALLFRIEND Mandarin Chinese-Taiwan Dialect |
LDC96S57 | CALLFRIEND Spanish-Caribbean Dialect |
LDC96S58 | CALLFRIEND Spanish-Non-Caribbean Dialect |
LDC96S59 | CALLFRIEND Tamil |
LDC96S60 | CALLFRIEND Vietnamese |
LDC97L20 | CALLHOME American English Lexicon (PRONLEX) |
LDC97S42 | CALLHOME American English Speech |
LDC97T14 | CALLHOME American English Transcripts |
LDC97S45 | CALLHOME Egyptian Arabic Speech |
LDC97T19 | CALLHOME Egyptian Arabic Transcripts |
LDC97L18 | CALLHOME German Lexicon |
LDC97S43 | CALLHOME German Speech |
LDC97T15 | CALLHOME German Transcripts |
LDC96L17 | CALLHOME Japanese Lexicon |
LDC96S37 | CALLHOME Japanese Speech |
LDC96T18 | CALLHOME Japanese Transcripts |
LDC96L15 | CALLHOME Mandarin Chinese Lexicon |
LDC96S34 | CALLHOME Mandarin Chinese Speech |
LDC96T16 | CALLHOME Mandarin Chinese Transcripts |
LDC96L16 | CALLHOME Spanish Lexicon |
LDC96S35 | CALLHOME Spanish Speech |
LDC96T17 | CALLHOME Spanish Transcripts |
LDC94S13A | CSR-II (WSJ1) Complete |
LDC94S13B | CSR-II (WSJ1) Sennheiser |
LDC97T12 | DSO Corpus of Sense-Tagged English |
LDC99L22 | Egyptian Colloquial Arabic Lexicon |
LDC95T20 | Hansard French/English |
LDC96S64-1 | JEIDA/JCSD-Channel 0 City Names |
LDC96S64 | JEIDA/JCSD-Channel 0 Complete |
LDC96S64-2 | JEIDA/JCSD-Channel 0 Control Words |
LDC96S64-4 | JEIDA/JCSD-Channel 0 Four Digit Sequences |
LDC96S64-3 | JEIDA/JCSD-Channel 0 Isolated Digits |
LDC96S64-5 | JEIDA/JCSD-Channel 0 Mono Syllables |
LDC96S65-1 | JEIDA/JCSD-Channel 1 City Names |
LDC96S65 | JEIDA/JCSD-Channel 1 Complete |
LDC96S65-2 | JEIDA/JCSD-Channel 1 Control Words |
LDC96S65-4 | JEIDA/JCSD-Channel 1 Four Digit Sequences |
LDC96S65-3 | JEIDA/JCSD-Channel 1 Isolated Digits |
LDC96S65-5 | JEIDA/JCSD-Channel 1 Mono Syllables |
LDC95T13 | Mandarin Chinese News Text |
LDC95T21 | North American News Text Corpus |
LDC94S15 | SPIDRE |
LDC97S62 | Switchboard-1 Release 2 |
LDC97S63 | The CMU Kids Corpus |
1996 | |
LDC96S61 | 1996 Speaker Recognition Benchmark |
LDC96S36 | Boston University Radio Speech Corpus |
LDC94S20 | BRAMSHILL |
LDC96S46 | CALLFRIEND American English-Non-Southern Dialect |
LDC96S47 | CALLFRIEND American English-Southern Dialect |
LDC96S48 | CALLFRIEND Canadian French |
LDC96S49 | CALLFRIEND Egyptian Arabic |
LDC96S50 | CALLFRIEND Farsi |
LDC96S51 | CALLFRIEND German |
LDC96S52 | CALLFRIEND Hindi |
LDC96S53 | CALLFRIEND Japanese |
LDC96S54 | CALLFRIEND Korean |
LDC96S55 | CALLFRIEND Mandarin Chinese-Mainland Dialect |
LDC96S56 | CALLFRIEND Mandarin Chinese-Taiwan Dialect |
LDC96S57 | CALLFRIEND Spanish-Caribbean Dialect |
LDC96S58 | CALLFRIEND Spanish-Non-Caribbean Dialect |
LDC96S59 | CALLFRIEND Tamil |
LDC96S60 | CALLFRIEND Vietnamese |
LDC97L20 | CALLHOME American English Lexicon (PRONLEX) |
LDC96L17 | CALLHOME Japanese Lexicon |
LDC96S37 | CALLHOME Japanese Speech |
LDC96T18 | CALLHOME Japanese Transcripts |
LDC96L15 | CALLHOME Mandarin Chinese Lexicon |
LDC96S34 | CALLHOME Mandarin Chinese Speech |
LDC96T16 | CALLHOME Mandarin Chinese Transcripts |
LDC96L16 | CALLHOME Spanish Lexicon |
LDC96S35 | CALLHOME Spanish Speech |
LDC96T17 | CALLHOME Spanish Transcripts |
LDC96L14 | CELEX2 |
LDC98L21 | COMLEX English Syntax Lexicon |
LDC96T11 | COMLEX Syntax Text Corpus Version 2.0 |
LDC93S6A | CSR-I (WSJ0) Complete |
LDC93S6C | CSR-I (WSJ0) Other |
LDC93S6B | CSR-I (WSJ0) Sennheiser |
LDC96S33 | CSR-IV HUB3 |
LDC96S31 | CSR-IV HUB4 |
LDC96S30 | CTIMIT |
LDC96S38 | DCIEM/HCRC |
LDC95T11 | European Language Newspaper Text |
LDC96S32 | FFMTIMIT |
LDC96S29 | Frontiers in Speech Processing 93 |
LDC96S40 | Frontiers in Speech Processing 94 |
LDC95T20 | Hansard French/English |
LDC93S12 | HCRC Map Task Corpus |
LDC96S64-1 | JEIDA/JCSD-Channel 0 City Names |
LDC96S64 | JEIDA/JCSD-Channel 0 Complete |
LDC96S64-2 | JEIDA/JCSD-Channel 0 Control Words |
LDC96S64-4 | JEIDA/JCSD-Channel 0 Four Digit Sequences |
LDC96S64-3 | JEIDA/JCSD-Channel 0 Isolated Digits |
LDC96S64-5 | JEIDA/JCSD-Channel 0 Mono Syllables |
LDC96S65-1 | JEIDA/JCSD-Channel 1 City Names |
LDC96S65 | JEIDA/JCSD-Channel 1 Complete |
LDC96S65-2 | JEIDA/JCSD-Channel 1 Control Words |
LDC96S65-4 | JEIDA/JCSD-Channel 1 Four Digit Sequences |
LDC96S65-3 | JEIDA/JCSD-Channel 1 Isolated Digits |
LDC96S65-5 | JEIDA/JCSD-Channel 1 Mono Syllables |
LDC95T13 | Mandarin Chinese News Text |
LDC96T10 | Message Understanding Conference (MUC) 6 Additional News Text |
LDC95T21 | North American News Text Corpus |
LDC93S3A | Resource Management Complete Set 2.0 |
LDC93S3B | Resource Management RM1 2.0 |
LDC93S3C | Resource Management RM2 2.0 |
LDC96S39 | RM Isolated and Spelled Word Data |
LDC95T9 | Spanish News Text |
LDC96S41 | VAHA (POLYPHONE II) |
1995 | |
LDC95S26 | ATIS3 Test Data |
LDC97L20 | CALLHOME American English Lexicon (PRONLEX) |
LDC96L14 | CELEX2 |
LDC98L21 | COMLEX English Syntax Lexicon |
LDC95S23 | CSR-III Speech |
LDC95T6 | CSR-III Text |
LDC95T11 | European Language Newspaper Text |
LDC95T20 | Hansard French/English |
LDC95T8 | Japanese Business News Text |
LDC95S22 | KING Speaker Verification |
LDC95S28 | LATINO-40 Spanish Read News |
LDC95T13 | Mandarin Chinese News Text |
LDC95T21 | North American News Text Corpus |
LDC95S27 | PhoneBook: NYNEX Isolated Words |
LDC95T9 | Spanish News Text |
LDC95S25 | TRAINS Spoken Dialog Corpus |
LDC95T7 | Treebank-2 |
LDC95S24 | WSJCAM0 Cambridge Read News |
1994 | |
LDC94S14B | Air Traffic Control BOS |
LDC94S14A | Air Traffic Control Complete |
LDC94S14C | Air Traffic Control DCA |
LDC94S14D | Air Traffic Control DFW |
LDC94S19 | ATIS3 Training Data |
LDC94S20 | BRAMSHILL |
LDC97L20 | CALLHOME American English Lexicon (PRONLEX) |
LDC98L21 | COMLEX English Syntax Lexicon |
LDC94S13A | CSR-II (WSJ1) Complete |
LDC94S13C | CSR-II (WSJ1) Other |
LDC94S13B | CSR-II (WSJ1) Sennheiser |
LDC94T5 | ECI Multilingual Text |
LDC94S21 | MACROPHONE |
LDC94S17 | OGI Multilanguage Corpus |
LDC94S18 | OGI Spelled and Spoken Word |
LDC94S15 | SPIDRE |
LDC94T4A | UN Parallel Text (Complete) |
LDC94T4B-1 | UN Parallel Text (English) |
LDC94T4B-2 | UN Parallel Text (French) |
LDC94T4B-3 | UN Parallel Text (Spanish) |
LDC94S16 | YOHO Speaker Verification |
1993 | |
LDC93T1 | ACL/DCI |
LDC93S4A | ATIS0 Complete |
LDC93S4B | ATIS0 Pilot |
LDC93S4B-2 | ATIS0 Read |
LDC93S4B-3 | ATIS0 SD Read |
LDC93S5 | ATIS2 |
LDC93S6A | CSR-I (WSJ0) Complete |
LDC93S6C | CSR-I (WSJ0) Other |
LDC93S6B | CSR-I (WSJ0) Sennheiser |
LDC93S12 | HCRC Map Task Corpus |
LDC93S2 | NTIMIT |
LDC93S3A | Resource Management Complete Set 2.0 |
LDC93S3B | Resource Management RM1 2.0 |
LDC93S3C | Resource Management RM2 2.0 |
LDC93S11 | Road Rally |
LDC93S8 | Switchboard Credit Card |
LDC97S62 | Switchboard-1 Release 2 |
LDC93S9 | TI 46-Word |
LDC93S10 | TIDIGITS |
LDC93S1W | TIMIT Acoustic-Phonetic Continuous Speech (MS-WAV version) |
LDC93S1 | TIMIT Acoustic-Phonetic Continuous Speech Corpus |
LDC93T3A | TIPSTER Complete |
LDC93T3B | TIPSTER Volume 1 |
LDC93T3C | TIPSTER Volume 2 |
LDC93T3D | TIPSTER Volume 3 |
LDC (Linguistic Data Consortium) Datasets by Year: A Summary
First published 2024-07-12 00:31:23