LDC (Linguistic Data Consortium) Datasets by Year

2024
LDC2024T02 AIDA Scenario 1 Practice Topic Annotation
LDC2024T06 AIDA Scenario 2 Practice Topic Annotation
LDC2024T04 AIDA Scenario 2 Practice Topic Source Data
LDC2024T05 Automatic Content Extraction for Portuguese
LDC2024S04 BabyEars Affective Vocalizations
LDC2024S05 Call My Net 1
LDC2024S06 Diaspora Tibetan Speech
LDC2024S01 KASET - Kurmanji and Sorani Kurdish Speech and Transcripts
LDC2024T03 LoReHLT Hausa Representative Language Pack
LDC2024T01 LORELEI Farsi Representative Language Pack
LDC2024S03 RATS Low Speech Density
LDC2024S02 Second Language University Speech Intelligibility Corpus
2023
LDC2023V01 2019 NIST Speaker Recognition Evaluation Test Set -- Audio-Visual
LDC2023S03 2019 NIST Speaker Recognition Evaluation Test Set -- CTS Challenge
LDC2023S06 2019 OpenSAT Public Safety Communications Simulation
LDC2023T10 AIDA Scenario 1 and 2 Reference Knowledge Base
LDC2023T11 AIDA Scenario 1 Practice Topic Source Data
LDC2023S01 AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts
LDC2023S08 CALLFRIEND Russian Speech
LDC2023T09 CALLFRIEND Russian Text
LDC2023T04 DEFT English Light and Rich ERE Annotation
LDC2023S10 Kasdi-Merbah (University) Emotional Database in Arabic Speech
LDC2023S07 LDC Spoken Language Sampler - Sixth Release
LDC2023T07 LORELEI Indonesian Representative Language Pack
LDC2023T01 LORELEI Swahili Representative Language Pack
LDC2023T02 LORELEI Tagalog Representative Language Pack
LDC2023T03 LORELEI Tamil Representative Language Pack
LDC2023T08 LORELEI Thai Representative Language Pack
LDC2023T06 LORELEI Zulu Representative Language Pack
LDC2023S02 Mixer 3 Speech
LDC2023S04 Mixer 7 Spanish Speech
LDC2023L01 Moroccan Arabic - English Lexical Database
LDC2023T05 Penn Korean Universal Dependency Treebank
LDC2023S09 REMIX Telephone Collection
LDC2023S05 Samrómur Queries Icelandic Speech 1.0
LDC2023T13 TAC KBP Belief and Sentiment - Comprehensive Training and Evaluation Data 2016-2017
2022
LDC2022S10 2017 NIST Language Recognition Evaluation Training and Development Sets
LDC2022S01 2017 NIST OpenSAT Pilot - SSSF
LDC2022T02 AttImam
LDC2022T06 BOLT English Translation Treebank - Egyptian Arabic SMS/Chat
LDC2022T07 CAMIO Transcription Languages
LDC2022S13 Global TIMIT Thai
LDC2022V01 HAVIC MED Novel 1 Test -- Videos, Metadata and Annotation
LDC2022V02 HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation
LDC2022T05 LORELEI Bengali Representative Language Pack
LDC2022T01 LORELEI Kinyarwanda Incident Language Pack
LDC2022T03 LORELEI Wolof Representative Language Pack
LDC2022S08 MASRI Synthetic
LDC2022S04 NUBUC
LDC2022T04 Qatari Corpus of Argumentative Writing
LDC2022L01 Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon
LDC2022S11 Samrómur Children Icelandic Speech 1.0
LDC2022S05 Samrómur Icelandic Speech 1.0
LDC2022S06 Second DIHARD Challenge Evaluation - Eleven Sources
LDC2022S07 Second DIHARD Challenge Evaluation - SEEDLingS
LDC2022S03 Spoken Digits in Hindi and Indian English
LDC2022S02 The Child Subglottal Resonances Database
LDC2022S12 Third DIHARD Challenge Development
LDC2022S14 Third DIHARD Challenge Evaluation
LDC2022S09 Xi'an Guanzhong Object Naming
2021
LDC2021S01 Althingi Parliamentary Speech
LDC2021T04 ATIS - Seven Languages
LDC2021T07 BOLT Chinese Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech
LDC2021T11 BOLT Chinese SMS/Chat Parallel Training Data
LDC2021T14 BOLT Egyptian Arabic Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech
LDC2021T18 BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech
LDC2021T15 BOLT Egyptian Arabic SMS/Chat Parallel Training Data
LDC2021T12 BOLT Egyptian Arabic Treebank - Conversational Telephone Speech
LDC2021T17 BOLT Egyptian Arabic Treebank - SMS/Chat
LDC2021T19 BOLT English Translation Treebank - Chinese SMS/Chat
LDC2021T03 BOLT English Treebank - SMS/Chat
LDC2021T13 Chinese Abstract Meaning Representation 2.0
LDC2021L01 Classical Arabic Dictionary
LDC2021S02 Columbia Games Corpus
LDC2021T16 DiscAlign for Penn and RST Discourse Treebanks
LDC2021T10 ESPADA
LDC2021S06 Ethnobotanical Research and Language Documentation of Nahuatl
LDC2021S03 Global TIMIT Mandarin Chinese
LDC2021V01 HAVIC MED Training Data -- Videos, Metadata and Annotation
LDC2021T02 LORELEI Akan Representative Language Pack
LDC2021S05 MyST Children's Conversational Speech
LDC2021T05 Penn Discourse Treebank Version 2.0 - German Translation
LDC2021S08 RATS Speaker Identification
LDC2021S10 Second DIHARD Challenge Development - Eleven Sources
LDC2021S11 Second DIHARD Challenge Development - SEEDLingS
LDC2021T08 TAC KBP English Sentiment Slot Filling -- Comprehensive Training and Evaluation Data 2013-2014
LDC2021T06 TAC KBP English Surprise Slot Filling -- Comprehensive Training and Evaluation Data 2010
LDC2021S04 The SSNCE Database of Tamil Dysarthric Speech
LDC2021S09 UCLA Speaker Variability Database
LDC2021S07 Wikipedia Spanish Speech and Transcripts
LDC2021T09 X-SRL: Parallel Cross-lingual Semantic Role Labeling
2020
LDC2020S04 2018 NIST Speaker Recognition Evaluation Test Set
LDC2020T02 Abstract Meaning Representation (AMR) Annotation Release 3.0
LDC2020T07 Abstract Meaning Representation 2.0 - Four Translations
LDC2020T15 BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training
LDC2020T05 BOLT Egyptian Arabic-English Word Alignment -- Conversational Telephone Speech Training
LDC2020T20 BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech
LDC2020T21 BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech
LDC2020T09 BOLT English Translation Treebank - Chinese Discussion Forum
LDC2020S08 CALLFRIEND American English-Southern Dialect Second Edition
LDC2020S06 CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition
LDC2020T01 Chinese CogBank
LDC2020L02 Chinese Lexical Resources for Gender, Number, Animacy
LDC2020T23 Corpus of Law, Academic, and News
LDC2020L01 Database of Word Level Statistics - Mandarin
LDC2020T19 DEFT Chinese Light and Rich ERE Annotation
LDC2020T06 EVALution
LDC2020S11 Global TIMIT Learner Simple English
LDC2020S09 Global TIMIT Learner Treebank English
LDC2020S12 Global TIMIT Mandarin Chinese-Guanzhong Dialect
LDC2020S02 IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b
LDC2020S07 IARPA Babel Javanese Language Pack IARPA-babel402b-v1.0b
LDC2020S10 IARPA Babel Mongolian Language Pack IARPA-babel401b-v2.0b
LDC2020S01 LibriVox Spanish
LDC2020T10 LORELEI Entity Detection and Linking Knowledge Base
LDC2020T11 LORELEI Oromo Incident Language Pack
LDC2020T22 LORELEI Tigrinya Incident Language Pack
LDC2020T24 LORELEI Ukrainian Representative Language Pack
LDC2020T17 LORELEI Vietnamese Representative Language Pack
LDC2020T04 Machine Reading Phase 1 IC Training Data
LDC2020S03 Mixer 4 and 5 Speech
LDC2020S05 Multi-Language Conversational Telephone Speech 2011 -- Mandarin Chinese
LDC2020T16 Penn Parsed Corpora of Historical English
LDC2020S13 Phonemes of Arabic
LDC2020T12 SemTransCNC
LDC2020T14 Speech Sentiment Annotations
LDC2020T03 TAC KBP English Event Argument - Training and Evaluation Data 2014-2015
LDC2020T13 TAC KBP English Event Nugget Detection and Coreference - Comprehensive Training and Evaluation Data 2014-2015
LDC2020T08 TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013
LDC2020T18 TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017
2019
LDC2019S20 2016 NIST Speaker Recognition Evaluation Test Set
LDC2019T01 BOLT Arabic Discussion Forum Parallel Training Data
LDC2019T13 BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training
LDC2019T18 BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training
LDC2019T06 BOLT Egyptian-English Word Alignment -- Discussion Forum Training
LDC2019T15 BOLT English Treebank - Discussion Forum
LDC2019S21 CALLFRIEND American English-Non-Southern Dialect Second Edition
LDC2019S18 CALLFRIEND Canadian French Second Edition
LDC2019S04 CALLFRIEND Egyptian Arabic Second Edition
LDC2019T07 Chinese Abstract Meaning Representation 1.0
LDC2019S07 CIEMPIESS Experimentation
LDC2019T11 Corpus of Conversational Persian Transcripts
LDC2019T03 DEFT Chinese Committed Belief Annotation
LDC2019T16 DEFT English Committed Belief Annotation
LDC2019T09 DEFT Spanish Committed Belief Annotation
LDC2019S09 First DIHARD Challenge Development - Eight Sources
LDC2019S10 First DIHARD Challenge Development - SEEDLingS
LDC2019S12 First DIHARD Challenge Evaluation - Nine Sources
LDC2019S13 First DIHARD Challenge Evaluation - SEEDLingS
LDC2019V01 HAVIC MED Progress Test -- Videos, Metadata and Annotation
LDC2019S22 IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b
LDC2019S08 IARPA Babel Guarani Language Pack IARPA-babel305b-v1.0c
LDC2019S16 IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c
LDC2019S03 IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b
LDC2019S17 LDC Spoken Language Sampler - Fifth Release
LDC2019T14 Machine Reading Phase 1 NFL Scoring Training Data
LDC2019S23 Magic Data Chinese Mandarin Conversational Speech
LDC2019S02 Multi-Language Conversational Telephone Speech 2011 -- Arabic Group
LDC2019S15 Multi-Language Conversational Telephone Speech 2011 -- East Asian
LDC2019S06 Multi-Language Conversational Telephone Speech 2011 -- English Group
LDC2019T04 Multilingual ATIS
LDC2019T05 Penn Discourse Treebank Version 3.0
LDC2019T10 Phrase Detectives Corpus Version 2
LDC2019S19 Polish Speech Database
LDC2019S01 SRI Speech-Based Collaborative Learning Corpus
LDC2019T08 TAC KBP Chinese Regular Slot Filling - Comprehensive Training and Evaluation Data 2014
LDC2019T17 TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017
LDC2019T19 TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017
LDC2019T02 TAC KBP Entity Discovery and Linking - Comprehensive Training and Evaluation Data 2014-2015
LDC2019T12 TAC KBP Evaluation Source Corpora 2016-2017
LDC2019S14 The DKU-JNU-EMA Electromagnetic Articulography Database
LDC2019S11 USC-SFI MALACH Interviews and Transcripts English – Speech Recognition Edition
LDC2019S05 VAST Chinese Speech and Transcripts
2018
LDC2018T08 2007 CoNLL Shared Task - Arabic & English
LDC2018T06 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish
LDC2018T07 2007 CoNLL Shared Task - Greek, Hungarian & Italian
LDC2018S06 2011 NIST Language Recognition Evaluation Test Set
LDC2018S14 AISHELL-1
LDC2018S15 Avatar Education Portuguese
LDC2018T10 BOLT Arabic Discussion Forums
LDC2018T15 BOLT Chinese SMS/Chat
LDC2018T23 BOLT Egyptian Arabic Treebank - Discussion Forum
LDC2018T19 BOLT English SMS/Chat
LDC2018T18 BOLT Information Retrieval Comprehensive Training and Evaluation
LDC2018S09 CALLFRIEND Mandarin Chinese-Mainland Dialect Second Edition
LDC2018S11 CIEMPIESS Balance
LDC2018T20 Concretely Annotated English Gigaword
LDC2018T01 DEFT Spanish Treebank
LDC2018S01 DIRHA English WSJ Audio
LDC2018S05 GALE Phase 4 Arabic Broadcast News Speech
LDC2018T14 GALE Phase 4 Arabic Broadcast News Transcripts
LDC2018T05 H2, E2, ERK1 Children's Writing
LDC2018V01 HAVIC MED Event E051-E060 -- Videos, Metadata and Annotation
LDC2018S18 HUB5 Mandarin Telephone Speech and Transcripts Second Edition
LDC2018S07 IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b
LDC2018S13 IARPA Babel Kazakh Language Pack IARPA-babel302b-v1.0a
LDC2018S16 IARPA Babel Telugu Language Pack IARPA-babel303b-v1.0a
LDC2018S02 IARPA Babel Tok Pisin Language Pack IARPA-babel207b-v1.0e
LDC2018T04 LORELEI Amharic Representative Language Pack - Monolingual and Parallel Text
LDC2018T11 LORELEI Somali Representative Language Pack - Monolingual and Parallel Text
LDC2018S03 Multi-Language Conversational Telephone Speech 2011 -- Central Asian
LDC2018S08 Multi-Language Conversational Telephone Speech 2011 -- Central European
LDC2018S12 Multi-Language Conversational Telephone Speech 2011 -- Spanish
LDC2018S17 Nautilus Speaker Characterization
LDC2018S10 RATS Language Identification
LDC2018S04 Rhythm and Pitch
LDC2018T09 SPADE
LDC2018T03 TAC KBP Comprehensive English Source Corpora 2009-2014
LDC2018T16 TAC KBP English Entity Linking - Comprehensive Training and Evaluation Data 2009-2013
LDC2018T22 TAC KBP English Regular Slot Filling - Comprehensive Training and Evaluation Data 2009-2014
LDC2018T24 TAC Relation Extraction Dataset
LDC2018T13 TRAD Arabic-French Parallel Text -- Newsgroup
LDC2018T21 TRAD Arabic-French Parallel Text -- Newswire
LDC2018T02 TRAD Chinese-French Parallel Text -- Blog
LDC2018T17 TRAD Chinese-French Parallel Text -- Broadcast News
2017
LDC2017S06 2010 NIST Speaker Recognition Evaluation Test Set
LDC2017T13 2015-2016 CoNLL Shared Task
LDC2017T10 Abstract Meaning Representation (AMR) Annotation Release 2.0
LDC2017T14 Ancient Chinese Corpus
LDC2017L01 Arabic Speech Recognition Pronunciation Dictionary
LDC2017S21 ASpIRE Development and Development Test Sets
LDC2017T05 BOLT Chinese Discussion Forum Parallel Training Data
LDC2017T07 BOLT Egyptian Arabic SMS/Chat and Transliteration
LDC2017T11 BOLT English Discussion Forums
LDC2017S07 CHiME2 Grid
LDC2017S10 CHiME2 WSJ0
LDC2017S24 CHiME3
LDC2017S23 CIEMPIESS Light
LDC2017T15 English Web Treebank Propbank
LDC2017T03 First-Year Law Students' Court Memoranda
LDC2017T06 GALE English-Chinese Parallel Aligned Treebank -- Training
LDC2017T02 GALE Phase 3 and 4 Chinese Web Parallel Text
LDC2017S02 GALE Phase 3 Arabic Broadcast News Speech Part 2
LDC2017T04 GALE Phase 3 Arabic Broadcast News Transcripts Part 2
LDC2017S15 GALE Phase 4 Arabic Broadcast Conversation Speech
LDC2017T12 GALE Phase 4 Arabic Broadcast Conversation Transcripts
LDC2017S25 GALE Phase 4 Chinese Broadcast News Speech
LDC2017T18 GALE Phase 4 Chinese Broadcast News Transcripts
LDC2017S03 IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b
LDC2017S22 IARPA Babel Kurmanji Kurdish Language Pack IARPA-babel205b-v1.0a
LDC2017S08 IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a
LDC2017S05 IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d
LDC2017S13 IARPA Babel Tamil Language Pack IARPA-babel204b-v1.1b
LDC2017S01 IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7
LDC2017S19 IARPA Babel Zulu Language Pack IARPA-babel206b-v0.1e
LDC2017S12 KSUEmotions
LDC2017S16 LDC Spoken Language Sampler - Fourth Release
LDC2017S11 Metalogue Multi-Issue Bargaining Dialogue
LDC2017S14 Multi-Language Conversational Telephone Speech 2011 -- South Asian
LDC2017S09 Multi-Language Conversational Telephone Speech 2011 -- Turkish
LDC2017T01 MWE-Aware English Dependency Corpus
LDC2017T16 MWE-Aware English Dependency Corpus 2.0
LDC2017S04 Noisy TIMIT Speech
LDC2017T08 Phrase Detectives Corpus
LDC2017S20 RATS Keyword Spotting
LDC2017S18 SRI-FRTIV
LDC2017T17 TAC KBP Chinese Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2011-2014
LDC2017T09 The EventStatus Corpus
LDC2017V01 UCLA High-Speed Laryngeal Video and Audio
LDC2017S17 Vehicle City Voices Corpus – Part I
2016
LDC2016T02 Arabic Treebank - Weblog
LDC2016T18 ARL Arabic Dependency Treebank
LDC2016L01 Bamanankan Lexicon
LDC2016T05 BOLT Chinese Discussion Forums
LDC2016T19 BOLT Chinese-English Word Alignment and Tagging -- Discussion Forum Training
LDC2016T13 Chinese Treebank 9.0
LDC2016T22 Chinese-English Parallel Sentences Extracted from Patents
LDC2016S04 CHM150
LDC2016T07 DEFT Narrative Text
LDC2016S05 Digital Archive of Southern Speech - NLP Version
LDC2016T16 English Speed Networking Conversational Transcripts
LDC2016T08 GALE Phase 3 and 4 Arabic Web Parallel Text
LDC2016T09 GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text
LDC2016T15 GALE Phase 3 and 4 Chinese Broadcast News Parallel Text
LDC2016T25 GALE Phase 3 and 4 Chinese Newswire Parallel Text
LDC2016S01 GALE Phase 3 Arabic Broadcast Conversation Speech Part 2
LDC2016T06 GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 2
LDC2016S07 GALE Phase 3 Arabic Broadcast News Speech Part 1
LDC2016T17 GALE Phase 3 Arabic Broadcast News Transcripts Part 1
LDC2016T11 GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences
LDC2016T20 GALE Phase 4 Arabic Broadcast News Parallel Sentences
LDC2016T27 GALE Phase 4 Arabic Newswire Parallel Sentences
LDC2016T14 GALE Phase 4 Arabic Weblog Parallel Sentences
LDC2016S03 GALE Phase 4 Chinese Broadcast Conversation Speech
LDC2016T12 GALE Phase 4 Chinese Broadcast Conversation Transcripts
LDC2016T04 GALE Phase 4 Chinese Weblog Parallel Sentences
LDC2016T01 H1 Children's Writing
LDC2016V01 HAVIC Pilot Transcription
LDC2016S06 IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a
LDC2016S08 IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b
LDC2016S02 IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c
LDC2016S12 IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a
LDC2016S09 IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY
LDC2016S13 IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g
LDC2016S10 IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5
LDC2016T24 JANA: A Human-Human Dialogues Corpus for Egyptian Dialect
LDC2016T21 KAFD: Arabic Font Database
LDC2016S11 Multi-Language Conversational Telephone Speech 2011 -- Slavic Group
LDC2016T03 NewSoMe Corpus of Opinion in Blogs
LDC2016T23 Richer Event Description
LDC2016T10 SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing
LDC2016T26 TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014
2015
LDC2015T12 2006 CoNLL Shared Task - Arabic & Czech
LDC2015T11 2006 CoNLL Shared Task - Ten Languages
LDC2015T20 ACE 2007 Spanish DevTest - Pilot Evaluation
LDC2015S10 Arabic Learner Corpus
LDC2015S12 Articulation Index LSCP
LDC2015T03 Avocado Research Email Collection
LDC2015S07 CIEMPIESS
LDC2015T08 Coordination Annotation for the Penn Treebank
LDC2015T13 English News Text Treebank: Penn Treebank Revised
LDC2015T06 GALE Chinese-English Parallel Aligned Treebank -- Training
LDC2015T04 GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3
LDC2015T18 GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 4
LDC2015S01 GALE Phase 2 Arabic Broadcast News Speech Part 2
LDC2015T01 GALE Phase 2 Arabic Broadcast News Transcripts Part 2
LDC2015T05 GALE Phase 3 and 4 Arabic Broadcast Conversation Parallel Text
LDC2015T07 GALE Phase 3 and 4 Arabic Broadcast News Parallel Text
LDC2015T19 GALE Phase 3 and 4 Arabic Newswire Parallel Text
LDC2015S11 GALE Phase 3 Arabic Broadcast Conversation Speech Part 1
LDC2015T16 GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 1
LDC2015S06 GALE Phase 3 Chinese Broadcast Conversation Speech Part 2
LDC2015T09 GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 2
LDC2015S13 GALE Phase 3 Chinese Broadcast News Speech
LDC2015T25 GALE Phase 3 Chinese Broadcast News Transcripts
LDC2015T14 GALE Phase 4 Chinese Broadcast Conversation Parallel Sentences
LDC2015T21 GALE Phase 4 Chinese Broadcast News Parallel Sentences
LDC2015T24 GALE Phase 4 Chinese Newswire Parallel Sentences
LDC2015T22 Karlsruhe Children's Text
LDC2015T23 KHATT: Handwritten Arabic Text
LDC2015S09 LDC Spoken Language Sampler - Third Release
LDC2015S05 Mandarin Chinese Phonetic Segmentation and Tone
LDC2015S04 Mandarin-English Code-Switching in South-East Asia
LDC2015T17 NewSoMe Corpus of Opinion in News Reports
LDC2015S02 RATS Speech Activity Detection
LDC2015T10 RST Signalling Corpus
LDC2015T02 SenSem Databank
LDC2015L01 SenSem Lexicons
LDC2015S03 The Subglottal Resonances Database
LDC2015S08 The Walking Around Corpus
LDC2015T15 TS Wikipedia
2014
LDC2014S06 2009 NIST Language Recognition Evaluation Test Set
LDC2014T12 Abstract Meaning Representation (AMR) Annotation Release 1.0
LDC2014T18 ACE 2007 Multilingual Training Corpus
LDC2014T24 Boulder Lies and Truth
LDC2014S01 CALLFRIEND Farsi Second Edition Speech
LDC2014T01 CALLFRIEND Farsi Second Edition Transcripts
LDC2014T21 Chinese Discourse Treebank 0.5
LDC2014T07 Domain-Specific Hyponym Relations
LDC2014T06 ETS Corpus of Non-Native Written English
LDC2014T23 Fisher and CALLHOME Spanish--English Speech Translation
LDC2014T03 GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 2
LDC2014T08 GALE Arabic-English Parallel Aligned Treebank -- Web Training
LDC2014T19 GALE Arabic-English Word Alignment -- Broadcast Training Part 1
LDC2014T22 GALE Arabic-English Word Alignment -- Broadcast Training Part 2
LDC2014T05 GALE Arabic-English Word Alignment Training Part 1 -- Newswire and Web
LDC2014T10 GALE Arabic-English Word Alignment Training Part 2 -- Newswire
LDC2014T14 GALE Arabic-English Word Alignment Training Part 3 -- Web
LDC2014T25 GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 2
LDC2014S07 GALE Phase 2 Arabic Broadcast News Speech Part 1
LDC2014T17 GALE Phase 2 Arabic Broadcast News Transcripts Part 1
LDC2014T04 GALE Phase 2 Chinese Broadcast News Parallel Text Part 1
LDC2014T11 GALE Phase 2 Chinese Broadcast News Parallel Text Part 2
LDC2014T15 GALE Phase 2 Chinese Newswire Parallel Text Part 1
LDC2014T20 GALE Phase 2 Chinese Newswire Parallel Text Part 2
LDC2014T26 GALE Phase 2 Chinese Web Parallel Text
LDC2014S09 GALE Phase 3 Chinese Broadcast Conversation Speech Part 1
LDC2014T28 GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 1
LDC2014S05 Hispanic-English Database
LDC2014T09 HyTER Networks of Selected OpenMT08/09 Sentences
LDC2014S02 King Saud University Arabic Speech Database
LDC2014T13 MADCAT Chinese Pilot Training Set
LDC2014S03 Multi-Channel WSJ Audio
LDC2014T02 NIST 2012 Open Machine Translation (OpenMT) Progress Test Five Language Source
LDC2014T16 TAC KBP Reference Knowledge Base
LDC2014S08 United Nations Proceedings Speech
LDC2014S04 USC-SFI MALACH Interviews and Transcripts Czech
2013
LDC2013T06 1993-2007 United Nations Parallel Text
LDC2013T13 Chinese Proposition Bank 3.0
LDC2013T21 Chinese Treebank 8.0
LDC2013T02 Chinese-English Biology and Chemistry Abstract Parallel Text
LDC2013S09 CSC Deceptive Speech
LDC2013T14 GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 1
LDC2013T10 GALE Arabic-English Parallel Aligned Treebank -- Newswire
LDC2013T23 GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 1
LDC2013T05 GALE Chinese-English Word Alignment and Tagging Training Part 4 -- Web
LDC2013S02 GALE Phase 2 Arabic Broadcast Conversation Speech Part 1
LDC2013S07 GALE Phase 2 Arabic Broadcast Conversation Speech Part 2
LDC2013T04 GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1
LDC2013T17 GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2
LDC2013T01 GALE Phase 2 Arabic Web Parallel Text
LDC2013T11 GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 1
LDC2013T16 GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 2
LDC2013S04 GALE Phase 2 Chinese Broadcast Conversation Speech
LDC2013T08 GALE Phase 2 Chinese Broadcast Conversation Transcripts
LDC2013S08 GALE Phase 2 Chinese Broadcast News Speech
LDC2013T20 GALE Phase 2 Chinese Broadcast News Transcripts
LDC2013S05 Greybeard
LDC2013S06 LDC Spoken Language Sampler - Second Release
LDC2013T09 MADCAT Phase 2 Training Set
LDC2013T15 MADCAT Phase 3 Training Set
LDC2013L01 Maninkakan Lexicon
LDC2013T12 Manually Annotated Sub-Corpus Third Release
LDC2013S03 Mixer 6 Speech
LDC2013T07 NIST 2008-2012 Open Machine Translation (OpenMT) Progress Test Sets
LDC2013T03 NIST 2012 Open Machine Translation (OpenMT) Evaluation
LDC2013T19 OntoNotes Release 5.0
LDC2013T18 Semantic Textual Similarity (STS) 2013 Machine Translation
LDC2013T22 The ARRAU Corpus of Anaphoric Information
2012
LDC2012V01 2005 NIST/USF Evaluation Resources for the VACE Program - Broadcast News
LDC2012S01 2006 NIST Speaker Recognition Evaluation Test Set Part 2
LDC2012T03 2009 CoNLL Shared Task Part 1
LDC2012T04 2009 CoNLL Shared Task Part 2
LDC2012T11 American English Nickname Collection
LDC2012T21 Annotated English Gigaword
LDC2012T07 Arabic Treebank - Broadcast News v1.0
LDC2012T09 Arabic-Dialect/English Parallel Text
LDC2012T10 Catalan TimeBank 1.0
LDC2012T05 Chinese Dependency Treebank 1.0
LDC2012T22 Chinese-English Semiconductor Parallel Text
LDC2012S03 Digital Archive of Southern Speech
LDC2012T02 English Translation Treebank: An-Nahar Newswire
LDC2012T13 English Web Treebank
LDC2012T16 GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web
LDC2012T20 GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire
LDC2012T24 GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web
LDC2012T06 GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 1
LDC2012T14 GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 2
LDC2012T18 GALE Phase 2 Arabic Broadcast News Parallel Text
LDC2012T17 GALE Phase 2 Arabic Newswire Parallel Text
LDC2012T15 MADCAT Phase 1 Training Set
LDC2012S04 Malto Speech and Transcripts
LDC2012T01 ModeS TimeBank 1.0
LDC2012T08 Prague Czech-English Dependency Treebank 2.0
LDC2012T23 Russian-English Computer Security Parallel Text
LDC2012T12 Spanish TimeBank 1.0
LDC2012S02 TORGO Database of Dysarthric Articulation
LDC2012S06 Turkish Broadcast News Speech and Transcripts
LDC2012S05 USC-SFI MALACH Interviews and Transcripts English
2011
LDC2011S04 2005 NIST Speaker Recognition Evaluation Test Data
LDC2011S01 2005 NIST Speaker Recognition Evaluation Training Data
LDC2011S06 2005 Spring NIST Rich Transcription (RT-05S) Evaluation Set
LDC2011S10 2006 NIST Speaker Recognition Evaluation Test Set Part 1
LDC2011S09 2006 NIST Speaker Recognition Evaluation Training Set
LDC2011S02 2006 NIST Spoken Term Detection Development Set
LDC2011S03 2006 NIST Spoken Term Detection Evaluation Set
LDC2011V05 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1
LDC2011V06 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2
LDC2011S11 2008 NIST Speaker Recognition Evaluation Supplemental Set
LDC2011S08 2008 NIST Speaker Recognition Evaluation Test Set
LDC2011S05 2008 NIST Speaker Recognition Evaluation Training Set Part 1
LDC2011S07 2008 NIST Speaker Recognition Evaluation Training Set Part 2
LDC2011T05 2008/2010 NIST Metrics for Machine Translation (MetricsMaTr) GALE Evaluation Set
LDC2011T02 ACE 2005 English SpatialML Annotations Version 2
LDC2011T11 Arabic Gigaword Fifth Edition
LDC2011T09 Arabic Treebank: Part 2 v 3.1
LDC2011T06 Broadcast News Lattices
LDC2011T13 Chinese Gigaword Fifth Edition
LDC2011T08 Datasets for Generic Relation Extraction (reACE)
LDC2011T07 English Gigaword Fifth Edition
LDC2011T10 French Gigaword Third Edition
LDC2011T04 Indian Language Part-of-Speech Tagset: Sanskrit
LDC2011V03 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1
LDC2011V04 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2
LDC2011V01 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 1
LDC2011V02 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 2
LDC2011T03 OntoNotes Release 4.0
LDC2011T01 SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in Multiple Languages
LDC2011T12 Spanish Gigaword Third Edition
2010
LDC2010S03 2003 NIST Speaker Recognition Evaluation
LDC2010T09 ACE 2005 Mandarin SpatialML Annotations
LDC2010T18 ACE Time Normalization (TERN) 2004 English Evaluation Data V1.0
LDC2010T13 Arabic Treebank: Part 1 v 4.1
LDC2010T08 Arabic Treebank: Part 3 v 3.2
LDC2010S05 Asian Elephant Vocalizations
LDC2010S07 Asian Spoken Language Sampler
LDC2010T07 Chinese Treebank 7.0
LDC2010T06 Chinese Web 5-gram Version 1
LDC2010T02 Czech Broadcast News MDE Transcripts
LDC2010T04 Fisher Spanish - Transcripts
LDC2010S01 Fisher Spanish Speech
LDC2010T03 GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2
LDC2010T16 Indian Language Part-of-Speech Tagset: Bengali
LDC2010T24 Indian Language Part-of-Speech Tagset: Hindi
LDC2010T19 Korean Newswire Second Edition
LDC2010L01 LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1
LDC2010T22 Manually Annotated Sub-Corpus First Release
LDC2010T15 Message Understanding Conference 7 Timed (MUC7_T)
LDC2010T10 NIST 2002 Open Machine Translation (OpenMT) Evaluation
LDC2010T11 NIST 2003 Open Machine Translation (OpenMT) Evaluation
LDC2010T12 NIST 2004 Open Machine Translation (OpenMT) Evaluation
LDC2010T14 NIST 2005 Open Machine Translation (OpenMT) Evaluation
LDC2010T17 NIST 2006 Open Machine Translation (OpenMT) Evaluation
LDC2010T21 NIST 2008 Open Machine Translation (OpenMT) Evaluation
LDC2010T23 NIST 2009 Open Machine Translation (OpenMT) Evaluation
LDC2010T01 NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations
LDC2010T05 NPS Internet Chatroom Conversations, Release 1.0
LDC2010V01 TRECVID 2004 Keyframes & Transcripts
LDC2010V02 TRECVID 2006 Keyframes
LDC2010S02 WTIMIT 1.0
2009
LDC2009S05 2007 NIST Language Recognition Evaluation Supplemental Training Set
LDC2009S04 2007 NIST Language Recognition Evaluation Test Set
LDC2009T12 2008 CoNLL Shared Task Data
LDC2009T05 2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data
LDC2009T29 ACL Anthology Reference Corpus
LDC2009L01 An English Dictionary of the Tamil Verb Second Edition
LDC2009T30 Arabic Gigaword Fourth Edition
LDC2009T22 Arabic Newswire English Translation Collection
LDC2009V01 Audiovisual Database of Spoken American English
LDC2009T04 BioProp Version 1.0
LDC2009T27 Chinese Gigaword Fourth Edition
LDC2009S01 CSLU: Numbers Version 1.3
LDC2009S03 CSLU: S4X Release 1.2
LDC2009T20 Czech Broadcast Conversation MDE Transcripts
LDC2009S02 Czech Broadcast Conversation Speech
LDC2009T01 English CTS Treebank with Structural Metadata
LDC2009T13 English Gigaword Fourth Edition
LDC2009T23 FactBank 1.0
LDC2009T28 French Gigaword Second Edition
LDC2009T03 GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1
LDC2009T09 GALE Phase 1 Arabic Newsgroup Parallel Text - Part 2
LDC2009T02 GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1
LDC2009T06 GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2
LDC2009T15 GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1
LDC2009T08 Japanese Web N-gram Version 1
LDC2009T10 Language Understanding Annotation Corpus
LDC2009T26 NXT Switchboard Annotations
LDC2009T24 OntoNotes Release 3.0
LDC2009T11 REFLEX Entity Translation Training/DevTest
LDC2009T21 Spanish Gigaword Second Edition
LDC2009T14 Tagged Chinese Gigaword Version 2.0
LDC2009T07 Unified Linguistic Annotation Text Collection
LDC2009T25 Web 1T 5-gram, 10 European Languages Version 1
2008
LDC2008S05 2005 NIST Language Recognition Evaluation
LDC2008T03 ACE 2005 English SpatialML Annotations
LDC2008L01 An English Dictionary of the Tamil Verb
LDC2008T25 AQUAINT-2 Information-Retrieval Text Research Collection
LDC2008T13 BLLIP North American News Text, Complete
LDC2008T14 BLLIP North American News Text, General Release
LDC2008T17 CALLHOME Mandarin Chinese Transcripts - XML version
LDC2008S09 CHAracterizing INdividual Speakers (CHAINS)
LDC2008T07 Chinese Proposition Bank 2.0
LDC2008T24 COMNOM v 1.0
LDC2008S06 CSLU: Alphadigit Version 1.3
LDC2008S07 CSLU: ISOLET Spoken Letter Database Version 1.3
LDC2008S02 CSLU: National Cellular Telephone Speech Release 2.3
LDC2008S01 CSLU: Portland Cellular Telephone Speech Version 1.3
LDC2008T22 Czech Academic Corpus 2.0
LDC2008T02 GALE Phase 1 Arabic Blog Parallel Text
LDC2008T09 GALE Phase 1 Arabic Broadcast News Parallel Text - Part 2
LDC2008T06 GALE Phase 1 Chinese Blog Parallel Text
LDC2008T08 GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2
LDC2008T18 GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3
LDC2008L03 Global Yoruba Lexical Database v. 1.0
LDC2008L02 Hindi WordNet
LDC2008T01 Hungarian-English Parallel Text, Version 1.0
LDC2008S08 LDC Spoken Language Sampler
LDC2008T23 NomBank v 1.0
LDC2008T15 North American News Text, Complete
LDC2008T16 North American News Text, General Release
LDC2008T04 OntoNotes Release 2.0
LDC2008T05 Penn Discourse Treebank Version 2.0
LDC2008T20 PennBioIE CYP 1.0
LDC2008T21 PennBioIE Oncology 1.0
LDC2008S03 STC-TIMIT 1.0
LDC2008S04 West Point Brazilian Portuguese Speech
2007
LDC2007T222001 Topic Annotated Enron Email Data Set
LDC2007S102003 NIST Rich Transcription Evaluation Data
LDC2007S122004 Spring NIST Rich Transcription (RT-04S) Evaluation Data
LDC2007S112004 Spring NIST Rich Transcription (RT-04S) Development Data
LDC2007T40Arabic Gigaword Third Edition
LDC2007S03ARL Urdu Speech Database, Training Data
LDC2007T38Chinese Gigaword Third Edition
LDC2007T36Chinese Treebank 6.0
LDC2007S08CSLU: Foreign Accented English Release 1.2
LDC2007S18CSLU: Kids' Speech Version 1.1
LDC2007S13CSLU: Apple Words and Phrases
LDC2007S05CSLU: Yes/No Version 1.2
LDC2007T02English Chinese Translation Treebank v 1.0
LDC2007T07English Gigaword Third Edition
LDC2007S02Fisher Levantine Arabic Conversational Telephone Speech
LDC2007T04Fisher Levantine Arabic Conversational Telephone Speech, Transcripts
LDC2007T24GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1
LDC2007T23GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1
LDC2007T20GALE Phase 1 Distillation Training
LDC2007T08ISI Arabic-English Automatically Extracted Parallel Text
LDC2007T09ISI Chinese-English Automatically Extracted Parallel Text
LDC2007S01Levantine Arabic Conversational Telephone Speech
LDC2007T01Levantine Arabic Conversational Telephone Speech, Transcripts
LDC2007S09Mandarin Affective Speech
LDC2007T19MITRE 1997 Mandarin Broadcast News Speech Translations (HUB-4NE)
LDC2007S15Nationwide Speech Project
LDC2007T21OntoNotes Release 1.0
LDC2007T03Tagged Chinese Gigaword
LDC2007V02TRECVID 2003 Keyframes & Transcripts
LDC2007V01TRECVID 2005 Keyframes & Transcripts
2006
LDC2006S312003 NIST Language Recognition Evaluation
LDC2006S442004 NIST Speaker Recognition Evaluation
LDC2006T06ACE 2005 Multilingual Training Corpus
LDC2006S46Arabic Broadcast News Speech
LDC2006T20Arabic Broadcast News Transcripts
LDC2006T02Arabic Gigaword Second Edition
LDC2006S15CSLU: Spelled and Spoken Words
LDC2006S14CSLU: Stories v 1.2
LDC2006S35CSLU: Multilanguage Telephone Speech Version 1.2
LDC2006S39CSLU: Names Release 1.3
LDC2006S26CSLU: Speaker Recognition Version 1.1
LDC2006S16CSLU: Spoltech Brazilian Portuguese Version 1.0
LDC2006S01CSLU: Voices
LDC2006T10English-Arabic Treebank v 1.0
LDC2006T17French Gigaword First Edition
LDC2006S43Gulf Arabic Conversational Telephone Speech
LDC2006T15Gulf Arabic Conversational Telephone Speech, Transcripts
LDC2006S45Iraqi Arabic Conversational Telephone Speech
LDC2006T16Iraqi Arabic Conversational Telephone Speech, Transcripts
LDC2006S42Korean Broadcast News Speech
LDC2006T14Korean Broadcast News Transcripts
LDC2006T03Korean Propbank
LDC2006T09Korean Treebank Annotations Version 2.0
LDC2006S29Levantine Arabic QT Training Data Set 5, Speech
LDC2006T07Levantine Arabic QT Training Data Set 5, Transcripts
LDC2006S33Middle East Technical University Turkish Microphone Speech v 1.0
LDC2006T04Multiple-Translation Chinese (MTC) Part 4
LDC2006S13N4 NATO Native and Non-Native Speech
LDC2006T01Prague Dependency Treebank 2.0
LDC2006S34Russian through Switched Telephone Network (RuSTeN)
LDC2006T12Spanish Gigaword First Edition
LDC2006S30Speech Controlled Computing
LDC2006T18TDT5 Multilingual Text
LDC2006T19TDT5 Topics and Annotations
LDC2006T08TimeBank 1.2
LDC2006T13Web 1T 5-gram Version 1
LDC2006S37West Point Heroico Spanish Speech
LDC2006S36West Point Korean Speech
2005
LDC2005T09ACE 2004 Multilingual Training Corpus
LDC2005T07ACE Time Normalization (TERN) 2004 English Training Data v 1.0
LDC2005T35American National Corpus (ANC) Second Release
LDC2005S07Arabic CTS Levantine Fisher Training Data Set 3, Speech
LDC2005T03Arabic CTS Levantine Fisher Training Data Set 3, Transcripts
LDC2005T02Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis)
LDC2005T20Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis)
LDC2005T30Arabic Treebank: Part 4 v 1.0 (MPG Annotation)
LDC2005S22Articulation Index
LDC2005T33BBN Pronoun Coreference and Entity Type Corpus
LDC2005S08BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts
LDC2005T13CCGbank
LDC2005T34Chinese <-> English Named Entity Lists v 1.0
LDC2005T10Chinese English News Magazine Parallel Text
LDC2005T14Chinese Gigaword Second Edition
LDC2005T06Chinese News Translation Text Part 1
LDC2005T23Chinese Proposition Bank 1.0
LDC2005T01Chinese Treebank 5.0
LDC2005S26CSLU: 22 Languages Corpus
LDC2005T08Discourse Graphbank
LDC2005T12English Gigaword Second Edition
LDC2005S13Fisher English Training Part 2, Speech
LDC2005T19Fisher English Training Part 2, Transcripts
LDC2005T28HARD 2004 Text
LDC2005T29HARD 2004 Topics and Annotations
LDC2005S15HKUST Mandarin Telephone Speech, Part 1
LDC2005T32HKUST Mandarin Telephone Transcript Data, Part 1
LDC2005S14Levantine Arabic QT Training Data Set 4 (Speech + Transcripts)
LDC2005L01Mawukakan Lexicon
LDC2005T05Multiple-Translation Arabic (MTA) Part 2
LDC2005S16RT-04 MDE Training Data Speech
LDC2005T24RT-04 MDE Training Data Text/Annotations
LDC2005S25Santa Barbara Corpus of Spoken American English Part IV
LDC2005S11TDT4 Multilingual Broadcast News Speech Corpus
LDC2005T16TDT4 Multilingual Text and Annotations
LDC2005S30West Point Company G3 American English Speech
LDC2005S28West Point Croatian Speech
2004
LDC2004T152000 Communicator Dialogue Act Tagged
LDC2004T162001 Communicator Dialogue Act Tagged
LDC2004S042002 NIST Speaker Recognition Evaluation
LDC2004S112002 Rich Transcription Broadcast News and Conversational Telephone Speech
LDC2004T18Arabic English Parallel News Part 1
LDC2004T17Arabic News Translation Text Part 1
LDC2004T02Arabic Treebank: Part 2 v 2.0
LDC2004T11Arabic Treebank: Part 3 v 1.0
LDC2004L02Buckwalter Arabic Morphological Analyzer Version 2.0
LDC2004T05Chinese Treebank 4.0
LDC2004S01Czech Broadcast News Speech
LDC2004T01Czech Broadcast News Transcripts
LDC2004S13Fisher English Training Speech Part 1 Speech
LDC2004T19Fisher English Training Speech Part 1 Transcripts
LDC2004V01FORM1 Kinematic Gesture
LDC2004T08Hong Kong Parallel Text
LDC2004S02ICSI Meeting Speech
LDC2004T04ICSI Meeting Transcripts
LDC2004S05ISL Meeting Speech Part 1
LDC2004T10ISL Meeting Transcripts Part 1
LDC2004L01Klex: Finite-State Lexical Transducer for Korean
LDC2004T03Morphologically Annotated Korean Text
LDC2004T07Multiple-Translation Chinese (MTC) Part 3
LDC2004S09NIST Meeting Pilot Corpus Speech
LDC2004T13NIST Meeting Pilot Corpus Transcripts and Metadata
LDC2004T23Prague Arabic Dependency Treebank 1.0
LDC2004T25Prague Czech-English Dependency Treebank 1.0
LDC2004T14Proposition Bank I
LDC2004S08RT-03 MDE Training Data Speech
LDC2004T12RT-03 MDE Training Data Text and Annotations
LDC2004S10Santa Barbara Corpus of Spoken American English Part III
LDC2004S07Switchboard Cellular Part 2 Audio
LDC2004S12TalkBank Ethology Data: Field Recordings of Vervet Monkey Calls
LDC2004T09TIDES Extraction (ACE) 2003 Multilingual Training Data
2003
LDC2003T031997 HUB5 German Transcripts
LDC2003T041997 HUB5 Spanish Transcripts
LDC2003T021998 HUB5 English Transcripts
LDC2003S012001 Communicator Evaluation
LDC2003T012001 HUB5 Mandarin Transcripts
LDC2003T11ACE-2 Version 1.0
LDC2003T12Arabic Gigaword
LDC2003T07Arabic Treebank: Part 1 - 10K-word English Translation
LDC2003T06Arabic Treebank: Part 1 v 2.0
LDC2003T09Chinese Gigaword
LDC2003T05English Gigaword
LDC2003V01FORM2 Kinematic Gesture
LDC2003L01Grassfields Bantu Fieldwork: Dschang Lexicon
LDC2003S02Grassfields Bantu Fieldwork: Dschang Tone Paradigms
LDC2003S07Korean Telephone Conversations Complete Set
LDC2003L02Korean Telephone Conversations Lexicon
LDC2003S03Korean Telephone Conversations Speech
LDC2003T08Korean Telephone Conversations Transcripts
LDC2003T13Message Understanding Conference (MUC) 6
LDC2003T18Multiple-Translation Arabic (MTA) Part 1
LDC2003T17Multiple-Translation Chinese (MTC) Part 2
LDC2003T10SAID
LDC2003S06Santa Barbara Corpus of Spoken American English Part II
LDC2003T15SLX Corpus of Classic Sociolinguistic Interviews
LDC2003T16SummBank 1.0
LDC2003S05West Point Russian Speech
2002
LDC2002S111997 HUB4 English Evaluation Speech and Transcripts
LDC2002S221997 HUB5 Arabic Evaluation
LDC2002T391997 HUB5 Arabic Transcripts
LDC2002S231997 HUB5 English Evaluation
LDC2002S241997 HUB5 German Evaluation
LDC2003T031997 HUB5 German Transcripts
LDC2002S251997 HUB5 Spanish Evaluation
LDC2003T041997 HUB5 Spanish Transcripts
LDC2002S101998 HUB5 English Evaluation
LDC2003T021998 HUB5 English Transcripts
LDC2002S562000 Communicator Evaluation
LDC2002S092000 HUB5 English Evaluation Speech
LDC2002T432000 HUB5 English Evaluation Transcripts
LDC2002S132001 HUB5 English Evaluation
LDC2002S122001 HUB5 Mandarin Evaluation
LDC2003T012001 HUB5 Mandarin Transcripts
LDC2002S342001 NIST Speaker Recognition Evaluation Corpus
LDC2002L49Buckwalter Arabic Morphological Analyzer Version 1.0
LDC2002S37CALLHOME Egyptian Arabic Speech Supplement
LDC2002T38CALLHOME Egyptian Arabic Transcripts Supplement
LDC2002L27Chinese-English Translation Lexicon Version 3.0
LDC2002S28Emotional Prosody Speech and Transcripts
LDC2001S16Grassfields Bantu Fieldwork: Ngomba Tone Paradigms
LDC2002T26Korean English Treebank Annotations
LDC2002T01Multiple-Translation Chinese Corpus
LDC2002T07RST Discourse Treebank
LDC2001S08Speech in Noisy Environments (SPINE2) Part 3 Audio
LDC2001T09Speech in Noisy Environments (SPINE2) Part 3 Transcripts
LDC2002S06Switchboard-2 Phase III Audio
LDC2002T31The AQUAINT Corpus of English News Text
LDC2002S04Translanguage English Database (TED) Speech
LDC2002T03Translanguage English Database (TED) Transcripts
LDC2002S35Voicemail Corpus Part II
LDC2002S02West Point Arabic Speech
2001
LDC2001S911997 HUB4 Broadcast News Evaluation Non-English Test Material
LDC2001S972000 NIST Speaker Recognition Evaluation
LDC2001T55Arabic Newswire Part 1
LDC2001T61CALLHOME Spanish Dialogue Act Annotation
LDC2001T62CETEMpublico
LDC2001T11Chinese Treebank 2.0
LDC2001S16Grassfields Bantu Fieldwork: Ngomba Tone Paradigms
LDC2001T02Message Understanding Conference (MUC) 7
LDC2001T10Prague Dependency Treebank 1.0
LDC2001S04Speech in Noisy Environments (SPINE2) Part 1 Audio
LDC2001T05Speech in Noisy Environments (SPINE2) Part 1 Transcripts
LDC2001S06Speech in Noisy Environments (SPINE2) Part 2 Audio
LDC2001T07Speech in Noisy Environments (SPINE2) Part 2 Transcripts
LDC2001S08Speech in Noisy Environments (SPINE2) Part 3 Audio
LDC2001T09Speech in Noisy Environments (SPINE2) Part 3 Transcripts
LDC2001S99Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio
LDC2001S13Switchboard Cellular Part 1 Audio
LDC2001S15Switchboard Cellular Part 1 Transcribed Audio
LDC2001T14Switchboard Cellular Part 1 Transcription
LDC2001T60Syllable-Final /s/ Lenition
LDC2001S93TDT2 Mandarin Audio Corpus
LDC2001T57TDT2 Multilanguage Text Version 4.0
LDC2001S94TDT3 English Audio
LDC2001S95TDT3 Mandarin Audio
LDC2001T58TDT3 Multilanguage Text Version 2.0
2000
LDC2000S861998 HUB4 Broadcast News Evaluation English Test Material
LDC2000S881999 HUB4 Broadcast News Evaluation English Test Material
LDC2000T43BLLIP 1987-89 WSJ Corpus Release 1
LDC2000T50Hong Kong Hansards Parallel Text
LDC2000T47Hong Kong Laws Parallel Text
LDC2000T46Hong Kong News Parallel Text
LDC2000T45Korean Newswire
LDC2000S85Santa Barbara Corpus of Spoken American English Part I
LDC2000S96Speech in Noisy Environments (SPINE) Evaluation Audio
LDC2000T54Speech in Noisy Environments (SPINE) Evaluation Transcripts
LDC2000S87Speech in Noisy Environments (SPINE) Training Audio
LDC2000T49Speech in Noisy Environments (SPINE) Training Transcripts
LDC2000S92TDT2 Careful Transcription Audio
LDC2000T44TDT2 Careful Transcription Text
LDC2000T52TREC Mandarin
LDC2000T51TREC Spanish
LDC2000S89Voice of America (VOA) Czech Broadcast News Audio
LDC2000T53Voice of America (VOA) Czech Broadcast News Transcripts
1999
LDC99S801997 Speaker Recognition Benchmark
LDC99S811999 Speaker Recognition Benchmark
LDC99L23American English Spoken Lexicon
LDC99L22Egyptian Colloquial Arabic Lexicon
LDC99T34Japanese Business News Text Supplement
LDC99T40Portuguese Newswire Text
LDC99T41Spanish Newswire Text, Volume 2
LDC99S78SUSAS
LDC99T33SUSAS Transcripts
LDC99S79Switchboard-2 Phase II
LDC99S83Tactical Speaker Identification Speech Corpus (TSID)
LDC99S84TDT2 English Audio
LDC99T42Treebank-3
LDC99S82USC Marketplace Broadcast News Speech
LDC99T36USC Marketplace Broadcast News Transcripts
1998
LDC98T311996 CSR HUB4 Language Model
LDC97S661996 English Broadcast News Dev and Eval (HUB4)
LDC97S441996 English Broadcast News Speech (HUB4)
LDC97T221996 English Broadcast News Transcripts (HUB4)
LDC98S711997 English Broadcast News Speech (HUB4)
LDC98T281997 English Broadcast News Transcripts (HUB4)
LDC98S731997 Mandarin Broadcast News Speech (HUB4-NE)
LDC98T241997 Mandarin Broadcast News Transcripts (HUB4-NE)
LDC98S741997 Spanish Broadcast News Speech (HUB4-NE)
LDC98T291997 Spanish Broadcast News Transcripts (HUB4-NE)
LDC98S761998 Speaker Recognition Benchmark
LDC98L21COMLEX English Syntax Lexicon
LDC96T11COMLEX Syntax Text Corpus Version 2.0
LDC95S23CSR-III Speech
LDC95T6CSR-III Text
LDC98S67HTIMIT
LDC98S69HUB5 Mandarin Telephone Speech Corpus
LDC98T26HUB5 Mandarin Transcripts
LDC98S70HUB5 Spanish Telephone Speech Corpus
LDC98T27HUB5 Spanish Transcripts
LDC98T32JURIS
LDC95S22KING Speaker Verification
LDC98S68LLHDB
LDC98T30North American News Text Supplement
LDC98S75Switchboard-2 Phase I
LDC98S72Taiwanese Putonghua Speech and Transcripts
LDC98T25TDT Pilot Study Corpus
LDC98S77Voicemail Corpus Part I
LDC94S16YOHO Speaker Verification
1997
LDC97S661996 English Broadcast News Dev and Eval (HUB4)
LDC97S441996 English Broadcast News Speech (HUB4)
LDC97T221996 English Broadcast News Transcripts (HUB4)
LDC96S611996 Speaker Recognition Benchmark
LDC94S14AAir Traffic Control Complete
LDC96S36Boston University Radio Speech Corpus
LDC96S46CALLFRIEND American English-Non-Southern Dialect
LDC96S47CALLFRIEND American English-Southern Dialect
LDC96S48CALLFRIEND Canadian French
LDC96S49CALLFRIEND Egyptian Arabic
LDC96S50CALLFRIEND Farsi
LDC96S51CALLFRIEND German
LDC96S52CALLFRIEND Hindi
LDC96S53CALLFRIEND Japanese
LDC96S54CALLFRIEND Korean
LDC96S55CALLFRIEND Mandarin Chinese-Mainland Dialect
LDC96S56CALLFRIEND Mandarin Chinese-Taiwan Dialect
LDC96S57CALLFRIEND Spanish-Caribbean Dialect
LDC96S58CALLFRIEND Spanish-Non-Caribbean Dialect
LDC96S59CALLFRIEND Tamil
LDC96S60CALLFRIEND Vietnamese
LDC97L20CALLHOME American English Lexicon (PRONLEX)
LDC97S42CALLHOME American English Speech
LDC97T14CALLHOME American English Transcripts
LDC97S45CALLHOME Egyptian Arabic Speech
LDC97T19CALLHOME Egyptian Arabic Transcripts
LDC97L18CALLHOME German Lexicon
LDC97S43CALLHOME German Speech
LDC97T15CALLHOME German Transcripts
LDC96L17CALLHOME Japanese Lexicon
LDC96S37CALLHOME Japanese Speech
LDC96T18CALLHOME Japanese Transcripts
LDC96L15CALLHOME Mandarin Chinese Lexicon
LDC96S34CALLHOME Mandarin Chinese Speech
LDC96T16CALLHOME Mandarin Chinese Transcripts
LDC96L16CALLHOME Spanish Lexicon
LDC96S35CALLHOME Spanish Speech
LDC96T17CALLHOME Spanish Transcripts
LDC94S13ACSR-II (WSJ1) Complete
LDC94S13BCSR-II (WSJ1) Sennheiser
LDC97T12DSO Corpus of Sense-Tagged English
LDC99L22Egyptian Colloquial Arabic Lexicon
LDC95T20Hansard French/English
LDC96S64-1JEIDA/JCSD-Channel 0 City Names
LDC96S64JEIDA/JCSD-Channel 0 Complete
LDC96S64-2JEIDA/JCSD-Channel 0 Control Words
LDC96S64-4JEIDA/JCSD-Channel 0 Four Digit Sequences
LDC96S64-3JEIDA/JCSD-Channel 0 Isolated Digits
LDC96S64-5JEIDA/JCSD-Channel 0 Mono Syllables
LDC96S65-1JEIDA/JCSD-Channel 1 City Names
LDC96S65JEIDA/JCSD-Channel 1 Complete
LDC96S65-2JEIDA/JCSD-Channel 1 Control Words
LDC96S65-4JEIDA/JCSD-Channel 1 Four Digit Sequences
LDC96S65-3JEIDA/JCSD-Channel 1 Isolated Digits
LDC96S65-5JEIDA/JCSD-Channel 1 Mono Syllables
LDC95T13Mandarin Chinese News Text
LDC95T21North American News Text Corpus
LDC94S15SPIDRE
LDC97S62Switchboard-1 Release 2
LDC97S63The CMU Kids Corpus
1996
LDC96S611996 Speaker Recognition Benchmark
LDC96S36Boston University Radio Speech Corpus
LDC94S20BRAMSHILL
LDC96S46CALLFRIEND American English-Non-Southern Dialect
LDC96S47CALLFRIEND American English-Southern Dialect
LDC96S48CALLFRIEND Canadian French
LDC96S49CALLFRIEND Egyptian Arabic
LDC96S50CALLFRIEND Farsi
LDC96S51CALLFRIEND German
LDC96S52CALLFRIEND Hindi
LDC96S53CALLFRIEND Japanese
LDC96S54CALLFRIEND Korean
LDC96S55CALLFRIEND Mandarin Chinese-Mainland Dialect
LDC96S56CALLFRIEND Mandarin Chinese-Taiwan Dialect
LDC96S57CALLFRIEND Spanish-Caribbean Dialect
LDC96S58CALLFRIEND Spanish-Non-Caribbean Dialect
LDC96S59CALLFRIEND Tamil
LDC96S60CALLFRIEND Vietnamese
LDC97L20CALLHOME American English Lexicon (PRONLEX)
LDC96L17CALLHOME Japanese Lexicon
LDC96S37CALLHOME Japanese Speech
LDC96T18CALLHOME Japanese Transcripts
LDC96L15CALLHOME Mandarin Chinese Lexicon
LDC96S34CALLHOME Mandarin Chinese Speech
LDC96T16CALLHOME Mandarin Chinese Transcripts
LDC96L16CALLHOME Spanish Lexicon
LDC96S35CALLHOME Spanish Speech
LDC96T17CALLHOME Spanish Transcripts
LDC96L14CELEX2
LDC98L21COMLEX English Syntax Lexicon
LDC96T11COMLEX Syntax Text Corpus Version 2.0
LDC93S6ACSR-I (WSJ0) Complete
LDC93S6CCSR-I (WSJ0) Other
LDC93S6BCSR-I (WSJ0) Sennheiser
LDC96S33CSR-IV HUB3
LDC96S31CSR-IV HUB4
LDC96S30CTIMIT
LDC96S38DCIEM/HCRC
LDC95T11European Language Newspaper Text
LDC96S32FFMTIMIT
LDC96S29Frontiers in Speech Processing 93
LDC96S40Frontiers in Speech Processing 94
LDC95T20Hansard French/English
LDC93S12HCRC Map Task Corpus
LDC96S64-1JEIDA/JCSD-Channel 0 City Names
LDC96S64JEIDA/JCSD-Channel 0 Complete
LDC96S64-2JEIDA/JCSD-Channel 0 Control Words
LDC96S64-4JEIDA/JCSD-Channel 0 Four Digit Sequences
LDC96S64-3JEIDA/JCSD-Channel 0 Isolated Digits
LDC96S64-5JEIDA/JCSD-Channel 0 Mono Syllables
LDC96S65-1JEIDA/JCSD-Channel 1 City Names
LDC96S65JEIDA/JCSD-Channel 1 Complete
LDC96S65-2JEIDA/JCSD-Channel 1 Control Words
LDC96S65-4JEIDA/JCSD-Channel 1 Four Digit Sequences
LDC96S65-3JEIDA/JCSD-Channel 1 Isolated Digits
LDC96S65-5JEIDA/JCSD-Channel 1 Mono Syllables
LDC95T13Mandarin Chinese News Text
LDC96T10Message Understanding Conference (MUC) 6 Additional News Text
LDC95T21North American News Text Corpus
LDC93S3AResource Management Complete Set 2.0
LDC93S3BResource Management RM1 2.0
LDC93S3CResource Management RM2 2.0
LDC96S39RM Isolated and Spelled Word Data
LDC95T9Spanish News Text
LDC96S41VAHA (POLYPHONE II)
1995
LDC95S26ATIS3 Test Data
LDC97L20CALLHOME American English Lexicon (PRONLEX)
LDC96L14CELEX2
LDC98L21COMLEX English Syntax Lexicon
LDC95S23CSR-III Speech
LDC95T6CSR-III Text
LDC95T11European Language Newspaper Text
LDC95T20Hansard French/English
LDC95T8Japanese Business News Text
LDC95S22KING Speaker Verification
LDC95S28LATINO-40 Spanish Read News
LDC95T13Mandarin Chinese News Text
LDC95T21North American News Text Corpus
LDC95S27PhoneBook: NYNEX Isolated Words
LDC95T9Spanish News Text
LDC95S25TRAINS Spoken Dialog Corpus
LDC95T7Treebank-2
LDC95S24WSJCAM0 Cambridge Read News
1994
LDC94S14BAir Traffic Control BOS
LDC94S14AAir Traffic Control Complete
LDC94S14CAir Traffic Control DCA
LDC94S14DAir Traffic Control DFW
LDC94S19ATIS3 Training Data
LDC94S20BRAMSHILL
LDC97L20CALLHOME American English Lexicon (PRONLEX)
LDC98L21COMLEX English Syntax Lexicon
LDC94S13ACSR-II (WSJ1) Complete
LDC94S13CCSR-II (WSJ1) Other
LDC94S13BCSR-II (WSJ1) Sennheiser
LDC94T5ECI Multilingual Text
LDC94S21MACROPHONE
LDC94S17OGI Multilanguage Corpus
LDC94S18OGI Spelled and Spoken Word
LDC94S15SPIDRE
LDC94T4AUN Parallel Text (Complete)
LDC94T4B-1UN Parallel Text (English)
LDC94T4B-2UN Parallel Text (French)
LDC94T4B-3UN Parallel Text (Spanish)
LDC94S16YOHO Speaker Verification
1993
LDC93T1ACL/DCI
LDC93S4AATIS0 Complete
LDC93S4BATIS0 Pilot
LDC93S4B-2ATIS0 Read
LDC93S4B-3ATIS0 SD Read
LDC93S5ATIS2
LDC93S6ACSR-I (WSJ0) Complete
LDC93S6CCSR-I (WSJ0) Other
LDC93S6BCSR-I (WSJ0) Sennheiser
LDC93S12HCRC Map Task Corpus
LDC93S2NTIMIT
LDC93S3AResource Management Complete Set 2.0
LDC93S3BResource Management RM1 2.0
LDC93S3CResource Management RM2 2.0
LDC93S11Road Rally
LDC93S8Switchboard Credit Card
LDC97S62Switchboard-1 Release 2
LDC93S9TI 46-Word
LDC93S10TIDIGITS
LDC93S1WTIMIT Acoustic-Phonetic Continuous Speech (MS-WAV version)
LDC93S1TIMIT Acoustic-Phonetic Continuous Speech Corpus
LDC93T3ATIPSTER Complete
LDC93T3BTIPSTER Volume 1
LDC93T3CTIPSTER Volume 2
LDC93T3DTIPSTER Volume 3