LDC (Linguistic Data Consortium) Data Releases by Year

2024

LDC2024T02        AIDA Scenario 1 Practice Topic Annotation

LDC2024T06        AIDA Scenario 2 Practice Topic Annotation

LDC2024T04        AIDA Scenario 2 Practice Topic Source Data

LDC2024T05        Automatic Content Extraction for Portuguese

LDC2024S04        BabyEars Affective Vocalizations

LDC2024S05        Call My Net 1

LDC2024S06        Diaspora Tibetan Speech

LDC2024S01        KASET - Kurmanji and Sorani Kurdish Speech and Transcripts

LDC2024T03        LoReHLT Hausa Representative Language Pack

LDC2024T01        LORELEI Farsi Representative Language Pack

LDC2024S03        RATS Low Speech Density

LDC2024S02        Second Language University Speech Intelligibility Corpus

2023

LDC2023V01        2019 NIST Speaker Recognition Evaluation Test Set -- Audio-Visual

LDC2023S03        2019 NIST Speaker Recognition Evaluation Test Set -- CTS Challenge

LDC2023S06        2019 OpenSAT Public Safety Communications Simulation

LDC2023T10        AIDA Scenario 1 and 2 Reference Knowledge Base

LDC2023T11        AIDA Scenario 1 Practice Topic Source Data

LDC2023S01        AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts

LDC2023S08        CALLFRIEND Russian Speech

LDC2023T09        CALLFRIEND Russian Text

LDC2023T04        DEFT English Light and Rich ERE Annotation

LDC2023S10        Kasdi-Merbah (University) Emotional Database in Arabic Speech

LDC2023S07        LDC Spoken Language Sampler - Sixth Release

LDC2023T07        LORELEI Indonesian Representative Language Pack

LDC2023T01        LORELEI Swahili Representative Language Pack

LDC2023T02        LORELEI Tagalog Representative Language Pack

LDC2023T03        LORELEI Tamil Representative Language Pack

LDC2023T08        LORELEI Thai Representative Language Pack

LDC2023T06        LORELEI Zulu Representative Language Pack

LDC2023S02        Mixer 3 Speech

LDC2023S04        Mixer 7 Spanish Speech

LDC2023L01        Moroccan Arabic - English Lexical Database

LDC2023T05        Penn Korean Universal Dependency Treebank

LDC2023S09        REMIX Telephone Collection

LDC2023S05        Samrómur Queries Icelandic Speech 1.0

LDC2023T13        TAC KBP Belief and Sentiment - Comprehensive Training and Evaluation Data 2016-2017

2022

LDC2022S10        2017 NIST Language Recognition Evaluation Training and Development Sets

LDC2022S01        2017 NIST OpenSAT Pilot - SSSF

LDC2022T02        AttImam

LDC2022T06        BOLT English Translation Treebank - Egyptian Arabic SMS/Chat

LDC2022T07        CAMIO Transcription Languages

LDC2022S13        Global TIMIT Thai

LDC2022V01        HAVIC MED Novel 1 Test -- Videos, Metadata and Annotation

LDC2022V02        HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation

LDC2022T05        LORELEI Bengali Representative Language Pack

LDC2022T01        LORELEI Kinyarwanda Incident Language Pack

LDC2022T03        LORELEI Wolof Representative Language Pack

LDC2022S08        MASRI Synthetic

LDC2022S04        NUBUC

LDC2022T04        Qatari Corpus of Argumentative Writing

LDC2022L01        Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon

LDC2022S11        Samrómur Children Icelandic Speech 1.0

LDC2022S05        Samrómur Icelandic Speech 1.0

LDC2022S06        Second DIHARD Challenge Evaluation - Eleven Sources

LDC2022S07        Second DIHARD Challenge Evaluation - SEEDLingS

LDC2022S03        Spoken Digits in Hindi and Indian English

LDC2022S02        The Child Subglottal Resonances Database

LDC2022S12        Third DIHARD Challenge Development

LDC2022S14        Third DIHARD Challenge Evaluation

LDC2022S09        Xi'an Guanzhong Object Naming

2021

LDC2021S01        Althingi Parliamentary Speech

LDC2021T04        ATIS - Seven Languages

LDC2021T07        BOLT Chinese Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

LDC2021T11        BOLT Chinese SMS/Chat Parallel Training Data

LDC2021T14        BOLT Egyptian Arabic Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

LDC2021T18        BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

LDC2021T15        BOLT Egyptian Arabic SMS/Chat Parallel Training Data

LDC2021T12        BOLT Egyptian Arabic Treebank - Conversational Telephone Speech

LDC2021T17        BOLT Egyptian Arabic Treebank - SMS/Chat

LDC2021T19        BOLT English Translation Treebank - Chinese SMS/Chat

LDC2021T03        BOLT English Treebank - SMS/Chat

LDC2021T13        Chinese Abstract Meaning Representation 2.0

LDC2021L01        Classical Arabic Dictionary

LDC2021S02        Columbia Games Corpus

LDC2021T16        DiscAlign for Penn and RST Discourse Treebanks

LDC2021T10        ESPADA

LDC2021S06        Ethnobotanical Research and Language Documentation of Nahuatl

LDC2021S03        Global TIMIT Mandarin Chinese

LDC2021V01        HAVIC MED Training Data -- Videos, Metadata and Annotation

LDC2021T02        LORELEI Akan Representative Language Pack

LDC2021S05        MyST Children's Conversational Speech

LDC2021T05        Penn Discourse Treebank Version 2.0 - German Translation

LDC2021S08        RATS Speaker Identification

LDC2021S10        Second DIHARD Challenge Development - Eleven Sources

LDC2021S11        Second DIHARD Challenge Development - SEEDLingS

LDC2021T08        TAC KBP English Sentiment Slot Filling -- Comprehensive Training and Evaluation Data 2013-2014

LDC2021T06        TAC KBP English Surprise Slot Filling -- Comprehensive Training and Evaluation Data 2010

LDC2021S04        The SSNCE Database of Tamil Dysarthric Speech

LDC2021S09        UCLA Speaker Variability Database

LDC2021S07        Wikipedia Spanish Speech and Transcripts

LDC2021T09        X-SRL: Parallel Cross-lingual Semantic Role Labeling

2020

LDC2020S04        2018 NIST Speaker Recognition Evaluation Test Set

LDC2020T02        Abstract Meaning Representation (AMR) Annotation Release 3.0

LDC2020T07        Abstract Meaning Representation 2.0 - Four Translations

LDC2020T15        BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training

LDC2020T05        BOLT Egyptian Arabic-English Word Alignment -- Conversational Telephone Speech Training

LDC2020T20        BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

LDC2020T21        BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

LDC2020T09        BOLT English Translation Treebank - Chinese Discussion Forum

LDC2020S08        CALLFRIEND American English-Southern Dialect Second Edition

LDC2020S06        CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition

LDC2020T01        Chinese CogBank

LDC2020L02        Chinese Lexical Resources for Gender, Number, Animacy

LDC2020T23        Corpus of Law, Academic, and News

LDC2020L01        Database of Word Level Statistics - Mandarin

LDC2020T19        DEFT Chinese Light and Rich ERE Annotation

LDC2020T06        EVALution

LDC2020S11        Global TIMIT Learner Simple English

LDC2020S09        Global TIMIT Learner Treebank English

LDC2020S12        Global TIMIT Mandarin Chinese-Guanzhong Dialect

LDC2020S02        IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b

LDC2020S07        IARPA Babel Javanese Language Pack IARPA-babel402b-v1.0b

LDC2020S10        IARPA Babel Mongolian Language Pack IARPA-babel401b-v2.0b

LDC2020S01        LibriVox Spanish

LDC2020T10        LORELEI Entity Detection and Linking Knowledge Base

LDC2020T11        LORELEI Oromo Incident Language Pack

LDC2020T22        LORELEI Tigrinya Incident Language Pack

LDC2020T24        LORELEI Ukrainian Representative Language Pack

LDC2020T17        LORELEI Vietnamese Representative Language Pack

LDC2020T04        Machine Reading Phase 1 IC Training Data

LDC2020S03        Mixer 4 and 5 Speech

LDC2020S05        Multi-Language Conversational Telephone Speech 2011 -- Mandarin Chinese

LDC2020T16        Penn Parsed Corpora of Historical English

LDC2020S13        Phonemes of Arabic

LDC2020T12        SemTransCNC

LDC2020T14        Speech Sentiment Annotations

LDC2020T03        TAC KBP English Event Argument - Training and Evaluation Data 2014-2015

LDC2020T13        TAC KBP English Event Nugget Detection and Coreference - Comprehensive Training and Evaluation Data 2014-2015

LDC2020T08        TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013

LDC2020T18        TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017

2019

LDC2019S20        2016 NIST Speaker Recognition Evaluation Test Set

LDC2019T01        BOLT Arabic Discussion Forum Parallel Training Data

LDC2019T13        BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training

LDC2019T18        BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training

LDC2019T06        BOLT Egyptian-English Word Alignment -- Discussion Forum Training

LDC2019T15        BOLT English Treebank - Discussion Forum

LDC2019S21        CALLFRIEND American English-Non-Southern Dialect Second Edition

LDC2019S18        CALLFRIEND Canadian French Second Edition

LDC2019S04        CALLFRIEND Egyptian Arabic Second Edition

LDC2019T07        Chinese Abstract Meaning Representation 1.0

LDC2019S07        CIEMPIESS Experimentation

LDC2019T11        Corpus of Conversational Persian Transcripts

LDC2019T03        DEFT Chinese Committed Belief Annotation

LDC2019T16        DEFT English Committed Belief Annotation

LDC2019T09        DEFT Spanish Committed Belief Annotation

LDC2019S09        First DIHARD Challenge Development - Eight Sources

LDC2019S10        First DIHARD Challenge Development - SEEDLingS

LDC2019S12        First DIHARD Challenge Evaluation - Nine Sources

LDC2019S13        First DIHARD Challenge Evaluation - SEEDLingS

LDC2019V01        HAVIC MED Progress Test -- Videos, Metadata and Annotation

LDC2019S22        IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b

LDC2019S08        IARPA Babel Guarani Language Pack IARPA-babel305b-v1.0c

LDC2019S16        IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c

LDC2019S03        IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b

LDC2019S17        LDC Spoken Language Sampler - Fifth Release

LDC2019T14        Machine Reading Phase 1 NFL Scoring Training Data

LDC2019S23        Magic Data Chinese Mandarin Conversational Speech

LDC2019S02        Multi-Language Conversational Telephone Speech 2011 -- Arabic Group

LDC2019S15        Multi-Language Conversational Telephone Speech 2011 -- East Asian

LDC2019S06        Multi-Language Conversational Telephone Speech 2011 -- English Group

LDC2019T04        Multilingual ATIS

LDC2019T05        Penn Discourse Treebank Version 3.0

LDC2019T10        Phrase Detectives Corpus Version 2

LDC2019S19        Polish Speech Database

LDC2019S01        SRI Speech-Based Collaborative Learning Corpus

LDC2019T08        TAC KBP Chinese Regular Slot Filling - Comprehensive Training and Evaluation Data 2014

LDC2019T17        TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017

LDC2019T19        TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017

LDC2019T02        TAC KBP Entity Discovery and Linking - Comprehensive Training and Evaluation Data 2014-2015

LDC2019T12        TAC KBP Evaluation Source Corpora 2016-2017

LDC2019S14        The DKU-JNU-EMA Electromagnetic Articulography Database

LDC2019S11        USC-SFI MALACH Interviews and Transcripts English – Speech Recognition Edition

LDC2019S05        VAST Chinese Speech and Transcripts

2018

LDC2018T08        2007 CoNLL Shared Task - Arabic & English

LDC2018T06        2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish

LDC2018T07        2007 CoNLL Shared Task - Greek, Hungarian & Italian

LDC2018S06        2011 NIST Language Recognition Evaluation Test Set

LDC2018S14        AISHELL-1

LDC2018S15        Avatar Education Portuguese

LDC2018T10        BOLT Arabic Discussion Forums

LDC2018T15        BOLT Chinese SMS/Chat

LDC2018T23        BOLT Egyptian Arabic Treebank - Discussion Forum

LDC2018T19        BOLT English SMS/Chat

LDC2018T18        BOLT Information Retrieval Comprehensive Training and Evaluation

LDC2018S09        CALLFRIEND Mandarin Chinese-Mainland Dialect Second Edition

LDC2018S11        CIEMPIESS Balance

LDC2018T20        Concretely Annotated English Gigaword

LDC2018T01        DEFT Spanish Treebank

LDC2018S01        DIRHA English WSJ Audio

LDC2018S05        GALE Phase 4 Arabic Broadcast News Speech

LDC2018T14        GALE Phase 4 Arabic Broadcast News Transcripts

LDC2018T05        H2, E2, ERK1 Children's Writing

LDC2018V01        HAVIC MED Event E051-E060 -- Videos, Metadata and Annotation

LDC2018S18        HUB5 Mandarin Telephone Speech and Transcripts Second Edition

LDC2018S07        IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b

LDC2018S13        IARPA Babel Kazakh Language Pack IARPA-babel302b-v1.0a

LDC2018S16        IARPA Babel Telugu Language Pack IARPA-babel303b-v1.0a

LDC2018S02        IARPA Babel Tok Pisin Language Pack IARPA-babel207b-v1.0e

LDC2018T04        LORELEI Amharic Representative Language Pack - Monolingual and Parallel Text

LDC2018T11        LORELEI Somali Representative Language Pack - Monolingual and Parallel Text

LDC2018S03        Multi-Language Conversational Telephone Speech 2011 -- Central Asian

LDC2018S08        Multi-Language Conversational Telephone Speech 2011 -- Central European

LDC2018S12        Multi-Language Conversational Telephone Speech 2011 -- Spanish

LDC2018S17        Nautilus Speaker Characterization

LDC2018S10        RATS Language Identification

LDC2018S04        Rhythm and Pitch

LDC2018T09        SPADE

LDC2018T03        TAC KBP Comprehensive English Source Corpora 2009-2014

LDC2018T16        TAC KBP English Entity Linking - Comprehensive Training and Evaluation Data 2009-2013

LDC2018T22        TAC KBP English Regular Slot Filling - Comprehensive Training and Evaluation Data 2009-2014

LDC2018T24        TAC Relation Extraction Dataset

LDC2018T13        TRAD Arabic-French Parallel Text -- Newsgroup

LDC2018T21        TRAD Arabic-French Parallel Text -- Newswire

LDC2018T02        TRAD Chinese-French Parallel Text -- Blog

LDC2018T17        TRAD Chinese-French Parallel Text -- Broadcast News

2017

LDC2017S06        2010 NIST Speaker Recognition Evaluation Test Set

LDC2017T13        2015-2016 CoNLL Shared Task

LDC2017T10        Abstract Meaning Representation (AMR) Annotation Release 2.0

LDC2017T14        Ancient Chinese Corpus

LDC2017L01        Arabic Speech Recognition Pronunciation Dictionary

LDC2017S21        ASpIRE Development and Development Test Sets

LDC2017T05        BOLT Chinese Discussion Forum Parallel Training Data

LDC2017T07        BOLT Egyptian Arabic SMS/Chat and Transliteration

LDC2017T11        BOLT English Discussion Forums

LDC2017S07        CHiME2 Grid

LDC2017S10        CHiME2 WSJ0

LDC2017S24        CHiME3

LDC2017S23        CIEMPIESS Light

LDC2017T15        English Web Treebank Propbank

LDC2017T03        First-Year Law Students' Court Memoranda

LDC2017T06        GALE English-Chinese Parallel Aligned Treebank -- Training

LDC2017T02        GALE Phase 3 and 4 Chinese Web Parallel Text

LDC2017S02        GALE Phase 3 Arabic Broadcast News Speech Part 2

LDC2017T04        GALE Phase 3 Arabic Broadcast News Transcripts Part 2

LDC2017S15        GALE Phase 4 Arabic Broadcast Conversation Speech

LDC2017T12        GALE Phase 4 Arabic Broadcast Conversation Transcripts

LDC2017S25        GALE Phase 4 Chinese Broadcast News Speech

LDC2017T18        GALE Phase 4 Chinese Broadcast News Transcripts

LDC2017S03        IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b

LDC2017S22        IARPA Babel Kurmanji Kurdish Language Pack IARPA-babel205b-v1.0a

LDC2017S08        IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a

LDC2017S05        IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d

LDC2017S13        IARPA Babel Tamil Language Pack IARPA-babel204b-v1.1b

LDC2017S01        IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7

LDC2017S19        IARPA Babel Zulu Language Pack IARPA-babel206b-v0.1e

LDC2017S12        KSUEmotions

LDC2017S16        LDC Spoken Language Sampler - Fourth Release

LDC2017S11        Metalogue Multi-Issue Bargaining Dialogue

LDC2017S14        Multi-Language Conversational Telephone Speech 2011 -- South Asian

LDC2017S09        Multi-Language Conversational Telephone Speech 2011 -- Turkish

LDC2017T01        MWE-Aware English Dependency Corpus

LDC2017T16        MWE-Aware English Dependency Corpus 2.0

LDC2017S04        Noisy TIMIT Speech

LDC2017T08        Phrase Detectives Corpus

LDC2017S20        RATS Keyword Spotting

LDC2017S18        SRI-FRTIV

LDC2017T17        TAC KBP Chinese Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2011-2014

LDC2017T09        The EventStatus Corpus

LDC2017V01        UCLA High-Speed Laryngeal Video and Audio

LDC2017S17        Vehicle City Voices Corpus – Part I

2016

LDC2016T02        Arabic Treebank - Weblog

LDC2016T18        ARL Arabic Dependency Treebank

LDC2016L01        Bamanankan Lexicon

LDC2016T05        BOLT Chinese Discussion Forums

LDC2016T19        BOLT Chinese-English Word Alignment and Tagging -- Discussion Forum Training

LDC2016T13        Chinese Treebank 9.0

LDC2016T22        Chinese-English Parallel Sentences Extracted from Patents

LDC2016S04        CHM150

LDC2016T07        DEFT Narrative Text

LDC2016S05        Digital Archive of Southern Speech - NLP Version

LDC2016T16        English Speed Networking Conversational Transcripts

LDC2016T08        GALE Phase 3 and 4 Arabic Web Parallel Text

LDC2016T09        GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text

LDC2016T15        GALE Phase 3 and 4 Chinese Broadcast News Parallel Text

LDC2016T25        GALE Phase 3 and 4 Chinese Newswire Parallel Text

LDC2016S01        GALE Phase 3 Arabic Broadcast Conversation Speech Part 2

LDC2016T06        GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 2

LDC2016S07        GALE Phase 3 Arabic Broadcast News Speech Part 1

LDC2016T17        GALE Phase 3 Arabic Broadcast News Transcripts Part 1

LDC2016T11        GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences

LDC2016T20        GALE Phase 4 Arabic Broadcast News Parallel Sentences

LDC2016T27        GALE Phase 4 Arabic Newswire Parallel Sentences

LDC2016T14        GALE Phase 4 Arabic Weblog Parallel Sentences

LDC2016S03        GALE Phase 4 Chinese Broadcast Conversation Speech

LDC2016T12        GALE Phase 4 Chinese Broadcast Conversation Transcripts

LDC2016T04        GALE Phase 4 Chinese Weblog Parallel Sentences

LDC2016T01        H1 Children's Writing

LDC2016V01        HAVIC Pilot Transcription

LDC2016S06        IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a

LDC2016S08        IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b

LDC2016S02        IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c

LDC2016S12        IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a

LDC2016S09        IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY

LDC2016S13        IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g

LDC2016S10        IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5

LDC2016T24        JANA: A Human-Human Dialogues Corpus for Egyptian Dialect

LDC2016T21        KAFD: Arabic Font Database

LDC2016S11        Multi-Language Conversational Telephone Speech 2011 -- Slavic Group

LDC2016T03        NewSoMe Corpus of Opinion in Blogs

LDC2016T23        Richer Event Description

LDC2016T10        SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing

LDC2016T26        TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014

2015

LDC2015T12        2006 CoNLL Shared Task - Arabic & Czech

LDC2015T11        2006 CoNLL Shared Task - Ten Languages

LDC2015T20        ACE 2007 Spanish DevTest - Pilot Evaluation

LDC2015S10        Arabic Learner Corpus

LDC2015S12        Articulation Index LSCP

LDC2015T03        Avocado Research Email Collection

LDC2015S07        CIEMPIESS

LDC2015T08        Coordination Annotation for the Penn Treebank

LDC2015T13        English News Text Treebank: Penn Treebank Revised

LDC2015T06        GALE Chinese-English Parallel Aligned Treebank -- Training

LDC2015T04        GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3

LDC2015T18        GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 4

LDC2015S01        GALE Phase 2 Arabic Broadcast News Speech Part 2

LDC2015T01        GALE Phase 2 Arabic Broadcast News Transcripts Part 2

LDC2015T05        GALE Phase 3 and 4 Arabic Broadcast Conversation Parallel Text

LDC2015T07        GALE Phase 3 and 4 Arabic Broadcast News Parallel Text

LDC2015T19        GALE Phase 3 and 4 Arabic Newswire Parallel Text

LDC2015S11        GALE Phase 3 Arabic Broadcast Conversation Speech Part 1

LDC2015T16        GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 1

LDC2015S06        GALE Phase 3 Chinese Broadcast Conversation Speech Part 2

LDC2015T09        GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 2

LDC2015S13        GALE Phase 3 Chinese Broadcast News Speech

LDC2015T25        GALE Phase 3 Chinese Broadcast News Transcripts

LDC2015T14        GALE Phase 4 Chinese Broadcast Conversation Parallel Sentences

LDC2015T21        GALE Phase 4 Chinese Broadcast News Parallel Sentences

LDC2015T24        GALE Phase 4 Chinese Newswire Parallel Sentences

LDC2015T22        Karlsruhe Children's Text

LDC2015T23        KHATT: Handwritten Arabic Text

LDC2015S09        LDC Spoken Language Sampler - Third Release

LDC2015S05        Mandarin Chinese Phonetic Segmentation and Tone

LDC2015S04        Mandarin-English Code-Switching in South-East Asia

LDC2015T17        NewSoMe Corpus of Opinion in News Reports

LDC2015S02        RATS Speech Activity Detection

LDC2015T10        RST Signalling Corpus

LDC2015T02        SenSem Databank

LDC2015L01        SenSem Lexicons

LDC2015S03        The Subglottal Resonances Database

LDC2015S08        The Walking Around Corpus

LDC2015T15        TS Wikipedia

2014

LDC2014S06        2009 NIST Language Recognition Evaluation Test Set

LDC2014T12        Abstract Meaning Representation (AMR) Annotation Release 1.0

LDC2014T18        ACE 2007 Multilingual Training Corpus

LDC2014T24        Boulder Lies and Truth

LDC2014S01        CALLFRIEND Farsi Second Edition Speech

LDC2014T01        CALLFRIEND Farsi Second Edition Transcripts

LDC2014T21        Chinese Discourse Treebank 0.5

LDC2014T07        Domain-Specific Hyponym Relations

LDC2014T06        ETS Corpus of Non-Native Written English

LDC2014T23        Fisher and CALLHOME Spanish--English Speech Translation

LDC2014T03        GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 2

LDC2014T08        GALE Arabic-English Parallel Aligned Treebank -- Web Training

LDC2014T19        GALE Arabic-English Word Alignment -- Broadcast Training Part 1

LDC2014T22        GALE Arabic-English Word Alignment -- Broadcast Training Part 2

LDC2014T05        GALE Arabic-English Word Alignment Training Part 1 -- Newswire and Web

LDC2014T10        GALE Arabic-English Word Alignment Training Part 2 -- Newswire

LDC2014T14        GALE Arabic-English Word Alignment Training Part 3 -- Web

LDC2014T25        GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 2

LDC2014S07        GALE Phase 2 Arabic Broadcast News Speech Part 1

LDC2014T17        GALE Phase 2 Arabic Broadcast News Transcripts Part 1

LDC2014T04        GALE Phase 2 Chinese Broadcast News Parallel Text Part 1

LDC2014T11        GALE Phase 2 Chinese Broadcast News Parallel Text Part 2

LDC2014T15        GALE Phase 2 Chinese Newswire Parallel Text Part 1

LDC2014T20        GALE Phase 2 Chinese Newswire Parallel Text Part 2

LDC2014T26        GALE Phase 2 Chinese Web Parallel Text

LDC2014S09        GALE Phase 3 Chinese Broadcast Conversation Speech Part 1

LDC2014T28        GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 1

LDC2014S05        Hispanic-English Database

LDC2014T09        HyTER Networks of Selected OpenMT08/09 Sentences

LDC2014S02        King Saud University Arabic Speech Database

LDC2014T13        MADCAT Chinese Pilot Training Set

LDC2014S03        Multi-Channel WSJ Audio

LDC2014T02        NIST 2012 Open Machine Translation (OpenMT) Progress Test Five Language Source

LDC2014T16        TAC KBP Reference Knowledge Base

LDC2014S08        United Nations Proceedings Speech

LDC2014S04        USC-SFI MALACH Interviews and Transcripts Czech

2013

LDC2013T06        1993-2007 United Nations Parallel Text

LDC2013T13        Chinese Proposition Bank 3.0

LDC2013T21        Chinese Treebank 8.0

LDC2013T02        Chinese-English Biology and Chemistry Abstract Parallel Text

LDC2013S09        CSC Deceptive Speech

LDC2013T14        GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 1

LDC2013T10        GALE Arabic-English Parallel Aligned Treebank -- Newswire

LDC2013T23        GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 1

LDC2013T05        GALE Chinese-English Word Alignment and Tagging Training Part 4 -- Web

LDC2013S02        GALE Phase 2 Arabic Broadcast Conversation Speech Part 1

LDC2013S07        GALE Phase 2 Arabic Broadcast Conversation Speech Part 2

LDC2013T04        GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1

LDC2013T17        GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2

LDC2013T01        GALE Phase 2 Arabic Web Parallel Text

LDC2013T11        GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 1

LDC2013T16        GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 2

LDC2013S04        GALE Phase 2 Chinese Broadcast Conversation Speech

LDC2013T08        GALE Phase 2 Chinese Broadcast Conversation Transcripts

LDC2013S08        GALE Phase 2 Chinese Broadcast News Speech

LDC2013T20        GALE Phase 2 Chinese Broadcast News Transcripts

LDC2013S05        Greybeard

LDC2013S06        LDC Spoken Language Sampler - Second Release

LDC2013T09        MADCAT Phase 2 Training Set

LDC2013T15        MADCAT Phase 3 Training Set

LDC2013L01        Maninkakan Lexicon

LDC2013T12        Manually Annotated Sub-Corpus Third Release

LDC2013S03        Mixer 6 Speech

LDC2013T07        NIST 2008-2012 Open Machine Translation (OpenMT) Progress Test Sets

LDC2013T03        NIST 2012 Open Machine Translation (OpenMT) Evaluation

LDC2013T19        OntoNotes Release 5.0

LDC2013T18        Semantic Textual Similarity (STS) 2013 Machine Translation

LDC2013T22        The ARRAU Corpus of Anaphoric Information

2012

LDC2012V01        2005 NIST/USF Evaluation Resources for the VACE Program - Broadcast News

LDC2012S01        2006 NIST Speaker Recognition Evaluation Test Set Part 2

LDC2012T03        2009 CoNLL Shared Task Part 1

LDC2012T04        2009 CoNLL Shared Task Part 2

LDC2012T11        American English Nickname Collection

LDC2012T21        Annotated English Gigaword

LDC2012T07        Arabic Treebank - Broadcast News v1.0

LDC2012T09        Arabic-Dialect/English Parallel Text

LDC2012T10        Catalan TimeBank 1.0

LDC2012T05        Chinese Dependency Treebank 1.0

LDC2012T22        Chinese-English Semiconductor Parallel Text

LDC2012S03        Digital Archive of Southern Speech

LDC2012T02        English Translation Treebank: An-Nahar Newswire

LDC2012T13        English Web Treebank

LDC2012T16        GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web

LDC2012T20        GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire

LDC2012T24        GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web

LDC2012T06        GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 1

LDC2012T14        GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 2

LDC2012T18        GALE Phase 2 Arabic Broadcast News Parallel Text

LDC2012T17        GALE Phase 2 Arabic Newswire Parallel Text

LDC2012T15        MADCAT Phase 1 Training Set

LDC2012S04        Malto Speech and Transcripts

LDC2012T01        ModeS TimeBank 1.0

LDC2012T08        Prague Czech-English Dependency Treebank 2.0

LDC2012T23        Russian-English Computer Security Parallel Text

LDC2012T12        Spanish TimeBank 1.0

LDC2012S02        TORGO Database of Dysarthric Articulation

LDC2012S06        Turkish Broadcast News Speech and Transcripts

LDC2012S05        USC-SFI MALACH Interviews and Transcripts English

2011

LDC2011S04        2005 NIST Speaker Recognition Evaluation Test Data

LDC2011S01        2005 NIST Speaker Recognition Evaluation Training Data

LDC2011S06        2005 Spring NIST Rich Transcription (RT-05S) Evaluation Set

LDC2011S10        2006 NIST Speaker Recognition Evaluation Test Set Part 1

LDC2011S09        2006 NIST Speaker Recognition Evaluation Training Set

LDC2011S02        2006 NIST Spoken Term Detection Development Set

LDC2011S03        2006 NIST Spoken Term Detection Evaluation Set

LDC2011V05        2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1

LDC2011V06        2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2

LDC2011S11        2008 NIST Speaker Recognition Evaluation Supplemental Set

LDC2011S08        2008 NIST Speaker Recognition Evaluation Test Set

LDC2011S05        2008 NIST Speaker Recognition Evaluation Training Set Part 1

LDC2011S07        2008 NIST Speaker Recognition Evaluation Training Set Part 2

LDC2011T05        2008/2010 NIST Metrics for Machine Translation (MetricsMaTr) GALE Evaluation Set

LDC2011T02        ACE 2005 English SpatialML Annotations Version 2

LDC2011T11        Arabic Gigaword Fifth Edition

LDC2011T09        Arabic Treebank: Part 2 v 3.1

LDC2011T06        Broadcast News Lattices

LDC2011T13        Chinese Gigaword Fifth Edition

LDC2011T08        Datasets for Generic Relation Extraction (reACE)

LDC2011T07        English Gigaword Fifth Edition

LDC2011T10        French Gigaword Third Edition

LDC2011T04        Indian Language Part-of-Speech Tagset: Sanskrit

LDC2011V03        NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1

LDC2011V04        NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2

LDC2011V01        NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 1

LDC2011V02        NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 2

LDC2011T03        OntoNotes Release 4.0

LDC2011T01        SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in Multiple Languages

LDC2011T12        Spanish Gigaword Third Edition

2010

LDC2010S03        2003 NIST Speaker Recognition Evaluation

LDC2010T09        ACE 2005 Mandarin SpatialML Annotations

LDC2010T18        ACE Time Normalization (TERN) 2004 English Evaluation Data V1.0

LDC2010T13        Arabic Treebank: Part 1 v 4.1

LDC2010T08        Arabic Treebank: Part 3 v 3.2

LDC2010S05        Asian Elephant Vocalizations

LDC2010S07        Asian Spoken Language Sampler

LDC2010T07        Chinese Treebank 7.0

LDC2010T06        Chinese Web 5-gram Version 1

LDC2010T02        Czech Broadcast News MDE Transcripts

LDC2010T04        Fisher Spanish - Transcripts

LDC2010S01        Fisher Spanish Speech

LDC2010T03        GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2

LDC2010T16        Indian Language Part-of-Speech Tagset: Bengali

LDC2010T24        Indian Language Part-of-Speech Tagset: Hindi

LDC2010T19        Korean Newswire Second Edition

LDC2010L01        LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1

LDC2010T22        Manually Annotated Sub-Corpus First Release

LDC2010T15        Message Understanding Conference 7 Timed (MUC7_T)

LDC2010T10        NIST 2002 Open Machine Translation (OpenMT) Evaluation

LDC2010T11        NIST 2003 Open Machine Translation (OpenMT) Evaluation

LDC2010T12        NIST 2004 Open Machine Translation (OpenMT) Evaluation

LDC2010T14        NIST 2005 Open Machine Translation (OpenMT) Evaluation

LDC2010T17        NIST 2006 Open Machine Translation (OpenMT) Evaluation

LDC2010T21        NIST 2008 Open Machine Translation (OpenMT) Evaluation

LDC2010T23        NIST 2009 Open Machine Translation (OpenMT) Evaluation

LDC2010T01        NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations

LDC2010T05        NPS Internet Chatroom Conversations, Release 1.0

LDC2010V01        TRECVID 2004 Keyframes & Transcripts

LDC2010V02        TRECVID 2006 Keyframes

LDC2010S02        WTIMIT 1.0

2009

LDC2009S05        2007 NIST Language Recognition Evaluation Supplemental Training Set

LDC2009S04        2007 NIST Language Recognition Evaluation Test Set

LDC2009T12        2008 CoNLL Shared Task Data

LDC2009T05        2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data

LDC2009T29        ACL Anthology Reference Corpus

LDC2009L01        An English Dictionary of the Tamil Verb Second Edition

LDC2009T30        Arabic Gigaword Fourth Edition

LDC2009T22        Arabic Newswire English Translation Collection

LDC2009V01        Audiovisual Database of Spoken American English

LDC2009T04        BioProp Version 1.0

LDC2009T27        Chinese Gigaword Fourth Edition

LDC2009S01        CSLU: Numbers Version 1.3

LDC2009S03        CSLU: S4X Release 1.2

LDC2009T20        Czech Broadcast Conversation MDE Transcripts

LDC2009S02        Czech Broadcast Conversation Speech

LDC2009T01        English CTS Treebank with Structural Metadata

LDC2009T13        English Gigaword Fourth Edition

LDC2009T23        FactBank 1.0

LDC2009T28        French Gigaword Second Edition

LDC2009T03        GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1

LDC2009T09        GALE Phase 1 Arabic Newsgroup Parallel Text - Part 2

LDC2009T02        GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1

LDC2009T06        GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2

LDC2009T15        GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1

LDC2009T08        Japanese Web N-gram Version 1

LDC2009T10        Language Understanding Annotation Corpus

LDC2009T26        NXT Switchboard Annotations

LDC2009T24        OntoNotes Release 3.0

LDC2009T11        REFLEX Entity Translation Training/DevTest

LDC2009T21        Spanish Gigaword Second Edition

LDC2009T14        Tagged Chinese Gigaword Version 2.0

LDC2009T07        Unified Linguistic Annotation Text Collection

LDC2009T25        Web 1T 5-gram, 10 European Languages Version 1

2008

LDC2008S05        2005 NIST Language Recognition Evaluation

LDC2008T03        ACE 2005 English SpatialML Annotations

LDC2008L01        An English Dictionary of the Tamil Verb

LDC2008T25        AQUAINT-2 Information-Retrieval Text Research Collection

LDC2008T13        BLLIP North American News Text, Complete

LDC2008T14        BLLIP North American News Text, General Release

LDC2008T17        CALLHOME Mandarin Chinese Transcripts - XML version

LDC2008S09        CHAracterizing INdividual Speakers (CHAINS)

LDC2008T07        Chinese Proposition Bank 2.0

LDC2008T24        COMNOM v 1.0

LDC2008S06        CSLU: Alphadigit Version 1.3

LDC2008S07        CSLU: ISOLET Spoken Letter Database Version 1.3

LDC2008S02        CSLU: National Cellular Telephone Speech Release 2.3

LDC2008S01        CSLU: Portland Cellular Telephone Speech Version 1.3

LDC2008T22        Czech Academic Corpus 2.0

LDC2008T02        GALE Phase 1 Arabic Blog Parallel Text

LDC2008T09        GALE Phase 1 Arabic Broadcast News Parallel Text - Part 2

LDC2008T06        GALE Phase 1 Chinese Blog Parallel Text

LDC2008T08        GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2

LDC2008T18        GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3

LDC2008L03        Global Yoruba Lexical Database v. 1.0

LDC2008L02        Hindi WordNet

LDC2008T01        Hungarian-English Parallel Text, Version 1.0

LDC2008S08        LDC Spoken Language Sampler

LDC2008T23        NomBank v 1.0

LDC2008T15        North American News Text, Complete

LDC2008T16        North American News Text, General Release

LDC2008T04        OntoNotes Release 2.0

LDC2008T05        Penn Discourse Treebank Version 2.0

LDC2008T20        PennBioIE CYP 1.0

LDC2008T21        PennBioIE Oncology 1.0

LDC2008S03        STC-TIMIT 1.0

LDC2008S04        West Point Brazilian Portuguese Speech

2007

LDC2007T22        2001 Topic Annotated Enron Email Data Set

LDC2007S10        2003 NIST Rich Transcription Evaluation Data

LDC2007S12        2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data

LDC2007S11        2004 Spring NIST Rich Transcription (RT-04S) Development Data

LDC2007T40        Arabic Gigaword Third Edition

LDC2007S03        ARL Urdu Speech Database, Training Data

LDC2007T38        Chinese Gigaword Third Edition

LDC2007T36        Chinese Treebank 6.0

LDC2007S08        CSLU: Foreign Accented English Release 1.2

LDC2007S18        CSLU: Kids' Speech Version 1.1

LDC2007S13        CSLU: Apple Words and Phrases

LDC2007S05        CSLU: Yes/No Version 1.2

LDC2007T02        English Chinese Translation Treebank v 1.0

LDC2007T07        English Gigaword Third Edition

LDC2007S02        Fisher Levantine Arabic Conversational Telephone Speech

LDC2007T04        Fisher Levantine Arabic Conversational Telephone Speech, Transcripts

LDC2007T24        GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1

LDC2007T23        GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1

LDC2007T20        GALE Phase 1 Distillation Training

LDC2007T08        ISI Arabic-English Automatically Extracted Parallel Text

LDC2007T09        ISI Chinese-English Automatically Extracted Parallel Text

LDC2007S01        Levantine Arabic Conversational Telephone Speech

LDC2007T01        Levantine Arabic Conversational Telephone Speech, Transcripts

LDC2007S09        Mandarin Affective Speech

LDC2007T19        MITRE 1997 Mandarin Broadcast News Speech Translations (HUB-4NE)

LDC2007S15        Nationwide Speech Project

LDC2007T21        OntoNotes Release 1.0

LDC2007T03        Tagged Chinese Gigaword

LDC2007V02        TRECVID 2003 Keyframes & Transcripts

LDC2007V01        TRECVID 2005 Keyframes & Transcripts

2006

LDC2006S31        2003 NIST Language Recognition Evaluation

LDC2006S44        2004 NIST Speaker Recognition Evaluation

LDC2006T06        ACE 2005 Multilingual Training Corpus

LDC2006S46        Arabic Broadcast News Speech

LDC2006T20        Arabic Broadcast News Transcripts

LDC2006T02        Arabic Gigaword Second Edition

LDC2006S15        CSLU: Spelled and Spoken Words

LDC2006S14        CSLU: Stories v 1.2

LDC2006S35        CSLU: Multilanguage Telephone Speech Version 1.2

LDC2006S39        CSLU: Names Release 1.3

LDC2006S26        CSLU: Speaker Recognition Version 1.1

LDC2006S16        CSLU: Spoltech Brazilian Portuguese Version 1.0

LDC2006S01        CSLU: Voices

LDC2006T10        English-Arabic Treebank v 1.0

LDC2006T17        French Gigaword First Edition

LDC2006S43        Gulf Arabic Conversational Telephone Speech

LDC2006T15        Gulf Arabic Conversational Telephone Speech, Transcripts

LDC2006S45        Iraqi Arabic Conversational Telephone Speech

LDC2006T16        Iraqi Arabic Conversational Telephone Speech, Transcripts

LDC2006S42        Korean Broadcast News Speech

LDC2006T14        Korean Broadcast News Transcripts

LDC2006T03        Korean Propbank

LDC2006T09        Korean Treebank Annotations Version 2.0

LDC2006S29        Levantine Arabic QT Training Data Set 5, Speech

LDC2006T07        Levantine Arabic QT Training Data Set 5, Transcripts

LDC2006S33        Middle East Technical University Turkish Microphone Speech v 1.0

LDC2006T04        Multiple-Translation Chinese (MTC) Part 4

LDC2006S13        N4 NATO Native and Non-Native Speech

LDC2006T01        Prague Dependency Treebank 2.0

LDC2006S34        Russian through Switched Telephone Network (RuSTeN)

LDC2006T12        Spanish Gigaword First Edition

LDC2006S30        Speech Controlled Computing

LDC2006T18        TDT5 Multilingual Text

LDC2006T19        TDT5 Topics and Annotations

LDC2006T08        TimeBank 1.2

LDC2006T13        Web 1T 5-gram Version 1

LDC2006S37        West Point Heroico Spanish Speech

LDC2006S36        West Point Korean Speech

2005

LDC2005T09        ACE 2004 Multilingual Training Corpus

LDC2005T07        ACE Time Normalization (TERN) 2004 English Training Data v 1.0

LDC2005T35        American National Corpus (ANC) Second Release

LDC2005S07        Arabic CTS Levantine Fisher Training Data Set 3, Speech

LDC2005T03        Arabic CTS Levantine Fisher Training Data Set 3, Transcripts

LDC2005T02        Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis)

LDC2005T20        Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis)

LDC2005T30        Arabic Treebank: Part 4 v 1.0 (MPG Annotation)

LDC2005S22        Articulation Index

LDC2005T33        BBN Pronoun Coreference and Entity Type Corpus

LDC2005S08        BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts

LDC2005T13        CCGbank

LDC2005T34        Chinese <-> English Named Entity Lists v 1.0

LDC2005T10        Chinese English News Magazine Parallel Text

LDC2005T14        Chinese Gigaword Second Edition

LDC2005T06        Chinese News Translation Text Part 1

LDC2005T23        Chinese Proposition Bank 1.0

LDC2005T01        Chinese Treebank 5.0

LDC2005S26        CSLU: 22 Languages Corpus

LDC2005T08        Discourse Graphbank

LDC2005T12        English Gigaword Second Edition

LDC2005S13        Fisher English Training Part 2, Speech

LDC2005T19        Fisher English Training Part 2, Transcripts

LDC2005T28        HARD 2004 Text

LDC2005T29        HARD 2004 Topics and Annotations

LDC2005S15        HKUST Mandarin Telephone Speech, Part 1

LDC2005T32        HKUST Mandarin Telephone Transcript Data, Part 1

LDC2005S14        Levantine Arabic QT Training Data Set 4 (Speech + Transcripts)

LDC2005L01        Mawukakan Lexicon

LDC2005T05        Multiple-Translation Arabic (MTA) Part 2

LDC2005S16        RT-04 MDE Training Data Speech

LDC2005T24        RT-04 MDE Training Data Text/Annotations

LDC2005S25        Santa Barbara Corpus of Spoken American English Part IV

LDC2005S11        TDT4 Multilingual Broadcast News Speech Corpus

LDC2005T16        TDT4 Multilingual Text and Annotations

LDC2005S30        West Point Company G3 American English Speech

LDC2005S28        West Point Croatian Speech

2004

LDC2004T15        2000 Communicator Dialogue Act Tagged

LDC2004T16        2001 Communicator Dialogue Act Tagged

LDC2004S04        2002 NIST Speaker Recognition Evaluation

LDC2004S11        2002 Rich Transcription Broadcast News and Conversational Telephone Speech

LDC2004T18        Arabic English Parallel News Part 1

LDC2004T17        Arabic News Translation Text Part 1

LDC2004T02        Arabic Treebank: Part 2 v 2.0

LDC2004T11        Arabic Treebank: Part 3 v 1.0

LDC2004L02        Buckwalter Arabic Morphological Analyzer Version 2.0

LDC2004T05        Chinese Treebank 4.0

LDC2004S01        Czech Broadcast News Speech

LDC2004T01        Czech Broadcast News Transcripts

LDC2004S13        Fisher English Training Speech Part 1 Speech

LDC2004T19        Fisher English Training Speech Part 1 Transcripts

LDC2004V01        FORM1 Kinematic Gesture

LDC2004T08        Hong Kong Parallel Text

LDC2004S02        ICSI Meeting Speech

LDC2004T04        ICSI Meeting Transcripts

LDC2004S05        ISL Meeting Speech Part 1

LDC2004T10        ISL Meeting Transcripts Part 1

LDC2004L01        Klex: Finite-State Lexical Transducer for Korean

LDC2004T03        Morphologically Annotated Korean Text

LDC2004T07        Multiple-Translation Chinese (MTC) Part 3

LDC2004S09        NIST Meeting Pilot Corpus Speech

LDC2004T13        NIST Meeting Pilot Corpus Transcripts and Metadata

LDC2004T23        Prague Arabic Dependency Treebank 1.0

LDC2004T25        Prague Czech-English Dependency Treebank 1.0

LDC2004T14        Proposition Bank I

LDC2004S08        RT-03 MDE Training Data Speech

LDC2004T12        RT-03 MDE Training Data Text and Annotations

LDC2004S10        Santa Barbara Corpus of Spoken American English Part III

LDC2004S07        Switchboard Cellular Part 2 Audio

LDC2004S12        TalkBank Ethology Data: Field Recordings of Vervet Monkey Calls

LDC2004T09        TIDES Extraction (ACE) 2003 Multilingual Training Data

2003

LDC2003T03        1997 HUB5 German Transcripts

LDC2003T04        1997 HUB5 Spanish Transcripts

LDC2003T02        1998 HUB5 English Transcripts

LDC2003S01        2001 Communicator Evaluation

LDC2003T01        2001 HUB5 Mandarin Transcripts

LDC2003T11        ACE-2 Version 1.0

LDC2003T12        Arabic Gigaword

LDC2003T07        Arabic Treebank: Part 1 - 10K-word English Translation

LDC2003T06        Arabic Treebank: Part 1 v 2.0

LDC2003T09        Chinese Gigaword

LDC2003T05        English Gigaword

LDC2003V01        FORM2 Kinematic Gesture

LDC2003L01        Grassfields Bantu Fieldwork: Dschang Lexicon

LDC2003S02        Grassfields Bantu Fieldwork: Dschang Tone Paradigms

LDC2003S07        Korean Telephone Conversations Complete Set

LDC2003L02        Korean Telephone Conversations Lexicon

LDC2003S03        Korean Telephone Conversations Speech

LDC2003T08        Korean Telephone Conversations Transcripts

LDC2003T13        Message Understanding Conference (MUC) 6

LDC2003T18        Multiple-Translation Arabic (MTA) Part 1

LDC2003T17        Multiple-Translation Chinese (MTC) Part 2

LDC2003T10        SAID

LDC2003S06        Santa Barbara Corpus of Spoken American English Part II

LDC2003T15        SLX Corpus of Classic Sociolinguistic Interviews

LDC2003T16        SummBank 1.0

LDC2003S05        West Point Russian Speech

2002

LDC2002S11        1997 HUB4 English Evaluation Speech and Transcripts

LDC2002S22        1997 HUB5 Arabic Evaluation

LDC2002T39        1997 HUB5 Arabic Transcripts

LDC2002S23        1997 HUB5 English Evaluation

LDC2002S24        1997 HUB5 German Evaluation

LDC2003T03        1997 HUB5 German Transcripts

LDC2002S25        1997 HUB5 Spanish Evaluation

LDC2003T04        1997 HUB5 Spanish Transcripts

LDC2002S10        1998 HUB5 English Evaluation

LDC2003T02        1998 HUB5 English Transcripts

LDC2002S56        2000 Communicator Evaluation

LDC2002S09        2000 HUB5 English Evaluation Speech

LDC2002T43        2000 HUB5 English Evaluation Transcripts

LDC2002S13        2001 HUB5 English Evaluation

LDC2002S12        2001 HUB5 Mandarin Evaluation

LDC2003T01        2001 HUB5 Mandarin Transcripts

LDC2002S34        2001 NIST Speaker Recognition Evaluation Corpus

LDC2002L49        Buckwalter Arabic Morphological Analyzer Version 1.0

LDC2002S37        CALLHOME Egyptian Arabic Speech Supplement

LDC2002T38        CALLHOME Egyptian Arabic Transcripts Supplement

LDC2002L27        Chinese-English Translation Lexicon Version 3.0

LDC2002S28        Emotional Prosody Speech and Transcripts

LDC2001S16        Grassfields Bantu Fieldwork: Ngomba Tone Paradigms

LDC2002T26        Korean English Treebank Annotations

LDC2002T01        Multiple-Translation Chinese Corpus

LDC2002T07        RST Discourse Treebank

LDC2001S08        Speech in Noisy Environments (SPINE2) Part 3 Audio

LDC2001T09        Speech in Noisy Environments (SPINE2) Part 3 Transcripts

LDC2002S06        Switchboard-2 Phase III Audio

LDC2002T31        The AQUAINT Corpus of English News Text

LDC2002S04        Translanguage English Database (TED) Speech

LDC2002T03        Translanguage English Database (TED) Transcripts

LDC2002S35        Voicemail Corpus Part II

LDC2002S02        West Point Arabic Speech

2001

LDC2001S91        1997 HUB4 Broadcast News Evaluation Non-English Test Material

LDC2001S97        2000 NIST Speaker Recognition Evaluation

LDC2001T55        Arabic Newswire Part 1

LDC2001T61        CALLHOME Spanish Dialogue Act Annotation

LDC2001T62        CETEMpublico

LDC2001T11        Chinese Treebank 2.0

LDC2001S16        Grassfields Bantu Fieldwork: Ngomba Tone Paradigms

LDC2001T02        Message Understanding Conference (MUC) 7

LDC2001T10        Prague Dependency Treebank 1.0

LDC2001S04        Speech in Noisy Environments (SPINE2) Part 1 Audio

LDC2001T05        Speech in Noisy Environments (SPINE2) Part 1 Transcripts

LDC2001S06        Speech in Noisy Environments (SPINE2) Part 2 Audio

LDC2001T07        Speech in Noisy Environments (SPINE2) Part 2 Transcripts

LDC2001S08        Speech in Noisy Environments (SPINE2) Part 3 Audio

LDC2001T09        Speech in Noisy Environments (SPINE2) Part 3 Transcripts

LDC2001S99        Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio

LDC2001S13        Switchboard Cellular Part 1 Audio

LDC2001S15        Switchboard Cellular Part 1 Transcribed Audio

LDC2001T14        Switchboard Cellular Part 1 Transcription

LDC2001T60        Syllable-Final /s/ Lenition

LDC2001S93        TDT2 Mandarin Audio Corpus

LDC2001T57        TDT2 Multilanguage Text Version 4.0

LDC2001S94        TDT3 English Audio

LDC2001S95        TDT3 Mandarin Audio

LDC2001T58        TDT3 Multilanguage Text Version 2.0

2000

LDC2000S86        1998 HUB4 Broadcast News Evaluation English Test Material

LDC2000S88        1999 HUB4 Broadcast News Evaluation English Test Material

LDC2000T43        BLLIP 1987-89 WSJ Corpus Release 1

LDC2000T50        Hong Kong Hansards Parallel Text

LDC2000T47        Hong Kong Laws Parallel Text

LDC2000T46        Hong Kong News Parallel Text

LDC2000T45        Korean Newswire

LDC2000S85        Santa Barbara Corpus of Spoken American English Part I

LDC2000S96        Speech in Noisy Environments (SPINE) Evaluation Audio

LDC2000T54        Speech in Noisy Environments (SPINE) Evaluation Transcripts

LDC2000S87        Speech in Noisy Environments (SPINE) Training Audio

LDC2000T49        Speech in Noisy Environments (SPINE) Training Transcripts

LDC2000S92        TDT2 Careful Transcription Audio

LDC2000T44        TDT2 Careful Transcription Text

LDC2000T52        TREC Mandarin

LDC2000T51        TREC Spanish

LDC2000S89        Voice of America (VOA) Czech Broadcast News Audio

LDC2000T53        Voice of America (VOA) Czech Broadcast News Transcripts

1999

LDC99S80        1997 Speaker Recognition Benchmark

LDC99S81        1999 Speaker Recognition Benchmark

LDC99L23        American English Spoken Lexicon

LDC99L22        Egyptian Colloquial Arabic Lexicon

LDC99T34        Japanese Business News Text Supplement

LDC99T40        Portuguese Newswire Text

LDC99T41        Spanish Newswire Text, Volume 2

LDC99S78        SUSAS

LDC99T33        SUSAS Transcripts

LDC99S79        Switchboard-2 Phase II

LDC99S83        Tactical Speaker Identification Speech Corpus (TSID)

LDC99S84        TDT2 English Audio

LDC99T42        Treebank-3

LDC99S82        USC Marketplace Broadcast News Speech

LDC99T36        USC Marketplace Broadcast News Transcripts

1998

LDC98T31        1996 CSR HUB4 Language Model

LDC97S66        1996 English Broadcast News Dev and Eval (HUB4)

LDC97S44        1996 English Broadcast News Speech (HUB4)

LDC97T22        1996 English Broadcast News Transcripts (HUB4)

LDC98S71        1997 English Broadcast News Speech (HUB4)

LDC98T28        1997 English Broadcast News Transcripts (HUB4)

LDC98S73        1997 Mandarin Broadcast News Speech (HUB4-NE)

LDC98T24        1997 Mandarin Broadcast News Transcripts (HUB4-NE)

LDC98S74        1997 Spanish Broadcast News Speech (HUB4-NE)

LDC98T29        1997 Spanish Broadcast News Transcripts (HUB4-NE)

LDC98S76        1998 Speaker Recognition Benchmark

LDC98L21        COMLEX English Syntax Lexicon

LDC96T11        COMLEX Syntax Text Corpus Version 2.0

LDC95S23        CSR-III Speech

LDC95T6        CSR-III Text

LDC98S67        HTIMIT

LDC98S69        HUB5 Mandarin Telephone Speech Corpus

LDC98T26        HUB5 Mandarin Transcripts

LDC98S70        HUB5 Spanish Telephone Speech Corpus

LDC98T27        HUB5 Spanish Transcripts

LDC98T32        JURIS

LDC95S22        KING Speaker Verification

LDC98S68        LLHDB

LDC98T30        North American News Text Supplement

LDC98S75        Switchboard-2 Phase I

LDC98S72        Taiwanese Putonghua Speech and Transcripts

LDC98T25        TDT Pilot Study Corpus

LDC98S77        Voicemail Corpus Part I

LDC94S16        YOHO Speaker Verification

1997

LDC97S66        1996 English Broadcast News Dev and Eval (HUB4)

LDC97S44        1996 English Broadcast News Speech (HUB4)

LDC97T22        1996 English Broadcast News Transcripts (HUB4)

LDC96S61        1996 Speaker Recognition Benchmark

LDC94S14A        Air Traffic Control Complete

LDC96S36        Boston University Radio Speech Corpus

LDC96S46        CALLFRIEND American English-Non-Southern Dialect

LDC96S47        CALLFRIEND American English-Southern Dialect

LDC96S48        CALLFRIEND Canadian French

LDC96S49        CALLFRIEND Egyptian Arabic

LDC96S50        CALLFRIEND Farsi

LDC96S51        CALLFRIEND German

LDC96S52        CALLFRIEND Hindi

LDC96S53        CALLFRIEND Japanese

LDC96S54        CALLFRIEND Korean

LDC96S55        CALLFRIEND Mandarin Chinese-Mainland Dialect

LDC96S56        CALLFRIEND Mandarin Chinese-Taiwan Dialect

LDC96S57        CALLFRIEND Spanish-Caribbean Dialect

LDC96S58        CALLFRIEND Spanish-Non-Caribbean Dialect

LDC96S59        CALLFRIEND Tamil

LDC96S60        CALLFRIEND Vietnamese

LDC97L20        CALLHOME American English Lexicon (PRONLEX)

LDC97S42        CALLHOME American English Speech

LDC97T14        CALLHOME American English Transcripts

LDC97S45        CALLHOME Egyptian Arabic Speech

LDC97T19        CALLHOME Egyptian Arabic Transcripts

LDC97L18        CALLHOME German Lexicon

LDC97S43        CALLHOME German Speech

LDC97T15        CALLHOME German Transcripts

LDC96L17        CALLHOME Japanese Lexicon

LDC96S37        CALLHOME Japanese Speech

LDC96T18        CALLHOME Japanese Transcripts

LDC96L15        CALLHOME Mandarin Chinese Lexicon

LDC96S34        CALLHOME Mandarin Chinese Speech

LDC96T16        CALLHOME Mandarin Chinese Transcripts

LDC96L16        CALLHOME Spanish Lexicon

LDC96S35        CALLHOME Spanish Speech

LDC96T17        CALLHOME Spanish Transcripts

LDC94S13A        CSR-II (WSJ1) Complete

LDC94S13B        CSR-II (WSJ1) Sennheiser

LDC97T12        DSO Corpus of Sense-Tagged English

LDC99L22        Egyptian Colloquial Arabic Lexicon

LDC95T20        Hansard French/English

LDC96S64-1        JEIDA/JCSD-Channel 0 City Names

LDC96S64        JEIDA/JCSD-Channel 0 Complete

LDC96S64-2        JEIDA/JCSD-Channel 0 Control Words

LDC96S64-4        JEIDA/JCSD-Channel 0 Four Digit Sequences

LDC96S64-3        JEIDA/JCSD-Channel 0 Isolated Digits

LDC96S64-5        JEIDA/JCSD-Channel 0 Mono Syllables

LDC96S65-1        JEIDA/JCSD-Channel 1 City Names

LDC96S65        JEIDA/JCSD-Channel 1 Complete

LDC96S65-2        JEIDA/JCSD-Channel 1 Control Words

LDC96S65-4        JEIDA/JCSD-Channel 1 Four Digit Sequences

LDC96S65-3        JEIDA/JCSD-Channel 1 Isolated Digits

LDC96S65-5        JEIDA/JCSD-Channel 1 Mono Syllables

LDC95T13        Mandarin Chinese News Text

LDC95T21        North American News Text Corpus

LDC94S15        SPIDRE

LDC97S62        Switchboard-1 Release 2

LDC97S63        The CMU Kids Corpus

1996

LDC96S61        1996 Speaker Recognition Benchmark

LDC96S36        Boston University Radio Speech Corpus

LDC94S20        BRAMSHILL

LDC96S46        CALLFRIEND American English-Non-Southern Dialect

LDC96S47        CALLFRIEND American English-Southern Dialect

LDC96S48        CALLFRIEND Canadian French

LDC96S49        CALLFRIEND Egyptian Arabic

LDC96S50        CALLFRIEND Farsi

LDC96S51        CALLFRIEND German

LDC96S52        CALLFRIEND Hindi

LDC96S53        CALLFRIEND Japanese

LDC96S54        CALLFRIEND Korean

LDC96S55        CALLFRIEND Mandarin Chinese-Mainland Dialect

LDC96S56        CALLFRIEND Mandarin Chinese-Taiwan Dialect

LDC96S57        CALLFRIEND Spanish-Caribbean Dialect

LDC96S58        CALLFRIEND Spanish-Non-Caribbean Dialect

LDC96S59        CALLFRIEND Tamil

LDC96S60        CALLFRIEND Vietnamese

LDC97L20        CALLHOME American English Lexicon (PRONLEX)

LDC96L17        CALLHOME Japanese Lexicon

LDC96S37        CALLHOME Japanese Speech

LDC96T18        CALLHOME Japanese Transcripts

LDC96L15        CALLHOME Mandarin Chinese Lexicon

LDC96S34        CALLHOME Mandarin Chinese Speech

LDC96T16        CALLHOME Mandarin Chinese Transcripts

LDC96L16        CALLHOME Spanish Lexicon

LDC96S35        CALLHOME Spanish Speech

LDC96T17        CALLHOME Spanish Transcripts

LDC96L14        CELEX2

LDC98L21        COMLEX English Syntax Lexicon

LDC96T11        COMLEX Syntax Text Corpus Version 2.0

LDC93S6A        CSR-I (WSJ0) Complete

LDC93S6C        CSR-I (WSJ0) Other

LDC93S6B        CSR-I (WSJ0) Sennheiser

LDC96S33        CSR-IV HUB3

LDC96S31        CSR-IV HUB4

LDC96S30        CTIMIT

LDC96S38        DCIEM/HCRC

LDC95T11        European Language Newspaper Text

LDC96S32        FFMTIMIT

LDC96S29        Frontiers in Speech Processing 93

LDC96S40        Frontiers in Speech Processing 94

LDC95T20        Hansard French/English

LDC93S12        HCRC Map Task Corpus

LDC96S64-1        JEIDA/JCSD-Channel 0 City Names

LDC96S64        JEIDA/JCSD-Channel 0 Complete

LDC96S64-2        JEIDA/JCSD-Channel 0 Control Words

LDC96S64-4        JEIDA/JCSD-Channel 0 Four Digit Sequences

LDC96S64-3        JEIDA/JCSD-Channel 0 Isolated Digits

LDC96S64-5        JEIDA/JCSD-Channel 0 Mono Syllables

LDC96S65-1        JEIDA/JCSD-Channel 1 City Names

LDC96S65        JEIDA/JCSD-Channel 1 Complete

LDC96S65-2        JEIDA/JCSD-Channel 1 Control Words

LDC96S65-4        JEIDA/JCSD-Channel 1 Four Digit Sequences

LDC96S65-3        JEIDA/JCSD-Channel 1 Isolated Digits

LDC96S65-5        JEIDA/JCSD-Channel 1 Mono Syllables

LDC95T13        Mandarin Chinese News Text

LDC96T10        Message Understanding Conference (MUC) 6 Additional News Text

LDC95T21        North American News Text Corpus

LDC93S3A        Resource Management Complete Set 2.0

LDC93S3B        Resource Management RM1 2.0

LDC93S3C        Resource Management RM2 2.0

LDC96S39        RM Isolated and Spelled Word Data

LDC95T9        Spanish News Text

LDC96S41        VAHA (POLYPHONE II)

1995

LDC95S26        ATIS3 Test Data

LDC97L20        CALLHOME American English Lexicon (PRONLEX)

LDC96L14        CELEX2

LDC98L21        COMLEX English Syntax Lexicon

LDC95S23        CSR-III Speech

LDC95T6        CSR-III Text

LDC95T11        European Language Newspaper Text

LDC95T20        Hansard French/English

LDC95T8        Japanese Business News Text

LDC95S22        KING Speaker Verification

LDC95S28        LATINO-40 Spanish Read News

LDC95T13        Mandarin Chinese News Text

LDC95T21        North American News Text Corpus

LDC95S27        PhoneBook: NYNEX Isolated Words

LDC95T9        Spanish News Text

LDC95S25        TRAINS Spoken Dialog Corpus

LDC95T7        Treebank-2

LDC95S24        WSJCAM0 Cambridge Read News

1994

LDC94S14B        Air Traffic Control BOS

LDC94S14A        Air Traffic Control Complete

LDC94S14C        Air Traffic Control DCA

LDC94S14D        Air Traffic Control DFW

LDC94S19        ATIS3 Training Data

LDC94S20        BRAMSHILL

LDC97L20        CALLHOME American English Lexicon (PRONLEX)

LDC98L21        COMLEX English Syntax Lexicon

LDC94S13A        CSR-II (WSJ1) Complete

LDC94S13C        CSR-II (WSJ1) Other

LDC94S13B        CSR-II (WSJ1) Sennheiser

LDC94T5        ECI Multilingual Text

LDC94S21        MACROPHONE

LDC94S17        OGI Multilanguage Corpus

LDC94S18        OGI Spelled and Spoken Word

LDC94S15        SPIDRE

LDC94T4A        UN Parallel Text (Complete)

LDC94T4B-1        UN Parallel Text (English)

LDC94T4B-2        UN Parallel Text (French)

LDC94T4B-3        UN Parallel Text (Spanish)

LDC94S16        YOHO Speaker Verification

1993

LDC93T1        ACL/DCI

LDC93S4A        ATIS0 Complete

LDC93S4B        ATIS0 Pilot

LDC93S4B-2        ATIS0 Read

LDC93S4B-3        ATIS0 SD Read

LDC93S5        ATIS2

LDC93S6A        CSR-I (WSJ0) Complete

LDC93S6C        CSR-I (WSJ0) Other

LDC93S6B        CSR-I (WSJ0) Sennheiser

LDC93S12        HCRC Map Task Corpus

LDC93S2        NTIMIT

LDC93S3A        Resource Management Complete Set 2.0

LDC93S3B        Resource Management RM1 2.0

LDC93S3C        Resource Management RM2 2.0

LDC93S11        Road Rally

LDC93S8        Switchboard Credit Card

LDC97S62        Switchboard-1 Release 2

LDC93S9        TI 46-Word

LDC93S10        TIDIGITS

LDC93S1W        TIMIT Acoustic-Phonetic Continuous Speech (MS-WAV version)

LDC93S1        TIMIT Acoustic-Phonetic Continuous Speech Corpus

LDC93T3A        TIPSTER Complete

LDC93T3B        TIPSTER Volume 1

LDC93T3C        TIPSTER Volume 2

LDC93T3D        TIPSTER Volume 3
