Linguistic Data Consortium Corpora Collection - NLP Text and Speech Datasets

Linguistic_Data_Consortium_Corpora (106 files)
README.md 4.49kB
Linguistic_Data_Consortium_Corpora_meta.xml 1.12kB
Linguistic_Data_Consortium_Corpora_meta.sqlite 73.73kB
LDC97S45.tar.zst 1.54GB
LDC97S44.tar.zst 11.00GB
LDC97S43.tar.zst 1.27GB
LDC97S42.tar.zst 1.52GB
LDC97L42.tar.zst 1.52GB
LDC97L20.tar.zst 555.19kB
LDC97L18.tar.zst 3.09MB
LDC96T18.tar.zst 623.44kB
LDC96T17.tar.zst 521.50kB
LDC96T16.tar.zst 609.73kB
LDC96T10.tar.zst 184.24kB
LDC96S60.tar.zst 888.82MB
LDC96S59.tar.zst 781.27MB
LDC96S58.tar.zst 958.12MB
LDC96S57.tar.zst 1.01GB
LDC96S56.tar.zst 933.37MB
LDC96S55.tar.zst 880.38MB
LDC96S54.tar.zst 878.41MB
LDC96S53.tar.zst 865.44MB
LDC96S52.tar.zst 874.74MB
LDC96S51.tar.zst 897.81MB
LDC96S50.tar.zst 897.06MB
Type: Dataset
Tags: Dataset, nlp, natural language, corpus, speech, data, text, corpora, LDC, Linguistic Data Consortium

Bibtex:
@article{,
title= {Linguistic Data Consortium Corpora Collection - NLP Text and Speech Datasets},
journal= {},
author= {},
year= {},
url= {https://www.ldc.upenn.edu},
abstract= {# Linguistic Data Consortium Corpora

An assortment of datasets collected from the [LDC](https://www.ldc.upenn.edu). To learn more about a corpus, search for its catalog number on the LDC website.

- Archive.org URL: https://archive.org/details/Linguistic_Data_Consortium_Corpora
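
When looking up many corpora, it can help to split each catalog number into its parts first. The following is a minimal sketch, not an official LDC tool. It assumes that an ID is `LDC`, a two- or four-digit year, one type character, a number, and an optional suffix, which matches the IDs listed in this collection. The helper names `parse_catalog_id` and `catalog_url` are hypothetical, and the `catalog.ldc.upenn.edu` URL pattern is an assumption that should be checked against the LDC website.

```python
import re

# Assumed ID shape: "LDC", a 4- or 2-digit year, one type character,
# a number, and an optional suffix (e.g. LDC93S6A, LDC93S7-T).
CATALOG_ID = re.compile(r"^LDC(\d{4}|\d{2})([A-Z-])(\d+)([A-Z-]*)$")


def parse_catalog_id(catalog_id):
    """Split a catalog number into year, type character, number, and suffix."""
    match = CATALOG_ID.match(catalog_id)
    if match is None:
        raise ValueError(f"unrecognized catalog number: {catalog_id}")
    year_text, kind, number, suffix = match.groups()
    # Two-digit years in this collection all fall in the 1990s.
    year = int(year_text) if len(year_text) == 4 else 1900 + int(year_text)
    return {"year": year, "type": kind, "number": int(number), "suffix": suffix}


def catalog_url(catalog_id):
    """Build a catalog page URL (assumed pattern, not confirmed by this README)."""
    return f"https://catalog.ldc.upenn.edu/{catalog_id}"


if __name__ == "__main__":
    for cid in ["LDC2000S86", "LDC97S42", "LDC93S7-T"]:
        print(cid, parse_catalog_id(cid), catalog_url(cid))
```

Parsing the year and type makes it easy to sort or group the list below before looking entries up one by one.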

## Corpora

- LDC2000S86
- LDC2000S88
- LDC2000T44
- LDC2000T46
- LDC2000T47
- LDC2000T50
- LDC2000T52
- LDC2000T53
- LDC2001S15
- LDC2001T02
- LDC2001T14
- LDC2001T55
- LDC2001T57
- LDC2001T58
- LDC2002E17
- LDC2002E18
- LDC2002E32
- LDC2002E58
- LDC2002-27
- LDC2002-49
- LDC2002S09
- LDC2002S10
- LDC2002S11
- LDC2002S13
- LDC2002S35
- LDC2002S56
- LDC2002T01
- LDC2002T03
- LDC2002T07
- LDC2002T31
- LDC2002T43
- LDC2003E01
- LDC2003E04
- LDC2003E05
- LDC2003E07
- LDC2003E08
- LDC2003E09
- LDC2003E11
- LDC2003E14
- LDC2003E25
- LDC2003-02
- LDC2003S01
- LDC2003T01
- LDC2003T02
- LDC2003T06
- LDC2003T07
- LDC2003T09
- LDC2003T11
- LDC2003T12
- LDC2003T13
- LDC2003T17
- LDC2003T18
- LDC2004E07
- LDC2004E08
- LDC2004E09
- LDC2004E11
- LDC2004E12
- LDC2004E13
- LDC2004E72
- LDC2004S02
- LDC2004S04
- LDC2004S11
- LDC2004S13
- LDC2004T01
- LDC2004T02
- LDC2004T04
- LDC2004T05
- LDC2004T07
- LDC2004T08
- LDC2004T09
- LDC2004T11
- LDC2004T12
- LDC2004T14
- LDC2004T15
- LDC2004T16
- LDC2004T17
- LDC2004T18
- LDC2004T19
- LDC2005E46
- LDC2005E47
- LDC2005E85
- LDC2005S13
- LDC2005S14
- LDC2005S26
- LDC2005T01
- LDC2005T02
- LDC2005T03
- LDC2005T05
- LDC2005T06
- LDC2005T07
- LDC2005T09
- LDC2005T10
- LDC2005T12
- LDC2005T13
- LDC2005T14
- LDC2005T16
- LDC2005T19
- LDC2005T20
- LDC2005T23
- LDC2005T28
- LDC2005T30
- LDC2005T32
- LDC2005T34
- LDC2005T35
- LDC2006E17
- LDC2006E24
- LDC2006E25
- LDC2006E26
- LDC2006E34
- LDC2006E36
- LDC2006E82
- LDC2006E86
- LDC2006E92
- LDC2006E93
- LDC2006E95
- LDC2006S29
- LDC2006S31
- LDC2006S35
- LDC2006S43
- LDC2006S45
- LDC2006T02
- LDC2006T04
- LDC2006T06
- LDC2006T08
- LDC2006T09
- LDC2006T10
- LDC2006T14
- LDC2006T18
- LDC2006T19
- LDC2007E59
- LDC2007E61
- LDC2007S03
- LDC2007S08
- LDC2007S10
- LDC2007S11
- LDC2007S12
- LDC2007T02
- LDC2007T03
- LDC2007T04
- LDC2007T08
- LDC2007T09
- LDC2007T21
- LDC2007T23
- LDC2007T24
- LDC2007T36
- LDC2008E39
- LDC2008E40
- LDC2008E41
- LDC2008E42
- LDC2008E62
- LDC2008-03
- LDC2008S05
- LDC2008T02
- LDC2008T05
- LDC2008T06
- LDC2008T08
- LDC2008T09
- LDC2008T18
- LDC2008T23
- LDC2008T25
- LDC2009S04
- LDC2009S05
- LDC2009T03
- LDC2009T05
- LDC2009T06
- LDC2009T08
- LDC2009T11
- LDC2009T12
- LDC2009T15
- LDC2009T23
- LDC2009T24
- LDC2009T26
- LDC2010E31
- LDC2010E82
- LDC2010S01
- LDC2010T01
- LDC2010T03
- LDC2010T04
- LDC2010T05
- LDC2010T06
- LDC2010T07
- LDC2010T08
- LDC2010T13
- LDC2010T21
- LDC2011S01
- LDC2011S04
- LDC2011S05
- LDC2011S06
- LDC2011S08
- LDC2011S09
- LDC2011S10
- LDC2011T03
- LDC2011T05
- LDC2011T07
- LDC2011T08
- LDC2011T09
- LDC2011T10
- LDC2011T12
- LDC2011T13
- LDC2012E102
- LDC2012E29
- LDC2012E34
- LDC2012S01
- LDC2012S02
- LDC2012T04
- LDC2012T07
- LDC2012T13
- LDC2012T15
- LDC2012T21
- LDC2013E90
- LDC2013S02
- LDC2013S05
- LDC2013S07
- LDC2013T03
- LDC2013T04
- LDC2013T07
- LDC2013T09
- LDC2013T11
- LDC2013T12
- LDC2013T13
- LDC2013T15
- LDC2013T16
- LDC2013T17
- LDC2013T19
- LDC2014S05
- LDC2014S06
- LDC2014S07
- LDC2014T04
- LDC2014T09
- LDC2014T11
- LDC2014T12
- LDC2014T13
- LDC2014T15
- LDC2014T16
- LDC2014T17
- LDC2014T20
- LDC2014T23
- LDC2014T26
- LDC2015S01
- LDC2015S04
- LDC2015S07
- LDC2015S11
- LDC2015S12
- LDC2015T01
- LDC2015T13
- LDC2015T16
- LDC2015T21
- LDC2015T22
- LDC2015T23
- LDC2015T24
- LDC2016S01
- LDC2016S07
- LDC2016T04
- LDC2016T05
- LDC2016T06
- LDC2016T09
- LDC2016T10
- LDC2016T15
- LDC2016T17
- LDC2016T25
- LDC2017S02
- LDC2017S07
- LDC2017S10
- LDC2017S14
- LDC2017S21
- LDC2017S24
- LDC2017T02
- LDC2017T04
- LDC2017T07
- LDC2017T10
- LDC2017T16
- LDC2018T03
- LDC2018T04
- LDC2018T15
- LDC2018T16
- LDC2018T19
- LDC2018T22
- LDC2018T23
- LDC2018T24
- LDC2019S07
- LDC2019S09
- LDC2019S12
- LDC2019S20
- LDC2019T02
- LDC2019T04
- LDC2019T05
- LDC2019T16
- LDC2020S01
- LDC2020T02
- LDC2020T22
- LDC2021T04
- LDC2021T11
- LDC2021T12
- LDC2021T15
- LDC93S1
- LDC93S10
- LDC93S3A
- LDC93S5
- LDC93S6A
- LDC93S7-T
- LDC93T1
- LDC93T3A
- LDC94S13A
- LDC94S14A
- LDC94S19
- LDC95S24
- LDC95S26
- LDC95T13
- LDC95T21
- LDC95T6
- LDC95T7
- LDC96-14
- LDC96-15
- LDC96-16
- LDC96-17
- LDC96S35
- LDC96S37
- LDC96S46
- LDC96S47
- LDC96S48
- LDC96S49
- LDC96S50
- LDC96S51
- LDC96S52
- LDC96S53
- LDC96S54
- LDC96S55
- LDC96S56
- LDC96S57
- LDC96S58
- LDC96S59
- LDC96S60
- LDC96T10
- LDC96T16
- LDC96T17
- LDC96T18
- LDC97-18
- LDC97-20
- LDC97-42
- LDC97S42
- LDC97S43
- LDC97S44
- LDC97S45
- LDC97S62
- LDC97S66
- LDC97T12
- LDC97T14
- LDC97T15
- LDC97T19
- LDC97T22
- LDC97T62
- LDC98S71
- LDC98S77
- LDC98T24
- LDC98T25
- LDC98T26
- LDC98T28
- LDC98T29
- LDC98T30
- LDC98T31
- LDC99-22
- LDC99T36
- LDC99T42
},
keywords= {LDC, Linguistic Data Consortium, NLP, Natural Language, Text, Speech, Corpora, Corpus, Data, Dataset},
terms= {},
license= {},
superseded= {}
}

