Linguistic Data Consortium Corpora Collection - NLP Text and Speech Datasets

Folder: Linguistic_Data_Consortium_Corpora (106 files)
README.md 4.49kB
Linguistic_Data_Consortium_Corpora_meta.xml 1.12kB
Linguistic_Data_Consortium_Corpora_meta.sqlite 73.73kB
LDC97S45.tar.zst 1.54GB
LDC97S44.tar.zst 11.00GB
LDC97S43.tar.zst 1.27GB
LDC97S42.tar.zst 1.52GB
LDC97L42.tar.zst 1.52GB
LDC97L20.tar.zst 555.19kB
LDC97L18.tar.zst 3.09MB
LDC96T18.tar.zst 623.44kB
LDC96T17.tar.zst 521.50kB
LDC96T16.tar.zst 609.73kB
LDC96T10.tar.zst 184.24kB
LDC96S60.tar.zst 888.82MB
LDC96S59.tar.zst 781.27MB
LDC96S58.tar.zst 958.12MB
LDC96S57.tar.zst 1.01GB
LDC96S56.tar.zst 933.37MB
LDC96S55.tar.zst 880.38MB
LDC96S54.tar.zst 878.41MB
LDC96S53.tar.zst 865.44MB
LDC96S52.tar.zst 874.74MB
LDC96S51.tar.zst 897.81MB
LDC96S50.tar.zst 897.06MB
LDC96S49.tar.zst 848.70MB
LDC96S48.tar.zst 916.00MB
LDC96S47.tar.zst 910.95MB
LDC96S46.tar.zst 956.81MB
LDC96S37.tar.zst 1.28GB
LDC96S35.tar.zst 1.09GB
LDC96S35-corrections.tar.zst 191.29MB
LDC96L17.tar.zst 2.52MB
LDC96L16.tar.zst 708.90kB
LDC96L15.tar.zst 530.38kB
LDC96L14.tar.zst 51.72MB
LDC95T7.tar.zst 134.96MB
LDC95T6.tar.zst 2.03GB
LDC95T21.tar.zst 786.89MB
LDC95T13.tar.zst 169.30MB
LDC95S26.tar.zst 282.31MB
LDC95S24.tar.zst 3.63GB
LDC94S19.tar.zst 1.04GB
LDC94S14A.tar.zst 3.38GB
LDC94S13A.tar.zst 17.38GB
LDC93T3A.tar.zst 751.00MB
LDC93T1.tar.zst 140.23MB
LDC93S7-T.tar.zst 53.94MB
LDC93S6A.tar.zst 8.64GB
Type: Dataset
Tags: Dataset, nlp, natural language, corpus, speech, data, text, LDC, corpora, Linguistic Data Consortium

Bibtex:
@article{,
title= {Linguistic Data Consortium Corpora Collection - NLP Text and Speech Datasets},
journal= {},
author= {},
year= {},
url= {https://www.ldc.upenn.edu},
abstract= {# Linguistic Data Consortium Corpora

An assortment of datasets collected from the Linguistic Data Consortium ([LDC](https://www.ldc.upenn.edu)). To learn more about a corpus, search for its catalog number on the LDC website.

- Archive.org URL: https://archive.org/details/Linguistic_Data_Consortium_Corpora
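
The catalog numbers in this collection appear to follow a common shape: `LDC`, a two- or four-digit year, a single type letter, a number, and sometimes a suffix (as in `LDC93S6A` or `LDC93S7-T`). The sketch below is a minimal Python illustration of splitting an identifier into those parts before searching for it. It assumes that shape holds. The names `parse_catalog_id`, `archive_name`, and `ASSUMED_TYPES` are hypothetical, and the type-letter meanings are an assumption to check against the LDC catalog. The `.tar.zst` naming mirrors the archive file names in the file listing.

```python
import re
from typing import NamedTuple, Optional

# Assumed meanings of the type letter; confirm against the LDC catalog.
ASSUMED_TYPES = {"S": "speech", "T": "text", "L": "lexicon"}

# "LDC", a 4- or 2-digit year, one type letter, a number, optional suffix.
_ID_PATTERN = re.compile(r"LDC(\d{4}|\d{2})([A-Z])(\d+)(.*)")


class CatalogId(NamedTuple):
    year: int
    type_letter: str
    number: int
    suffix: str


def parse_catalog_id(text: str) -> Optional[CatalogId]:
    """Split an identifier such as 'LDC2004T19' into parts, or return None."""
    match = _ID_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    year_text, letter, number, suffix = match.groups()
    year = int(year_text)
    if len(year_text) == 2:
        year += 1900  # the two-digit years in this listing are all 93-99
    return CatalogId(year, letter, int(number), suffix)


def archive_name(text: str) -> str:
    """Archive file name under the '<ID>.tar.zst' pattern of the file listing."""
    return text.strip() + ".tar.zst"


if __name__ == "__main__":
    for identifier in ("LDC2004T19", "LDC93S6A", "LDC2002-27"):
        print(identifier, parse_catalog_id(identifier), archive_name(identifier))
```

Identifiers that do not fit the assumed shape, such as `LDC2002-27`, return `None` rather than a guess, so they can be looked up by hand.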

## Corpora

- LDC2000S86
- LDC2000S88
- LDC2000T44
- LDC2000T46
- LDC2000T47
- LDC2000T50
- LDC2000T52
- LDC2000T53
- LDC2001S15
- LDC2001T02
- LDC2001T14
- LDC2001T55
- LDC2001T57
- LDC2001T58
- LDC2002E17
- LDC2002E18
- LDC2002E32
- LDC2002E58
- LDC2002-27
- LDC2002-49
- LDC2002-49
- LDC2002S09
- LDC2002S10
- LDC2002S11
- LDC2002S13
- LDC2002S35
- LDC2002S56
- LDC2002T01
- LDC2002T03
- LDC2002T07
- LDC2002T31
- LDC2002T43
- LDC2003E01
- LDC2003E04
- LDC2003E05
- LDC2003E07
- LDC2003E08
- LDC2003E09
- LDC2003E11
- LDC2003E14
- LDC2003E25
- LDC2003-02
- LDC2003S01
- LDC2003T01
- LDC2003T02
- LDC2003T06
- LDC2003T07
- LDC2003T09
- LDC2003T11
- LDC2003T12
- LDC2003T13
- LDC2003T17
- LDC2003T18
- LDC2004E07
- LDC2004E08
- LDC2004E09
- LDC2004E11
- LDC2004E12
- LDC2004E13
- LDC2004E72
- LDC2004S02
- LDC2004S04
- LDC2004S11
- LDC2004S13
- LDC2004T01
- LDC2004T02
- LDC2004T04
- LDC2004T05
- LDC2004T07
- LDC2004T08
- LDC2004T09
- LDC2004T11
- LDC2004T12
- LDC2004T14
- LDC2004T15
- LDC2004T16
- LDC2004T17
- LDC2004T18
- LDC2004T19
- LDC2005E46
- LDC2005E47
- LDC2005E85
- LDC2005S13
- LDC2005S14
- LDC2005S26
- LDC2005T01
- LDC2005T02
- LDC2005T03
- LDC2005T05
- LDC2005T06
- LDC2005T07
- LDC2005T09
- LDC2005T10
- LDC2005T12
- LDC2005T13
- LDC2005T14
- LDC2005T16
- LDC2005T19
- LDC2005T20
- LDC2005T23
- LDC2005T28
- LDC2005T30
- LDC2005T32
- LDC2005T34
- LDC2005T35
- LDC2006E17
- LDC2006E24
- LDC2006E25
- LDC2006E26
- LDC2006E34
- LDC2006E36
- LDC2006E82
- LDC2006E86
- LDC2006E92
- LDC2006E93
- LDC2006E95
- LDC2006S29
- LDC2006S31
- LDC2006S35
- LDC2006S43
- LDC2006S45
- LDC2006T02
- LDC2006T04
- LDC2006T06
- LDC2006T08
- LDC2006T09
- LDC2006T10
- LDC2006T14
- LDC2006T18
- LDC2006T19
- LDC2007E59
- LDC2007E61
- LDC2007S03
- LDC2007S08
- LDC2007S10
- LDC2007S11
- LDC2007S12
- LDC2007T02
- LDC2007T03
- LDC2007T04
- LDC2007T08
- LDC2007T09
- LDC2007T21
- LDC2007T23
- LDC2007T24
- LDC2007T36
- LDC2008E39
- LDC2008E40
- LDC2008E41
- LDC2008E42
- LDC2008E62
- LDC2008-03
- LDC2008S05
- LDC2008T02
- LDC2008T05
- LDC2008T06
- LDC2008T08
- LDC2008T09
- LDC2008T18
- LDC2008T23
- LDC2008T25
- LDC2009S04
- LDC2009S05
- LDC2009T03
- LDC2009T05
- LDC2009T06
- LDC2009T08
- LDC2009T11
- LDC2009T12
- LDC2009T15
- LDC2009T23
- LDC2009T24
- LDC2009T26
- LDC2010E31
- LDC2010E82
- LDC2010S01
- LDC2010T01
- LDC2010T03
- LDC2010T04
- LDC2010T05
- LDC2010T06
- LDC2010T07
- LDC2010T08
- LDC2010T13
- LDC2010T21
- LDC2011S01
- LDC2011S04
- LDC2011S05
- LDC2011S06
- LDC2011S08
- LDC2011S09
- LDC2011S10
- LDC2011T03
- LDC2011T05
- LDC2011T07
- LDC2011T08
- LDC2011T09
- LDC2011T10
- LDC2011T12
- LDC2011T13
- LDC2012E102
- LDC2012E29
- LDC2012E34
- LDC2012S01
- LDC2012S02
- LDC2012T04
- LDC2012T04
- LDC2012T07
- LDC2012T13
- LDC2012T15
- LDC2012T21
- LDC2013E90
- LDC2013S02
- LDC2013S05
- LDC2013S07
- LDC2013T03
- LDC2013T04
- LDC2013T07
- LDC2013T09
- LDC2013T11
- LDC2013T12
- LDC2013T13
- LDC2013T15
- LDC2013T16
- LDC2013T17
- LDC2013T19
- LDC2014S05
- LDC2014S06
- LDC2014S07
- LDC2014T04
- LDC2014T09
- LDC2014T11
- LDC2014T12
- LDC2014T13
- LDC2014T15
- LDC2014T16
- LDC2014T17
- LDC2014T20
- LDC2014T23
- LDC2014T26
- LDC2015S01
- LDC2015S04
- LDC2015S04
- LDC2015S07
- LDC2015S11
- LDC2015S12
- LDC2015T01
- LDC2015T13
- LDC2015T16
- LDC2015T21
- LDC2015T22
- LDC2015T23
- LDC2015T24
- LDC2016S01
- LDC2016S07
- LDC2016T04
- LDC2016T05
- LDC2016T06
- LDC2016T09
- LDC2016T10
- LDC2016T15
- LDC2016T17
- LDC2016T25
- LDC2017S02
- LDC2017S07
- LDC2017S10
- LDC2017S14
- LDC2017S21
- LDC2017S24
- LDC2017T02
- LDC2017T04
- LDC2017T07
- LDC2017T10
- LDC2017T16
- LDC2018T03
- LDC2018T04
- LDC2018T15
- LDC2018T16
- LDC2018T19
- LDC2018T22
- LDC2018T23
- LDC2018T24
- LDC2019S07
- LDC2019S09
- LDC2019S12
- LDC2019S20
- LDC2019T02
- LDC2019T04
- LDC2019T05
- LDC2019T16
- LDC2020S01
- LDC2020T02
- LDC2020T22
- LDC2021T04
- LDC2021T11
- LDC2021T12
- LDC2021T15
- LDC93S1
- LDC93S10
- LDC93S3A
- LDC93S5
- LDC93S6A
- LDC93S7-T
- LDC93T1
- LDC93T3A
- LDC94S13A
- LDC94S14A
- LDC94S19
- LDC95S24
- LDC95S26
- LDC95T13
- LDC95T21
- LDC95T6
- LDC95T7
- LDC96L14
- LDC96L15
- LDC96L16
- LDC96L17
- LDC96S35
- LDC96S35-corrections
- LDC96S37
- LDC96S46
- LDC96S47
- LDC96S48
- LDC96S49
- LDC96S50
- LDC96S51
- LDC96S52
- LDC96S53
- LDC96S54
- LDC96S55
- LDC96S56
- LDC96S57
- LDC96S58
- LDC96S59
- LDC96S60
- LDC96T10
- LDC96T16
- LDC96T17
- LDC96T18
- LDC97L18
- LDC97L20
- LDC97L42
- LDC97S42
- LDC97S43
- LDC97S44
- LDC97S45
- LDC97S62
- LDC97S66
- LDC97S66
- LDC97T12
- LDC97T14
- LDC97T15
- LDC97T19
- LDC97T22
- LDC97T62
- LDC98S71
- LDC98S77
- LDC98T24
- LDC98T25
- LDC98T26
- LDC98T28
- LDC98T29
- LDC98T30
- LDC98T31
- LDC99-22
- LDC99T36
- LDC99T42
},
keywords= {Dataset, nlp, natural language, corpus, speech, data, text, corpora, LDC, Linguistic Data Consortium},
terms= {},
license= {},
superseded= {}
}

