Type Name Files Added Size DLs
Flickr8k Dataset 2 2019-03-09 1.12GB 16322+ 0
Common Crawl corpus - training-parallel-commoncrawl.tgz (CS-EN, DE-EN, ES-EN, FR-EN, RU-EN) 1 2019-02-04 918.31MB 218+ 0
UN corpus - training-parallel-un.tgz (ES-EN, FR-EN) 1 2019-02-04 2.37GB 106+ 0
Europarl v7 - training-parallel-europarl-v7.tgz (CS-EN, DE-EN, ES-EN, FR-EN) 1 2019-02-04 657.63MB 65+ 0
Phishing corpus 4555 2019-01-02 37.48MB 10597 0
30M Factoid Question-Answer Corpus (30MQA) 2 2018-11-29 529.34MB 5191+ 0
Indiana University - Chest X-Rays (XML Reports) 1 2018-11-22 1.11MB 3293+ 0
Yelp reviews - Polarity 1 2018-10-16 166.37MB 4989+ 0
Yelp reviews - Full 1 2018-10-16 196.15MB 18594+ 0
Sogou news 1 2018-10-16 384.27MB 4389+ 0
DBPedia ontology 1 2018-10-16 68.34MB 4588+ 0
Amazon reviews - Polarity 1 2018-10-16 688.34MB 8892+ 0
Amazon reviews - Full 1 2018-10-16 643.70MB 242113+ 0
AG News 1 2018-10-16 11.78MB 5787+ 0
WMT 2015 French/English parallel texts 1 2018-10-16 2.60GB 5091+ 0
Wikitext-2 1 2018-10-16 4.07MB 4390+ 0
Wikitext-103 1 2018-10-16 190.20MB 6091+ 0
IMDb Large Movie Review Dataset 1 2018-10-16 26.40MB 382105+ 0
Microsoft Academic Graph - 2016/02/05 1 2016-12-25 28.94GB 14589+ 0
MovieLens 20M Dataset 1 2016-12-16 198.70MB 37895+ 0
Sentiment Labelled Sentences Data Set 1 2016-08-26 512.21kB 30889+ 0
Online News Popularity Data Set 1 2016-02-11 7.48MB 2,138 89+ 0
Structured Web Data Extraction Dataset 1 2015-11-29 207.31MB 2,200 91+ 0
SMS Spam Collection Data Set 2 2015-11-28 695.38kB 14190+ 0
Enwiki Word2vec model 1000 Dimensions 1 2015-04-09 8.63GB 1,543 101 1
Yale YouTube Video Text 1 2014-10-20 434.77MB 8789+ 0
Lerman Twitter 2010 Dataset 3 2014-08-15 292.17MB 2,143 89+ 0