| Synthetic Data for Text Localisation in Natural Images | 15 | 2021-11-15 | 73.50GB | 476 | 11 | 2 |
| Reading Text in the Wild with Convolutional Neural Networks | 1 | 2021-11-12 | 10.68GB | 1,564 | 27 | 1 |
| PMC Open Access Subset | 16 | 2020-05-24 | 84.14GB | 89 | 7+ | 0 |
| r/WritingPrompts, Text (2018) | 1 | 2019-06-19 | 87.47MB | 191 | 5 | 0 |
| OpenWebText (Gokaslan's distribution, 2019), GPT-2 Tokenized | 395 | 2019-06-01 | 16.02GB | 132 | 4 | 1 |
| Flickr8k Dataset | 2 | 2019-03-09 | 1.12GB | 6,483 | 23+ | 0 |
| Common Crawl corpus - training-parallel-commoncrawl.tgz (CS-EN, DE-EN, ES-EN, FR-EN, RU-EN) | 1 | 2019-02-04 | 918.31MB | 72 | 3+ | 0 |
| UN corpus - training-parallel-un.tgz (ES-EN, FR-EN) | 1 | 2019-02-04 | 2.37GB | 35 | 2+ | 0 |
| Europarl v7 - training-parallel-europarl-v7.tgz (CS-EN, DE-EN, ES-EN, FR-EN) | 1 | 2019-02-04 | 657.63MB | 30 | 3+ | 0 |
| Phishing corpus | 4555 | 2019-01-02 | 37.48MB | 513 | 6+ | 0 |
| 30M Factoid Question-Answer Corpus (30MQA) | 2 | 2018-11-29 | 529.34MB | 600 | 7+ | 0 |
| Indiana University - Chest X-Rays (XML Reports) | 1 | 2018-11-22 | 1.11MB | 4,237 | 16+ | 0 |
| Yelp reviews - Polarity | 1 | 2018-10-16 | 166.37MB | 311 | 2+ | 0 |
| Yelp reviews - Full | 1 | 2018-10-16 | 196.15MB | 310 | 2+ | 0 |
| Sogou news | 1 | 2018-10-16 | 384.27MB | 122 | 3+ | 0 |
| DBPedia ontology | 1 | 2018-10-16 | 68.34MB | 86 | 2+ | 0 |
| Amazon reviews - Polarity | 1 | 2018-10-16 | 688.34MB | 272 | 3+ | 0 |
| Amazon reviews - Full | 1 | 2018-10-16 | 643.70MB | 668 | 2+ | 0 |
| AG News | 1 | 2018-10-16 | 11.78MB | 190 | 3+ | 0 |
| WMT 2015 French/English parallel texts | 1 | 2018-10-16 | 2.60GB | 121 | 2+ | 0 |
| Wikitext-2 | 1 | 2018-10-16 | 4.07MB | 81 | 3+ | 0 |
| Wikitext-103 | 1 | 2018-10-16 | 190.20MB | 145 | 2+ | 0 |
| IMDb Large Movie Review Dataset | 1 | 2018-10-16 | 26.40MB | 689 | 6+ | 0 |
| Microsoft Academic Graph - 2016/02/05 | 1 | 2016-12-25 | 28.94GB | 193 | 5+ | 0 |
| MovieLens 20M Dataset | 1 | 2016-12-16 | 198.70MB | 611 | 7+ | 0 |
| Sentiment Labelled Sentences Data Set | 1 | 2016-08-26 | 512.21kB | 390 | 5+ | 0 |
| Online News Popularity Data Set | 1 | 2016-02-11 | 7.48MB | 2,881 | 8+ | 0 |
| Structured Web Data Extraction Dataset (SWDE) | 1 | 2015-11-29 | 207.31MB | 2,356 | 5 | 0 |
| SMS Spam Collection Data Set | 2 | 2015-11-28 | 695.38kB | 248 | 12+ | 0 |
| Enwiki Word2vec model 1000 Dimensions | 1 | 2015-04-09 | 8.63GB | 3,318 | 8 | 0 |
| Yale YouTube Video Text | 1 | 2014-10-20 | 434.77MB | 1,003 | 6+ | 0 |
| Lerman Twitter 2010 Dataset | 3 | 2014-08-15 | 292.17MB | 2,698 | 15+ | 0 |