30M Factoid Question-Answer Corpus (30MQA)
Iulian Vlad Serban and Alberto García-Durán and Caglar Gulcehre and Sungjin Ahn and Sarath Chandar and Aaron Courville and Yoshua Bengio

30MQA (2 files)
30MQA_1.tar.gz 315.96MB
30MQA_2.tar.gz 213.39MB
Type: Dataset

Bibtex:
@article{,
title= {30M Factoid Question-Answer Corpus (30MQA)},
keywords= {},
author= {Iulian Vlad Serban and Alberto García-Durán and Caglar Gulcehre and Sungjin Ahn and Sarath Chandar and Aaron Courville and Yoshua Bengio},
abstract= {The 30M Factoid Question-Answer Corpus consists of 30 million natural language questions in English, each paired with its corresponding fact in the Freebase knowledge base.

The dataset is formatted as a text file, where each line contains four tab-separated fields:

```
    <subject> \t <relationship> \t <object> \t natural language question
```

where <subject>, <relationship> and <object> are the Freebase identifiers of the subject, relationship and object of the fact corresponding to the natural language question.

For a more detailed description, have a look at our paper:

Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
http://arxiv.org/abs/1603.06807

Sample:

```
<http://rdf.freebase.com/ns/m.04whkz5>	www.freebase.com/book/written_work/subjects	<http://rdf.freebase.com/ns/m.01cj3p>	what is the book e about ?
<http://rdf.freebase.com/ns/m.0tp2p24>	www.freebase.com/music/release_track/release	<http://rdf.freebase.com/ns/m.0sjc7c1>	in what release does the release track cardiac arrest come from ?
<http://rdf.freebase.com/ns/m.04j0t75>	www.freebase.com/film/film/country	<http://rdf.freebase.com/ns/m.07ssc>	what country is the debt from ?
<http://rdf.freebase.com/ns/m.0ftqr>	www.freebase.com/music/producer/tracks_produced	<http://rdf.freebase.com/ns/m.0p600l>	what songs have nobuo uematsu produced ?
<http://rdf.freebase.com/ns/m.036p007>	www.freebase.com/music/release/producers	<http://rdf.freebase.com/ns/m.0677ng>	who produced eve-olution ?
<http://rdf.freebase.com/ns/m.0ms5mg>	www.freebase.com/music/recording/artist	<http://rdf.freebase.com/ns/m.0mjn2>	which artist recorded most of us are sad ?
```
},
terms= {},
license= {Creative Commons Attribution 3.0 Unported},
superseded= {},
url= {}
}
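
The tab-separated line format described in the abstract can be read with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the names `FactoidQA`, `strip_ns` and `parse_line` are hypothetical, and it assumes, based only on the sample lines in the abstract, that subject and object are wrapped in angle brackets under the `http://rdf.freebase.com/ns/` prefix while the relationship is a plain `www.freebase.com/...` path.

```python
from typing import NamedTuple

# Assumption: namespace prefix as it appears in the sample lines of the abstract.
FREEBASE_NS = "http://rdf.freebase.com/ns/"


class FactoidQA(NamedTuple):
    """One corpus line: a Freebase fact plus its natural language question."""
    subject: str
    relationship: str
    obj: str
    question: str


def strip_ns(uri: str) -> str:
    """Reduce '<http://rdf.freebase.com/ns/m.07ssc>' to 'm.07ssc'.

    Strings that do not match this shape are returned unchanged
    (apart from surrounding whitespace).
    """
    inner = uri.strip()
    if inner.startswith("<") and inner.endswith(">"):
        inner = inner[1:-1]
    if inner.startswith(FREEBASE_NS):
        inner = inner[len(FREEBASE_NS):]
    return inner


def parse_line(line: str) -> FactoidQA:
    """Split one line into subject, relationship, object and question."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 tab-separated fields, got {len(fields)}")
    subject, relationship, obj, question = (f.strip() for f in fields)
    return FactoidQA(strip_ns(subject), relationship, strip_ns(obj), question)


# One of the sample lines from the abstract, with tabs written as \t.
sample = (
    "<http://rdf.freebase.com/ns/m.04j0t75>\t"
    "www.freebase.com/film/film/country\t"
    "<http://rdf.freebase.com/ns/m.07ssc>\t"
    "what country is the debt from ?"
)
record = parse_line(sample)
print(record.subject, record.relationship, record.obj, record.question, sep=" | ")
```

Applying `parse_line` to each line of the extracted text file yields one record per question. The names of the files inside the two archives are not listed here, so locating and opening them is left to the reader.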
