OpenWebText (Gokaslan's distribution, 2019), GPT-2 Tokenized
eukaryote31 and Joshua Peterson and Aaron Gokaslan and Vanya Cohen

data-owt (395 files)
owt262.npz 40.76MB
owt1.npz 40.44MB
owt2.npz 40.58MB
owt3.npz 40.61MB
owt4.npz 40.61MB
owt5.npz 40.61MB
owt6.npz 40.62MB
owt7.npz 40.40MB
owt8.npz 40.56MB
owt9.npz 40.58MB
owt10.npz 40.57MB
owt11.npz 40.62MB
owt12.npz 40.58MB
owt13.npz 40.49MB
owt14.npz 40.56MB
owt15.npz 40.56MB
owt16.npz 40.53MB
owt17.npz 40.58MB
owt18.npz 40.57MB
owt19.npz 40.56MB
owt20.npz 40.55MB
owt21.npz 40.50MB
owt22.npz 40.64MB
owt23.npz 40.53MB
owt24.npz 40.59MB
owt25.npz 40.55MB
owt26.npz 40.66MB
owt27.npz 40.54MB
owt28.npz 40.54MB
owt29.npz 40.51MB
owt30.npz 40.57MB
owt31.npz 40.60MB
owt32.npz 40.54MB
owt33.npz 40.42MB
owt34.npz 40.70MB
owt35.npz 40.65MB
owt36.npz 40.67MB
owt37.npz 40.41MB
owt38.npz 40.55MB
owt39.npz 40.56MB
owt40.npz 40.56MB
owt41.npz 40.58MB
owt42.npz 40.60MB
owt43.npz 40.51MB
owt44.npz 40.51MB
owt45.npz 40.28MB
owt46.npz 40.60MB
owt47.npz 40.52MB
owt48.npz 40.50MB
owt49.npz 40.63MB
owt50.npz 40.50MB
owt51.npz 40.65MB
owt52.npz 40.70MB
owt53.npz 40.58MB
owt54.npz 40.60MB
owt55.npz 40.58MB
owt56.npz 40.64MB
owt57.npz 40.57MB
owt58.npz 40.59MB
owt59.npz 40.59MB
owt60.npz 40.50MB
owt61.npz 40.56MB
owt62.npz 40.60MB
owt63.npz 40.65MB
owt64.npz 40.68MB
owt65.npz 40.52MB
owt66.npz 40.44MB
owt67.npz 40.60MB
owt68.npz 40.45MB
owt69.npz 40.57MB
owt70.npz 40.58MB
owt71.npz 40.56MB
owt72.npz 40.59MB
owt73.npz 40.52MB
owt74.npz 40.54MB
owt75.npz 40.56MB
owt76.npz 40.54MB
owt77.npz 40.50MB
owt78.npz 40.50MB
owt79.npz 40.56MB
owt80.npz 40.66MB
owt81.npz 40.55MB
owt82.npz 40.43MB
owt83.npz 40.63MB
owt84.npz 40.56MB
owt85.npz 40.53MB
owt86.npz 40.51MB
owt87.npz 40.54MB
owt88.npz 40.50MB
owt89.npz 40.43MB
owt90.npz 40.61MB
owt91.npz 40.69MB
owt92.npz 40.51MB
owt93.npz 40.61MB
owt94.npz 40.61MB
owt95.npz 40.54MB
owt96.npz 40.60MB
owt97.npz 40.59MB
owt98.npz 40.57MB
owt99.npz 40.58MB
owt100.npz 40.53MB
owt101.npz 40.59MB
owt102.npz 40.57MB
owt103.npz 40.66MB
owt104.npz 40.58MB
owt105.npz 40.60MB
owt106.npz 40.63MB
owt107.npz 40.64MB
owt108.npz 40.61MB
owt109.npz 40.54MB
owt110.npz 40.60MB
owt111.npz 40.54MB
owt112.npz 40.48MB
owt113.npz 40.48MB
owt114.npz 40.53MB
owt115.npz 40.52MB
owt116.npz 40.55MB
owt117.npz 40.51MB
owt118.npz 40.62MB
owt119.npz 40.44MB
owt120.npz 40.46MB
owt121.npz 40.60MB
owt122.npz 40.51MB
owt123.npz 40.56MB
owt124.npz 40.65MB
owt125.npz 40.50MB
owt126.npz 40.65MB
owt127.npz 40.60MB
owt128.npz 40.54MB
owt129.npz 40.59MB
owt130.npz 40.58MB
owt131.npz 40.46MB
owt132.npz 40.68MB
owt133.npz 40.56MB
owt134.npz 40.55MB
owt135.npz 40.50MB
owt136.npz 40.65MB
owt137.npz 40.49MB
owt138.npz 40.59MB
owt139.npz 40.47MB
owt140.npz 40.53MB
owt141.npz 40.54MB
owt142.npz 40.65MB
owt143.npz 40.58MB
owt144.npz 40.68MB
owt145.npz 40.68MB
owt146.npz 40.45MB
owt147.npz 40.61MB
owt148.npz 40.63MB
owt149.npz 40.44MB
owt150.npz 40.57MB
owt151.npz 40.56MB
owt152.npz 40.67MB
owt153.npz 40.55MB
owt154.npz 40.62MB
owt155.npz 40.64MB
owt156.npz 40.51MB
owt157.npz 40.68MB
owt158.npz 40.51MB
owt159.npz 40.65MB
owt160.npz 40.52MB
owt161.npz 40.56MB
owt162.npz 40.52MB
owt163.npz 40.54MB
owt164.npz 40.57MB
owt165.npz 40.52MB
owt166.npz 40.55MB
owt167.npz 40.63MB
owt168.npz 40.71MB
owt169.npz 40.58MB
owt170.npz 40.47MB
owt171.npz 40.54MB
owt172.npz 40.52MB
owt173.npz 40.57MB
owt174.npz 40.56MB
owt175.npz 40.59MB
owt176.npz 40.59MB
owt177.npz 40.61MB
owt178.npz 40.52MB
owt179.npz 40.46MB
owt180.npz 40.51MB
owt181.npz 40.57MB
owt182.npz 40.52MB
owt183.npz 40.54MB
owt184.npz 40.65MB
owt185.npz 40.46MB
owt186.npz 40.50MB
owt187.npz 40.48MB
owt188.npz 40.59MB
owt189.npz 40.62MB
owt190.npz 40.59MB
owt191.npz 40.61MB
owt192.npz 40.50MB
owt193.npz 40.69MB
owt194.npz 40.49MB
owt195.npz 40.41MB
owt196.npz 40.69MB
owt197.npz 40.58MB
owt198.npz 40.45MB
Type: Dataset

title= {OpenWebText (Gokaslan's distribution, 2019), GPT-2 Tokenized},
journal= {},
author= {eukaryote31 and Joshua Peterson and Aaron Gokaslan and Vanya Cohen},
year= {},
url= {},
abstract= {Code by eukaryote31 and Joshua Peterson: and

Scraped by Aaron Gokaslan and Vanya Cohen:

Tokenized by eukaryote31},
keywords= {},
terms= {},
license= {},
superseded= {}
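
The listing does not say which array names each `.npz` shard contains, so it helps to inspect one before loading it. An `.npz` file is a zip archive with one `.npy` member per array, so the Python standard library can list the arrays without assuming their names. This is a minimal sketch: the helper name `list_npz_members` is hypothetical, and the path passed to it is whatever local path a shard such as `owt1.npz` was downloaded to.

```python
import zipfile


def list_npz_members(path):
    """Return [(array_name, uncompressed_bytes)] for each array in an .npz archive.

    Relies only on the fact that .npz is a zip of .npy files; it does not
    assume any particular array names inside the OpenWebText shards.
    """
    members = []
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            name = info.filename
            # Each member is stored as "<array_name>.npy"; strip the suffix.
            if name.endswith(".npy"):
                name = name[:-4]
            members.append((name, info.file_size))
    return members
```

Once the names are known, `numpy.load(path)[name]` returns the corresponding array; its dtype and shape should be checked before the token IDs are fed to a model.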

10 day statistics (12 downloads)

Average Time 5 mins, 00 secs
Average Speed 53.40MB/s
Best Time 4 mins, 59 secs
Best Speed 53.59MB/s
Worst Time 5 mins, 03 secs
Worst Speed 52.88MB/s
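
As a rough consistency check, the average time and speed above imply a total download size, which can be compared with the per-file sizes in the listing. The sketch below assumes the reported speed is the mean throughput over the whole transfer; the function name `estimated_size_mb` is hypothetical.

```python
def estimated_size_mb(speed_mb_per_s, seconds):
    """Estimate transferred size as mean throughput times elapsed time."""
    return speed_mb_per_s * seconds


# Average case from the statistics: 53.40 MB/s for 5 minutes.
total_mb = estimated_size_mb(53.40, 5 * 60)

# Spread over the 395 files in data-owt, this gives the implied mean file size.
mean_file_mb = total_mb / 395
```

Under that assumption the total comes to roughly 16 GB, or a little over 40 MB per file, which is in line with the individual shard sizes listed.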