The Pile: An 800GB Dataset of Diverse Text for Language Modeling
EleutherAI

Info hash: 0d366035664fdf51cfbe9f733953ba325776e667
Last mirror activity: 457d, 00:36:25 ago
Size: 772.89GB (772,891,257,239 bytes)
Added: 2021-03-01 01:37:09
Views: 516
Hits: 1112
ID: 4618
Type: multi
Downloaded: 465 time(s)
Uploaded by: joecohen
Folder: EleutherAI_ThePile_v1
Num files: 51 files
File list
Path  Size
README.txt  0.10kB
pile/SHA256SUMS.txt  2.78kB
pile/test.jsonl.zst  460.25MB
pile/train/00.jsonl.zst  15.24GB
pile/train/01.jsonl.zst  15.21GB
pile/train/02.jsonl.zst  15.21GB
pile/train/03.jsonl.zst  15.19GB
pile/train/04.jsonl.zst  15.19GB
pile/train/05.jsonl.zst  15.21GB
pile/train/06.jsonl.zst  15.26GB
pile/train/07.jsonl.zst  15.31GB
pile/train/08.jsonl.zst  15.23GB
pile/train/09.jsonl.zst  15.22GB
pile/train/10.jsonl.zst  15.23GB
pile/train/11.jsonl.zst  15.22GB
pile/train/12.jsonl.zst  15.26GB
pile/train/13.jsonl.zst  15.21GB
pile/train/14.jsonl.zst  15.22GB
pile/train/15.jsonl.zst  15.28GB
pile/train/16.jsonl.zst  15.27GB
pile/train/17.jsonl.zst  15.31GB
pile/train/18.jsonl.zst  15.31GB
pile/train/19.jsonl.zst  15.28GB
pile/train/20.jsonl.zst  15.21GB
pile/train/21.jsonl.zst  15.31GB
pile/train/22.jsonl.zst  15.30GB
pile/train/23.jsonl.zst  15.29GB
pile/train/24.jsonl.zst  15.19GB
pile/train/25.jsonl.zst  15.20GB
pile/train/26.jsonl.zst  15.20GB
pile/train/27.jsonl.zst  15.22GB
pile/train/28.jsonl.zst  15.22GB
pile/train/29.jsonl.zst  15.22GB
pile/val.jsonl.zst  470.91MB
pile_preliminary_components/2020-09-08-arxiv-extracts-nofallback-until-2007-068.tar.gz  17.48GB
pile_preliminary_components/EuroParliamentProceedings_1996_2011.jsonl.zst  1.48GB
pile_preliminary_components/FreeLaw_Opinions.jsonl.zst  17.01GB
pile_preliminary_components/Literotica.jsonl.zst  4.43GB
pile_preliminary_components/NIH_ExPORTER_awarded_grant_text.jsonl.zst  630.78MB
pile_preliminary_components/PMC_extracts.tar.gz  28.28GB
pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst  6.90GB
pile_preliminary_components/PhilArchive.jsonl.zst  797.71MB
pile_preliminary_components/books1.tar.gz  2.40GB
pile_preliminary_components/books3.tar.gz  39.52GB
pile_preliminary_components/github.tar  113.35GB
pile_preliminary_components/hn.tar.gz  706.52MB
pile_preliminary_components/openwebtext2.jsonl.zst.tar  29.34GB
pile_preliminary_components/pile_uspto.tar  11.79GB
pile_preliminary_components/stackexchange_dataset.tar  36.80GB
pile_preliminary_components/ubuntu_irc_until_2020_9_1.jsonl.zst  2.04GB
pile_preliminary_components/yt_subs.jsonl.zst  1.78GB
Mirrors: 0 complete, 0 downloading = 0 mirror(s) total
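
The file list includes pile/SHA256SUMS.txt, which suggests the shards can be checked after download. The following is a minimal sketch of such a check, not a documented procedure for this torrent. It assumes the checksum file uses the common `sha256sum` layout (a hex digest, whitespace, then a path) and that the listed paths are relative to a directory you supply; neither assumption is confirmed by this listing. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_sha256sums(text):
    """Parse lines of the assumed form '<hex digest>  <path>' into {path: digest}."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, _, name = line.partition(" ")
        # sha256sum marks binary mode with a leading '*' on the file name.
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a ~15GB shard is never held in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(root, sums_file):
    """Return {path: True, False, or None}; None means the file was not found."""
    root = Path(root)
    results = {}
    for name, digest in parse_sha256sums(Path(sums_file).read_text()).items():
        target = root / name
        results[name] = sha256_of_file(target) == digest if target.is_file() else None
    return results
```

A call such as `verify("EleutherAI_ThePile_v1/pile", "EleutherAI_ThePile_v1/pile/SHA256SUMS.txt")` would report each listed file as matching, mismatching, or missing, provided the paths inside the checksum file are in fact relative to that directory.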

