wikipedia_bin (2 files)
wiki_text_sentence.bin | 6.29GB
wiki_text_sentence.idx | 1.55GB
Type: Dataset
Tags: BERT; NLP
Metadata:
@article{,
title= {Wikipedia Training Data for Megatron-LM},
journal= {},
author= {},
year= {},
url= {},
abstract= {A preprocessed Wikipedia dataset for training with Megatron-LM (https://github.com/NVIDIA/Megatron-LM). See https://github.com/Lyken17/ML-Datasets for usage instructions.
Note: the author does not hold the copyright to any of the data.},
keywords= {BERT; NLP;},
terms= {},
license= {},
superseded= {}
}
Citation:
Wikipedia Training Data for Megatron-LM. (2021). [Data set]. Academic Torrents. https://academictorrents.com/details/b6215a898a2a08b6061d23f2e4e1094121fb7082