main (16 files)
data/train-00000-of-00015.parquet | 201.65MB
data/train-00001-of-00015.parquet | 207.60MB
data/train-00002-of-00015.parquet | 230.08MB
data/train-00003-of-00015.parquet | 216.15MB
data/train-00004-of-00015.parquet | 201.78MB
data/train-00005-of-00015.parquet | 234.70MB
data/train-00006-of-00015.parquet | 257.54MB
data/train-00007-of-00015.parquet | 233.57MB
data/train-00008-of-00015.parquet | 229.98MB
data/train-00009-of-00015.parquet | 205.83MB
data/train-00010-of-00015.parquet | 203.11MB
data/train-00011-of-00015.parquet | 193.10MB
data/train-00012-of-00015.parquet | 208.23MB
data/train-00013-of-00015.parquet | 191.64MB
data/train-00014-of-00015.parquet | 38.32MB
README.md | 1.54kB
Type: Dataset
Bibtex:
@article{,
title= {PPT Online},
journal= {},
author= {nyuuzyou},
year= {},
url= {https://huggingface.co/datasets/nyuuzyou/pptonline},
abstract= {### Dataset Summary
This dataset contains metadata about 1,418,349 PowerPoint (.ppt) files hosted on the ppt-online.org platform. PPT Online is a service designed to display PowerPoint presentations. The dataset includes information such as presentation titles, categories, file sizes, and content snippets. The majority of the presentations are in Russian, Ukrainian, Belarusian, Kazakh, and English, but other languages are also present.
### Languages
The dataset is multilingual. The primary languages are Russian, Ukrainian, Belarusian, Kazakh, and English, and presentations in other languages are also included.
## Dataset Structure
### Data Fields
This dataset includes the following fields:
- `id`: Unique identifier for the presentation (integer)
- `title`: Title of the PowerPoint presentation (string)
- `category`: Category or topic of the presentation (string)
- `file_size`: Size of the PowerPoint file (string)
- `body_content`: Snippet or summary of the presentation content, generated by a service and of fairly low quality (string)
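The field list above can be mirrored in code as a small typed record. The following is a minimal sketch, not part of the dataset's own tooling: the names `Presentation`, `to_presentation`, and `iter_presentations` are hypothetical, the split name `train` is inferred from the shard file names, and the strict type check assumes rows contain no null values, which the card does not state.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Mapping

# Field names and types as listed in the Data Fields section.
EXPECTED_FIELDS = (
    ("id", int),
    ("title", str),
    ("category", str),
    ("file_size", str),
    ("body_content", str),
)


@dataclass(frozen=True)
class Presentation:
    """Hypothetical typed wrapper around one metadata row."""
    id: int
    title: str
    category: str
    file_size: str  # kept as a string; its exact format is not documented
    body_content: str


def to_presentation(row: Mapping[str, Any]) -> Presentation:
    """Check a raw row against the documented fields and wrap it."""
    for name, expected_type in EXPECTED_FIELDS:
        if name not in row:
            raise KeyError(f"missing field: {name}")
        # Assumption: values are never null; relax this check if they are.
        if not isinstance(row[name], expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
    return Presentation(**{name: row[name] for name, _ in EXPECTED_FIELDS})


def iter_presentations(limit: int) -> Iterator[Presentation]:
    """Stream up to `limit` rows from the Hub without downloading all shards."""
    # Imported lazily because `datasets` is a third-party dependency.
    from datasets import load_dataset

    # Assumption: the single split is named "train", as the file names suggest.
    rows = load_dataset("nyuuzyou/pptonline", split="train", streaming=True)
    for row in rows.take(limit):
        yield to_presentation(row)
```

Calling `iter_presentations(5)` would fetch the first five rows over the network. Streaming is used here only because the full set of shards is roughly 3 GB.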
### Data Splits
All examples are in a single `train` split.},
keywords= {},
terms= {},
license= {},
superseded= {}
}