PPT Online
nyuuzyou

folder main (16 files)
data/train-00000-of-00015.parquet 201.65MB
data/train-00001-of-00015.parquet 207.60MB
data/train-00002-of-00015.parquet 230.08MB
data/train-00003-of-00015.parquet 216.15MB
data/train-00004-of-00015.parquet 201.78MB
data/train-00005-of-00015.parquet 234.70MB
data/train-00006-of-00015.parquet 257.54MB
data/train-00007-of-00015.parquet 233.57MB
data/train-00008-of-00015.parquet 229.98MB
data/train-00009-of-00015.parquet 205.83MB
data/train-00010-of-00015.parquet 203.11MB
data/train-00011-of-00015.parquet 193.10MB
data/train-00012-of-00015.parquet 208.23MB
data/train-00013-of-00015.parquet 191.64MB
data/train-00014-of-00015.parquet 38.32MB
README.md 1.54kB
Type: Dataset
Tags:

Bibtex:
@article{,
title= {PPT Online},
journal= {},
author= {nyuuzyou},
year= {},
url= {https://huggingface.co/datasets/nyuuzyou/pptonline},
abstract= {### Dataset Summary

This dataset contains metadata about 1,418,349 PowerPoint (.ppt) files hosted on the ppt-online.org platform. PPT Online is a service designed to display PowerPoint presentations. The dataset includes information such as presentation titles, categories, file sizes, and content snippets. The majority of the presentations are in Russian, Ukrainian, Belarusian, Kazakh, and English, but other languages are also present.

### Languages

The dataset is multilingual, with the primary languages being Russian, Ukrainian, Belarusian, Kazakh, and English. However, presentations in other languages are also included.

## Dataset Structure

### Data Fields

This dataset includes the following fields:

- `id`: Unique identifier for the presentation (integer)
- `title`: Title of the PowerPoint presentation (string)
- `category`: Category or topic of the presentation (string)
- `file_size`: Size of the PowerPoint file (string)
- `body_content`: Snippet or summary of the presentation content, generated automatically by a service and of fairly low quality (string)
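
As a minimal sketch of how the field list above could be used, the following Python snippet checks that one record carries the five listed fields with the listed types. The helper name `schema_problems`, the sample record, and the assumption that no field is null are all hypothetical and are not taken from the dataset itself; in particular, the `file_size` value shown is invented, since the actual format of that string is not documented here.

```python
# Field names and types as listed in the Data Fields section.
# Assumption: fields are never null; the dataset card does not confirm this.
EXPECTED_FIELDS = (
    ("id", int),
    ("title", str),
    ("category", str),
    ("file_size", str),
    ("body_content", str),
)


def schema_problems(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS:
        if name not in record:
            problems.append("missing field: " + name)
        elif not isinstance(record[name], expected_type):
            problems.append("wrong type for field: " + name)
    return problems


# Hypothetical record, invented for illustration (not real dataset content).
sample = dict(
    id=1,
    title="Example presentation",
    category="Education",
    file_size="1.2 MB",
    body_content="Example snippet",
)

# An empty list means the record matches the listed fields.
print(schema_problems(sample))
```

The check expects plain Python values, such as the dictionaries produced when iterating over a dataset loaded with the Hugging Face `datasets` library. Rows read through pandas would typically carry NumPy integer types for `id`, which this strict `isinstance` test would flag, so the type table would need adjusting in that case.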

### Data Splits

All examples are in a single split.},
keywords= {},
terms= {},
license= {},
superseded= {}
}

