main (2 files)
data.parquet | 225.87MB
README.md | 1.73kB
Type: Dataset
Bibtex:
@article{,
title= {Russian QnA 333K},
journal= {},
author= {nyuuzyou},
year= {},
url= {https://huggingface.co/datasets/nyuuzyou/ru-QnA-333K},
abstract= {---
pretty_name: Russian QnA
size_categories:
- 100K<n<1M
task_categories:
- question-answering
- text-generation
- text-classification
annotations_creators:
- found
language:
- ru
multilinguality:
- monolingual
configs:
- config_name: default
data_files:
- split: train
path: data.parquet
default: true
license: other
---
# Dataset Card for Russian QnA
### Dataset Summary
This dataset contains 333,029 question-and-answer records in Russian. The questions span various categories, and each record carries its answers, ratings, and metadata.
### Languages
The dataset content is primarily in Russian:
- Russian (ru)
## Dataset Structure
### Data Files
- Single file containing all Q&A records: `data.parquet`
### Data Fields
Each record contains the following fields:
- `question_id`: Unique identifier for the question.
- `question_title`: Title/subject of the question.
- `question_description`: Extended description or body of the question.
- `question_images`: Array of image URLs associated with the question.
- `category`: Category/topic area of the question (e.g., "здоровье и медицина").
- `tags`: Array of tags associated with the question.
- `question_rating`: Rating/score of the question.
- `answers`: Array of answer objects, each containing:
  - `answer_text`: Text content of the answer.
  - `answer_images`: Array of image URLs in the answer.
  - `answer_rating`: Rating/score of the answer.
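
Because each record nests its answers in an array, many tasks need one row per question-answer pair instead. The following is a minimal sketch of that flattening, assuming records arrive as plain mappings keyed by the field names listed above. The helper names `iter_qa_pairs` and `best_answer` are hypothetical, the sample record is made up for illustration, and the ratings are assumed to be numeric, which the card does not state.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional


def iter_qa_pairs(records: Iterable[Mapping[str, Any]]) -> Iterator[dict]:
    """Yield one flat row per (question, answer) pair."""
    for record in records:
        # Assumption: `answers` may be missing or None for unanswered questions.
        answers = record.get("answers")
        if answers is None:
            answers = []
        for answer in answers:
            yield {
                "question_id": record["question_id"],
                "question_title": record["question_title"],
                "category": record.get("category"),
                "answer_text": answer["answer_text"],
                "answer_rating": answer.get("answer_rating"),
            }


def best_answer(record: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return the highest-rated answer of a record, or None if it has none."""
    answers = record.get("answers")
    if answers is None or len(answers) == 0:
        return None
    # Assumption: `answer_rating` is numeric; a missing rating counts as 0.
    return max(answers, key=lambda a: a.get("answer_rating") or 0)


# Made-up record mirroring the documented field names (not real data).
sample = {
    "question_id": 1,
    "question_title": "example question",
    "question_description": "",
    "question_images": [],
    "category": "здоровье и медицина",
    "tags": [],
    "question_rating": 3,
    "answers": [
        {"answer_text": "first answer", "answer_images": [], "answer_rating": 2},
        {"answer_text": "second answer", "answer_images": [], "answer_rating": 5},
    ],
}

pairs = list(iter_qa_pairs([sample]))
```

With pandas installed, records of this shape could likely be obtained with `pandas.read_parquet("data.parquet").to_dict("records")`, although the exact in-memory types of the array fields depend on the Parquet reader.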
### Data Splits
The dataset contains a single split with all Q&A records:
| Split | Description | Number of Examples |
| :------ | :------------------------------- | -----------------: |
| `train` | All question-answer pairs | 333,029 |
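
As a rough sketch of how the split might be loaded and checked against the table above: the repository id is taken from the dataset URL, the split name from the card's `configs` section, and the helper names `load_train_split` and `row_count_matches` are hypothetical. The loader assumes the Hugging Face `datasets` package is installed and that the hosted copy still matches this card.

```python
REPO_ID = "nyuuzyou/ru-QnA-333K"  # from the dataset URL
SPLIT = "train"                   # the only split declared in the card
EXPECTED_ROWS = 333_029           # row count from the Data Splits table


def load_train_split():
    """Load the `train` split with the Hugging Face `datasets` library.

    Needs `pip install datasets` and network access. The import is lazy so
    the rest of this sketch has no third-party dependency.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=SPLIT)


def row_count_matches(num_rows: int, expected: int = EXPECTED_ROWS) -> bool:
    """Return True if a loaded split has the documented number of rows."""
    return num_rows == expected
```

After `ds = load_train_split()`, calling `row_count_matches(ds.num_rows)` gives a quick sanity check; a mismatch would suggest the hosted data has changed since the card was written.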
},
keywords= {},
terms= {},
license= {},
superseded= {}
}