Fandom.com Community Database Dumps Dataset
nyuuzyou

main (26 files)
README.md 2.16kB
fandom_25.parquet 250.43MB
fandom_24.parquet 250.85MB
fandom_23.parquet 251.93MB
fandom_22.parquet 250.47MB
fandom_21.parquet 248.87MB
fandom_20.parquet 247.68MB
fandom_19.parquet 247.21MB
fandom_18.parquet 249.77MB
fandom_17.parquet 248.20MB
fandom_16.parquet 246.68MB
fandom_15.parquet 247.91MB
fandom_14.parquet 250.01MB
fandom_13.parquet 245.79MB
fandom_12.parquet 249.68MB
fandom_11.parquet 249.52MB
fandom_10.parquet 249.11MB
fandom_09.parquet 244.70MB
fandom_08.parquet 250.38MB
fandom_07.parquet 248.41MB
fandom_06.parquet 249.04MB
fandom_05.parquet 246.37MB
fandom_04.parquet 250.50MB
fandom_03.parquet 249.66MB
fandom_02.parquet 252.00MB
fandom_01.parquet 249.48MB
Type: Dataset
Tags:

Bibtex:
@article{,
title= {Fandom.com Community Database Dumps Dataset},
journal= {},
author= {nyuuzyou},
year= {},
url= {},
abstract= {# Dataset Card for Fandom.com Community Database Dumps

### Dataset Summary
This dataset contains 7,040,984 current pages from all available [Fandom.com community wiki dumps](https://community.fandom.com/wiki/Help:Database_download) as of February 18, 2025. It was created by processing the "Current pages" database dump of each wiki. These dumps contain only the current version of each page, without edit history, and include article text, metadata, and structural information across multiple languages.

### Languages
The dataset is multilingual, covering [40+ languages](https://community.fandom.com/wiki/Help:Language_codes).

## Dataset Structure

### Data Fields
This dataset includes the following fields:
- `id`: Unique identifier for the article (string)
- `title`: Title of the article (string)
- `text`: Main content of the article (string)
- `metadata`: Dictionary containing:
  - `templates`: List of templates used in the article
  - `categories`: List of categories the article belongs to
  - `wikilinks`: List of internal wiki links and their text
  - `external_links`: List of external links
  - `sections`: List of section titles and their levels
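As a rough illustration of the layout above, the following sketch checks that a record carries every documented field. The helper name `missing_fields` and the example record are hypothetical, invented for illustration; the element types inside the metadata lists are not specified here, so the sketch only checks that the keys are present.

```python
from typing import Any

# Top-level fields and metadata keys as listed in the dataset card.
TOP_LEVEL_FIELDS = ("id", "title", "text", "metadata")
METADATA_KEYS = ("templates", "categories", "wikilinks", "external_links", "sections")


def missing_fields(record: dict[str, Any]) -> list[str]:
    """Return dotted names of documented fields that are absent from `record`."""
    missing = [name for name in TOP_LEVEL_FIELDS if name not in record]
    metadata = record.get("metadata") or {}
    missing += [f"metadata.{key}" for key in METADATA_KEYS if key not in metadata]
    return missing


# Hypothetical record, invented for illustration; not taken from the dataset.
example = {
    "id": "12345",
    "title": "Example Page",
    "text": "Example article text.",
    "metadata": {
        "templates": ["Infobox"],
        "categories": ["Characters"],
        "wikilinks": [],       # inner element shape is not documented here
        "external_links": [],
        "sections": [],        # inner element shape is not documented here
    },
}

print(missing_fields(example))
```

A record with all documented fields yields an empty list; a record lacking `metadata` is reported as missing that field and each of its documented keys.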

### Data Splits
All examples are in a single split.
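The single split is stored across the 25 Parquet shards in the file listing (`fandom_01.parquet` through `fandom_25.parquet`). A minimal sketch of reading one shard at a time is shown below; it assumes the files have been downloaded to a local directory (the `directory` argument is hypothetical) and that pandas with a Parquet engine such as pyarrow is installed.

```python
from pathlib import Path

NUM_SHARDS = 25  # fandom_01.parquet ... fandom_25.parquet, per the file listing


def shard_filenames(count: int = NUM_SHARDS) -> list[str]:
    """Build the zero-padded shard file names in ascending order."""
    return [f"fandom_{i:02d}.parquet" for i in range(1, count + 1)]


def load_shard(directory: str, index: int):
    """Read one shard into a DataFrame (assumes a local copy of the files)."""
    # Imported lazily so the name helper above works without pandas installed.
    import pandas as pd

    path = Path(directory) / f"fandom_{index:02d}.parquet"
    return pd.read_parquet(path)
```

Each shard is roughly 250 MB on disk, so processing shards one by one is likely gentler on memory than concatenating all of them at once.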

## Additional Information

### License
This dataset inherits the licenses from the source Fandom communities, which use Creative Commons Attribution-ShareAlike 3.0 (CC-BY-SA 3.0).

To learn more about CC-BY-SA 3.0, visit: https://creativecommons.org/licenses/by-sa/3.0/},
keywords= {},
terms= {},
license= {CC-BY-SA 3.0},
superseded= {}
}

