MagnetDB 2024: A Longitudinal Torrent Discovery Dataset with IMDb-Matched Movies and TV Shows
Scott Seidenberger and Noah Pursell and Anindya Maiti

MagnetDB_2024 (5 files)
files_sample.csv 25.58kB
magnetdb_public.sqlite3.tar.zst 27.67GB
matchedVideos_sample.csv 563.39MB
README_MagnetDB_Metadata.pdf 299.70kB
torrents_sample.csv 131.13kB
Type: Dataset
Tags: Dataset, piracy, torrents
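
The main artifact in the file list is a SQLite database shipped as a zstd-compressed tar archive. This listing does not describe its schema, so a reasonable first step after extraction is to enumerate the tables and columns rather than guess them. The following is a minimal sketch using only the Python standard library. The extracted file name `magnetdb_public.sqlite3` is an assumption inferred from the archive name, and `describe_sqlite` is a hypothetical helper, not part of the dataset.

```python
import os
import pathlib
import sqlite3


def describe_sqlite(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    # Open read-only so inspection cannot modify the dataset.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for name in names:
            # Quote the identifier; PRAGMA table_info yields one row per
            # column, with the column name at index 1.
            quoted = '"' + name.replace('"', '""') + '"'
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[name] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumed path: the actual name and location after extraction may differ.
    path = "magnetdb_public.sqlite3"
    if os.path.exists(path):
        for table, columns in describe_sqlite(path).items():
            print(table, columns)
    else:
        print(f"{path} not found; extract the archive first")
```

The accompanying README_MagnetDB_Metadata.pdf is the authoritative description of the tables; this sketch only reports what the database file itself declares.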

Bibtex:
@article{,
title= {MagnetDB 2024: A Longitudinal Torrent Discovery Dataset with IMDb-Matched Movies and TV Shows},
journal= {arXiv preprint arXiv:2501.09275},
author= {Scott Seidenberger and Noah Pursell and Anindya Maiti},
year= {2025},
url= {https://osf.io/9eh47/},
abstract= {BitTorrent remains a prominent channel for illicit distribution of copyrighted material, yet the supply side of such content remains understudied. We introduce MagnetDB, a longitudinal dataset of torrents discovered through the BitTorrent DHT between 2018 and 2024, containing more than 28.6 million torrents and metadata of more than 950 million files. While our primary focus is on enabling research based on the supply of pirated movies and TV shows, the dataset also encompasses other legitimate and illegitimate torrents. By applying IMDb-matching and annotation to movie and TV show torrents, MagnetDB facilitates detailed analyses of pirated content evolution in the BitTorrent network. Researchers can leverage MagnetDB to examine distribution trends, subcultural practices, and the gift economy within piracy ecosystems. Through its scale and temporal scope, MagnetDB presents a unique opportunity for investigating the broader dynamics of BitTorrent and advancing empirical knowledge on digital piracy.},
keywords= {Dataset, piracy, torrents},
terms= {},
license= {CC BY 4.0},
superseded= {}
}
