Name: catalog.archives.gov-lgbt
Creator: None
Published: 2025-02-24 20:37:08
License: https://academictorrents.com/nolicensespecified

catalog.archives.gov-lgbt (11 files)

steps.txt	0.80kB
tifs.tar.zst	984.02GB
search-results.tar.zst	4.38MB
search-result-urls.txt.zst	1.02kB
README	1.32kB
pdfs.tar.zst	114.94GB
keywords.txt	0.58kB
others.tar.zst	46.67MB
generate-urls.py	1.47kB
jpgs.tar.zst	328.99GB
download-urls.txt.zst	418.40kB
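
The listing above gives compressed sizes only. As a rough check before downloading, the sizes can be summed with a short script. This is a minimal sketch, not part of the dataset: it assumes the site reports decimal units (1 GB = 1000 MB), and the names LISTING, UNIT_BYTES, parse_size and parse_listing are made up for illustration.

```python
# Sizes copied from the file listing above.
LISTING = """\
steps.txt 0.80kB
tifs.tar.zst 984.02GB
search-results.tar.zst 4.38MB
search-result-urls.txt.zst 1.02kB
README 1.32kB
pdfs.tar.zst 114.94GB
keywords.txt 0.58kB
others.tar.zst 46.67MB
generate-urls.py 1.47kB
jpgs.tar.zst 328.99GB
download-urls.txt.zst 418.40kB
"""

# Assumption: decimal prefixes. The listing does not say which convention it uses.
UNIT_BYTES = {"kB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text):
    """Convert a string such as '4.38MB' into a number of bytes."""
    number, unit = text[:-2], text[-2:]
    return float(number) * UNIT_BYTES[unit]


def parse_listing(listing):
    """Map each file name in the listing to its size in bytes."""
    sizes = {}
    for line in listing.splitlines():
        name, size = line.rsplit(None, 1)
        sizes[name] = parse_size(size)
    return sizes


sizes = parse_listing(LISTING)
total_bytes = sum(sizes.values())
print(f"{len(sizes)} files, {total_bytes / 1e12:.2f} TB compressed")
# prints: 11 files, 1.43 TB compressed
```

Under that assumption the full torrent is about 1.43 TB to download, with tifs.tar.zst accounting for roughly two thirds of it.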

Type: Dataset

Tags: usa, united states, archives.gov, nara

Metadata:

@article{,
title= {catalog.archives.gov-lgbt},
journal= {},
author= {},
year= {},
url= {},
abstract= {A partial mirror of catalog.archives.gov, filtered for the LGBTQ-related
keywords found in keywords.txt. The list of keywords was obtained from
catalog links that were in turn manually collected from
https://www.archives.gov/research/lgbt

For each keyword in keywords.txt, we fire off a search and attempt to download
all metadata (folder "search-results") and attachments (folders "tifs",
"jpgs", "pdfs", "others").

Folders are packed as Zstandard-compressed tarballs to save space and to
reduce overhead in torrent metadata. Unpacked, the data totals approximately
3 TB, of which tifs account for 2.6 TB.

Overview:

search-results.tar.zst contains all JSON metadata that would be available on
search results pages. This includes descriptions, authorship, year of each
record found, and a list of download URLs for PDFs etc. It's best to download
those files first to determine whether this dataset contains something
specific you need.

pdfs.tar.zst, tifs.tar.zst, jpgs.tar.zst, others.tar.zst contain the actual
downloads, segmented by file type for compression purposes.
download-urls.txt.zst contains the list of AWS S3 URLs that were downloaded
into those folders.

generate-urls.py was used to scrape the catalog for metadata. The detailed
procedure for scraping is outlined in steps.txt.

Data captured around 2025-02-23.},
keywords= {united states,usa,archives.gov,nara},
terms= {},
license= {},
superseded= {}
}
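
The abstract recommends fetching search-results.tar.zst first to see whether the dataset holds what you need. One way to browse that metadata without unpacking it to disk is to stream the tarball, sketched below. The internal layout of the archive is not documented here, so the sketch assumes the records are stored as members whose names end in ".json". The helper names iter_json_records and zstd_stdout are hypothetical, and zstd_stdout assumes the zstd command-line tool is installed.

```python
import json
import subprocess
import tarfile


def iter_json_records(tar_stream):
    """Yield (member name, parsed JSON) for each .json file in an
    uncompressed tar stream, reading members sequentially."""
    with tarfile.open(fileobj=tar_stream, mode="r|") as archive:
        for member in archive:
            # Assumption: metadata records are regular files named *.json.
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            yield member.name, json.load(archive.extractfile(member))


def zstd_stdout(path):
    """Start `zstd -dc` on path and return the process; its stdout is
    the decompressed tar stream."""
    return subprocess.Popen(["zstd", "-dc", path], stdout=subprocess.PIPE)
```

A possible use, assuming search-results.tar.zst is in the current directory:

```python
process = zstd_stdout("search-results.tar.zst")
for name, record in iter_json_records(process.stdout):
    print(name)
process.wait()
```

Streaming keeps memory use low because only one member is held at a time, at the cost of not being able to seek back to earlier members.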

Citation:

catalog.archives.gov-lgbt. (2025). [Data set]. Academic Torrents. https://academictorrents.com/details/b67dfb8cc94a98c4dc5e4fd23201df7beb5c2a7c