oa_bulk-ratarmount_indexes_compressed (18 files)
comm_use.0-9A-B.txt.tar.gz.index.sqlite.gz | 18.99MB
comm_use.A-B.xml.tar.gz.index.sqlite.gz | 22.96MB
comm_use.C-H.txt.tar.gz.index.sqlite.gz | 23.29MB
comm_use.C-H.xml.tar.gz.index.sqlite.gz | 28.76MB
comm_use.I-N.txt.tar.gz.index.sqlite.gz | 23.96MB
comm_use.I-N.xml.tar.gz.index.sqlite.gz | 30.19MB
comm_use.O-Z.txt.tar.gz.index.sqlite.gz | 30.64MB
comm_use.O-Z.xml.tar.gz.index.sqlite.gz | 45.13MB
mount.sh | 1.02kB
non_comm_use.0-9A-B.txt.tar.gz.index.sqlite.gz | 11.00MB
non_comm_use.A-B.xml.tar.gz.index.sqlite.gz | 10.66MB
non_comm_use.C-H.txt.tar.gz.index.sqlite.gz | 15.41MB
non_comm_use.C-H.xml.tar.gz.index.sqlite.gz | 14.65MB
non_comm_use.I-N.txt.tar.gz.index.sqlite.gz | 28.51MB
non_comm_use.I-N.xml.tar.gz.index.sqlite.gz | 28.19MB
non_comm_use.O-Z.txt.tar.gz.index.sqlite.gz | 17.31MB
non_comm_use.O-Z.xml.tar.gz.index.sqlite.gz | 11.84MB
README.md | 1.04kB
Type: Dataset
Tags:
Bibtex:
@article{,
title= {ratarmount indexes for PMC OpenAccess subset},
journal= {},
author= {rngadam@coderbunker.com},
year= {},
url= {},
abstract= {## the problem
The PMC Open Access bulk article set (commercial and non-commercial use) is a hefty
collection of files that weighs in at 79G compressed and 388G uncompressed.
Decompressing the archives alone can take hours.
A BitTorrent mirror exists at:
https://academictorrents.com/details/06d6badd7d1b0cfee00081c28fddd5e15e106165
## the solution
ratarmount (https://github.com/mxmlnkn/ratarmount), a Python application, uses FUSE
(through fusepy) to mount a compressed archive as a disk, allowing us to randomly
access files in the archive without first decompressing it.
To achieve good performance, it creates an index (one SQLite database per archive).
This set of indexes still weighs in at 1.4G uncompressed (345M compressed).
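Since each index is an ordinary SQLite database, it can be inspected with standard tooling once decompressed. The following is a minimal sketch, not part of this dataset: the helper name `list_tables` is hypothetical, and it deliberately assumes nothing about ratarmount's schema, listing only whatever tables the file contains.

```python
import sqlite3


def list_tables(index_path):
    """Return the names of all tables in a decompressed index file."""
    conn = sqlite3.connect(str(index_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

For example, `list_tables("comm_use.A-B.xml.tar.gz.index.sqlite")` would return the table names stored in that index, which is a quick way to confirm that a decompressed index is intact.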
## usage
* decompress all indexes into the same directory where you downloaded oa_bulk
* install ratarmount
* use ratarmount to mount the oa_bulk archives on the disk
A sample script, ```mount.sh```, is provided.
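The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a copy of ```mount.sh```: the helper names (`decompress_indexes`, `mount_commands`, `mount_all`) and the mount-point layout are hypothetical, and it relies on ratarmount's default behaviour of looking for an index named `<archive>.index.sqlite` next to the archive.

```python
import gzip
import shutil
import subprocess
from pathlib import Path

INDEX_SUFFIX = ".index.sqlite"


def decompress_indexes(directory):
    """Gunzip every *.index.sqlite.gz in `directory`, keeping the .gz files."""
    indexes = []
    for gz_path in sorted(Path(directory).glob("*" + INDEX_SUFFIX + ".gz")):
        out_path = gz_path.with_suffix("")  # drop only the trailing ".gz"
        if not out_path.exists():
            with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
        indexes.append(out_path)
    return indexes


def mount_commands(directory, mount_root):
    """Build one `ratarmount <archive> <mount point>` command per archive."""
    commands = []
    for index_path in sorted(Path(directory).glob("*" + INDEX_SUFFIX)):
        # The archive name is the index name without ".index.sqlite".
        archive = index_path.with_name(index_path.name[: -len(INDEX_SUFFIX)])
        if not archive.exists():
            continue  # archive not downloaded, nothing to mount
        # Assumed layout: one mount point per archive under mount_root.
        mount_point = Path(mount_root) / archive.name
        commands.append(["ratarmount", str(archive), str(mount_point)])
    return commands


def mount_all(directory, mount_root):
    """Decompress the indexes, then mount every archive that is present."""
    decompress_indexes(directory)
    for command in mount_commands(directory, mount_root):
        Path(command[2]).mkdir(parents=True, exist_ok=True)
        subprocess.run(command, check=True)
```

Calling `mount_all("oa_bulk", "mnt")` would then expose each archive under `mnt/<archive name>`; both directory names are placeholders. Building the command list separately from running it makes it easy to print and check the commands first.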
## distribution
We also use BitTorrent to distribute the set of indexes. },
keywords= {PMC, PubMed, ratarmount},
terms= {},
license= {CC BY 4.0},
superseded= {}
}