ratarmount indexes for PMC OpenAccess subset

oa_bulk-ratarmount_indexes_compressed (18 files)
comm_use.0-9A-B.txt.tar.gz.index.sqlite.gz 18.99MB
comm_use.A-B.xml.tar.gz.index.sqlite.gz 22.96MB
comm_use.C-H.txt.tar.gz.index.sqlite.gz 23.29MB
comm_use.C-H.xml.tar.gz.index.sqlite.gz 28.76MB
comm_use.I-N.txt.tar.gz.index.sqlite.gz 23.96MB
comm_use.I-N.xml.tar.gz.index.sqlite.gz 30.19MB
comm_use.O-Z.txt.tar.gz.index.sqlite.gz 30.64MB
comm_use.O-Z.xml.tar.gz.index.sqlite.gz 45.13MB
mount.sh 1.02kB
non_comm_use.0-9A-B.txt.tar.gz.index.sqlite.gz 11.00MB
non_comm_use.A-B.xml.tar.gz.index.sqlite.gz 10.66MB
non_comm_use.C-H.txt.tar.gz.index.sqlite.gz 15.41MB
non_comm_use.C-H.xml.tar.gz.index.sqlite.gz 14.65MB
non_comm_use.I-N.txt.tar.gz.index.sqlite.gz 28.51MB
non_comm_use.I-N.xml.tar.gz.index.sqlite.gz 28.19MB
non_comm_use.O-Z.txt.tar.gz.index.sqlite.gz 17.31MB
non_comm_use.O-Z.xml.tar.gz.index.sqlite.gz 11.84MB
README.md 1.04kB
Type: Dataset
Tags: PMC, PubMed, ratarmount

title= {ratarmount indexes for PMC OpenAccess subset},
journal= {},
author= {rngadam@coderbunker.com},
year= {},
url= {},
abstract= {## the problem

The PMC Open Access bulk article set (commercial and non-commercial) is a hefty set of files
that weighs in at 79G compressed and 388G uncompressed.

Decompressing the archives can by itself take hours.

A bittorrent mirror exists on:


## the solution

ratarmount (https://github.com/mxmlnkn/ratarmount), a Python application, uses FUSE
(through fusepy) to mount a compressed archive as a disk, giving us random access
to the files in the archive without decompressing it first.

To achieve good performance, it creates an index (an sqlite database per archive).
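
To get a feel for what such an index contains, it can be opened like any other sqlite database. The sketch below assumes the `sqlite3` command-line tool is installed and that the index has already been decompressed; `list_index_tables` is a hypothetical helper name, and it only lists table names rather than assuming anything about the schema ratarmount uses.

```shell
#!/bin/sh
# Sketch: list the tables of one decompressed index without assuming its schema.
# list_index_tables is a hypothetical helper, not part of ratarmount.
list_index_tables() {
    index="$1"
    [ -f "$index" ] || { echo "no such index: $index" >&2; return 1; }
    command -v sqlite3 >/dev/null 2>&1 || { echo "sqlite3 not installed" >&2; return 1; }
    sqlite3 "$index" '.tables'
}

# Example call (uncomment once the index has been decompressed):
# list_index_tables comm_use.A-B.xml.tar.gz.index.sqlite
```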

This set of indexes still weighs in at 1.4G uncompressed (345M compressed).

## usage

* decompress all indexes into the same directory where you've downloaded oa_bulk
* install ratarmount
* use ratarmount to mount the oa_bulk archives on the disk

a sample script ```mount.sh``` is provided
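
The steps above could look roughly like the following. This is a minimal sketch and not the contents of ```mount.sh```: `OA_DIR`, `MNT_DIR` and `archive_for_index` are hypothetical names, and it assumes ratarmount picks up an index named `<archive>.index.sqlite` sitting next to the archive and still accepts it for your copy of that archive.

```shell
#!/bin/sh
# Sketch only: decompress each index next to its archive, then mount the archive.
# Install ratarmount first, e.g.: pip install ratarmount
# OA_DIR and MNT_DIR are hypothetical locations; adjust them to your setup.
OA_DIR="${OA_DIR:-./oa_bulk}"
MNT_DIR="${MNT_DIR:-./mnt}"

# Derive the archive name from a compressed index name, e.g.
# comm_use.A-B.xml.tar.gz.index.sqlite.gz -> comm_use.A-B.xml.tar.gz
archive_for_index() {
    basename "$1" .index.sqlite.gz
}

for idx in "$OA_DIR"/*.index.sqlite.gz; do
    [ -e "$idx" ] || continue                      # glob matched nothing
    archive="$OA_DIR/$(archive_for_index "$idx")"
    [ -f "$archive" ] || continue                  # archive not downloaded yet
    gzip -dc "$idx" > "$archive.index.sqlite"      # index lands beside the archive
    mountpoint="$MNT_DIR/$(basename "$archive" .tar.gz)"
    mkdir -p "$mountpoint"
    ratarmount "$archive" "$mountpoint"
done
```

Each archive ends up under its own directory in `MNT_DIR`; on Linux a mount can be released again with `fusermount -u <mountpoint>`.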

## distribution

we also use bittorrent to distribute the set of indexes. },
keywords= {PMC, PubMed, ratarmount},
terms= {},
license= {CC BY 4.0},
superseded= {}
