The torrent contains a single .tar file that itself contains multiple .tar files, one per synset, and those in turn contain the JPEG images.
Ideally the torrent should contain the JPEGs directly, with no intermediate archives inside it. That would reduce the amount of I/O needed to start working with the data. It would also make it easier to keep seeding, because you wouldn't need to keep both the tar for seeding and the untarred images for work.
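Until the torrent is repacked, one way to avoid the intermediate files is to stream a single inner archive out of the outer tar and unpack it on the fly. This is only a sketch: `extract_synset` and the archive names are made up for illustration, and it assumes the inner archives are stored in the outer tar under the plain name `<synset>.tar`.

```shell
# Hypothetical helper: unpack one synset straight from the outer tar,
# without ever writing the inner .tar to disk.
extract_synset() {
    local outer="$1"            # path to the outer archive
    local synset="$2"           # inner archive name without the .tar suffix
    local dest="${3:-$synset}"  # target directory, defaults to the synset name

    mkdir -p "$dest" || return 1
    # -O sends the selected member to stdout; the second tar reads it from stdin.
    # The exit status is that of the second tar; under bash, `set -o pipefail`
    # would also surface a failure of the first one.
    tar -xOf "$outer" "$synset.tar" | tar -xf - -C "$dest"
}
```

Usage would look like `extract_synset outer.tar n00000001 images/n00000001` (both names are placeholders). tar still has to scan the outer archive to find the member, so this saves disk space rather than read time.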
I'll just leave this here for those who need to extract all the tar files (in place) after the initial untar is done.
for file in *.tar; do
    name="${file%.*}"
    mkdir -p "$name"
    # Delete the archive only if extraction succeeded
    tar -xf "$file" --directory "$name" && rm "$file"
done
Last edited by alexburtnik at 2020-01-28 11:14:04 GMT
To expand on tailsu's point, hosting a compressed version of the dataset likely isn't doing much to reduce bandwidth, given that JPEGs are already compressed and shrink very little further. Furthermore, a torrent with the individual images (or at least archives for individual synsets) would allow for partial downloads of the dataset.
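If you want to check the compressibility claim on one of your own extracted images, a rough sketch like the following compares a file's size before and after gzip. The function name `gzip_ratio` is made up for this example, and gzip is just one compressor picked for convenience.

```shell
# Hypothetical helper: print the gzip-compressed size of a file
# as a percentage of its original size (100 means no gain at all).
gzip_ratio() {
    local file="$1"
    local orig comp
    orig=$(wc -c < "$file") || return 1
    [ "$orig" -gt 0 ] || return 1       # avoid dividing by zero on empty files
    comp=$(gzip -c "$file" | wc -c)
    echo $(( comp * 100 / orig ))
}
```

Run it as `gzip_ratio path/to/some_image.JPEG` (the path is a placeholder). A result close to 100 would mean that compressing the archive saves almost nothing for that file.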
by tailsu at 2019-09-09 09:01:21 GMT
by kellan at 2019-11-07 11:48:51 GMT
by alexburtnik at 2020-01-28 11:12:11 GMT
by h at 2020-03-03 12:56:18 GMT
by daisyhaohao at 2020-09-24 07:05:41 GMT
by liuyx599 at 2021-01-21 03:45:58 GMT
by wrk226 at 2022-07-28 06:16:44 GMT
I've uploaded it to Baidu Cloud; anyone who needs it is welcome to grab it.
by erotemic at 2024-08-18 18:01:59 GMT