<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:academictorrents="http://academictorrents.com/" version="2.0">
<channel>
<title>Joe's Recommended Mirror List - Academic Torrents</title>
<description>collection curated by joecohen</description>
<link>https://academictorrents.com/collection/joes-recommended-mirror-list</link>
<item>
<title>CAMUS Cardiac Acquisitions for Multi-structure Ultrasound Segmentation (Dataset)</title>
<description>@article{,
title= {CAMUS Cardiac Acquisitions for Multi-structure Ultrasound Segmentation},
keywords= {},
author= {},
abstract= {The goal of this project is to provide all the materials to the community to resolve the problem of echocardiographic image segmentation and volume estimation from 2D ultrasound sequences (both two- and four-chamber views). To this aim, the following solutions were set up:

- introduction of the largest publicly available and fully annotated dataset for 2D echocardiographic assessment (to our knowledge). The CAMUS dataset, containing 2D apical four-chamber and two-chamber view sequences acquired from 500 patients, is made available for download.

# Dataset properties

The overall CAMUS dataset consists of clinical exams from 500 patients, acquired at the University Hospital of St Etienne (France) and included in this study within the regulations set by the local ethical committee of the hospital, after full anonymization. The acquisitions were optimized to perform left ventricle ejection fraction measurements. In order to enforce clinical realism, no prerequisite was imposed and no data selection was performed. Consequently,

- some cases were difficult to trace;

- the dataset involves a wide variability of acquisition settings;

- for some patients, parts of the wall were not visible in the images;

- for some cases, the probe orientation recommendation to acquire a rigorous four-chamber view was simply impossible to follow, and a five-chamber view was acquired instead.

This produced a highly heterogeneous dataset, both in terms of image quality and pathological cases, which is typical of daily clinical practice data.

The dataset has been made available to the community. It comprises: i) a training set of 450 patients along with the corresponding manual references based on the analysis of one clinical expert; ii) a testing set composed of 50 new patients. The raw input images are provided in the raw/mhd file format; a loading sketch is shown below.
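
A minimal loading sketch (assuming the SimpleITK package and a hypothetical patient path; not part of the official materials):

```
import SimpleITK as sitk

# Read one raw/mhd image pair; the .mhd header references the .raw file.
image = sitk.ReadImage("training/patient0001/patient0001_4CH_ED.mhd")
array = sitk.GetArrayFromImage(image)  # numpy array of pixel intensities
print(array.shape, array.dtype, image.GetSpacing())
```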

# Study population

Half of the dataset population has a left ventricle ejection fraction lower than 45%, thus being considered at pathological risk (beyond the uncertainty of the measurement). Also, 19% of the images are of poor quality (based on the opinion of one expert), indicating that for this subgroup the localization of the left ventricle endocardium and left ventricle epicardium, as well as the estimation of clinical indices, are not considered clinically accurate and workable. In classical analysis, poor-quality images are usually removed from the dataset because of their clinical uselessness. Therefore, those data were not involved in this project during the computation of the different metrics, but were used to study their influence as part of the training and validation sets for deep learning techniques.

# Involved systems

The full dataset was acquired from GE Vivid E95 ultrasound scanners (GE Vingmed Ultrasound, Horten, Norway), with a GE M5S probe (GE Healthcare, US). No protocol other than the one used in clinical routine was put in place. For each patient, 2D apical four-chamber and two-chamber view sequences were exported from the EchoPAC analysis software (GE Vingmed Ultrasound, Horten, Norway). These standard cardiac views were chosen for this study to enable the estimation of left ventricle ejection fraction values based on Simpson's biplane method of discs (sketched below). Each exported sequence corresponds to a set of B-mode images expressed in polar coordinates. The same interpolation procedure was used to express all sequences in Cartesian coordinates with a unique grid resolution, i.e. λ/2 = 0.3 mm along the x-axis (axis parallel to the probe) and λ/4 = 0.15 mm along the z-axis (axis perpendicular to the probe), where λ corresponds to the wavelength of the ultrasound probe. At least one full cardiac cycle was acquired for each patient in each view, allowing manual annotation of cardiac structures at end-diastole (ED) and end-systole (ES).
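
For readers unfamiliar with the measurement, a generic sketch of Simpson's biplane method of discs (an illustration, not code from the CAMUS authors):

```
import math

def biplane_volume(a, b, length, n_discs=20):
    # V = (pi/4) * sum_i(a_i * b_i) * (L / N), with paired disc diameters
    # a_i, b_i measured in the 4-chamber and 2-chamber views and long-axis
    # length L; all lengths in mm, volume in mm^3.
    assert len(a) == len(b) == n_discs
    return (math.pi / 4.0) * sum(ai * bi for ai, bi in zip(a, b)) * (length / n_discs)

def ejection_fraction(edv, esv):
    # Ejection fraction (%) from end-diastolic and end-systolic volumes.
    return 100.0 * (edv - esv) / edv
```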

This work has been published in the IEEE TMI journal. You must cite this paper for any use of the CAMUS database:

```
S. Leclerc, E. Smistad, J. Pedrosa, A. Ostvik, et al.
"Deep Learning for Segmentation using an Open Large-Scale Dataset in 2D Echocardiography" in IEEE Transactions on Medical Imaging, vol. 38, no. 9, pp. 2198-2210, Sept. 2019.
```




https://i.imgur.com/aVBYSWH.jpg},
terms= {},
license= {},
superseded= {},
url= {https://www.creatis.insa-lyon.fr/Challenge/camus/}
}

</description>
<link>https://academictorrents.com/download/ae545c1e3ce045c33942f89e67f618a6439104a6</link>
</item>
<item>
<title>HMC-QU echocardiography ultrasound recordings (Dataset)</title>
<description>@article{,
title= {HMC-QU echocardiography ultrasound recordings},
keywords= {ultrasound},
author= {},
abstract= {The HMC-QU benchmark dataset is created by the collaboration between Hamad Medical Corporation (HMC), Tampere University, and Qatar University. The usage of data has been approved by the local ethics board of HMC Hospital in February 2019. The dataset includes a collection of apical 4-chamber (A4C) and apical 2-chamber (A2C) view 2D echocardiography recordings obtained during the years 2018 and 2019. The echocardiography recordings were acquired with devices from different vendors, namely Philips and GE Vivid (GE-Health-USA) ultrasound machines. The temporal resolution (frame rate per second) of the echocardiography recordings is 25 fps. The spatial resolution varies from 422x636 to 768x1024 pixels. The dataset can be utilized for both myocardial infarction (heart attack) detection and left ventricle wall segmentation purposes.

# Detection of Myocardial Infarction

HMC-QU is the first dataset that is shared with the research community serving myocardial infarction (MI) detection on the left ventricle wall of the heart. The recordings are from over 10,000 echos performed in a year including more than 800 cases admitted with acute ST-elevation MI. The patients with MI were treated with coronary angiogram/angioplasty after the diagnosis of acute MI with electrocardiography and cardiac enzymes evidence. The patients had echocardiography recordings obtained within 24 hours of admission or in some cases before they underwent coronary angioplasty. The subjects not diagnosed with MI underwent a required health check and investigation for other reasons in the hospital.

The ground-truth labels are provided for each myocardial segment illustrated in Figure 1 as non-MI and MI, where the MI term indicates any sign of regional wall motion abnormality, whereas the subjects without regional wall motion abnormality are assigned to non-MI. Frames covering one cardiac cycle are predefined for each recording. End-diastole and end-systole frames are defined according to the electrocardiography (ECG) recordings of the patients. For the patients without ECG recordings, the cardiac cycle is defined according to the frames where the left ventricle area is the largest and smallest (see the sketch below).
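
A minimal sketch of that area-based rule (an illustration of the description above, not the authors' code), given per-frame left-ventricle segmentation masks:

```
import numpy as np

def find_ed_es(masks):
    # masks: array of shape (n_frames, H, W), nonzero where the LV is segmented
    areas = (masks != 0).reshape(len(masks), -1).sum(axis=1)
    ed = int(np.argmax(areas))  # largest LV area -> end-diastole
    es = int(np.argmin(areas))  # smallest LV area -> end-systole
    return ed, es
```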

## Apical 4-chamber

The HMC-QU dataset consists of 162 A4C view 2D echocardiography recordings. The A4C view recordings belong to 93 MI patients (all first-time and acute MI) and 69 non-MI subjects.

## Apical 2-chamber

The dataset consists of 130 A2C view 2D echocardiography recordings that belong to 68 MI patients and 62 non-MI subjects.

# Segmentation of the Left Ventricle Wall

A subset of 109 A4C view echocardiography recordings has corresponding ground-truth segmentation masks for the whole left ventricle wall at each frame of one cardiac cycle. This subset includes 72 MI patients and 37 non-MI subjects. The ground-truth segmentation masks are 224x224 pixels in order to fit the input dimensions of many state-of-the-art deep network topologies.

If you use the HMC-QU dataset in your research, please consider citing the publications below:

[P1] A. Degerli, S. Kiranyaz, T. Hamid, R. Mazhar, and M. Gabbouj, “Early Myocardial Infarction Detection over Multi-view Echocardiography,” arXiv preprint arXiv:2111.05790v2, 2021, https://doi.org/10.48550/arXiv.2111.05790.

[P2] A. Degerli, M. Zabihi, S. Kiranyaz, T. Hamid, R. Mazhar, R. Hamila, and M. Gabbouj, "Early Detection of Myocardial Infarction in Low-Quality Echocardiography," in IEEE Access, vol. 9, pp. 34442-34453, 2021, https://doi.org/10.1109/ACCESS.2021.3059595.

[P3] S. Kiranyaz, A. Degerli, T. Hamid, R. Mazhar, R. E. F. Ahmed, R. Abouhasera, M. Zabihi, J. Malik, R. Hamila, and M. Gabbouj, "Left Ventricular Wall Motion Estimation by Active Polynomials for Acute Myocardial Infarction Detection," in IEEE Access, vol. 8, pp. 210301-210317, 2020, https://doi.org/10.1109/ACCESS.2020.3038743.

https://i.imgur.com/QKsdWPb.jpg},
terms= {},
license= {https://creativecommons.org/licenses/by-nc-sa/3.0/igo/},
superseded= {},
url= {https://www.kaggle.com/datasets/aysendegerli/hmcqu-dataset}
}

</description>
<link>https://academictorrents.com/download/11832dbd0b58c1dd9305a10373c9536872dd31af</link>
</item>
<item>
<title>STructured Analysis of the Retina (Dataset)</title>
<description>@article{,
title= {STructured Analysis of the Retina},
keywords= {},
author= {},
abstract= {The STARE (STructured Analysis of the Retina) Project was conceived and initiated in 1975 by Michael Goldbaum, M.D., at the University of California, San Diego. It was funded by the U.S. National Institutes of Health. During its history, over thirty people contributed to the project, with backgrounds ranging from medicine to science to engineering. Images and clinical data were provided by the Shiley Eye Center at the University of California, San Diego, and by the Veterans Administration Medical Center in San Diego.
I had the pleasure of working on the project from 1996-2004. The contents of this web page reflect my contributions. Please contact me if you have any questions or requests concerning our data or code. Please contact Dr. Goldbaum if you have any requests concerning the current state of the project.

# A brief overview of the project

An ophthalmologist is a medical doctor who specializes in the structure, function, and diseases of the human eye. During a clinical examination, an ophthalmologist notes findings that are visible in the eyes of the subject. The ophthalmologist then uses these findings to reason about the health of the subject. For instance, a patient may exhibit discoloration of the optic nerve, or a narrowing of the blood vessels in the retina. An ophthalmologist uses this information to diagnose the patient as having, for instance, Coats' disease or a central retinal artery occlusion.
A common procedure during an examination is retinal imaging. An optical camera is used to see through the pupil of the eye to the rear inner surface of the eyeball. A picture is taken showing the optic nerve, fovea, surrounding vessels, and the retinal layer. The ophthalmologist can then reference this image while considering any observed findings.

This research concerns a system to automatically diagnose diseases of the human eye. The system takes as input information observable in a retinal image. This information is formulated to mimic the findings that an ophthalmologist would note during a clinical examination. The main output of the system is a diagnosis formulated to mimic the conclusion that an ophthalmologist would reach about the health of the subject.

Our approach breaks the problem into two components. The first component concerns automatically processing a retinal image to denote the important findings. The second component concerns automatically reasoning about the findings to determine a diagnosis. Additional outputs include detailed measurements of the anatomical structures and lesions visible in the retinal image. These measurements are useful for tracking disease severity and the evaluation of treatment progress over time. By collecting a database of measurements for a large number of people, the STARE project could support clinical population studies and intern training.


https://i.imgur.com/rMBvdYq.jpg

# Papers

A lot has been published on this project by many people; these are my two most relevant papers:

A. Hoover, V. Kouznetsova and M. Goldbaum, "Locating Blood Vessels in Retinal Images by Piece-wise Threshold Probing of a Matched Filter Response", IEEE Transactions on Medical Imaging, vol. 19, no. 3, pp. 203-210, March 2000.

A. Hoover and M. Goldbaum, "Locating the optic nerve in a retinal image using the fuzzy convergence of the blood vessels", IEEE Transactions on Medical Imaging, vol. 22, no. 8, pp. 951-958, August 2003.
},
terms= {},
license= {},
superseded= {},
url= {https://cecas.clemson.edu/~ahoover/stare/}
}

</description>
<link>https://academictorrents.com/download/e4554cd63400dc13b74477efe98032c10757c269</link>
</item>
<item>
<title>Data of the White Matter Hyperintensity (WMH) Segmentation Challenge (Dataset)</title>
<description>@article{,
title= {Data of the White Matter Hyperintensity (WMH) Segmentation Challenge},
keywords= {},
author= {Kuijf, Hugo and Biesbroek, Matthijs and de Bresser, Jeroen and Heinen, Rutger and Chen, Christopher and van der Flier, Wiesje and Barkhof, Frederik and Viergever, Max and Biessels, Geert Jan},
abstract= {Data of the WMH Segmentation Challenge, including the training data, test data, manual annotations, and additional manual annotations. 

Contents:
- readme.pdf
- training: contains all training data that was originally released
- test: contains all test data
- additional_annotations: contains additional manual annotations of two extra observers

Code: https://github.com/hjkuijf/wmhchallenge

https://wmh.isi.uu.nl/

https://i.imgur.com/RJjPBbP.png},
terms= {},
license= {http://creativecommons.org/licenses/by-nc/4.0},
superseded= {},
url= {https://dataverse.nl/dataset.xhtml?persistentId=doi:10.34894/AECRSD}
}

</description>
<link>https://academictorrents.com/download/a6d90ae5a9ff4cc8184f122048495fd6bd18d6ba</link>
</item>
<item>
<title>TotalSegmentator CT Dataset (Dataset)</title>
<description>@article{,
title= {TotalSegmentator CT Dataset},
keywords= {segmenter, segmentation, computed tomography, segment},
author= {Department of Research and Analysis, University Hospital Basel},
abstract= {https://i.imgur.com/u63xva0.png

In 1204 CT images we segmented 104 anatomical structures (27 organs, 59 bones, 10 muscles, 8 vessels) covering a majority of relevant classes for most use cases. The CT images were randomly sampled from clinical routine, thus representing a real-world dataset which generalizes to clinical application. The dataset contains a wide range of different pathologies, scanners, sequences and institutions.


```
s0720/segmentations/portal_vein_and_splenic_vein.nii.gz  187.74 kB
s0720/segmentations/pancreas.nii.gz  45.25 kB
s0720/segmentations/lung_upper_lobe_right.nii.gz  218.92 kB
s0720/segmentations/lung_upper_lobe_left.nii.gz  230.82 kB
s0720/segmentations/lung_middle_lobe_right.nii.gz  201.18 kB
s0720/segmentations/lung_lower_lobe_right.nii.gz  240.63 kB
s0720/segmentations/lung_lower_lobe_left.nii.gz  239.49 kB
s0720/segmentations/liver.nii.gz  273.08 kB
s0720/segmentations/kidney_right.nii.gz  198.91 kB
s0720/segmentations/kidney_left.nii.gz  197.82 kB
s0720/segmentations/inferior_vena_cava.nii.gz  48.43 kB
s0720/segmentations/iliopsoas_right.nii.gz  59.12 kB
s0720/segmentations/iliopsoas_left.nii.gz  59.75 kB
s0720/segmentations/iliac_vena_right.nii.gz  188.90 kB
s0720/segmentations/iliac_vena_left.nii.gz  189.66 kB
s0720/segmentations/iliac_artery_right.nii.gz  186.75 kB
s0720/segmentations/iliac_artery_left.nii.gz  186.60 kB
s0720/segmentations/humerus_right.nii.gz  41.96 kB
s0720/segmentations/humerus_left.nii.gz  43.13 kB
s0720/segmentations/hip_right.nii.gz  223.33 kB
s0720/segmentations/hip_left.nii.gz  223.06 kB
s0720/segmentations/heart_ventricle_right.nii.gz  48.07 kB
s0720/segmentations/heart_ventricle_left.nii.gz  45.10 kB
s0720/segmentations/heart_myocardium.nii.gz  49.17 kB
s0720/segmentations/heart_atrium_right.nii.gz  44.41 kB
s0720/segmentations/heart_atrium_left.nii.gz  43.02 kB
s0720/segmentations/gluteus_minimus_right.nii.gz  46.65 kB
s0720/segmentations/gluteus_minimus_left.nii.gz  45.95 kB
s0720/segmentations/gluteus_medius_right.nii.gz  53.75 kB
s0720/segmentations/gluteus_medius_left.nii.gz  52.68 kB
s0720/segmentations/gluteus_maximus_right.nii.gz  58.02 kB
s0720/segmentations/gluteus_maximus_left.nii.gz  56.20 kB
s0720/segmentations/gallbladder.nii.gz  42.20 kB
s0720/segmentations/femur_right.nii.gz  192.93 kB
s0720/segmentations/femur_left.nii.gz  193.47 kB
s0720/segmentations/face.nii.gz  183.15 kB
s0720/segmentations/esophagus.nii.gz  188.93 kB
s0720/segmentations/duodenum.nii.gz  189.53 kB
s0720/segmentations/colon.nii.gz  239.38 kB
s0720/segmentations/clavicula_right.nii.gz  42.92 kB
s0720/segmentations/clavicula_left.nii.gz  42.50 kB
s0720/segmentations/brain.nii.gz  183.15 kB
s0720/segmentations/autochthon_right.nii.gz  62.97 kB
s0720/segmentations/autochthon_left.nii.gz  63.75 kB
s0720/segmentations/aorta.nii.gz  202.39 kB
s0720/segmentations/adrenal_gland_right.nii.gz  184.50 kB
s0720/segmentations/adrenal_gland_left.nii.gz  184.35 kB
s0720/ct.nii.gz
```
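
A minimal loading sketch for one case (assuming the nibabel package; any NIfTI reader works):

```
import nibabel as nib
import numpy as np

ct = nib.load("s0720/ct.nii.gz")                      # CT volume (Hounsfield units)
liver = nib.load("s0720/segmentations/liver.nii.gz")  # one structure mask

liver_mask = liver.get_fdata() > 0                    # boolean mask on the CT grid
voxel_ml = np.prod(ct.header.get_zooms()) / 1000.0    # voxel volume, mm^3 -> ml
print("liver volume ~ %.0f ml" % (liver_mask.sum() * voxel_ml))
```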


https://arxiv.org/abs/2208.05868

https://zenodo.org/record/6802614},
terms= {},
license= {https://creativecommons.org/licenses/by/4.0},
superseded= {},
url= {https://arxiv.org/abs/2208.05868}
}

</description>
<link>https://academictorrents.com/download/337819f0e83a1c1ac1b7262385609dad5d485abf</link>
</item>
<item>
<title>INbreast: toward a full-field digital mammographic database (Dataset)</title>
<description>@article{,
title= {INbreast: toward a full-field digital mammographic database},
keywords= {},
author= {Inês C Moreira and Igor Amaral and Inês Domingues and António Cardoso and Maria João Cardoso and Jaime S Cardoso},
abstract= {Rationale and objectives: Computer-aided detection and diagnosis (CAD) systems have been developed in the past two decades to assist radiologists in the detection and diagnosis of lesions seen on breast imaging exams, thus providing a second opinion. Mammographic databases play an important role in the development of algorithms aiming at the detection and diagnosis of mammary lesions. However, available databases often do not take into consideration all the requirements needed for research and study purposes. This article aims to present and detail a new mammographic database.

Materials and methods: Images were acquired at a breast center located in a university hospital (Centro Hospitalar de S. João [CHSJ], Breast Centre, Porto) with the permission of the Portuguese National Committee of Data Protection and Hospital's Ethics Committee. MammoNovation Siemens full-field digital mammography, with a solid-state detector of amorphous selenium was used.

Results: The new database, INbreast, has a total of 115 cases (410 images), of which 90 cases are from women with both breasts affected (four images per case) and 25 cases are from mastectomy patients (two images per case). Several types of lesions (masses, calcifications, asymmetries, and distortions) were included. Accurate contours made by specialists are also provided in XML format.

Conclusion: The strengths of the presented database, INbreast, lie in the fact that it was built with full-field digital mammograms (as opposed to digitized mammograms), presents a wide variability of cases, and is made publicly available together with precise annotations. We believe that this database can be a reference for future works centered on or related to breast cancer imaging.




https://i.imgur.com/3bWtH38.png},
terms= {},
license= {},
superseded= {},
url= {https://pubmed.ncbi.nlm.nih.gov/22078258/}
}

</description>
<link>https://academictorrents.com/download/ce1ecade37814701ac95193a910a3c6917ea43b3</link>
</item>
<item>
<title>The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions (Dataset)</title>
<description>@article{,
title= {The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions},
keywords= {},
author= {Philipp Tschandl},
abstract= {Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images which can serve as a training set for academic machine learning purposes. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions: Actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses, bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv) and vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc).

More than 50% of lesions are confirmed through histopathology (histo); the ground truth for the rest of the cases is either follow-up examination (follow_up), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal). The dataset includes lesions with multiple images, which can be tracked by the lesion_id column within the HAM10000_metadata file (see the grouping sketch below).
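
A minimal grouping sketch (the 80/20 split is an arbitrary example; file and column names follow the description above). Splitting by lesion_id keeps all images of one lesion on the same side of a split:

```
import pandas as pd

meta = pd.read_csv("HAM10000_metadata")
lesions = meta["lesion_id"].drop_duplicates().sample(frac=1.0, random_state=0)
train_ids = set(lesions.iloc[: int(0.8 * len(lesions))])

train = meta[meta["lesion_id"].isin(train_ids)]
val = meta[~meta["lesion_id"].isin(train_ids)]
print(len(train), len(val))
```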

Due to upload size limitations, images are stored in two files:

- HAM10000_images_part1.zip (5000 JPEG files)
- HAM10000_images_part2.zip (5015 JPEG files)

# Additional data for evaluation purposes

The HAM10000 dataset served as the training set for the ISIC 2018 challenge (Task 3). The test-set images are available herein as ISIC2018_Task3_Test_Images.zip (1511 images); the official validation set is available through the challenge website https://challenge2018.isic-archive.com/. The ISIC-Archive also provides a "Live challenge" submission site for continuous evaluation of automated classifiers on the official validation and test sets.

# Comparison to physicians

Test-set evaluations of the ISIC 2018 challenge were compared to physicians on an international scale, where the majority of challenge participants outperformed expert readers: Tschandl P. et al., Lancet Oncol 2019

# Human-computer collaboration

The test-set images were also used in a study comparing different methods and scenarios of human-computer collaboration: Tschandl P. et al., Nature Medicine 2020

The following corresponding metadata is available herein:

- ISIC2018_Task3_Test_NatureMedicine_AI_Interaction_Benefit.csv: Human ratings for Test images with and without interaction with a ResNet34 CNN (Malignancy Probability, Multi-Class probability, CBIR) or Human-Crowd Multi-Class probabilities. This data was collected for and analyzed in Tschandl P. et al., Nature Medicine 2020; therefore, please refer to this publication when using the data.

- HAM10000_segmentations_lesion_tschandl.zip: To evaluate regions of CNN activations in Tschandl P. et al., Nature Medicine 2020 (please refer to this publication when using the data), a single dermatologist (Tschandl P) created binary segmentation masks for all 10015 images from the HAM10000 dataset. Masks were initialized with the segmentation network as described by Tschandl et al., Computers in Biology and Medicine 2019, and subsequently verified, corrected, or replaced via the free-hand selection tool in FIJI.

# Related Publication 

Tschandl, P., Rosendahl, C. &amp; Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 (2018). doi: 10.1038/sdata.2018.161},
terms= {},
license= {https://creativecommons.org/licenses/by-nc/4.0/},
superseded= {},
url= {https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T}
}

</description>
<link>https://academictorrents.com/download/dc3188ee1ce7e2d2254113111b406c484101ba65</link>
</item>
<item>
<title>The Oxford-IIIT Pet Dataset (Dataset)</title>
<description>@article{,
title= {The Oxford-IIIT Pet Dataset},
journal= {},
author= {Omkar M Parkhi and Andrea Vedaldi and Andrew Zisserman and C. V. Jawahar},
year= {},
url= {https://www.robots.ox.ac.uk/~vgg/data/pets/},
abstract= {We have created a 37-category pet dataset with roughly 200 images for each class. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of breed, head ROI, and pixel-level trimap segmentation.},
keywords= {},
terms= {The dataset is available to download for commercial/research purposes under a Creative Commons Attribution-ShareAlike 4.0 International License. The copyright remains with the original owners of the images.},
license= {Creative Commons Attribution-ShareAlike 4.0 International License},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/b18bbd9ba03d50b0f7f479acc9f4228a408cecc1</link>
</item>
<item>
<title>Reddit comments/submissions 2005-06 to 2022-06 (Dataset)</title>
<description>@article{,
title= {Reddit comments/submissions 2005-06 to 2022-06},
journal= {},
author= {stuck_in_the_matrix and Watchful1},
year= {},
url= {},
abstract= {Reddit comments and submissions from 2005-06 to 2022-06 collected by pushshift which can be found here https://files.pushshift.io/reddit/

These are zstandard compressed ndjson files. Example python scripts for parsing the data can be found here https://github.com/Watchful1/PushshiftDumps and a minimal parsing sketch is shown below.
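
A rough sketch (not the linked scripts; assumes the `zstandard` package and a hypothetical dump file name). The large `max_window_size` is needed because the dumps are compressed with a long window:

```
import io
import json
import zstandard

# Stream-decompress one dump and parse each ndjson line.
with open("RC_2022-06.zst", "rb") as fh:  # hypothetical file name
    stream = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)
    for line in io.TextIOWrapper(stream, encoding="utf-8"):
        comment = json.loads(line)
        # fields include e.g. comment["subreddit"], comment["body"]
```
},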
keywords= {reddit},
terms= {},
license= {},
superseded= {https://academictorrents.com/details/ba051999301b109eab37d16f027b3f49ade2de13}
}

</description>
<link>https://academictorrents.com/download/0e1813622b3f31570cfe9a6ad3ee8dabffdb8eb6</link>
</item>
<item>
<title>Synthetic Data for Text Localisation in Natural Images (Dataset)</title>
<description>@inproceedings{gupta16,
author= {Ankush Gupta and Andrea Vedaldi and Andrew Zisserman},
title= {Synthetic Data for Text Localisation in Natural Images},
booktitle= {IEEE Conference on Computer Vision and Pattern Recognition},
year= {2016},
abstract= {This is a synthetically generated dataset, in which word instances are placed in natural scene images, while taking into account the scene layout.

The dataset consists of *800 thousand* images with approximately *8 million* synthetic word instances. Each text instance is annotated with its text-string, word-level and character-level bounding-boxes.},
keywords= {},
terms= {You (the "Researcher"), have requested permission to use the SynthText in the Wild database (the "Database") at the University of Oxford. In exchange for such permission, the Researcher hereby agrees to the following terms and conditions:

1. Researcher shall use the Database only for non-commercial* research and educational purposes.

2. University of Oxford makes no representations or warranties regarding the Database, including but not limited to warranties of non-infringement or fitness for a particular purpose.

3. Researcher accepts full responsibility for his or her use of the Database and shall defend and indemnify University of Oxford, including their employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the Database, including but not limited to Researcher's use of any copies of copyrighted images that he or she may create from the Database.

4. Researcher may provide research associates and colleagues with access to the Database provided that they first agree to be bound by these terms and conditions.

5. University of Oxford reserves the right to terminate Researcher's access to the Database at any time.

6. If Researcher is employed by a for-profit, commercial entity*, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.

*  For commercial applications and licensing, contact Roy Azoulay at roy.azoulay@innovation.ox.ac.uk},
license= {},
superseded= {},
url= {https://www.robots.ox.ac.uk/~vgg/data/scenetext/}
}

</description>
<link>https://academictorrents.com/download/2dba9518166cbd141534cbf381aa3e99a087e83c</link>
</item>
<item>
<title>Reading Text in the Wild with Convolutional Neural Networks (Dataset)</title>
<description>@article{jaderberg16,
author= {Max Jaderberg and Karen Simonyan and Andrea Vedaldi and Andrew Zisserman},
title= {Reading Text in the Wild with Convolutional Neural Networks},
journal= {International Journal of Computer Vision},
number= {1},
volume= {116},
pages= {1--20},
month= {jan},
year= {2016},
abstract= {The exact data used to train our deep convolutional neural networks (see our [research page](http://www.robots.ox.ac.uk/~vgg/research/text/)) is included in this torrent.

This is a synthetically generated dataset which we found sufficient for training text recognition on real-world images.

![Synthetic Data Engine process](https://i.imgur.com/cqmgbUa.png)

This dataset consists of *9 million images* covering *90k English words*, and includes the training, validation and test splits used in our work.},
keywords= {},
terms= {},
license= {},
superseded= {},
url= {https://www.robots.ox.ac.uk/~vgg/data/text/}
}

</description>
<link>https://academictorrents.com/download/3d0b4f09080703d2a9c6be50715b46389fdb3af1</link>
</item>
<item>
<title>COCO 2017 Resized to 256x256 (Dataset)</title>
<description>@article{,
title= {COCO 2017 Resized to 256x256},
keywords= {},
author= {},
abstract= {COCO: Common Objects in Context

Resized to 256x256},
terms= {},
license= {},
superseded= {},
url= {http://cocodataset.org/}
}

</description>
<link>https://academictorrents.com/download/eea5a532dd69de7ff93d5d9c579eac55a41cb700</link>
</item>
<item>
<title>Ukrainian Open Speech To Text Dataset 4.2 ~1200 hours (Dataset)</title>
<description>@article{,
title= {Ukrainian Open Speech To Text Dataset 4.2 ~1200 hours},
journal= {},
author= {Community Speech Recognition from Ukraine},
year= {},
url= {},
abstract= {Speech Recognition for Ukrainian 🇺🇦
The aim of this repository is to collect information and datasets for speech recognition in Ukrainian.

Get in touch with us in our Telegram group: https://t.me/speech_recognition_uk

Datasets
Compiled dataset from different open sources + Companies + Community = 188.31GB / ~1200 hours 💪},
keywords= {Ukrainian Open Speech To Text Dataset STT},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/fcf8bb60c59e9eb583df003d54ed61776650beb8</link>
</item>
<item>
<title>Vggface2: A dataset for recognising faces across pose and age (Dataset)</title>
<description>@inproceedings{cao2018vggface2,
title= {Vggface2: A dataset for recognising faces across pose and age},
author= {Cao, Qiong and Shen, Li and Xie, Weidi and Parkhi, Omkar M and Zisserman, Andrew},
booktitle= {2018 13th IEEE international conference on automatic face \&amp; gesture recognition (FG 2018)},
pages= {67--74},
year= {2018},
organization= {IEEE},
abstract= {In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimise the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages to ensure a high accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 (with and without Squeeze-and-Excitation blocks) Convolutional Neural Networks on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance over pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on the IJB-A and IJB-B face recognition benchmarks, exceeding the previous state-of-the-art by a large margin. The dataset and models are publicly available.

Please make sure to pay attention to the License information for using the dataset for Commercial/Research purposes (Terms of Use) available on http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/.},
keywords= {image, Face, Face Verification, In the Wild, Vision},
terms= {VGG2 provides loosely cropped faces in separated files to download for training and testing. More information and links for download can be found on http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/data_infor.html. You will need to create an account to be able to download the files.

Here is some information regarding VGG2 dataset:

    Number of identities: 9131 (8631 identities for training, 500 identities for testing)

    More than 3.3 million images in the wild

    On average, about 362 image samples per person

If you use this dataset:

Please make sure to pay attention to the License information for using the dataset for Commercial/Research purposes (Terms of Use) available on http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/.

Please make sure to cite the paper:

Q. Cao, L. Shen, W. Xie, O. M. Parkhi, A. Zisserman, VGGFace2: A Dataset for Recognising Faces across Pose and Age. International Conference on Automatic Face and Gesture Recognition, 2018.

keywords: Vision, Image, Face, Face Verification, In the Wild},
license= {},
superseded= {},
url= {http://www.robots.ox.ac.uk/~vgg/data/vgg_face2/}
}

</description>
<link>https://academictorrents.com/download/535113b8395832f09121bc53ac85d7bc8ef6fa5b</link>
</item>
<item>
<title>Breast Ultrasound Images Dataset (Dataset BUSI) (Dataset)</title>
<description>@article{,
title= {Breast Ultrasound Images Dataset (Dataset BUSI)},
keywords= {},
author= {},
abstract= {The data collected at baseline include breast ultrasound images of women between 25 and 75 years old, collected in 2018 from 600 female patients. The dataset consists of 780 images with an average image size of 500x500 pixels, in PNG format. The ground-truth mask images are provided alongside the original images (a pairing sketch is shown after the table below). The images are categorized into three classes: normal, benign, and malignant.


If you use this dataset, please cite:
Al-Dhabyani W, Gomaa M, Khaled H, Fahmy A. Dataset of breast ultrasound images. Data in Brief. 2020 Feb;28:104863. DOI: 10.1016/j.dib.2019.104863.


| Subject area               | Medicine and Dentistry                                                                                                                                                             |
|----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| More specific subject area | Radiology and Imaging                                                                                                                                                              |
| Type of data               | Images and mask images                                                                                                                                                             |
| How data was acquired      | LOGIQ E9 ultrasound and LOGIQ E9 Agile ultrasound system                                                                                                                           |
| Data format                | PNG                                                                                                                                                                                |
| Experimental factors       | All images are classified as normal, benign and malignant                                                                                                                          |
| Experimental features      | When medical images are used for training deep learning models, they provide fast and accurate results in classification, detection, and segmentation of breast cancer.            |
| Data source location       | Baheya Hospital for Early Detection &amp; Treatment of Women's Cancer, Cairo, Egypt.                                                                                                   |
| Data accessibility         | https://scholar.cu.edu.eg/?q=afahmy/pages/dataset                                                                                                                                  |
| Related research article   | 1. Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled and Aly Fahmy, Deep Learning Approaches for Data Augmentation and Classification of Breast Masses using Ultrasound Images [1] |
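
A minimal sketch for pairing images with their masks (the folder layout and the `_mask.png` naming are assumptions; adjust to the actual release):

```
from pathlib import Path
import numpy as np
from PIL import Image

root = Path("Dataset_BUSI_with_GT/benign")  # hypothetical folder
for img_path in sorted(root.glob("*.png")):
    if img_path.stem.endswith("_mask"):
        continue  # skip the mask files themselves
    mask_path = img_path.with_name(img_path.stem + "_mask.png")
    image = np.array(Image.open(img_path).convert("L"))
    mask = np.array(Image.open(mask_path).convert("L")) > 0
    print(img_path.name, image.shape, int(mask.sum()))
```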



https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6906728/


https://i.imgur.com/WV1Tfb7.png},
terms= {},
license= {},
superseded= {},
url= {https://scholar.cu.edu.eg/?q=afahmy/pages/dataset}
}

</description>
<link>https://academictorrents.com/download/d0b7b7ae40610bbeaea385aeb51658f527c86a16</link>
</item>
<item>
<title>NASA Astronomy Picture of the Day Archive (7800 images, 2011) (Dataset)</title>
<description>@article{,
title= {NASA Astronomy Picture of the Day Archive (7800 images, 2011)},
journal= {},
author= {NASA},
year= {2011},
url= {https://apod.nasa.gov/},
abstract= {Archive of over 7800 images from apod.nasa.gov, originally organized in 2011},
keywords= {NASA, images, space, esa, astronomy, archive, photos},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/5f755e078ee9195b8ae0b3336710e6ce92ef3251</link>
</item>
<item>
<title>10 years of Dukascopy Forex Tick Data (2008-2019) (Dataset)</title>
<description>@article{,
title= {10 years of Dukascopy Forex Tick Data (2008-2019)},
journal= {},
author= {Justin Timperio},
year= {},
url= {https://www.driftinginrecursion.com/post/dukascopy_opensource_data/},
abstract= {Data collected and formatted by Justin Timperio:

"In my exploration of world of big data and I became curious about tick data. Tick data is extremely granular and provides a great challenge for those looking to work on their optimization skills due to its size. Unfortunately, market data is almost always behind a pay wall or de-sampled to the point of uselessness. After discovering the Dukascopy api, I knew I wanted to make this data available for all in a more accessible format."

Total Line Count: 8,495,770,706
Total Data Points: 33,983,082,824
Total Decompressed Size: 501 GB},
keywords= {finance, forex, exchange, market, economy, economics},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/8baee145786f4311b66bea5d13ef30eedce04a24</link>
</item>
<item>
<title>115 paintings from the Hermitage museum, high-resolution, JPEG (Dataset)</title>
<description>@article{,
title= {115 paintings from the Hermitage museum, high-resolution, JPEG},
journal= {},
author= {Hermitage Museum},
year= {},
url= {},
abstract= {115 paintings from the Hermitage museum, high-resolution, JPEG

All images are public domain.},
keywords= {art,paintings,high resolution},
terms= {},
license= {Public domain},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/0ef42919a5688ea60f7174ccf899a91774508b48</link>
</item>
<item>
<title>TB Portal Tuberculosis Chest X-ray dataset for Belarus (Dataset)</title>
<description>@article{,
title= {TB Portal Tuberculosis Chest X-ray dataset for Belarus},
keywords= {},
author= {},
abstract= {This is a tuberculosis chest X-ray dataset containing images of patients who are resistant to conventional tuberculosis treatment. Data is provided in raw format as available at https://tbportals.niaid.nih.gov. The dataset mainly comes from the population of Belarus; in total, over 1000 tuberculosis cases are provided.

Credits to:
TB Portals Program, Office of Cyber Infrastructure and Computational Biology (OCICB), National Institute of Allergy and Infectious Diseases (NIAID).},
terms= {},
license= {},
superseded= {},
url= {https://www.kaggle.com/raddar/drug-resistant-tuberculosis-xrays}
}

</description>
<link>https://academictorrents.com/download/509f986b456b6fce04c15f9d1de22cd4ccb2c4b7</link>
</item>
<item>
<title>UT Zappos50K (Version 2.1) (Dataset)</title>
<description>@article{,
title= {UT Zappos50K (Version 2.1)},
keywords= {},
author= {Aron Yu and Kristen Grauman},
abstract= {UT Zappos50K (UT-Zap50K) is a large shoe dataset consisting of 50,025 catalog images collected from Zappos.com. The images are divided into 4 major categories — shoes, sandals, slippers, and boots — followed by functional types and individual brands. The shoes are centered on a white background and pictured in the same orientation for convenient analysis.

This dataset is created in the context of an online shopping task, where users pay special attention to fine-grained visual differences. For instance, it is more likely that a shopper is deciding between two pairs of similar men's running shoes than between a woman's high heel and a man's slipper. GIST and LAB color features are provided. In addition, each image has 8 associated meta-data labels (gender, materials, etc.) that are used to filter the shoes on Zappos.com.

https://i.imgur.com/RoVL6qr.jpg

# Citation

This dataset is for academic, non-commercial use only. If you use this dataset in a publication, please cite the following papers:

A. Yu and K. Grauman. "Fine-Grained Visual Comparisons with Local Learning". In CVPR, 2014.

@InProceedings{finegrained,
  author = {A. Yu and K. Grauman},
  title = {Fine-Grained Visual Comparisons with Local Learning},
  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
  month = {Jun},
  year = {2014}
}

A. Yu and K. Grauman. "Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images". In ICCV, 2017.},
terms= {},
license= {},
superseded= {},
url= {http://vision.cs.utexas.edu/projects/finegrained/utzap50k/}
}

</description>
<link>https://academictorrents.com/download/3b3cb58f4ccafc6320d06d00f0862a4ba923b510</link>
</item>
<item>
<title>PanNuke: An Open Pan-Cancer Histology Dataset for Nuclei Instance Segmentation and Classification (Dataset)</title>
<description>@article{,
title= {PanNuke: An Open Pan-Cancer Histology Dataset for Nuclei Instance Segmentation and Classification},
keywords= {},
author= {Gamper, Jevgenij and Koohbanani, Navid Alemi and Benet, Ksenija and Khuram, Ali and Rajpoot, Nasir},
abstract= {https://i.imgur.com/iYlXSCm.png


Semi-automatically generated nuclei instance segmentation and classification dataset with exhaustive nuclei labels across 19 different tissue types. The dataset consists of 481 visual fields, of which 312 are randomly sampled from more than 20K whole slide images at different magnifications, from multiple data sources. In total the dataset contains 205,343 labeled nuclei, each with an instance segmentation mask. Models trained on PanNuke can aid in whole slide image tissue type segmentation, and generalise to new tissues. PanNuke is one of the first successfully semi-automatically generated datasets.

## Citation

```
@inproceedings{gamper2019pannuke,
  title={PanNuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification},
  author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Benet, Ksenija and Khuram, Ali and Rajpoot, Nasir},
  booktitle={European Congress on Digital Pathology},
  pages={11--19},
  year={2019},
  organization={Springer}
}
@article{gamper2020pannuke,
  title={PanNuke Dataset Extension, Insights and Baselines},
  author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Graham, Simon and Jahanifar, Mostafa and Khurram, Syed Ali and Azam, Ayesha and Hewitt, Katherine and Rajpoot, Nasir},
  journal={arXiv preprint arXiv:2003.10778},
  year={2020}
}
```

https://i.imgur.com/T4ogyHR.png},
terms= {},
license= {http://creativecommons.org/licenses/by-nc-sa/4.0/},
superseded= {},
url= {https://jgamper.github.io/PanNukeDataset/}
}

</description>
<link>https://academictorrents.com/download/99f2c7b57b95500711e33f2ee4d14c9fd7c7366c</link>
</item>
<item>
<title>Whale Shark ID Dataset (Dataset)</title>
<description>@article{,
title= {Whale Shark ID Dataset},
journal= {},
author= {Wild Me},
year= {2020},
url= {https://www.wildme.org},
abstract= {Our released whale shark (Rhincodon typus) data set represents a collaborative effort based on the data collection and population modeling efforts conducted at Ningaloo Marine Park in Western Australia from 1995-2008 (Holmberg et al. 2008, 2009). Photos (7888) and metadata from 2441 whale shark encounters were collected from 464 individual contributors, especially from the original research of Brad Norman and from members of the local whale shark tourism industry who sight these animals annually from April-June. Images were annotated with bounding boxes around each visible whale shark and viewpoints labeled (e.g., left, right, etc.). A total of 543 individual whale sharks were identified by their unique spot patterning using first computer-assisted spot pattern recognition (Arzoumanian et al. 2005) and then manual review and confirmation.  A total of 7,693 named sightings were exported.

The dataset is released in the Microsoft COCO format (https://cocodataset.org/) and therefore uses flat image folders with associated YAML metadata files. We have collapsed the entire dataset into a single "train" label and have left "val" and "test" empty; we do this as an invitation to researchers to experiment with their own novel approaches for dealing with the unbalanced and chaotic distribution of the number of sightings per individual. All of the images in the dataset have been resized to have a maximum linear dimension of 3,000 pixels. The metadata for each animal sighting is defined by an axis-aligned bounding box and includes information on the rotation of the box (theta), the viewpoint of the animal, a species (category) ID, a source image ID, an individual string ID name, and other miscellaneous values. The temporal ordering of the images, and an anonymized ID for the original photographer, can be determined from the metadata for each image.
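
A rough sketch for counting sightings per individual (the file path and key names are assumptions; check the release's readme for the actual layout):

```
from collections import Counter
import yaml  # pip install pyyaml

# Hypothetical annotation file; the release describes YAML metadata files.
with open("whaleshark.coco/annotations/instances_train.yaml") as fh:
    data = yaml.safe_load(fh)

# "name" stands in for the individual string ID described above.
counts = Counter(ann["name"] for ann in data["annotations"])
print(counts.most_common(10))
```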

For research or press contact, please direct all correspondence to Wild Me at info@wildme.org.  Wild Me (https://www.wildme.org) is a registered 501(c)(3) not-for-profit based in Portland, Oregon, USA and brings state-of-the-art computer vision tools to ecology researchers working around the globe on wildlife conservation.

Direct download mirror: https://wildbookiarepository.azureedge.net/datasets/whaleshark.coco.tar.gz},
keywords= {coco, identification, wildlife, whale shark},
terms= {Use of this dataset in scientific research must provide attribution under the CDLA-Permissive License (version 1.0) and must also cite the original research publication: 

@article{holmberg2009estimating,
  title={Estimating population size, structure, and residency time for whale sharks Rhincodon typus through collaborative photo-identification},
  author={Holmberg, Jason and Norman, Bradley and Arzoumanian, Zaven},
  journal={Endangered Species Research},
  volume={7},
  number={1},
  pages={39--53},
  year={2009}
}},
license= {Community Data License Agreement – Permissive – Version 1.0 (https://cdla.io/permissive-1-0/)},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/bb47cd1d6dde2f49b040495382c778c102409080</link>
</item>
<item>
<title>Great Zebra and Giraffe Count ID Dataset (Dataset)</title>
<description>@article{,
title= {Great Zebra and Giraffe Count ID Dataset},
journal= {},
author= {Wild Me},
year= {2020},
url= {https://www.wildme.org},
abstract= {Our dataset for plains zebra (Equus quagga) is taken from a two-day census of the Nairobi National Park, located just south of the capital’s airport in Nairobi, Kenya. The “Great Zebra and Giraffe Count” (GZGC) photographic census was organized on February 28th and March 1st, 2015 and had the participation of 27 different teams of citizen scientists, 55 total photographers, and collected 9,406 images of plains zebra and Masai giraffe (Giraffa tippelskirchi) (Parham et al. 2017). Only images containing either zebras or giraffes were included in the exported dataset, a total of 4,948 images, with the biographical information of the original contributors removed. All images are labeled with bounding boxes around the individual animals for which there is ID metadata, meaning some images contain missing boxes and are not intended to be used for object detection training or testing. Viewpoints for all animal annotations were also added. All ID assignments were completed using the HotSpotter algorithm (Crall et al. 2013) by visually matching the stripes and spots as seen on the body of the animal. A total of 2,056 combined names are released for 6,286 individual zebra and 639 giraffe sightings. This dataset presents a more challenging distribution than the whale shark dataset, since it contains a significantly higher number of animals that are only seen once during the survey.

The dataset is released in the Microsoft COCO format (https://cocodataset.org/) and therefore uses flat image folders with associated YAML metadata files. We have collapsed the entire dataset into a single "train" label and have left "val" and "test" empty; we do this as an invitation to researchers to experiment with their own novel approaches for dealing with the unbalanced and chaotic distribution of the number of sightings per individual. All of the images in the dataset have been resized to have a maximum linear dimension of 3,000 pixels. The metadata for each animal sighting is defined by an axis-aligned bounding box and includes information on the rotation of the box (theta), the viewpoint of the animal, a species (category) ID, a source image ID, an individual string ID name, and other miscellaneous values. The temporal ordering of the images, and an anonymized ID for the original photographer, can be determined from the metadata for each image.

For research or press contact, please direct all correspondence to Wild Me at info@wildme.org.  Wild Me (https://www.wildme.org) is a registered 501(c)(3) not-for-profit based in Portland, Oregon, USA and brings state-of-the-art computer vision tools to ecology researchers working around the globe on wildlife conservation.

Direct download mirror: https://wildbookiarepository.azureedge.net/datasets/gzgc.coco.tar.gz},
keywords= {zebra, wildlife, coco, identification, giraffe},
terms= {Use of this dataset in scientific research must provide attribution under the CDLA-Permissive License (version 1.0) and must also cite the original research publication: 

@inproceedings{parham2017animal,
  title={Animal population censusing at scale with citizen science and photographic identification},
  author={Parham, Jason and Crall, Jonathan and Stewart, Charles and Berger-Wolf, Tanya and Rubenstein, Daniel I},
  booktitle={AAAI Spring Symposium-Technical Report},
  year={2017}
}},
license= {Community Data License Agreement – Permissive – Version 1.0 (https://cdla.io/permissive-1-0/)},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/69160c6bf11275321017f18124dbaff2d381b21c</link>
</item>
<item>
<title>Medical Imaging with Deep Learning Tutorial 2020 - Joseph Paul Cohen (Course)</title>
<description>@article{,
title= {Medical Imaging with Deep Learning Tutorial 2020 - Joseph Paul Cohen},
keywords= {radiology},
author= {Joseph Paul Cohen},
abstract= {This tutorial will be styled as a graduate lecture about medical imaging with deep learning. It will cover the background of popular medical image domains (chest X-ray and histology) as well as methods to tackle multi-modality/view, segmentation, and counting tasks. These methods will be covered in terms of architecture and objective function design. Also included is a discussion of incorrect feature attribution and approaches to mitigate the issue. Prerequisites: basic knowledge of computer vision (CNNs) and machine learning (regression, gradient descent).

Presented by:
Joseph Paul Cohen PhD
Postdoctoral Fellow
Mila, University of Montreal

View presentations online here: https://www.youtube.com/playlist?list=PLheiZMDg_8ufxEx9cNVcOYXsT3BppJP4b

https://i.imgur.com/0eexA1V.jpg

https://i.imgur.com/GhTVcY0.jpg},
terms= {},
license= {https://creativecommons.org/licenses/by/4.0/},
superseded= {},
url= {https://www.youtube.com/playlist?list=PLheiZMDg_8ufxEx9cNVcOYXsT3BppJP4b}
}

</description>
<link>https://academictorrents.com/download/e0974c84449826e34d8cc96c943cba2af18ab514</link>
</item>
<item>
<title>DRIVE: Digital Retinal Images for Vessel Extraction (Dataset)</title>
<description>@article{,
title= {DRIVE: Digital Retinal Images for Vessel Extraction},
keywords= {},
author= {},
abstract= {The DRIVE database has been established to enable comparative studies on segmentation of blood vessels in retinal images. Retinal vessel segmentation and delineation of morphological attributes of retinal blood vessels, such as length, width, tortuosity, branching patterns and angles, are utilized for the diagnosis, screening, treatment, and evaluation of various cardiovascular and ophthalmologic diseases such as diabetes, hypertension, arteriosclerosis and choroidal neovascularization. Automatic detection and analysis of the vasculature can assist in the implementation of screening programs for diabetic retinopathy, can aid research on the relationship between vessel tortuosity and hypertensive retinopathy, vessel diameter measurement in relation to the diagnosis of hypertension, and computer-assisted laser surgery. Automatic generation of retinal maps and extraction of branch points have been used for temporal or multimodal image registration and retinal image mosaic synthesis. Moreover, the retinal vascular tree is found to be unique for each individual and can be used for biometric identification.

## Data

The photographs for the DRIVE database were obtained from a diabetic retinopathy screening program in The Netherlands. The screening population consisted of 400 diabetic subjects between 25 and 90 years of age. Forty photographs were randomly selected; 33 show no sign of diabetic retinopathy and 7 show signs of mild early diabetic retinopathy. Here is a brief description of the abnormalities in these 7 cases:

25_training: pigment epithelium changes, probably butterfly maculopathy with pigmented scar in fovea, or choroidiopathy, no diabetic retinopathy or other vascular abnormalities.

26_training: background diabetic retinopathy, pigmentary epithelial atrophy, atrophy around optic disk

32_training: background diabetic retinopathy

03_test: background diabetic retinopathy

08_test: pigment epithelium changes, pigmented scar in fovea, or choroidiopathy, no diabetic retinopathy or other vascular abnormalities 

14_test: background diabetic retinopathy 

17_test: background diabetic retinopathy

Each image has been JPEG compressed.

The images were acquired using a Canon CR5 non-mydriatic 3CCD camera with a 45 degree field of view (FOV). Each image was captured using 8 bits per color plane at 768 by 584 pixels. The FOV of each image is circular with a diameter of approximately 540 pixels. For this database, the images have been cropped around the FOV. For each image, a mask image is provided that delineates the FOV.

The set of 40 images has been divided into a training and a test set, both containing 20 images. For the training images, a single manual segmentation of the vasculature is available. For the test cases, two manual segmentations are available; one is used as the gold standard, the other can be used to compare computer-generated segmentations with those of an independent human observer. Furthermore, a mask image is available for every retinal image, indicating the region of interest. All human observers who manually segmented the vasculature were instructed and trained by an experienced ophthalmologist. They were asked to mark all pixels for which they were at least 70% certain that they belonged to a vessel. A minimal evaluation sketch against the gold standard within the FOV is shown below.
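
A minimal evaluation sketch (file names follow the DRIVE test layout; the prediction file is hypothetical):

```
import numpy as np
from PIL import Image

gold = np.array(Image.open("test/1st_manual/01_manual1.gif")) > 0
fov = np.array(Image.open("test/mask/01_test_mask.gif")) > 0
pred = np.load("predictions/01.npy") > 0.5  # your model's probability map

# Score only the pixels inside the circular field of view.
agree = (pred == gold)[fov]
print("accuracy within FOV: %.3f" % agree.mean())
```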

https://i.imgur.com/AkjZ5pz.png},
terms= {},
license= {},
superseded= {},
url= {https://drive.grand-challenge.org/}
}

</description>
<link>https://academictorrents.com/download/062dc18f55b086c76c718ac88f98972789b3c04c</link>
</item>
<item>
<title>Object-CXR - Automatic detection of foreign objects on chest X-rays (Dataset)</title>
<description>@article{,
title= {Object-CXR - Automatic detection of foreign objects on chest X-rays},
keywords= {radiology},
author= {JF Healthcare},
abstract= {## Data
5000 frontal chest X-ray images with foreign objects present and 5000 frontal chest X-ray images without foreign objects were collected from about 300 township hospitals in China. 12 medically-trained radiologists with 1 to 3 years of experience annotated all the images. Each annotator manually annotated the potential foreign objects presented within the lung field of a given chest X-ray. Foreign objects were annotated with bounding boxes, bounding ellipses or masks depending on the shape of the objects. Support devices were excluded from annotation. A typical frontal chest X-ray with foreign objects annotated looks like this:

https://i.imgur.com/SFUZy80.jpg


## Annotation

Object-level annotations are provided for each image, indicating the rough location of each foreign object using a closed shape.

Annotations are provided in csv files and a csv example is shown below.

```csv
image_path,annotation
/path/#####.jpg,ANNO_TYPE_IDX x1 y1 x2 y2;ANNO_TYPE_IDX x1 y1 x2 y2 ... xn yn;...
/path/#####.jpg,
/path/#####.jpg,ANNO_TYPE_IDX x1 y1 x2 y2
...
```

Three types of shape are used, namely rectangle, ellipse and polygon. We use `0`, `1` and `2` as `ANNO_TYPE_IDX` respectively.

- For rectangle and ellipse annotations, we provide the bounding box (upper left and lower right) coordinates in the format `x1 y1 x2 y2` where `x1` &lt; `x2` and `y1` &lt; `y2`.

- For polygon annotations, we provide a sequence of coordinates in the format `x1 y1 x2 y2 ... xn yn`.

&gt; ### Note:
&gt; Our annotations use a Cartesian pixel coordinate system, with the origin (0,0) in the upper left corner. The x coordinate extends from left to right; the y coordinate extends downward.
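For illustration, here is a minimal sketch of a parser for this annotation format; the file name `train.csv` is an assumption, while the parsing logic follows the format described above:

```python
# Sketch of a parser for the annotation format above; "train.csv" is an
# assumed file name. Each row holds zero or more shapes separated by ";",
# each shape being a type index followed by space-separated coordinates.
import csv

SHAPE_NAMES = {0: "rectangle", 1: "ellipse", 2: "polygon"}

def parse_annotation(field):
    """Return a list of (shape_name, coordinates) tuples; [] when no objects."""
    shapes = []
    for shape in filter(None, field.split(";")):
        parts = shape.split()
        name = SHAPE_NAMES[int(parts[0])]
        coords = [int(v) for v in parts[1:]]
        shapes.append((name, coords))
    return shapes

with open("train.csv", newline="") as f:
    for row in csv.DictReader(f):
        objects = parse_annotation(row["annotation"])
        print(row["image_path"], len(objects), "foreign object(s)")
```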

## Organizers
[JF Healthcare](http://www.jfhealthcare.com/) is the primary organizer of this challenge.
},
terms= {},
license= {https://creativecommons.org/licenses/by-nc/4.0/},
superseded= {},
url= {https://web.archive.org/web/20201127235812/https://jfhealthcare.github.io/object-CXR/}
}

</description>
<link>https://academictorrents.com/download/fdc91f11d7010f7259a05403fc9d00079a09f5d5</link>
</item>
<item>
<title>Sci-Hub SQL Database (2020-05-30) (Dataset)</title>
<description>@article{,
title= {Sci-Hub SQL Database (2020-05-30)},
journal= {},
author= {Library Genesis},
year= {2020},
url= {https://gitlab.com/lucidhack/knowl/-/wikis/References/Libgen-Articles-Tables},
abstract= {Sci-Hub is a website that provides free access to scientific research articles and books by bypassing publisher paywalls. Sci-Hub does not make an article database publicly available. Instead, Library Genesis indexes files provided by Sci-Hub into a separate database (internally named "scimag"). The Library Genesis "scimag" database indexed 82,513,235 articles as of 2020-07-07. The database consists only of DOI and article metadata and does not contain the articles themselves. Each row of the database represents an entry for a full-text scientific article, uniquely identified by its DOI.

Timestamped 2020-05-30 04:54

Current as of 2020-07-07

File hashes

MD5:1CC808AD4ACC430A4B3A40892252793A

SHA-1:5369999C6A534A693D41D32A7D5869505C292155

Related research

Cabanac, G. (2016). Bibliogifts in LibGen? A study of a text-sharing platform driven by biblioleaks and crowdsourcing. Journal of the Association for Information Science and Technology, 67(4), 874–884. https://doi.org/10.1002/asi.23445

Greshake B. Looking into Pandora's Box: The Content of Sci-Hub and its Usage. F1000Research 2017, 6:541. https://doi.org/10.12688/f1000research.11366.1

Himmelstein, D. S., Romero, A. R., Levernier, J. G., Munro, T. A., McLaughlin, S. R., Tzovaras, B. G., &amp; Greene, C. S. (2018). Sci-Hub provides access to nearly all scholarly literature. ELife, 7, e32822. https://doi.org/10.7554/eLife.32822},
keywords= {openscience, librarygenesis, doi, openaccess, scihub, database},
terms= {},
license= {CC-0},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/4b13244559282f9650a382f70506dc4c516215e2</link>
</item>
<item>
<title>SIIM-ACR Pneumothorax Segmentation (Dataset)</title>
<description>@article{,
title= {SIIM-ACR Pneumothorax Segmentation},
keywords= {radiology},
author= {Society for Imaging Informatics in Medicine (SIIM)},
abstract= {In this competition, you’ll develop a model to classify (and if present, segment) pneumothorax from a set of chest radiographic images. If successful, you could aid in the early recognition of pneumothoraces and save lives.

What am I predicting?
We are attempting to a) predict the existence of pneumothorax in our test images and b) indicate the location and extent of the condition using masks. Your model should create binary masks and encode them using RLE. 
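For illustration, a sketch of one common way to run-length encode a binary mask (1-based positions in column-major order). The competition page defines the exact RLE convention expected for submissions, so treat this as the general idea rather than the official encoder:

```python
# Illustrative sketch of run-length encoding a binary mask. The competition
# defines its own RLE convention (pixel ordering, start index, and a special
# value for "no pneumothorax"); this shows the general idea only.
import numpy as np

def rle_encode(mask):
    """Encode a 2D binary mask as 'start length start length ...' pairs,
    using 1-based positions in column-major (Fortran) order."""
    pixels = mask.flatten(order="F")
    padded = np.concatenate([[0], pixels, [0]])
    runs = np.flatnonzero(padded[1:] != padded[:-1]) + 1
    runs[1::2] -= runs[::2]  # convert end positions to run lengths
    return " ".join(map(str, runs))

mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
print(rle_encode(mask))  # "6 2 10 2"
```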

https://i.imgur.com/xJYwEv4.png},
terms= {},
license= {},
superseded= {},
url= {https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation}
}

</description>
<link>https://academictorrents.com/download/6ef7c6d039e85152c4d0f31d83fa70edc4aba088</link>
</item>
<item>
<title>Leaf counting dataset (Dataset)</title>
<description>@article{,
title= {Leaf counting dataset},
keywords= {},
author= {Teimouri, Nima and Dyrmann, Mads and Nielsen, Per  Rydahl and Mathiassen, Solvejg  Kopp and Somerville, Gayle  J. and Jørgensen, Rasmus  Nyholm},
abstract= {## Leaf counting dataset

Dataset containing 9372 RGB images of weeds with the number of leaves counted.
The images are collected in fields across Denmark using Nokia and Samsung
cell phone cameras; Samsung, Nikon, Canon and Sony consumer cameras; and a Point Grey
industrial camera.


https://i.imgur.com/h7JFf86.jpg


## Citation

If you use this dataset in your research or elsewhere, please cite/reference the following paper:
PAPER: Weed Growth Stage Estimator Using Deep Convolutional Neural Networks

Bibtex
```
@Article{s18051580,
author = {Teimouri, Nima and Dyrmann, Mads and Nielsen, Per  Rydahl and Mathiassen, Solvejg  Kopp and Somerville, Gayle  J. and Jørgensen, Rasmus  Nyholm},
title = {Weed Growth Stage Estimator Using Deep Convolutional Neural Networks},
journal = {Sensors},
volume = {18},
year = {2018},
number = {5},
url = {http://www.mdpi.com/1424-8220/18/5/1580},
issn = {1424-8220}
}
```},
terms= {},
license= {https://creativecommons.org/licenses/by-sa/4.0/},
superseded= {},
url= {https://vision.eng.au.dk/leaf-counting-dataset/}
}

</description>
<link>https://academictorrents.com/download/a147c27ea0a9c155df9d77af832c321210cf5529</link>
</item>
<item>
<title>[Coursera] What A Plant Knows (Daniel Chamovitz, Tel Aviv University) (Course)</title>
<description>@article{,
title= {[Coursera] What A Plant Knows (Daniel Chamovitz, Tel Aviv University)},
journal= {},
author= {Daniel Chamovitz},
year= {},
url= {},
abstract= {For centuries we have collectively marveled at plant diversity and form—from Charles Darwin’s early fascination with stems and flowers to Seymour Krelborn’s distorted doting in Little Shop of Horrors. This course intends to present an intriguing and scientifically valid look at how plants themselves experience the world—from the colors they see to the sensations they feel. Highlighting the latest research in genetics and more, we will delve into the inner lives of plants and draw parallels with the human senses to reveal that we have much more in common with sunflowers and oak trees than we may realize. We’ll learn how plants know up from down, how they know when a neighbor has been infested by a group of hungry beetles, and whether they appreciate the music you’ve been playing for them or if they’re just deaf to the sounds around them. We’ll explore definitions of memory and consciousness as they relate to plants in asking whether we can say that plants might even be aware of their surroundings. This highly interdisciplinary course meshes historical studies with cutting edge modern research and will be relevant to all humans who seek their place in nature. 

This class has three main goals: 1. To introduce you to basic plant biology by exploring plant senses (sight, smell, hearing, touch, taste, balance). 2. To introduce you to biological research and the scientific method. 3. To get the student to question life in general and what defines us as humans.

Once you've taken this course, if you are interested in a more in-depth study of plants, check out my follow-up course, Fundamentals of Plant Biology (https://www.coursera.org/learn/plant-biology/home/welcome).

In order to receive academic credit for this course you must successfully pass the academic exam on campus. For information on how to register for the academic exam – https://tauonline.tau.ac.il/registration

Additionally, you can apply to certain degrees using the grades you received on the courses. Read more on this here – 
https://go.tau.ac.il/b.a/mooc-acceptance

Teachers interested in teaching this course in their classrooms are invited to explore our Academic High school program here – https://tauonline.tau.ac.il/online-highschool

https://i.imgur.com/yvcoRwi.png},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/81ff5fc1df7c1fb9300e9712368dfc479427004d</link>
</item>
<item>
<title>PMC Open Access Subset (Dataset)</title>
<description>@article{,
title= {PMC Open Access Subset},
journal= {},
author= {NIH/NLM},
year= {},
url= {https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/},
abstract= {https://i.imgur.com/GBSDr8v.png

mirror of ftp.ncbi.nlm.nih.gov:/pub/pmc/oa_bulk

PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM).

https://www.ncbi.nlm.nih.gov/pmc/

The PMC Open Access Subset is a part of the total collection of articles in PMC. The articles in the OA Subset are made available under a Creative Commons or similar license that generally allows more liberal redistribution and reuse than a traditional copyrighted work.

https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/},
keywords= {PMC, PubMed Central},
terms= {},
license= {CC},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/06d6badd7d1b0cfee00081c28fddd5e15e106165</link>
</item>
<item>
<title>TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild (Dataset)</title>
<description>@inproceedings{muller2018trackingnet,
title= {TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild},
author= {Müller, Matthias and Bibi, Adel and Giancola, Silvio and Al-Subaihi, Salman and Ghanem, Bernard},
booktitle= {European Conference on Computer Vision},
year= {2018},
keywords= {},
abstract= {},
terms= {},
license= {},
superseded= {},
url= {}
}

</description>
<link>https://academictorrents.com/download/1faf1b53cc0099d2206f02be42b5688952c3c6b3</link>
</item>
<item>
<title>RSNA Pneumonia Detection Challenge (JPG files) (Dataset)</title>
<description>@article{,
title= {RSNA Pneumonia Detection Challenge (JPG files)},
keywords= {},
author= {},
abstract= {Details from the challenge:

## What am I predicting?

In this challenge competitors are predicting whether pneumonia exists in a given image. They do so by predicting bounding boxes around areas of the lung. Samples without bounding boxes are negative and contain no definitive evidence of pneumonia. Samples with bounding boxes indicate evidence of pneumonia.

When making predictions, competitors should predict as many bounding boxes as they feel are necessary, in the format: confidence x-min y-min width height

There should be only ONE predicted row per image. This row may include multiple bounding boxes.

A properly formatted row may look like any of the following.

For patientIds with no predicted pneumonia / bounding boxes: 0004cfab-14fd-4e49-80ba-63a80b6bddd6,

For patientIds with a single predicted bounding box: 0004cfab-14fd-4e49-80ba-63a80b6bddd6,0.5 0 0 100 100

For patientIds with multiple predicted bounding boxes: 0004cfab-14fd-4e49-80ba-63a80b6bddd6,0.5 0 0 100 100 0.5 0 0 100 100, etc.
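A minimal sketch of building such rows, using the made-up boxes from the examples above:

```python
# Minimal sketch: format one submission row per patientId from a list of
# (confidence, x_min, y_min, width, height) tuples; the boxes are invented.
def format_row(patient_id, boxes):
    pred = " ".join(f"{c} {x} {y} {w} {h}" for c, x, y, w, h in boxes)
    return f"{patient_id},{pred}"

# No pneumonia predicted: the row ends with an empty prediction string.
print(format_row("0004cfab-14fd-4e49-80ba-63a80b6bddd6", []))
# Two predicted boxes in a single row.
print(format_row("0004cfab-14fd-4e49-80ba-63a80b6bddd6",
                 [(0.5, 0, 0, 100, 100), (0.5, 0, 0, 100, 100)]))
```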

## File descriptions
```
stage_2_train.csv - the training set. Contains patientIds and bounding box / target information.
stage_2_detailed_class_info.csv - provides detailed information about the type of positive or negative class for each image.
```
## Data fields
```
patientId - A patientId. Each patientId corresponds to a unique image.
x - the upper-left x coordinate of the bounding box.
y - the upper-left y coordinate of the bounding box.
width - the width of the bounding box.
height - the height of the bounding box.
Target - the binary Target, indicating whether this sample has evidence of pneumonia.
```},
terms= {},
license= {},
superseded= {},
url= {https://www.kaggle.com/c/rsna-pneumonia-detection-challenge}
}

</description>
<link>https://academictorrents.com/download/95588a735c9ae4d123f3ca408e56570409bcf2a9</link>
</item>
<item>
<title>LNDb CT scan dataset (training) (Dataset)</title>
<description>@article{,
title= {LNDb CT scan dataset (training)},
keywords= {},
author= {João Pedrosa and Guilherme Aresta and Carlos Ferreira and Márcio Rodrigues and Patrícia Leitão and André Silva Carvalho and João Rebelo and Eduardo Negrão and Isabel Ramos and António Cunha and Aurélio Campilho},
abstract= {The main goal of this challenge is the automatic classification of chest CT scans according to the 2017 Fleischner society pulmonary nodule guidelines for patient follow-up recommendation. 

The LNDb dataset contains 294 CT scans collected retrospectively at the Centro Hospitalar e Universitário de São João (CHUSJ) in Porto, Portugal between 2016 and 2018. All data was acquired under approval from the CHUSJ Ethical Committee and was anonymised prior to any analysis to remove personal information except for patient birth year and gender. Further details on patient selection and data acquisition can be found in the database description paper.

Each CT scan was read by at least one radiologist at CHUSJ to identify pulmonary nodules and other suspicious lesions. A total of 5 radiologists with at least 4 years of experience reading up to 30 CTs per week participated in the annotation process throughout the project. Annotations were performed in a single blinded fashion, i.e. a radiologist would read the scan once and no consensus or review between the radiologists was performed. The instructions for manual annotation were adapted from LIDC-IDRI. Each radiologist identified the following lesions:

 - nodule ⩾3mm: any lesion considered to be a nodule by the radiologist with greatest in-plane dimension larger or equal to 3mm;
 - nodule &lt;3mm: any lesion considered to be a nodule by the radiologist with greatest in-plane dimension smaller than 3mm;
 - non-nodule: any pulmonary lesion considered not to be a nodule by the radiologist, but that contains features which could make it identifiable as a nodule;

The annotation process varied for the different categories. Nodules ⩾3mm were segmented and subjectively characterized according to LIDC-IDRI (ratings on subtlety, internal structure, calcification, sphericity, margin, lobulation, spiculation, texture and likelihood of malignancy). For a complete description of these characteristics the reader is referred to McNitt-Gray et al. For nodules &lt;3mm the nodule centroid was marked and subjective assessment of the nodule's characteristics was performed. For non-nodules, only the lesion centroid was marked. Given that different radiologists may have read the same CT and no consensus review was performed, variability in radiologist annotations is expected.

Note that from the 294 CTs of the LNDb dataset, 58 CTs with annotations by at least two radiologists have been withheld for the test set, as well as the corresponding annotations.

https://i.imgur.com/MiHSh9c.png},
terms= {The dataset, or any data derived from it, cannot be given or redistributed under any circumstances to persons not belonging to the registered team. If the data in the dataset is remixed, transformed or built upon, the modified data cannot be redistributed under any circumstances;

The dataset cannot be used for commercial purposes under any circumstances;

Appropriate credit must be given to the authors any time this data is used, independent of purpose. Attribution must be done through citation of the database description paper (https://arxiv.org/abs/1911.08434) or (after publication) to the main challenge publication.},
license= {https://creativecommons.org/licenses/by-nc-nd/4.0/},
superseded= {},
url= {https://lndb.grand-challenge.org/Data/}
}

</description>
<link>https://academictorrents.com/download/e3c196b07c8ea94ac5fca872bccf2cc035f4e88d</link>
</item>
<item>
<title>Illinois DOC labeled faces dataset (Dataset)</title>
<description>@article{,
title= {Illinois DOC labeled faces dataset},
journal= {},
author= {Illinois DOC},
year= {},
url= {},
abstract= {This is a dataset of prisoner mugshots and associated data (height, weight, etc). The copyright status is public domain, since it's produced by the government, the photographs do not have sufficient artistic merit, and a mere collection of facts isn't copyrightable.  
  
The source is the Illinois Dept. of Corrections. In total, there are 68149 entries, of which a few hundred have shoddy data.  
  
It's useful for neural network training, since it has pictures from both front and side, and they're (manually) labeled with date of birth, name (useful for clustering), weight, height, hair color, eye color, sex, race, and some various goodies such as sentence duration and whether they're sex offenders.  
  
Here is the readme file:  
  
---BEGIN README---  
Scraped from the Illinois DOC.  
  
https://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=  
https://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=  
https://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=  
  
paste &lt;(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/pub_showside.asp\?idoc\=/g') &lt;(cat ids.txt| sed 's/^/  out=/g' | sed 's/$/.jpg/g') -d '\n' &gt; showside.txt  
paste &lt;(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/pub_showfront.asp\?idoc\=/g') &lt;(cat ids.txt| sed 's/^/  out=/g' | sed 's/$/.jpg/g') -d '\n' &gt; showfront.txt  
paste &lt;(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/inms_print.asp\?idoc\=/g') &lt;(cat ids.txt| sed 's/^/  out=/g' | sed 's/$/.html/g') -d '\n' &gt; inmates_print.txt  
  
aria2c -i ../inmates_print.txt -j4 -x4 -l ../log-$(pwd|rev|cut -d/ -f 1|rev)-$(date +%s).txt  
  
Then use htmltocsv.py to get the csv. Note that the script is very poorly written and may have errors. It also doesn't do anything with the warrant-related info, although there are some commented-out lines which may be relevant.  
Also note that it assumes all the HTML files are located in the inmates directory, and overwrites any csv files in csv if there are any.  
  
front.7z contains mugshots from the front  
side.7z contains mugshots from the side  
inmates.7z contains all the html files  
csv contains the html files converted to CSV  
  
The reason for packaging the images is that many torrent clients would otherwise crash if attempting to load the torrent.  
  
All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.  
Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.  
  
There are 68149 inmates in total, although some (a few hundred) are marked as "Unknown"/"N/A"/"" in one or more fields.  
  
The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."  
Some inmates were marked "Not Available", this has been replaced with "N/A".  
Likewise, the "weight" column has been altered "XXX lbs." -&gt; "XXX". Again, some are marked "N/A".  
  
The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.  
  
The "weight" column is often rounded to the nearest 5 lbs.  
  
Statistics for hair:  
  43305 Black  
  17371 Brown  
   2887 Blonde or Strawberry  
   2539 Gray or Partially Gray  
    740 Red or Auburn  
    624 Bald  
    396 Not Available  
    209 Salt and Pepper  
     70 White  
      7 Sandy  
      1 Unknown  
  
Statistics for sex:  
  63409 Male  
   4740 Female  
  
Statistics for race:  
  37991 Black  
  20992 White  
   8637 Hispanic  
    235 Asian  
    104 Amer Indian  
     94 Unknown  
     92 Bi-Racial  
      4  
  
Statistics for eyes:  
  51714 Brown  
   7808 Blue  
   4259 Hazel  
   2469 Green  
   1382 Black  
    420 Not Available  
     87 Gray  
      9 Maroon  
      1 Unknown  
---END README---  
  
Here is a formal summary:  
  
---BEGIN SUMMARY---  
 Documentation:  
  
1. Title: Illinois DOC dataset  
  
2. Source Information  
   -- Creators: Illinois DOC  
     -- Illinois Department of Corrections  
        1301 Concordia Court  
        P.O. Box 19277  
        Springfield, IL 62794-9277  
        (217) 558-2200 x 2008  
   -- Donor: Anonymous  
   -- Date: 2019  
  
3. Past Usage:  
   -- None  
  
4. Relevant Information:  
   -- All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.  
   -- Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.  
   -- The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."  
   -- Some inmates were marked "Not Available", this has been replaced with "N/A".  
   -- Likewise, the "weight" column has been altered "XXX lbs." -&gt; "XXX". Again, some are marked "N/A".  
   -- The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.  
   -- The "weight" column is often rounded to the nearest 5 lbs.  
  
5. Number of Instances: 68149  
  
6. Number of Attributes: 30 (in some instances, information is missing. If so, it should be treated as unknown or undefined information)  
  
7. Attribute Information:  
   1. ID: Alphanumeric internal ID (string)  
   2. mark: Human-readable string describing marks and scars. May have zero, one, or multiple entries for one ID. (string)  
   3. name: First and last name in format "SURNAME, GIVEN" - upper case. Redacted in provided copy, script must be executed to regenerate column. (string/void)  
   4. date_of_birth: Date of birth in format MM/DD/YYYY. Some inmates are marked as "Not Available" and some inmates are marked as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. (date OR enumeration)  
   5. weight: Physical weight in pounds OR "N/A". Often rounded to 5 lb increments. It may be related to the institution they are kept in. (integer OR void)  
   6. hair: Hair color. One of ("Black", "Brown", "Blonde or Strawberry", "Gray or Partially Gray", "Red or Auburn", "Bald", "Not Available", "Salt and Pepper", "White", "Sandy", "Unknown") (enumeration)  
   7. sex: Sex. One of ("Male", "Female") (enumeration)  
   8. height: Height in inches. (integer)  
   9. race: Race. One of ("Black", "White", "Hispanic", "Asian", "Amer Indian", "Unknown", "Bi-Racial", "") (enumeration)  
  10. eyes: Eye color. One of ("Brown", "Blue", "Hazel", "Green", "Black", "Not Available", "Gray", "Maroon", "Unknown") (enumeration)  
  11. admission_date: Date of admission in format MM/DD/YYYY. (date)  
  12. projected_parole_date: Projected parole date in format MM/DD/YYYY OR one of ("TO BE DETERMINED", "Sexually D", "3yrs---Lif", "3yrs---Lif", "TO BE DETERMINED BY COMMITTING COURT") OR "" (if none projected) (date OR enumeration OR void)  
  13. last_paroled_date: Last paroled date in format MM/DD/YYYY OR "" (if not paroled). (date OR void)  
  14. projected_discharge_date: Projected discharge date in format MM/DD/YYYY OR one of ("TO BE DETERMINED", "3 YRS TO LIFE - TO BE DETERMINED", "INELIGIBLE", "SEXUALLY D", "TO BE DETERMINED BY COMMITTING COURT", "PENDING", "3 YRS TO L") OR "". (date OR enumeration OR void)  
  15. parole_date: Parole date in format MM/DD/YYYY OR "". (date OR void)  
  16. electronic_detention_date: Electronic detention date in format MM/DD/YYYY OR "". (date OR void)  
  17. discharge_date: Date of discharge from institution. Always "", since discharged offenders are not included in the data set. (void)  
  18. parent_institution: Institution at which offender is kept, or "PAROLE" if parole. One of ("STATEVILLE CORRECTIONAL CENTER", "SHERIDAN CORRECTIONAL CENTER", "PINCKNEYVILLE CORRECTIONAL CENTER", "MENARD CORRECTIONAL CENTER", "LOGAN CORRECTIONAL CENTER", "ILLINOIS RIVER CORRECTIONAL CENTER", "DIXON CORRECTIONAL CENTER", "VANDALIA CORRECTIONAL CENTER", "GRAHAM CORRECTIONAL CENTER", "LAWRENCE CORRECTIONAL CENTER", "EAST MOLINE CORRECTIONAL CENTER", "SHAWNEE CORRECTIONAL CENTER", "JACKSONVILLE CORRECTIONAL CENTER", "DANVILLE CORRECTIONAL CENTER", "VIENNA CORRECTIONAL CENTER", "HILL CORRECTIONAL CENTER", "BIG MUDDY CORRECTIONAL CENTER", "CENTRALIA CORRECTIONAL CENTER", "ROBINSON CORRECTIONAL CENTER", "WESTERN ILLINOIS CORRECTIONAL CENTER", "LINCOLN CORRECTIONAL CENTER", "TAYLORVILLE CORRECTIONAL CENTER", "SOUTHWESTERN CORRECTIONAL CENTER", "PONTIAC CORRECTIONAL CENTER", "CONCORDIA", "DECATUR CORRECTIONAL CENTER", "KEWANEE LIFE SKILLS RE-ENTRY CENTER", "JOLIET TREATMENT CENTER", "PAROLE") (enumeration)  
  19. offender_status: Status of offender. One of ("CUSTODY", "PAROLE", "ABSCONDER", "RECEPTION", "WORK RELEASE CUSTODY", "TEMP RESIDENT", "NON-IDOC CUSTODY", "WRIT", "BOND", "HOME CUSTODY", "DETAINER", "MEDICAL FURLOUGH", "ESCAPE") (enumeration)  
  20. location: Location. One of ("PAROLE DISTRICT 1", "PAROLE DISTRICT 2", "PAROLE DISTRICT 3", "MENARD", "INTERSTATE COMPACT", "PINCKNEYVILLE", "LAWRENCE CORRECTIONAL CENTER", "PAROLE DISTRICT 4", "ILLINOIS RIVER", "DANVILLE", "HILL", "SHAWNEE", "DIXON", "SHERIDAN", "BIG MUDDY RIVER", "LOGAN", "PAROLE", "GRAHAM", "CENTRALIA", "EAST MOLINE", "NORTHERN RECEPTION CENTER", "VANDALIA", "ROBINSON", "STATEVILLE", "WESTERN ILLINOIS", "VIENNA", "TAYLORVILLE", "LINCOLN", "JACKSONVILLE", "PAROLE DISTRICT 5", "PONTIAC", "DIXON CORRECTIONAL CENTER", "SOUTHWESTERN ILLINOIS", "DECATUR", "", "MENARD MEDIUM SECURITY UNIT", "PONTIAC MEDIUM SECURITY", "GRAHAM R&amp;C", "CROSSROADS CCC", "KEWANEE", "ILL/OTH STATE/FED CONCURR", "PEORIA CCC", "NORTH LAWNDALE  ADULT TRANSITI", "STATEVILLE FARM", "GREENE COUNTY WORK CAMP", "COURT", "PITTSFIELD WORK CAMP", "FOX VALLEY CCC", "BOND", "SOUTHWESTERN IL WORK CAMP", "MENARD R&amp;C", "ELECTRONIC DETENTION", "CLAYTON WORK CAMP", "DIXON SPRINGS BOOT", "DUQUOIN IMPACT INCARCERATION P", "DETAINER", "PAROLE DISTRICTS", "FURLOUGH", "ESCAPE", "DEPT. OF HUMAN SERVICES", "FED/STATE/TRANSFER OTH ST", "WOMENS TREATMENT CENTER", "JAIL", "CONCORDIA") (enumeration)  
  21. sex_offender_registry_required: Whether the offender is required to register as a sex offender. One of ("true", "") (boolean)  
  22. alias: Aliases, separated by pipe sign OR one of ("", "None Reported") (string OR enumeration)  
  23. mittimus: Mittimus ID (string)  
  24. class: Class of offender. One of ("4", "2", "3", "X", "1", "M", "U", "A", "B", "C") (enumeration)  
  25. count: Count of offenses (?) (integer)  
  26. offense: Offense. One of 1576 values. Appears to have been keyed in by hand. (enumeration/string)  
  27. custody_date: Date at which offender was taken into custody. (date)  
  28. sentence: Duration of sentence in format "X Years Y Months Z Days", where Y and Z may exceed 12 and 31 respectively OR one of ("DEATH", "LIFE", "SDP") (int[3] OR enumeration)  
  29. county: County or "out-of-state". One of ("COOK", "WILL", "WINNEBAGO", "KANE", "DUPAGE", "MADISON", "MACON", "LAKE", "PEORIA", "ST-CLAIR", "CHAMPAIGN", "MCLEAN", "SANGAMON", "KANKAKEE", "VERMILION", "LA SALLE", "TAZEWELL", "ADAMS", "LIVINGSTON", "STEPHENSON", "MCHENRY", "COLES", "WHITESIDE", "JEFFERSON", "MARION", "KENDALL", "ROCK-ISLAND", "KNOX", "HENRY", "DEKALB", "BOONE", "JACKSON", "MONTGOMERY", "MACOUPIN", "SALINE", "FRANKLIN", "LOGAN", "ROCK ISLAND", "CHRISTIAN", "FAYETTE", "CLINTON", "MORGAN", "WILLIAMSON", "JERSEY", "WHITE", "LEE", "MASON", "PIKE", "EDGAR", "RANDOLPH", "WOODFORD", "OGLE", "EFFINGHAM", "FULTON", "GRUNDY", "BOND", "IROQUOIS", "SHELBY", "UNION", "CRAWFORD", "LAWRENCE", "BUREAU", "CLAY", "MCDONOUGH", "DEWITT", "JOHNSON", "PERRY", "WAYNE", "MASSAC", "RICHLAND", "CLARK", "CASS", "HANCOCK", "ALEXANDER", "DOUGLAS", "WABASH", "HAMILTON", "GREENE", "WARREN", "FORD", "EDWARDS", "MONROE", "WASHINGTON", "MOULTRIE", "CUMBERLAND", "MERCER", "MENARD", "CARROLL", "GALLATIN", "SCHUYLER", "JASPER", "BROWN", "CALHOUN", "PIATT", "JO-DAVIESS", "POPE", "HARDIN", "PULASKI", "MARSHALL", "HENDERSON", "ST CLAIR", "PUTNAM", "SCOTT", "STARK", "OUT-OF-STATE", "OUT OF STATE", "JO DAVIESS") OR "" (enumeration or void)  
  30. sentence_discharged: Whether the sentence has been discharged. One of ("YES", "NO") (boolean)  
  
8. Missing Attribute Values: See values marked "void" above.  
  
9. Class Distribution:  
  
Statistics for hair:  
  43305 Black  
  17371 Brown  
   2887 Blonde or Strawberry  
   2539 Gray or Partially Gray  
    740 Red or Auburn  
    624 Bald  
    396 Not Available  
    209 Salt and Pepper  
     70 White  
      7 Sandy  
      1 Unknown  
  
Statistics for sex:  
  63409 Male  
   4740 Female  
  
Statistics for race:  
  37991 Black  
  20992 White  
   8637 Hispanic  
    235 Asian  
    104 Amer Indian  
     94 Unknown  
     92 Bi-Racial  
      4  
  
Statistics for eyes:  
  51714 Brown  
   7808 Blue  
   4259 Hazel  
   2469 Green  
   1382 Black  
    420 Not Available  
     87 Gray  
      9 Maroon  
      1 Unknown  
  
Summary Statistics:  
         median  
weight:  185  
height:  69  
---END SUMMARY---  
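Given the CSV quirks noted above (semicolon delimiters, a trailing semicolon, and "N/A"/"Not Available" markers for missing values), a minimal pandas sketch for reading person.csv might look like this; the column handling is an assumption based on the README:

```python
# Minimal sketch, assuming the README's description of person.csv: semicolon
# delimiters, a trailing semicolon (which yields an empty last column), and
# "N/A"/"Not Available" markers for missing values.
import pandas as pd

df = pd.read_csv("csv/person.csv", sep=";")
df = df.dropna(axis=1, how="all")               # drop the empty trailing column
df = df.replace(["N/A", "Not Available"], pd.NA)
df["height"] = pd.to_numeric(df["height"], errors="coerce")  # inches
df["weight"] = pd.to_numeric(df["weight"], errors="coerce")  # pounds

print(len(df), "rows; median height:", df["height"].median(),
      "in; median weight:", df["weight"].median(), "lbs")
```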

Image: ![](https://i.postimg.cc/D7pbKD0g/montage-0.jpg) https://i.postimg.cc/D7pbKD0g/montage-0.jpg},
keywords= {machine learning, Dataset, images, prisoners},
terms= {},
license= {Public Domain},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/4b9b7e449aa732842aea1a7d4e6413f4507aea99</link>
</item>
<item>
<title>musicnet.tar.gz (Dataset)</title>
<description>@article{,
title= {musicnet.tar.gz},
journal= {},
author= {John Thickstun and Zaid Harchaoui and Dean P. Foster and Sham M. Kakade},
year= {},
url= {https://homes.cs.washington.edu/~thickstn/musicnet.html},
abstract= {MusicNet is a collection of 330 freely-licensed classical music recordings, together with over 1 million annotated labels indicating the precise time of each note in every recording, the instrument that plays each note, and the note's position in the metrical structure of the composition. The labels are acquired from musical scores aligned to recordings by dynamic time warping. The labels are verified by trained musicians; we estimate a labeling error rate of 4%. We offer the MusicNet labels to the machine learning and music communities as a resource for training models and a common benchmark for comparing results. },
keywords= {music, music transcription, midi, audio},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/d2b2ae5e3ec4fd475d6e4c517d4c8752a7aa8455</link>
</item>
<item>
<title>Ocular Disease Intelligent Recognition ODIR-5K (Dataset)</title>
<description>@article{,
title= {Ocular Disease Intelligent Recognition ODIR-5K},
keywords= {},
author= {},
abstract= {We collected a structured ophthalmic database of 5,000 patients with age, color fundus photographs from left and right eyes, and doctors' diagnostic keywords (in short, ODIR-5K). This dataset is a ‘‘real-life’’ set of patient information collected by Shanggong Medical Technology Co., Ltd. from different hospitals/medical centers in China. In these institutions, fundus images are captured by various cameras on the market, such as Canon, Zeiss and Kowa, resulting in varied image resolutions. Patient identifying information has been removed. Annotations are labeled by trained human readers with quality control management. They classify each patient into eight labels, namely normal (N), diabetes (D), glaucoma (G), cataract (C), AMD (A), hypertension (H), myopia (M) and other diseases/abnormalities (O), based on both eye images and patient age. The publishing of this dataset follows the ethical and privacy rules of China. Table 1 shows one record from the ODIR-5K dataset.

The 5,000 patients in this challenge are divided into training, off-site testing and on-site testing subsets. Almost 4,000 cases are used in the training stage while the others are reserved for the testing stages (off-site and on-site). Table 2 shows the distribution of case numbers with respect to the eight labels in the different stages. Note: one patient may carry one or multiple labels.
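Since a patient may carry several of the eight labels, a natural target representation is a multi-hot vector. A minimal sketch follows; the label order is taken from the description above and the example input is invented:

```python
# Minimal sketch: encode a patient's labels as a multi-hot vector over the
# eight classes listed above (the example input is invented).
LABELS = ["N", "D", "G", "C", "A", "H", "M", "O"]

def multi_hot(patient_labels):
    return [1 if label in patient_labels else 0 for label in LABELS]

print(multi_hot({"D", "H"}))  # [0, 1, 0, 0, 0, 1, 0, 0]
```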

https://i.imgur.com/vXa8rU9.png

https://i.imgur.com/Hs7kYUF.png

},
terms= {},
license= {},
superseded= {},
url= {https://odir2019.grand-challenge.org/}
}

</description>
<link>https://academictorrents.com/download/cf3b8d5ecdd4284eb9b3a80fcfe9b1d621548f72</link>
</item>
<item>
<title>Replicated GPT-2 1.5B Parameter Model (Dataset)</title>
<description>@article{,
title= {Replicated GPT-2 1.5B Parameter Model},
journal= {},
author= { Aaron Gokaslan and Vanya Cohen},
year= {},
url= {https://medium.com/@vanya_cohen/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc},
abstract= {},
keywords= {openai gpt-2 gpt2 1.5B},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/af468cfbb0284a35e706f5ae9b5dbcb45684f9d2</link>
</item>
<item>
<title>1000 Fundus images with 39 categories (Dataset)</title>
<description>@article{,
title= {1000 Fundus images with 39 categories},
keywords= {},
author= {Joint Shantou International Eye Centre (JSIEC)},
abstract= {All 1000 fundus images, which belong to 39 classes, come from the Joint Shantou International Eye Centre (JSIEC), Shantou city, Guangdong province, China. These images are a small part of the total of 209,494 fundus images used for training, validating and testing our deep learning platform. The copyright of these images belongs to JSIEC, and they can be freely used for any purpose.

https://i.imgur.com/kWGUlMo.jpg

```
3.1M  1000images/12.Disc swelling and elevation
2.6M  1000images/26.Fibrosis
 23M  1000images/0.0.Normal
2.4M  1000images/15.1.Bietti crystalline dystrophy
3.8M  1000images/22.Cotton-wool spots
9.8M  1000images/8.MH
4.5M  1000images/24.Chorioretinal atrophy-coloboma
3.4M  1000images/25.Preretinal hemorrhage
2.9M  1000images/14.Congenital disc abnormality
4.7M  1000images/28.Silicon oil in eye
 19M  1000images/1.1.DR3
3.5M  1000images/16.Peripheral retinal degeneration and break
4.4M  1000images/10.0.Possible glaucoma
 28M  1000images/1.0.DR2
 19M  1000images/0.2.Large optic cup
8.9M  1000images/7.ERM
 22M  1000images/9.Pathological myopia
2.8M  1000images/20.Massive hard exudates
 74M  1000images/29.0.Blur fundus without PDR
 12M  1000images/15.0.Retinitis pigmentosa
2.8M  1000images/13.Dragged Disc
5.6M  1000images/5.1.VKH disease
 27M  1000images/4.Rhegmatogenous RD
 33M  1000images/6.Maculopathy
 12M  1000images/0.3.DR1
 12M  1000images/21.Yellow-white spots-flecks
 19M  1000images/2.0.BRVO
1.6M  1000images/19.Fundus neoplasm
 13M  1000images/29.1.Blur fundus with suspected PDR
4.6M  1000images/2.1.CRVO
5.0M  1000images/23.Vessel tortuosity
5.5M  1000images/10.1.Optic atrophy
6.2M  1000images/5.0.CSCR
4.3M  1000images/11.Severe hypertensive retinopathy
2.8M  1000images/17.Myelinated nerve fiber
6.1M  1000images/0.1.Tessellated fundus
5.3M  1000images/27.Laser Spots
3.7M  1000images/18.Vitreous particles
3.6M  1000images/3.RAO
429M  1000images
```},
terms= {},
license= {can be freely used for any purpose},
superseded= {},
url= {https://www.kaggle.com/linchundan/fundusimage1000}
}

</description>
<link>https://academictorrents.com/download/6d239d7d6c23f8b2a8046cca7078a7e10c6889d0</link>
</item>
<item>
<title>MRI Dataset for Hippocampus Segmentation (HFH) (hippseg_2011) (Dataset)</title>
<description>@article{,
title= {MRI Dataset for Hippocampus Segmentation (HFH) (hippseg_2011)},
keywords= {},
author= {K. Jafari-Khouzani and K. Elisevich, S. Patel and H. Soltanian-Zadeh},
abstract= {This dataset contains T1-weighted MR images of 50 subjects, 40 of whom are patients with temporal lobe epilepsy and 10 are nonepileptic subjects. Hippocampus labels are provided for 25 subjects for training. The users may submit their segmentation outcomes for the remaining 25 testing images to get a table of segmentation metrics. 
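The images ship as Analyze-format .hdr/.img pairs (see the directory tree below). A minimal sketch of reading one training volume and its labels with the third-party nibabel package, assuming the layout shown in the tree:

```python
# Minimal sketch using the third-party nibabel package to read one
# Analyze-format .hdr/.img pair; paths follow the directory tree below.
import numpy as np
import nibabel as nib

image = nib.load("HFH/Train/HFH_001.hdr").get_fdata()
labels = nib.load("HFH/Train/Labels/HFH_001_Hipp_Labels.hdr").get_fdata()

print("volume shape:", image.shape)
print("labeled hippocampus voxels:", int(np.count_nonzero(labels)))
```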

https://i.imgur.com/XSJr6oQ.png

https://i.imgur.com/jWpnVeu.gif


```
HFH
├── ReadMe.txt
├── Test
│   ├── HFH_026.hdr
│   ├── HFH_026.img
│   ├── HFH_027.hdr
│   ├── HFH_027.img
│   ├── HFH_028.hdr
│   ├── HFH_028.img
│   ├── HFH_029.hdr
│   ├── HFH_029.img
│   ├── HFH_030.hdr
│   ├── HFH_030.img
│   ├── HFH_031.hdr
│   ├── HFH_031.img
│   ├── HFH_032.hdr
│   ├── HFH_032.img
│   ├── HFH_033.hdr
│   ├── HFH_033.img
│   ├── HFH_034.hdr
│   ├── HFH_034.img
│   ├── HFH_035.hdr
│   ├── HFH_035.img
│   ├── HFH_036.hdr
│   ├── HFH_036.img
│   ├── HFH_037.hdr
│   ├── HFH_037.img
│   ├── HFH_038.hdr
│   ├── HFH_038.img
│   ├── HFH_039.hdr
│   ├── HFH_039.img
│   ├── HFH_040.hdr
│   ├── HFH_040.img
│   ├── HFH_041.hdr
│   ├── HFH_041.img
│   ├── HFH_042.hdr
│   ├── HFH_042.img
│   ├── HFH_043.hdr
│   ├── HFH_043.img
│   ├── HFH_044.hdr
│   ├── HFH_044.img
│   ├── HFH_045.hdr
│   ├── HFH_045.img
│   ├── HFH_046.hdr
│   ├── HFH_046.img
│   ├── HFH_047.hdr
│   ├── HFH_047.img
│   ├── HFH_048.hdr
│   ├── HFH_048.img
│   ├── HFH_049.hdr
│   ├── HFH_049.img
│   ├── HFH_050.hdr
│   └── HFH_050.img
└── Train
    ├── HFH_001.hdr
    ├── HFH_001.img
    ├── HFH_002.hdr
    ├── HFH_002.img
    ├── HFH_003.hdr
    ├── HFH_003.img
    ├── HFH_004.hdr
    ├── HFH_004.img
    ├── HFH_005.hdr
    ├── HFH_005.img
    ├── HFH_006.hdr
    ├── HFH_006.img
    ├── HFH_007.hdr
    ├── HFH_007.img
    ├── HFH_008.hdr
    ├── HFH_008.img
    ├── HFH_009.hdr
    ├── HFH_009.img
    ├── HFH_010.hdr
    ├── HFH_010.img
    ├── HFH_011.hdr
    ├── HFH_011.img
    ├── HFH_012.hdr
    ├── HFH_012.img
    ├── HFH_013.hdr
    ├── HFH_013.img
    ├── HFH_014.hdr
    ├── HFH_014.img
    ├── HFH_015.hdr
    ├── HFH_015.img
    ├── HFH_016.hdr
    ├── HFH_016.img
    ├── HFH_017.hdr
    ├── HFH_017.img
    ├── HFH_018.hdr
    ├── HFH_018.img
    ├── HFH_019.hdr
    ├── HFH_019.img
    ├── HFH_020.hdr
    ├── HFH_020.img
    ├── HFH_021.hdr
    ├── HFH_021.img
    ├── HFH_022.hdr
    ├── HFH_022.img
    ├── HFH_023.hdr
    ├── HFH_023.img
    ├── HFH_024.hdr
    ├── HFH_024.img
    ├── HFH_025.hdr
    ├── HFH_025.img
    └── Labels
        ├── HFH_001_Hipp_Labels.hdr
        ├── HFH_001_Hipp_Labels.img
        ├── HFH_002_Hipp_Labels.hdr
        ├── HFH_002_Hipp_Labels.img
        ├── HFH_003_Hipp_Labels.hdr
        ├── HFH_003_Hipp_Labels.img
        ├── HFH_004_Hipp_Labels.hdr
        ├── HFH_004_Hipp_Labels.img
        ├── HFH_005_Hipp_Labels.hdr
        ├── HFH_005_Hipp_Labels.img
        ├── HFH_006_Hipp_Labels.hdr
        ├── HFH_006_Hipp_Labels.img
        ├── HFH_007_Hipp_Labels.hdr
        ├── HFH_007_Hipp_Labels.img
        ├── HFH_008_Hipp_Labels.hdr
        ├── HFH_008_Hipp_Labels.img
        ├── HFH_009_Hipp_Labels.hdr
        ├── HFH_009_Hipp_Labels.img
        ├── HFH_010_Hipp_Labels.hdr
        ├── HFH_010_Hipp_Labels.img
        ├── HFH_011_Hipp_Labels.hdr
        ├── HFH_011_Hipp_Labels.img
        ├── HFH_012_Hipp_Labels.hdr
        ├── HFH_012_Hipp_Labels.img
        ├── HFH_013_Hipp_Labels.hdr
        ├── HFH_013_Hipp_Labels.img
        ├── HFH_014_Hipp_Labels.hdr
        ├── HFH_014_Hipp_Labels.img
        ├── HFH_015_Hipp_Labels.hdr
        ├── HFH_015_Hipp_Labels.img
        ├── HFH_016_Hipp_Labels.hdr
        ├── HFH_016_Hipp_Labels.img
        ├── HFH_017_Hipp_Labels.hdr
        ├── HFH_017_Hipp_Labels.img
        ├── HFH_018_Hipp_Labels.hdr
        ├── HFH_018_Hipp_Labels.img
        ├── HFH_019_Hipp_Labels.hdr
        ├── HFH_019_Hipp_Labels.img
        ├── HFH_020_Hipp_Labels.hdr
        ├── HFH_020_Hipp_Labels.img
        ├── HFH_021_Hipp_Labels.hdr
        ├── HFH_021_Hipp_Labels.img
        ├── HFH_022_Hipp_Labels.hdr
        ├── HFH_022_Hipp_Labels.img
        ├── HFH_023_Hipp_Labels.hdr
        ├── HFH_023_Hipp_Labels.img
        ├── HFH_024_Hipp_Labels.hdr
        ├── HFH_024_Hipp_Labels.img
        ├── HFH_025_Hipp_Labels.hdr
        └── HFH_025_Hipp_Labels.img

3 directories, 151 files
```},
terms= {The dataset is free to use for research and education. Please refer to the following article if you use it in your publications:

K. Jafari-Khouzani, K. Elisevich, S. Patel, and H. Soltanian-Zadeh, “Dataset of magnetic resonance images of nonepileptic subjects and temporal lobe epilepsy patients for validation of hippocampal segmentation techniques,” Neuroinformatics, 2011.},
license= {free to use for research and education},
superseded= {},
url= {https://www.nitrc.org/projects/hippseg_2011/}
}

</description>
<link>https://academictorrents.com/download/d019f4f082f3fda94f0f74577b50dc30beee7bf8</link>
</item>
<item>
<title>Minecraft Skins (Dataset)</title>
<description>@article{,
title= {Minecraft Skins},
journal= {},
author= {SHA65536},
year= {2019},
url= {},
abstract= {An image dataset containing 900,000+ images of unique Minecraft skins of real players.
Could be used for training a GAN or for other image-related applications.
If you make something nice with this, I would love to know! Message me! =)
Direct Download Link: http://www.mediafire.com/file/z6wbmo2aqxkztcm/Skins.tar/file
Examples: https://i.imgur.com/ek5c9vR.png},
keywords= {Dataset, images, Image, Minecraft, Skins},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/14cf27fca7f26714d2a5193dc95348a4712cdcdf</link>
</item>
<item>
<title>Twitch Emotes Images Dataset (Dataset)</title>
<description>@article{,
title= {Twitch Emotes Images Dataset},
journal= {},
author= {SHA65536},
year= {2019},
url= {},
abstract= {This is a dataset containing over 1,200,000 images of real Twitch emotes.
Most emotes (99.99%) are 28 by 28 pixels.
Could be used to create a GAN or for other applications.

Examples:

https://i.imgur.com/CJKTaWM.png},
keywords= {Dataset, images, Emotes, Image, Twitch},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/168649d9e29662e033d8db9c7bf0077c793d36c8</link>
</item>
<item>
<title>P. vivax (malaria) infected human blood smears (BBBC041) (Dataset)</title>
<description>@article{,
title= {P. vivax (malaria) infected human blood smears (BBBC041)},
keywords= {},
author= {},
abstract= {### Description of the biological application
Malaria is a disease caused by Plasmodium parasites that remains a major threat in global health, affecting 200 million people and causing 400,000 deaths a year. The main species of malaria that affect humans are Plasmodium falciparum and Plasmodium vivax.

For malaria as well as other microbial infections, manual inspection of thick and thin blood smears by trained microscopists remains the gold standard for parasite detection and stage determination because of its low reagent and instrument cost and high flexibility. Despite manual inspection being extremely low throughput and susceptible to human bias, automatic counting software remains largely unused because of the wide range of variations in brightfield microscopy images. However, a robust automatic counting and cell classification solution would provide enormous benefits due to faster and more accurate quantitative results without human variability; researchers and medical professionals could better characterize stage-specific drug targets and better quantify patient reactions to drugs.

Previous attempts to automate the process of identifying and quantifying malaria have not gained major traction, partly due to the difficulty of replication, comparison, and extension. Authors also rarely make their image sets available, which precludes replication of results and assessment of potential improvements. The lack of both a standard set of images and a standard set of metrics for reporting results has impeded the field.

### Images
Images are in .png or .jpg format. There are 3 sets of images, consisting of 1364 images (~80,000 cells) in total, each prepared by a different researcher: from Brazil (Stefanie Lopes), from Southeast Asia (Benoit Malleret), and a time course (Gabriel Rangel). Blood smears were stained with Giemsa reagent.

### Ground truth
The data consist of two classes of uninfected cells (RBCs and leukocytes) and four classes of infected cells (gametocytes, rings, trophozoites, and schizonts). Annotators were permitted to mark some cells as difficult if they were not clearly in one of the cell classes. The data have a heavy class imbalance: uninfected RBCs make up over 95% of all cells, far outnumbering uninfected leukocytes and infected cells.

A class label and set of bounding box coordinates were given for each cell. For all data sets, infected cells were given a class label by Stefanie Lopes, malaria researcher at the Dr. Heitor Vieira Dourado Tropical Medicine Foundation hospital, indicating stage of development or marked as difficult.

### For more information
These images were contributed by Jane Hung of MIT and the Broad Institute in Cambridge, MA.

https://i.imgur.com/1zrfx2Y.png},
terms= {},
license= {Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License },
superseded= {},
url= {https://data.broadinstitute.org/bbbc/BBBC041/}
}

</description>
<link>https://academictorrents.com/download/2fed90eeaa0fbf98aba474c5d7e56f6290121507</link>
</item>
<item>
<title>ISIC2018: Skin Lesion Analysis Towards Melanoma Detection (Dataset)</title>
<description>@article{,
title= {ISIC2018: Skin Lesion Analysis Towards Melanoma Detection},
keywords= {},
author= {Noel Codella and Veronica Rotemberg and Philipp Tschandl and M. Emre Celebi and Stephen Dusza and David Gutman and Brian Helba and Aadi Kalloo and Konstantinos Liopyris and Michael Marchetti and Harald Kittler and Allan Halpern},
abstract= {This challenge is broken into three separate tasks:

- Task 1: Lesion Segmentation  
- Task 2: Lesion Attribute Detection
- Task 3: Disease Classification

https://i.imgur.com/daTTwFV.png

When using the ISIC 2018 datasets in your research, please cite the following works:

[1] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018; https://arxiv.org/abs/1902.03368

[2] Tschandl, P., Rosendahl, C. &amp; Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 doi:10.1038/sdata.2018.161 (2018).},
terms= {},
license= {Creative Commons Attribution-NonCommercial 4.0 International Public
License},
superseded= {},
url= {https://challenge2018.isic-archive.com/}
}

</description>
<link>https://academictorrents.com/download/1e3811b66f1129a2b86b7c291316db8583dbc94f</link>
</item>
<item>
<title>r/WritingPrompts, Text (2018) (Dataset)</title>
<description>@article{,
title= {r/WritingPrompts, Text (2018)},
journal= {},
author= {},
year= {},
url= {},
abstract= {r/WritingPrompts data, formatted for GPT-2 training. },
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/b4fa678ca4a330cf7078750b93eaefb1680a9053</link>
</item>
<item>
<title>01QZP 2018-2019 Ambient Intelligence (Course)</title>
<description>@article{,
title= {01QZP 2018-2019 Ambient Intelligence},
journal= {},
author= {Corno, Fulvio and De Russis, Luigi and Monge Roffarello, Alberto},
year= {},
url= {http://bit.ly/polito-ami},
abstract= {Lectures of Ambient Intelligence at Politecnico di Torino, in 2019.

Topics:

* Introduction to Ambient Intelligence: definitions and available approaches for smart homes, smart buildings, etc. Overview of application areas (home, building, city, traffic, etc.) and types of applications (monitoring, comfort, anomaly detection, ambient assisted living, control and automation, etc.)
* Requirements and design methodology for AmI. Design, analysis and specification of requirements and functionalities related to user interacting with AmI settings.
* Practical programming of AmI systems: the Python language, the Raspberry Pi computer, Web protocols and languages (e.g., HTTP and REST), web-based APIs, and collaboration tools (git, GitHub).

},
keywords= {Ambient Intelligence, Intelligent Environments, Internet of Things, Projects, Python, Smart Environments},
terms= {},
license= {Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International 
https://creativecommons.org/licenses/by-nc-sa/4.0/},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/9bbe28468af204ccefe75662cd184ce0abed0ad4</link>
</item>
<item>
<title>DRIMDB (Diabetic Retinopathy Images Database) Database for Quality Testing of Retinal Images (Dataset)</title>
<description>@article{,
title= {DRIMDB (Diabetic Retinopathy Images Database) Database for Quality Testing of Retinal Images},
keywords= {fundus},
author= {},
abstract= {Retinal image quality assessment (IQA) is a crucial process for automated retinal image analysis systems to obtain an accurate and successful diagnosis of retinal diseases. Consequently, the first step in a good retinal image analysis system is measuring the quality of the input image. We present an approach for finding medically suitable retinal images for retinal diagnosis. 

We used a three-class grading system that consists of good, bad, and outlier classes. We created a retinal image quality dataset with a total of 216 consecutive images, called the Diabetic Retinopathy Image Database. We identified the suitable images within the good images for automatic retinal image analysis systems using a novel method. Subsequently, we evaluated our retinal image suitability approach using the Digital Retinal Images for Vessel Extraction and Standard Diabetic Retinopathy Database Calibration level 1 public datasets. The results were measured through the F1 metric, the harmonic mean of the precision and recall metrics. The highest F1 scores of the IQA tests were 99.60%, 96.50%, and 85.00% for the good, bad, and outlier classes, respectively. Additionally, the accuracy of our suitable image detection approach was 98.08%. Our approach can be integrated into any automatic retinal analysis system with sufficient performance scores.
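Concretely, the F1 score combines precision P and recall R as F1 = 2PR / (P + R); a one-line sketch with illustrative numbers (not taken from the paper):

```python
# F1 = 2 * P * R / (P + R); the numbers below are illustrative only.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f"{f1(0.995, 0.997):.4f}")  # ~0.9960
```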

Good:
https://i.imgur.com/D5unNKs.png

Bad:
https://i.imgur.com/slFzaCZ.png

Outlier:
https://i.imgur.com/eG4PDet.png},
terms= {},
license= {},
superseded= {},
url= {https://pubmed.ncbi.nlm.nih.gov/24718384/}
}

</description>
<link>https://academictorrents.com/download/99811ba62918f8e73791d21be29dcc372d660305</link>
</item>
<item>
<title>DiaRetDB1 V2.1 - Diabetic Retinopathy Database (Dataset)</title>
<description>@article{,
title= {DiaRetDB1 V2.1 - Diabetic Retinopathy Database},
keywords= {},
author= {Machine Vision and Pattern Recognition Laboratory},
abstract= {The DiaRetDB1 is a public database for evaluating and benchmarking diabetic retinopathy detection algorithms. The database contains digital images of eye fundus and expert annotated ground truth for several well-known diabetic fundus lesions (hard exudates, soft exudates, microaneurysms and hemorrhages). The original images and the raw ground truth are both available.
In addition to the data we also provide Matlab functionality (M-files) to read data (XML-files), fuse data of several experts and to evaluate detection methods.

This database is related to the ImageRet project and the ground truth was collected using our ImgAnnoTool image annotation tool (contact Lasse Lensu for more information). For a more detailed description, please see our documentation.

### Authors

The following authors have significantly contributed to the actual work of establishing and collecting the data and implementing the methods for the database:
Tomi Kauppi, Valentina Kalesnykiene, Iiris Sorri, Asta Raninen, Raija Voutilainen, Joni Kamarainen, Lasse Lensu and Hannu Uusitalo.

90 images

https://i.imgur.com/Oy7GJSR.png
},
terms= {},
license= {},
superseded= {},
url= {http://www.it.lut.fi/project/imageret/diaretdb1_v2_1/}
}

</description>
<link>https://academictorrents.com/download/817b91fd639263f6f644de4ccc9575c20b005c6c</link>
</item>
<item>
<title>MS-Celeb-1M: {A} Dataset and Benchmark for Large-Scale Face Recognition (Dataset)</title>
<description>@article{dblp:journals/corr/guozhhg16,
author= {Yandong Guo and               Lei Zhang and               Yuxiao Hu and               Xiaodong He and               Jianfeng Gao},
title= {MS-Celeb-1M: {A} Dataset and Benchmark for Large-Scale Face Recognition},
journal= {CoRR},
volume= {abs/1607.08221},
year= {2016},
url= {http://arxiv.org/abs/1607.08221},
archiveprefix= {arXiv},
eprint= {1607.08221},
timestamp= {Mon, 13 Aug 2018 16:46:27 +0200},
biburl= {https://dblp.org/rec/bib/journals/corr/GuoZHHG16},
bibsource= {dblp computer science bibliography, https://dblp.org},
abstract= {In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and link them to corresponding entity keys in a knowledge base. More specifically, we propose a benchmark task to recognize one million celebrities from their face images, by using all the possibly collected face images of this individual on the web as training data. The rich information provided by the knowledge base helps to conduct disambiguation and improve the recognition accuracy, and contributes to various real-world applications, such as image captioning and news video analysis. Associated with this task, we design and provide concrete measurement set, evaluation protocol, as well as training data. We also present in details our experiment setup and report promising baseline results. Our benchmark task could lead to one of the largest classification problems in computer vision. To the best of our knowledge, our training dataset, which contains 10M images in version 1, is the largest publicly available one in the world.
},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/9e67eb7cc23c9417f39778a8e06cca5e26196a97</link>
</item>
<item>
<title>OpenWebText (Gokaslan's distribution, 2019), GPT-2 Tokenized (Dataset)</title>
<description>@article{,
title= {OpenWebText (Gokaslan's distribution, 2019), GPT-2 Tokenized},
journal= {},
author= {eukaryote31 and Joshua Peterson and Aaron Gokaslan and Vanya Cohen},
year= {},
url= {},
abstract= {Code by eukaryote31 and Joshua Peterson: https://github.com/jcpeterson/openwebtext and https://github.com/eukaryote31/openwebtext

Scraped by Aaron Gokaslan and Vanya Cohen: https://skylion007.github.io/OpenWebTextCorpus/

Tokenized by eukaryote31},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/36c39b25657ce1639ccec0a91cf242b42e1f01db</link>
</item>
<item>
<title>MIT-BIH Arrhythmia Database (Dataset)</title>
<description>@article{,
title= {MIT-BIH Arrhythmia Database},
keywords= {},
author= {Moody GB, Mark RG.},
abstract= {Since 1975, our laboratories at Boston's Beth Israel Hospital (now the Beth Israel Deaconess Medical Center) and at MIT have supported our own research into arrhythmia analysis and related subjects. One of the first major products of that effort was the MIT-BIH Arrhythmia Database, which we completed and began distributing in 1980. The database was the first generally available set of standard test material for evaluation of arrhythmia detectors, and has been used for that purpose as well as for basic research into cardiac dynamics at more than 500 sites worldwide. Originally, we distributed the database on 9-track half-inch digital tape at 800 and 1600 bpi, and on quarter-inch IRIG-format FM analog tape. In August, 1989, we produced a CD-ROM version of the database.

The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample.

The recordings were digitized at 360 samples per second per channel with 11-bit resolution over a 10 mV range. Two or more cardiologists independently annotated each record; disagreements were resolved to obtain the computer-readable reference annotations for each beat (approximately 110,000 annotations in all) included with the database.

This directory contains the entire MIT-BIH Arrhythmia Database. About half (25 of 48 complete records, and reference annotation files for all 48 records) of this database has been freely available here since PhysioNet's inception in September 1999. The 23 remaining signal files, which had been available only on the MIT-BIH Arrhythmia Database CD-ROM, were posted here in February 2005.

Much more information about this database may be found in the MIT-BIH Arrhythmia Database Directory.
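
As a pointer for getting started, here is a minimal sketch that reads one record and its beat annotations with the `wfdb` Python package; the local path `mitdb/100` assumes the files were extracted into a `mitdb` folder:

```python
import wfdb

# record 100 is one of the 48 records; adjust the path to where the files live
record = wfdb.rdrecord("mitdb/100")          # two-channel ECG, 360 Hz
annotation = wfdb.rdann("mitdb/100", "atr")  # reference beat annotations

print(record.fs, record.p_signal.shape)      # sampling rate and (n_samples, 2)
print(annotation.sample[:5], annotation.symbol[:5])
```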


## Citation

Moody GB, Mark RG. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng in Med and Biol 20(3):45-50 (May-June 2001). (PMID: 11446209)},
terms= {},
license= {},
superseded= {},
url= {https://physionet.org/physiobank/database/mitdb/}
}

</description>
<link>https://academictorrents.com/download/78d14c9cb4fa765b3c323c1a26bd114e2b30ef34</link>
</item>
<item>
<title>CMU Graphics Lab Motion Capture Database Converted to FBX (Dataset)</title>
<description>@article{,
title= {CMU Graphics Lab Motion Capture Database Converted to FBX},
journal= {},
author= {CMU Graphics Lab},
year= {},
url= {http://mocap.cs.cmu.edu/},
abstract= {Collection of various motion capture recordings (walking, dancing, sports, and others) performed by over 140 subjects. The database contains free motions which you can download and use.
The original dataset is delivered by the authors in the Acclaim format. This version of the dataset is a conversion to FBX based on the BVH conversion by B. Hahne with some fixes in T-Poses and framerates.},
keywords= {motion capture},
terms= {This data is free for use in research projects.
You may include this data in commercially-sold products, 
but you may not resell this data directly, even in converted form.
If you publish results obtained using this data, we would appreciate it
if you would send the citation to your published paper to jkh+mocap@cs.cmu.edu,
and also would add this text to your acknowledgments section:
The data used in this project was obtained from mocap.cs.cmu.edu.
The database was created with funding from NSF EIA-0196217.},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/8e21416d1584981ef3e9d8a97ee4278f93390623</link>
</item>
<item>
<title>Stanford Drone Dataset (Dataset)</title>
<description>@article{,
title= {Stanford Drone Dataset},
keywords= {},
author= {A. Robicquet and A. Sadeghian and A. Alahi and S. Savarese},
abstract= {When humans navigate a crowded space such as a university campus or the sidewalks of a busy street, they follow common sense rules based on social etiquette. In order to enable the design of new algorithms that can fully take advantage of these rules to better solve tasks such as target tracking or trajectory forecasting, we need to have access to better data. To that end, we contribute the very first large scale dataset (to the best of our knowledge) that collects images and videos of various types of agents (not just pedestrians, but also bicyclists, skateboarders, cars, buses, and golf carts) that navigate in a real world outdoor environment such as a university campus. In the images below, pedestrians are labeled in pink, bicyclists in red, skateboarders in orange, and cars in green.

https://i.imgur.com/iJl5sUN.png

https://i.imgur.com/XOBHAoE.png

https://i.imgur.com/MDruCEV.png

https://i.imgur.com/cYpHgG5.png


### CITATION

If you find this dataset useful, please cite this paper (and refer to the data as the Stanford Drone Dataset or SDD):
A. Robicquet, A. Sadeghian, A. Alahi, S. Savarese, Learning Social Etiquette: Human Trajectory Prediction In Crowded Scenes in European Conference on Computer Vision (ECCV), 2016.},
terms= {},
license= {Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License},
superseded= {},
url= {http://cvgl.stanford.edu/projects/uav_data/}
}

</description>
<link>https://academictorrents.com/download/01f95ea32e160e6c251ea55a87bd5a24b23cb03d</link>
</item>
<item>
<title>Inria Aerial Image Labeling Dataset (Dataset)</title>
<description>@article{,
title= {Inria Aerial Image Labeling Dataset},
keywords= {},
author= {Emmanuel Maggiori and Yuliya Tarabalka and Guillaume Charpiat and Pierre Alliez},
abstract= {The Inria Aerial Image Labeling Dataset addresses a core topic in remote sensing: the automatic pixelwise labeling of aerial imagery.

Dataset features:

Coverage of 810 km² (405 km² for training and 405 km² for testing)
Aerial orthorectified color imagery with a spatial resolution of 0.3 m
Ground truth data for two semantic classes: building and not building (publicly disclosed only for the training subset)
The images cover dissimilar urban settlements, ranging from densely populated areas (e.g., San Francisco’s financial district) to alpine towns (e.g., Lienz in the Austrian Tyrol).

Instead of splitting adjacent portions of the same images into the training and test subsets, different cities are included in each of the subsets. For example, images over Chicago are included in the training set (and not in the test set) and images over San Francisco are included in the test set (and not in the training set). The ultimate goal of this dataset is to assess the generalization power of the techniques: while Chicago imagery may be used for training, the system should label aerial images over other regions, with varying illumination conditions, urban landscapes and times of the year.

The dataset was constructed by combining public domain imagery and public domain official building footprints.

https://i.imgur.com/wAL5IUX.png

Citation
Emmanuel Maggiori, Yuliya Tarabalka, Guillaume Charpiat and Pierre Alliez. “Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark”. IEEE International Geoscience and Remote Sensing Symposium (IGARSS). 2017.

},
terms= {},
license= {},
superseded= {},
url= {https://project.inria.fr/aerialimagelabeling/}
}

</description>
<link>https://academictorrents.com/download/cf445f6073540af0803ee345f46294f088e7bba5</link>
</item>
<item>
<title>PROSTATEx (Dataset)</title>
<description>@article{,
title= {PROSTATEx},
journal= {},
author= {Geert Litjens and Oscar Debats and Jelle Barentsz and Nico Karssemeijer and Henkjan Huisman},
year= {},
url= {https://wiki.cancerimagingarchive.net/display/Public/SPIE-AAPM-NCI+PROSTATEx+Challenges},
abstract= {This collection is a retrospective set of prostate MR studies. All studies included T2-weighted (T2W), proton density-weighted (PD-W), dynamic contrast enhanced (DCE), and diffusion-weighted (DW) imaging. The images were acquired on two different types of Siemens 3T MR scanners, the MAGNETOM Trio and Skyra. T2-weighted images were acquired using a turbo spin echo sequence and had a resolution of around 0.5 mm in plane and a slice thickness of 3.6 mm. The DCE time series was acquired using a 3-D turbo flash gradient echo sequence with a resolution of around 1.5 mm in-plane, a slice thickness of 4 mm and a temporal resolution of 3.5 s. The proton density weighted image was acquired prior to the DCE time series using the same sequence with different echo and repetition times and a different flip angle. Finally, the DWI series were acquired with a single-shot echo planar imaging sequence with a resolution of 2 mm in-plane and 3.6 mm slice thickness and with diffusion-encoding gradients in three directions. Three b-values were acquired (50, 400, and 800), and subsequently, the ADC map was calculated by the scanner software. All images were acquired without an endorectal coil.

https://i.imgur.com/dh121Ur.png

## Citation

G. Litjens, O. Debats, J. Barentsz, N. Karssemeijer and H. Huisman. "Computer-aided detection of prostate cancer in MRI", IEEE Transactions on Medical Imaging 2014;33:1083-1092.},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/5a447ff50062194bd58dd11c0fedead59e6d873c</link>
</item>
<item>
<title>Head-Neck-CT (Dataset)</title>
<description>@article{,
title= {Head-Neck-CT},
keywords= {},
author= {},
abstract= {https://i.imgur.com/4jYnRqK.png

This is a subset of just the CT scans from the original dataset.

"This collection contains FDG-PET/CT and radiotherapy planning CT imaging data of 298 patients from four different institutions in Québec with histologically proven head-and-neck cancer (H&amp;N) All patients had pre-treatment FDG-PET/CT scans between April 2006 and November 2014, and within a median of 18 days (range: 6-66) before treatment Dates in the TCIA images have been changed in the interest of de-identification; the same change was applied across all images, preserving the time intervals between serial scans." 
 These patients were all part of a study described in further detail (treatment, image scanning protocols, etc.) in the publication:

## Publication Citation

Vallières, M. et al. Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer. Sci Rep 7, 10117 (2017). doi: 10.1038/s41598-017-10371-5},
terms= {},
license= {Creative Commons Attribution 3.0 Unported License},
superseded= {},
url= {https://wiki.cancerimagingarchive.net/display/Public/Head-Neck-PET-CT}
}

</description>
<link>https://academictorrents.com/download/d06aafd957f0c8c9b0eb4636e5c3ebdb7bdaf54f</link>
</item>
<item>
<title>UCF Google Street View Dataset 2014 (Dataset)</title>
<description>@article{,
title= {UCF Google Street View Dataset 2014},
keywords= {},
author= {Amir R. Zamir and Mubarak Shah},
abstract= {https://i.imgur.com/MjhbQgK.png

The dataset contains 62,058 high-quality Google Street View images. The images cover the
downtown and neighboring areas of Pittsburgh, PA and Orlando, FL, as well as parts of
Manhattan, NY. Accurate GPS coordinates of the images and their compass directions are provided as well.

For each Street View placemark (i.e. each spot on one street), the 360° spherical view is broken
down into 4 side views and 1 upward view. There is one additional image per placemark which
shows some overlaid markers, such as the address, name of streets, etc. 


### Citation:

Please cite the following paper, for which this data was (in part) collected:

Image Geo-localization based on Multiple Nearest Neighbor Feature Matching using
Generalized Graphs. Amir Roshan Zamir and Mubarak Shah. IEEE Transactions on
Pattern Analysis and Machine Intelligence (TPAMI), 2014.

},
terms= {},
license= {},
superseded= {},
url= {https://www.crcv.ucf.edu/data/GMCP_Geolocalization/},
year= {2014}
}

</description>
<link>https://academictorrents.com/download/e52a8978af7c2f734f2b30795075dbcd50efc983</link>
</item>
<item>
<title>PADCHEST_SJ (Feb 2019 Update) (Dataset)</title>
<description>@article{,
title= {PADCHEST_SJ (Feb 2019 Update)},
keywords= {chest xray, radiology},
author= {},
abstract= {This dataset includes more than 160,000 images obtained from 67,000 patients that were interpreted and reported by radiologists at Hospital San Juan (Spain) from 2009 to 2017, covering six different position views and additional information on image acquisition and patient demographics. The reports were labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and mapped onto standard Unified Medical Language System (UMLS) terminology.

https://i.imgur.com/MpVlYgB.png},
terms= {},
license= {Creative Commons Attribution-ShareAlike 4.0 International License},
superseded= {},
url= {https://arxiv.org/abs/1901.07441}
}

</description>
<link>https://academictorrents.com/download/dec12db21d57e158f78621f06dcbe78248d14850</link>
</item>
<item>
<title>Lung CT Segmentation Challenge 2017 (LCTSC) (Dataset)</title>
<description>@article{,
title= {Lung CT Segmentation Challenge 2017 (LCTSC)},
keywords= {},
author= {},
abstract= {Average 4DCT or free-breathing (FB) CT images from 60 patients, depending on clinical practice, are used for this challenge. Data were acquired from 3 institutions (20 each). Datasets were divided into three groups, stratified per institution:

36 training datasets
12 off-site test datasets
12 live test datasets

https://i.imgur.com/CzjcFRj.png

|Collection Statistics|Value|
|--- |--- |
|Image Size (GB)|4.8|
|Modalities|CT, RT|
|Number of Images|9569|
|Number of Patients|60|
|Number of Series|96|
|Number of Studies|60|
},
terms= {},
license= {Creative Commons Attribution 3.0 Unported License},
superseded= {},
url= {https://wiki.cancerimagingarchive.net/display/Public/Lung+CT+Segmentation+Challenge+2017}
}

</description>
<link>https://academictorrents.com/download/0a3611528c9172383656cb1b6a07cfb7f095eb82</link>
</item>
<item>
<title>ImageNet Large Scale Visual Recognition Challenge (V2017) (Dataset)</title>
<description>@article{ilsvrc15,
author= {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
title= {ImageNet Large Scale Visual Recognition Challenge (V2017)},
year= {2015},
journal= {International Journal of Computer Vision (IJCV)},
doi= {10.1007/s11263-015-0816-y},
volume= {115},
number= {3},
pages= {211-252},
abstract= {},
keywords= {ILSVRC2017, ILSVRC, ImageNet, MLPerf},
terms= {},
license= {},
superseded= {},
url= {}
}

</description>
<link>https://academictorrents.com/download/943977d8c96892d24237638335e481f3ccd54cfb</link>
</item>
<item>
<title>IDRiD (Indian Diabetic Retinopathy Image Dataset) (Dataset)</title>
<description>@article{,
title= {IDRiD (Indian Diabetic Retinopathy Image Dataset)},
keywords= {},
author= {},
abstract= {IDRiD (Indian Diabetic Retinopathy Image Dataset) is the first database representative of an Indian population. Moreover, it is the only dataset with typical diabetic retinopathy lesions and normal retinal structures annotated at the pixel level. This dataset provides information on the disease severity of diabetic retinopathy and diabetic macular edema for each image. This makes it well suited for the development and evaluation of image analysis algorithms for the early detection of diabetic retinopathy.

This dataset was made available as part of the "Diabetic Retinopathy: Segmentation and Grading Challenge" organised in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI-2018), Washington D.C.


The dataset is divided into three parts:
A. Segmentation: It consists of
1. Original color fundus images (81 images divided into train and test set - JPG Files)
2. Groundtruth images for the Lesions (Microaneurysms, Haemorrhages, Hard Exudates and Soft Exudates divided into train and test set - TIF Files) and Optic Disc (divided into train and test set - TIF Files)

B. Disease Grading: It consists of
1. Original color fundus images (516 images divided into train set (413 images) and test set (103 images) - JPG Files)
2. Groundtruth Labels for Diabetic Retinopathy and Diabetic Macular Edema Severity Grade (Divided into train and test set - CSV File)

C. Localization: It consists of
1. Original color fundus images (516 images divided into train set (413 images) and test set (103 images) - JPG Files)
2. Groundtruth Labels for Optic Disc Center Location (Divided into train and test set - CSV File)
3. Groundtruth Labels for Fovea Center Location (Divided into train and test set - CSV File)
 
For more information visit idrid.grand-challenge.org

Sample images (scaled down)

https://i.imgur.com/gajYxoR.png

Sample segmentations of microaneurysms (scaled down)

https://i.imgur.com/f8irOmW.png

Paper:
https://res.mdpi.com/data/data-03-00025/article_deploy/data-03-00025.pdf?filename=&amp;attachment=1
},
terms= {},
license= {Creative Commons Attribution},
superseded= {},
url= {https://ieee-dataport.org/open-access/indian-diabetic-retinopathy-image-dataset-idrid}
}

</description>
<link>https://academictorrents.com/download/3bb974ffdad31f9df9d26a63ed2aea2f1d789405</link>
</item>
<item>
<title>Kaggle Diabetic Retinopathy Detection Training Dataset (DRD) (Dataset)</title>
<description>@article{,
title= {Kaggle Diabetic Retinopathy Detection Training Dataset (DRD)},
keywords= {fundus},
author= {},
abstract= {This dataset is a large set of high-resolution retina images taken under a variety of imaging conditions. A left and right field is provided for every subject. Images are labeled with a subject id as well as either left or right (e.g. 1_left.jpeg is the left eye of patient id 1).
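
A tiny sketch of parsing these file names (the label for each image lives in the competition's CSV, which is not detailed here):

```python
def parse_name(filename):
    # e.g. "1_left.jpeg" -> (1, "left")
    stem = filename.rsplit(".", 1)[0]
    patient_id, side = stem.split("_")
    return int(patient_id), side
```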

A clinician has rated the presence of diabetic retinopathy in each image on a scale of 0 to 4:

```
0 - No DR
1 - Mild
2 - Moderate
3 - Severe
4 - Proliferative DR
```
Total Images: 35126. The distribution of labels is: {0: 25810, 1: 2443, 2: 5292, 3: 873, 4: 708}

Your task is to create an automated analysis system capable of assigning a score based on this scale.

The images in the dataset come from different models and types of cameras, which can affect the visual appearance of left vs. right. Some images are shown as one would see the retina anatomically (macula on the left, optic nerve on the right for the right eye). Others are shown as one would see through a microscope condensing lens (i.e. inverted, as one sees in a typical live eye exam). There are generally two ways to tell if an image is inverted:

- It is inverted if the macula (the small dark central area) is slightly higher than the midline through the optic nerve; if the macula is lower than the midline of the optic nerve, it is not inverted.
- If there is a notch on the side of the image (square, triangle, or circle), it is not inverted; if there is no notch, it is inverted.

Like any real-world data set, you will encounter noise in both the images and labels. Images may contain artifacts, be out of focus, underexposed, or overexposed. A major aim of this competition is to develop robust algorithms that can function in the presence of noise and variation.

https://i.imgur.com/Tmba2IF.png},
terms= {},
license= {},
superseded= {},
url= {https://www.kaggle.com/c/diabetic-retinopathy-detection}
}

</description>
<link>https://academictorrents.com/download/08c244595c6cc4ec403b21023cf99c2b085cbc72</link>
</item>
<item>
<title>ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events (Dataset)</title>
<description>@incollection{nips2017_6932,
title= {ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events},
author= {Racah, Evan and Beckham, Christopher and Maharaj, Tegan and Kahou, Samira and Prabhat, Mr. and Pal, Chris},
booktitle= {Advances in Neural Information Processing Systems 30},
editor= {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
pages= {3405--3416},
year= {2017},
publisher= {Curran Associates, Inc.},
url= {http://papers.nips.cc/paper/6932-extremeweather-a-large-scale-climate-dataset-for-semi-supervised-detection-localization-and-understanding-of-extreme-weather-events.pdf},
abstract= {The detection and identification of extreme weather events in large-scale climate simulations is an important problem for risk management, informing governmental policy decisions and advancing our basic understanding of the climate system. Recent work has shown that fully supervised convolutional neural networks (CNNs) can yield acceptable accuracy for classifying well-known types of extreme weather events when large amounts of labeled data are available. However, many different types of spatially localized climate patterns are of interest including hurricanes, extra-tropical cyclones, weather fronts, and blocking events among others. Existing labeled data for these patterns can be incomplete in various ways, such as covering only certain years or geographic areas and having false negatives. This type of climate data therefore poses a number of interesting machine learning challenges. We present a multichannel spatiotemporal CNN architecture for semi-supervised bounding box prediction and exploratory data analysis. We demonstrate that our approach is able to leverage temporal information and unlabeled data to improve the localization of extreme weather events. Further, we explore the representations learned by our model in order to better understand this important data. We present a dataset, ExtremeWeather, to encourage machine learning research in this area and to help facilitate further work in understanding and mitigating the effects of climate change. The dataset is available at extremeweatherdataset.github.io and the code is available at https://github.com/eracah/hur-detect.

## Citation
Racah, Evan, et al. "ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events." Advances in Neural Information Processing Systems. 2017.

## Pictures
https://extremeweatherdataset.github.io/variables.jpg},
keywords= {},
terms= {},
license= {Unrestricted Use},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/c5bf370a90cae548d5a306c1be7d79186b9f60b9</link>
</item>
<item>
<title>DeepLesion (10,594 CT scans with lesions) (Dataset)</title>
<description>@article{,
title= {DeepLesion (10,594 CT scans with lesions)},
keywords= {},
author= {Ke Yan (National Institutes of Health Clinical Center)},
abstract= {## Introduction

The DeepLesion dataset contains 32,120 axial computed tomography (CT) slices from 10,594 CT
scans (studies) of 4,427 unique patients. There are 1–3 lesions in each image with accompanying
bounding boxes and size measurements, adding up to 32,735 lesions altogether. The lesion
annotations were mined from NIH’s picture archiving and communication system (PACS). Some
meta-data are also provided. The contents include:
 - Folder “Images\_png”: png image files. We named each slice with the format “{patient
index}\_{study index}\_{series index}\_{slice index}.png”, with the last underscore being / or \
to indicate sub-folders. The images are stored as unsigned 16-bit PNGs. One should subtract 32768
from the pixel intensity to obtain the original Hounsfield unit (HU) values (see the sketch after this list).
 We provide not only the key CT slice that contains the lesion annotation, but also its 3D
context (30mm extra slices above and below the key slice). Due to the large size of the data
and the file size limit of the website, we packed them into 56 smaller zip files for downloading.
 - Key_slices.zip: key slices with overlaid lesion annotations for review purposes.
 - Folder “Key_slice_examples”: random image examples chosen from Key_slices.zip.
 - DL_info.csv: The annotations and meta-data. See Section “Annotations” below.
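
A minimal sketch of the HU conversion described above (the file name is hypothetical):

```python
import numpy as np
from PIL import Image

def load_hu(png_path):
    # slices are stored as unsigned 16-bit PNGs
    img = np.asarray(Image.open(png_path), dtype=np.int32)
    return img - 32768  # recover Hounsfield units, per the note above
```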

## Reference

Ke Yan, Xiaosong Wang, Le Lu, Ronald M. Summers, "DeepLesion: Automated Mining of
Large-Scale Lesion Annotations and Universal Lesion Detection with Deep Learning", Journal
of Medical Imaging 5(3), 036501 (2018), doi: 10.1117/1.JMI.5.3.036501



## Annotations
In DL_info.csv, each row contains the information of one lesion. The meanings of the columns are as follows (a parsing sketch follows the list):
1. File name. Please replace the last underscore with / or \ to indicate sub-folders.
2. Patient index starting from 1.
3. Study index for each patient starting from 1. There are 1~26 studies for each patient.
4. Series ID.
5. Slice index of the key slice containing the lesion annotation, starting from 1.
6. 8D vector, the image coordinates (in pixel) of the two RECIST diameters of the lesion. [x11,
y11, x12, y12, x21, y21, x22, y22]. The first 4 coordinates are for the long axis. Please see our paper
and its supplementary material for further explanation.
7. 4D vector, the bounding-box [x1, y1, x2, y2] of the lesion (in pixel) estimated from the RECIST diameters, see our paper
8. 2D vector, the lengths of the long and short axes. The unit is pixels.
9. The relative body position of the center of the lesion. The z-coordinates were predicted by the
self-supervised body part regressor. See our paper for details. The coordinates are approximate
and just for reference.
10. The type of the lesion. Types 1~8 correspond to bone, abdomen, mediastinum, liver, lung,
kidney, soft tissue, and pelvis, respectively. See our paper for details. The lesion types are
coarsely defined and just for reference. Only the lesions in the val and test sets were annotated;
the others are denoted as -1.
11. This field is set to 1 if the annotation of this lesion is possibly noisy according to manual check.
We have found 35 noisy annotations out of 32,735 so far.
12. Slice range. Context slices neighboring to the key slice were provided in this dataset. For
example, in the first lesion, the key slice is 109 and the slice range is 103~115, meaning that
slices 103~115 are provided. For most lesions, we provide 30mm extra slices above and below
the key slice, unless the long axis of the lesion is larger than this thickness (then we provide
more) or the beginning or end of the volume is reached.
13. Spacing (mm per pixel) of the x, y, and z axes. The 3rd value is the slice interval, or the physical
distance between two slices.
14. Image size.
15. The windowing (min~max) in Hounsfield unit extracted from the original DICOM file.
16. Patient gender. F for female and M for male.
17. Patient age.
18. Official randomly generated patient-level data split, train=1, validation=2, test=3.
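
A parsing sketch for DL_info.csv, under the assumption that the vector-valued fields (e.g. columns 6-8) are stored as comma-separated strings inside quoted CSV fields:

```python
import csv

with open("DL_info.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        file_name = row[0]                            # column 1
        bbox = [float(v) for v in row[6].split(",")]  # column 7: [x1, y1, x2, y2]
        split = {"1": "train", "2": "validation", "3": "test"}[row[17]]  # column 18
```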

## Applications
DeepLesion is a large-scale dataset that contains a variety of lesion types. It can be used for lesion
detection, classification, segmentation, retrieval, measurement, growth analysis, relationship mining
between different lesions, etc.

## Limitations

Since DeepLesion was mined from PACS, it has a few limitations:
 - DeepLesion contains only 2D diameter measurements and bounding-boxes of lesions. It has no lesion segmentation masks, 3D bounding-boxes, or fine-grained lesion types. Therefore,
some applications (e.g. lesion segmentation) may need extra manual annotations.
 - Not all lesions were annotated in the images. Radiologists typically mark only representative
lesions in each study. Therefore, some lesions remain unannotated.
 - According to manual examination, although most bookmarks represent abnormal findings or
lesions, a small proportion of the bookmarks are actually measurements of normal structures,
such as lymph nodes of normal size.

https://i.imgur.com/AuNDBbz.png

## Acknowledgments 

This research was supported by the Intramural Research Program of the NIH Clinical Center. We
thank NVIDIA for the donation of GPU cards. We thank our lab members Jiamin Liu, Yuxing Tang,
and Youbao Tang for their help in preparing the dataset.},
terms= {},
license= {"usage of the data set is unrestricted"},
superseded= {},
url= {https://nihcc.app.box.com/v/DeepLesion}
}

</description>
<link>https://academictorrents.com/download/de50f4d4aa3d028944647a56199c07f5fa6030ff</link>
</item>
<item>
<title>Condensing Steam: Distilling the Diversity of Gamer Behavior (Dataset)</title>
<description>@article{,
title= {Condensing Steam: Distilling the Diversity of Gamer Behavior},
keywords= {steam, online gaming},
author= {Mark O'Neill and Justin Wu and Elham Vaziripour and Daniel Zappala},
abstract= {109 MILLION GAMERS
716 MILLION GAMES
1.1 MILLION YEARS OF PLAYTIME

A dataset collected and analyzed for the 2016 ACM Internet Measurement Conference article by Mark O'Neill, Justin Wu, Elham Vaziripour, and Daniel Zappala

Table and attribute descriptions
================================

All fields whose names begin with a lowercase letter are from the Steam Web API. Fields whose names begin with an uppercase letter were either obtained via other methods or are derived fields. Steam IDs are unique, 64-bit values representing a specific Steam user.

Achievement\_Percentages
------------------------

This table was obtained through the Web API \[ISteamUserStats::GetGlobalAchievementPercentagesForApp\]. It contains achievement completion data for all the products listed in the App\_ID\_Info table.

appid : The ID of the game in question

Name : The name of the achievement as it appears to players. As an internal value assigned by developers, its descriptiveness of the achievement varies.

Percentage : The percentage of players who have finished this achievement out of all total players who own this game.

App\_ID\_Info
-------------

This table was derived from scanning the steam storefront by emulating the REST API calls issued by the Steam client's "Big Picture mode". It contains selected information for each product ("app") offered on Steam. An older version of this table from the time of our first data pull can be found with an "\_Old" suffix.

appid : The ID of the "app" in question, which is not necessarily a game.

Title : The Title of the app, as it appears to users

Type : The type of the "app". Possible values include: "demo," "dlc," "game," "hardware," "mod," and "video." "Game" is the most common.

Price : The current price of the "app" on the Steam storefront, in US dollars. Free items have a price of 0.

Release\_Date : The date the "app" was made available via the Steam storefront. Note that apps released elsewhere originally and later published through Steam carry the date of the Steam release.

Rating : The rating of the "app" on Metacritic. Set to -1 if not applicable.

Required\_Age : The ESRB- or PEGI-assigned age requirement for viewing this game in the Steam storefront, and, by extension, clicking the button to purchase it.

Is\_Multiplayer : A value of either 0 or 1 indicating whether or not an "app" contains multiplayer content. Self-reported by developers.

Friends
-------

This table was obtained through the Web API \[ISteamUser::GetFriendList\]. It contains a list of the (reciprocal) friendships of steam users.

steamid\_a : The Steam ID of the user whose friend list was queried

steamid\_b : The Steam ID of a user who is a friend of the user referenced by steamid\_a

relationship : The type of relationship represented by this entry. Currently the only value used is "friend"

friend\_since : The date and time when the users in this entry became friends. Note that this field was added in 2009, so all friendships formed before that date are recorded with the default Unix timestamp (1970)

dateretrieved : Timestamp when this friend list data was requested from the API

Games\_1
--------

This table was obtained through the Web API \[IPlayerService::GetOwnedGames\]. It contains the game data requested during our initial crawl of the Steam network.

steamid : The steam ID of the user in question

appid : The ID of a given app in the user's library

playtime\_2weeks : The total time the user has run this app in the two-week period leading up to when this data was requested from the API. Values are given in minutes.

playtime\_forever : The total time the user has run this app since adding it to their library. Values are given in minutes.

dateretrieved : Timestamp of the time when this game data was requested from the API

Games\_2
--------

This table was obtained through the Web API \[IPlayerService::GetOwnedGames\]. It contains the game data requested during our follow-up crawl of the Steam network.

steamid : The steam ID of the user in question

appid : The ID of a given app in the user's library

playtime\_2weeks : The total time the user has run this app in the two-week period leading up to when this data was requested from the API. Values are given in minutes.

playtime\_forever : The total time the user has run this app since adding it to their library. Values are given in minutes.

dateretrieved : Timestamp of the time when this game data was requested from the API

Games\_Daily
------------

This table was obtained through the Web API \[IPlayerService::GetOwnedGames\]. It contains the game playing data for a select subset of users. Each user's data in the subset was requested repeatedly, every day for five days.

steamid : The steam ID of the user in question

appid : The ID of a given app in the user's library

playtime\_2weeks : The total time the user has run this app in the two-week period leading up to when this data was requested from the API. Values are given in minutes.

playtime\_forever : The total time the user has run this app since adding it to their library. Values are given in minutes.

dateretrieved : Timestamp of the time when this game data was requested from the API

Games\_Developers
-----------------

This table was derived from scanning the steam storefront by emulating the REST API calls issued by the Steam client's "Big Picture mode". It contains the names of the developers for each product on Steam. This is a sister table to App\_ID\_Info. An older version of this table from the time of our first data pull can be found with an "\_Old" suffix.

appid : ID of the app in question

Developer : A developer of the app in question. Note that some apps have multiple developers and thus numerous distinct rows with the same appid are possible.

Games\_Genres
-------------

This table was derived from scanning the steam storefront by emulating the REST API calls issued by the Steam client's "Big Picture mode". It contains the names of the genres for each product on Steam. This is a sister table to App\_ID\_Info. An older version of this table from the time of our first data pull can be found with an "\_Old" suffix.

appid : ID of the app in question

Genre : A genre of the app in question. Note that most apps have multiple genres and thus numerous distinct rows with the same appid are possible.

Games\_Publishers
-----------------

This table was derived from scanning the steam storefront by emulating the REST API calls issued by the Steam client's "Big Picture mode". It contains the names of the publishers for each product on Steam. This is a sister table to App\_ID\_Info. An older version of this table from the time of our first data pull can be found with an "\_Old" suffix.

appid : ID of the app in question

Publisher : A publisher of the app in question. Note that some apps have multiple publishers and thus numerous distinct rows with the same appid are possible.

Groups
------

This table was derived from the steamcommunity.com XML data. It contains a list of all the group memberships of each user.

steamid : The Steam ID of the user in question

groupid : A group ID for a group to which the user referenced by steamid belongs. Users may belong to more than one group.

dateretrieved : Timestamp of the time when this game data was requested from the API

Player\_Summaries
-----------------

This table was obtained through the Web API \[ISteamUser::GetPlayerSummaries\]. It contains a profile summary for each Steam user.

steamid : The Steam ID of the user in question

lastlogoff : Timestamp of the user's most recent logoff from Steam

primaryclanid : The groupid (Groups::groupid) of the group that the user has designated as their primary group

timecreated : Timestamp of the time when the account was created

gameid : If the user was in-game at the time of the API request, this value specifies which game they were running at the time

gameserverip : If the user was in-game at the time of the request, and playing a game using Steam matchmaking, this value specifies the IP of the server they were connected to. Is otherwise set to "0.0.0.0:0"

loccountrycode : ISO-3166 code for the country in which the user resides. Self-reported.

locstatecode : State where the user resides. Self-reported.

loccityid : Internal Steam ID corresponding to the city where the user resides. Self-reported.

dateretrieved : Timestamp of the time when this game data was requested from the API
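
As an illustration of how the tables join together, here is a minimal sketch that ranks games by total recorded playtime; it assumes the dumps have been imported into a local SQLite database named steam.db (the actual distribution format may differ):

```python
import sqlite3

con = sqlite3.connect("steam.db")
top = con.execute("""
    SELECT a.Title, SUM(g.playtime_forever) / 60.0 AS hours
    FROM Games_1 AS g
    JOIN App_ID_Info AS a ON a.appid = g.appid
    WHERE a.Type = 'game'
    GROUP BY a.Title
    ORDER BY hours DESC
    LIMIT 10
""").fetchall()
for title, hours in top:
    print(title, round(hours))
```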

## Citation

Mark O'Neill, Elham Vaziripour, Justin Wu, and Daniel Zappala. 2016. Condensing Steam: Distilling the Diversity of Gamer Behavior. In Proceedings of the 2016 Internet Measurement Conference (IMC '16). ACM, New York, NY, USA, 81-95. DOI: https://doi.org/10.1145/2987443.2987489
},
terms= {},
license= {},
superseded= {},
url= {https://steam.internet.byu.edu/}
}

</description>
<link>https://academictorrents.com/download/eba3b48fcdaa9e69a927051f1678251a86a546f3</link>
</item>
<item>
<title>Labeled Optical Coherence Tomography (OCT) (Dataset)</title>
<description>@article{,
title= {Labeled Optical Coherence Tomography (OCT)},
keywords= {},
author= {},
abstract= {Dataset of validated OCT images described and analyzed in "Deep learning-based classification and referral of treatable human diseases". The OCT Images are split into a training set and a testing set of independent patients. OCT Images are labeled as (disease)-(randomized patient ID)-(image number by this patient) and split into 4 directories: CNV, DME, DRUSEN, and NORMAL.

```
  250 files in directory ./test/CNV
  250 files in directory ./test/DME
  250 files in directory ./test/DRUSEN
  250 files in directory ./test/NORMAL
37205 files in directory ./train/CNV
11348 files in directory ./train/DME
 8616 files in directory ./train/DRUSEN
26315 files in directory ./train/NORMAL
```
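
A small sketch of recovering the label and patient ID from a file name (the example name is hypothetical):

```python
from pathlib import Path

def parse_oct_name(path):
    # e.g. "CNV-1234567-12.jpeg" -> ("CNV", "1234567", 12)
    disease, patient_id, image_num = Path(path).stem.split("-")
    return disease, patient_id, int(image_num)
```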

https://i.imgur.com/tsAGf0V.png


## Acknowledgements
Data: https://data.mendeley.com/datasets/rscbjbr9sj/2

License: CC BY 4.0

## Citation: 
Kermany D, Goldbaum M, Cai W et al. Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning. Cell. 2018; 172(5):1122-1131. doi:10.1016/j.cell.2018.02.010.
http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5},
terms= {},
license= {CC BY 4.0},
superseded= {},
url= {https://data.mendeley.com/datasets/rscbjbr9sj/3}
}

</description>
<link>https://academictorrents.com/download/198145c88af9a1d61ba8070f5b05c3539896ff4e</link>
</item>
<item>
<title>Chest X-Ray Images (Pediatric Pneumonia) (Dataset)</title>
<description>@article{,
title= {Chest X-Ray Images (Pediatric Pneumonia)},
keywords= {radiology},
author= {},
abstract= {The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).
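
Since the layout follows the one-subfolder-per-class convention, it can be loaded directly with, for example, torchvision's ImageFolder; the root path below is an assumption:

```python
from torchvision import datasets

# hypothetical extraction path; the category subfolders become the class labels
train = datasets.ImageFolder("chest_xray/train")
print(train.classes, len(train))
```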

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients aged one to five years from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.

For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.

https://i.imgur.com/U7dBW7X.png

## Acknowledgements
Data: https://data.mendeley.com/datasets/rscbjbr9sj/2

License: CC BY 4.0

## Citation: 
http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5},
terms= {},
license= {CC BY 4.0},
superseded= {},
url= {https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia/home}
}

</description>
<link>https://academictorrents.com/download/7208a86910cc518ae8feaa9021bf7f8565b97644</link>
</item>
<item>
<title>30M Factoid Question-Answer Corpus (30MQA) (Dataset)</title>
<description>@article{,
title= {30M Factoid Question-Answer Corpus (30MQA)},
keywords= {},
author= {Iulian Vlad Serban and Alberto García-Durán and Caglar Gulcehre and Sungjin Ahn and Sarath Chandar and Aaron Courville and Yoshua Bengio},
abstract= {The 30M Factoid Question-Answer Corpus consists of 30M natural language questions in English and their corresponding facts in the knowledge base Freebase.

The dataset is formatted as a text file, where each line contains:

```
&lt;subject&gt; \t &lt;relationship&gt; \t &lt;object&gt; \t natural language question,
```
 
where &lt;subject&gt;, &lt;relationship&gt; and &lt;object&gt; are the subject, relationship and object identifiers in Freebase corresponding to the natural language question.
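
A minimal line parser for this format:

```python
def parse_line(line):
    # fields are tab-separated: subject, relationship, object, question
    subject, relation, obj, question = line.rstrip("\n").split("\t")
    return subject, relation, obj, question
```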

For a more detailed description, have a look at our paper:

Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus
http://arxiv.org/abs/1603.06807

Sample:

```
&lt;http://rdf.freebase.com/ns/m.04whkz5&gt;www.freebase.com/book/written_work/subjects&lt;http://rdf.freebase.com/ns/m.01cj3p&gt;what is the book e about ?
&lt;http://rdf.freebase.com/ns/m.0tp2p24&gt;www.freebase.com/music/release_track/release&lt;http://rdf.freebase.com/ns/m.0sjc7c1&gt;in what release does the release track cardiac arrest come from ?
&lt;http://rdf.freebase.com/ns/m.04j0t75&gt;www.freebase.com/film/film/country&lt;http://rdf.freebase.com/ns/m.07ssc&gt;what country is the debt from ?
&lt;http://rdf.freebase.com/ns/m.0ftqr&gt;www.freebase.com/music/producer/tracks_produced&lt;http://rdf.freebase.com/ns/m.0p600l&gt;what songs have nobuo uematsu produced ?
&lt;http://rdf.freebase.com/ns/m.036p007&gt;www.freebase.com/music/release/producers&lt;http://rdf.freebase.com/ns/m.0677ng&gt;who produced eve-olution ?
&lt;http://rdf.freebase.com/ns/m.0ms5mg&gt;www.freebase.com/music/recording/artist&lt;http://rdf.freebase.com/ns/m.0mjn2&gt;which artist recorded most of us are sad ?
```
},
terms= {},
license= {Creative Commons Attribution 3.0 Unported},
superseded= {},
url= {}
}

</description>
<link>https://academictorrents.com/download/973fb709bdb9db6066213bbc5529482a190098ce</link>
</item>
<item>
<title>Indiana University - Chest X-Rays (PNG Images) (Dataset)</title>
<description>@article{,
title= {Indiana University - Chest X-Rays (PNG Images)},
keywords= {radiology, chest x-ray},
author= {OpenI},
abstract= {1000 radiology reports for the chest x-ray images from the Indiana University hospital network.

To identify images associated with the reports, use the XML tag &lt;parentImage id="image-id"&gt;. (More than one image can be associated with a report.)

https://i.imgur.com/5uR5snH.png},
terms= {},
license= {Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License},
superseded= {},
url= {https://openi.nlm.nih.gov/faq.php}
}

</description>
<link>https://academictorrents.com/download/5a3a439df24931f410fac269b87b050203d9467d</link>
</item>
<item>
<title>Indiana University - Chest X-Rays (XML Reports) (Dataset)</title>
<description>@article{,
title= {Indiana University - Chest X-Rays (XML Reports)},
keywords= {chest x-ray, radiology},
author= {},
abstract= {1000 radiology reports for the chest x-ray images from the Indiana University hospital network.

To identify images associated with the reports, use the XML tag &lt;parentImage id="image-id"&gt;. (More than one image can be associated with a report.)
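
A short sketch of pulling the associated image IDs out of one report with the standard library (the report file name is hypothetical):

```python
import xml.etree.ElementTree as ET

# "1.xml" is a placeholder for one report file
root = ET.parse("1.xml").getroot()
image_ids = [img.get("id") for img in root.iter("parentImage")]
print(image_ids)  # one or more image IDs per report
```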

https://i.imgur.com/PWo3x47.png},
terms= {},
license= {Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License},
superseded= {},
url= {https://openi.nlm.nih.gov/faq.php}
}

</description>
<link>https://academictorrents.com/download/66450ba52ba3f83fbf82ef9c91f2bde0e845aba9</link>
</item>
<item>
<title>Tom Mitchell - Machine Learning  - 2012 (Course)</title>
<description>@article{,
title= {Tom Mitchell - Machine Learning  - 2012},
keywords= {machine learning, Tom Mitchell},
journal= {},
author= {Tom Mitchell, CMU},
year= {},
url= {http://www.cs.cmu.edu/~tom/10601_fall2012/lectures.shtml},
license= {},
abstract= {http://www.cs.cmu.edu/~tom/10601_fall2012/lectures.shtml},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/35b6b8bf0c2931ba7ecd8a1a8e65fa32f3e7473f</link>
</item>
<item>
<title>The PatchCamelyon benchmark dataset (PCAM) (Dataset)</title>
<description>@article{,
title= {The PatchCamelyon benchmark dataset (PCAM)},
keywords= {},
author= {Bas Veeling},
abstract= {The PatchCamelyon benchmark is a new and challenging image classification dataset. It consists of 327,680 color images (96 x 96 px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating the presence of metastatic tissue. PCam provides a new benchmark for machine learning models: bigger than CIFAR10, smaller than ImageNet, trainable on a single GPU.

## Why PCam
Fundamental machine learning advancements are predominantly evaluated on straightforward natural-image classification datasets. Think MNIST, CIFAR, SVHN. Medical imaging is becoming one of the major applications of ML and we believe it deserves a spot on the list of go-to ML datasets, both to challenge future work and to steer developments into directions that are beneficial for this domain.

We think PCam can play a role in this. It packs the clinically-relevant task of metastasis detection into a straightforward binary image classification task, akin to CIFAR-10 and MNIST. Models can easily be trained on a single GPU in a couple of hours, and achieve competitive scores in the Camelyon16 tasks of tumor detection and WSI diagnosis. Furthermore, the balance between task difficulty and tractability makes it a prime candidate for fundamental machine learning research on topics such as active learning, model uncertainty and explainability.

https://github.com/basveeling/pcam/raw/master/pcam.jpg
},
terms= {},
license= {},
superseded= {},
url= {https://github.com/basveeling/pcam}
}

</description>
<link>https://academictorrents.com/download/1561a180b11d4b746273b5ce46772ad36f1229b6</link>
</item>
<item>
<title>University of Washington - Pedro Domingos - Machine Learning (Course)</title>
<description>@article{,
title= {University of Washington - Pedro Domingos - Machine Learning},
keywords= {Pedro Domingos, Machine Learning Course, University of Washington},
journal= {},
author= {Pedro Domingos},
year= {},
url= {https://www.youtube.com/user/UWCSE/playlists?sort=dd&amp;view=50&amp;shelf_id=16},
license= {},
abstract= {Video lectures of the course Data Mining &amp; Machine Learning by Prof. Pedro Domingos, University of Washington, USA.},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/0db676a6aaff8c33f9749d5f9c0fa22bf336bc76</link>
</item>
<item>
<title>POLEN23E: image dataset for the Brazilian Savannah pollen types (Dataset)</title>
<description>@article{,
title= {POLEN23E: image dataset for the Brazilian Savannah pollen types},
keywords= {},
author= {Ariadne Barbosa Gonçalves and Junior Silva Souza and Gercina Gonçalves da Silva and Marney Pascoli Cereda and Arnildo Pott and Marco Hiroshi Naka and Hemerson Pistori },
abstract= {The classification of pollen species and types is an important task in many areas like forensic palynology, archaeological palynology and melissopalynology. This paper presents the first annotated image dataset for the Brazilian Savannah pollen types that can be used to train and test computer-vision-based automatic pollen classifiers. A first baseline human and computer performance for this dataset has been established using 805 pollen images of 23 pollen types. In order to assess the computer performance, a combination of three feature extractors and four machine learning techniques has been implemented, fine-tuned and tested.

https://i.imgur.com/P2bNuVi.png

Citation:
Gonçalves AB, Souza JS, Silva GGd, Cereda MP, Pott A, Naka MH, et al. (2016) Feature Extraction and Machine Learning for the Classification of Brazilian Savannah Pollen Grains. PLoS ONE 11(6): e0157044. https://doi.org/10.1371/journal.pone.0157044 The link for the dataset is: http://dx.doi.org/10.6084/m9.figshare.1525086.
},
terms= {},
license= {Creative Commons Attribution License},
superseded= {},
url= {https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0157044}
}

</description>
<link>https://academictorrents.com/download/ee51ec7708b35b023caba4230c871ae1fa254ab3</link>
</item>
<item>
<title>BRATS2013 Tumor-NoTumor Dataset (T-NT) (Dataset)</title>
<description>@article{,
title= {BRATS2013 Tumor-NoTumor Dataset (T-NT)},
keywords= {TNT},
author= {},
abstract= {This dataset (called T-NT) contains images which contain or do not contain a tumor along with a segmentation of brain matter and the tumor. The goal is that it can be used to simulate bias in data in a controlled fashion.

# Dataset Construction 

The synthetic data of the BRATS2013 dataset is used to construct this dataset. Each brain contains a tumor but it is typically only on one side. Only the right side is taken in order to have examples that do not have tumors. 

Each image is filtered to ensure it has enough brain in the image (more than 30% of the pixels). If the tumor takes up at least 1% of the pixels in the brain, the image is considered to have a tumor. 

Here is a snippet from the code used to construct the dataset:

```
def get_labels(rightside):
    # rightside: array holding the right half of a segmentation slice
    met = {}
    # ratio of brain (non-zero) pixels to background (zero) pixels
    met['brain'] = (
        1. * (rightside != 0).sum() / (rightside == 0).sum())
    # fraction of brain pixels labeled as tumor (classes &gt; 2)
    met['tumor'] = (
        1. * (rightside &gt; 2).sum() / ((rightside != 0).sum() + 1e-10))
    met['has_enough_brain'] = met['brain'] &gt; 0.30
    met['has_tumor'] = met['tumor'] &gt; 0.01
    return met
```

# File and Folder structure
The files are organized as follows:
PatientID-SlideNumber-HasTumor.png

For example:
```
HG0011-118-False.png
HG0015-65-True.png
HG0019-95-False.png
```
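
A one-line parser sketch for recovering these fields from a file name:

```python
def parse_tnt_name(name):
    # e.g. "HG0015-65-True.png" -> ("HG0015", 65, True)
    patient, slide, has_tumor = name[:-4].split("-")
    return patient, int(slide), has_tumor == "True"
```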

The segmentation images contain pixel values that correspond to the following 6 classes:

```
Non Tumor classes: 0, 10, 20
Tumor classes: 40
Unknown classes: 30, 50
```

A Tumor example
https://i.imgur.com/WIKFhO1.png

A NoTumor example
https://i.imgur.com/AbkTw5L.png

The folders are divided by patient into training (train) and testing (holdout). Within each, they are divided into flair, t1, and segmentation images.
```
train (2125 images, 1421 tumor, 704 notumor)
├── flair 
├── segmentation
└── t1
holdout (1415 images, 1051 tumor, 364 notumor)
├── flair
├── segmentation
└── t1
```

Patients in training: ['HG0018' 'HG0019' 'HG0012' 'HG0013' 'HG0010' 'HG0011' 'HG0016' 'HG0017'
 'HG0014' 'HG0015' 'HG0023' 'HG0022' 'HG0021' 'LG0005' 'LG0004' 'LG0007'
 'LG0006' 'LG0001' 'LG0003' 'LG0002' 'LG0025' 'LG0024' 'LG0009' 'LG0022'
 'LG0021' 'LG0020' 'HG0009' 'HG0008' 'HG0002' 'HG0025']

Patients in test: ['HG0001' 'HG0003' 'HG0024' 'HG0005' 'HG0004' 'HG0007' 'HG0006' 'HG0020'
 'LG0023' 'LG0008' 'LG0016' 'LG0017' 'LG0014' 'LG0015' 'LG0012' 'LG0013'
 'LG0010' 'LG0011' 'LG0018' 'LG0019']


Sample Flair Images

| Tumor   |      NoTumor      | 
|:----------:|:-------------:|
| https://i.imgur.com/3305V4u.png |  https://i.imgur.com/QDVB4fo.png| 
| https://i.imgur.com/kGHfa8Q.png | https://i.imgur.com/MKA9vxK.png|

# Citation

If you use this dataset, please cite:

```
Distribution Matching Losses Can Hallucinate Features in Medical Image Translation
Joseph Paul Cohen, Margaux Luck, Sina Honari
Medical Image Computing &amp; Computer Assisted Intervention (MICCAI)
https://arxiv.org/abs/1805.08841
```

```
@article{cohen2018distribution,
author = {Cohen, Joseph Paul and Luck, Margaux and Honari, Sina},
journal = {Medical Image Computing &amp; Computer Assisted Intervention (MICCAI)},
title = {Distribution Matching Losses Can Hallucinate Features in Medical Image Translation},
year = {2018}
}
```
## License
The original files are shared with the following license, so our dataset is shared under the same license. 

"Except where otherwise noted, content is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Switzerland License. http://creativecommons.org/licenses/by-nc-sa/3.0/ch/deed.en"

The following papers describe the original dataset:

Menze et al., The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS), IEEE Trans. Med. Imaging, 2015.

Kistler et al., The virtual skeleton database: an open access repository for biomedical research and collaboration. JMIR, 2013.
},
terms= {},
license= {Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)},
superseded= {},
url= {https://github.com/ieee8023/dist-bias}
}

</description>
<link>https://academictorrents.com/download/d52ccc21455c7a82fd6e58964c89b7da99e0edf7</link>
</item>
<item>
<title>ImageClef - IAPR TC-12 Benchmark (Dataset)</title>
<description>@article{,
title= {ImageClef - IAPR TC-12 Benchmark},
keywords= {},
author= {ImageClef},
abstract= {The following archive contains the complete IAPR TC-12 Benchmark, which is now available free of charge and without any copyright restrictions. This is the most updated version of the IAPR TC-12 Benchmark and should be used by researchers from now on. The archive comprises:

- 20000 images
- 1000 additional images previously used in object annotation tasks and/or the MUSCLE live event
- all complete (full-text) annotations (English, German, Random)
- all light annotations (English, German, Spanish, Random), i.e. all annotation tags except for the description tag
Note:

The image collection of the IAPR TC-12 Benchmark consists of 20,000 still natural images taken from locations around the world. This includes pictures of different sports and actions, photographs of people, animals, cities, landscapes and many other aspects of contemporary life. Example images can be found in Section 2.

Each image is associated with a text caption in up to three different languages (English, German and Spanish). These annotations are stored in a database which is managed by a benchmark administration system that allows the specification of parameters according to which different subsets of the image collection can be generated. Section 3 provides more information and an annotation example.

The IAPR TC-12 Benchmark is now available free of charge and without copyright restrictions. Information on how to access (and download) the complete benchmark as well as the resources used at ImageCLEFphoto 2006 - 2008 is given in Sections 4 and 5, while Section 6 provides links to related publications.

2 Collection Content
The 20,000 images are high quality, multi-object, colour photographs that have been chosen according to strict image selection rules (see [2] for more details). Here are a couple of example images of some chosen categories:

In publications based on the IAPR TC-12 Benchmark and/or the use of its data or a subset thereof, please cite the following publication:

The IAPR Benchmark: A New Evaluation Resource for Visual Information Systems, Grubinger, Michael, Clough Paul D., Müller Henning, and Deselaers Thomas , International Conference on Language Resources and Evaluation, 24/05/2006, Genoa, Italy, (2006)

Additional information on this data is available from the PhD thesis of Michael Grubinger:

Michael Grubinger. Analysis and Evaluation of Visual Information Systems Performance. PhD Thesis. School of Computer Science and Mathematics, Faculty of Health, Engineering and Science, Victoria University, Melbourne, Australia, 2007.

The thesis is available here:

http://nla.gov.au/anbd.bib-an43036734
http://wallaby.vu.edu.au/adt-VVUT/public/adt-VVUT20080408.130459/index.html

Data can also be downloaded here: http://www-i6.informatik.rwth-aachen.de/imageclef/resources/iaprtc12.tgz


![](https://i.imgur.com/bWnsHUj.png)},
terms= {},
license= {},
superseded= {},
url= {https://www.imageclef.org/photodata}
}

</description>
<link>https://academictorrents.com/download/cf870b196222cf961a01c13999be9e4b7760cef1</link>
</item>
<item>
<title>comma.ai driving dataset (Dataset)</title>
<description>@article{,
title= {comma.ai driving dataset},
keywords= {},
journal= {},
author= {Comma AI},
year= {},
url= {https://github.com/commaai/research},
license= {Attribution-Noncommercial-Share Alike 3.0},
abstract= {This dataset contains more than seven hours of highway driving for you to use in your projects.

Details included within the dataset are:

- The speed of the car
- The acceleration
- The steering angle
- GPS coordinates

![](https://i.imgur.com/X6LA8Qm.gif)

45 GB compressed, 80 GB uncompressed

```
dog/2016-01-30--11-24-51 (7.7G)
dog/2016-01-30--13-46-00 (8.5G)
dog/2016-01-31--19-19-25 (3.0G)
dog/2016-02-02--10-16-58 (8.1G)
dog/2016-02-08--14-56-28 (3.9G)
dog/2016-02-11--21-32-47 (13G)
dog/2016-03-29--10-50-20 (12G)
emily/2016-04-21--14-48-08 (4.4G)
emily/2016-05-12--22-20-00 (7.5G)
frodo/2016-06-02--21-39-29 (6.5G)
frodo/2016-06-08--11-46-01 (2.7G)
```

The dataset referenced on this page is copyrighted by comma.ai and published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. This means that you must attribute the work in the manner specified by the authors; you may not use this work for commercial purposes; and if you alter, transform, or build upon this work, you may distribute the resulting work only under the same license.

## Dataset structure
The dataset consists of the 11 video clips listed above, of variable size, recorded at 20 Hz with a camera mounted on the windshield of an Acura ILX 2016. In parallel to the videos we also recorded measurements such as the car's speed, acceleration, steering angle, GPS coordinates, and gyroscope angles. See the full `log` list [here](Logs.md). These measurements are transformed into a uniform 100 Hz time base.

The dataset folder structure is the following:
```bash
+-- dataset
|   +-- camera
|   |   +-- 2016-04-21--14-48-08
|   |   ...
|   +-- log
|   |   +-- 2016-04-21--14-48-08
|   |   ...
```

All the files come in HDF5 format and are named with the time they were recorded. The camera dataset has shape `number_frames x 3 x 160 x 320` and `uint8` type. One of the `log` hdf5-datasets is called `cam1_ptr` and addresses the alignment between camera frames and the other measurements.
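
A minimal sketch (our addition, not comma.ai's code) of reading one recording and collapsing the 100 Hz log onto the 20 Hz frames via `cam1_ptr`. It assumes the `h5py` and `numpy` packages; the dataset keys (`X`, `cam1_ptr`, `steering_angle`) follow the commaai/research reader code and should be verified with `.keys()` on your files:

```
import h5py
import numpy as np

with h5py.File("dataset/camera/2016-04-21--14-48-08.h5", "r") as cam, \
     h5py.File("dataset/log/2016-04-21--14-48-08.h5", "r") as log:
    frames = cam["X"]                       # number_frames x 3 x 160 x 320, uint8
    ptr = log["cam1_ptr"][:].astype(int)    # frame index for each 100 Hz log row
    angle = log["steering_angle"][:]

    # Average the log rows that point at the same camera frame.
    n = frames.shape[0]
    sums = np.bincount(ptr, weights=angle, minlength=n)
    counts = np.bincount(ptr, minlength=n)
    angle_per_frame = sums / np.maximum(counts, 1)
    print(frames.shape, angle_per_frame.shape)
```
},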
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/58c41e8bcc8eb4e2204a3b263cdf728c0a7331eb</link>
</item>
<item>
<title>Non-contrast head/brain CT CQ500 Dataset (Dataset)</title>
<description>@article{,
title= {Non-contrast head/brain CT CQ500 Dataset},
keywords= {},
author= {Qure.ai},
abstract= {CQ500: a dataset of 491 computed tomography scans with 193,317 slices.

Anonymized DICOMs for all the scans and the corresponding radiologists' reads.

![](https://i.imgur.com/wor2XEA.png)

Paper: https://arxiv.org/abs/1803.05854},
terms= {},
license= {Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License},
superseded= {},
url= {https://web.archive.org/web/20220816011051/http://headctstudy.qure.ai/}
}

</description>
<link>https://academictorrents.com/download/47e9d8aab761e75fd0a81982fa62bddf3a173831</link>
</item>
<item>
<title>CS231n: Convolutional Neural Networks Spring 2017 (Course)</title>
<description>@article{,
title= {CS231n: Convolutional Neural Networks Spring 2017},
keywords= {},
author= {Stanford},
abstract= {Stanford course on Convolutional Neural Networks for Visual Recognition

# Course Description

Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This course is a deep dive into details of the deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. The final assignment will involve training a multi-million parameter convolutional neural network and applying it on the largest image classification dataset (ImageNet). We will focus on teaching how to set up the problem of image recognition, the learning algorithms (e.g. backpropagation), practical engineering tricks for training and fine-tuning the networks and guide the students through hands-on assignments and a final course project. Much of the background and materials of this course will be drawn from the ImageNet Challenge.

![](https://i.imgur.com/ps0x3Wo.png)},
terms= {},
license= {},
superseded= {},
url= {},
year= {2017}
}

</description>
<link>https://academictorrents.com/download/ed8a16ebb346e14119a03371665306609e485f13</link>
</item>
<item>
<title>Medical Segmentation Decathlon Datasets (Dataset)</title>
<description>@article{,
title= {Medical Segmentation Decathlon Datasets},
keywords= {},
author= {},
abstract= {https://i.imgur.com/QqgA5n4.jpg

With recent advances in machine learning, semantic segmentation algorithms are becoming increasingly general purpose and translatable to unseen tasks. Many key algorithmic advances in the field of medical imaging are commonly validated on a small number of tasks, limiting our understanding of the generalisability of the proposed contributions. A model which works out-of-the-box on many tasks, in the spirit of AutoML, would have a tremendous impact on healthcare. The field of medical imaging is also missing a fully open source and comprehensive benchmark for general purpose algorithmic validation and testing covering a large span of challenges such as small data, unbalanced labels, large-ranging object scales, multi-class labels, and multimodal imaging. This challenge and dataset aim to provide such a resource through the open sourcing of large medical imaging datasets on several highly different tasks, and by standardising the analysis and validation process.

```
4.6M    ./Task06_Lung/labelsTr
5.7G    ./Task06_Lung/imagesTr
2.9G    ./Task06_Lung/imagesTs
8.6G    ./Task06_Lung
240K    ./Task05_Prostate/labelsTr
150M    ./Task05_Prostate/imagesTr
79M     ./Task05_Prostate/imagesTs
229M    ./Task05_Prostate
15M     ./Task01_BrainTumour/labelsTr
4.5G    ./Task01_BrainTumour/imagesTr
2.7G    ./Task01_BrainTumour/imagesTs
7.1G    ./Task01_BrainTumour
8.6M    ./Task07_Pancreas/labelsTr
7.6G    ./Task07_Pancreas/imagesTr
3.9G    ./Task07_Pancreas/imagesTs
12G     ./Task07_Pancreas
388K    ./Task02_Heart/labelsTr
249M    ./Task02_Heart/imagesTr
186M    ./Task02_Heart/imagesTs
435M    ./Task02_Heart
8.7M    ./Task08_HepaticVessel/labelsTr
5.8G    ./Task08_HepaticVessel/imagesTr
3.0G    ./Task08_HepaticVessel/imagesTs
8.8G    ./Task08_HepaticVessel
1.3M    ./Task09_Spleen/labelsTr
1.1G    ./Task09_Spleen/imagesTr
461M    ./Task09_Spleen/imagesTs
1.5G    ./Task09_Spleen
14M     ./Task10_Colon/labelsTr
4.0G    ./Task10_Colon/imagesTr
1.9G    ./Task10_Colon/imagesTs
5.9G    ./Task10_Colon
30M     ./Task03_Liver/labelsTr
19G     ./Task03_Liver/imagesTr
8.6G    ./Task03_Liver/imagesTs
27G     ./Task03_Liver
1.1M    ./Task04_Hippocampus/labelsTr
19M     ./Task04_Hippocampus/imagesTr
8.8M    ./Task04_Hippocampus/imagesTs
29M     ./Task04_Hippocampus
71G     .
```
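
A minimal sketch (our addition) for loading one training case; it assumes the `nibabel` package and that each task folder contains NIfTI (.nii.gz) volumes whose names match between imagesTr and labelsTr:

```
import os
import nibabel as nib

task = "Task04_Hippocampus"  # any task folder from the listing above
names = sorted(f for f in os.listdir(os.path.join(task, "imagesTr"))
               if f.endswith(".nii.gz") and not f.startswith("."))
img = nib.load(os.path.join(task, "imagesTr", names[0]))
lbl = nib.load(os.path.join(task, "labelsTr", names[0]))
print(img.shape, img.header.get_zooms(), lbl.get_fdata().max())
```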

Competition site: https://decathlon-10.grand-challenge.org/},
terms= {},
license= {CC-BY-SA 4.0},
superseded= {},
url= {http://medicaldecathlon.com/}
}

</description>
<link>https://academictorrents.com/download/274be65156ed14828fb7b30b82407a2417e1924a</link>
</item>
<item>
<title>MoNuSeg Training Data - Multi-organ nuclei segmentation from H&amp;E stained histopathological images (Dataset)</title>
<description>@article{,
title= {MoNuSeg Training Data - Multi-organ nuclei segmentation from H&amp;E stained histopathological images},
keywords= {},
author= {},
abstract= {Nuclear segmentation in digital microscopic tissue images can enable extraction of high-quality features for nuclear morphometrics and other analysis in computational pathology. Techniques that accurately segment nuclei in diverse images spanning a range of patients, organs, and disease states can significantly contribute to the development of clinical and medical research software. Once accurately segmented, nuclear morphometric and appearance features such as density, nucleus-to-cytoplasm ratio, average size, and pleomorphism can be used not only to assess cancer grades but also to predict treatment effectiveness. Identifying different types of nuclei based on their segmentation can also yield information about gland shapes, which, for example, is important for cancer grading.

This challenge will showcase the best nuclei segmentation techniques that work on a diverse set of H&amp;E stained histology images obtained from different hospitals and spanning multiple patients and organs. This will enable training and testing of readily usable (or generalized) nuclear segmentation software.

The dataset for this challenge was obtained by carefully annotating tissue images of several patients with tumors of different organs who were diagnosed at multiple hospitals. This dataset was created by downloading H&amp;E stained tissue images captured at 40x magnification from the TCGA archive. H&amp;E staining is a routine protocol to enhance the contrast of a tissue section and is commonly used for tumor assessment (grading, staging, etc.). Given the diversity of nuclei appearances across multiple organs and patients, and the richness of staining protocols adopted at multiple hospitals, the training dataset will enable the development of robust and generalizable nuclei segmentation techniques that work right out of the box.


![](https://i.imgur.com/2p2GMWt.png)



#### Citation Request

N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane and A. Sethi, "A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology," in IEEE Transactions on Medical Imaging, vol. 36, no. 7, pp. 1550-1560, July 2017},
terms= {},
license= {Attribution 3.0 Unported (CC BY 3.0)},
superseded= {},
url= {https://monuseg.grand-challenge.org/}
}

</description>
<link>https://academictorrents.com/download/c87688437fb416f66eecbd8c419aba00dd12997f</link>
</item>
<item>
<title>Holistic Recognition of Low Quality License Plates (HDR dataset) (Dataset)</title>
<description>@article{,
title= {Holistic Recognition of Low Quality License Plates (HDR dataset)},
keywords= {},
author= {},
abstract= {This dataset focuses on the recognition of license plates in low-resolution, low-quality images.

![](https://i.imgur.com/4y2lGaX.png)

### Citation Request

J. Špaňhel, J. Sochor, R. Juránek, A. Herout, L. Maršík and P. Zemčík, "Holistic recognition of low quality license plates by CNN using track annotated data," 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, 2017, pp. 1-6.
doi: 10.1109/AVSS.2017.8078501},
terms= {},
license= {Attribution-NonCommercial-ShareAlike 4.0 International},
superseded= {},
url= {https://medusa.fit.vutbr.cz/traffic/research-topics/general-traffic-analysis/holistic-recognition-of-low-quality-license-plates-by-cnn-using-track-annotated-data-iwt4s-avss-2017/}
}

</description>
<link>https://academictorrents.com/download/8ed33d02d6b36c389dd077ea2478cc83ad117ef3</link>
</item>
<item>
<title>VizWiz v1.0 dataset (Answering Visual Questions from Blind People) (Dataset)</title>
<description>@article{,
title= {VizWiz v1.0 dataset (Answering Visual Questions from Blind People)},
keywords= {},
author= {},
abstract= {We propose an artificial intelligence challenge to design algorithms that assist people who are blind to overcome their daily visual challenges. For this purpose, we introduce the VizWiz dataset, which originates from a natural visual question answering setting where blind people each took an image and recorded a spoken question about it, together with 10 crowdsourced answers per visual question. Our proposed challenge addresses the following two tasks for this dataset: (1) predict the answer to a visual question and (2) predict whether a visual question cannot be answered. Ultimately, we hope this work will educate more people about the technological needs of blind people while providing an exciting new opportunity for researchers to develop assistive technologies that eliminate accessibility barriers for blind people.

```
VizWiz v1.0 dataset download:

20,000 training image/question pairs
200,000 training answer/answer confidence pairs
3,173 validation image/question pairs
31,730 validation answer/answer confidence pairs
8,000 test image/question pairs
Python API to read and visualize the VizWiz dataset
Python challenge evaluation code
```

![](https://i.imgur.com/zXB6Qci.png)


### Publications

Danna Gurari, Qing Li, Abigale J. Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, and Jeffrey P. Bigham. "VizWiz Grand Challenge: Answering Visual Questions from Blind People." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, Samuel White, and Tom Yeh. "VizWiz: Nearly Real-time Answers to Visual Questions." ACM User Interface Software and Technology Symposium (UIST), 2010.},
terms= {},
license= {Creative Commons Attribution-ShareAlike 4.0 International License},
superseded= {},
url= {http://vizwiz.org/}
}

</description>
<link>https://academictorrents.com/download/b633e14aa084fab57f20ad0b4612e0932ae1f2dc</link>
</item>
<item>
<title>LiTS – Liver Tumor Segmentation Challenge (LiTS17) (Dataset)</title>
<description>@article{,
title= {LiTS – Liver Tumor Segmentation Challenge (LiTS17)},
keywords= {},
author= {Patrick Christ},
abstract= {The liver is a common site of primary (i.e. originating in the liver, like hepatocellular carcinoma, HCC) or secondary (i.e. spreading to the liver, like colorectal cancer metastases) tumor development. Due to their heterogeneous and diffusive shape, automatic segmentation of tumor lesions is very challenging. Until now, only interactive methods have achieved acceptable results in segmenting liver lesions.

With our challenge we encourage researchers to develop automatic segmentation algorithms to segment liver lesions in contrast-enhanced abdominal CT scans. The data and segmentations are provided by various clinical sites around the world. The training data set contains 130 CT scans and the test data set 70 CT scans. The challenge is organised in conjunction with ISBI 2017 and MICCAI 2017. For MICCAI 2017 we added tasks for liver segmentation and tumor burden estimation.

![](https://i.imgur.com/ia2qGlH.png)

![](https://i.imgur.com/eDN20ck.png)

Paper reference: https://arxiv.org/abs/1901.04056


},
terms= {},
license= {https://creativecommons.org/licenses/by-nc-nd/4.0/},
superseded= {},
url= {https://competitions.codalab.org/competitions/17094}
}

</description>
<link>https://academictorrents.com/download/27772adef6f563a1ecc0ae19a528b956e6c803ce</link>
</item>
<item>
<title>North America roads GIS data (Dataset)</title>
<description>@article{,
title= {North America roads GIS data},
keywords= {Roads, Speed Limit, PBF, OSM, Geofabrik, GIS, North America},
journal= {},
author= {Geofabrik, OSM},
year= {},
url= {https://download.geofabrik.de/north-america.html},
license= {},
abstract= {},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/0a853fdcc1d28c306d75e29195a5536087f6e2b4</link>
</item>
<item>
<title>Corpus of Russian news articles collected from Lenta.Ru (Dataset)</title>
<description>@article{,
title= {Corpus of Russian news articles collected from Lenta.Ru},
keywords= {dataset, russian, corpus, lenta, lentaru, news, nlp, w2v},
journal= {},
author= {Dmitry Yutkin},
year= {},
url= {https://github.com/yutkin/Lenta.Ru-News-Dataset},
license= {},
abstract= {This dataset contains 699,746 news articles from the popular Russian news site Lenta.Ru.},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/cfc4ba252fe56176d9db31b0609f0ece6a389b09</link>
</item>
<item>
<title>LUng Nodule Analysis (LUNA16) All Images (Dataset)</title>
<description>@article{,
title= {LUng Nodule Analysis (LUNA16) All Images},
keywords= {radiology},
author= {Consortium for Open Medical Image Computing},
abstract= {| ![](https://i.imgur.com/8Oolu7D.png)      | ![](https://i.imgur.com/5WsoKqU.png)   |
|---|---|

Lung cancer is the leading cause of cancer-related death worldwide. Screening high-risk individuals for lung cancer with low-dose CT scans is now being implemented in the United States, and other countries are expected to follow soon. In CT lung cancer screening, many millions of CT scans will have to be analyzed, which is an enormous burden for radiologists. There is therefore a lot of interest in developing computer algorithms to optimize screening.

A vital first step in the analysis of lung cancer screening CT scans is the detection of pulmonary nodules, which may or may not represent early stage lung cancer. Many Computer-Aided Detection (CAD) systems have already been proposed for this task. The LUNA16 challenge will focus on a large-scale evaluation of automatic nodule detection algorithms on the LIDC/IDRI data set.

The LIDC/IDRI data set is publicly available, including the annotations of nodules by four radiologists. The LUNA16 challenge is therefore a completely open challenge. We have tracks for complete systems for nodule detection, and for systems that use a list of locations of possible nodules. We provide this list to also allow teams to participate with an algorithm that only determines the likelihood for a given location in a CT scan to contain a pulmonary nodule.

### Motivation

Lung cancer is the leading cause of cancer-related death worldwide. The National Lung Screening Trial (NLST), a randomized control trial in the U.S. including more than 50,000 high-risk subjects, showed that lung cancer screening using annual low-dose computed tomography (CT) reduces lung cancer mortality by 20% in comparison to annual screening with chest radiography [1]. In 2013, the U.S. Preventive Services Task Force (USPSTF) gave low-dose CT screening a grade B recommendation for high-risk individuals [2], and in early 2015 the U.S. Centers for Medicare and Medicaid Services (CMS) approved CT lung cancer screening for Medicare recipients. As a result of these developments, lung cancer screening programs using low-dose CT are being implemented in the United States and other countries. Computer-aided detection (CAD) of pulmonary nodules could play an important role when screening is implemented on a large scale.

Large evaluation studies investigating the performance of different state-of-the-art CAD systems are scarce. Therefore, we organize a novel CAD detection challenge using the large public LIDC-IDRI dataset. The detailed description of the challenge is now available in this article. We believe that this challenge is important for a reliable comparison of CAD algorithms and to encourage rapid development of new algorithms using state-of-the-art computer vision technology.

### Challenge tracks

We invite the research community to participate in one or two of the following challenge tracks:

1. Nodule detection (NDET)
Using raw CT scans, the goal is to identify locations of possible nodules, and to assign a probability for being a nodule to each location. The pipeline typically consists of two stages: candidate detection and false positive reduction.

2. False positive reduction (FPRED)
Given a set of candidate locations, the goal is to assign a probability for being a nodule to each candidate location. Hence, one could see this as a classification task: nodule or not a nodule. Candidate locations will be provided in world coordinates (see the conversion sketch below). This candidate set detects 1,162/1,186 nodules.
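
Converting world coordinates to voxel indices needs the scan's origin and spacing. A minimal sketch, not part of the challenge kit, assuming the `SimpleITK` package and the .mhd/.raw scans; the file name and candidate point are placeholders:

```
import SimpleITK as sitk

scan = sitk.ReadImage("example_scan.mhd")      # placeholder file name
world_xyz = (-100.0, 50.0, -200.0)             # example candidate position in mm
idx = scan.TransformPhysicalPointToIndex(world_xyz)
# For axis-aligned scans this is equivalent to rounding (world - origin) / spacing.
print(idx, scan.GetOrigin(), scan.GetSpacing())
```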

### Open challenge

LUNA16 is a completely open challenge. This means that, unlike other challenges, the images and reference standard are publicly available. The goal of LUNA16 is to provide an opportunity for participants to test their algorithms on a common database with a standardized evaluation protocol. In the spirit of speeding up scientific progress, the results listed on the website can be used as an indication of how well state-of-the-art CAD algorithms perform. We hope LUNA16 will yield several results that are worthwhile for the CAD research community.

We are committed to maintain this site as a public repository of benchmark results for nodule detection on a common database in the spirit of cooperative scientific progress. In return, we ask everyone who uses this site to respect the rules below.

### Rules

The following rules apply to those who register a team and download the data:

The downloaded data sets, or any data derived from these data sets, may not be given or redistributed under any circumstances to persons not belonging to the registered team.

All information entered when registering a team, including the name of the contact person, the affiliation (institute, organization or company the team's contact person works for) and the e-mail address must be complete and correct. In other words, anonymous registration is not allowed. If you want to submit anonymously, for example because you want to submit your results to a conference that requires anonymous submission, please contact the organizers.

The LUNA16 organizers reserve the right to request a pdf file describing the system to accompany the submitted result. The organizers may refuse to evaluate systems whose description does not meet minimal requirements.

Results uploaded to this website will be made publicly available on this site (see the Results Section), and by submitting results, you grant us permission to do so. Obviously, teams maintain full ownership and rights to the method.

Teams must notify the maintainers of this site about any publication that is (partly) based on the data on this site, in order for us to maintain a list of publications associated with the LUNA16 study.

### References

[1] Aberle D. R., Adams A. M., Berg C. D., Black W. C., Clapp J. D., Fagerstrom R. M., Gareen I. F., Gatsonis C., Marcus P. M., and Sicks J. D. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med, 365:395–409, 2011.
 
[2] Moyer VA, U.S. Preventive Services Task Force. Screening for lung cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med, 160:330-338 2014.
 
[3] Armato SG, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP et al. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys, 38:915–931, 2011.



### Organizers

Colin Jacobs (Radboud University Medical Center, Nijmegen, The Netherlands)

Arnaud Arindra Adiyoso Setio (Radboud University Medical Center, Nijmegen, The Netherlands)

Alberto Traverso (Polytechnic University of Turin and Turin Section of INFN, Turin, Italy)

Bram van Ginneken (Radboud University Medical Center, Nijmegen, The Netherlands)



},
terms= {},
license= {},
superseded= {},
url= {}
}

</description>
<link>https://academictorrents.com/download/58b053204337ca75f7c2e699082baeb57aa08578</link>
</item>
<item>
<title>A collection of sport activity datasets with an emphasis on powermeter data (Dataset)</title>
<description>@article{,
title= {A collection of sport activity datasets with an emphasis on powermeter data},
keywords= {sport, dataset, triathlon, cycling},
journal= {Technical report, 2017},
author= {Iztok Fister Jr. and Samo Rauter and Dusan Fister and Iztok Fister},
year= {2017},
url= {},
license= {},
abstract= {},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/bf76b193960a96a683f9c2afde70acab9d3d757d</link>
</item>
<item>
<title>Malignant lymphoma classification (Dataset)</title>
<description>@article{,
title= {Malignant lymphoma classification},
keywords= {},
author= {Elaine Jaffe (National Cancer Institute) and Nikita Orlov (National Institute on Aging)},
abstract= {Malignant lymphoma is a cancer affecting lymph nodes. Three types of malignant lymphoma are represented in the set: CLL (chronic lymphocytic leukemia), FL (follicular lymphoma), and MCL (mantle cell lymphoma).
The ability to distinguish classes of lymphoma from biopsies sectioned and stained with Hematoxylin/Eosin (H+E) would allow for more consistent and less demanding diagnosis of this disease. Only the most expert pathologists specializing in these types of lymphomas are able to consistently and accurately classify these three lymphoma types from H+E-stained biopsies. The standard practice is to use class-specific probes in order to distinguish these classes reliably.

The dataset presented is a collection of samples prepared by different pathologists at different sites. There is a large degree of staining variation that one would normally expect from such samples. 

A randomly selected image from each class: 


![](https://i.imgur.com/qoo1AAM.png)},
terms= {},
license= {},
superseded= {},
url= {https://ome.grc.nia.nih.gov/iicbu2008/lymphoma/index.html}
}

</description>
<link>https://academictorrents.com/download/3cde17e7e4d9886513630c1005ba20b8d37c333a</link>
</item>
<item>
<title>Breast Cancer Cell Segmentation (Dataset)</title>
<description>@article{,
title= {Breast Cancer Cell Segmentation},
keywords= {},
author= {Elisa Drelie Gelasca and Jiyun Byun and Boguslaw Obara and B.S. Manjunath},
abstract= {There are 58 H&amp;E stained histopathology images used in breast cancer cell detection, with associated ground truth data available. Routine histology uses the stain combination of hematoxylin and eosin, commonly referred to as H&amp;E. These images are stained since most cells are essentially transparent, with little or no intrinsic pigment. Certain special stains, which bind selectively to particular components, are used to identify biological structures such as cells. In these images, the challenging problem is cell segmentation for subsequent classification into benign and malignant cells. The ground truth has been obtained for one image containing benign cells.


| Image: |Ground Truth: |
|---|---|
| ![](https://i.imgur.com/haa5X8O.png) | ![](https://i.imgur.com/gqBikTa.png) |






All images:

![](https://i.imgur.com/QM22bG2.png)},
terms= {},
license= {},
superseded= {},
url= {http://bioimage.ucsb.edu/research/bio-segmentation}
}

</description>
<link>https://academictorrents.com/download/b79869ca12787166de88311ca1f28e3ebec12dec</link>
</item>
<item>
<title>Introduction to Computer Science [CS50x] [Harvard] [2018] (Course)</title>
<description>@article{,
title= {Introduction to Computer Science [CS50x] [Harvard] [2018]},
keywords= {course, computer, cs50x, Science, introduction, harvard, yale},
journal= {},
author= {},
year= {},
url= {},
license= {},
abstract= {"Demanding, but definitely doable. Social, but educational. A focused topic, but broadly applicable skills. CS50 is the quintessential Harvard (and Yale!) course.

Hello, world! This is CS50 (aka CS50x through edX), Harvard University's introduction to the intellectual enterprises of computer science and the art of programming. 

Introduction to the intellectual enterprises of computer science and the art of programming. This course teaches students how to think algorithmically and solve problems efficiently. Topics include abstraction, algorithms, data structures, encapsulation, resource management, security, software engineering, and web development. Languages include C, Python, SQL, and JavaScript plus CSS and HTML. Problem sets inspired by real-world domains of biology, cryptography, finance, forensics, and gaming. Designed for majors and non-majors alike, with or without prior programming experience.

},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/52da574b6412862e199abeaea63e51bf8cea2140</link>
</item>
<item>
<title>GANGogh training data set (Dataset)</title>
<description>@article{,
title= {GANGogh training data set},
keywords= {GANGogh training data, Generative Adversarial Networks (GANS), Machine Learning, Art},
journal= {},
author= {},
year= {},
url= {https://github.com/rkjones4/GANGogh},
license= {},
abstract= {This is a training data set that can be used for the GANGogh machine learning model.

Once downloaded, modify the `styles` variable in tflib/wikiartGenre.py as follows:

```
styles = {'abstract': 14999,
          'animal-painting': 1798,
          'cityscape': 6598,
          'figurative': 4500,
          'flower-painting': 1800,
          'genre-painting': 14997,
          'landscape': 15000,
          'marina': 1800,
          'mythological-painting': 2099,
          'nude-painting-nu': 3000,
          'portrait': 14999,
          'religious-painting': 8400,
          'still-life': 2996,
          'symbolic-painting': 2999}
```
},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/1d154cde2fab9ec8039becd03d9bb877614d351b</link>
</item>
<item>
<title>Electron Microscopy (CA1 hippocampus) Dataset (Dataset)</title>
<description>@article{,
title= {Electron Microscopy (CA1 hippocampus) Dataset},
keywords= {},
author= {},
abstract= {The dataset available for download on this webpage represents a 5x5x5µm section taken from the CA1 hippocampus region of the brain, corresponding to a 1065x2048x1536 volume. The resolution of each voxel is approximately 5x5x5nm. The data is provided as multipage TIF files that can be loaded in Fiji.

![](https://i.imgur.com/rTCKgHn.png)

![](https://i.imgur.com/DkDkaMH.gif)

We annotated mitochondria in two sub-volumes. Each sub-volume consists of the first 165 slices of the 1065x2048x1536 image stack. The volume used for training our algorithm in the publications mentioned at the bottom of this page is the top part, while the bottom part was used for testing.

Although our line of research was primarily motivated by the need to accurately segment mitochondria and synapses, other structures, such as vesicles or cell boundaries, are of interest to neuroscientists. This dataset was acquired by Graham Knott and Marco Cantoni at EPFL. It is made publicly available in the hope of encouraging similar sharing of useful data amongst researchers and also accelerating neuroscientific research.

For further information, please visit http://cvlab.epfl.ch/research/medical/em/mitochondria.

```
total 3.7G
124M testing_groundtruth.tif
124M testing.tif
124M training_groundtruth.tif
124M training.tif
3.2G volumedata.tif
```
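
A minimal sketch (our addition, assuming the `tifffile` Python package) for loading the multipage TIF stacks listed above into numpy arrays; Fiji works equally well for interactive viewing:

```
import tifffile

vol = tifffile.imread("training.tif")              # (slices, height, width) stack
gt = tifffile.imread("training_groundtruth.tif")   # same shape, mitochondria labels
print(vol.shape, vol.dtype, (gt != 0).mean())      # fraction of labelled voxels
```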

### References

A. Lucchi Y. Li and P. Fua, Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets, Conference on Computer Vision and Pattern Recognition, 2013.
 
A. Lucchi, K.Smith, R. Achanta, G. Knott, P. Fua, Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features, IEEE Transactions on Medical Imaging, Vol. 30, Nr. 11, October 2011.
},
terms= {},
license= {},
superseded= {},
url= {https://cvlab.epfl.ch/data/em}
}

</description>
<link>https://academictorrents.com/download/3ada3ae6ec71097e63d897cf878051bba3eaba25</link>
</item>
<item>
<title>Animals with Attributes 2 (AwA2) dataset (Dataset)</title>
<description>@article{,
title= {Animals with Attributes 2 (AwA2) dataset},
keywords= {},
author= {},
abstract= {This dataset provides a platform to benchmark transfer-learning algorithms, in particular attribute-based classification and zero-shot learning [1]. It can act as a drop-in replacement for the original Animals with Attributes (AwA) dataset [2,3], as it has the same class structure and almost the same characteristics. 

It consists of 37,322 images of 50 animal classes with pre-extracted feature representations for each image. The classes are aligned with Osherson's classical class/attribute matrix [3,4], thereby providing 85 numeric attribute values for each class. Using the shared attributes, it is possible to transfer information between different classes. 
The image data were collected from public sources, such as Flickr, in 2016. In the process we made sure to only include images that are licensed for free use and redistribution; please see the archive for the individual license files.
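
To illustrate the attribute-based transfer idea (a sketch of the general mechanism, not the evaluation protocol of [1]): score an image against each class by comparing predicted attributes with the class/attribute matrix. The file name follows the AwA archive conventions and should be checked against the download:

```
import numpy as np

M = np.loadtxt("predicate-matrix-continuous.txt")   # expected shape (50, 85)
M = M / np.linalg.norm(M, axis=1, keepdims=True)    # L2-normalise each class row

a = np.random.rand(85)       # stand-in for attribute scores predicted from an image
a = a / np.linalg.norm(a)
print(int(np.argmax(M @ a))) # index of the most attribute-compatible class
```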

![](https://cvml.ist.ac.at/AwA2/awa2_banner.jpg)


### Publications

Please cite the following paper when using the dataset:

[1] Y. Xian, C. H. Lampert, B. Schiele, Z. Akata. "Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly". arXiv:1707.00600 [cs.CV].

Attribute-based classification and the original Animals with Attributes (AwA) data are described in:

[2] C. H. Lampert, H. Nickisch, and S. Harmeling. "Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer". In CVPR, 2009.

[3] C. H. Lampert, H. Nickisch, and S. Harmeling. "Attribute-Based Classification for Zero-Shot Visual Object Categorization". IEEE T-PAMI, 2013.

The class/attribute matrix was originally created by:

[4] D. N. Osherson, J. Stern, O. Wilkie, M. Stob, and E. E. Smith. "Default probability". Cognitive Science, 15(2), 1991.

[5] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. "Learning systems of concepts with an infinite relational model". In AAAI, 2006.},
terms= {},
license= {},
superseded= {},
url= {https://cvml.ist.ac.at/AwA2/}
}

</description>
<link>https://academictorrents.com/download/1490aec815141cdb50a32b81ef78b1eaf6b38b03</link>
</item>
<item>
<title>UrbanMapper 3D (Digital Surface Model and Digital Terrain Model) Dataset (Dataset)</title>
<description>@article{,
title= {UrbanMapper 3D (Digital Surface Model and Digital Terrain Model) Dataset},
keywords= {},
author= {USSOCOM},
abstract= {Competitors will receive an orthorectified color image, Digital Surface Model (DSM), and Digital Terrain Model (DTM) for each geographic area of interest (AOI). The DSM indicates the height of the earth, with objects such as buildings and trees included. The DTM indicates only the height of the ground. Both should be expected to include some errors, and errors may be expected to be similar in the provisional and sequestered data sets. The difference between the DSM and DTM indicates the height of objects above ground. All input files provided are raster GeoTIFF images. Ground truth building labels will also be provided for a subset of the data to be used for training.

![](https://i.imgur.com/fnAqq30.png)

![](https://i.imgur.com/vMOXxGr.png)},
terms= {},
license= {},
superseded= {},
url= {https://www.topcoder.com/challenges/db36b53a-c2f3-4899-9698-13e96148ffcd}
}

</description>
<link>https://academictorrents.com/download/4ccd3743861d827ac80f0d2b234d7fcfdad2a31d</link>
</item>
<item>
<title>UC Merced Land Use Dataset (Dataset)</title>
<description>@article{,
title= {UC Merced Land Use Dataset},
keywords= {},
author= {Yi Yang and Shawn Newsam},
abstract= {This is a 21 class land use image dataset meant for research purposes.

There are 100 images for each of the following classes:

```
agricultural
airplane
baseballdiamond
beach
buildings
chaparral
denseresidential
forest
freeway
golfcourse
harbor
intersection
mediumresidential
mobilehomepark
overpass
parkinglot
river
runway
sparseresidential
storagetanks
tenniscourt
```

Each image measures 256x256 pixels.

![](https://i.imgur.com/dT8q6Qi.png)


The images were manually extracted from large images from the USGS National Map Urban Area Imagery collection for various urban areas around the country. The pixel resolution of this public domain imagery is 1 foot.

Please cite the following paper when publishing results that use this dataset:

Yi Yang and Shawn Newsam, "Bag-Of-Visual-Words and Spatial Extensions for Land-Use Classification," ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS), 2010.

Shawn D. Newsam
Assistant Professor and Founding Faculty
Electrical Engineering &amp; Computer Science
University of California, Merced
Email: snewsam@ucmerced.edu
Web: http://faculty.ucmerced.edu/snewsam
This material is based upon work supported by the National Science Foundation under Grant No. 0917069.},
terms= {},
license= {},
superseded= {},
url= {http://vision.ucmerced.edu/datasets/landuse.html}
}

</description>
<link>https://academictorrents.com/download/e9ac5edf285a43309e57e1289e8816a4e78a937c</link>
</item>
<item>
<title>NIH Chest X-ray Dataset of 14 Common Thorax Disease Categories (Dataset)</title>
<description>@article{,
title= {NIH Chest X-ray Dataset of 14 Common Thorax Disease Categories},
journal= {},
author= {National Institutes of Health - Clinical Center},
year= {},
url= {https://www.nih.gov/news-events/news-releases/nih-clinical-center-provides-one-largest-publicly-available-chest-x-ray-datasets-scientific-community},
abstract= {![](https://i.imgur.com/1InHgLs.png)

(1, Atelectasis; 2, Cardiomegaly; 3, Effusion; 4, Infiltration; 5, Mass; 6, Nodule; 7, Pneumonia; 8, Pneumothorax; 9, Consolidation; 10, Edema; 11, Emphysema; 12, Fibrosis; 13, Pleural_Thickening; 14, Hernia) 

### Background &amp; Motivation: 
Chest X-ray exams are one of the most frequent and cost-effective medical imaging examinations. However, clinical diagnosis from chest X-rays can be challenging and is sometimes believed to be harder than diagnosis via chest CT imaging. Even though some promising work has been reported in the past, especially recent deep learning work on Tuberculosis (TB) classification, achieving clinically relevant computer-aided detection and diagnosis (CAD) in real-world medical sites on all data settings of chest X-rays is still very difficult, if not impossible, when only several thousand images are employed for study. This is evident from [2], where the performance of deep neural networks for thorax disease recognition is severely limited by the availability of only 4,143 frontal-view images [3] (OpenI, the previous largest publicly available chest X-ray dataset).

In this database, we provide an enhanced version (with 6 more disease categories and more images) of the dataset used in the recent work [1], which contains approximately 27 times the number of frontal chest X-ray images in [3]. Our dataset is extracted from the clinical PACS database at the National Institutes of Health Clinical Center and consists of ~60% of all frontal chest X-rays in the hospital. We therefore expect this dataset to be significantly more representative of real patient population distributions and realistic clinical diagnosis challenges than any previous chest X-ray dataset. The size of our dataset, in terms of the total number of images and thorax disease frequencies, should also better facilitate deep neural network training [2]. Refer to [1] for details of how the dataset was extracted and how image labels were mined through natural language processing (NLP).

### Details:
The ChestX-ray dataset comprises 112,120 frontal-view X-ray images of 30,805 unique patients with text-mined fourteen-disease image labels (each image can have multiple labels), mined from the associated radiological reports using natural language processing. The fourteen common thoracic pathologies include Atelectasis, Consolidation, Infiltration, Pneumothorax, Edema, Emphysema, Fibrosis, Effusion, Pneumonia, Pleural_thickening, Cardiomegaly, Nodule, Mass and Hernia, an extension of the 8 common disease patterns listed in our CVPR 2017 paper. Note that the original radiology reports (associated with these chest X-ray studies) are not meant to be publicly shared for many reasons. The text-mined disease labels are expected to have accuracy &gt;90%. Please find more details and benchmark performance of trained models based on the 14 disease labels in our arXiv paper: https://arxiv.org/abs/1705.02315

### Contents:
1. 112,120 frontal-view chest X-ray PNG images in 1024x1024 resolution (under the images folder)
2. Metadata for all images (Data_Entry_2017.csv): Image Index, Finding Labels, Follow-up #, Patient ID, Patient Age, Patient Gender, View Position, Original Image Size and Original Image Pixel Spacing.
3. Bounding boxes for ~1000 images (BBox_List_2017.csv): Image Index, Finding Label, Bbox [x, y, w, h]. [x y] are coordinates of each box's top-left corner; [w h] are the width and height of each box (see the parsing sketch below).
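
As a minimal parsing sketch (our addition, assuming `pandas` and the column names listed above), the '|'-separated "Finding Labels" field can be expanded into one binary column per finding:

```
import pandas as pd

df = pd.read_csv("Data_Entry_2017.csv")
labels = df["Finding Labels"].str.get_dummies(sep="|")  # e.g. "Cardiomegaly|Emphysema"
table = pd.concat([df[["Image Index", "Patient ID"]], labels], axis=1)
print(labels.sum().sort_values(ascending=False).head()) # most frequent findings
```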

If you find the dataset useful for your research projects, please cite our CVPR 2017 paper: Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, Ronald M. Summers. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases, IEEE CVPR, pp. 3462-3471, 2017

```
@InProceedings{wang2017chestxray,author    = {Wang, Xiaosong and Peng, Yifan and Lu, Le and Lu, Zhiyong and Bagheri, Mohammadhadi and Summers, Ronald},
title = {ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases},
booktitle = {2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR)},
pages     = {3462--3471},
year      = {2017}}
```

### Questions/Comments:
(xiaosong.wang@nih.gov; le.lu@nih.gov; rms@nih.gov)

### Limitations:
1. The image labels are NLP-extracted, so there will be some erroneous labels, but the NLP labelling accuracy is estimated to be &gt;90%. 
2. Very limited numbers of disease region bounding boxes. 
3. Chest X-ray radiology reports are not anticipated to be publicly shared. Parties who use this public dataset are encouraged to share their “updated” image labels and/or new bounding boxes in their own studies later, perhaps through manual annotation.

### Acknowledgement:
This work was supported by the Intramural Research Program of the NIH Clinical Center (clinicalcenter.nih.gov) and National Library of Medicine (www.nlm.nih.gov). We thank NVIDIA Corporation for the GPU donations.

### Reference:
[1] Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, Ronald Summers, ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases, IEEE CVPR, pp. 3462-3471, 2017

[2] Hoo-chang Shin, Kirk Roberts, Le Lu, Dina Demner-Fushman, Jianhua Yao, Ronald M. Summers, Learning to Read Chest X-Rays: Recurrent Neural CascadeModel for Automated Image Annotation, IEEE CVPR, pp. 2497-2506, 2016

[3] Open-i: An open access biomedical search engine. https: //openi.nlm.nih.gov

![](https://www.nih.gov/sites/default/files/styles/featured_media_breakpoint-medium/public/news-events/news-releases/2017/20170927-lung-mass.jpg?itok=wSFXjg6d&amp;timestamp=1506520936)
},
keywords= {},
terms= {},
license= {"The usage of the data set is unrestricted"},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/557481faacd824c83fbf57dcf7b6da9383b3235a</link>
</item>
<item>
<title>MICCAI 2015 Challenge on Multimodal Brain Tumor Segmentation (BraTS2015) (Dataset)</title>
<description>@article{,
title= {MICCAI 2015 Challenge on Multimodal Brain Tumor Segmentation (BraTS2015)},
keywords= {},
journal= {},
author= {},
year= {2015},
url= {http://braintumorsegmentation.org/},
license= {Creative Commons Attribution-NonCommercial 3.0 license. (CC BY NC SA 3.0)},
abstract= {Brain tumor image data used in this article were obtained from the MICCAI Challenge on Multimodal Brain Tumor Segmentation. The challenge database contains fully anonymized images from the Cancer Imaging Archive. The segmentation labels are:


- 1 for necrosis
- 2 for edema
- 3 for non-enhancing tumor
- 4 for enhancing tumor
- 0 for everything else
    
```
Here are 3 requirements for the successful upload and validation of your segmentation:
Use the MHA filetype to store your segmentations (not mhd) [use short or ushort if you experience any upload problems]
Keep the same labels as the provided truth.mha (see above)
Name your segmentations according to this template: VSD.your_description.###.mha 
replace the ### with the ID of the corresponding Flair MR images. This allows the system to relate your segmentation to the correct training truth. Download an example list for the training data and testing data.
```
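
A minimal sketch (our addition, assuming the `SimpleITK` package) of writing a segmentation that meets these requirements; the empty label array and the ID in the file name are placeholders:

```
import numpy as np
import SimpleITK as sitk

truth = sitk.ReadImage("truth.mha")                # provided reference volume
seg = np.zeros_like(sitk.GetArrayFromImage(truth), dtype=np.uint16)
# ... fill `seg` with the labels 0-4 listed above ...
out = sitk.GetImageFromArray(seg)
out.CopyInformation(truth)                         # keep spacing/origin/direction
sitk.WriteImage(out, "VSD.my_method.12345.mha")    # replace 12345 with the Flair ID
```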

![](https://i.imgur.com/umg5BKD.png)

### Publications

B. H. Menze et al., "The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)," in IEEE Transactions on Medical Imaging, vol. 34, no. 10, pp. 1993-2024, Oct. 2015.
doi: 10.1109/TMI.2014.2377694
http://ieeexplore.ieee.org/document/6975210/

Kistler et al., The virtual skeleton database: an open access repository for biomedical research and collaboration. JMIR, 2013.},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/c4f39a0a8e46e8d2174b8a8a81b9887150f44d50</link>
</item>
<item>
<title>Non-Small Cell Lung Cancer CT Scan Dataset (NSCLC-Radiomics-Genomics) (Dataset)</title>
<description>@article{,
title= {Non-Small Cell Lung Cancer CT Scan Dataset (NSCLC-Radiomics-Genomics)},
keywords= {},
journal= {},
author= {},
year= {},
url= {http://doi.org/10.7937/K9/TCIA.2015.L4FRET6Z},
license= {Creative Commons Attribution 3.0 Unported License},
abstract= {This collection contains images from 89 non-small cell lung cancer (NSCLC) patients that were treated with surgery. For these patients, pretreatment CT scans, gene expression, and clinical data are available. This dataset refers to the Lung3 dataset of the study published in Nature Communications.
 
In short, this publication applies a radiomic approach to computed tomography data of 1,019 patients with lung or head-and-neck cancer. Radiomics refers to the comprehensive quantification of tumour phenotypes by applying a large number of quantitative image features. In the present analysis, 440 features quantifying tumour image intensity, shape and texture were extracted. We found that a large number of radiomic features have prognostic power in independent data sets, many of which were not identified as significant before. Radiogenomics analysis revealed that a prognostic radiomic signature, capturing intra-tumour heterogeneity, was associated with underlying gene-expression patterns. These data suggest that radiomics identifies a general prognostic phenotype existing in both lung and head-and-neck cancer. This may have a clinical impact as imaging is routinely used in clinical practice, providing an unprecedented opportunity to improve decision support in cancer treatment at low cost.

The dataset described here (Lung3) was used to investigate the association of radiomic imaging features with gene-expression profiles. The Lung2 dataset used for training the radiomic biomarker and consisting of 422 NSCLC CT scans with outcome data can be found here: NSCLC-Radiomics.

For scientific inquiries about this dataset, please contact Dr. Hugo Aerts of the Dana-Farber Cancer Institute / Harvard Medical School (hugo_aerts@dfci.harvard.edu).


### Gene-expression Data
Corresponding microarray data acquired for the imaging samples are available at the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE58661). The patient names used to identify the cases on GEO are identical to those used in the DICOM files on TCIA and in the clinical data spreadsheet.

### Clinical Data
Corresponding clinical data can be found here: Lung3.metadata.xls.
Please note that survival time is measured in days from the start of treatment. DICOM patient names are identical in TCIA and the clinical data file.


![](https://wiki.cancerimagingarchive.net/download/thumbnails/16056856/image2014-6-30%2014%3A56%3A33.png)

### Publications

Aerts, H. J. W. L., Velazquez, E. R., Leijenaar, R. T. H., Parmar, C., Grossmann, P., Cavalho, S., … Lambin, P. (2014, June 3). Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nature Communications. Nature Publishing Group. http://doi.org/10.1038/ncomms5006

},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/95b58ebfc1952780cfe2102dd7290889feefad66</link>
</item>
<item>
<title>Ischemic Stroke Lesion Segmentation Challenge 2017 (ISLES2017) (Dataset)</title>
<description>@article{,
title= {Ischemic Stroke Lesion Segmentation Challenge 2017 (ISLES2017)},
keywords= {},
journal= {},
author= {},
year= {2017},
url= {http://www.isles-challenge.org/},
license= {Open Database License},
abstract= {Ischemic Stroke Lesion Segmentation (ISLES), a medical image segmentation challenge at the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2017. On the SMIR, you can register for the challenge, download the test data and submit your results. For more information, visit the official ISLES homepage under www.isles-challenge.org.

### THE ISLES CHALLENGE

This challenge for stroke lesion segmentation has been very popular over the past two years (2015, 2016) and has yielded various methods that help tackle important challenges of modern stroke imaging analysis. This year the challenge provides acute stroke imaging scans and manually outlined lesions on follow-up scans.

### HOW IT WORKS

If you are interested in participating, you are invited to download the training set, including both MRI scans as well as the corresponding expert segmentations of stroke lesions. This will allow you to validate and optimise your method as much as you like.

Shortly before MICCAI 2017 takes place, a set of test cases will be released, on which participants will be asked to run their algorithms and upload their segmentation results in the form of binary image maps. To complete a successful participation, participants will need to submit an abstract describing the employed method.

The organizers will then evaluate each case and establish a ranking of the participating teams. All results will be presented during SWITCH at MICCAI 2017 and will be discussed with invited experts and all workshop attendees.

Each team will have the opportunity to present their submitted method as a poster, while selected teams will be asked to give a brief presentation detailing their approach. Eventually, submissions will be included in the workshop's LNCS post-proceedings and potentially compiled into a high-impact journal paper summarising and presenting the findings.


### Please cite the challenge article if you use the data:

Oskar Maier et al. ISLES 2015 - A public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Medical Image Analysis, available online 21 July 2016, ISSN 1361-8415. http://dx.doi.org/10.1016/j.media.2016.07.009

Kistler et al. The virtual skeleton database: an open access repository for biomedical research and collaboration. JMIR, 2013. http://doi.org//10.2196/jmir.2930
},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/5bdb401695ad36d4ccd73da90c2f9f8ab6f82092</link>
</item>
<item>
<title>NIH Pancreas-CT Dataset (Dataset)</title>
<description>@article{,
title= {NIH Pancreas-CT Dataset},
keywords= {},
journal= {},
author= {Holger R. Roth and Amal Farag and Evrim B. Turkbey and Le Lu and Jiamin Liu and Ronald M. Summers. },
year= {},
url= {http://doi.org/10.7937/K9/TCIA.2016.tNB1kqBU},
license= {Creative Commons Attribution 3.0 Unported License},
abstract= {### Summary

The National Institutes of Health Clinical Center performed 82 abdominal contrast-enhanced 3D CT scans (~70 seconds after intravenous contrast injection, in the portal-venous phase) from 53 male and 27 female subjects. Seventeen of the subjects are healthy kidney donors scanned prior to nephrectomy. The remaining 65 patients were selected by a radiologist from patients who had neither major abdominal pathologies nor pancreatic cancer lesions. Subjects' ages range from 18 to 76 years with a mean age of 46.8 ± 16.7. The CT scans have resolutions of 512x512 pixels with varying pixel sizes and slice thickness between 1.5 and 2.5 mm, acquired on Philips and Siemens MDCT scanners (120 kVp tube voltage).

A medical student manually performed slice-by-slice segmentations of the pancreas as ground-truth and these were verified/modified by an experienced radiologist.

The images were processed into nii files using the following script:

```
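# Convert each DICOM folder whose name contains "PAN" to a compressed NIfTI volume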
for i in `ls . | grep PAN`; do 
   echo $i; 
   dcm2niix -vox 1 -z y -o ./data/ -m y -s y -f %n $i
done
```

### Citation

Roth HR, Lu L, Farag A, Shin H-C, Liu J, Turkbey EB, Summers RM. DeepOrgan: Multi-level Deep Convolutional Networks for Automated Pancreas Segmentation. N. Navab et al. (Eds.): MICCAI 2015, Part I, LNCS 9349, pp. 556–564, 2015. 

### Examples

![](https://i.imgur.com/4aZNgw6.gifv)

![](https://i.imgur.com/kfhhH7x.png)

![](https://i.imgur.com/kGbz9hl.png)

},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/80ecfefcabede760cdbdf63e38986501f7becd49</link>
</item>
<item>
<title>N+1 fish, N+2 fish dataset (test_videos) (Dataset)</title>
<description>@article{,
title= {N+1 fish, N+2 fish dataset (test_videos)},
journal= {},
author= {N+1 fish, N+2 fish},
year= {},
url= {},
abstract= {Our video data was collected for an ongoing monitoring project involving our partners at The Nature Conservancy-Massachusetts (TNC) and the Gulf of Maine Research Institute (GMRI). We worked with the fishermen to create a dataset of video footage from different boats that can be released to the public. 

The video data is in the standard and well-supported MP4 v2 format with H.264 compression. The videos are on average 35 minutes long at a resolution of 640x480 and vary between 150MB and 550MB in size. We will also provide cropped fish images to aid in training complementary models. Our annotations will be categorical, with each fish having a single species. 

There are six species of interest for this competition, which appear in non-balanced proportions. As we aim to achieve good performance on each of these species, we will be compiling a data set that is mostly balanced across them. Complicating factors include different catch distributions across different vessels, which can adversely affect the types of algorithms submitted if people attempt to game the system by parameterizing these distributions. We are attempting to balance the data set by providing at least 100 unique fish for each species within the videos from multiple vessels.




![](https://i.imgur.com/qHK6WUz.png)

![](https://i.imgur.com/T7UDAJA.png)},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/6fc9279d862d4f6e42ec2613c5b5ceea165cff00</link>
</item>
<item>
<title>Richard Feynman's Lectures on Physics (The Messenger Lectures) (Course)</title>
<description>@article{,
title= {Richard Feynman's Lectures on Physics (The Messenger Lectures)},
keywords= {},
journal= {},
author= {The Great Explainer},
year= {},
url= {http://www.feynmanlectures.caltech.edu/},
license= {},
abstract= {Volume I - mainly mechanics, radiation, and heat

Volume II - mainly electromagnetism and matter

Volume III - quantum mechanics},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/c5af268ec55cf2d3b439e7311ad43101ba8322eb</link>
</item>
<item>
<title>AVA: A Large-Scale Database for Aesthetic Visual Analysis (Dataset)</title>
<description>@article{,
title= {AVA: A Large-Scale Database for Aesthetic Visual Analysis},
keywords= {semantic, quality, AVA, DPChallenge, images, aesthetics},
journal= {},
author= {Naila Murray and Luca Marchesotti and Florent Perronnin},
year= {},
url= {},
license= {},
abstract= {Aesthetic Visual Analysis (AVA) contains over 250,000 images along with a rich variety of meta-data including a large number of aesthetic scores for each image, semantic labels for over 60 categories as well as labels related to photographic style for high-level image quality categorization.},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/71631f83b11d3d79d8f84efe0a7e12f0ac001460</link>
</item>
<item>
<title>New York Taxi Data 2009-2016 in Parquet Format (Dataset)</title>
<description>@article{,
title= {New York Taxi Data 2009-2016 in Parquet Format},
keywords= {},
journal= {},
author= {New York Taxi and Limousine Commission},
year= {},
url= {http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml},
license= {},
abstract= {Trip record data from the Taxi and Limousine Commission (http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml) from January 2009-December 2016 was consolidated and brought into a consistent Parquet format by Ravi Shekhar &lt;ravi dot shekhar at gmail dot com&gt;.

Data is released under the New York Open Data Law. },
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/4f465810b86c6b793d1c7556fe3936441081992e</link>
</item>
<item>
<title>Small Object Dataset (Dataset)</title>
<description>@article{,
title= {Small Object Dataset},
keywords= {},
author= {Zheng Ma and Lei Yu and Antoni B. Chan},
abstract= {Images of small objects for small instance detections.  Currently four object types are available.

![](http://visal.cs.cityu.edu.hk/wp/wp-content/uploads/smallobject.jpg)

We collected four datasets of small objects from images/videos on the Internet (e.g., YouTube or Google).

Fly Dataset: contains 600 video frames with an average
of 86 ± 39 flies per frame (648×72 @ 30 fps). 32 images
are used for training (1:6:187) and 50 images for testing
(301:6:600).

Honeybee Dataset: contains 118 images with an average
of 28 ± 6 honeybees per image (640×480). The dataset is
divided evenly for training and test sets. Only the first 32
images are used for training.

Fish Dataset: contains 387 frames of video with an average
of 56±9 fish per frame (300×410 @ 30 fps). 32 images
are used for training (1:3:94) and 65 for testing (193:3:387).

Seagull Dataset: contains three high-resolution images
(624×964) with an average of 866±107 seagulls per image.
The first image is used for training, and the rest for testing.
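
The training/testing splits above use MATLAB-style start:step:stop index notation (inclusive of the endpoint). A minimal sketch of reproducing the fly split, assuming frames are numbered from 1:

```
# MATLAB-style start:step:stop ranges include the endpoint, hence the +1.
train_idx = list(range(1, 187 + 1, 6))    # 1:6:187   -> 32 training frames
test_idx  = list(range(301, 600 + 1, 6))  # 301:6:600 -> 50 test frames
assert len(train_idx) == 32 and len(test_idx) == 50
```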

Cite this paper: http://visal.cs.cityu.edu.hk/static/pubs/conf/cvpr15-densdet.pdf
},
terms= {},
license= {},
superseded= {},
url= {http://visal.cs.cityu.edu.hk/downloads/smallobjects/}
}

</description>
<link>https://academictorrents.com/download/8e751c111cf90123374b5f0cf61e6af9f5e5231e</link>
</item>
<item>
<title>Downsampled ImageNet 32x32 (Dataset)</title>
<description>@article{,
title= {Downsampled ImageNet 32x32},
keywords= {},
author= {Aaron van den Oord and Nal Kalchbrenner and Koray Kavukcuoglu},
abstract= {This dataset provides downsampled ImageNet images, which can be used for density estimation and generative modeling experiments. Images come in two resolutions: 32x32 and 64x64, and were introduced in Pixel Recurrent Neural Networks. Please refer to the Pixel RNN paper for more details and results. 

![](https://i.imgur.com/s6gdDuX.jpg)},
terms= {},
license= {},
superseded= {},
url= {http://image-net.org/small/download.php}
}

</description>
<link>https://academictorrents.com/download/bf62f5051ef878b9c357e6221e879629a9b4b172</link>
</item>
<item>
<title>Downsampled ImageNet 64x64 (Dataset)</title>
<description>@article{,
title= {Downsampled ImageNet 64x64},
keywords= {},
author= {Aaron van den Oord and Nal Kalchbrenner and Koray Kavukcuoglu},
abstract= {This dataset provides downsampled ImageNet images, which can be used for density estimation and generative modeling experiments. Images come in two resolutions: 32x32 and 64x64, and were introduced in Pixel Recurrent Neural Networks. Please refer to the Pixel RNN paper for more details and results. 

![](https://i.imgur.com/s6gdDuX.jpg)},
terms= {},
license= {},
superseded= {},
url= {http://image-net.org/small/download.php}
}

</description>
<link>https://academictorrents.com/download/96816a530ee002254d29bf7a61c0c158d3dedc3b</link>
</item>
<item>
<title>Human acute monocytic leukemia (Dataset)</title>
<description>@article{,
title= {Human acute monocytic leukemia},
keywords= {},
journal= {},
author= {Antony C.S. Chan},
year= {},
url= {http://dx.doi.org/10.1038/srep44608},
license= {MIT License},
abstract= {Complete dataset of the imaging flow cytometry of the human acute monocytic leukemia (THP-1) cells acquired by ultrafast optical time-stretch microscopy technique.

Published in: Antony C. S. Chan, Ho-Cheung Ng, Sharat C. V. Bogaraju, Hayden K. H. So, Edmund Y. Lam &amp; Kevin K. Tsia, "All-passive pixel super-resolution of time-stretch imaging" Scientific Reports 7, 44608 (2017)
http://dx.doi.org/10.1038/srep44608

Preprint: https://arxiv.org/abs/1610.05802

To access the serial-temporal line-scans of the cellular images in MATLAB:

```
% Read a 16x512 block (8192 samples in total) starting at element (1,1).
trace8192 = h5read('leukemia_20161201.h5', '/raw2016Dec1_2224', [1,1], [16,8192/16]);
```
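
The same block can be read in Python with h5py, as a minimal sketch assuming the layout above (h5py indexes in row-major order, so the dimensions may need swapping relative to MATLAB):

```
import h5py

with h5py.File('leukemia_20161201.h5', 'r') as f:
    # 16 x 512 block = the first 8192 photodetector samples; swap the slice
    # order to [:512, :16] if the stored dimensions are transposed.
    trace8192 = f['/raw2016Dec1_2224'][:16, :512]
```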

Either snippet loads the first 8192 photodetector samples from the dataset.},
superseded= {},
terms= {Copyright (c) 2016, Antony C. S. Chan &lt;cschan@eee.hku.hk&gt; 
Department of Electrical &amp; Electronic Engineering,
The University of Hong Kong
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.}
}

</description>
<link>https://academictorrents.com/download/8464b9f9166c143040fee655f0284085fe251a80</link>
</item>
<item>
<title>Didi Data Release #2 - Round 1 Test Sequence and Training (Dataset)</title>
<description>@article{,
title= {Didi Data Release #2 - Round 1 Test Sequence and Training},
keywords= {},
journal= {},
author= {},
year= {},
url= {},
license= {},
abstract= {Udacity is using a new dataset production method that allows for quick processing and release cycles. Instead of spending weeks (or months) waiting on 3D annotation data to be produced by third-party companies, we have elected to try out something new that enables datasets to be released immediately after they are recorded. While we do lose some sample distribution on each individual dataset due to the same obstacles being used for each session, the massive speedup in production and reduction in cost allows us to release new datasets daily (and with different obstacles with each session). In this manner, we can directly control the type of data being recorded so that we can cover all situations without hoping for them to happen on real roads, and we have extreme precision on obstacle location with differential RTK GPS technology.

Due to this new approach, there are some major differences from the Kitti datasets. It is important to note that positions are recorded with respect to the base station, not the capture vehicle. The NED positions in the 'rtkfix' topic are therefore in relation to a FIXED POINT, NOT THE CAPTURE OR OBSTACLE VEHICLES. The relative positions can be calculated easily, as the NED frame is Cartesian space, not polar (see the sketch below). The single obstacle vehicle in this dataset is located in the 'obstacle/obs1/rear' topic namespace. Obstacle orientation is not evaluated in Round 1, but will be evaluated in Round 2. The pose section of the ROS bags included in this release IS NOT A VALID QUATERNION, and does not represent either the pose of the capture vehicle or the obstacle. However, in this dataset, we have included an additional GPS antenna mounted on the rear of the capture vehicle to get a proper orientation. The tracklet generation code (link below) is currently being modified to translate the XML files into the proper vehicle frame with the capture vehicle orientation. Since this is open source code, we welcome your contributions and look forward to accepting Pull Requests.
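
As a minimal sketch of that relative-position computation (the numbers are made up and the ROS plumbing for reading the 'rtkfix' topics is omitted), subtracting fixed-frame NED coordinates is all that is needed because the frame is Cartesian:

```
import numpy as np

# NED positions relative to the base station (meters), e.g. from the capture
# vehicle's and obstacle's 'rtkfix' topics at matched timestamps (made-up values).
capture_ned  = np.array([120.4, -35.2, 1.1])
obstacle_ned = np.array([135.9, -30.8, 1.0])

# Because NED is a Cartesian frame, the obstacle position relative to the
# capture vehicle is a plain vector difference (still expressed in NED axes).
relative_ned = obstacle_ned - capture_ned
distance = float(np.linalg.norm(relative_ned))
```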

Metadata about each obstacle (length, width, height, GPS antenna location as measured from the rear/left/ground) is included in each obstacle data directory. Tracklet file generation code, as well as sensor transforms/URDF files are available at this repository: https://github.com/udacity/didi-competition

This release requires running a ROS Velodyne driver for a HDL-32E to decode '/velodyne_packets' into '/velodyne_points'. The ROI for the captured camera imagery has also been enlarged at the community request to provide more data. Metadata for the obstacle has also been made available for Round 1.
},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/18d7f6be647eb6d581f5ff61819a11b9c21769c7</link>
</item>
<item>
<title>VGG Cell Dataset from Learning To Count Objects in Images   (Dataset)</title>
<description>@article{,
title= {VGG Cell Dataset from Learning To Count Objects in Images  },
keywords= {},
journal= {},
author= {Lempitsky, V. and Zisserman, A.},
booktitle= {Advances in Neural Information Processing Systems},
year= {2010},
url= {http://www.robots.ox.ac.uk/~vgg/research/counting/index_org.html},
license= {},
abstract= {![](https://i.imgur.com/ydlsPEh.png)

We generated a dataset of 200 images, using random subsets of the first 100 images for training and parameter validation, and the second 100 images to test the counting accuracy. Below, we show some representative results for cell counting on previously unseen images.



### Acknowledgements

This work is a part of the EU VisRec project (ERC grant VisRec no. 228180). },
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/b32305598175bb8e03c5f350e962d772a910641c</link>
</item>
<item>
<title>A collection of sport activity datasets for data analysis and data mining 2017a (Dataset)</title>
<description>@article{,
title= {A collection of sport activity datasets for data analysis and data mining 2017a},
keywords= {sport, dataset, gpx, triathlon, cycling, running, multisport},
journal= {Technical report 2017a},
author= {Iztok Fister Jr. and Samo Rauter and Dusan Fister and Iztok Fister},
year= {},
url= {},
license= {},
abstract= {},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/f2221a292540ff3e6c85025754f775361c7cd886</link>
</item>
<item>
<title>Udacity Didi $100k Challenge Dataset 1 (Dataset)</title>
<description>@article{,
title= {Udacity Didi $100k Challenge Dataset 1},
keywords= {},
journal= {},
author= {Udacity and Didi},
year= {},
url= {https://challenge.udacity.com/home/},
license= {},
abstract= {First Full Dataset Release - Udacity/Didi $100k Challenge

One of the most important aspects of operating an autonomous vehicle is understanding the surrounding environment in order to make safe decisions. Udacity and Didi Chuxing are partnering together to provide incentive for students to come up with the best way to detect obstacles using camera and LIDAR data. This challenge will allow for pedestrian, vehicle, and general obstacle detection that is useful to both human drivers and self-driving car systems.

Competitors will need to process LIDAR and Camera frames to output a set of obstacles, removing noise and environmental returns. Participants will be able to build on the large body of work that has been put into the Kitti datasets and challenges, using existing techniques and their own novel approaches to improve the current state-of-the-art.

Specifically, students will be competing against each other in the Kitti Object Detection Evaluation Benchmark. While a current leaderboard exists for academic publications, Udacity and Didi will be hosting our own leaderboard specifically for this challenge, and we will be using the standard object detection development kit that enables us to evaluate approaches as they are done in academia and industry.

IMPORTANT NOTICE

There are some major differences between this Udacity dataset and the Kitti datasets. It is important to note that positions are recorded with respect to the base station, not the capture vehicle. The NED positions in the ‘rtkfix’ topic are therefore in relation to a FIXED POINT, NOT THE CAPTURE OR OBSTACLE VEHICLES. The relative positions can be calculated easily, as the NED frame is Cartesian space, not polar. 

The XML tracklet files will, however, be in the frame of the capture vehicle. This means that the capture vehicle is also included in the recorded positions, and is denoted by the ROS topic '/gps/rtkfix' in this first dataset. The single obstacle vehicle in this dataset is located in the 'obs1/' topic namespace, but this will be changed to '/obstacles/obstacle_name' in future releases to accommodate the creation of XML tracklet files for multiple obstacles. 

Obstacle orientation is not evaluated in Round 1, but will be evaluated in Round 2. The pose section of the ROS bags included in this release IS NOT A VALID QUATERNION, and does not represent either the pose of the capture vehicle or the obstacle.

There is no XML tracklet file included with these datasets. They will be released as soon as they are available, in conjunction with the opening of the online leaderboard.},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/76352487923a31d47a6029ddebf40d9265e770b5</link>
</item>
<item>
<title>Public Health 241, 001 - Spring 2011 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Public Health 241, 001 - Spring 2011 - UC Berkeley},
author= {Nicholas P. Jewell},
keywords= {},
abstract= {Biostatistical concepts and modeling relevant to the design and analysis of multifactor population-based cohort and case-control studies, including matching. Measures of association, causal inference, confounding interaction. Introduction to binary regression, including logistic regression.},
terms= {},
url= {https://schedulebuilder.berkeley.edu/explore/courses/SP/2011/5946},
year= {2011}
}

</description>
<link>https://academictorrents.com/download/8469a366bf4b71ff62d9e2327537771bdc145dfa</link>
</item>
<item>
<title>Law 271, Environmental Law and Policy - Fall 2009 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Law 271, Environmental Law and Policy - Fall 2009 - UC Berkeley},
author= {Bob Infelise},
keywords= {},
abstract= {This introductory course is designed to explore fundamental legal and policy issues in environmental law. Through examination of environmental common law and key federal environmental statutes, including the National Environmental Policy Act, Clean Air Act, and Clean Water Act, it exposes students to the major challenges to environmental law and the principal approaches to meeting those challenges, including litigation, command and control regulation, technology forcing, market incentives, and information disclosure requirements. With the addition of cross-cutting topics such as risk assessment and environmental federalism, it also gives students a grounding in how choices about regulatory standards and levels of regulatory authority are made.},
terms= {}
}

</description>
<link>https://academictorrents.com/download/c878ea12eff8c99cc6b1983a2d7724a18bb1a94d</link>
</item>
<item>
<title>Peace and Conflict Studies 164B - Spring 2007 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Peace and Conflict Studies 164B - Spring 2007 - UC Berkeley},
author= {},
keywords= {},
abstract= {This course introduces students to a broad range of issues, concepts, and approaches integral to the study of peace and conflict. Subject areas include the war system and war prevention, conflict resolution and nonviolence, human rights and social justice, development and environmental sustainability. Required of all Peace and Conflict Studies majors. },
terms= {},
url= {http://guide.berkeley.edu/courses/pacs/},
year= {2007}
}

</description>
<link>https://academictorrents.com/download/0ec86c151be43be4857e2da370fb5508ff418146</link>
</item>
<item>
<title>Statistics 21 - 001 - Spring 2010 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Statistics 21 - 001 - Spring 2010 - UC Berkeley},
author= {Fletcher H Ibser},
keywords= {},
abstract= {Descriptive statistics, probability models and related concepts, sample surveys, estimates, confidence intervals, tests of significance, controlled experiments vs. observational studies, correlation and regression.},
terms= {},
url= {https://schedulebuilder.berkeley.edu/explore/courses/FL/2010/3249}
}

</description>
<link>https://academictorrents.com/download/56b38a7013673c92cb951eb79bcd3a26e8158095</link>
</item>
<item>
<title>Statistics 21 - Fall 2009 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Statistics 21 - Fall 2009 - UC Berkeley},
author= {Philip Stark},
keywords= {},
abstract= {Descriptive statistics, probability models and related concepts, sample surveys, estimates, confidence intervals, tests of significance, controlled experiments vs. observational studies, correlation and regression.},
terms= {},
url= {https://schedulebuilder.berkeley.edu/explore/courses/FL/2009/3249},
year= {2009}
}

</description>
<link>https://academictorrents.com/download/4d505fe0b3cbcbeff32bfd7b75a783f900dc8c6d</link>
</item>
<item>
<title>International and Area Studies 107, 001 - Spring 2011 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {International and Area Studies 107, 001 - Spring 2011 - UC Berkeley},
author= {J. Bradford Delong},
keywords= {},
abstract= {This course is designed as a comprehensive overview of intermediate macroeconomic theory focusing on economic growth and international economics. It covers a number of topics including the history of economic growth, the industrial revolution, post-industrial-revolution divergence, flexible-price and sticky-price macroeconomics, and macroeconomic policy. The course is structured for majors in International and Area Studies and other non-economics social science majors.},
terms= {},
url= {https://ninjacourses.berkeley.edu/explore/courses/FL/2011/472},
year= {2011}
}

</description>
<link>https://academictorrents.com/download/97e704bba2ad3fb2dc5d932f4ed693fcb2f85b30</link>
</item>
<item>
<title>Multivariable Calculus - Math 53 - Fall 2009 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Multivariable Calculus - Math 53 - Fall 2009 - UC Berkeley},
author= {Edward Frenkel},
keywords= {},
abstract= {Math 53 - Section 1 - Multivariable Calculus
Instructor: Edward Frenkel

Lectures: TT 3:30-5:00pm, Room 155 Dwinelle

Course Control Number: 54296

Office: 819 Evans

Office Hours: TBA

Prerequisites: Math 1A, 1B.

Required Text: Stewart, Multivariable Calculus, (custom edition).

Recommended Reading:

Syllabus:

Course Webpage: To be linked from http://math.berkeley.edu/~frenkel/Math53

Grading: 25% quizzes and HW, 20% each midterm, 35% final

Homework: Homework for the entire course will be assigned at the beginning of the semester, and weekly 
homework will be due at the beginning of each week.

Comments: Students have to make sure that they have no scheduling conflicts with the final exam. Missing final exam means automatic Fail grade for the entire course.},
terms= {},
url= {https://web.archive.org/web/20100501160242/http://math.berkeley.edu/~frenkel/Math53},
year= {2009}
}

</description>
<link>https://academictorrents.com/download/d90733721eb2a2ba839434decce91ce4803cbf1e</link>
</item>
<item>
<title>Political Science 179 - Spring 2008 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Political Science 179 - Spring 2008 - UC Berkeley},
author= {Alan Ross},
keywords= {},
abstract= {Political issues facing the state of California, the United States, or the international community.
},
terms= {},
url= {https://schedulebuilder.berkeley.edu/explore/courses/SP/2008/2765},
year= {2008}
}

</description>
<link>https://academictorrents.com/download/e75a329db4adabdc45502c401a1c4b69712cbb98</link>
</item>
<item>
<title>Chemistry 1A, 002 - Spring 2010 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Chemistry 1A, 002 - Spring 2010 - UC Berkeley},
author= {Heino Nitsche},
keywords= {},
abstract= {Stoichiometry of chemical reactions, quantum mechanical description of atoms, the elements and periodic table, chemical bonding, real and ideal gases, thermochemistry, introduction to thermodynamics and equilibrium, acid-base and solubility equilibria, introduction to oxidation-reduction reactions, introduction to chemical kinetics.},
terms= {},
url= {https://schedulebuilder.berkeley.edu/explore/courses/FL/2010/257}
}

</description>
<link>https://academictorrents.com/download/ebe62de0d85ba9563c566cc5b082416792bc00ca</link>
</item>
<item>
<title>Chemistry 3B, 002 - Fall 2014 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Chemistry 3B, 002 - Fall 2014 - UC Berkeley},
author= {K. Peter Vollhardt},
keywords= {},
abstract= {Conjugation, aromatic chemistry, carbonyl compounds, carbohydrates, amines, carboxylic acids, amino acids, peptides, proteins, and nucleic acid chemistry. Ultraviolet spectroscopy and mass spectrometry will be introduced.},
terms= {},
url= {https://schedulebuilder.berkeley.edu/explore/courses/SP/2014/263}
}

</description>
<link>https://academictorrents.com/download/769ef081e79307987fd52ed97c82fe7c590c88f8</link>
</item>
<item>
<title>Economics 1, 001 - Fall 2011 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Economics 1, 001 - Fall 2011 - UC Berkeley},
author= {Ken Train},
keywords= {},
abstract= {A survey of economics designed to give an overview of the field.},
terms= {},
url= {https://schedulebuilder.berkeley.edu/explore/courses/SP/2011/330}
}

</description>
<link>https://academictorrents.com/download/93628c9e317768a5bf994eec845834d9e4a749e9</link>
</item>
<item>
<title>Physics 10, 001 - Spring 2006 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Physics 10, 001 - Spring 2006 - UC Berkeley},
author= {Richard A. Muller},
keywords= {},
abstract= {The most interesting and important topics in physics, stressing conceptual understanding rather than math, with applications to current events. Topics covered may vary and may include energy and conservation, radioactivity, nuclear physics, the Theory of Relativity, lasers, explosions, earthquakes, superconductors, and quantum physics.},
terms= {}
}

</description>
<link>https://academictorrents.com/download/5140da14dd72b2a6f19a5ca08d2e2d015754909a</link>
</item>
<item>
<title>Chemical &amp; Biomolecular Engineering 179 Process Technology of Solid-State Materials Devices  - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Chemical &amp; Biomolecular Engineering 179 Process Technology of Solid-State Materials Devices  - UC Berkeley},
author= {},
keywords= {},
abstract= {Chemical processing and properties of solid-state materials. Crystal growth and purification. Thin film technology. Application of chemical processing to the manufacture of semiconductors and solid-state devices. },
terms= {}
}

</description>
<link>https://academictorrents.com/download/f1baa15065060f1830d74111c1ef7741a73c9e98</link>
</item>
<item>
<title>Psychology 1 - General Psychology - Fall 2007 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Psychology 1 - General Psychology - Fall 2007 - UC Berkeley},
author= {John Kihlstrom},
keywords= {},
abstract= {Introduction to the principal areas, problems, and concepts of psychology},
terms= {},
year= {2007},
url= {https://schedulebuilder.berkeley.edu/explore/courses/FL/2007/609}
}

</description>
<link>https://academictorrents.com/download/687bfcdf88598c04edf98c56c3b5f838d43ec2a6</link>
</item>
<item>
<title>Environmental Economics and Policy 145 - Fall 2014 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Environmental Economics and Policy 145 - Fall 2014 - UC Berkeley},
author= {},
keywords= {},
year= {2014},
abstract= {This course introduces students to key issues and findings in the field of health and environmental economics. The first half of the course focuses on the theoretical and statistical frameworks used to analyze instances of market failure in the provision of health and environmental goods. The second half focuses on policy-relevant empirical findings in the field.},
terms= {},
url= {https://ninjacourses.berkeley.edu/explore/courses/FL/2015/729}
}

</description>
<link>https://academictorrents.com/download/c633df0181d560050d3f392501c6815135cfb60e</link>
</item>
<item>
<title>Nuclear Engineering 101, 001 - Fall 2014 - UC Berkeley (Course)</title>
<description>@article{,
license={Creative Commons 3.0: Attribution-NonCommercial-NoDerivs},
title= {Nuclear Engineering 101, 001 - Fall 2014 - UC Berkeley},
keywords= {},
author= {},
abstract= {### Course Title: 
Nuclear Reactions and Radiation

### Catalog Description: 
Energetics and kinetics of nuclear reactions and radioactive decay, fission, fusion, and reactions of energetic neutrons, properties of the fission products and the actinides; nuclear models and transition probabilities; interaction of radiation with matter.

### Course Prerequisite: 
Physics 7ABC Physics for scientists and engineers
### Prerequisite Knowledge and/or Skills: 
The course uses the following knowledge and skills from prerequisite and lower-division courses:

- solve linear, first and second order differential equations.
- understand and apply the fundamental laws of physical chemistry such as the Boltzmann distribution for particles in an ideal gas.
- understand and apply the fundamentals of classical mechanics, electricity and magnetism and the elements of quantum mechanics to idealized representations of the structure of nuclei and nuclear reactions.
- understand and apply the fundamental notions of probability and probability distributions.

### Course Objectives: 

- Provide the students with a solid understanding of the fundamentals of those aspects of low-energy nuclear physics that are most important to applications in areas such as nuclear engineering, nuclear and radiochemistry, geosciences, biotechnology, etc.

### Course Outcomes: 

- calculate the consequences of radioactive growth and decay and nuclear reactions.
- calculate estimates of nuclear masses and energetics based on empirical data and nuclear models.
- calculate estimates of the lifetimes of nuclear states that are unstable to alpha-,beta- and gamma decay and internal conversion based on the theory of simple nuclear models.
- use nuclear models to predict low-energy level structure and level energies.
- use nuclear models to predict the spins and parities of low-lying levels and estimate their consequences with respect to radioactive decay.
- use nuclear models to understand the properties of neutron capture and the Breit-Wigner single level formula to calculate cross sections at resonance and thermal energies.
- calculate the kinematics of the interaction of photons with matter and apply stopping power to determine the energy loss rate and ranges of charged particles in matter.
- calculate the energies of fission fragments and understand the charge and mass distributions of the fission products, and prompt neutron and gamma rays from fission.

### Topics Covered: 

- Introduction to nuclear reactions and radioactive decay - mass and energy balances and decay modes
- Nuclear and Atomic masses - empirical data and the semiempirpical mass formula
- Application of the Semiempirical mass formula to determine the nuclear mass surface and the general characteristics of the energetics of alpha- and beta-decay and nuclear fission
- Application of the Semiempirical mass formula to uncover empirical evidence for nuclear shell structure; the magic numbers
- Introduction to the facts of quantum mechanics and conserved quantities – angular momentum and parity, the Schroedinger equation and the particle in the box model
- The Spherical Shell Model - particle motion , angular momentum and parity in the spherical potential well and the isotropic harmonic oscillator potentials
- The Empirical Shell Model and low-lying levels of spherical and near spherical nuclei
- The Electric Potential of Nuclei and Evidence for Deformed Nuclei – multipole expansion of the electric potential and empirical data on quadrupole moments
- Predictions of the Quantized Rigid Rotor and Harmonic Vibrator - comparisons of the idealized models with empirical data on rotational and vibrational spectra of deformed nuclei
- Alpha Decay - energetics and the decay probability in the limit of the Gamow model. Comparison of model predictions with empirical data. Alpha decay schemes
- Beta Decay - beta decay, positron emission and electron capture; the Fermi theory of allowed beta decay; forbidden transitions; Fermi and Gamow-Teller decay; empirical beta decay schemes and correlations with elementary beta decay theory and spherical shell structure
- Gamma Decay and Internal Conversion- multipole expansion of the radiation field and qualitative consideration of decay probabilities in the limit of the Moskowski and Weisskopf models; nuclear isomerism; internal conversion; nuclear structure and empirical data on gamma decay
- Nuclear Fission - energetics and empirical data on mass distributions and shell structure, charge distribution of the fission fragments, prompt neutrons and gamma rays
- Nuclear Reactions - reaction types and energetics; kinematics of two-body elastic scattering and nuclear reactions; applications to moderation of neutrons and the interaction of charged particles with matter; direct and compound nuclear reactions; resonances and physical plausibility of the form of the Breit-Wigner single level formula; the Breit-Wigner single level formula and resonances properties of neutron reactions
- Introduction to the Interaction of Charged Particles with Matter; ranges of leptons and heavy charged particles in matter
- Introduction to the Interaction of Photons with Matter - the Compton Effect; qualitative discussion of the effect of electron binding; pair production; macroscopic cross sections and attenuation coefficients

},
terms= {},
url= {https://www.nuc.berkeley.edu/courses/ne-101}
}

</description>
<link>https://academictorrents.com/download/92644e4132e893c70c3e0ad9ac1d58bef554bd14</link>
</item>
<item>
<title>Astronomy C12, 001 - Fall 2014 - UC Berkeley	 (Course)</title>
<description>@article{,
title= {Astronomy C12, 001 - Fall 2014 - UC Berkeley},
keywords= {},
author= {Geoffrey W. Marcy},
abstract= {A tour of the mysteries and inner workings of our solar system. What are planets made of? Why do they orbit the sun the way they do? How do planets form? Why do some bizarre moons have oceans, volcanoes, and ice floes? What makes the Earth hospitable for life? Is the Earth a common type of planet or some cosmic quirk? This course will introduce basic physics, chemistry, and math to understand planets, moons, rings, comets, asteroids, atmospheres, and oceans. Understanding other worlds will help us save our own planet and help us understand our place in the universe. Also listed as Letters and Science C70T and Earth and Planetary Science C12.},
terms= {},
url={https://schedulebuilder.berkeley.edu/explore/courses/FL/2012/4204}
}

</description>
<link>https://academictorrents.com/download/1433dd89d4366df2b534c5e3b6b267776a67e7af</link>
</item>
<item>
<title>Bioengineering 200, 001 - Spring 2014  - UC Berkeley (Course)</title>
<description>@article{,
title= {Bioengineering 200, 001 - Spring 2014  - UC Berkeley},
keywords= {},
author= {},
abstract= {An introduction to research in bioengineering including specific case studies and organization of this rapidly expanding and diverse field.},
terms= {},
url= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/c4cd3183550f9cc1cfdcb69f15b076ba439ee062</link>
</item>
<item>
<title>Public Health 150E, 001 - Spring 2015 - UC Berkeley (Course)</title>
<description>@article{,
title= {Public Health 150E, 001 - Spring 2015 - UC Berkeley},
keywords= {},
author= {William A. Satariano},
abstract= {},
terms= {},
license= {},
superseded= {},
url= {}
}

</description>
<link>https://academictorrents.com/download/057e7009bdf9e3d09b1ef56ffbd82a1d1a5de23c</link>
</item>
<item>
<title>Electrical Engineering 123, 001 - Spring 2015 - UC Berkeley (Course)</title>
<description>@article{,
title= {Electrical Engineering 123, 001 - Spring 2015 - UC Berkeley},
keywords= {},
author= {Shimon Michael Lustig},
abstract= {Catalog Description: (4 units) Discrete time signals and systems: Fourier and Z transforms, DFT, 2-dimensional versions. Digital signal processing topics: flow graphs, realizations, FFT, quantization effects, linear prediction. Digital filter design methods: windowing, frequency sampling, S-to-Z methods, frequency-transformation methods, optimization methods, 2-dimensional filter design.

Prerequisites: EECS 120, or instructor permission.

Course objectives: To develop skills for analyzing and synthesizing algorithms and systems that process discrete time signals, with emphasis on realization and implementation.

Why should you care? Digital signal processing is one of the most important and useful tools an electrical engineer could have. It impacts all modern aspects of life and science, from communication and entertainment to health and economics.},
terms= {},
url= {https://inst.eecs.berkeley.edu/~ee123/sp15/}
}

</description>
<link>https://academictorrents.com/download/530416f4f3a4b2cac90e61d5df72d1610dec68b4</link>
</item>
<item>
<title>Integrative Biology 131 - General Human Anatomy Online Course Videos - UCBerkeley (Course)</title>
<description>@article{,
title= {Integrative Biology 131 - General Human Anatomy Online Course Videos - UCBerkeley},
keywords= {},
journal= {},
author= {Marian Diamond},
year= {2008},
url= {},
license= {},
abstract= {Integrative Biology 131: General Human Anatomy. Fall 2005. Professor Marian Diamond. The functional anatomy of the human body as revealed by gross and microscopic examination.

The Department of Integrative Biology offers a program of instruction that focuses on the integration of structure and function in the evolution of diverse biological systems. It investigates integration at all levels of organization from molecules to the biosphere, and in all taxa of organisms from viruses to higher plants and animals.

},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/5a0d6b38ab0adb9e52182164ffa8db19822f73ef</link>
</item>
<item>
<title>[Coursera] Algorithms: Design and Analysis, Part 1 (Stanford University) (algo) (Course)</title>
<description>@article{,
title= {[Coursera] Algorithms: Design and Analysis, Part 1 (Stanford University) (algo)},
author= {Stanford University},
keywords= {Coursera, algo},
abstract= {},
terms= {},
license= {},
superseded= {},
url= {}
}

</description>
<link>https://academictorrents.com/download/7bfcfbaf2c53588b23ba1ebccae47a2b9c5197b7</link>
</item>
<item>
<title>Wikilinks: A Large-scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia (Original Dataset) (Dataset)</title>
<description>@article{,
title= {Wikilinks: A Large-scale Cross-Document Coreference Corpus Labeled via Links to Wikipedia (Original Dataset)},
author= {Sameer Singh and Amarnag Subramanya and Fernando Pereira and Andrew McCallum},
abstract= {Cross-document coreference resolution is the task of grouping the entity mentions in a collection of documents into sets that each represent a distinct entity. It is central to knowledge base construction and also useful for joint inference with other NLP components. Obtaining large, organic labeled datasets for training and testing cross-document coreference has previously been difficult. We use a method for automatically gathering massive amounts of naturally-occurring cross-document reference data to create the Wikilinks dataset, comprising 40 million mentions over 3 million entities. Our method is based on finding hyperlinks to Wikipedia from a web crawl and using anchor text as mentions. In addition to providing large-scale labeled data without human effort, we are able to include many styles of text beyond newswire and many entity types beyond people.

### Introduction


 The Wikipedia links (WikiLinks) data consists of web pages that
 satisfy the following two constraints:

a. contain at least one hyperlink that points to Wikipedia, and
b. the anchor text of that hyperlink closely matches the title of the target Wikipedia page.

  We treat each page on Wikipedia as representing an entity
  (or concept or idea), and the anchor text as a mention of the
  entity. The WikiLinks data set was obtained by iterating 
  over Google's web index. 

####  Content


  This dataset is accompanied by the following tech report:

  https://web.cs.umass.edu/publication/docs/2012/UM-CS-2012-015.pdf

  Please cite the above report if you use this data.

  The dataset is divided over 10 gzipped text files
  data-0000[0-9]-of-00010.gz. Each of these files can be viewed
  without uncompressing them using zcat. For example:

```
zcat data-00001-of-00010.gz | head
```

  gives (fields are tab-separated in the raw files):

```
URL      ftp://217.219.170.14/Computer%20Group/Faani/vaset%20fani/second/sattari/word/2007/source/s%20crt.docx
MENTION  vacuum tube        421    http://en.wikipedia.org/wiki/Vacuum_tube
MENTION  vacuum tubes       10838  http://en.wikipedia.org/wiki/Vacuum_tube
MENTION  electron gun       598    http://en.wikipedia.org/wiki/Electron_gun
MENTION  fluorescent        790    http://en.wikipedia.org/wiki/Fluorescent
MENTION  oscilloscope       1307   http://en.wikipedia.org/wiki/Oscilloscope
MENTION  computer monitor   1503   http://en.wikipedia.org/wiki/Computer_monitor
MENTION  computer monitors  3066   http://en.wikipedia.org/wiki/Computer_monitor
MENTION  radar              1657   http://en.wikipedia.org/wiki/Radar
MENTION  plasma screens     2162   http://en.wikipedia.org/wiki/Plasma_screen
```


  Each file is in the following format:

```
URL\t&lt;url&gt;\n
MENTION\t&lt;mention&gt;\t&lt;byte_offset&gt;\t&lt;target_url&gt;\n
MENTION\t&lt;mention&gt;\t&lt;byte_offset&gt;\t&lt;target_url&gt;\n
MENTION\t&lt;mention&gt;\t&lt;byte_offset&gt;\t&lt;target_url&gt;\n
...
TOKEN\t&lt;token&gt;\t&lt;byte_offset&gt;\n
TOKEN\t&lt;token&gt;\t&lt;byte_offset&gt;\n
TOKEN\t&lt;token&gt;\t&lt;byte_offset&gt;\n
...
\n\n
URL\t&lt;url&gt;\n
...
```



  where each web-page is identified by its url (annotated
  by "URL"). For every mention (denoted by "MENTION"), we provide the
  actual mention string, the byte offset of the mention from the start
  of the page and the target url all separated by a tab. It is
  possible (and in many cases very likely) that the contents of a
  web-page may change over time. The dataset also contains information
  about the top 10 least frequent tokens on that page at the time
  it was crawled. These lines start with a "TOKEN" and contain
  the string of the token and the byte offset from the start of the page.
  These token strings can be used as fingerprints
  to verify if the page used to generate the data has changed. Finally,
  pages are separated from each other by two blank lines.
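
  A minimal parsing sketch for this record format (gzip streaming and tab-delimited fields, as documented above):

```
import gzip

with gzip.open('data-00001-of-00010.gz', 'rt', encoding='utf-8', errors='replace') as f:
    for line in f:
        line = line.rstrip('\n')
        if line.startswith('URL\t'):
            _, url = line.split('\t', 1)          # start of a new page record
        elif line.startswith('MENTION\t'):
            _, mention, offset, target = line.split('\t')
            # e.g. collect (mention, int(offset), target) for the current page
        elif line.startswith('TOKEN\t'):
            _, token, offset = line.split('\t')   # fingerprint token
        # blank lines separate consecutive pages
```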

####  Basic Statistics


  - Number of documents: 11 million
  - Number of entities: 3 million
  - Number of mentions: 40 million


  Finally please note that this dataset was created automatically
  from the web and therefore contains some amount of noise.

  Enjoy!

  Amar Subramanya (asubram@google.com)

  Sameer Singh (sameer@cs.umass.edu)

  Fernando Pereira (pereira@google.com)

  Andrew McCallum (mccallum@cs.umass.edu)
},
keywords= {},
terms= {},
license= {Attribution 3.0 Unported (CC BY 3.0) Human-Readable Summary}
}

</description>
<link>https://academictorrents.com/download/beefa2ec4161432cd1d9f693a88d3670aae68357</link>
</item>
<item>
<title>Open Payments Dataset - 2015 Program Year  (Dataset)</title>
<description>@article{,
title= {Open Payments Dataset - 2015 Program Year },
keywords= {},
year= {2015},
url= {https://www.cms.gov/OpenPayments/Explore-the-Data/Data-Overview.html},
author= {U.S. Centers for Medicare &amp; Medicaid Services},
abstract= {Every year, CMS will update the Open Payments data at least once after its initial publication. The refreshed data will include updates to data disputes and other data corrections made since the initial publication of this data documenting payments or transfers of value to physicians and teaching hospitals, and physician ownership and investment interests. This financial data is submitted by applicable manufacturers and applicable group purchasing organizations (GPOs).  

#### What data is collected?
Applicable manufacturers and GPOs submit data to Open Payments about payments or other transfers of value between applicable manufacturers and GPOs and physicians or teaching hospitals:

1. Paid directly to physicians and teaching hospitals (known as direct payments)
2. Paid indirectly to physicians and teaching hospitals (known as indirect payments) through an intermediary such as a medical specialty society
3. Designated by physicians or teaching hospitals to be paid to another party (known as third party payments)
There are three distinct ways for you to review and search the data (and remember, you can view the summary data dashboard for an overview of published data).

The Open Payments Final Rule §403.910 provides applicable manufacturers and applicable GPOs the opportunity to request a delay in publication for a period not to exceed four calendar years after the date the payment or other transfer of value was made, or upon the approval, licensure, or clearance of the covered drug, device, biological, or medical supply by the FDA.},
terms= {}
}

</description>
<link>https://academictorrents.com/download/de413718a03cd670535c772cf68116775a9e2537</link>
</item>
<item>
<title>Open Payments Dataset - 2014 Program Year  (Dataset)</title>
<description>@article{,
title= {Open Payments Dataset - 2014 Program Year },
keywords= {},
year= {2014},
url= {https://www.cms.gov/OpenPayments/Explore-the-Data/Data-Overview.html},
author= {U.S. Centers for Medicare &amp; Medicaid Services},
abstract= {Every year, CMS will update the Open Payments data at least once after its initial publication. The refreshed data will include updates to data disputes and other data corrections made since the initial publication of this data documenting payments or transfers of value to physicians and teaching hospitals, and physician ownership and investment interests. This financial data is submitted by applicable manufacturers and applicable group purchasing organizations (GPOs).  

#### What data is collected?
Applicable manufacturers and GPOs submit data to Open Payments about payments or other transfers of value between applicable manufacturers and GPOs and physicians or teaching hospitals:

1. Paid directly to physicians and teaching hospitals (known as direct payments)
2. Paid indirectly to physicians and teaching hospitals (known as indirect payments) through an intermediary such as a medical specialty society
3. Designated by physicians or teaching hospitals to be paid to another party (known as third party payments)
There are three distinct ways for you to review and search the data (and remember, you can view the summary data dashboard for an overview of published data).

The Open Payments Final Rule §403.910 provides applicable manufacturers and applicable GPOs the opportunity to request a delay in publication for a period not to exceed four calendar years after the date the payment or other transfer of value was made, or upon the approval, licensure, or clearance of the covered drug, device, biological, or medical supply by the FDA.},
terms= {}
}

</description>
<link>https://academictorrents.com/download/88f6fff84d7c2a2769348ab4c2b0ecb318b43752</link>
</item>
<item>
<title>Open Payments Dataset - 2013 Program Year  (Dataset)</title>
<description>@article{,
title= {Open Payments Dataset - 2013 Program Year },
keywords= {},
year={2013},
url= {https://www.cms.gov/OpenPayments/Explore-the-Data/Data-Overview.html},
author= {U.S. Centers for Medicare &amp; Medicaid Services},
abstract= {Every year, CMS will update the Open Payments data at least once after its initial publication. The refreshed data will include updates to data disputes and other data corrections made since the initial publication of this data documenting payments or transfers of value to physicians and teaching hospitals, and physician ownership and investment interests. This financial data is submitted by applicable manufacturers and applicable group purchasing organizations (GPOs).  

#### What data is collected?
Applicable manufacturers and GPOs submit data to Open Payments about payments or other transfers of value between applicable manufacturers and GPOs and physicians or teaching hospitals:

1. Paid directly to physicians and teaching hospitals (known as direct payments)
2. Paid indirectly to physicians and teaching hospitals (known as indirect payments) through an intermediary such as a medical specialty society
3. Designated by physicians or teaching hospitals to be paid to another party (known as third party payments)
There are three distinct ways for you to review and search the data (and remember, you can view the summary data dashboard for an overview of published data).

The Open Payments Final Rule §403.910 provides applicable manufacturers and applicable GPOs the opportunity to request a delay in publication for a period not to exceed four calendar years after the date the payment or other transfer of value was made, or upon the approval, licensure, or clearance of the covered drug, device, biological, or medical supply by the FDA.},
terms= {}
}

</description>
<link>https://academictorrents.com/download/92a1aeaaf741f3d1669ad0f0186d96ec168ee550</link>
</item>
<item>
<title>[Coursera ] Text Mining and Analytics (Course)</title>
<description>@article{,
title= {[Coursera ] Text Mining and Analytics},
keywords= {text mining, analytics},
journal= {},
author= {Coursera},
year= {},
url= {},
license= {},
abstract= {},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/e2c129491a3841bfac5d7b08b41ad79387132a23</link>
</item>
<item>
<title>A Brief Review of Nature-Inspired Algorithms for Optimization (Paper)</title>
<description>@article{fister2013briefreview,
author= {Fister Jr., Iztok and Yang, Xin-She and Fister, Iztok and Brest, Janez and Fister, Dusan},
journal= {Elektrotehniski vestnik},
number= {3},
pages= {116--122},
title= {A Brief Review of Nature-Inspired Algorithms for Optimization},
volume= {80},
year= {2013},
abstract= {},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/aec97f8374cfa5b8bce86cd542870fe849e1afb5</link>
</item>
<item>
<title>Pre-configured (Mint) linux based virtual machine image (Dataset)</title>
<description>@article{,
title= {Pre-configured (Mint) linux based virtual machine image},
author= {Swami Iyer},
abstract= {Pre-configured (Mint) linux based virtual machine image for courses taught by Swami Iyer at UMass Boston in Spring 2017.},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/5ceb6902b46a344de6db18c2ec5a14bb24a7df4a</link>
</item>
<item>
<title>Microsoft Academic Graph - 2016/02/05 (Dataset)</title>
<description>@article{,
title= {Microsoft Academic Graph - 2016/02/05},
keywords= {},
author= {Arnab Sinha and Zhihong Shen and Yang Song and Hao Ma and Darrin Eide and Bo-June (Paul) Hsu and Kuansan Wang},
abstract= {The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study. This graph is used to power experiences in Bing, Cortana, and in Microsoft Academic.

},
terms= {We kindly request that any published research that makes use of this data cites our data paper listed below.

Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion). ACM, New York, NY, USA, 243-246. DOI=http://dx.doi.org/10.1145/2740908.2742839},
url= {https://academicgraph.blob.core.windows.net/graph-2016-02-05/index.html}
}

</description>
<link>https://academictorrents.com/download/1e0a00b9c606cf87c03e676f75929463c7756fb5</link>
</item>
<item>
<title>US Stock Market End of Day dataset (Dataset)</title>
<description>@article{,
title= {US Stock Market End of Day dataset},
keywords= {Stock Market High Low Close Open Volume Ticker Date},
journal= {},
author= {Atreyuroc},
year= {},
url= {},
license= {},
abstract= {End-of-day data for 4,974 stock symbols. Includes close, open, high, low, volume, and date. Data was collected from Google Finance public data. 

```
+----------+------------+
| Table    | Size in MB |
+----------+------------+
| surf_eod |    1109.00 |
+----------+------------+
1 row in set (0.00 sec)

mysql&gt; SELECT COUNT(DISTINCT(`ticker`)) FROM surf_eod;
+---------------------------+
| COUNT(DISTINCT(`ticker`)) |
+---------------------------+
|                      4974 |
+---------------------------+
1 row in set (6.31 sec)

mysql&gt; describe surf_eod;
+--------+-------------+------+-----+-------------------+-----------------------------+
| Field  | Type        | Null | Key | Default           | Extra                       |
+--------+-------------+------+-----+-------------------+-----------------------------+
| ticker | varchar(10) | YES  | MUL | NULL              |                             |
| date   | date        | YES  |     | NULL              |                             |
| close  | varchar(20) | YES  |     | NULL              |                             |
| high   | varchar(20) | YES  |     | NULL              |                             |
| low    | varchar(20) | YES  |     | NULL              |                             |
| open   | varchar(20) | YES  |     | NULL              |                             |
| volume | varchar(20) | YES  |     | NULL              |                             |
| time   | timestamp   | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------+-------------+------+-----+-------------------+-----------------------------+
8 rows in set (0.04 sec)

mysql&gt; SELECT COUNT(*) FROM surf_eod;
+----------+
| COUNT(*) |
+----------+
| 17726722 |
+----------+
1 row in set (25.18 sec)
```},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/c5a49e46249fef6a3219919fef96fd0265da4d3a</link>
</item>
<item>
<title>AOL Search Data 20M web queries (2006) (Dataset)</title>
<description>@article{,
title= {AOL Search Data 20M web queries (2006)},
keywords= {web, www, User Session},
journal= {},
author= {AOL},
year= {2006},
url= {http://gregsadetsky.com/_aol-data/U500k_README.txt},
license= {},
abstract= {#### 500k User Session Collection

This collection is distributed for NON-COMMERCIAL RESEARCH USE ONLY. 
Any application of this collection for commercial purposes is STRICTLY PROHIBITED.

#### Brief description:

This collection consists of ~20M web queries collected from ~650k users over three months.
The data is sorted by anonymous user ID and sequentially arranged. 

The goal of this collection is to provide real query log data that is based on real users. It could be used for personalization, query reformulation or other types of search research. 

The data set includes {AnonID, Query, QueryTime, ItemRank, ClickURL}.
```
        AnonID - an anonymous user ID number.
        Query  - the query issued by the user, case shifted with
                 most punctuation removed.
        QueryTime - the time at which the query was submitted for search.
        ItemRank  - if the user clicked on a search result, the rank of the
                    item on which they clicked is listed. 
        ClickURL  - if the user clicked on a search result, the domain portion of 
                    the URL in the clicked result is listed.
```

Each line in the data represents one of two types of events:
        1. A query that was NOT followed by the user clicking on a result item.
        2. A click through on an item in the result list returned from a query.
In the first case (query only) there is data in only the first three columns/fields -- namely AnonID, Query, and QueryTime (see above). 
In the second case (click through), there is data in all five columns. For click through events, the query that preceded the click through is included. Note that if a user clicked on more than one result in the list returned from a single query, there will be TWO lines in the data to represent the two events. Also note that if the user requested the next "page" of results for some query, this appears as a subsequent identical query with a later time stamp.
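
A minimal loading sketch with pandas, assuming the tab-separated layout above (the file name is hypothetical, and the presence of a header row is an assumption):

```
import pandas as pd

# Hypothetical file name; columns: AnonID, Query, QueryTime, ItemRank, ClickURL.
df = pd.read_csv('user-ct-test-collection-01.txt', sep='\t')

queries_only  = df[df['ClickURL'].isna()]   # type 1: query with no click-through
clickthroughs = df[df['ClickURL'].notna()]  # type 2: click-through events
```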

CAVEAT EMPTOR -- SEXUALLY EXPLICIT DATA!  Please be aware that these queries are not filtered to remove any content.  Pornography is prevalent on the Web and unfiltered search engine logs contain queries by users who are looking for pornographic material.  There are queries in this collection that use SEXUALLY EXPLICIT LANGUAGE.  This collection of data is intended for use by mature adults who are not easily offended by the use of pornographic search terms.  If you are offended by sexually explicit language you should not read through this data.  Also be aware that in some states it may be illegal to expose a minor to this data.  Please understand that the data represents REAL WORLD USERS, un-edited and randomly sampled, and that AOL is not the author of this data.

Basic Collection Statistics
Dates:
  01 March, 2006 - 31 May, 2006

Normalized queries:
  36,389,567 lines of data
  21,011,340 instances of new queries (w/ or w/o click-through)
   7,887,022 requests for "next page" of results
  19,442,629 user click-through events
  16,946,938 queries w/o user click-through
  10,154,742 unique (normalized) queries
     657,426 unique user ID's

},
superseded= {},
terms= {Please reference the following publication when using this collection:

G. Pass, A. Chowdhury, C. Torgeson,  "A Picture of Search"  The First 
International Conference on Scalable Information Systems, Hong Kong, June, 
2006.}
}

</description>
<link>https://academictorrents.com/download/cd339bddeae7126bb3b15f3a72c903cb0c401bd1</link>
</item>
<item>
<title>MovieLens 20M Dataset (Dataset)</title>
<description>@article{,
title= {MovieLens 20M Dataset},
keywords= {},
journal= {},
author= {},
year= {},
url= {http://files.grouplens.org/datasets/movielens/ml-20m-README.html},
license= {},
abstract= {Stable benchmark dataset. 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags. Released 4/2015; updated 10/2016 to update links.csv and add tag genome data.

### Summary

This dataset (ml-20m) describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 20000263 ratings and 465564 tag applications across 27278 movies. These data were created by 138493 users between January 09, 1995 and March 31, 2015. This dataset was generated on October 17, 2016.

Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

The data are contained in six files: genome-scores.csv, genome-tags.csv, links.csv, movies.csv, ratings.csv, and tags.csv. More details about the contents and use of all these files follow.

This and other GroupLens data sets are publicly available for download at http://grouplens.org/datasets/.

#### Further Information About GroupLens

GroupLens is a research group in the Department of Computer Science and Engineering at the University of Minnesota. Since its inception in 1992, GroupLens's research projects have explored a variety of fields including:

recommender systems
online communities
mobile and ubiquitous technologies
digital libraries
local geographic information systems

GroupLens Research operates a movie recommender based on collaborative filtering, MovieLens, which is the source of these data. We encourage you to visit http://movielens.org to try it out! If you have exciting ideas for experimental work to conduct on MovieLens, send us an email at grouplens-info@cs.umn.edu - we are always interested in working with external collaborators.

#### Content and Use of Files

#### Verifying the Dataset Contents

We encourage you to verify that the dataset you have on your computer is identical to the ones hosted at grouplens.org. This is an important step if you downloaded the dataset from a location other than grouplens.org, or if you wish to publish research results based on analysis of the MovieLens dataset.

We provide a MD5 checksum with the same name as the downloadable .zip file, but with a .md5 file extension. To verify the dataset:

On Linux:
md5sum ml-20m.zip; cat ml-20m.zip.md5

On OS X:
md5 ml-20m.zip; cat ml-20m.zip.md5

Windows users can download a tool from Microsoft (or elsewhere) that verifies MD5 checksums.

Check that the two lines of output contain the same hash value.
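For a platform-independent check, the same comparison can be scripted. This sketch uses only the Python standard library and accepts either the GNU (hash first) or BSD (`MD5 (file) = hash`) layout of the .md5 file.

```python
import hashlib
import re

def md5_of(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)                 # hash the file in 1 MB chunks
    return h.hexdigest()

# pull the 32-hex-digit hash out of the .md5 file, wherever it sits on the line
published = re.search(r"[0-9a-fA-F]{32}", open("ml-20m.zip.md5").read()).group(0)
print("OK" if md5_of("ml-20m.zip") == published.lower() else "MISMATCH")
```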

#### Formatting and Encoding

The dataset files are written as comma-separated values files with a single header row. Columns that contain commas (,) are escaped using double-quotes ("). These files are encoded as UTF-8. If accented characters in movie titles or tag values (e.g. Misérables, Les (1995)) display incorrectly, make sure that any program reading the data, such as a text editor, terminal, or script, is configured for UTF-8.

#### User Ids

MovieLens users were selected at random for inclusion. Their ids have been anonymized. User ids are consistent between ratings.csv and tags.csv (i.e., the same id refers to the same user across the two files).

#### Movie Ids

Only movies with at least one rating or tag are included in the dataset. These movie ids are consistent with those used on the MovieLens web site (e.g., id 1 corresponds to the URL https://movielens.org/movies/1). Movie ids are consistent between ratings.csv, tags.csv, movies.csv, and links.csv (i.e., the same id refers to the same movie across these four data files).

#### Ratings Data File Structure (ratings.csv)

All ratings are contained in the file ratings.csv. Each line of this file after the header row represents one rating of one movie by one user, and has the following format:

userId,movieId,rating,timestamp
The lines within this file are ordered first by userId, then, within user, by movieId.

Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars).

Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.
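As a loading sketch (pandas is an assumption here; any CSV reader works), the timestamps convert directly to timezone-aware datetimes:

```python
import pandas as pd

ratings = pd.read_csv("ratings.csv")        # columns: userId,movieId,rating,timestamp
# seconds since the Unix epoch -> UTC datetimes
ratings["when"] = pd.to_datetime(ratings["timestamp"], unit="s", utc=True)
print(ratings.head())
```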

Tags Data File Structure (tags.csv)

All tags are contained in the file tags.csv. Each line of this file after the header row represents one tag applied to one movie by one user, and has the following format:

userId,movieId,tag,timestamp
The lines within this file are ordered first by userId, then, within user, by movieId.

Tags are user-generated metadata about movies. Each tag is typically a single word or short phrase. The meaning, value, and purpose of a particular tag is determined by each user.

Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.

Movies Data File Structure (movies.csv)

Movie information is contained in the file movies.csv. Each line of this file after the header row represents one movie, and has the following format:

movieId,title,genres
Movie titles are entered manually or imported from https://www.themoviedb.org/, and include the year of release in parentheses. Errors and inconsistencies may exist in these titles.

Genres are a pipe-separated list (a parsing sketch follows the list below), and are selected from the following:

Action
Adventure
Animation
Children's
Comedy
Crime
Documentary
Drama
Fantasy
Film-Noir
Horror
Musical
Mystery
Romance
Sci-Fi
Thriller
War
Western
(no genres listed)
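As referenced above, a minimal sketch (pandas assumed) that reads movies.csv as UTF-8 and splits the pipe-separated genres into Python lists:

```python
import pandas as pd

movies = pd.read_csv("movies.csv", encoding="utf-8")  # columns: movieId,title,genres
movies["genres"] = movies["genres"].str.split("|")    # "Adventure|Comedy" -> list
print(movies.loc[movies["movieId"] == 1, ["title", "genres"]])
```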
Links Data File Structure (links.csv)

Identifiers that can be used to link to other sources of movie data are contained in the file links.csv. Each line of this file after the header row represents one movie, and has the following format:

movieId,imdbId,tmdbId
movieId is an identifier for movies used by https://movielens.org. E.g., the movie Toy Story has the link https://movielens.org/movies/1.

imdbId is an identifier for movies used by http://www.imdb.com. E.g., the movie Toy Story has the link http://www.imdb.com/title/tt0114709/.

tmdbId is an identifier for movies used by https://www.themoviedb.org. E.g., the movie Toy Story has the link https://www.themoviedb.org/movie/862.

Use of the resources listed above is subject to the terms of each provider.
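A sketch reconstructing the three URLs above from one row of links.csv; reading imdbId as a string is done here on the assumption that the column carries the leading zeros IMDb URLs require:

```python
import pandas as pd

links = pd.read_csv("links.csv", dtype={"imdbId": str})  # columns: movieId,imdbId,tmdbId
row = links.loc[links["movieId"] == 1].iloc[0]           # Toy Story
print(f"https://movielens.org/movies/{row['movieId']}")
print(f"http://www.imdb.com/title/tt{row['imdbId']}/")
print(f"https://www.themoviedb.org/movie/{int(row['tmdbId'])}")
```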

Tag Genome (genome-scores.csv and genome-tags.csv)

This data set includes a current copy of the Tag Genome.

The tag genome is a data structure that contains tag relevance scores for movies. The structure is a dense matrix: each movie in the genome has a value for every tag in the genome.

The tag genome encodes how strongly movies exhibit particular properties represented by tags (atmospheric, thought-provoking, realistic, etc.). It was computed using a machine learning algorithm on user-contributed content including tags, ratings, and textual reviews.

The genome is split into two files. The file genome-scores.csv contains movie-tag relevance data in the following format:

movieId,tagId,relevance
The second file, genome-tags.csv, provides the tag descriptions for the tag IDs in the genome file, in the following format:

tagId,tag
The tagId values are generated when the data set is exported, so they may vary from version to version of the MovieLens data sets.
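Because the tagId values can change between exports, it is safest to join the two files at load time rather than hard-code ids. A sketch (pandas assumed):

```python
import pandas as pd

scores = pd.read_csv("genome-scores.csv")   # columns: movieId,tagId,relevance
tags = pd.read_csv("genome-tags.csv")       # columns: tagId,tag
genome = scores.merge(tags, on="tagId")     # attach tag names to relevance scores
top10 = genome[genome["movieId"] == 1].nlargest(10, "relevance")
print(top10[["tag", "relevance"]])
```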

#### Cross-Validation

Prior versions of the MovieLens dataset included either pre-computed cross-folds or scripts to perform this computation. We no longer bundle either of these features with the dataset, since most modern toolkits provide this as a built-in feature. If you wish to learn about standard approaches to cross-fold computation in the context of recommender systems evaluation, see LensKit for tools, documentation, and open-source code examples.},
superseded= {},
terms= {### Usage License

Neither the University of Minnesota nor any of the researchers involved can guarantee the correctness of the data, its suitability for any particular purpose, or the validity of results based on the use of the data set. The data set may be used for any research purposes under the following conditions:

The user may not state or imply any endorsement from the University of Minnesota or the GroupLens Research Group.
The user must acknowledge the use of the data set in publications resulting from the use of the data set (see below for citation information).
The user may not redistribute the data without separate permission.
The user may not use this information for any commercial or revenue-bearing purposes without first obtaining permission from a faculty member of the GroupLens Research Project at the University of Minnesota.
The executable software scripts are provided "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of them is with you. Should the program prove defective, you assume the cost of all necessary servicing, repair or correction.
In no event shall the University of Minnesota, its affiliates or employees be liable to you for any damages arising out of the use or inability to use these programs (including but not limited to loss of data or data being rendered inaccurate).

If you have any further questions or comments, please email grouplens-info@umn.edu

### Citation

To acknowledge use of the dataset in publications, please cite the following paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872}
}

</description>
<link>https://academictorrents.com/download/296054417b4d8eeeb4c7b1c842570bf792ee4d14</link>
</item>
<item>
<title>Udacity Self Driving Car Dataset 3-1: El Camino (Dataset)</title>
<description>@article{,
title= {Udacity Self Driving Car Dataset 3-1: El Camino},
keywords= {},
journal= {},
author= {},
year= {},
url= {},
license= {},
abstract= {Dataset of two drives from the Udacity office toward San Francisco, up (and down) El Camino Real, the route of the final drive and where the test sets of Challenges 2 &amp; 3 will take place. These are sunny afternoon and evening drives; an attempt was made to stay in the same lane, but obstacles and construction sometimes required lane changes. While this is an official dataset for Challenge 3, it has all the information required for use in Challenge 2. Note that only the center camera feed will be available in the test set.

Also, this dataset includes Velodyne VLP-16 LIDAR packets. This is so that you may see the format of the LIDAR data we will be publishing, but it is not useful (or allowed) in Challenges 2 &amp; 3.

# To utilize compressed image topics
You need to install a dependency:

$ sudo apt-get install ros-indigo-image-transport*

# To playback data
Copy the udacity_launch package from our GitHub project (link below) to your catkin workspace, then compile and source it so that it is reachable.

Location of launch files: https://github.com/udacity/self-driving-car/tree/master/datasets/udacity_launch

$ cd udacity-dataset-2-1
$ rosbag play --clock *.bag
$ roslaunch udacity_launch bag_play.launch

# For visualization

$ roslaunch udacity_launch rviz.launch

# Dataset Info
MD5: 13f107727bed0ee5731647b4e114a545

file: udacity-dataset_2016-10-20-13-46-48_0.bag
duration: 1hr 25:26s (5126s)
start: Oct 20 2016 13:46:48.34 (1476996408.34)
end: Oct 20 2016 15:12:15.15 (1477001535.15)

file: udacity-dataset_2016-10-20-15-13-30_0.bag
duration: 1hr 58:44s (7124s)
start: Oct 20 2016 15:13:30.91 (1477001610.91)
end: Oct 20 2016 17:12:15.64 (1477008735.64)},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/c9dae89d2e3897e6aa98c0c8196348c444998a2a</link>
</item>
<item>
<title>vgg19_normalized.pkl (Dataset)</title>
<description>@article{,
title= {vgg19_normalized.pkl},
keywords= {},
journal= {},
author= {},
year= {},
url= {http://www.robots.ox.ac.uk/~vgg/research/very_deep/},
license= {},
abstract= {This is a Python pickle of the parameters for the VGG 19-layer model, as implemented in Lasagne.
},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/854efbd8e2c085e8e0e5fb2d254ed0e21da6008e</link>
</item>
<item>
<title>MNIST Database (mnist.pkl.gz) (Dataset)</title>
<description>@article{,
title= {MNIST Database (mnist.pkl.gz)},
keywords= {mnist.pkl.gz},
journal= {},
author= {Christopher J.C. Burges and Yann LeCun and Corinna Cortes },
year= {},
url= {},
license= {},
abstract= {The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.

The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications.
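To make the centering step concrete, here is a sketch of center-of-mass centering for a single 28x28 image (numpy and scipy are assumptions; the distributed MNIST images are already centered this way):

```python
import numpy as np
from scipy import ndimage

def center_by_mass(img):
    """Translate a 28x28 image so its pixel center of mass sits at (13.5, 13.5)."""
    cy, cx = ndimage.center_of_mass(img)
    return ndimage.shift(img, (13.5 - cy, 13.5 - cx))

digit = np.zeros((28, 28))
digit[2:16, 3:13] = 1.0                     # off-center blob standing in for a digit
print(ndimage.center_of_mass(center_by_mass(digit)))   # approximately (13.5, 13.5)
```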

The MNIST database was constructed from NIST's Special Database 3 and Special Database 1 which contain binary images of handwritten digits. NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. The reason for this can be found in the fact that SD-3 was collected among Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test among the complete set of samples. Therefore it was necessary to build a new database by mixing NIST's datasets.

The MNIST training set is composed of 30,000 patterns from SD-3 and 30,000 patterns from SD-1. Our test set was composed of 5,000 patterns from SD-3 and 5,000 patterns from SD-1. The 60,000 pattern training set contained examples from approximately 250 writers. We made sure that the sets of writers of the training set and test set were disjoint.

SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 are available and we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set. The remaining 250 writers were placed in our test set. Thus we had two sets with nearly 30,000 examples each. The new training set was completed with enough examples from SD-3, starting at pattern # 0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern # 35,000 to make a full set with 60,000 test patterns. Only a subset of 10,000 test images (5,000 from SD-1 and 5,000 from SD-3) is available on this site. The full 60,000 sample training set is available.

Many methods have been tested with this training set and test set. Here are a few examples. Details about the methods are given in an upcoming paper. Some of those experiments used a version of the database where the input images were deskewed (by computing the principal axis of the shape that is closest to the vertical, and shifting the lines so as to make it vertical). In some other experiments, the training set was augmented with artificially distorted versions of the original training samples. The distortions are random combinations of shifts, scaling, skewing, and compression.


},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/323a0048d87ca79b68f12a6350a57776b6a3b7fb</link>
</item>
<item>
<title>Udacity Dataset 2-3 Compressed (Dataset)</title>
<description>@article{,
title= {Udacity Dataset 2-3 Compressed},
keywords= {},
journal= {},
author= {Udacity, Auro Robotics},
year= {},
url= {},
license= {MIT},
abstract= {3 hours of daytime driving in highway and city locations. Includes three cameras, CAN bus, diagnostic, and other data.

We’re Building an Open Source Self-Driving Car, and we want your help!

At Udacity, we believe in democratizing education. How can we provide opportunity to everyone on the planet? We also believe in teaching really amazing and useful subject matter. When we decided to build the Self-Driving Car Nanodegree program, to teach the world to build autonomous vehicles, we instantly knew we had to tackle our own self-driving car too.

Together with Google Self-Driving Car founder and Udacity President Sebastian Thrun, we formed our core Self-Driving Car Team. One of the first decisions we made? Open source code, written by hundreds of students from across the globe!

https://github.com/udacity/self-driving-car

To playback data
=================
copy the udacity_launch package to your catkin workspace,
compile and source so that it is reachable.

cd udacity-dataset-2-1
rosbag play --clock *.bag
roslaunch udacity_launch bag_play.launch

# For visualization
roslaunch udacity_launch rviz.launch},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/1d7fa5116a809b1537bf521fd19897de5d69b7a3</link>
</item>
<item>
<title>Udacity Self-Driving Car Driving Data 9/29/2016 (dataset.bag.tar.gz) (Dataset)</title>
<description>@article{,
title= {Udacity Self-Driving Car Driving Data 9/29/2016 (dataset.bag.tar.gz)},
keywords= {},
journal= {},
author= {Udacity},
year= {},
url= {https://github.com/udacity/self-driving-car},
license= {},
abstract= {| Date      | Lighting Conditions | Duration | Compressed Size | Uncompressed | MD5                              |
|-----------|---------------------|----------|-----------------|--------------|-----------------|
| 9/29/2016 | Sunny               | 12:40    | 25G             | 40G          | 33a10f7835068eeb29b2a3274c216e7d |},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/6011d0e932970efc999809e9cafab8e791c93bb8</link>
</item>
<item>
<title>Udacity Self-Driving Car Dataset 2-2 (Dataset)</title>
<description>@article{,
title= {Udacity Self-Driving Car Dataset 2-2},
keywords= {},
journal= {},
author= {Udacity, Auro Robotics},
year= {},
url= {},
license= {MIT},
abstract= {We’re Building an Open Source Self-Driving Car, and we want your help!

At Udacity, we believe in democratizing education. How can we provide opportunity to everyone on the planet? We also believe in teaching really amazing and useful subject matter. When we decided to build the Self-Driving Car Nanodegree program, to teach the world to build autonomous vehicles, we instantly knew we had to tackle our own self-driving car too.

Together with Google Self-Driving Car founder and Udacity President Sebastian Thrun, we formed our core Self-Driving Car Team. One of the first decisions we made? Open source code, written by hundreds of students from across the globe!

https://github.com/udacity/self-driving-car},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/bcde779f81adbaae45ef69f9dd07f3e76eab3b27</link>
</item>
<item>
<title>Udacity Self-Driving Car Dataset 2-1 (Dataset)</title>
<description>@article{,
title= {Udacity Self-Driving Car Dataset 2-1},
keywords= {},
journal= {},
author= {Udacity, Auro Robotics},
year= {},
url= {},
license= {MIT},
abstract= {We’re Building an Open Source Self-Driving Car, and we want your help!

At Udacity, we believe in democratizing education. How can we provide opportunity to everyone on the planet? We also believe in teaching really amazing and useful subject matter. When we decided to build the Self-Driving Car Nanodegree program, to teach the world to build autonomous vehicles, we instantly knew we had to tackle our own self-driving car too.

Together with Google Self-Driving Car founder and Udacity President Sebastian Thrun, we formed our core Self-Driving Car Team. One of the first decisions we made? Open source code, written by hundreds of students from across the globe!

https://github.com/udacity/self-driving-car},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/f2666220bb74417dfc43815b710a1565cd1a6b76</link>
</item>
<item>
<title>A collection of IRONMAN, IRONMAN 70.3 and Ultra-triathlon race results (Dataset)</title>
<description>@article{,
title= {A collection of IRONMAN, IRONMAN 70.3 and Ultra-triathlon race results},
keywords= {IRONMAN, Ultra-triathlon, Race results, Sport},
journal= {},
author= {Iztok F., Dušan F.},
year= {},
url= {},
license= {},
abstract= {This technical report presents a collection of IRONMAN,
IRONMAN 70.3 and Ultra-triathlon race results that was scraped from
the web. The collection is intended for data mining purposes.},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/2269d7d1c77375aea732eea0905e370d4741575f</link>
</item>
<item>
<title>Modified PubMed Dataset used by WSU-IR team at TREC 2015 Clinical Decision Support Track (Dataset)</title>
<description>@inproceedings{balaneshin2015wsu,
title= {Modified PubMed Dataset used by WSU-IR team at TREC 2015 Clinical Decision Support Track},
author= {Balaneshin-kordan, Saeid and Kotov, Alexander and Xisto, Railan},
abstract= {The corresponding paper to this dataset describes participation of WSU-IR group in TREC 2015 Clinical Decision Support (CDS) track. We present a Markov Random Fields-based retrieval model and an optimization method for jointly weighting statistical and semantic unigram, bigram and multi-phrase concepts from the query and PRF documents as well as three specific instantiations of this model that we used to obtain the runs submitted for each task in this track. These instantiations consider different types of concepts and use different parts of topics as queries.},
keywords= {Markov Random Fields, Optimization, Pseudo Relevance Feedback, UMLS, Clincal Decision Support, Semantic Analysis},
terms= {}
}

</description>
<link>https://academictorrents.com/download/371a9244d2e9344a196a449f898e0a4385b6b43a</link>
</item>
<item>
<title>UCSD Pedestrian Database (Dataset)</title>
<description>@article{,
title= {UCSD Pedestrian Database},
journal= {},
author= {Statistical Visual Computing Lab},
year= {2008},
url= {http://www.svcl.ucsd.edu/projects/peoplecnt/db/readme.pdf},
abstract= {This is the UCSD pedestrian database used in “Modeling, Clustering, and Segmenting Video with Mixtures of Dynamic Textures”

## Database Format

The database contains video of pedestrians on UCSD walkways, taken from a stationary camera.
All videos are 8-bit grayscale, with dimensions 238 × 158 at 10 fps. The database is split into
scenes, taken from different viewpoints (currently, only one scene is available...more are coming).
Each scene is in its own directory vidX where X is a letter (e.g. vidf), and is split into video clips
of length 200 named vidXY_33_ZZZ.y, where Y and ZZZ are numbers. Finally, each video clip is
saved as a set of .png files. Examples from each scene are presented in Figure 1. If you use this
database, please reference the publication above.


![](https://i.imgur.com/VTk15Rx.png)},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/fed43599b7e8e0a0fbe1e22062cdb54d36cf951d</link>
</item>
<item>
<title>Gland Segmentation in Histology Images Challenge (GlaS) Dataset (Dataset)</title>
<description>@article{,
title= {Gland Segmentation in Histology Images Challenge (GlaS) Dataset},
keywords= {},
journal= {},
author= {Korsuk Sirinukunwattana},
year= {},
url= {http://www2.warwick.ac.uk/fac/sci/dcs/research/combi/research/bic/glascontest/},
license= {},
abstract= {![](http://www2.warwick.ac.uk/fac/sci/dcs/research/combi/research/bic/glascontest/glas3resize2.png)


"We aim to bring together researchers who are interested in the gland segmentation problem, to validate the performance of their existing or newly invented algorithms on the same standard dataset. In this challenge, we will provide the participants with images of Haematoxylin and Eosin (H&amp;E) stained slides, consisting of a wide range of histologic grades."


![](https://i.imgur.com/GzSJCu4.png)


## Introduction

Glands are important histological structures which are present in most organ systems as the main mechanism for secreting proteins and carbohydrates. It has been shown that malignant tumours arising from glandular epithelium, also known as adenocarcinomas, are the most prevalent form of cancer. The morphology of glands has been used routinely by pathologists to assess the degree of malignancy of several adenocarcinomas, including prostate, breast, lung, and colon.

Accurate segmentation of glands is often a crucial step to obtain reliable morphological statistics. Nonetheless, the task by nature is very challenging due to the great variation of glandular morphology in different histologic grades. Up until now, the majority of studies focus on gland segmentation in healthy or benign samples, but rarely on intermediate or high grade cancer, and quite often, they are optimised to specific datasets.

In this challenge, participants are encouraged to run their gland segmentation algorithms on images of Hematoxylin and Eosin (H&amp;E) stained slides, consisting of a variety of histologic grades. The dataset is provided together with ground truth annotations by expert pathologists. The participants are asked to develop and optimise their algorithms on the provided training dataset, and validate their algorithm on the test dataset.

## Data Description

The challenge will be conducted on a dataset, acquired by a team of pathologists at the University Hospitals Coventry and Warwickshire, UK. Details of the dataset are as follows.

| Dataset          | Warwick-QU              |
|------------------|-------------------------|
| Cancer Type      | Colorectal Cancer       |
| Resolution       | 20X (0.62005 µm/pixel)  |
| Scanner          | Zeiss MIRAX MIDI        |
| Number of Images | 165                     |
| Format           | bmp                     |


The composition of the dataset is as follows.

| Split    | Warwick-QU                 |
|----------|----------------------------|
| Training | benign : 37 malignant : 48 |
| Test     | benign : 37 malignant : 43 |

The ground truth for each image in the training dataset is stored in a BMP file, with a distinct label for each ground-truth object.

## Challenge Tasks

After registration, each team will receive a username and password for downloading the training datasets. Each team is asked to submit a short paper, which includes a description of their segmentation algorithm and some preliminary results on the training dataset. See the submission section for more details. Teams that have submitted a short paper will be invited to present their work at the GlaS challenge at MICCAI 2015. The test dataset will be made available upon the acceptance of your invitation. The organisers will evaluate the performance of a segmentation algorithm based on the test datasets and announce the final competition result at the GlaS Challenge event.
},
superseded= {},
terms= {The dataset used in this competition is provided for research purposes only. Commercial uses are not allowed.
If you intend to publish research work that uses this dataset, you must cite our review paper to be published after the competition

K. Sirinukunwattana, J. P. W. Pluim, H. Chen, X Qi, P. Heng, Y. Guo, L. Wang, B. J. Matuszewski, E. Bruni, U. Sanchez, A. Böhm, O. Ronneberger, B. Ben Cheikh, D. Racoceanu, P. Kainz, M. Pfeiffer, M. Urschler, D. R. J. Snead, N. M. Rajpoot, "Gland Segmentation in Colon Histology Images: The GlaS Challenge Contest" http://arxiv.org/abs/1603.00275 [Preprint]

AND the following paper, wherein the same dataset was first used:
K. Sirinukunwattana, D.R.J. Snead, N.M. Rajpoot, "A Stochastic Polygons Model for Glandular Structures in Colon Histology Images," in IEEE Transactions on Medical Imaging, 2015
doi: 10.1109/TMI.2015.2433900}
}

</description>
<link>https://academictorrents.com/download/208814dd113c2b0a242e74e832ccac28fcff74e5</link>
</item>
<item>
<title>The Cars Overhead With Context (COWC) (Dataset)</title>
<description>@article{,
title= {The Cars Overhead With Context (COWC)},
journal= {},
author= {},
year= {},
url= {http://gdo-datasci.ucllnl.org/cowc/},
abstract= {The Cars Overhead With Context (COWC) data set is a large set of annotated cars from overhead. It is useful for training a device such as a deep neural network to learn to detect and/or count cars. More information can be obtained by reading our paper.

The dataset has the following attributes:

(1) Data from overhead at 15 cm per pixel resolution at ground (all data is EO). 

(2) Data from six distinct locations: Toronto Canada, Selwyn New Zealand, Potsdam and Vaihingen Germany, Columbus and Utah United States. 

(3) 32,716 unique annotated cars. 58,247 unique negative examples.

(4) Intentional selection of hard negative examples.

(5) Established baseline for detection and counting tasks.

(6) Extra testing scenes for use after validation.


Data can be downloaded from our FTP server. The data includes wide area imagery with annotations as well as precompiled image sets for training/validation of classification and counting. Examples of the precompiled image sets are seen on the right.

The dataset and research to create this data was done by members of the Computer Vision group within the Computation Engineering Division at Lawrence Livermore National Laboratory under grant from NA-22 in the Global Security Directorate. No Llamas were harmed in the creation of this set.

![](https://i.imgur.com/0dsvDo0.jpg)},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/210dfc51f11dcfced602ad226962b7590e08c50a</link>
</item>
<item>
<title>Avantes Dual Spectrograph.zip (Dataset)</title>
<description>@article{,
title= {Avantes Dual Spectrograph.zip},
keywords= {VM},
journal= {},
author= {Austin Kootz},
year= {},
url= {},
license= {},
abstract= {For debugging dual spectrograph.},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/ff051a9469c9ceda93ea914a21639a639cbae793</link>
</item>
<item>
<title>True Marble Global Image Dataset GeoTIFF (Dataset)</title>
<description>@article{,
title= {True Marble Global Image Dataset GeoTIFF},
keywords= {},
journal= {},
author= {Unearthed Outdoors},
year= {},
url= {http://www.unearthedoutdoors.net/global_data/true_marble/download},
license= {Creative Commons Attribution 3.0 United States License.},
abstract= {Download and use the 250m True Marble global dataset for free! This is a low resolution version of our full 15m product, but it is quite useful. Download to use on your web page or preview a purchase. We only ask that you display our copyright and reference this page when using it.

Two types of files are available for download: GeoTIFF and PNG. The GeoTIFF files are better suited for GIS programs, but are generally a larger file size. The PNG files are for general image processing programs, but are not georeferenced. Most of these files are much too large for your web browser to display, so be sure to save the file directly to disk.

![](http://www.unearthedoutdoors.net/imgs/global_data/thumbs/TrueMarble.500m.A1.jpg)
![](http://www.unearthedoutdoors.net/imgs/global_data/thumbs/TrueMarble.500m.B1.jpg)
![](http://www.unearthedoutdoors.net/imgs/global_data/thumbs/TrueMarble.500m.C1.jpg)
![](http://www.unearthedoutdoors.net/imgs/global_data/thumbs/TrueMarble.500m.D1.jpg)

![](http://www.unearthedoutdoors.net/imgs/global_data/thumbs/TrueMarble.500m.A2.jpg)
![](http://www.unearthedoutdoors.net/imgs/global_data/thumbs/TrueMarble.500m.B2.jpg)
![](http://www.unearthedoutdoors.net/imgs/global_data/thumbs/TrueMarble.500m.C2.jpg)
![](http://www.unearthedoutdoors.net/imgs/global_data/thumbs/TrueMarble.500m.D2.jpg)},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/b9b284d9c0074846fee28e78aac4440fd7c0f51c</link>
</item>
<item>
<title>Sentiment Labelled Sentences Data Set  (Dataset)</title>
<description>@article{,
title= {Sentiment Labelled Sentences Data Set },
keywords= {},
journal= {},
author= {},
year= {},
url= {},
license= {},
abstract= {This dataset was created for the paper 'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015.
Please cite the paper if you want to use it :) It contains sentences labelled with positive or negative sentiment.

### Format: 
sentence score 

### Details: 
Score is either 1 (for positive) or 0 (for negative)
The sentences come from three different websites/fields: 

imdb.com 
amazon.com 
yelp.com 

For each website, there exist 500 positive and 500 negative sentences. Those were selected randomly from larger datasets of reviews.
We attempted to select sentences that have a clearly positive or negative connotation; the goal was for no neutral sentences to be selected.
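A minimal loading sketch, assuming each line is "sentence<TAB>score" in the distributed *_labelled.txt files (the file name shown is an assumption):

```python
sentences, scores = [], []
with open("imdb_labelled.txt", encoding="utf-8") as f:
    for line in f:
        sentence, score = line.rsplit("\t", 1)  # split on the last tab
        sentences.append(sentence.strip())
        scores.append(int(score))               # 1 = positive, 0 = negative

print(len(sentences), "sentences;", sum(scores), "positive")
```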



### Attribute Information:
The attributes are text sentences, extracted from reviews of products, movies, and restaurants


### Relevant Papers:
'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015
},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/07e05fc1229555e124df72160a01b2540d04cebf</link>
</item>
<item>
<title>Enron Email Dataset (Dataset)</title>
<description>@article{,
title= {Enron Email Dataset},
keywords= {email, enron},
journal= {},
author= {Enron},
year= {},
url= {https://www.cs.cmu.edu/~enron/},
license= {},
abstract= {To quote the data source:
"This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation.
The email dataset was later purchased by Leslie Kaelbling at MIT, and turned out to have a number of integrity problems. A number of folks at SRI, notably Melinda Gervasio, worked hard to correct these problems, and it is thanks to them (not me) that the dataset is available. The dataset here does not include attachments, and some messages have been deleted "as part of a redaction effort due to requests from affected employees". Invalid email addresses were converted to something of the form user@enron.com whenever possible (i.e., recipient is specified in some parse-able format like "Doe, John" or "Mary K. Smith") and to no_address@enron.com when no recipient was specified.

I get a number of questions about this corpus each week, which I am unable to answer, mostly because they deal with preparation issues and such that I just don't know about. If you ask me a question and I don't answer, please don't feel slighted.

I am distributing this dataset as a resource for researchers who are interested in improving current email tools, or understanding how email is currently used. This data is valuable; to my knowledge it is the only substantial collection of "real" email that is public. The reason other datasets are not public is because of privacy concerns. In using this dataset, please be sensitive to the privacy of the people involved (and remember that many of these people were certainly not involved in any of the actions which precipitated the investigation.)"},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/4697a6e1e7841602651b087d84f904d43590d4ff</link>
</item>
<item>
<title>ISBI Challenge: Segmentation of neuronal structures in EM stacks (Dataset)</title>
<description>@article{,
title= {ISBI Challenge: Segmentation of neuronal structures in EM stacks},
keywords= {},
journal= {},
author= {Albert Cardona and Stephan Saalfeld and Stephan Preibisch and Benjamin Schmid and Anchi Cheng and Jim Pulokas and Pavel Tomancak and Volker Hartenstein},
year= {},
url= {},
license= {},
abstract= {|File Name|Description|
|------|------|
|train-volume.tif (7.5 MB)|Original training image, 8-bit grayscale, 512x512x30 pixels|
|train-labels.tif (7.5 MB)|Training image labels (0 - membranes, 255 - non-membranes), 8-bit grayscale, 512x512x30 pixels|
|test-volume.tif (7.5 MB)|Test image, 8-bit grayscale, 512x512x30 pixels|

The training and test datasets are two stacks of 30 sections from a serial section Transmission Electron Microscopy (ssTEM) data set of the Drosophila first instar larva ventral nerve cord (VNC). The microcube measures 2 x 2 x 1.5 microns approx., with a resolution of 4x4x50 nm/pixel.},
superseded= {},
terms= {License:
You are free to use this data set for the purpose of generating or testing non-commercial image segmentation software. If any scientific publications derive from the usage of this data set, you must cite TrakEM2 and the following publication:

Cardona A, Saalfeld S, Preibisch S, Schmid B, Cheng A, Pulokas J, Tomancak P, Hartenstein V. 2010. An Integrated Micro- and Macroarchitectural Analysis of the Drosophila Brain by Computer-Assisted Serial Section Electron Microscopy. PLoS Biol 8(10): e1000502. doi:10.1371/journal.pbio.1000502.}
}

</description>
<link>https://academictorrents.com/download/42714f859770f1a9d8b27985f9f16ea17e8ba2f6</link>
</item>
<item>
<title>QuantQuote Free Historical Stock Data 2013 (Dataset)</title>
<description>@article{,
title= {QuantQuote Free Historical Stock Data 2013},
keywords= {},
journal= {},
author= {QuantQuote},
year= {2013},
url= {https://quantquote.com/historical-stock-data},
license= {},
abstract= {The files are formatted as follows:

Date, Time, Open, High, Low, Close, Volume
Date – This provides the date as an integer where 20100527 would represent May 27th, 2010.
Time – This gives the time as an integer where 1426 would represent 2:26PM EST.
Open – The open price.
High – The high price.
Low – The low price.
Close – The close price.
Volume – The trading volume during the interval. Note that it is extremely difficult to get accurate volume information. The volume is adjusted for splits so that the total value of shares traded remains constant even if a split occurs.
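A sketch decoding the integer Date and Time fields described above (the file name and the absence of a header row are assumptions):

```python
import csv
from datetime import datetime

with open("table_aapl.csv", newline="") as f:
    for date_i, time_i, o, h, l, c, vol in csv.reader(f):
        # 20100527 -> 2010-05-27; 1426 -> 14:26 EST (zero-pad times like 930)
        ts = datetime.strptime(f"{date_i} {int(time_i):04d}", "%Y%m%d %H%M")
        print(ts, float(o), float(h), float(l), float(c), float(vol))
        break                                   # show just the first bar
```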

https://quantquote.com/docs/QuantQuote_Minute.pdf},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/49daf05ef35c487331013c22450988bbf7e511b0</link>
</item>
<item>
<title>Yelp Restaurant Photo Classification Data (Dataset)</title>
<description>@article{,
title= {Yelp Restaurant Photo Classification Data},
keywords= {yelp},
journal= {},
author= {Yelp},
year= {},
url= {https://www.kaggle.com/c/yelp-restaurant-photo-classification},
license= {},
abstract= {At Yelp, there are lots of photos and lots of users uploading photos. These photos provide rich local business information across categories. Teaching a computer to understand the context of these photos is not an easy task. Yelp engineers work on deep learning image classification projects in-house.

In this competition, you are given photos that belong to a business and asked to predict the business attributes. There are 9 different attributes in this problem:

0: good_for_lunch
1: good_for_dinner
2: takes_reservations
3: outdoor_seating
4: restaurant_is_expensive
5: has_alcohol
6: has_table_service
7: ambience_is_classy
8: good_for_kids

These labels are annotated by the Yelp community. Your task is to predict these labels purely from the business photos uploaded by users. 

Since Yelp is a community-driven website, there are duplicated images in the dataset. They are mainly due to:

users accidentally uploading the same photo to the same business more than once
chain businesses uploading the same photo to different branches

Yelp is including these as part of the competition, since these are challenges Yelp researchers face every day.
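A sketch joining the two mapping files described in the file list below to get one multi-label target per photo (pandas and the space-separated encoding of the labels column are assumptions):

```python
import pandas as pd

photo_biz = pd.read_csv("train_photo_to_biz_ids.csv")      # photo_id, business_id
train = pd.read_csv("train.csv")                           # business_id, labels
train["labels"] = train["labels"].fillna("").str.split()   # e.g. "1 2 5" -> ["1","2","5"]
photos = photo_biz.merge(train, on="business_id")
print(photos.head())
```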

File descriptions

train_photos.tgz - photos of the training set
test_photos.tgz - photos of the test set
train_photo_to_biz_ids.csv - maps the photo id to business id
test_photo_to_biz_ids.csv - maps the photo id to business id
train.csv - main training dataset. Includes the business ids and their corresponding labels. },
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/19c3aa2166d7bfceaf3d76c0d36f812e0f1b87bc</link>
</item>
<item>
<title>Online News Popularity Data Set  (Dataset)</title>
<description>@article{,
title= {Online News Popularity Data Set },
keywords= {},
journal= {},
author= {Kelwin Fernandes and Pedro Vinagre and Paulo Cortez and Pedro Sernadela},
year= {},
url= {},
license= {},
abstract= {##Data Set Information:

* The articles were published by Mashable (www.mashable.com), and the rights to reproduce their content belong to them. Hence, this dataset does not share the original content, only some statistics associated with it. The original content can be publicly accessed and retrieved using the provided urls. 
* Acquisition date: January 8, 2015 
* The estimated relative performance values were estimated by the authors using a Random Forest classifier and a rolling-windows assessment method. See their article for more details on how the relative performance values were set.
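In the same spirit (though without reproducing the paper's rolling-window protocol), a hedged sketch fitting a Random Forest to the shares target; scikit-learn, the CSV file name, and the whitespace-padded column headers are assumptions:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("OnlineNewsPopularity.csv")
df.columns = df.columns.str.strip()                    # headers may carry padding
X = df.drop(columns=["url", "timedelta", "shares"])    # drop non-predictive + target
y = df["shares"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out articles:", model.score(X_te, y_te))
```

Note the authors frame the task as classifying popular vs. unpopular articles; regressing raw share counts, as above, is only one simple variant.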


##Attribute Information:

Number of Attributes: 61 (58 predictive attributes, 2 non-predictive, 1 goal field) 

0. url: URL of the article (non-predictive) 
1. timedelta: Days between the article publication and the dataset acquisition (non-predictive) 
2. n_tokens_title: Number of words in the title 
3. n_tokens_content: Number of words in the content 
4. n_unique_tokens: Rate of unique words in the content 
5. n_non_stop_words: Rate of non-stop words in the content 
6. n_non_stop_unique_tokens: Rate of unique non-stop words in the content 
7. num_hrefs: Number of links 
8. num_self_hrefs: Number of links to other articles published by Mashable 
9. num_imgs: Number of images 
10. num_videos: Number of videos 
11. average_token_length: Average length of the words in the content 
12. num_keywords: Number of keywords in the metadata 
13. data_channel_is_lifestyle: Is data channel 'Lifestyle'? 
14. data_channel_is_entertainment: Is data channel 'Entertainment'? 
15. data_channel_is_bus: Is data channel 'Business'? 
16. data_channel_is_socmed: Is data channel 'Social Media'? 
17. data_channel_is_tech: Is data channel 'Tech'? 
18. data_channel_is_world: Is data channel 'World'? 
19. kw_min_min: Worst keyword (min. shares) 
20. kw_max_min: Worst keyword (max. shares) 
21. kw_avg_min: Worst keyword (avg. shares) 
22. kw_min_max: Best keyword (min. shares) 
23. kw_max_max: Best keyword (max. shares) 
24. kw_avg_max: Best keyword (avg. shares) 
25. kw_min_avg: Avg. keyword (min. shares) 
26. kw_max_avg: Avg. keyword (max. shares) 
27. kw_avg_avg: Avg. keyword (avg. shares) 
28. self_reference_min_shares: Min. shares of referenced articles in Mashable 
29. self_reference_max_shares: Max. shares of referenced articles in Mashable 
30. self_reference_avg_sharess: Avg. shares of referenced articles in Mashable 
31. weekday_is_monday: Was the article published on a Monday? 
32. weekday_is_tuesday: Was the article published on a Tuesday? 
33. weekday_is_wednesday: Was the article published on a Wednesday? 
34. weekday_is_thursday: Was the article published on a Thursday? 
35. weekday_is_friday: Was the article published on a Friday? 
36. weekday_is_saturday: Was the article published on a Saturday? 
37. weekday_is_sunday: Was the article published on a Sunday? 
38. is_weekend: Was the article published on the weekend? 
39. LDA_00: Closeness to LDA topic 0 
40. LDA_01: Closeness to LDA topic 1 
41. LDA_02: Closeness to LDA topic 2 
42. LDA_03: Closeness to LDA topic 3 
43. LDA_04: Closeness to LDA topic 4 
44. global_subjectivity: Text subjectivity 
45. global_sentiment_polarity: Text sentiment polarity 
46. global_rate_positive_words: Rate of positive words in the content 
47. global_rate_negative_words: Rate of negative words in the content 
48. rate_positive_words: Rate of positive words among non-neutral tokens 
49. rate_negative_words: Rate of negative words among non-neutral tokens 
50. avg_positive_polarity: Avg. polarity of positive words 
51. min_positive_polarity: Min. polarity of positive words 
52. max_positive_polarity: Max. polarity of positive words 
53. avg_negative_polarity: Avg. polarity of negative words 
54. min_negative_polarity: Min. polarity of negative words 
55. max_negative_polarity: Max. polarity of negative words 
56. title_subjectivity: Title subjectivity 
57. title_sentiment_polarity: Title polarity 
58. abs_title_subjectivity: Absolute subjectivity level 
59. abs_title_sentiment_polarity: Absolute polarity level 
60. shares: Number of shares (target)


##Relevant Papers:

K. Fernandes, P. Vinagre and P. Cortez. A Proactive Intelligent Decision Support System for Predicting the Popularity of Online News. Proceedings of the 17th EPIA 2015 - Portuguese Conference on Artificial Intelligence, September, Coimbra, Portugal.



##Citation Request:

K. Fernandes, P. Vinagre and P. Cortez. A Proactive Intelligent Decision Support System for Predicting the Popularity of Online News. Proceedings of the 17th EPIA 2015 - Portuguese Conference on Artificial Intelligence, September, Coimbra, Portugal.

##Source:

Kelwin Fernandes (kafc '@' inesctec.pt, kelwinfc '@' gmail.com) - INESC TEC, Porto, Portugal/Universidade do Porto, Portugal. 
Pedro Vinagre (pedro.vinagre.sousa '@' gmail.com) - ALGORITMI Research Centre, Universidade do Minho, Portugal 
Paulo Cortez - ALGORITMI Research Centre, Universidade do Minho, Portugal 
Pedro Sernadela - Universidade de Aveiro},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/95d3b03397a0bafd74a662fe13ba3550c13b7ce1</link>
</item>
<item>
<title>Educational Process Mining (EPM): A Learning Analytics Data Set (Dataset)</title>
<description>@article{,
title= {Educational Process Mining (EPM): A Learning Analytics Data Set },
keywords= {},
journal= {},
author= {Mehrnoosh Vahdat and Luca Oneto and Davide Anguita and Mathias Funk and Matthias Rauterberg},
year= {},
url= {},
license= {},
abstract= {## Data Set Information:

The experiments have been carried out with a group of 115 students of first-year, undergraduate Engineering major of the University of Genoa. 

We carried out this study over a simulation environment named Deeds (Digital Electronics Education and Design Suite), which is used for e-learning in digital electronics. The environment provides learning materials through specialized browsers for the students, and asks them to solve various problems with different levels of difficulty. For more information about the Deeds simulator used for this course, see the Deeds website. 

and to know more about the exercises contents of each session see 'exercises_info.txt'. 

Our data set contains the students' time series of activities during six laboratory sessions of the course of digital electronics. There are 6 folders containing the students' data per session. Each 'Session' folder contains up to 99 CSV files, each dedicated to a specific student's log during that session. The number of files in each folder changes due to the number of students present in each session. Each file contains 13 features. See 'features_info.txt' for more details. 

For the details of activities performed by the students during the course, see 'activities_info.txt' 


The data set includes the following files: 
========================================= 

- 'README.txt' 

- 'features_info.txt': contains information about the variables used on the feature vector. 

- 'features.txt': List of all features. 

- 'activities_info.txt': contains information about the variable 'activity'. 

- 'activities.txt': list of all activities. 

- 'exercises_info.txt': contains information about the variable 'exercise'. 

- 'grades_info.txt': contains information about the grade data. 


Data: 
====== 

- 'Processes': contains the data files from Session 1 to 6. 

- 'logs.txt': shows information about the log data per student Id. It shows whether a student has a log in each session (0: has no log, 1: has log). 

- 'final_grades.xlsx': contains the results of the final exam in two sheets. 

- 'intermediate_grades.xlsx': contains the grades for the students' assignments per session. 

- 'final_exam.pdf': shows the content of the final exam (original in Italian). 

- 'final_exam_ENG.pdf': shows the content of the final exam translated in English. 

## Notes: 

For more information about this data set please look at: 

www.la.smartlab.ws 
la '@' smartlab.ws 


## Attribute Information:

The features selected for this data set come from pre-processing of data collected through a logging program. 

Due to ethical reasons and to ensure the anonymity of our users, we cannot share the original log files, instead, we share the data transformed and cleaned in an appropriate format. 

The original logs record the client system's activity approximately once per second, while the features are calculated so as to be allocated to a particular activity. 

The features are selected and presented in a suitable format for Process Mining. In this sense, the data is presented per session, per student, and per exercise. Each CSV file belongs to a specific session and a specific student (named by the student Id). Each file contains several exercises of that session presented in 'exercise' feature. Each 'exercise' contains activities, which start-time, end-time, and other features are allocated to that. 

For further information about each feature, see 'features_info.txt'.
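A sketch of walking the per-session layout described above, one folder per session and one CSV per student id (folder and file naming are assumptions based on this description):

```python
from pathlib import Path
import pandas as pd

frames = []
for session_dir in sorted(Path("Processes").glob("Session*")):
    for student_csv in sorted(session_dir.glob("*.csv")):
        df = pd.read_csv(student_csv)          # 13 features, see 'features_info.txt'
        df["session"] = session_dir.name
        df["student_id"] = student_csv.stem
        frames.append(df)

logs = pd.concat(frames, ignore_index=True)
print(logs.shape)
```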


## Relevant Papers:

M. Vahdat, L. Oneto, A. Ghio, G. Donzellini, D. Anguita, M. Funk, M. Rauterberg: A learning analytics methodology to profile students' behavior and explore interactions with a digital electronics simulator. In: de Freitas, S., Rensing, C., Ley, T., Munoz-Merino, P.J. (eds.) EC-TEL 2014. LNCS, vol. 8719, pp. 596–597. Springer (2014). 

M. Vahdat, A. Ghio, L. Oneto, D. Anguita, M. Funk, M. Rauterberg, Advances in learning analytics and educational data mining, in: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2015). 



## Citation Request:

M. Vahdat, L. Oneto, D. Anguita, M. Funk, M. Rauterberg.: A learning analytics approach to correlate the academic achievements of students with interaction data from an educational simulator. In: G. Conole et al. (eds.): EC-TEL 2015, LNCS 9307, pp. 352-366. Springer (2015). 
DOI: 10.1007/978-3-319-24258-3_26 

## Source:

Mehrnoosh Vahdat(1,2), Luca Oneto(1), Davide Anguita(1), Mathias Funk (2), and Matthias Rauterberg (2) 

1 - Smartlab - Non-Linear Complex Systems Laboratory 
DIBRIS - Università degli Studi di Genova, Genoa (I-16145), Italy. 
2 - Department of Industrial Design, Eindhoven University of Technology, 5612AZ Eindhoven, The Netherlands 

la '@' smartlab.ws 
},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/e24e083cc337695bb84a2b68707695579c0ab4d8</link>
</item>
<item>
<title>Terrestrial Ecological Systems of the United States (Version 3.0; Updated March 2014) (Dataset)</title>
<description>@article{,
title= {Terrestrial Ecological Systems of the United States (Version 3.0; Updated March 2014)},
journal= {},
author= {NatureServe},
year= {2014},
url= {http://www.natureserve.org/conservation-tools/terrestrial-ecological-systems-united-states},
abstract= {Overview:
NatureServe ecologists lead efforts to develop internationally standardized classifications for terrestrial ecosystems and vegetation. One classification approach is terrestrial ecological systems, mid- to local-scale ecological units useful for standardized mapping and conservation assessments of habitat diversity and landscape conditions. Each ecological system type describes complexes of plant communities influenced by similar physical environments and dynamic ecological processes (like fire or flooding). The classification defines some 800 units across the United States and has provided an effective means of mapping ecological concepts at regional/national scales in greater detail than was previously possible.  

![](http://i.imgur.com/ygP3gfS.jpg)},
keywords= {Classification Concepts and Maps for Ecosystem Assessment, Planning, Management and Monitoring Data, Maps &amp; Tools, United States},
terms= {}
}

</description>
<link>https://academictorrents.com/download/f1f67ca3faef718afcc35a530eebbd72c20b0eac</link>
</item>
<item>
<title>Labeled Fishes in the Wild (Dataset)</title>
<description>@article{,
title= {Labeled Fishes in the Wild},
journal= {},
author= {Cutter, G. and Stierhoff, K. and Zeng, J.},
year= {},
url= {http://swfscdata.nmfs.noaa.gov/labeled-fishes-in-the-wild/},
abstract= {The labeled fishes in the wild image dataset is provided by NOAA Fisheries (National Marine Fisheries Service) to encourage development, testing, and performance assessment of automated image analysis algorithms for unconstrained underwater imagery.

The dataset includes images of fish, invertebrates, and the seabed that were collected using camera systems deployed on a remotely operated vehicle (ROV) for fisheries surveys. Annotation data are included in accompanying data files (.dat, .vec, and .info) that describe the locations of the marked fish targets in the images.

The manuscript (Cutter et al., 2015) demonstrates methods for automated detection of fish based on classifiers developed using the training image dataset, and evaluated using the test set. This dataset is offered for further development of detection of fish or invertebrates in complex environments; tracking of multiple animal targets in video image sequences; recognition and classification of animal species; measurement of animals in stereo image pairs; and characterization of seabed habitats.

Recommended citation: Cutter, G.; Stierhoff, K.; Zeng, J. (2015) "Automated detection of rockfish in unconstrained underwater videos using Haar cascades and a new image dataset: labeled fishes in the wild," IEEE Winter Conference on Applications of Computer Vision Workshops, pp. 57-62.

The NOAA scientists who are stewards of these data may have archives of images that can provide additional opportunities for collaboration to apply and assess algorithms. Credit for use of these datasets should be provided in publications, as described in the “how-to-cite.txt” documents included in the dataset archive or as shown above.

##Dataset
Labeled Fishes in the Wild image dataset (v. 1.1) (Download 423 MB).
Labeled fishes in the wild has three components: a training and validation positive image set (verified fish), a negative image set (non-fish), and a test image set. The training and test sets have accompanying annotation data that define the location and extent of each marked fish target object in the images. These represent bounding rectangles defined by expert analysts, and are in the format of .dat files used by OpenCV.

Training and validation positive image set: contains images of rockfish (Sebastes spp.) and other associated species near the seabed, collected using a forward-oblique-looking digital still camera deployed on a remotely operated vehicle (ROV) by the Southwest Fisheries Science Center during surveys of rocky seabed environments offshore of southern California. Still frames from these cameras represent instances during a survey where the ROV was moving slowly, and motion effects are not a factor. The training set comprises 929 image files, containing 1005 marked fish with associated annotations (their marked locations and bounding rectangles). The marks define fish of various species, sizes, and ranges to the camera, and includes portions of different background composition.

Training and validation negative image set: includes 3167 images. The 147 seabed negative images provided in the downloadable archive were extracted from the labeled fishes in the wild training and test image sets (regions containing no fish were extracted). The remaining 3020 images are available from the tutorial on OpenCV HaarTraining, and available from the data negatives directory.

Test image set: contains an image sequence collected using the ROV’s high-definition (HD; 1080i) video camera during a near-seabed survey of fish. The test imagery for detection comprises video footage from ROV surveys. The video clip (“TEST_VIDEO_ROV10.mp4”; 210 frames at 3 frames per second (fps)) used to evaluate detectors for this study represents every 10th frame of the original video sequence (2-minute duration, approximately 30 fps). All fish targets are annotated for the 210-frame, 3fps test video. Annotations of fish in the test video include a descriptor, “verified” or “apparent,” where verified indicates that a video analyst could identify the fish as such, and apparent objects were believed to be fish, but were not verifiable based on attributes visible in a single frame. These apparent fish may appear as faint blobs in the distance. These distinctions are made in the annotation data because we believe that some classifiers will detect these apparent fish, but we do not expect the classifier to do so; nor do we necessarily want the detector to do so. That is, if a classifier is detecting those apparent fish, then it is probably detecting many other non-fish targets in the images, thereby making it inefficient and impractical. A total of 2061 fish objects were marked in the annotated frames of the dataset test video. Of those, 1008 were verified fish, and 1053 were apparent fish. During the sequence the ROV is moving; the background appears to be moving and is illuminated from different directions (as the ROV moves and rotates); small particles in the water current stream past; fish are still or moving at various speeds; fish are oriented in many directions; some fish are hidden partially behind rocks or in crevices; some indistinct fish-like objects appear in the distance.

The original Labeled fishes in the wild dataset (v1.0, Dec. 2014) included only the decimated test video sequence ("Test_ROV_video_h264_decim.mp4"), which contains only the marked frames from the original video. One tenth of the frames of the full frame-rate video were marked for locations of fish targets. This version of the dataset (v1.1, Jan. 2015) also contains the full test video sequence ("Test_ROV_video_h264_full.mp4"). Both the full and decimated videos have accompanying text files with analyst marks (following OpenCV .dat file conventions). Generally, for m marks, the format is: Video-filename(frame#) #-of-marks x1 y1 w1 h1 x2 y2 w2 h2 ... xm ym wm hm. For example, in the case of two marks, the final eight values define the bounding rectangles: Test_ROV_video_h264_full.mp4(fr_14) 2 1021 362 94 63 953 289 90 61. The marks file for the decimated video ("Test_ROV_video_h264_decim_marks.dat") indicates the frame number for the decimated and full sequence, e.g. Test_ROV_video_h264_decim.mp4(fr_1)(fullfr_14) 2 1021 362 94 63 953 289 90 61. There are 2101 frames in the full video and 210 frames in the decimated video, but 206 frames were marked; i.e. a few of the examined frames did not contain fish.
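
A minimal parsing sketch for one line of a marks file, following the convention above (plain Python; the function name is illustrative):

    def parse_marks_line(line):
        fields = line.split()
        frame_ref = fields[0]          # e.g. Test_ROV_video_h264_full.mp4(fr_14)
        n_marks = int(fields[1])
        vals = [int(v) for v in fields[2:2 + 4 * n_marks]]
        # group the flat x y w h values into bounding rectangles
        rects = [tuple(vals[i:i + 4]) for i in range(0, 4 * n_marks, 4)]
        return frame_ref, rects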

Contact: george dot cutter at noaa dot gov.

![](http://i.imgur.com/U9Y7nNc.gif)},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/41bc10c77d54b49fb0a96ff5d4a0814bc2ab7da7</link>
</item>
<item>
<title>BuzzFeed News transcription of Airbnb NYC data (Dataset)</title>
<description>@article{,
title= {BuzzFeed News transcription of Airbnb NYC data},
journal= {},
author= {BuzzFeed and AirBnb},
year= {},
url= {https://www.airbnbaction.com/blog/data-on-the-airbnb-community-in-nyc},
license= {},
abstract= {"The data in this spreadsheet were transcribed from the dataset referenced in Airbnb's Dec. 1, 2015, blog post.

Because Airbnb did not allow the data to be downloaded, photographed, or copy-pasted, BuzzFeed News copied the data manually over a series of three visits with the company. Some of the worksheets have not been copied in full; "[...]" indicates that a particular column of data continues in the original but was not transcribed.

To the fullest extent possible, BuzzFeed News attempted to avoid transcription errors; some, however, may have snuck through."
},
keywords= {AirBnb},
terms= {}
}

</description>
<link>https://academictorrents.com/download/968a3ff5e4182cdecd239980ecfd257a37451003</link>
</item>
<item>
<title>NYPD 7 Major Felony Incidents (Dataset)</title>
<description>@article{,
title= {NYPD 7 Major Felony Incidents},
journal= {},
author= {NYPD},
year= {},
url= {https://data.cityofnewyork.us/Public-Safety/NYPD-7-Major-Felony-Incidents/hyij-8hr7},
license= {},
abstract= {Quarterly update of Seven Major Felonies at the incident level. For privacy reasons, incidents have been moved to the midpoint of the street segment on which they occur.},
keywords= {New York, Police, Crime},
terms= {}
}

</description>
<link>https://academictorrents.com/download/5c195d570d910402727638f4ba123d171694fbdc</link>
</item>
<item>
<title>Vincent van Gogh Paintings (Dataset)</title>
<description>@article{,
title= {Vincent van Gogh Paintings},
keywords= {paintings, artwork},
journal= {},
author= {Vincent van Gogh},
year= {},
url= {},
license= {},
abstract= {"Vincent Willem van Gogh was a Dutch post-Impressionist painter whose work had far-reaching influence on 20th-century art. His paintings include portraits, self portraits, landscapes, still lifes of cypresses, wheat fields and sunflowers."  - Wikipedia

##Sample Images:
![](http://i.imgur.com/QbWnjct.png)},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/c8b687c984d3d902310f27d56759ed69f5e1b4a7</link>
</item>
<item>
<title>A collection of sport activity files for data analysis and data mining 2016a (Dataset)</title>
<description>@article{,
title= {A collection of sport activity files for data analysis and data mining 2016a},
keywords= {sport, dataset, gpx},
journal= {Technical report 0101, University of Ljubljana and University of Maribor 2016a},
author= {Samo Rauter et al.},
year= {2016},
url= {},
license= {},
abstract= {The dataset consists of activities from seven cyclists, who uploaded them to their Strava and Garmin Connect profiles. Typically, these activities can be downloaded in the GPX format, which is essentially an XML format. The following features can be extracted from each training session: GPS location, elevation, duration, distance, heart rate and even power.
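
A minimal extraction sketch with the Python standard library (the file name is an assumption; element names follow the GPX 1.1 namespace, and heart rate or power, when present, sit in vendor extension elements):

    import xml.etree.ElementTree as ET

    NS = {"gpx": "http://www.topografix.com/GPX/1/1"}
    root = ET.parse("activity.gpx").getroot()
    for pt in root.iterfind(".//gpx:trkpt", NS):
        lat, lon = float(pt.get("lat")), float(pt.get("lon"))
        ele = pt.findtext("gpx:ele", default="", namespaces=NS)
        time = pt.findtext("gpx:time", default="", namespaces=NS)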
},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/af55533bf8229c3bff260b77a652f8b8058f6c9e</link>
</item>
<item>
<title>Structured Web Data Extraction Dataset (SWDE) (Dataset)</title>
<description>@article{,
title= {Structured Web Data Extraction Dataset (SWDE)},
keywords= {},
journal= {},
author= {Qiang Hao},
year= {2011},
url= {https://swde.codeplex.com/},
license= {},
abstract= {## Motivation
This dataset is a real-world web page collection used for research on the automatic extraction of structured data (e.g., attribute-value pairs of entities) from the Web. We hope it could serve as a useful benchmark for evaluating and comparing different methods for structured web data extraction.

## Contents of the Dataset

Currently the dataset involves:

* 8 verticals with diverse semantics;
* 80 web sites (10 per vertical);
* 124,291 web pages (200 ~ 2,000 per web site), each containing a single data record with detailed information of an entity;
* 32 attributes (3 ~ 5 per vertical) associated with carefully labeled ground-truth of corresponding values in each web page. The goal of structured data extraction is to automatically identify the values of these attributes from web pages.
The involved verticals are summarized as follows:

|Vertical  |#Sites|#Pages|#Attributes|Attributes|
|-----------|------|----------|----------------|-----------------|
|Auto|10|17,923|4|model, price, engine, fuel_economy|
|Book|10|20,000|5|title, author, isbn_13, publisher, publication_date|
|Camera|10  |5,258|3|model, price, manufacturer|
|Job|10|20,000|4|title, company, location, date_posted|
|Movie|10|20,000|4|title, director, genre, mpaa_rating|
|NBA Player|10|  4,405|4|name, team, height, weight|
|Restaurant|10|20,000|4|name, address, phone, cuisine|
|University|10|16,705|4|name, phone, website, type|


# Format of Web Pages

Each web page in the dataset is stored as one .htm file (in UTF-8 encoding) where the first tag encodes the source URL of the page.

# Format of Ground-truth Files

For each web site, the page-level ground-truth of attribute values has been labeled using handcrafted regular expressions and stored in .txt files (in UTF-8 encoding) named as: "&lt;vertical&gt;-&lt;site&gt;-&lt;attribute&gt;.txt".

# In each such file:

The first line stores the names of vertical, site, and attribute, separated by TAB characters ('\t').

The second line stores some statistics (separated by TABs) w.r.t. the corresponding site and attribute, including:

* the total number of pages,
* the number of pages containing attribute values,
* the total number of attribute values contained in the pages,
* the number of unique attribute values.

Each remaining line stores the ground-truth information (separated by TABs) of one page (see the parsing sketch below), in sequence of:
* page ID,
* the number of attribute values in the page,
* attribute values ("&lt;NULL&gt;" in case of non-existence).
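
A minimal parsing sketch under this layout (the example file name is hypothetical):

    def parse_groundtruth(path):
        """Parse one ground-truth file, e.g. 'auto-aol-model.txt'."""
        with open(path, encoding="utf-8") as f:
            vertical, site, attribute = f.readline().rstrip("\n").split("\t")
            stats = f.readline().rstrip("\n").split("\t")   # the four counts above
            pages = {}
            for line in f:
                fields = line.rstrip("\n").split("\t")
                page_id, n = fields[0], int(fields[1])
                pages[page_id] = fields[2:2 + n]   # "&lt;NULL&gt;" marks non-existence
        return vertical, site, attribute, pages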

Notes on Ground-truth Labeling

The ground-truth labeling was conducted at the DOM-node level. More specifically, the candidate attribute values in a web page are the non-empty strings contained in text nodes in the corresponding DOM tree.
One page (although containing a single data record) may contain multiple distinct values that correspond to an attribute (e.g., multiple authors of a book, multiple granularity levels of addresses).
Currently, when a text node presents a mixture of multiple attributes, its string value is labeled with each of these attributes, if no substitute is available.
Before being stored in .txt files, the raw attribute values were refined by removing redundant separators (e.g., ' ', '\t', '\n').

## Reference

We would appreciate it if you cite the following paper when using the dataset:

    Qiang Hao, Rui Cai, Yanwei Pang, and Lei Zhang. "From One Tree to a Forest: a Unified Solution for Structured Web Data Extraction". In Proc. of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2011), pp.775-784, Beijing, China. July 24-28, 2011.

## Contact

If you have questions about this dataset, please contact Qiang Hao (haoq@live.com).},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/411576c7e80787e4b40452360f5f24acba9b5159</link>
</item>
<item>
<title>Columbia University Image Library (COIL-20) (Dataset)</title>
<description>@article{,
title= {Columbia University Image Library (COIL-20)},
journal= {},
author= {S. A. Nene and S. K. Nayar and H. Murase},
year= {1996},
url= {http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php},
abstract= {The database is available in two versions. The first, [unprocessed], consists of images for five of the objects that contain both the object and the background. The second, [processed], contains images for all of the objects in which the background has been discarded (and the images consist of the smallest square that contains the object). For formal documentation, see the corresponding compressed technical report:

"Columbia Object Image Library (COIL-20),"
S. A. Nene, S. K. Nayar and H. Murase,
Technical Report CUCS-005-96, February 1996.},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/1d16994c70b7fff8bfe917f83c397b1193daee7f</link>
</item>
<item>
<title>Boston Hubway Data Visualization Challenge Dataset (Dataset)</title>
<description>@article{,
title= {Boston Hubway Data Visualization Challenge Dataset},
keywords= {},
journal= {},
author= {Massachusetts Department of Transportation (MassDOT)},
year= {},
url= {http://hubwaydatachallenge.org/},
license= {},
abstract= {The Hubway trip history data includes every trip taken through Nov 2013, with date, time, origin and destination stations, plus the bike number and more.

Data from 2011/07 through 2013/11

The Hubway trip history data

Every time a Hubway user checks a bike out from a station, the system records basic information about the trip. Those anonymous data points have been exported into the spreadsheet. Please note, all private data including member names have been removed from these files.

What can the data tell us?

The CSV file contains data for every Hubway trip from the system launch on July 28th, 2011, through the end of September, 2012. The file contains the data points listed below for each trip. We've also posed some of the questions you could answer with this dataset - we're sure you'll have lots more of your own.

Duration - Duration of trip. What's the average trip duration for annual members vs. casual users?
Start date - Includes start date and time. What are the peak Hubway hours?
End date - Includes end date and time. Which days of the week get the most Hubway traffic?
Start station - Includes starting station name and number. Which stations are most popular? Which stations make up the most popular origin/destination pairs?
End station - Includes ending station name and number. Which stations are the most asymmetric - more trips start there than end there, or vice versa? Are they all at the top of hills?
Bike Nr - Includes ID number of bike used for the trip. What does a year in the life of one Hubway bike look like?
Member Type - Lists whether user was an Annual or Casual (1 or 3 day) member. Which stations get the most tourist traffic, and which get the most commuters?
Zip code - Lists the zip code for annual members only. How far does Hubway really reach? Which community should be the next to get Hubway stations?
Birthdate - Lists the year in which annual members were born. Are all of the Hubway rentals at 2:00am by people under 25?
Gender - Lists gender for annual members only. Are there different top stations for male vs. female Hubway members?
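
A minimal sketch of answering the first question above (average trip duration by member type) with pandas, assuming the trip history has been saved as hubway_trips.csv with columns named after the fields listed here (file and column names are assumptions):

    import pandas as pd

    trips = pd.read_csv("hubway_trips.csv")
    # average trip duration for Annual vs. Casual (1 or 3 day) members
    print(trips.groupby("member_type")["duration"].mean())
},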
tos= {},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/3e395a74e333156daddcd67d614415fc9e237340</link>
</item>
<item>
<title>Georgia Tech face database (Dataset)</title>
<description>@article{,
title= {Georgia Tech face database},
journal= {},
author= {Ara V. Nefian},
year= {},
url= {http://www.anefian.com/research/face_reco.htm},
abstract= {Georgia Tech face database (128MB) contains images of 50 people taken in two or three sessions between 06/01/99 and 11/15/99 at the Center for Signal and Image Processing at Georgia Institute of Technology. All people in the database are represented by 15 color JPEG images with cluttered background taken at resolution 640x480 pixels. The average size of the faces in these images is 150x150 pixels. The pictures show frontal and/or tilted faces with different facial expressions, lighting conditions and scale. Each image is manually labeled to determine the position of the face in the image. The set of label files is available here. The Readme.txt file gives more details about the database.},
keywords= {Dataset},
terms= {}
}
</description>
<link>https://academictorrents.com/download/0848b2c9b40e49041eff85ac4a2da71ae13a3e4f</link>
</item>
<item>
<title>UCI Folio Leaf Dataset (Dataset)</title>
<description>@article{,
title= {UCI Folio Leaf Dataset},
keywords= {},
journal= {},
author= {Trishen Munisami and Mahess Ramsurn and Somveer Kishnah and Sameerchand Pudaruth},
year= {2015},
url= {https://archive.ics.uci.edu/ml/datasets/Folio},
license= {},
abstract= {Source:
The leaves were taken from plants in the farm of the University of Mauritius and nearby locations. 

Donors: 
Trishen Munisami 
trishen.munisami '@' gmail.com 

Mahess Ramsurn 
ramsurn.mahess '@' umail.uom.ac.mu 

Somveer Kishnah 
s.kishnah '@' uom.ac.mu 

Sameerchand Pudaruth 
sameerchand.pudaruth '@' gmail.com


Data Set Information:
- The leaves were placed on a white background and then photographed. 
- The pictures were taken in broad daylight to ensure optimum light intensity.

Attribute Information:
List of plant species:
1. Beaumier du perou
2. Eggplant
3. Fruitcitere
4. Guava
5. Hibiscus
6. Betel
7. Rose
8. Chrysanthemum
9. Ficus
10. Duranta gold
11. Ashanti blood
12. Bitter Orange
13. Coeur Demoiselle
14. Jackfruit
15. Mulberry Leaf
16. Pimento
17. Pomme Jacquot
18. Star Apple
19. Barbados Cherry
20. Sweet Olive
21. Croton
22. Thevetia
23. Vieux Garcon
24. Chocolate tree
25. Caricature plant
26. Coffee
27. Ketembilla
28. Chinese guava
29. Lychee
30. Geranium
31. Sweet potato
32. Papaya

Relevant Papers:
Munisami, T., Ramsurn, M., Kishnah, S. and Pudaruth, S., 2015. Plant leaf recognition using shape features and colour histogram with k-nearest neighbour classifiers. Procedia Computer Science (Elsevier) Journal. 58, pp. 740-747.

Citation Request:
Munisami, T., Ramsurn, M., Kishnah, S. and Pudaruth, S., 2015. Plant leaf recognition using shape features and colour histogram with k-nearest neighbour classifiers. Procedia Computer Science (Elsevier) Journal. 58, pp. 740-747.},
tos= {},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/a6c64db1e42721f5d7e7aa2b118e293a0d0d335b</link>
</item>
<item>
<title>Stanford EE364A - Convex Optimization I - Boyd (Course)</title>
<description>@article{,
title= {Stanford EE364A - Convex Optimization I - Boyd},
journal= {},
author= {Stephen Boyd},
year= {2008},
url= {https://web.stanford.edu/class/ee364a/},
license= {},
abstract= {Catalog description
Concentrates on recognizing and solving convex optimization problems that arise in applications. Convex sets, functions, and optimization problems. Basics of convex analysis. Least-squares, linear and quadratic programs, semidefinite programming, minimax, extremal volume, and other problems. Optimality conditions, duality theory, theorems of alternative, and applications. Interior-point methods. Applications to signal processing, statistics and machine learning, control and mechanical engineering, digital and analog circuit design, and finance.

Course objectives
* to give students the tools and training to recognize convex optimization problems that arise in applications
* to present the basic theory of such problems, concentrating on results that are useful in computation
* to give students a thorough understanding of how such problems are solved, and some experience in solving them
* to give students the background required to use the methods in their own research work or applications

Videos
1. Introduction
2. Convex sets
3. Convex functions
4. Convex optimization problems
5. Duality
6. Approximation and fitting
7. Statistical estimation
8. Geometric problems
9. Numerical linear algebra background
10. Unconstrained minimization
11. Equality constrained minimization
12. Interior-point methods
13. Conclusions
},
keywords= {Optimization, Math},
terms= {}
}

</description>
<link>https://academictorrents.com/download/393dc896234b96a1cd251c14cfc65d2ff594d6e9</link>
</item>
<item>
<title>Caltech CS156 - Machine Learning - Yaser (Course)</title>
<description>@article{,
title= {Caltech CS156 - Machine Learning - Yaser},
journal= {},
author= {Yaser Abu-Mostafa},
year= {2012},
url= {http://work.caltech.edu/telecourse.html},
license= {CC BY-NC-ND},
abstract= {##Outline
This is an introductory course in machine learning (ML) that covers the basic theory, algorithms, and applications. ML is a key technology in Big Data, and in many financial, medical, commercial, and scientific applications. It enables computational systems to adaptively improve their performance with experience accumulated from the observed data. ML has become one of the hottest fields of study today, taken up by undergraduate and graduate students from 15 different majors at Caltech. This course balances theory and practice, and covers the mathematical as well as the heuristic aspects. The lectures below follow each other in a story-like fashion:

* What is learning?
* Can a machine learn?
* How to do it?
* How to do it well?
* Take-home lessons.


Lecture 01 - The Learning Problem - Introduction; supervised, unsupervised, and reinforcement learning. Components of the learning problem.

Lecture 02 - Is Learning Feasible? Can we generalize from a limited sample to the entire space? Relationship between in-sample and out-of-sample.

Lecture 03 - The Linear Model I - Linear classification and linear regression. Extending linear models through nonlinear transforms.

Lecture 04 - Error and Noise - The principled choice of error measures. What happens when the target we want to learn is noisy.

Lecture 05 - Training versus Testing - The difference between training and testing in mathematical terms. What makes a learning model able to generalize?

Lecture 06 - Theory of Generalization - How an infinite model can learn from a finite sample. The most important theoretical result in machine learning.

Lecture 07 - The VC Dimension - A measure of what it takes a model to learn. Relationship to the number of parameters and degrees of freedom.

Lecture 08 - Bias-Variance Tradeoff - Breaking down the learning performance into competing quantities. The learning curves.

Lecture 09 - The Linear Model II - More about linear models. Logistic regression, maximum likelihood, and gradient descent.

Lecture 10 - Neural Networks - A biologically inspired model. The efficient backpropagation learning algorithm. Hidden layers.

Lecture 11 - Overfitting - Fitting the data too well; fitting the noise. Deterministic noise versus stochastic noise.

Lecture 12 - Regularization - Putting the brakes on fitting the noise. Hard and soft constraints. Augmented error and weight decay.

Lecture 13 - Validation - Taking a peek out of sample. Model selection and data contamination. Cross validation.

Lecture 14 - Support Vector Machines - One of the most successful learning algorithms; getting a complex model at the price of a simple one.

Lecture 15 - Kernel Methods - Extending SVM to infinite-dimensional spaces using the kernel trick, and to non-separable data using soft margins.

Lecture 16 - Radial Basis Functions - An important learning model that connects several machine learning models and techniques.

Lecture 17 - Three Learning Principles - Major pitfalls for machine learning practitioners; Occam's razor, sampling bias, and data snooping.

Lecture 18 - Epilogue - The map of machine learning. Brief views of Bayesian learning and aggregation methods.},
keywords= {},
terms= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/8190b5122515ab158cd29ccdb33ea946a3e529f4</link>
</item>
<item>
<title>A collection of sport activity files for data analysis and data mining (Dataset)</title>
<description>@article{,
title= {A collection of sport activity files for data analysis and data mining},
journal= {Technical report 0201, University of Ljubljana and University of Maribor},
author= {Samo Rauter et al.},
year= {2015},
url= {},
license= {},
abstract= {The dataset consists of the data produced by nine cyclists, exported directly from their Strava or Garmin Connect accounts. Sport activities may be written in GPX or TCX form, which are basically XML formats adapted to specific purposes. From each activity, the following information can be obtained: GPS location, elevation, duration, distance, and average and maximal heart rate, while some workouts also include data obtained from power meters.
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/aac04fca4cd3b4dcd580e9018d68fa0647b7d908</link>
</item>
<item>
<title>20150112.json.gz (Dataset)</title>
<description>@misc{,
title = {20150112.json.gz},
author = {Wikidata Project},
year = {2015},
license = {CCZero},
abstract = {JSON dump of the Wikidata of 2015-01-12.}
}</description>
<link>https://academictorrents.com/download/466d6a3794328acc7c068a45f0380ef3ade8345f</link>
</item>
<item>
<title>A general method applicable to the search for similarities in the amino acid sequence of two proteins  (Paper)</title>
<description>@article{Needleman1970443,
title = "A general method applicable to the search for similarities in the amino acid sequence of two proteins ",
journal = "Journal of Molecular Biology ",
volume = "48",
number = "3",
pages = "443 - 453",
year = "1970",
note = "",
issn = "0022-2836",
doi = "http://dx.doi.org/10.1016/0022-2836(70)90057-4",
url = "http://www.sciencedirect.com/science/article/pii/0022283670900574",
author = "Saul B. Needleman and Christian D. Wunsch",
abstract = "A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed. From these findings it is possible to determine whether significant homology exists between the proteins. This information is used to trace their possible evolutionary development. The maximum match is a number dependent upon the similarity of the sequences. One of its definitions is the largest number of amino acids of one protein that can be matched with those of a second protein allowing for all possible interruptions in either of the sequences. While the interruptions give rise to a very large number of comparisons, the method efficiently excludes from consideration those comparisons that cannot contribute to the maximum match. Comparisons are made from the smallest unit of significance, a pair of amino acids, one from each protein. All possible pairs are represented by a two-dimensional array, and all possible comparisons are represented by pathways through the array. For this maximum match only certain of the possible pathways must be evaluated. A numerical value, one in this case, is assigned to every cell in the array representing like amino acids. The maximum match is the largest number that would result from summing the cell values of every pathway. "
}
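
The abstract describes the method's dynamic-programming core: a two-dimensional array over the two sequences, cell value 1 for like amino acids, and a maximum match equal to the largest sum over any pathway through the array. A minimal sketch of that idea follows (a simplification without gap penalties; not the authors' implementation):

    def maximum_match(a, b):
        """Largest number of positions of a matchable with b,
        allowing interruptions in either sequence."""
        n, m = len(a), len(b)
        # best[i][j]: best pathway sum starting at cell (i, j)
        best = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n - 1, -1, -1):
            for j in range(m - 1, -1, -1):
                cell = 1 if a[i] == b[j] else 0
                best[i][j] = max(cell + best[i + 1][j + 1],   # pair a[i] with b[j]
                                 best[i + 1][j],              # interruption: skip a[i]
                                 best[i][j + 1])              # interruption: skip b[j]
        return best[0][0]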

</description>
<link>https://academictorrents.com/download/8c6a6a95236461d9e249a820a6d67cf3dbf13dc0</link>
</item>
<item>
<title>The Extended Yale Face Database B (Dataset)</title>
<description>@article{,
title = {The Extended Yale Face Database B},
journal = {},
author = {Yale},
year = {2001},
url = {http://vision.ucsd.edu/~iskwak/ExtYaleDatabase/ExtYaleB.html},
abstract = {The extended Yale Face Database B contains 16128 images of 28 human subjects under 9 poses and 64 illumination conditions. The data format of this database is the same as that of the Yale Face Database B (http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html). Please refer to the homepage of the Yale Face Database B for more detailed information about the data format. 

You are free to use the extended Yale Face Database B for research purposes. All publications which use this database should acknowledge the use of "the Extended Yale Face Database B" and reference Athinodoros Georghiades, Peter Belhumeur, and David Kriegman's paper, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", PAMI, 2001. 

The extended database, as opposed to the original Yale Face Database B with 10 subjects, was first reported by Kuang-Chih Lee, Jeffrey Ho, and David Kriegman in "Acquiring Linear Subspaces for Face Recognition under Variable Lighting", PAMI, May 2005 (http://vision.ucsd.edu/~leekc/papers/9pltsIEEE.pdf). All test image data used in the experiments are manually aligned, cropped, and then re-sized to 168x192 images.
If you publish your experimental results with the cropped images, please reference the PAMI2005 paper as well.}
}</description>
<link>https://academictorrents.com/download/06e479f338b56fa5948c40287b66f68236a14612</link>
</item>
<item>
<title>The Extended Yale Face Database B (Cropped) (Dataset)</title>
<description>@article{,
title = {The Extended Yale Face Database B (Cropped)},
journal = {},
author = {Yale},
year = {2001},
url = {http://vision.ucsd.edu/~iskwak/ExtYaleDatabase/ExtYaleB.html},
abstract = {This is the cropped version of "The Extended Yale Face Database B"

The extended Yale Face Database B contains 16128 images of 28 human subjects under 9 poses and 64 illumination conditions. The data format of this database is the same as that of the Yale Face Database B (http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html). Please refer to the homepage of the Yale Face Database B for more detailed information about the data format. 

You are free to use the extended Yale Face Database B for research purposes. All publications which use this database should acknowledge the use of "the Extended Yale Face Database B" and reference Athinodoros Georghiades, Peter Belhumeur, and David Kriegman's paper, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", PAMI, 2001. 

The extended database, as opposed to the original Yale Face Database B with 10 subjects, was first reported by Kuang-Chih Lee, Jeffrey Ho, and David Kriegman in "Acquiring Linear Subspaces for Face Recognition under Variable Lighting", PAMI, May 2005 (http://vision.ucsd.edu/~leekc/papers/9pltsIEEE.pdf). All test image data used in the experiments are manually aligned, cropped, and then re-sized to 168x192 images.
If you publish your experimental results with the cropped images, please reference the PAMI2005 paper as well.}
}</description>
<link>https://academictorrents.com/download/aad8bf8e6ee5d8a3bf46c7ab5adfacdd8ad36247</link>
</item>
<item>
<title>MNIST Database (Dataset)</title>
<description>@article{,
title= {MNIST Database},
journal= {},
author= {Christopher J.C. Burges and Yann LeCun and Corinna Cortes },
year= {},
url= {http://yann.lecun.com/exdb/mnist/},
abstract= {The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.

The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications.

The MNIST database was constructed from NIST's Special Database 3 and Special Database 1, which contain binary images of handwritten digits. NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. The reason for this can be found in the fact that SD-3 was collected among Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test among the complete set of samples. Therefore it was necessary to build a new database by mixing NIST's datasets.

The MNIST training set is composed of 30,000 patterns from SD-3 and 30,000 patterns from SD-1. Our test set was composed of 5,000 patterns from SD-3 and 5,000 patterns from SD-1. The 60,000 pattern training set contained examples from approximately 250 writers. We made sure that the sets of writers of the training set and test set were disjoint.

SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 are available and we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set. The remaining 250 writers were placed in our test set. Thus we had two sets with nearly 30,000 examples each. The new training set was completed with enough examples from SD-3, starting at pattern # 0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern # 35,000 to make a full set with 60,000 test patterns. Only a subset of 10,000 test images (5,000 from SD-1 and 5,000 from SD-3) is available on this site. The full 60,000 sample training set is available.

Many methods have been tested with this training set and test set. Here are a few examples. Details about the methods are given in an upcoming paper. Some of those experiments used a version of the database where the input images were deskewed (by computing the principal axis of the shape that is closest to the vertical, and shifting the lines so as to make it vertical). In some other experiments, the training set was augmented with artificially distorted versions of the original training samples. The distortions are random combinations of shifts, scaling, skewing, and compression. 

FILE FORMATS FOR THE MNIST DATABASE

The data is stored in a very simple file format designed for storing vectors and multidimensional matrices. General info on this format is given at the end of this page, but you don't need to read that to use the data files.
All the integers in the files are stored in the MSB first (big-endian) format used by most non-Intel processors. Users of Intel processors and other little-endian machines must flip the bytes of the header.

There are 4 files:

train-images-idx3-ubyte: training set images 
train-labels-idx1-ubyte: training set labels 
t10k-images-idx3-ubyte:  test set images 
t10k-labels-idx1-ubyte:  test set labels

The training set contains 60000 examples, and the test set 10000 examples.

The first 5000 examples of the test set are taken from the original NIST training set. The last 5000 are taken from the original NIST test set. The first 5000 are cleaner and easier than the last 5000.

TRAINING SET LABEL FILE (train-labels-idx1-ubyte):

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000801(2049) magic number (MSB first) 
0004     32 bit integer  60000            number of items 
0008     unsigned byte   ??               label 
0009     unsigned byte   ??               label 
........ 
xxxx     unsigned byte   ??               label
The labels values are 0 to 9.

TRAINING SET IMAGE FILE (train-images-idx3-ubyte):

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000803(2051) magic number 
0004     32 bit integer  60000            number of images 
0008     32 bit integer  28               number of rows 
0012     32 bit integer  28               number of columns 
0016     unsigned byte   ??               pixel 
0017     unsigned byte   ??               pixel 
........ 
xxxx     unsigned byte   ??               pixel
Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).

TEST SET LABEL FILE (t10k-labels-idx1-ubyte):

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000801(2049) magic number (MSB first) 
0004     32 bit integer  10000            number of items 
0008     unsigned byte   ??               label 
0009     unsigned byte   ??               label 
........ 
xxxx     unsigned byte   ??               label
The labels values are 0 to 9.

TEST SET IMAGE FILE (t10k-images-idx3-ubyte):

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000803(2051) magic number 
0004     32 bit integer  10000            number of images 
0008     32 bit integer  28               number of rows 
0012     32 bit integer  28               number of columns 
0016     unsigned byte   ??               pixel 
0017     unsigned byte   ??               pixel 
........ 
xxxx     unsigned byte   ??               pixel
Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black). 
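
Putting the four layouts together, here is a minimal reading sketch with NumPy (this assumes the files have been downloaded and gunzipped into the working directory; the data-type byte is 0x08, unsigned byte, for all four files):

    import numpy as np

    def read_idx_ubyte(path):
        with open(path, "rb") as f:
            header = f.read(4)                  # magic number
            ndim = header[3]                    # fourth byte: number of dimensions
            dims = [int.from_bytes(f.read(4), "big") for _ in range(ndim)]
            return np.frombuffer(f.read(), dtype=np.uint8).reshape(dims)

    images = read_idx_ubyte("train-images-idx3-ubyte")   # shape (60000, 28, 28)
    labels = read_idx_ubyte("train-labels-idx1-ubyte")   # shape (60000,)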
  
THE IDX FILE FORMAT

the IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.
The basic format is

magic number 
size in dimension 0 
size in dimension 1 
size in dimension 2 
..... 
size in dimension N 
data

The magic number is an integer (MSB first). The first 2 bytes are always 0.

The third byte codes the type of the data: 
0x08: unsigned byte 
0x09: signed byte 
0x0B: short (2 bytes) 
0x0C: int (4 bytes) 
0x0D: float (4 bytes) 
0x0E: double (8 bytes)

The fourth byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices....

The sizes in each dimension are 4-byte integers (MSB first, big-endian, like in most non-Intel processors).

The data is stored like in a C array, i.e. the index in the last dimension changes the fastest. },
keywords= {mnist},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/ce990b28668abf16480b8b906640a6cd7e3b8b21</link>
</item>
<item>
<title>NLCD2006 Land Cover (2011 Edition) nlcd_2006_landcover_2011_edition_2014_03_31.zip (Dataset)</title>
<description>@article{,
title= {NLCD2006 Land Cover (2011 Edition) nlcd_2006_landcover_2011_edition_2014_03_31.zip},
journal= {},
author= {MRLC},
year= {2011},
url= {http://www.mrlc.gov/nlcd06_data.php},
abstract= {The most recent 2011 Edition of NLCD 2006 land cover layer for the conterminous United States for all pixels.


National Land Cover Database 2006 (NLCD2006) is a 16-class land cover classification scheme that has been applied consistently across the conterminous United States at a spatial resolution of 30 meters. NLCD2006 is based primarily on the unsupervised classification of Landsat Enhanced Thematic Mapper+ (ETM+) circa 2006 satellite data. NLCD2006 also quantifies land cover change between the years 2001 and 2006. The NLCD2006 land cover change product was generated by comparing spectral characteristics of Landsat imagery between 2001 and 2006, on an individual path/row basis, using protocols to identify and label change based on the trajectory from NLCD2001 products. It represents the first time this type of 30 meter resolution land cover change product has been produced for the conterminous United States. A formal accuracy assessment of the NLCD2006 land cover change product is planned for 2011. 

Generation of NLCD2006 products helped to identify some issues in the NLCD2001 land cover and percent developed imperviousness products only (there were no changes to the NLCD2001 percent canopy). These issues were evaluated and corrected, necessitating a reissue of NLCD2001 products (NLCD2001 Version 2.0) as part of the NLCD2006 release. A majority of the NLCD2001 updates occurred in coastal mapping zones where NLCD2001 was published prior to the completion of the National Oceanic and Atmospheric Administration (NOAA) Coastal Change Analysis Program (C-CAP) 2001 land cover products. NOAA C-CAP 2001 land cover has now been seamlessly integrated with NLCD2001 land cover for all coastal zones. NLCD2001 percent developed imperviousness was also updated as part of this process. 
},
keywords= {Dataset},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/081cae4ec8ce93a6b86ea1b55a4cca113a257593</link>
</item>
<item>
<title>MPEG-7 Core Experiment CE-Shape-1 [tar.gz] (Dataset)</title>
<description>@article{,
title = {MPEG-7 Core Experiment CE-Shape-1 [tar.gz]},
journal = {},
author = {Richard Ralph},
year = {1999},
url = {http://www.dabi.temple.edu/~shape/MPEG7/dataset.html},
license = {},
abstract = {Here are the first shapes from each class in the MPEG-7 Core Experiment CE-Shape-1 Test Set.

MPEG-7 Core Experiment CE-Shape-1 is a popular database for shape matching evaluation consisting of 70 shape categories, where each category is represented by 20 different images with high intra-class variability. The shapes are defined by a binary mask outlining the objects.
The evaluation protocol for this retrieval task is the bullseye rating, in which each image is used as reference and compared to all of the other images. The mean percentage of correct images in the top 40 matches (the 40 images with the lowest shape similarity values) is taken as bullseye rating.
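
A minimal sketch of the bullseye rating just described, assuming a precomputed dissimilarity matrix D (1400x1400, lower means more similar) and an array of category labels:

    import numpy as np

    def bullseye(D, labels, top=40):
        labels = np.asarray(labels)
        hits = 0
        for i in range(len(labels)):
            nearest = np.argsort(D[i])[:top]   # the 40 most similar shapes
            hits += int(np.sum(labels[nearest] == labels[i]))
        # 20 shapes per category, so at most 20 correct matches per query
        return hits / (20 * len(labels))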

The Latecki group maintains an overview of recent results here:
http://knight.cis.temple.edu/~shape/MPEG7/results.html.

Download MPEG-7 Core Experiment CE-Shape-1
http://www.cis.temple.edu/~latecki/TestData/mpeg7shapeB.tar.gz

Note: it raises interesting questions about how to define the shape of an object, as there are very similar objects (apples and device9) in two different categories, while the octopus category has much larger intra-class variance and still forms a single category. 


$ tar -ztvf mpeg7shapeB.tar.gz 
-rwxrwxrwx  0 latecki users    1723 Nov 12  1999 original/Bone-1.gif
-rwxrwxrwx  0 latecki users    1819 Nov 12  1999 original/Bone-10.gif
-rwxrwxrwx  0 latecki users    1745 Nov 12  1999 original/Bone-11.gif
-rwxrwxrwx  0 latecki users    1738 Nov 12  1999 original/Bone-12.gif
-rwxrwxrwx  0 latecki users    1322 Nov 12  1999 original/Bone-13.gif
-rwxrwxrwx  0 latecki users    1720 Nov 12  1999 original/Bone-14.gif
-rwxrwxrwx  0 latecki users    1654 Nov 12  1999 original/Bone-15.gif
-rwxrwxrwx  0 latecki users    1759 Nov 12  1999 original/Bone-16.gif
-rwxrwxrwx  0 latecki users    1739 Nov 12  1999 original/Bone-17.gif
-rwxrwxrwx  0 latecki users    1489 Nov 12  1999 original/Bone-18.gif
-rwxrwxrwx  0 latecki users    1772 Nov 12  1999 original/Bone-19.gif
-rwxrwxrwx  0 latecki users    1714 Nov 12  1999 original/Bone-2.gif
-rwxrwxrwx  0 latecki users    1459 Nov 12  1999 original/Bone-20.gif
-rwxrwxrwx  0 latecki users    1759 Nov 12  1999 original/Bone-3.gif
...........
-rwxrwxrwx  0 latecki users    1664 Nov 12  1999 original/watch-17.gif
-rwxrwxrwx  0 latecki users    1873 Nov 12  1999 original/watch-18.gif
-rwxrwxrwx  0 latecki users    1881 Nov 12  1999 original/watch-19.gif
-rwxrwxrwx  0 latecki users    2720 Nov 12  1999 original/watch-2.gif
-rwxrwxrwx  0 latecki users    1889 Nov 12  1999 original/watch-20.gif
-rwxrwxrwx  0 latecki users    1699 Nov 12  1999 original/watch-3.gif
-rwxrwxrwx  0 latecki users    1757 Nov 12  1999 original/watch-4.gif
-rwxrwxrwx  0 latecki users    1802 Nov 12  1999 original/watch-5.gif
-rwxrwxrwx  0 latecki users    1765 Nov 12  1999 original/watch-6.gif
-rwxrwxrwx  0 latecki users    1840 Nov 12  1999 original/watch-7.gif
-rwxrwxrwx  0 latecki users    1927 Nov 12  1999 original/watch-8.gif
-rwxrwxrwx  0 latecki users    1719 Nov 12  1999 original/watch-9.gif}
}</description>
<link>https://academictorrents.com/download/0a8cb3446b0de5690fee29a2c68922ff691c7f9a</link>
</item>
<item>
<title>Object and Concept Recognition for Content-Based Image Retrieval (CBIR) (Dataset)</title>
<description>@article{,
title = {Object and Concept Recognition for Content-Based Image Retrieval (CBIR)},
journal = {},
author = {University of Washington },
year = {},
url = {http://www.cs.washington.edu/research/imagedatabase/},
abstract = {Our groundtruth database consists of 21 datasets of outdoor scene images, many including a text file containing a list of visible objects for each image.

Project Summary

With the advent of powerful but inexpensive computers and storage devices and with the availability of the World Wide Web, image databases have moved from research to reality. Search engines for finding images are available from commercial concerns and from research institutes. These search engines can retrieve images by keywords or by image content such as color, texture, and simple shape properties. Content-based image retrieval is not yet a commercial success, because most real users searching for images want to specify the semantic class of the scene or the object(s) it should contain. The large commercial image providers are still using human indexers to select keywords for their images, even though their databases contain thousands or, in some cases, millions of images. Automatic object recognition is needed, but most successful computer vision object recognition systems can only handle particular objects, such as industrial parts, that can be represented by precise geometric models. Content-based retrieval requires the recognition of generic classes of objects and concepts. A limited amount of work has been done in this respect, but no general methodology has yet emerged.
The goal of this research is to develop the necessary methodology for automated recognition of generic object and concept classes in digital images. The work will build on existing object-recognition techniques in computer vision for low-level feature extraction and will design higher-level relationship and cluster features and a new unified recognition methodology to handle the difficult problem of recognizing classes of objects, instead of particular instances. Local feature representations and global summaries that can be used by general-purpose classifiers will be developed. A powerful new hierarchical multiple classifier methodology will provide the learning mechanism for automating the development of recognizers for additional objects and concepts. The resulting techniques will be evaluated on several different large image databases, including commercial databases whose images are grouped into broad classes and a ground-truth database that provides a list of the objects in each image. The results of this work will be a new generic object recognition paradigm that can immediately be applied to automated or semi-automated indexing of large image databases and will be a step forward in object recognition.

Project Impact

The results of this project will have an impact on both image retrieval from large databases and object recognition in general. It will target the recognition of classes of common objects that can appear in image databases of outdoor scenes. It will develop object class recognizers and a new learning formalism for automating the production of new classifiers for new classes of objects. It will also develop new representations for the image features that can be used to recognize these objects. It will allow content-based retrieval to become an important method for accessing real, commercial image databases, which today use only human index terms for retrieval.
Goals, Objectives and Targeted Activities

In the first year of the grant, we developed the feature extraction routines to extract features capable of recognizing an initial set of common objects representing a variety of the types of objects that appear in outdoor scenes, including city scenes and noncity scenes. We designed generic object recognition algorithms for the initial object set. We have developed such algorithms for vehicles, boats, and buildings, and have designed new high-level image features including symmetry features and cluster features. In the second year, we designed a unified representation for the image features called abstract regions. These are regions of the image that can come about from many different processes: color clustering, texture clustering, line-segment clustering, symmetry detection, and so on. All abstract regions will have a common set of features, while each different category will have its own special features. Our current emphasis is on using abstract features along with learning methodologies to recognize common objects.
Area Background

The area of content-based image retrieval is a hybrid research area that requires knowledge of both computer vision and of database systems. Large image databases are being collected, and images from these collections made available to users in advertising, marketing, entertainment, and other areas where images can be used to enhance the product. These images are generally organized loosely by category, such as animals, natural scenes, people, and so on. All image indexing is done by human indexers who list the important objects in an image and other terms by which users may wish to access it. This method is not suitable for today's very large image databases.
Content-based retrieval systems utilize measures that are based on low-level attributes of the image itself, including color histograms, color composition, and texture. State-of-the-art research focuses on more powerful measures that can find regions of an image corresponding to known objects that users wish to retrieve. There has been some success in finding human faces of different selected sizes, human bodies, horses, zebras and other texture animals with known patterns, and such backgrounds as jungles, water, and sky. Our research will focus on a unified methodology for feature representation and object class recognition. This work will lead to automatic indexing capabilities in the future.}
}</description>
<link>https://academictorrents.com/download/d5d80c1ad9d6b44b6e80c942414f1753bf9a1970</link>
</item>
<item>
<title>Mnih Massachusetts Building Dataset (Dataset)</title>
<description>@article{,
title= {Mnih Massachusetts Building Dataset},
journal= {},
year= {2013},
url= {http://www.cs.toronto.edu/~vmnih/data/},
abstract= {"The datasets introduced in Chapter 6 of my PhD thesis are below. See the thesis for more details." },
author= {Volodymyr Mnih},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/630d2c7e265af1d957cbee270f4328c54ccef333</link>
</item>
<item>
<title>Lerman Digg 2009 Dataset (Dataset)</title>
<description>@article{,
title= {Lerman Digg 2009 Dataset},
journal= {},
author= {Kristina Lerman },
year= {},
license= {This data is made available to the community for research purposes only},
url= {http://www.isi.edu/~lerman/downloads/digg2009.html},
abstract= {Digg2009 data set contains data about stories promoted to Digg's front page over a period of a month in 2009. For each story, we collected the list of all Digg users who have voted for the story up to the time of data collection, and the time stamp of each vote. We also retrieved the voters' friendship links. The semantics of the friendship links are as follows 
user_id --&gt; friend_id 
means that user_id is watching the activities of (is a fan of) friend_id. User ids have been anonymized, but are unique in the data set: a user with a specific id in the friendship links table and a user with the same id in the votes table correspond to the same actual user.
The data is in zipped csv files that are password protected. The password is digg2009_user.

##Votes

Table digg_votes contains 3,018,197 votes on 3553 popular stories made by 139,409 distinct users. The first vote is from the story's submitter. 

##Schema of the table
|Attribute|Value|
|-|-|
|vote_date: |Unix time stamp of the vote|
|voter_id: |anonymized unique id of the voter|
|story_id: |anonymized unique id of the story|
 
![](http://www.isi.edu/~lerman/downloads/diggs_distribution.jpg)
![](http://www.isi.edu/~lerman/downloads/voting_distribution.png)

(left) Distribution of votes (diggs) per story. An outlier with more than 24,000 votes is not shown. (right) Distribution of the number of votes (diggs) made by users.

##Friendship links

Table digg_friends contains 1,731,658 friendship links of 71,367 distinct users. Voters who do not appear in the table did not specify any friends at the time data was collected. 

##Schema of the digg_friends table

|Attribute|Value|
|-|-|
|mutual: |indicated whether the link represents a mutual friend relation (1) or not (0)|
|friend_date: |Unix time stamp of when the friendship link was created|
|user_id: |anonymized unique id of a user|
|friend_id: |anonymized unique id of a user|
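
A minimal loading sketch with pandas, assuming the zipped csv files have been extracted to digg_votes.csv and digg_friends.csv with columns in the order of the schema tables above (file names and column order are assumptions):

    import pandas as pd

    votes = pd.read_csv("digg_votes.csv",
                        names=["vote_date", "voter_id", "story_id"])
    friends = pd.read_csv("digg_friends.csv",
                          names=["mutual", "friend_date", "user_id", "friend_id"])

    diggs_per_story = votes.groupby("story_id").size()    # votes (diggs) per story
    fans_per_user = friends.groupby("friend_id").size()   # fans (watchers) per user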
 
![](http://www.isi.edu/~lerman/downloads/fans_distribution.png)

Distribution of the number of fans per user.

Empirical characterization of this data is described in 

Lerman, K., and Ghosh, R. (2010) "Information Contagion: an Empirical Study of Spread of News on Digg and Twitter Social Networks." In Proceedings of 4th International Conference on Weblogs and Social Media (ICWSM). This data is made available to the community for research purposes only. If you use the data in a publication, please cite the above paper.},
keywords= {digg2009},
terms= {}
}

</description>
<link>https://academictorrents.com/download/d98540da6d34fb6a0150fd88b41580a377cb454d</link>
</item>
<item>
<title>Lerman Twitter 2010 Dataset (Dataset)</title>
<description>@article{,
title= {Lerman Twitter 2010 Dataset},
journal= {},
author= {Kristina Lerman },
year= {2010},
license= {This data is made available to the community for research purposes only},
url= {http://www.isi.edu/~lerman/downloads/twitter/twitter2010.html},
abstract= {Twitter_2010 data set contains tweets containing URLs that have been posted on Twitter during October 2010. In addition to tweets, we also collected the followee links of tweeting users, allowing us to reconstruct the follower graph of active (tweeting) users.

|Item|Count|
|-|-|
|URLs|66,059|
|Tweets|2,859,764|
|Users|736,930|
|Links|36,743,448|

##Tweets

Table (in csv format) link_status_search_with_ordering_real_csv contains tweets with the following information:

* link: URL within the text of the tweet
* id: tweet id
* create_at: date added to the db
* create_at_long
* inreplyto_screen_name: screen name of the user this tweet is replying to
* inreplyto_user_id: user id of the user this tweet is replying to
* source: device from which the tweet originated
* bad_user_id: alternate user id
* user_screen_name: tweeting user screen name
* order_of_users: tweet's index within the sequence of tweets of the same URL
* user_id: user id

Table (in csv format) distinct_users_from_search_table_real_map contains the names of tweeting users, and the following information for each user:

* user_id: user id
* user_screen_name: user name
* indegree: number of followers
* outdegree: number of friends/followees
* bad_user_id: alternate user id

##Follower graph

File active_follower_real_sql contains a zipped SQL dump of links between tweeting users in the form:

* user_id: user id
* follower_id: user id of the follower
Empirical characterization of this data is described in 
Kristina Lerman, Rumi Ghosh, Tawan Surachawala (2012) "Social Contagion: An Empirical Study of Information Spread on Digg and Twitter Follower Graphs." This data is made available to the community for research purposes only. If you use the data in a publication, please cite the above paper.},
keywords= {twitter},
terms= {}
}

</description>
<link>https://academictorrents.com/download/d8b3a315172c8d804528762f37fa67db14577cdb</link>
</item>
<item>
<title>US domestic flights from 1990 to 2009 (Dataset)</title>
<description>@article{,
title= {US domestic flights from 1990 to 2009},
journal= {},
author= {US Census Bureau},
year= {2009},
url= {http://www.infochimps.com/datasets/35-million-us-domestic-flights-from-1990-to-2009},
abstract= {Over 3.5 million monthly domestic flight records from 1990 to 2009. Data are arranged as an adjacency list with metadata. Ready for immediate database import and analysis.

##Fields:

|Short name|Type |Description|
|-|-|-|
|Origin|String|Three letter airport code of the origin airport|
|Destination|String|Three letter airport code of the destination airport|
|Origin City|String|Origin city name|
|Destination City|String|Destination city name|
|Passengers|Integer|Number of passengers transported from origin to destination|
|Seats|Integer|Number of seats available on flights from origin to destination|
|Flights|Integer|Number of flights between origin and destination (multiple records for one month, many with flights &gt; 1)|
|Distance|Integer|Distance (to nearest mile) flown between origin and destination|
|Fly Date|Integer|The date (yyyymm) of flight|
|Origin Population|Integer|Origin city's population as reported by US Census|
|Destination Population|Integer|Destination city's population as reported by US Census|

##Snippet:

    MFR,RDM,"Medford, OR","Bend, OR",0,0,1,156,200810,200298,157730
    AMA,EKO,"Amarillo, TX","Elko, NV",124,124,1,858,199308,202960,40259
    TUS,EKO,"Tucson, AZ","Elko, NV",112,124,1,658,199308,711392,40259
    AMA,EKO,"Amarillo, TX","Elko, NV",115,124,1,858,199406,206315,41668
    ICT,EKO,"Wichita, KS","Elko, NV",100,124,1,1007,199607,552884,45034
    SPS,EKO,"Wichita Falls, TX","Elko, NV",122,124,1,1059,199603,147683,45034
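
A minimal pandas sketch, assuming the table has been saved as flights.csv in the delimited form shown in the snippet (the file name is an assumption):

    import pandas as pd

    cols = ["origin", "destination", "origin_city", "destination_city",
            "passengers", "seats", "flights", "distance", "fly_date",
            "origin_population", "destination_population"]
    df = pd.read_csv("flights.csv", names=cols)

    # e.g. total passengers per origin airport and month
    monthly = df.groupby(["origin", "fly_date"])["passengers"].sum()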

##Source(s)

1. US Census Bureau
2. RITA/Transtats, Bureau of Transportation Statistics},
keywords= {Dataset},
terms= {}
}

</description>
<link>https://academictorrents.com/download/a2ccf94bbb4af222bf8e69dad60a68a29f310d9a</link>
</item>
<item>
<title>UMN Sarwat Foursquare Dataset (September 2013) (Dataset)</title>
<description>@article{,
title= {UMN Sarwat Foursquare Dataset (September 2013)},
journal= {},
author= {Mohamed Sarwat and Justin J. Levandoski and Ahmed Eldawy and Mohamed F. Mokbel},
year= {2013},
url= {http://www-users.cs.umn.edu/~sarwat/foursquaredata/},
license= {},
abstract= {This data set contains 2,153,471 users, 1,143,092 venues, 1,021,970 check-ins, 27,098,490 social connections, and 2,809,581 ratings that users assigned to venues, all extracted from the Foursquare application through the public API. All user information has been anonymized, including user geolocations. Each user is represented by an id and a geospatial location, as is each venue. The data are contained in five files: users.dat, venues.dat, checkins.dat, socialgraph.dat, and ratings.dat. More details about the contents and use of these files follow.

Content of Files
* users.dat: consists of a set of users such that each user has a unique id and a geospatial location (latitude and longitude) that represents the user's home town location.
* venues.dat: consists of a set of venues (e.g., restaurants) such that each venue has a unique id and a geospatial location (latitude and longitude).
* checkins.dat: marks the checkins (visits) of users at venues. Each check-in has a unique id as well as the user id and the venue id.
* socialgraph.dat: contains the social graph edges (connections) that exist between users. Each social connection consists of two users (friends) represented by two unique ids (first_user_id and second_user_id).
* ratings.dat: consists of implicit ratings that quantify how much a user likes a specific venue.
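
A minimal loading sketch with pandas; the field separator and column order in the .dat files are assumptions to be checked against the files themselves:

    import pandas as pd

    users = pd.read_csv("users.dat", sep="|",
                        names=["id", "latitude", "longitude"])
    checkins = pd.read_csv("checkins.dat", sep="|",
                           names=["id", "user_id", "venue_id"])
    ratings = pd.read_csv("ratings.dat", sep="|",
                          names=["user_id", "venue_id", "rating"])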

Credits

Users must acknowledge the use of the data set in resulting publications by citing the following papers:

* Mohamed Sarwat, Justin J. Levandoski, Ahmed Eldawy, and Mohamed F. Mokbel. LARS*: A Scalable and Efficient Location-Aware Recommender System. in IEEE Transactions on Knowledge and Data Engineering (TKDE)
* Justin J. Levandoski, Mohamed Sarwat, Ahmed Eldawy, and Mohamed F. Mokbel. LARS: A Location-Aware Recommender System. in ICDE 2012},
keywords= {foursquare},
terms= {}
}

</description>
<link>https://academictorrents.com/download/b24c73949308b3f6bdd8fea1a485534392eef338</link>
</item>
<item>
<title>LBL-CONN-7 Network Traces (Dataset)</title>
<description>@article{,
title = {LBL-CONN-7 Network Traces},
journal = {},
author = {Vern Paxson},
year = {1993},
url = {http://ita.ee.lbl.gov/html/contrib/LBL-CONN-7.html},
abstract = {Description
This trace contains thirty days' worth of all wide-area TCP connections between the Lawrence Berkeley Laboratory (LBL) and the rest of the world.

Format
The reduced trace was generated by tcp-reduce, and has the format explained in that script's documentation. Briefly, the trace is an ASCII file with one line per connection, with the following columns:
timestamp
duration
protocol
bytes sent by originator of the connection, or ? if not available
bytes sent by responder to the connection, or ? if not available
local host - the (renumbered) LBL host that participated in the connection
remote host - the remote (non-LBL) host that participated in the connection. Remote hosts have not been renumbered, to allow for geographic analysis of the data. Please do not attempt any further traffic analysis regarding the remote hosts.
state that the connection ended in. The two most important states are SF, indicating normal SYN/FIN completion, and REJ, indicating a rejected connection (initial SYN elicited a RST in reply). Other states are discussed in the tcp-reduce documentation.
flags - zero or more flags:
L indicates the connection was initiated locally (i.e., the LBL host is the one that began the connection)
N indicates the connection was with nearby U.C. Berkeley. When this dataset was captured, a filter was used so that only nntp traffic with UCB was included, so this flag is only ever set for nntp connections.
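
A minimal sketch for parsing one reduced-trace line (whitespace-separated columns in the order listed above; "?" marks unavailable byte counts and the flags column may be empty):

    # Column names follow the list above.
    COLS = ["timestamp", "duration", "protocol", "orig_bytes",
            "resp_bytes", "local_host", "remote_host", "state", "flags"]

    def parse_conn(line):
        rec = dict(zip(COLS, line.split()))   # "flags" absent when empty
        for key in ("orig_bytes", "resp_bytes"):
            rec[key] = None if rec[key] == "?" else int(rec[key])
        return rec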

Measurement
The trace ran from midnight, Thursday, September 16 1993 through midnight, Friday, October 15 1993 (times are Pacific Standard Time), capturing 606,497 wide-area connections. The tracing was done on the Ethernet DMZ network over which flows all traffic into or out of the Lawrence Berkeley Laboratory, located in Berkeley, California. The raw trace was made using tcpdump on a Sun Sparcstation using the BPF kernel packet filter. Fewer than 15 SYN/FIN/RST packets in a million were dropped. Timestamps have microsecond precision. As noted above, the traffic was filtered to exclude connections with nearby UCB except for nntp.
Privacy

The LBL hosts in the trace have been renumbered. The remote hosts remain as full IP addresses, to allow for geographic analysis of the data. Please do not attempt any further traffic analysis regarding the remote hosts.

Acknowledgements
The trace was made by Vern Paxson (vern@ee.lbl.gov). In publications, please include one or more citations to the papers mentioned below, as appropriate.

Publications
The SF connections in this trace correspond to LBL-7 in the papers Empirically-Derived Analytic Models of Wide-Area TCP Connections, V. Paxson, IEEE/ACM Transactions on Networking, 2(4), pp. 316-336, August 1994; Growth Trends in Wide-Area TCP Connections, V. Paxson, IEEE Network, 8(4), pp. 8-17, July 1994; and Wide-Area Traffic: The Failure of Poisson Modeling, V. Paxson and S. Floyd, IEEE/ACM Transactions on Networking, 3(3), pp. 226-244, June 1995.

Restrictions
The trace may be freely redistributed.}
}</description>
<link>https://academictorrents.com/download/2060d7faa61dd774f9279be7f3f79cece12ed0ed</link>
</item>
<item>
<title>BU-Web-Client Network Traces (Dataset)</title>
<description>@article{,
title = {BU-Web-Client Network Traces},
journal = {},
author = {Oceans research group at Boston University},
year = {1994},
url = {http://ita.ee.lbl.gov/html/contrib/BU-Web-Client.html},
license = {The traces may be freely redistributed.},
abstract = {Description
These traces contain records of the HTTP requests and user behavior of a set of Mosaic clients running in the Boston University Computer Science Department, spanning the timeframe of 21 November 1994 through 8 May 1995. 

During the data collection period a total of 9,633 Mosaic sessions were traced, representing a population of 762 different users, and resulting in 1,143,839 requests for data transfer. 

Format
Trace logfiles contain the sequence of WWW object requests (whether the object was served from the local cache or from the network). Each log file name contains a user id number, converted from Unix UIDs via a one-way function that allows user IDs to be compared for equality but not to be easily traced back to particular users. The file name also gives the machine on which the session took place, and the Unix timestamp when the session started. Boston University is located in the United States Eastern Time Zone. For example, a file named con1.cs20.785526125 is a log of a session from user 1, on machine cs20, starting at time 785526125 (12:42:05 EST, Tuesday, November 22, 1994). 

Each line in a log corresponds to a single URL requested by the user; it contains the machine name, the timestamp when the request was made, the user id number, the URL, the size of the document (including the overhead of the protocol) and the object retrieval time in seconds (reflecting only actual communication time, and not including the intermediate processing performed by Mosaic in a multi-connection transfer). An example of a line from a condensed log is: 
cs20 785526142 920156 "http://cs-www.bu.edu/lib/pics/bu-logo.gif" 1804 0.484092 
Lines with the number of bytes equal to 0 and retrieval delay equal to 0.0 mean that the request was satisfied by Mosaic's internal cache. 
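
A minimal sketch for parsing one such line (documented field order; shlex handles the quoted URL; names are illustrative):

    import shlex

    def parse_request(line):
        # machine, timestamp, user id, "URL", bytes, retrieval seconds
        machine, ts, uid, url, nbytes, secs = shlex.split(line)
        from_cache = nbytes == "0" and float(secs) == 0.0  # Mosaic cache hit
        return machine, int(ts), int(uid), url, int(nbytes), float(secs), from_cache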

Measurement
To collect this data we installed an instrumented version of Mosaic in the general computing environment at Boston University's Computer Science Department. This environment consists principally of 37 SparcStation 2 workstations connected in a local network, which is divided in 2 subnets. Each workstation has its own local disk; logs were written to the local disk and subsequently transferred to a central repository. 

We began by collecting data on a subset of the workstations only, while testing our data collection process. This period lasted from 21 November 1994 until 17 January 1995. When we were satisfied that data collection was occurring correctly, we extended the data collection process to include all workstations; data collection then took place until 8 May 1995. Since Mosaic ceased to be the dominant browser in use by early March 1995, the most representative portion of the traces is the one covering the period 21 November 1994 through 28 February 1995. 

Privacy
The user IDs in these logs have been renumbered to protect privacy.
Acknowledgements
These logs were collected by the members of the Oceans research group at Boston University. Mosaic was instrumented by Carlos Cunha (carro@cs.bu.edu). When referring to the use of these traces in published work, please cite Characteristics of WWW Client Traces, Carlos A. Cunha, Azer Bestavros and Mark E. Crovella, Boston University Department of Computer Science, Technical Report TR-95-010, April 1995.
}
}
</description>
<link>https://academictorrents.com/download/f305fe91840e1e117bdf27bd6c3970a69d90b92f</link>
</item>
<item>
<title>Wikipedia English Official Offline Edition 2014-02-03 (Dataset)</title>
<description>@article{,
title = {Wikipedia English Official Offline Edition 2014-02-03},
journal = {},
author = {Wikipedia},
date= {2014-02-03},
year = {2014},
url = {http://meta.wikimedia.org/wiki/Data_dumps},
abstract = {Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is multi-licensed under the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL). Images and other files are available under different terms, as detailed on their description pages. For our advice about complying with these licenses, see Wikipedia:Copyrights.
},
license ={Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA) and the GNU Free Documentation License (GFDL)},
superseded = {e18b8cce7d9cb2726f5f40dcb857111ec573cad4}

}</description>
<link>https://academictorrents.com/download/9512a1f6d21e5012c06a1c9b8e2dd4796ecc77a9</link>
</item>
<item>
<title>Twitter Data - NIPS 2012 (Dataset)</title>
<description>@article{,
title= {Twitter Data - NIPS 2012},
journal= {},
author= {J. McAuley and J. Leskovec},
year= {},
url= {http://snap.stanford.edu/data/egonets-Twitter.html},
license= {},
abstract= {This dataset consists of 'circles' (or 'lists') from Twitter. Twitter data was crawled from public sources. The dataset includes node features (profiles), circles, and ego networks.


##Dataset statistics

|Attribute|Value|
|---------|-------|
|Nodes|81306|
|Edges|1768149|
|Nodes in largest WCC|81306 (1.000)|
|Edges in largest WCC|1768149 (1.000)|
|Nodes in largest SCC|68413 (0.841)|
|Edges in largest SCC|1685163 (0.953)|
|Average clustering coefficient|0.5653|
|Number of triangles|13082506|
|Fraction of closed triangles|0.06415|
|Diameter (longest shortest path)|7|
|90-percentile effective diameter|4.5|

##Source (citation)

J. McAuley and J. Leskovec. Learning to Discover Social Circles in Ego Networks. NIPS, 2012.

##Files:

|File|Description|
|---------|-------|
|nodeId.edges |The edges in the ego network for the node 'nodeId'. Edges are undirected for facebook, and directed (a follows b) for twitter and gplus. The 'ego' node does not appear, but it is assumed that they follow every node id that appears in this file.|
|nodeId.circles |The set of circles for the ego node. Each line contains one circle, consisting of a series of node ids. The first entry in each line is the name of the circle.|
|nodeId.feat |The features for each of the nodes that appears in the edge file.|
|nodeId.egofeat |The features for the ego user.|
|nodeId.featnames |The names of each of the feature dimensions. Features are '1' if the user has this property in their profile, and '0' otherwise. This file has been anonymized for facebook users, since the names of the features would reveal private data.|},
keywords= {twitter, social networks, NIPS},
terms= {}
}

</description>
<link>https://academictorrents.com/download/046cf7a75db2a530b1505a4ce125fbe0031f4661</link>
</item>
<item>
<title>Introduction to Theory of Computation (Paper)</title>
<description>@article{,
title= {Introduction to Theory of Computation},
journal= {},
author= {Anil Maheshwari and Michiel Smid},
year= {2014},
url= {http://cg.scs.carleton.ca/~michiel/TheoryOfComputation/},
license= {CC-SA},
abstract= {This is a free textbook for an undergraduate course on the Theory of Computation, which we have been teaching at Carleton University since 2002. Until the 2011/2012 academic year, this course was offered as a second-year course (COMP 2805) and was compulsory for all Computer Science students. Starting with the 2012/2013 academic year, the course has been downgraded to a third-year optional course (COMP 3803).
We have been developing this book since we started teaching this course. Currently, we cover most of the material from Chapters 2-5 during a 12-week term with three hours of classes per week.
The material from Chapter 6, on Complexity Theory, is taught in the third-year course COMP 3804 (Design and Analysis of Algorithms). In the early years of COMP 2805, we gave a two-lecture overview of Complexity Theory at the end of the term. Even though this overview has disappeared from the course, we decided to keep Chapter 6. This chapter has not been revised/modified for a long time.},
keywords= {},
terms= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/d746df46e76fcfdb8e0b682a5e47ed1a776db7db</link>
</item>
<item>
<title>Viking MDIM2.1 Colorized Global Mosaic 232m (Dataset)</title>
<description>@article{,
title = {Viking MDIM2.1 Colorized Global Mosaic 232m},
journal = {},
author = {NASA},
year = {2014},
url = {http://astrogeology.usgs.gov/search/details/Mars/Viking/MDIM21/Mars_Viking_MDIM21_ClrMosaic_global_232m/cub},
abstract = {This global image map of Mars has a resolution of 256 pixels/degree (scale approximately 231 m/pixel at the equator). The colorized mosaic was completed by NASA AMES, which warped the original Viking colorized mosaic and blended it over the latest black/white mosaic. This mosaic is known as the Colorized Mars Digital Image Model (MDIM) 2.1. The original MDIM 2.1 replaces two earlier mosaics produced by the USGS from the same set of approximately 4600 Viking Orbiter images. The positional accuracy of features in MDIM 2.1 is estimated to be roughly one pixel (200 m), compared to 3 km for MDIM 2.0 released in 2001 and &gt;6 km for MDIM 1.0 released in 1991. In addition to relatively imprecise geodetic control, the previous mosaics were affected by changing definitions of cartographic parameters (such as the definition of zero longitude), resulting in an overall longitude shift of as much as 0.2° between the early MDIMs and other datasets. The new mosaic uses the most recent coordinate system definitions for Mars. These definitions have been widely adopted by NASA missions and other users of planetary data and are likely to remain in use for a decade or more. As a result, MDIM 2.1 not only registers precisely with data from current missions such as MGS and 2001 Mars Odyssey but will serve as an accurate basemap on which data from future missions can be plotted. 

The basis for the positional accuracy of MDIM 2.1 is the incorporation of all images in the mosaic into the evolving USGS/RAND global control network of Mars. The primary reason for the greatly improved absolute accuracy of the current version of this network is the incorporation of 1232 globally distributed "ground control points" whose latitude and longitude were constrained to values measured from Mars Orbiter Laser Altimeter (MOLA) data. The globally adjusted MOLA dataset has an absolute horizontal accuracy on the order of 100 m, but individual features in images can probably only be tied to MOLA-derived shaded-relief digital image models with a precision on the order of 200 m. Other, lesser contributors to the accuracy of the control solution and mosaic are the use of MOLA-derived elevations for all 37,652 control points, use of updated timing and orientation data for the Viking Orbiter spacecraft, improved measurements of reseau locations in the images leading to more accurate correction of image distortions, and careful checking and re-measurement of control points with large solution residuals. The mosaic is also orthorectified based on the MOLA elevation data, so that parallax distortions present in the earlier versions are eliminated. The root-mean-squared (RMS) error of the control solution is 16 micrometers (1.4 Viking image pixels, or ~300 m on the ground). Visual inspection of the mosaic indicates that both image-to-image seam mismatches and image-to-MOLA registration errors are less than one pixel almost everywhere, with maximum errors on the order of 4 pixels (1 km) occurring in only a few locations. 

The cartographic constants used in MDIM 2.1 are those adopted by the IAU/IAG in 2000, which have been adopted by the majority of Mars missions and instrument teams. Coordinates (e.g., of the boundaries and centers of the individual files or map quadrangles making up the mosaic) are given in terms of east longitude and planetocentric latitude. The files in cylindrical (Equirectangular) map projection are also constructed so that lines of the map raster are equally spaced in planetocentric latitude. These files will thus register with other datasets based on planetocentric latitude either as-is or after a simple change of scale, but must be resampled in order to register to datasets based on planetographic latitude. The global mosaic is divided into 30 regions based on the USGS Mars Chart (MC) series of 1:5,000,000-scale printed maps. All regions are available in Equirectangular projection, which is a generalization of the more familiar Simple Cylindrical projection. Quadrangles 2-29 are provided only in Equirectangular; with center latitude of projection 0°, this projection is identical to Simple Cylindrical. The polar quadrangles 1 and 30 are available in two Equirectangular sections with center latitudes of projection ±60° and ±75.52248781° respectively, and also as a single file in Polar Stereographic projection. The two Equirectangular sections of the polar quadrangles can be converted to center latitude of projection 0° (or equivalently to Simple Cylindrical projection) by 2:1 and 4:1 enlargement in the sample direction, respectively, after which they can be merged with the lower latitude data. 
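
For instance, the 2:1 case might be sketched with Pillow (file name hypothetical; a full-size mosaic section may require a memory-aware tool such as GDAL instead):

    from PIL import Image

    # Stretch a polar Equirectangular section 2:1 in the sample (x)
    # direction, converting it to center latitude of projection 0
    # (equivalently, Simple Cylindrical), as described above.
    tile = Image.open("mdim21_quad01_eqr60.tif")
    width, height = tile.size
    converted = tile.resize((2 * width, height), Image.BILINEAR)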

The images used to make MDIM 2.1 were obtained primarily through the red, clear, and minus-blue filters of the Viking Orbiter imaging system, and thus provide a monochromatic view of Mars weighted toward the red end of the visible spectrum. Images were obtained with a wide range of solar incidence angles. It is unfortunately not possible to correct the appearance of both albedo (reflectivity) variations and topographic features for these incidence angle variations simultaneously. The images have therefore been highpass-filtered at a scale of ~50 km to remove regional albedo variations and then normalized so that equal topographic slopes appear with equal contrast everywhere. Photometric processing for MDIM 2.1 incorporates a model of the transmission and scattering of light in the atmosphere that is substantially improved over that used in MDIM 2.0. Residual tonal mismatches between different images after photometric correction were corrected based on a least-squares adjustment of image brightness and contrast. Because of these photometric and cosmetic improvements, it was possible to use a less severe highpass filter than for MDIM 2.0, improving the overall appearance of the mosaic. 

References available from: http://astrogeology.usgs.gov/maps/mdim-2-1

}
}</description>
<link>https://academictorrents.com/download/c746fd3441d19772627fd36599dc418241d39452</link>
</item>
<item>
<title>Analysis of the Cryptocurrency Marketplace (Paper)</title>
<description>@article{,
title = {Analysis of the Cryptocurrency Marketplace},
journal = {},
author = {Alex Heid},
year = {},
url = {http://www.hackmiami.org/whitepapers/},
license = {},
abstract = {This paper will go over the technical, economic, and social impact of cryptocurrencies such as Bitcoin and Litecoin. This document will go into a comprehensive level of detail about cryptocurrency technologies and protocols, as this is required to familiarize the reader with the principles behind the rapidly emerging open source economic ecosystem. Furthermore, emerging attack vectors of cryptocurrencies will be discussed, such as custom malware campaigns and targeted exploitation.}
}
</description>
<link>https://academictorrents.com/download/daaa86689c42e78c4111b74984d5036a426f6cf6</link>
</item>
<item>
<title>Management of acute and post-operative pain in chronic kidney disease (Paper)</title>
<description>@article{parmar_management_2013,
title = {Management of acute and post-operative pain in chronic kidney disease},
issn = {2046-1402},
url = {http://f1000research.com/articles/2-28/v3},
doi = {10.12688/f1000research.2-28.v3},
urldate = {2014-02-07},
journal = {{F1000Research}},
author = {Parmar, Malvinder S and Parmar, Kamalpreet S},
month = apr,
year = {2013},
abstract = {Chronic kidney disease is common and patients with many co-morbid conditions frequently have to undergo surgical procedures and, therefore, require effective pain management. The pharmacokinetics of various analgesic agents are not well studied in patients with chronic kidney disease and the risk of accumulation of the main drug or their metabolites, resulting in serious adverse events, is a common scenario on medical and surgical wards. It is common for these patients to be cared for by 'non-nephrologists' who often prescribe the standard dose of the commonly used analgesics, without taking into consideration the patient's kidney function. It is important to recognize the problems and complications associated with the use of standard doses of analgesics, and highlight the importance of adjusting analgesic dosage based on kidney function to avoid complications while still providing adequate pain relief.}
}</description>
<link>https://academictorrents.com/download/f92f4798efd078afe1708efb74a3816a66a23104</link>
</item>
<item>
<title>Introducing R (Paper)</title>
<description>@article{,
title= {Introducing R},
author= {German Rodriguez},
year= {2001},
url= {http://data.princeton.edu/R/},
abstract= {The purpose of these notes, an update of my 1992 handout Introducing S-Plus, is to provide a quick introduction to R, particularly as a tool for fitting linear and generalized linear models. Additional examples may be found in the R Logs section of my GLM course.

##1. Introduction
R is a powerful environment for statistical computing which runs on several platforms. These notes are written specially for users running the Windows version, but most of the material applies to the Mac and Linux versions as well.

##1.1 The R Language and Environment
R was first written as a research project by Ross Ihaka and Robert Gentleman, and is now under active development by a group of statisticians called 'the R core team', with a home page at www.r-project.org.

R was designed to be 'not unlike' the S language developed by John Chambers and others at Bell Labs. A commercial version of S with additional features was developed and marketed as S-Plus by Statistical Sciences, which later became Insightful and is now TIBCO Spotfire. R and S-Plus can best be viewed as two implementations of the S language.

R is available free of charge and is distributed under the terms of the Free Software Foundation's GNU General Public License. You can download the program from the Comprehensive R Archive Network (CRAN). Ready-to-run 'binaries' are available for Windows, Mac OS X, and Linux. The source code is also available for download and can be compiled for other platforms.

These notes are organized in several sections, as shown in the table of contents on the right. I have tried to introduce key features of R as they are needed by students in my statistics classes. As a result, I often postpone (or altogether omit) discussion of some of the more powerful features of R as a programming language.

Notes of local interest, such as where to find R at Princeton University, appear in framed boxes and are labeled as such. Permission is hereby given to reproduce these pages freely and host them in your own server if you wish. You may add, edit or delete material in the local notes as long as the rest of the text is left unchanged and due credit is given. Obviously I welcome corrections and suggestions for enhancement.

##1.2 Bibliographic Remarks
S was first introduced by Becker and Chambers (1984) in what's known as the 'brown' book. The new S language was described by Becker, Chambers and Wilks (1988) in the 'blue' book. Chambers and Hastie (1992) edited a book discussing statistical modeling in S, called the 'white' book. The latest version of the S language is described by Chambers (1998) in the 'green' book, but R is largely an implementation of the versions documented in the blue and white books. Chambers' (2008) latest book focuses on Programming with R.

Venables and Ripley (1994, 1997, 1999, 2002) have written an excellent book on Modern Applied Statistics with S-PLUS that is now in its fourth edition. The latest edition is particularly useful to R users because the main text explains differences between S-Plus and R where relevant. A companion volume called S Programming appeared in 2000 and applies to both S-Plus and R. These authors have also made available in their website an extensive collection of complements to their books, follow the links at MASS 4.

There is now an extensive and rapidly growing literature on R. Good introductions include the books by Krause and Olson (1997), Dalgaard (2002), and Braun and Murdoch (2007). Beginners will probably benefit from working through the examples in Everitt and Hothorn's (2006) A Handbook of Statistical Analyses Using R or Fox's (2002) companion to applied regression. Among more specialized books my favorites include Murrell (2005), an essential reference on R graphics, Pinheiro and Bates (2000), a book on mixed models, and Therneau and Grambsch's (2000) Modeling Survival Data, which includes many analyses using S-Plus as well as SAS. (Therneau wrote the survival analysis code used in S-Plus and R.) For additional references see the annotated list at R Books.

The official R manuals are available as PDF files that come with the R distribution. These include An Introduction to R (a nice 100-page introduction), a manual on R Data Import/Export describing facilities for transferring data to and from other packages, and useful notes on R installation and Administration. More specialized documents include a draft of the R Language Definition, a guide to Writing R Extensions, documentation on R Internals including coding standards, and finally the massive R Reference Index (~3000 pages). The online help facility is excellent. When you install R you get a choice of various help formats. I recommend compiled html help because you get a nice tree-view of the contents, an index, a pretty decent search engine, and nicely formatted help pages. (On Unix you should probably choose html help.)},
keywords= {paper, book},
terms= {}
}

</description>
<link>https://academictorrents.com/download/d430724be7ac00f4b5e7f0d956f8411ef9b67dbe</link>
</item>
<item>
<title>Accelerometer-Based Event Detector for Low-Power Applications (Paper)</title>
<description>
@article{s131013978,
AUTHOR = {Smidla, József and Simon, Gyula},
TITLE = {Accelerometer-Based Event Detector for Low-Power Applications},
JOURNAL = {Sensors},
VOLUME = {13},
YEAR = {2013},
NUMBER = {10},
PAGES = {13978--13997},
URL = {http://www.mdpi.com/1424-8220/13/10/13978},
PubMedID = {24135991},
ISSN = {1424-8220},
DOI = {10.3390/s131013978},
abstract = {In this paper, an adaptive, autocovariance-based event detection algorithm is proposed, which can be used with micro-electro-mechanical systems (MEMS) accelerometer sensors to build inexpensive and power efficient event detectors. The algorithm works well with low signal-to-noise ratio input signals, and its computational complexity is very low, allowing its utilization on inexpensive low-end embedded sensor devices. The proposed algorithm decreases its energy consumption by lowering its duty cycle, as much as the event to be detected allows it. The performance of the algorithm is tested and compared to the conventional filter-based approach. The comparison was performed in an application where illegal entering of vehicles into restricted areas was detected.
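
The authors' exact adaptive algorithm is not reproduced here; a generic sketch of the underlying idea (sliding-window lag-1 autocovariance normalized by variance, with an illustrative fixed threshold) could look like:

    import numpy as np

    def autocov_events(signal, window=64, lag=1, threshold=0.5):
        # Correlated motion raises the lag autocovariance even at low SNR,
        # while white sensor noise keeps it near zero.
        signal = np.asarray(signal, dtype=float)
        events = []
        for start in range(0, len(signal) - window, window // 2):
            w = signal[start:start + window]
            w = w - w.mean()                      # remove gravity/DC offset
            acov = np.dot(w[:-lag], w[lag:]) / (window - lag)
            score = acov / (np.var(w) + 1e-12)    # normalize by variance
            if np.greater(score, threshold):      # threshold is illustrative
                events.append(start)
        return events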
}
}</description>
<link>https://academictorrents.com/download/4bee3aad417a4670079da4daf129d5e4708f61d4</link>
</item>
<item>
<title>Efficient Accelerometer-based Event Detector in Wireless Sensor Networks (Paper)</title>
<description>@inproceedings{smidla_accelero_2013,
author = {J. Smidla and Gy. Simon},
title = {Efficient Accelerometer-based Event Detector in Wireless Sensor Networks},
booktitle = {2013 IEEE International Instrumentation and Measurement Technology Conference},
month = {May 6-9},
year = {2013},
pages = {732-736},
address= {Minneapolis, MN, USA},
        abstract={In this paper, an autocovariance-based event detector algorithm is proposed. The algorithm is able to detect events even if the measurements have a poor signal-to-noise ratio, and its performance is independent of the characteristics of the input signal. An efficient implementation of the algorithm is also proposed, which allows the utilization of the algorithm on low-end devices, e.g. in wireless sensor networking nodes. The performance of the algorithm has been tested and compared to a conventional filter-based approach, in a vehicle detector application.}
}
</description>
<link>https://academictorrents.com/download/b0fc43009de3d358bfbd8a14ba99ca320b356bc5</link>
</item>
<item>
<title>New approach for modeling of transiting exoplanets for arbitrary limb-darkening law (Paper)</title>
<description>@article{,
title = {New approach for modeling of transiting exoplanets for arbitrary limb-darkening law},
journal = {},
author = {D. Kjurkchieva and D. Dimitrov and A. Vladev and V. Yotov},
year = {},
url = {},
license = {},
abstract = {We present a new solution of the direct problem of planet transits based on transformation of double integrals to single ones. On the basis of our direct problem solution we created the code TAC-maker for rapid and interactive calculation of synthetic planet transits by numerical computations of the integrals. The validation of our approach was made by comparison with the results of the widespread Mandel &amp; Agol (2002) method for the cases of linear, quadratic and square-root limb-darkening laws and various combinations of model parameters. For the first time our approach allows the use of an arbitrary limb-darkening law for the host star. This advantage, together with the practically arbitrary precision of the calculations, makes the code a valuable tool that faces the challenges of the continuously increasing photometric precision of ground-based and space observations.}
}
</description>
<link>https://academictorrents.com/download/bbf1c32bd69459b93742eb691bf11fc8961e6db7</link>
</item>
<item>
<title>NLCD2006 Land Cover Change (NLCD2006_landcover_change_pixels_5-4-11_se5.zip) (Dataset)</title>
<description>@article{,
title= {NLCD2006 Land Cover Change (NLCD2006_landcover_change_pixels_5-4-11_se5.zip)},
journal= {},
author= {USDA},
year= {},
url= {},
abstract= {Land cover layer containing only those pixels identified as changed between NLCD2001 Land Cover Version 2.0 and NLCD2006 Land Cover products for the conterminous United States.},
keywords= {Dataset, nlcd, usgs},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/28a2fd1afbda8be43bec55b6c4c2c9cf1f5b9582</link>
</item>
<item>
<title>Viking Merged Color Mosaic (Dataset)</title>
<description>@article{,
title= {Viking Merged Color Mosaic},
journal= {},
author= {ASU },
year= {},
url= {},
abstract= {![](http://i.imgur.com/4VLmMSg.png)

|Attribute|Value|
|---------|-----|
|Resolution:|64ppd|
|Scale:|920mpp|
|Projection:|Simple cylindrical, -180E to 180E, 90N to -90N, 'ocentric|
|Layout:|Single file|
|Total Size:|23040x11520 pixels|
|Details: |Viking color mosaic sharpened with MDIM 1.0. 64ppd/920m. NASA Viking Orbiter.|

},
keywords= {Mars, ASU},
terms= {}
}

</description>
<link>https://academictorrents.com/download/059ed25558b4587143db637ac3ca94bebb57d88d</link>
</item>
<item>
<title>MOLA Shaded Relief / Colorized Elevation (Dataset)</title>
<description>@article{,
title= {MOLA Shaded Relief / Colorized Elevation},
journal= {},
author= {ASU },
year= {},
url= {http://www.mars.asu.edu/data/mola_color/},
abstract= {
|Attribute|Value|
|--------|--------|
|Resolution:|128 ppd|
|Scale:|463.1 mpp|
|Projection:|Simple cylindrical, -180E to 180E, 90N to -90N, 'ocentric|
|Layout:|30 x 30 degree tiles|
|Total Size:|46080 x 23040 pixels|
|Details:| Shaded relief derived from altimetry, colorized by elevation. 128 ppd/460m. NASA MGS/MOLA.|
|Source:| http://pds-geosciences.wustl.edu/missions/mgs/mola.html|
|Notes:|Data populated from 88N to 88S|

![](http://i.imgur.com/ew6ssh8.png)},
keywords= {Mars, ASU},
terms= {}
}

</description>
<link>https://academictorrents.com/download/06f73b5ca501194ba1cd3aa918bd801b84ea7050</link>
</item>
<item>
<title>THEMIS Day IR Global Mosaic (Dataset)</title>
<description>@article{,
title= {THEMIS Day IR Global Mosaic},
journal= {},
author= {ASU },
year= {},
url= {},
abstract= {
|Attribute|Value|
|---------|-----|
|Version:|2.0|
|Release Date:|November 16, 2006|
|Resolution:|256 ppd|
|Scale:|231.55 mpp|
|Projection:|Simple cylindrical, -180E to 180E, 90N to -90N, 'ocentric|
|Layout:|30 x 30 degree tiles|
|Total Size:|92160 x 46080 pixels|
|Details:|Daytime thermal infrared (12.57um) mosaic. 256 ppd/230m. NASA Mars Odyssey/THEMIS|
|Notes:|Gores are filled using PNG transparency.|},
keywords= {Mars, ASU},
terms= {}
}

</description>
<link>https://academictorrents.com/download/8b202f57d4bf3304c10fcd11bdee224c3a9ff16f</link>
</item>
<item>
<title>Arizona State University Twitter Data Set  (Dataset)</title>
<description>@article{,
title= {Arizona State University Twitter Data Set },
journal= {},
author= {R. Zafarani and H. Liu},
year= {2009},
institution= {Arizona State University, School of Computing, Informatics and Decision Systems Engineering},
url= {http://socialcomputing.asu.edu/datasets/Twitter},
abstract= {Twitter is a social news website. It can be viewed as a hybrid of email, instant messaging and sms messaging all rolled into one neat and simple package. It's a new and easy way to discover the latest news related to subjects you care about.

|Attribute|Value|
|-|-|
|Number of Nodes: |11316811|
|Number of Edges: |85331846|
|Missing Values? |no|
|Source:| N/A|

##Data Set Information:

1. nodes.csv
-- it's the file of all the users. This file works as a dictionary of all the users in this data set. It's useful for fast reference. It contains all the node ids used in the dataset.

2. edges.csv
-- this is the friendship/followership network among the users. The friends/followers are represented using edges. Edges are directed. 

Here is an example. 

1,2

This means user with id "1" is following user with id "2".
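
As a sketch, the follower network can be loaded into a directed adjacency list (plain id,id lines as in the example above):

    from collections import defaultdict

    follows = defaultdict(list)          # user id mapped to ids they follow
    with open("edges.csv") as f:
        for line in f:
            src, dst = line.strip().split(",")
            follows[src].append(dst)     # "1,2": user 1 follows user 2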


},
keywords= {ASU, Twitter, Social, Graph},
terms= {}
}

</description>
<link>https://academictorrents.com/download/2399616d26eeb4ae9ac3d05c7fdd98958299efa9</link>
</item>
<item>
<title>UCI Machine Learning Datasets 12/2013 (Dataset)</title>
<description>@article{,
title= {UCI Machine Learning Datasets 12/2013},
journal= {},
author= {UCI },
year= {2013},
url= {},
abstract= {The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and this project is in collaboration with Rexa.info at the University of Massachusetts Amherst. Funding support from the National Science Foundation is gratefully acknowledged.

Many people deserve thanks for making the repository a success. Foremost among them are the donors and creators of the databases and data generators. Special thanks should also go to the past librarians of the repository: David Aha, Patrick Murphy, Christopher Merz, Eamonn Keogh, Cathy Blake, Seth Hettich, and David Newman.},
keywords= {UCI},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/7fafb101f9c7961f9b840daeb4af43039107ddef</link>
</item>
<item>
<title>Visual Object Classes Challenge 2012 Dataset (VOC2012) VOCtrainval_11-May-2012.tar (Dataset)</title>
<description>@article{,
title= {Visual Object Classes Challenge 2012 Dataset (VOC2012) VOCtrainval_11-May-2012.tar},
journal= {},
author= {Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
year= {2012},
url= {http://host.robots.ox.ac.uk/pascal/VOC/voc2012/},
abstract= {##Introduction
The main goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:

* Person: person
* Animal: bird, cat, cow, dog, horse, sheep
* Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
* Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

There are three main object recognition competitions: classification, detection, and segmentation, a competition on action classification, and a competition on large scale recognition run by ImageNet. In addition there is a "taster" competition on person layout.

##Classification/Detection Competitions

Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image.
Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image.
 
20 classes
![](http://i.imgur.com/WmLRN4p.png)

* aeroplane
* bicycle
* bird
* boat
* bottle
* bus
* car
* cat
* chair
* cow
* dining table
* dog
* horse
* motorbike
* person
* potted plant
* sheep
* sofa
* train
* tv/monitor
 
Participants may enter either (or both) of these competitions, and can choose to tackle any (or all) of the twenty object classes. The challenge allows for two approaches to each of the competitions:

1. Participants may use systems built or trained using any methods or data excluding the provided test sets.
2. Systems are to be built or trained using only the provided training/validation data.

The intention in the first case is to establish just what level of success can currently be achieved on these problems and by what method; in the second case the intention is to establish which method is most successful given a specified training set.

##Segmentation Competition

Segmentation: Generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise.

![](https://i.imgur.com/ek0NbVK.png)
 
##Action Classification Competition

Action Classification: Predicting the action(s) being performed by a person in a still image.
 
![](https://i.imgur.com/w8tr9hs.png)

* jumping
* phoning
* playinginstrument
* reading
* ridingbike
* ridinghorse
* running
* takingphoto
* usingcomputer
* walking
 
In 2012 there are two variations of this competition, depending on how the person whose actions are to be classified is identified in a test image: (i) by a tight bounding box around the person; (ii) by only a single point located somewhere on the body. The latter competition aims to investigate the performance of methods given only approximate localization of a person, as might be the output from a generic person detector.

##ImageNet Large Scale Visual Recognition Competition

The goal of this competition is to estimate the content of photographs for the purpose of retrieval and automatic annotation using a subset of the large hand-labeled ImageNet dataset (10,000,000 labeled images depicting 10,000+ object categories) as training. Test images will be presented with no initial annotation - no segmentation or labels - and algorithms will have to produce labelings specifying what objects are present in the images. In this initial version of the challenge, the goal is only to identify the main objects present in images, not to specify the location of objects.

Further details can be found at the ImageNet website.

##Person Layout Taster Competition
Person Layout: Predicting the bounding box and label of each part of a person (head, hands, feet).
 
![](https://i.imgur.com/Hphaauf.png)

##Data

To download the training/validation data, see the development kit.

The training data provided consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the twenty classes present in the image. Note that multiple objects from multiple classes may be present in the same image. Annotation was performed according to a set of guidelines distributed to all annotators.
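
The development kit defines the exact format; assuming the standard VOC XML annotation layout (one Annotations/image_id.xml file per image, with object and bndbox elements), a minimal reader might look like:

    import xml.etree.ElementTree as ET

    def read_objects(xml_path):
        # Return (class name, xmin, ymin, xmax, ymax) per annotated object.
        root = ET.parse(xml_path).getroot()
        boxes = []
        for obj in root.iter("object"):
            name = obj.findtext("name")
            bb = obj.find("bndbox")
            coords = [int(float(bb.findtext(t)))
                      for t in ("xmin", "ymin", "xmax", "ymax")]
            boxes.append((name, *coords))
        return boxes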

A subset of images are also annotated with pixel-wise segmentation of each object present, to support the segmentation competition.

Images for the action classification task are disjoint from those of the classification/detection/segmentation tasks. They have been partially annotated with people, bounding boxes, reference points and their actions. Annotation was performed according to a set of guidelines distributed to all annotators.

Images for the person layout taster, where the test set is disjoint from the main tasks, have been additionally annotated with parts of the people (head/hands/feet).

The data will be made available in two stages; in the first stage, a development kit will be released consisting of training and validation data, plus evaluation software (written in MATLAB). One purpose of the validation set is to demonstrate how the evaluation software works ahead of the competition submission.

In the second stage, the test set will be made available for the actual competition. As in the VOC2008-2011 challenges, no ground truth for the test data will be released.

The data has been split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. Statistics of the database are online.
},
keywords= {VOC},
terms= {}
}
</description>
<link>https://academictorrents.com/download/df0aad374e63b3214ef9e92e178580ce27570e59</link>
</item>
<item>
<title>Crater Detection via Genetic Search Methods to Reduce Image Features (Paper)</title>
<description>@article{cohen2013crater,
title= {Crater Detection via Genetic Search Methods to Reduce Image Features},
author= {Joseph Paul Cohen and Wei Ding},
journal= {Advances in Space Research},
year= {2013},
publisher= {Elsevier},
abstract= {Recent approaches to crater detection have been inspired by face detection's use of gray-scale texture features. Using gray-scale texture features for supervised machine learning crater detection algorithms provides better classification of craters in planetary images than previous methods. When using Haar features it is typical to generate thousands of numerical values from each candidate crater image. This magnitude of image features to extract and consider can spell disaster when the application is an entire planetary surface. One solution is to reduce the number of features extracted and considered in order to increase accuracy as well as speed. Feature subset selection provides the operational classifiers with a concise and denoised set of features by reducing irrelevant and redundant features. Feature subset selection is known to be NP-hard. To provide an efficient suboptimal solution, four genetic algorithms are proposed to use greedy selection, weighted random selection, and simulated annealing to distinguish discriminate features from indiscriminate features. Inspired by analysis regarding the relationship between subset size and accuracy, a squeezing algorithm is presented to shrink the genetic algorithm's chromosome cardinality during the genetic iterations. A significant increase in the classification performance of a Bayesian classifier in crater detection using image texture features is observed.},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/8ae530c0c1466ba8feee9914236cc900ad2f708e</link>
</item>
<item>
<title>Bernoulli trials based feature selection for crater detection (Paper)</title>
<description>@inproceedings{liu2011bernoulli,
  title={Bernoulli trials based feature selection for crater detection},
  author={Liu, Siyi and Ding, Wei and Cohen, Joseph Paul and Simovici, Dan and Stepinski, Tomasz},
  booktitle={Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems},
  pages={461--464},
  year={2011},
  organization={ACM},
abstract = {Counting craters is a fundamental task of planetary science because it provides the only tool for measuring relative ages of planetary surfaces. However, advances in surveying craters present in data gathered by planetary probes have not kept up with advances in data collection. One challenge of auto-detecting craters in images is to identify an image's features that discriminate between craters and other surface objects. The problem of optimal feature selection is known to be NP-hard and the search is computationally intractable. In this paper we propose a wrapper-based randomized feature selection method to efficiently select relevant features for crater detection. We design and implement a dynamic programming algorithm to search for a relevant feature subset by removing irrelevant features and minimizing a cost objective function simultaneously. In order to only remove irrelevant features we use Bernoulli Trials to calculate the probability of such a case using the cost function. Our proposed algorithms are empirically evaluated on a large high-resolution Martian image exhibiting a heavily cratered Martian terrain characterized by heterogeneous surface morphology. The experimental results demonstrate that the proposed approach achieves a higher accuracy than other existing randomized approaches to a large extent with less runtime.}
}</description>
<link>https://academictorrents.com/download/37499de2b944dacc88cd295d3f9631670bd6abe6</link>
</item>
<item>
<title>Crater Dataset (Dataset)</title>
<description>@article{,
title= {Crater Dataset},
journal= {},
author= {UMass Boston KDLab},
year= {2013},
url= {http://kdl.cs.umb.edu/w/datasets/craters/},
abstract= {Dataset Objective:
Determine if the instance is a crater or not a crater. 1=Crater, 0=Not Crater

Data Set Information:
This dataset was generated using HRSC nadir panchromatic image h0905_0000 taken by the Mars Express spacecraft. The image is located in Xanthe Terra, centered on Nanedi Vallis, and covers mostly Noachian terrain on Mars. The image has a resolution of 12.5 meters/pixel.

Data Set Generation:

Using the technique described by L. Bandeira (Bandeira, Ding, Stepinski. 2010. Automatic Detection of Sub-km Craters Using Shape and Texture Information) we identify crater candidates in the image using the pipeline depicted in the figure below. Each crater candidate image block is normalized to a standard scale of 48 pixels. Each of the nine kinds of image masks probes the normalized image block at four different scales of 12 pixels, 24 pixels, 36 pixels, and 48 pixels, with a step of a third of the mask size (meaning 2/3 overlap). In total we extract 1,090 Haar-like attributes using nine types of masks as the attribute vectors to represent each crater candidate.
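
As an illustration of this probing scheme (mask responses omitted; not the authors' code), the window positions implied by the description can be enumerated directly:

    # Square probe windows over a 48x48 candidate block: scales 12/24/36/48,
    # step of one third of the mask size (2/3 overlap), as described above.
    BLOCK = 48
    windows = [(x, y, scale)
               for scale in (12, 24, 36, 48)
               for y in range(0, BLOCK - scale + 1, scale // 3)
               for x in range(0, BLOCK - scale + 1, scale // 3)]
    # 121 positions; with nine mask types this gives 9 * 121 = 1089
    # responses, essentially the 1,090 attributes quoted above.
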
The dataset was converted to the Weka ARFF format by Joseph Paul Cohen in 2012.},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/30748b1a7ac99b1c5ff66f0bc5c5f7428ed035c5</link>
</item>
<item>
<title>Effectiveness of Cybersecurity Competitions (Paper)</title>
<description>@article{cheungeffectiveness,
  title={Effectiveness of Cybersecurity Competitions},
  author={Cheung, Ronald S and Cohen, Joseph Paul and Lo, Henry Z and Elia, Fabio and Carrillo-Marquez, Veronica},
abstract = {There has been a heightened interest among U.S. government agencies to fund cybersecurity workforce development. These efforts include offering universities funding for student scholarships, funding for building capacity in cybersecurity education, as well as sponsoring cybersecurity competitions, games, and outreach programs. This paper examines the effectiveness of cybersecurity competitions in educating students. Our study shows that though competitions do pique students' interest, the effectiveness of this approach in producing more high-quality professionals can be limited. One reason is that the knowledge barrier to compete in these competitions is high. To be successful, students have to be proficient in operating systems, application services, software engineering, system administration and networking. Many Computer Science and Information Technology students do not feel qualified, and consequently this reduces participation from a wider student audience. Our approach takes aim at lowering this barrier to entry. We employ a hands-on learning methodology where students attend lectures on background knowledge on weekdays and practice what they learn in weekend workshops. A virtual networking environment is provided for students to practice network defense in the workshops and on their own time.}
}</description>
<link>https://academictorrents.com/download/30ec3bb79d95e4af3b92315a5a073fb10ec8a87d</link>
</item>
<item>
<title>Mars Weekend: A Panel and Games at the Museum of Science Boston (Paper)</title>
<description>@inproceedings{cohen2012mars,
  title={Mars Weekend: A Panel and Games at the Museum of Science Boston},
  author={Cohen, JP and Ding, W and Sable, J and Li, R and Stepinski, T},
  booktitle={Lunar and Planetary Institute Science Conference Abstracts},
  volume={43},
  pages={1023},
  year={2012},
abstract = {This ongoing outreach project uniquely combines the data, systems, and resources of four existing NASA funded research projects on Mars robotic navigation (MER Participating Scientist project and ExoMars PanCam project), intelligent Mars data processing (AISR Crater Detection project), and Lunar mapping (LRO Participating Scientist project). The project aims to stimulate the public excitement about Mars and Lunar science and exploration and to enrich the public with expertise developed at The Ohio State University (OSU), the University of Massachusetts Boston (UMB), and the Lunar and Planetary Institute (LPI) through our outreach partner, the Museum of Science, Boston.}
}</description>
<link>https://academictorrents.com/download/b0700675b5b7756ba6243420a9db09380a5d27b2</link>
</item>
<item>
<title>Genetically Enhanced Feature Selection of Discriminative Planetary Crater Image Features (Paper)</title>
<description>@inproceedings{Cohen:2011:GEF:2188812.2188820,
 author = {Cohen, Joseph Paul and Liu, Siyi and Ding, Wei},
 title = {Genetically Enhanced Feature Selection of Discriminative Planetary Crater Image Features},
 booktitle = {Proceedings of the 24th International Conference on Advances in Artificial Intelligence},
 series = {AI'11},
 year = {2011},
 isbn = {978-3-642-25831-2},
 location = {Perth, Australia},
 pages = {61--71},
 numpages = {11},
 url = {http://dx.doi.org/10.1007/978-3-642-25832-9_7},
 doi = {10.1007/978-3-642-25832-9_7},
 acmid = {2188820},
 publisher = {Springer-Verlag},
 address = {Berlin, Heidelberg},
 keywords = {bayesian classifier, crater detection, genetic algorithms, machine learning},
abstract = {Using gray-scale texture features has recently become a new trend in supervised machine learning crater detection algorithms. To provide better classification of craters in planetary images, feature subset selection is used to reduce irrelevant and redundant features. Feature selection is known to be NP-hard. To provide an efficient suboptimal solution, three genetic algorithms are proposed to use greedy selection, weighted random selection, and simulated annealing to distinguish discriminate features from indiscriminate features. A significant increase in the classification ability of a Bayesian classifier in crater detection using image texture features is observed.}
}</description>
<link>https://academictorrents.com/download/cb1655a57dd24345c9ea7a43c5ec09e03c7a0979</link>
</item>
</channel>
</rss>
