<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:academictorrents="http://academictorrents.com/" version="2.0">
<channel>
<title>Deep Learning - Academic Torrents</title>
<description>collection curated by joecohen</description>
<link>https://academictorrents.com/collection/deep-learning</link>
<item>
<title>COCO 2017 Resized to 256x256 (Dataset)</title>
<description>@article{,
title= {COCO 2017 Resized to 256x256},
keywords= {},
author= {},
abstract= {COCO: Common Objects in Context

Resized to 256x256},
terms= {},
license= {},
superseded= {},
url= {http://cocodataset.org/}
}

</description>
<link>https://academictorrents.com/download/eea5a532dd69de7ff93d5d9c579eac55a41cb700</link>
</item>
<item>
<title>MS-Celeb-1M: {A} Dataset and Benchmark for Large-Scale Face Recognition (Dataset)</title>
<description>@article{dblp:journals/corr/guozhhg16,
author= {Yandong Guo and Lei Zhang and Yuxiao Hu and Xiaodong He and Jianfeng Gao},
title= {MS-Celeb-1M: {A} Dataset and Benchmark for Large-Scale Face Recognition},
journal= {CoRR},
volume= {abs/1607.08221},
year= {2016},
url= {http://arxiv.org/abs/1607.08221},
archiveprefix= {arXiv},
eprint= {1607.08221},
timestamp= {Mon, 13 Aug 2018 16:46:27 +0200},
biburl= {https://dblp.org/rec/bib/journals/corr/GuoZHHG16},
bibsource= {dblp computer science bibliography, https://dblp.org},
abstract= {In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and linking them to corresponding entity keys in a knowledge base. More specifically, we propose a benchmark task to recognize one million celebrities from their face images, using all the face images of each individual that can be collected on the web as training data. The rich information provided by the knowledge base helps to conduct disambiguation and improve the recognition accuracy, and contributes to various real-world applications, such as image captioning and news video analysis. Associated with this task, we design and provide a concrete measurement set, an evaluation protocol, as well as training data. We also present our experiment setup in detail and report promising baseline results. Our benchmark task could lead to one of the largest classification problems in computer vision. To the best of our knowledge, our training dataset, which contains 10M images in version 1, is the largest publicly available one in the world.
},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/9e67eb7cc23c9417f39778a8e06cca5e26196a97</link>
</item>
<item>
<title>ImageNet Large Scale Visual Recognition Challenge (V2017) (Dataset)</title>
<description>@article{ilsvrc15,
author= {Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei},
title= {ImageNet Large Scale Visual Recognition Challenge (V2017)},
year= {2015},
journal= {International Journal of Computer Vision (IJCV)},
doi= {10.1007/s11263-015-0816-y},
volume= {115},
number= {3},
pages= {211-252},
abstract= {},
keywords= {ILSVRC2017, ILSVRC, ImageNet, MLPerf},
terms= {},
license= {},
superseded= {},
url= {}
}

</description>
<link>https://academictorrents.com/download/943977d8c96892d24237638335e481f3ccd54cfb</link>
</item>
<item>
<title>Electron Microscopy (CA1 hippocampus) Dataset (Dataset)</title>
<description>@article{,
title= {Electron Microscopy (CA1 hippocampus) Dataset},
keywords= {},
author= {},
abstract= {The dataset available for download on this webpage represents a 5x5x5µm section taken from the CA1 hippocampus region of the brain, corresponding to a 1065x2048x1536 volume. The resolution of each voxel is approximately 5x5x5nm. The data is provided as multipage TIF files that can be loaded in Fiji.

![](https://i.imgur.com/rTCKgHn.png)

![](https://i.imgur.com/DkDkaMH.gif)

We annotated mitochondria in two sub-volumes. Each sub-volume consists of the first 165 slices of the 1065x2048x1536 image stack. The volume used for training our algorithm in the publications mentioned at the bottom of this page is the top part, while the bottom part was used for testing.

Although our line of research was primarily motivated by the need to accurately segment mitochondria and synapses, other structures are of interest for neuroscientists such as vesicles or cell boundaries. This dataset was acquired by Graham Knott and Marco Cantoni at EPFL. It is made publicly available in the hope of encouraging similar sharing of useful data amongst researchers and also accelerating neuroscientific research.

For further information, please visit http://cvlab.epfl.ch/research/medical/em/mitochondria.

```
total 3.7G
124M testing_groundtruth.tif
124M testing.tif
124M training_groundtruth.tif
124M training.tif
3.2G volumedata.tif
```
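
The multipage TIFs can also be read outside Fiji; here is a minimal loading sketch, assuming the third-party tifffile Python package (not something the dataset itself ships):

```
# Minimal sketch: read a training sub-volume and its mitochondria labels.
# Assumes the third-party tifffile package (pip install tifffile); file
# names are those from the listing above.
import tifffile

volume = tifffile.imread("training.tif")              # (slices, height, width) stack
labels = tifffile.imread("training_groundtruth.tif")  # same shape, mitochondria mask

print(volume.shape, volume.dtype)
print(labels.shape, labels.dtype)
```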

### References

A. Lucchi, Y. Li and P. Fua, Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets, Conference on Computer Vision and Pattern Recognition, 2013.

A. Lucchi, K. Smith, R. Achanta, G. Knott, P. Fua, Supervoxel-Based Segmentation of Mitochondria in EM Image Stacks with Learned Shape Features, IEEE Transactions on Medical Imaging, Vol. 30, Nr. 11, October 2011.
},
terms= {},
license= {},
superseded= {},
url= {https://cvlab.epfl.ch/data/em}
}

</description>
<link>https://academictorrents.com/download/3ada3ae6ec71097e63d897cf878051bba3eaba25</link>
</item>
<item>
<title>Animals with Attributes 2 (AwA2) dataset (Dataset)</title>
<description>@article{,
title= {Animals with Attributes 2 (AwA2) dataset},
keywords= {},
author= {},
abstract= {This dataset provides a platform to benchmark transfer-learning algorithms, in particular attribute-based classification and zero-shot learning [1]. It can act as a drop-in replacement to the original Animals with Attributes (AwA) dataset [2,3], as it has the same class structure and almost the same characteristics.

It consists of 37322 images of 50 animals classes with pre-extracted feature representations for each image. The classes are aligned with Osherson's classical class/attribute matrix [3,4], thereby providing 85 numeric attribute values for each class. Using the shared attributes, it is possible to transfer information between different classes. 
The image data was collected from public sources, such as Flickr, in 2016. In the process we made sure to only include images that are licensed for free use and redistribution; please see the archive for the individual license files.
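
As a sketch of how the shared attributes enable transfer between classes, here is a minimal zero-shot scoring example in the spirit of direct attribute prediction; the arrays are random stand-ins, not files from the archive:

```
# Hedged sketch of attribute-based zero-shot scoring; the arrays below are
# random stand-ins for the real 50x85 class/attribute matrix and for a
# predictor's per-image attribute estimates.
import numpy as np

rng = np.random.default_rng(0)
class_attributes = rng.random((50, 85))  # one row of 85 attribute values per class
image_attributes = rng.random(85)        # attribute estimates for one test image

# Score every class by how well its attribute signature matches the image;
# the best-scoring class is the zero-shot prediction.
scores = class_attributes @ image_attributes
print("predicted class index:", int(np.argmax(scores)))
```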

![](https://cvml.ist.ac.at/AwA2/awa2_banner.jpg)


### Publications

Please cite the following paper when using the dataset:

[1] Y. Xian, C. H. Lampert, B. Schiele, Z. Akata. "Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly". arXiv:1707.00600 [cs.CV]

Attribute-based classification and the original Animals with Attributes (AwA) data are described in:

[2] C. H. Lampert, H. Nickisch, and S. Harmeling. "Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer". In CVPR, 2009.

[3] C. H. Lampert, H. Nickisch, and S. Harmeling. "Attribute-Based Classification for Zero-Shot Visual Object Categorization". IEEE T-PAMI, 2013.

The class/attribute matrix was originally created by:

[4] D. N. Osherson, J. Stern, O. Wilkie, M. Stob, and E. E. Smith. "Default probability". Cognitive Science, 15(2), 1991.

[5] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda. "Learning systems of concepts with an infinite relational model". In AAAI, 2006.},
terms= {},
license= {},
superseded= {},
url= {https://cvml.ist.ac.at/AwA2/}
}

</description>
<link>https://academictorrents.com/download/1490aec815141cdb50a32b81ef78b1eaf6b38b03</link>
</item>
<item>
<title>Small Object Dataset (Dataset)</title>
<description>@article{,
title= {Small Object Dataset},
keywords= {},
author= {Zheng Ma and Lei Yu and Antoni B. Chan},
abstract= {Images of small objects for small instance detection. Currently four object types are available.

![](http://visal.cs.cityu.edu.hk/wp/wp-content/uploads/smallobject.jpg)

We collect four datasets of small objects from images/videos on the Internet (e.g. YouTube or Google).

Fly Dataset: contains 600 video frames with an average of 86 ± 39 flies per frame (648×72 @ 30 fps). 32 images are used for training (1:6:187) and 50 images for testing (301:6:600).
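
The ranges above read as MATLAB-style start:step:stop frame indices; a quick sanity check of that reading (my interpretation, not a file from the dataset):

```
# MATLAB-style start:step:stop frame ranges, interpreted in Python.
train_frames = list(range(1, 187 + 1, 6))    # 1:6:187   -> frames 1, 7, ..., 187
test_frames = list(range(301, 600 + 1, 6))   # 301:6:600 -> frames 301, 307, ..., 595
print(len(train_frames), len(test_frames))   # 32 50, matching the counts above
```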

Honeybee Dataset: contains 118 images with an average of 28 ± 6 honeybees per image (640×480). The dataset is divided evenly for training and test sets. Only the first 32 images are used for training.

Fish Dataset: contains 387 frames of video with an average of 56 ± 9 fish per frame (300×410 @ 30 fps). 32 images are used for training (1:3:94) and 65 for testing (193:3:387).

Seagull Dataset: contains three high-resolution images (624×964) with an average of 866 ± 107 seagulls per image. The first image is used for training, and the rest for testing.

Cite this paper: http://visal.cs.cityu.edu.hk/static/pubs/conf/cvpr15-densdet.pdf
},
terms= {},
license= {},
superseded= {},
url= {http://visal.cs.cityu.edu.hk/downloads/smallobjects/}
}

</description>
<link>https://academictorrents.com/download/8e751c111cf90123374b5f0cf61e6af9f5e5231e</link>
</item>
<item>
<title>Downsampled ImageNet 32x32 (Dataset)</title>
<description>@article{,
title= {Downsampled ImageNet 32x32},
keywords= {},
author= {Aaron van den Oord and Nal Kalchbrenner and Koray Kavukcuoglu},
abstract= {This page includes downsampled ImageNet images, which can be used for density estimation and generative modeling experiments. Images come in two resolutions: 32x32 and 64x64, and were introduced in Pixel Recurrent Neural Networks. Please refer to the Pixel RNN paper for more details and results. 
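
A minimal loading sketch, assuming the batches use the CIFAR-style pickle layout commonly associated with this release (the file name and the 'data'/'labels' keys are assumptions, not stated on the page):

```
# Hedged sketch: unpickle one assumed CIFAR-style batch file.
import pickle
import numpy as np

with open("train_data_batch_1", "rb") as f:  # file name is an assumption
    batch = pickle.load(f)

# 'data' is assumed to be uint8 of shape (N, 3072) = (N, 3*32*32).
images = np.asarray(batch["data"], dtype=np.uint8)
images = images.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)  # -> (N, 32, 32, 3)
print(images.shape, len(batch["labels"]))
```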

![](https://i.imgur.com/s6gdDuX.jpg)},
terms= {},
license= {},
superseded= {},
url= {http://image-net.org/small/download.php}
}

</description>
<link>https://academictorrents.com/download/bf62f5051ef878b9c357e6221e879629a9b4b172</link>
</item>
<item>
<title>Downsampled ImageNet 64x64 (Dataset)</title>
<description>@article{,
title= {Downsampled ImageNet 64x64},
keywords= {},
author= {Aaron van den Oord and Nal Kalchbrenner and Koray Kavukcuoglu},
abstract= {This page includes downsampled ImageNet images, which can be used for density estimation and generative modeling experiments. Images come in two resolutions: 32x32 and 64x64, and were introduced in Pixel Recurrent Neural Networks. Please refer to the Pixel RNN paper for more details and results. 

![](https://i.imgur.com/s6gdDuX.jpg)},
terms= {},
license= {},
superseded= {},
url= {http://image-net.org/small/download.php}
}

</description>
<link>https://academictorrents.com/download/96816a530ee002254d29bf7a61c0c158d3dedc3b</link>
</item>
<item>
<title>MNIST Database (mnist.pkl.gz) (Dataset)</title>
<description>@article{,
title= {MNIST Database (mnist.pkl.gz)},
keywords= {mnist.pkl.gz},
journal= {},
author= {Christopher J.C. Burges and Yann LeCun and Corinna Cortes },
year= {},
url= {},
license= {},
abstract= {The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.

The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications.

The MNIST database was constructed from NIST's Special Database 3 and Special Database 1, which contain binary images of handwritten digits. NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. The reason for this can be found in the fact that SD-3 was collected among Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test set among the complete set of samples. Therefore it was necessary to build a new database by mixing NIST's datasets.

The MNIST training set is composed of 30,000 patterns from SD-3 and 30,000 patterns from SD-1. Our test set was composed of 5,000 patterns from SD-3 and 5,000 patterns from SD-1. The 60,000 pattern training set contained examples from approximately 250 writers. We made sure that the sets of writers of the training set and test set were disjoint.

SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 are available, and we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set. The remaining 250 writers were placed in our test set. Thus we had two sets with nearly 30,000 examples each. The new training set was completed with enough examples from SD-3, starting at pattern # 0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern # 35,000 to make a full set with 60,000 test patterns. Only a subset of 10,000 test images (5,000 from SD-1 and 5,000 from SD-3) is available on this site. The full 60,000 sample training set is available.

Many methods have been tested with this training set and test set. Here are a few examples. Details about the methods are given in an upcoming paper. Some of those experiments used a version of the database where the input images were deskewed (by computing the principal axis of the shape that is closest to the vertical, and shifting the lines so as to make it vertical). In some other experiments, the training set was augmented with artificially distorted versions of the original training samples. The distortions are random combinations of shifts, scaling, skewing, and compression.
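
mnist.pkl.gz is the pickled packaging popularized by the deeplearning.net tutorials, holding train/validation/test splits (the 60,000 training patterns are split 50,000/10,000 in this file). A minimal loading sketch; the latin1 encoding is needed when unpickling under Python 3:

```
# Minimal sketch: load the three (images, labels) splits from mnist.pkl.gz.
import gzip
import pickle

with gzip.open("mnist.pkl.gz", "rb") as f:
    train_set, valid_set, test_set = pickle.load(f, encoding="latin1")

train_x, train_y = train_set
print(train_x.shape, train_y.shape)  # expected: (50000, 784) (50000,)
```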


},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/323a0048d87ca79b68f12a6350a57776b6a3b7fb</link>
</item>
<item>
<title>MXNet pre-trained model Full ImageNet Network inception-21k.tar.gz (Dataset)</title>
<description>@article{,
title= {MXNet pre-trained model Full ImageNet Network inception-21k.tar.gz},
keywords= {},
journal= {},
author= {dmlc},
year= {},
url= {https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-21k-inception.md},
license= {},
abstract= {# Full ImageNet Network

This model is pretrained on the full ImageNet dataset [1], with 14,197,087 images in 21,841 classes. The model was trained with only random-crop and mirror augmentation.

The network is based on the Inception-BN network [2], with added capacity. It runs roughly 2 times slower than the standard Inception-BN network.

We trained this network on a machine with 4 GeForce GTX 980 GPUs. Each round took 23 hours; the released model is from the 9th round.

Train Top-1 Accuracy over 21,841 classes: 37.19%

Single image prediction memory requirement: 15MB

ILSVRC2012 Validation Performance:

|        | Over 1,000 classes | Over 21,841 classes |
| ------ | ------------------ | ------------------- |
| Top-1  | 68.3%              | 41.9%               |
| Top-5  | 89.0%              | 69.6%               |
| Top-20 | 96.0%              | 83.6%               |


Note: Directly using the full 21k prediction may lose diversity in the output. You may choose a subset of the 21k classes to make predictions more reasonable.

The compressed file contains:
- ```Inception-symbol.json```: symbolic network
- ```Inception-0009.params```: network parameter
- ```synset.txt```: prediction label/text mapping

There is no mean image file for this model. We use ```mean_r=117```, ```mean_g=117``` and ```mean_b=117``` to normalize the image.
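
A minimal loading sketch with the classic MXNet checkpoint API; the prefix and epoch follow the file names above, while the 224x224 input size is an assumption carried over from standard Inception-BN models:

```
# Hedged sketch: load the released checkpoint and bind it for prediction.
import mxnet as mx

# Prefix "Inception", epoch 9 match Inception-symbol.json / Inception-0009.params.
sym, arg_params, aux_params = mx.model.load_checkpoint("Inception", 9)

mod = mx.mod.Module(symbol=sym, label_names=None)
# The 224x224 input size is an assumption (standard for Inception-BN).
mod.bind(for_training=False, data_shapes=[("data", (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params, allow_missing=True)

# Before calling mod.forward, subtract the constant channel means given
# above (mean_r = mean_g = mean_b = 117) from the (1, 3, 224, 224) input.
```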

##### Reference:

[1] Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database." *Computer Vision and Pattern Recognition*, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009.

[2] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." *arXiv preprint arXiv:1502.03167* (2015).},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/27330fbd1ec0648e72b2cf5c40aa0d4df1931221</link>
</item>
<item>
<title>PASCAL Visual Object Classes Challenge 2012 (VOC2012) Complete Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL Visual Object Classes Challenge 2012 (VOC2012) Complete Dataset},
journal= {},
author= {Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
year= {},
url= {},
abstract= {Introduction

The main goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

Data

To download the training/validation data, see the development kit.

The training data provided consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the twenty classes present in the image. Note that multiple objects from multiple classes may be present in the same image. Annotation was performed according to a set of guidelines distributed to all annotators.
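
The annotation files use PASCAL's XML format; a minimal parsing sketch (the file path is illustrative):

```
# Minimal sketch: list the class label and bounding box of every object
# in one VOC annotation file (path illustrative).
import xml.etree.ElementTree as ET

root = ET.parse("Annotations/2012_000001.xml").getroot()
for obj in root.iter("object"):
    name = obj.findtext("name")
    box = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (int(float(box.findtext(tag)))
                              for tag in ("xmin", "ymin", "xmax", "ymax"))
    print(name, (xmin, ymin, xmax, ymax))
```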

A subset of images are also annotated with pixel-wise segmentation of each object present, to support the segmentation competition.

Images for the action classification task are disjoint from those of the classification/detection/segmentation tasks. They have been partially annotated with people, bounding boxes, reference points and their actions. Annotation was performed according to a set of guidelines distributed to all annotators.

Images for the person layout taster, where the test set is disjoint from the main tasks, have been additionally annotated with parts of the people (head/hands/feet).

The data will be made available in two stages; in the first stage, a development kit will be released consisting of training and validation data, plus evaluation software (written in MATLAB). One purpose of the validation set is to demonstrate how the evaluation software works ahead of the competition submission.

In the second stage, the test set will be made available for the actual competition. As in the VOC2008-2011 challenges, no ground truth for the test data will be released.

The data has been split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. Statistics of the database are online.},
keywords= {},
terms= {The VOC2012 data includes images obtained from the "flickr" website. Use of these images must respect the corresponding terms of use:

"flickr" terms of use
For the purposes of the challenge, the identity of the images in the database, e.g. source and name of owner, has been obscured. Details of the contributor of each image can be found in the annotation to be included in the final release of the data, after completion of the challenge. Any queries about the use or ownership of the data should be addressed to the organizers.},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/f6ddac36ac7ae2ef79dc72a26a065b803c9c7230</link>
</item>
<item>
<title>PASCAL Visual Object Classes Challenge 2011 (VOC2011) Complete Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL Visual Object Classes Challenge 2011 (VOC2011) Complete Dataset},
journal= {},
author= {Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
year= {},
url= {},
abstract= {Introduction

The goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor


Data

To download the training/validation data, see the development kit.

The training data provided consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the twenty classes present in the image. Note that multiple objects from multiple classes may be present in the same image. Some example images can be viewed online. A subset of images are also annotated with pixel-wise segmentation of each object present, to support the segmentation competition. Some segmentation examples can be viewed online.

Annotation was performed according to a set of guidelines distributed to all annotators.

The data will be made available in two stages; in the first stage, a development kit will be released consisting of training and validation data, plus evaluation software (written in MATLAB). One purpose of the validation set is to demonstrate how the evaluation software works ahead of the competition submission.

In the second stage, the test set will be made available for the actual competition. As in the VOC2008-2010 challenges, no ground truth for the test data will be released.

The data has been split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. In total there are 28,952 images. Further statistics are online.

},
keywords= {},
terms= {The VOC2011 data includes images obtained from the "flickr" website. Use of these images must respect the corresponding terms of use:

"flickr" terms of use
For the purposes of the challenge, the identity of the images in the database, e.g. source and name of owner, has been obscured. Details of the contributor of each image can be found in the annotation to be included in the final release of the data, after completion of the challenge. Any queries about the use or ownership of the data should be addressed to the organizers.}
}

</description>
<link>https://academictorrents.com/download/408e318ba27031a533c709b7d696e34637bcfc0e</link>
</item>
<item>
<title>PASCAL Visual Object Classes Challenge 2010 (VOC2010) Complete Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL Visual Object Classes Challenge 2010 (VOC2010) Complete Dataset},
journal= {},
author= {Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
year= {},
url= {},
abstract= {Introduction

The goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

Data

To download the training/validation data, see the development kit.

The training data provided consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the twenty classes present in the image. Note that multiple objects from multiple classes may be present in the same image. Some example images can be viewed online. A subset of images are also annotated with pixel-wise segmentation of each object present, to support the segmentation competition. Some segmentation examples can be viewed online.

Annotation was performed according to a set of guidelines distributed to all annotators.

The data will be made available in two stages; in the first stage, a development kit will be released consisting of training and validation data, plus evaluation software (written in MATLAB). One purpose of the validation set is to demonstrate how the evaluation software works ahead of the competition submission.

In the second stage, the test set will be made available for the actual competition. As in the VOC2008/VOC2009 challenges, no ground truth for the test data will be released.

The data has been split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. In total there are 21,738 images. Further statistics are online.

Best Practice

The VOC challenge encourages two types of participation: (i) methods which are trained using only the provided "trainval" (training + validation) data; (ii) methods built or trained using any data except the provided test data, for example commercial systems. In both cases the test data must be used strictly for reporting of results alone - it must not be used in any way to train or tune systems, for example by running multiple parameter choices and reporting the best results obtained.

If using the training data we provide as part of the challenge development kit, all development, e.g. feature selection and parameter tuning, must use the "trainval" (training + validation) set alone. One way is to divide the set into training and validation sets (as suggested in the development kit). Other schemes e.g. n-fold cross-validation are equally valid. The tuned algorithms should then be run only once on the test data.

In VOC2007 we made all annotations available (i.e. for training, validation and test data) but since then we have not made the test annotations available. Instead, results on the test data are submitted to an evaluation server.

Since algorithms should only be run once on the test data we strongly discourage multiple submissions to the server (and indeed the number of submissions for the same algorithm is strictly controlled), as the evaluation server should not be used for parameter tuning.

We encourage you to publish test results always on the latest release of the challenge, using the output of the evaluation server. If you wish to compare methods or design choices e.g. subsets of features, then there are two options: (i) use the entire VOC2007 data, where all annotations are available; (ii) report cross-validation results using the latest "trainval" set alone.

},
keywords= {},
terms= {The VOC2010 data includes images obtained from the "flickr" website. Use of these images must respect the corresponding terms of use:

"flickr" terms of use
For the purposes of the challenge, the identity of the images in the database, e.g. source and name of owner, has been obscured. Details of the contributor of each image can be found in the annotation to be included in the final release of the data, after completion of the challenge. Any queries about the use or ownership of the data should be addressed to the organizers.}
}

</description>
<link>https://academictorrents.com/download/96db21675f464480780637f1416477ac14a81107</link>
</item>
<item>
<title>PASCAL Visual Object Classes Challenge 2009 (VOC2009) Complete Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL Visual Object Classes Challenge 2009 (VOC2009) Complete Dataset},
journal= {},
author= {Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
year= {2009},
url= {http://host.robots.ox.ac.uk/pascal/VOC/voc2009/index.html},
abstract= {Introduction

The goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor


Data

To download the training/validation data, see the development kit.

The training data provided consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the twenty classes present in the image. Note that multiple objects from multiple classes may be present in the same image. Some example images can be viewed online. A subset of images are also annotated with pixel-wise segmentation of each object present, to support the segmentation competition. Some segmentation examples can be viewed online.

Annotation was performed according to a set of guidelines distributed to all annotators.

The data will be made available in two stages; in the first stage, a development kit will be released consisting of training and validation data, plus evaluation software (written in MATLAB). One purpose of the validation set is to demonstrate how the evaluation software works ahead of the competition submission.

In the second stage, the test set will be made available for the actual competition. As in the VOC2008 challenge, no ground truth for the test data will be released.

The data has been split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. In total there are 14,743 images. Further statistics are online.

},
keywords= {},
terms= {The VOC2009 data includes images obtained from the "flickr" website. Use of these images must respect the corresponding terms of use:

"flickr" terms of use
For the purposes of the challenge, the identity of the images in the database, e.g. source and name of owner, has been obscured. Details of the contributor of each image can be found in the annotation to be included in the final release of the data, after completion of the challenge. Any queries about the use or ownership of the data should be addressed to the organizers.}
}
</description>
<link>https://academictorrents.com/download/e2209d95a13d364aad0811eacbf391a10c37d963</link>
</item>
<item>
<title>PASCAL Visual Object Classes Challenge 2008 (VOC2008) Complete Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL Visual Object Classes Challenge 2008 (VOC2008) Complete Dataset},
journal= {},
author= {Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
year= {2008},
url= {http://host.robots.ox.ac.uk/pascal/VOC//voc2008/index.html},
abstract= {Data

To download the training/validation data, see the development kit. In total there are 10,057 images.

The training data provided consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the twenty classes present in the image. Note that multiple objects from multiple classes may be present in the same image. Some example images can be viewed online.

Annotation was performed according to a set of guidelines distributed to all annotators.

The data will be made available in two stages; in the first stage, a development kit will be released consisting of training and validation data, plus evaluation software (written in MATLAB). One purpose of the validation set is to demonstrate how the evaluation software works ahead of the competition submission.

In the second stage, the test set will be made available for the actual competition. As in the VOC2007 challenge, no ground truth for the test data will be released until after the challenge is complete.

The data has been split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. In total there are 10,057 images. Further statistics are online - statistics for the test data will be released after the challenge.

Development Kit

The development kit consists of the training/validation data, MATLAB code for reading the annotation data, support files, and example implementations for each competition.

Download the training/validation data (550MB tar file) - includes patch of 14-Jul-2008
Download the development kit code and documentation (250KB tar file)

Patch 14-Jul-08

There were errors in the 14-Apr-2008 release of the training/validation data as follows:

image labels in x_train/x_trainval.txt (classification task) did not include the "don't care" (zero) label
the test set for the main challenge (classification/detection) included images used for the layout challenge - these will be ignored in the evaluation
some images contained only "difficult" objects - these will be ignored in the evaluation (classification/detection)
The errors will not affect evaluation, but participants wanting to take advantage of the "don't care" label (without having to compute it themselves) should download the patch, which contains updated image lists, and can be untarred over the original development kit.

Running on VOC2007 test data

If at all possible, participants are requested to submit results for both the VOC2008 and VOC2007 test sets provided in the test data, to allow comparison of results across the years. In both cases, the VOC2008 training/validation data should be used for training i.e.

Train on VOC2008 train+val, test on VOC2008 test.
Train on VOC2008 train+val, test on VOC2007 test.
The updated development kit provides a switch to select between test sets. Results are placed in two directories, results/VOC2007/ or results/VOC2008/ according to the test set.

Publication Policy

The main mechanism for dissemination of the results will be the challenge webpage.

For VOC2008, the detailed output of each submitted method will be published online e.g. per-image confidence for the classification task, and bounding boxes for the detection task. The intention is to assist others in the community in carrying out detailed analysis and comparison with their own methods. The published results will not be anonymous - by submitting results, participants are agreeing to have their results shared online.

Acknowledgements

We gratefully acknowledge the following, who spent many long hours providing annotation for the VOC2008 database: Jan-Hendrik Becker, Patrick Buehler, Kian Ming Chai, Miha Drenik, Chris Engels, Jan Van Gemert, Hedi Harzallah, Nicolas Heess, Zdenek Kalal, Lubor Ladicky, Marcin Marszalek, Alastair Moore, Maria-Elena Nilsback, Paul Sturgess, David Tingdahl, Hirofumi Uemura, Martin Vogt.

Support

The preparation and running of this challenge is supported by the EU-funded PASCAL Network of Excellence on Pattern Analysis, Statistical Modelling and Computational Learning.},
keywords= {},
terms= {The VOC2008 data includes images obtained from the "flickr" website. Use of these images must respect the corresponding terms of use:

"flickr" terms of use
For the purposes of the challenge, the identity of the images in the database, e.g. source and name of owner, has been obscured. Details of the contributor of each image can be found in the annotation to be included in the final release of the data, after completion of the challenge. Any queries about the use or ownership of the data should be addressed to the organizers.}
}

</description>
<link>https://academictorrents.com/download/577c99c831a03753c38764201123cbc5e9e3c03b</link>
</item>
<item>
<title>PASCAL Visual Object Classes Challenge 2006 (VOC2006) Complete Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL Visual Object Classes Challenge 2006 (VOC2006) Complete Dataset},
journal= {},
author= {Mark Everingham},
year= {},
url= {},
abstract= {Details of the contributor of each image can be found in the file "contrib.txt" included in the database.

Categories: Views of bicycles, buses, cats, cars, cows, dogs, horses, motorbikes, people, sheep in arbitrary pose.

Number of images: 5,304

Number of annotated images: 5,304

Object annotation statistics: Total number of labelled objects = 9,507

Annotation notes: These images were collected from personal photographs, "flickr", and the Microsoft Research Cambridge database for the 2006 VOC challenge. All images are annotated with instances of all ten categories: bicycles, buses, cats, cars, cows, dogs, horses, motorbikes, people, sheep.

Acknowledgements: Funding was provided by PASCAL.
Images were contributed and/or annotated by Moray Allen, James Bednar, Matthijs Douze, Mark Everingham, Stefan Harmeling, Juan Huo, Lindsay Hutchison, Fiona Jamieson, Maria-Elena Nilsback, John Quinn, Florian Schroff, Kira Smyllie, Mark Van Rossum, Chris Williams, John Winn, Andrew Zisserman.

Publications
M. Everingham, A. Zisserman, C. K. I. Williams, L. Van Gool. The 2006 PASCAL Visual Object Classes Challenge (VOC2006) Results.},
keywords= {},
terms= {By downloading the test data you are agreeing to abide by the licenses for the "flickr" and Microsoft Research Cambridge images contained in the database}
}

</description>
<link>https://academictorrents.com/download/db06b76152c0bf475af4093538e5a8d0e7971273</link>
</item>
<item>
<title>PASCAL Visual Object Classes Challenge 2005 (VOC2005) Complete Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL Visual Object Classes Challenge 2005 (VOC2005) Complete Dataset},
journal= {},
author= {Mark Everingham },
year= {},
url= {http://host.robots.ox.ac.uk/pascal/VOC//databases.html},
abstract= {Categories: Views of motorbikes, bicycles, people, and cars in arbitrary pose.

Number of images: 1578

Number of annotated images: 1578

Object annotation statistics: Total number of labelled objects = 2209

Annotation notes: The images in this database are a subset of the other image databases on this page. The images were manually selected as an "easier" dataset for the 2005 VOC challenge. Annotations were taken verbatim from the source databases.

Acknowledgements: Images in this database were taken from the TU-Darmstadt, Caltech, TU-Graz and UIUC databases. Additional images were provided by INRIA. Funding was provided by PASCAL.

Publications: M. Everingham, A. Zisserman, C. K. I. Williams, L. Van Gool, et al. The 2005 PASCAL Visual Object Classes Challenge. In Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, eds. J. Quinonero-Candela, I. Dagan, B. Magnini, and F. d'Alche-Buc, LNAI 3944, pages 117-176, Springer-Verlag, 2006. },
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/f758e9f976e3742b1349bf4b42e985b6ce1299ce</link>
</item>
<item>
<title>PASCAL Visual Object Classes Challenge 2007 (VOC2007) Complete Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL Visual Object Classes Challenge 2007 (VOC2007) Complete Dataset},
keywords= {VOC},
journal= {},
author= {Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
year= {},
url= {http://host.robots.ox.ac.uk/pascal/VOC/voc2007/},
license= {},
abstract= {Introduction

The goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
There will be two main competitions, and two smaller scale "taster" competitions.

Main Competitions

Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image.
Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image.
 
20 classes: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, tv/monitor
 
Participants may enter either (or both) of these competitions, and can choose to tackle any (or all) of the twenty object classes. The challenge allows for two approaches to each of the competitions:

Participants may use systems built or trained using any methods or data excluding the provided test sets.
Systems are to be built or trained using only the provided training/validation data.
The intention in the first case is to establish just what level of success can currently be achieved on these problems and by what method; in the second case the intention is to establish which method is most successful given a specified training set.

Taster Competitions

Segmentation: Generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise.

Person Layout: Predicting the bounding box and label of each part of a person (head, hands, feet).

Participants may enter either (or both) of these competitions.

The VOC2007 challenge has been organized following the successful VOC2006 and VOC2005 challenges. Compared to VOC2006 we have increased the number of classes from 10 to 20, and added the taster challenges. These tasters have been introduced to sample the interest in segmentation and layout.

Data

The training data provided consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the twenty classes present in the image. Note that multiple objects from multiple classes may be present in the same image. Some example images can be viewed online.

Annotation was performed according to a set of guidelines distributed to all annotators.

The data will be made available in two stages; in the first stage, a development kit will be released consisting of training and validation data, plus evaluation software (written in MATLAB). One purpose of the validation set is to demonstrate how the evaluation software works ahead of the competition submission.

In the second stage, the test set will be made available for the actual competition. As in the VOC2006 challenge, no ground truth for the test data will be released until after the challenge is complete.

The data has been split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. In total there are 9,963 images, containing 24,640 annotated objects.},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/c9db37df1eb2e549220dc19f70f60f7786d067d4</link>
</item>
<item>
<title>Tiny Images Dataset (Dataset)</title>
<description>@article{,
title= {Tiny Images Dataset},
journal= {},
author= {Rob Fergus and Antonio Torralba and William T. Freeman},
year= {},
url= {http://horatio.cs.nyu.edu/mit/tiny/data/index.html},
abstract= {![](https://i.imgur.com/gWxLPJm.jpg)

## Overview

This page has links for downloading the Tiny Images dataset, which consists of 79,302,017 images, each being a 32x32 color image. This data is stored in the form of large binary files which can be accessed by a Matlab toolbox that we have written. You will need around 400Gb of free disk space to store all the files. In total there are 5 files that need to be downloaded, 3 of which are large binary files consisting of (i) the images themselves; (ii) their associated metadata (filename, search engine used, ranking etc.); (iii) Gist descriptors for each image. The other two files are the Matlab toolbox and index data file that together let you easily load in data from the binaries. 


## Downloads

Note that these files are very large and will take a considerable time to download. Please ensure you have sufficient disk space before commencing the download. 

  1. Image binary (227Gb)

  2. Metadata binary (57Gb) 

  3. Gist binary (114Gb)

  4. Index data (7Mb) 

  5. Matlab Tiny Images toolbox (150Kb) 


## Data Format

The 79 million images are stored in one giant binary file, 227Gb in size. The metadata accompanying each image is also in a single giant file, 57Gb in size. To read images/metadata from these files, we have provided some Matlab wrapper functions.

There are two versions of the functions for reading image data: 
* (i) loadTinyImages.m - plain Matlab function (no MEX), runs under 32/64bits. Loads images in by image number. Use this by default. 
* (ii) read_tiny_big_binary.m - Matlab wrapper for 64-bit MEX function. A bit faster and more flexible than (i), but requires a 64-bit machine. 

There are two types of annotation data: 
* (i) Manual annotation data, stored in annotations.txt, that holds the label of images manually inspected to see if the image content agrees with the noun used to collect it. Some other information, such as search engine, is also stored. This data is available for only a very small portion of images.
* (ii) Automatic annotation data, stored in tiny_metadata.bin, consisting of information relating the gathering of the image, e.g. search engine, which page, url to thumbnail etc. This data is available for all 79 million images. 

## Requirements

1. Around 300Gb of disk space.

2. If you want to use the MEX versions of the code for reading in the data, you will need a 64-bit machine. But for most purposes, the Matlab implementation (loadTinyImages.m), which can use either 32 or 64 bits, will work perfectly well. To discover if you have a 32/64-bit machine, type 'uname -a' in an xterm (if using linux).

## Files

The .tgz file should contain 12 files

1. loadTinyImages.m -- read tiny image data, pure Matlab version.
2. loadGroundTruth.m -- read annotations.txt file holding manual annotations
3. read_tiny_big_binary.m -- read tiny image data, 64-bit Matlab/MEX version
4. read_tiny_big_metadata.m -- read tiny image metadata, 64-bit Matlab/MEX version
5. read_tiny_gist_binary.m -- read tiny Gist, 64-bit Matlab/MEX version
6. read_tiny_binary_big_core.c -- 64-bit MEX source code for image reading
7. read_tiny_metadata_big_core.c -- 64-bit MEX source code for metadata reading
8. read_tiny_binary_gist_core.c -- 64-bit MEX source code for gist reading
9. compute_hash_function.m -- utility function to do fast string searching as used by read_tiny_big_binary.m and read_tiny_big_metadata.m
10. fast_str2num.m -- utility function for read_tiny_big_metadata.m
11. annotations.txt -- text file holding list of annotated images
12. README.txt -- this file

Separately, you should have downloaded the following files

1. tiny_images.bin - 227Gb file holding 79,302,017 images
2. tiny_metadata.bin - 57Gb file holding metadata for all 79,302,017 images
3. tinygist80million.bin - 114Gb file holding 384-dim Gist descriptors for all 79,302,017 images
4. tiny_index.mat - 7Mb file holding index info, including:
        word - cell array of all 75,846 nouns for which we have images in tiny_images.bin
        num_imgs - vector with #images per noun for all 75,846 nouns 
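
Outside Matlab, the fixed record size also allows direct access: 227Gb divided by 79,302,017 images works out to exactly 3072 bytes, i.e. one 32x32x3 uint8 image per record. A hedged Python sketch; the column-major layout within a record is an assumption, so treat the Matlab readers as authoritative:

```
# Hedged sketch: read one image from tiny_images.bin by fixed-size record.
import numpy as np

RECORD = 32 * 32 * 3  # 3072 bytes per image

def load_tiny_image(index, path="tiny_images.bin"):
    """Read image `index` (0-based) by seeking to its record."""
    with open(path, "rb") as f:
        f.seek(index * RECORD)
        raw = np.frombuffer(f.read(RECORD), dtype=np.uint8)
    # Fortran (column-major) order mirrors Matlab storage; this reshape
    # is an assumption, not verified against the toolbox.
    return raw.reshape((32, 32, 3), order="F")

print(load_tiny_image(0).shape)  # (32, 32, 3)
```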

## Preliminaries

Before the functions can be used you must do two things:

1. Set the absolute paths to the binary files in the Matlab functions. There are a total of 7 lines that must be set:

* (i) loadTinyImages.m, line 14 -- set path to tiny_images.bin file
* (ii) read_tiny_big_binary.m, line 40 -- set path to tiny_images.bin file
* (iii) read_tiny_big_binary.m, line 42 -- set path to tiny_index.mat file
* (iv) read_tiny_big_metadata.m, line 63 -- set path to tiny_metadata.bin file
* (v) read_tiny_big_metadata.m, line 65 -- set path to tiny_index.mat file
* (vi) read_tiny_gist_binary.m, line 36 -- set path to tiny_index.mat file
* (vii) read_tiny_gist_binary.m, line 38 -- set path to tiny_metadata.bin file

2. If using the MEX versions, they must be compiled with the commands:
* (i) mex read_tiny_binary_big_core.c
* (ii) mex read_tiny_metadata_big_core.c
* (iii) mex read_tiny_binary_gist_core.c

## Usage

Here are some examples of the scripts in use. Please look at the comments at the top of each file for more extensive explanations.

## loadTinyImages.m

```
% load in first 10 images from 79,302,017 images
img = loadTinyImages([1:10]);

% load in 10 images at random
q = randperm(79302017);
img = loadTinyImages(q(1:10));
% N.B. the function does NOT sort indices, so sorting beforehand
% would improve speed.
```


## loadGroundTruth.m

```
% read in contents of annotations.txt file
[imageFileName, keyword, correct, engine, ind_engine, image_ndx] = loadGroundTruth;
% the labeling convention in correct is:
%   -1 = Incorrect, 0 = Skipped, 1 = Correct
% Note that this is different to the 'label' field produced by
% read_tiny_big_metadata below (the meanings of -1 and 0 are swapped),
% but the annotations.txt information should be used in preference to
% that from read_tiny_big_metadata.m
```


## read_tiny_big_metadata.m

```
% load in filenames of first 10 images
data = read_tiny_big_metadata([1:10],{'filename'});

% load in search engine used for the
% first 10 images from noun 'aardvark'
data = read_tiny_big_metadata('aardvark',[1:10],{'engine'});
```

## read_tiny_big_binary.m

```
% load in first 10 images from 79,302,017 images
img = read_tiny_big_binary([1:10]);
% note output dimension is 3072x10, rather than 32x32x3x10
% as for loadTinyImages.m

% load in 10 random images from noun 'dog'
q = randperm(79302017);
img = read_tiny_big_binary('dog',q(1:10));
% function sorts indices internally for speed

% load in images for different nouns
img = read_tiny_big_binary({'dog','cat','mouse','pig'},{[1:5],[1:2:10],[8 13],[4:-1:1]});
```
},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/03b779ffefa8efc30c2153f3330bb495bdc3e034</link>
</item>
<item>
<title>Labeled Faces in the Wild (Dataset)</title>
<description>@article{,
title= {Labeled Faces in the Wild},
journal= {},
author= {Gary B. Huang and Manu Ramesh and Tamara Berg and Erik Learned-Miller},
year= {2007},
url= {http://vis-www.cs.umass.edu/lfw/},
abstract= {Welcome to Labeled Faces in the Wild, a database of face photographs designed for studying the problem of unconstrained face recognition. The data set contains more than 13,000 images of faces collected from the web. Each face has been labeled with the name of the person pictured. 1680 of the people pictured have two or more distinct photos in the data set. The only constraint on these faces is that they were detected by the Viola-Jones face detector. More details can be found in the technical report below.

Information:
13233 images
5749 people
1680 people with two or more images

Citation:
Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments
Gary B. Huang and Manu Ramesh and Tamara Berg and Erik Learned-Miller
University of Massachusetts, Amherst - 2007

},
keywords= {Dataset, umass, lfw, faces, Amherst, massachusetts},
terms= {}
}

</description>
<link>https://academictorrents.com/download/9547ef95bc7007685afe52a8ec940aa61530bc99</link>
</item>
<item>
<title>PASCAL-S - The Secrets of Salient Object Segmentation Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL-S - The Secrets of Salient Object Segmentation Dataset},
journal= {},
author= {UCLA CCVL},
year= {},
url= {http://cbi.gatech.edu/salobj/},
abstract= {Free-viewing fixations on a subset of 850 images from PASCAL VOC. Collected on 8 subjects, 3s viewing time, Eyelink II eye tracker. The performance of most algorithms suggests that PASCAL-S is less biased than most of the saliency datasets.

850 IMAGES FROM PASCAL 2010
1296 OBJECT INSTANCES
12 SUBJECTS

```
Folders in archive:
algmaps/
algmaps/pascal
algmaps/pascal/mcg_gbvs
algmaps/pascal/humanFix
algmaps/pascal/gc
algmaps/pascal/dva
algmaps/pascal/ft
algmaps/pascal/sig
algmaps/pascal/aim
algmaps/pascal/pcas
algmaps/pascal/gbvs
algmaps/pascal/sun
algmaps/pascal/aws
algmaps/pascal/sf
algmaps/pascal/itti
algmaps/bruce
algmaps/bruce/dva
algmaps/bruce/sig
algmaps/bruce/aim
algmaps/bruce/gbvs
algmaps/bruce/sun
algmaps/bruce/aws
algmaps/bruce/itti
algmaps/cerf
algmaps/cerf/dva
algmaps/cerf/sig
algmaps/cerf/aim
algmaps/cerf/gbvs
algmaps/cerf/sun
algmaps/cerf/aws
algmaps/cerf/itti
algmaps/imgsal
algmaps/imgsal/humanFix
algmaps/imgsal/gc
algmaps/imgsal/cpmc_gbvs
algmaps/imgsal/dva
algmaps/imgsal/ft
algmaps/imgsal/sig
algmaps/imgsal/aim
algmaps/imgsal/pcas
algmaps/imgsal/gbvs
algmaps/imgsal/sun
algmaps/imgsal/aws
algmaps/imgsal/sf
algmaps/imgsal/itti
algmaps/ft
algmaps/ft/gc
algmaps/ft/cpmc_gbvs
algmaps/ft/dva
algmaps/ft/ft
algmaps/ft/sig
algmaps/ft/aim
algmaps/ft/pcas
algmaps/ft/gbvs
algmaps/ft/sun
algmaps/ft/aws
algmaps/ft/sf
algmaps/ft/itti
algmaps/judd
algmaps/judd/dva
algmaps/judd/sig
algmaps/judd/aim
algmaps/judd/gbvs
algmaps/judd/sun
algmaps/judd/aws
algmaps/judd/itti

```},
keywords= {PASCAL},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/6c49defd6f0e417c039637475cde638d1363037e</link>
</item>
<item>
<title>PASCAL-Context Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL-Context Dataset},
journal= {},
author= {UCLA CCVL},
year= {},
url= {},
abstract= {This dataset is a set of additional annotations for PASCAL VOC 2010. It goes beyond the original PASCAL semantic segmentation task by providing annotations for the whole scene. The statistics section has a full list of 400+ labels. Every pixel has a unique class label. Instance information (i.e., different masks to separate different instances of the same class in the same image) is currently provided for the 20 PASCAL objects.

Statistics
Since the dataset is an annotation of PASCAL VOC 2010, it has the same statistics as those of the original dataset. Training and validation contain 10,103 images, while testing contains 9,637 images.

Usage Considerations
The classes are not drawn from a fixed pool. Instead labelers were free to either select or type in what they believe to be the appropriate class and to determine what the appropriate object granularity is. We decided to merge/split some of the categories so the current number of categories is different from what we mentioned in the CVPR 2014 paper.

When using this dataset it is important that you examine classes to ensure they match your intended use. For example, sand is often labeled independently despite also being considered ground. Those interested in ground may want to cluster sand and ground together along with other classes.

Citation
The Role of Context for Object Detection and Semantic Segmentation in the Wild
Roozbeh Mottaghi, Xianjie Chen, Xiaobai Liu, Nam-Gyu Cho, Seong-Whan Lee, Sanja Fidler, Raquel Urtasun, Alan Yuille
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014

Acknowledgements
We would like to acknowledge the support by Implementation of Technologies for Identification, Behavior, and Location of Human based on Sensor Network Fusion Program through the Korean Ministry of Trade, Industry and Energy (Grant Number: 10041629). We would also like to thank National Science Foundation for grant 1317376 (Visual Cortex on Silicon. NSF Expedition in Computing). We thank Viet Nguyen for coordinating and leading the efforts for cleaning up the annotations.},
keywords= {PASCAL},
terms= {}
}

</description>
<link>https://academictorrents.com/download/eec6177ad62f4c47086e4cbec93ac4c08857ddbe</link>
</item>
<item>
<title>PASCAL-Part Dataset (Dataset)</title>
<description>@article{,
title= {PASCAL-Part Dataset},
journal= {},
author= {UCLA CCVL},
year= {},
url= {},
abstract= {This dataset is a set of additional annotations for PASCAL VOC 2010. It goes beyond the original PASCAL object detection task by providing segmentation masks for each body part of the object. For categories that do not have a consistent set of parts (e.g., boat), we provide the silhouette annotation.

Statistics
Since the dataset is an annotation of the PASCAL VOC 2010, it has the same statistics as the original dataset. Training and validation contain 10,103 images, while testing contains 9,637 images.

Usage Considerations
We provide segmentation masks for detailed body parts. One can merge several parts to get an appropriate object-part granularity for different tasks. For instance, "eyes", "ears", "nose", etc. can be merged into a single "head" part, as sketched below.
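
A minimal sketch of such a merge as a union over boolean part masks; the part-name strings here are illustrative, so check the released annotations for the exact names:

```python
import numpy as np

HEAD_PARTS = ("head", "leye", "reye", "lear", "rear", "nose")

def merge_head(part_masks):
    # part_masks: list of (name, boolean (H, W) mask) pairs for one instance.
    selected = [mask for name, mask in part_masks if name in HEAD_PARTS]
    return np.logical_or.reduce(selected) if selected else None
```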

Citation
Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts
Xianjie Chen, Roozbeh Mottaghi, Xiaobai Liu, Sanja Fidler, Raquel Urtasun, Alan Yuille
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014

Acknowledgements
We thank Viet Nguyen for coordinating and leading the efforts for cleaning up the annotations. We would like to acknowledge the support by grants ARO 62250-CS and N00014-12-1-0883.
},
keywords= {PASCAL},
terms= {}
}

</description>
<link>https://academictorrents.com/download/f86670296bff85bcdffea6c4fc2e791446f9fb5e</link>
</item>
<item>
<title>Columbia University Image Library (COIL-20) (Dataset)</title>
<description>@article{,
title= {Columbia University Image Library (COIL-20)},
journal= {},
author= {S. A. Nene and S. K. Nayar and H. Murase},
year= {1996},
url= {http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php},
abstract= {The database is available in two versions. The first, [unprocessed], consists of images for five of the objects that contain both the object and the background. The second, [processed], contains images for all of the objects in which the background has been discarded (and the images consist of the smallest square that contains the object). For formal documentation, see the corresponding compressed technical report

"Columbia Object Image Library (COIL-20),"
S. A. Nene, S. K. Nayar and H. Murase,
Technical Report CUCS-005-96, February 1996.},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/1d16994c70b7fff8bfe917f83c397b1193daee7f</link>
</item>
<item>
<title>Stanford STL-10 Image Dataset (Dataset)</title>
<description>@article{,
title= {Stanford STL-10 Image Dataset},
journal= {},
author= {Adam Coates and Honglak Lee and Andrew Y. Ng},
year= {},
url= {https://cs.stanford.edu/~acoates/stl10/},
abstract= {![](https://cs.stanford.edu/~acoates/stl10/images.png)

The STL-10 dataset is an image recognition dataset for developing unsupervised feature learning, deep learning, self-taught learning algorithms. It is inspired by the CIFAR-10 dataset but with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided to learn image models prior to supervised training. The primary challenge is to make use of the unlabeled data (which comes from a similar but different distribution from the labeled data) to build a useful prior. We also expect that the higher resolution of this dataset (96x96) will make it a challenging benchmark for developing more scalable unsupervised learning methods.

Overview

10 classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck.
Images are 96x96 pixels, color.
500 training images (10 pre-defined folds), 800 test images per class.
100000 unlabeled images for unsupervised learning. These examples are extracted from a similar but broader distribution of images. For instance, it contains other types of animals (bears, rabbits, etc.) and vehicles (trains, buses, etc.) in addition to the ones in the labeled set.
Images were acquired from labeled examples on ImageNet.

Testing Protocol

We recommend the following standardized testing protocol for reporting results:
Perform unsupervised training on the unlabeled data.
Perform supervised training on the labeled data using 10 (pre-defined) folds of 100 examples from the training data. The indices of the examples to be used for each fold are provided.
Report average accuracy on the full test set.
Download

Binary files (Python code from Martin Tutek)
The binary files are split into data and label files with suffixes: train_X.bin, train_y.bin, test_X.bin and test_y.bin. Within each, the values are stored as tightly packed arrays of uint8's. The images are stored in column-major order, one channel at a time. That is, the first 96*96 values are the red channel, the next 96*96 are green, and the last are blue. The labels are in the range 1 to 10. The unlabeled dataset, unlabeled.bin, is in the same format, but there is no "_y.bin" file.
A class_names.txt file is included for reference, with one class name per line.
The file fold_indices.txt contains the (zero-based) indices of the examples to be used for each training fold. The first line contains the indices for the first fold, the second line, the second fold, and so on.
Thanks to Martin Tutek for the Python code to load/view STL-10!
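
Based on the layout described above, a minimal NumPy loader might look like the following sketch (paths assume the files sit in the working directory):

```python
import numpy as np

def read_stl10_images(path):
    # Each image is 3 channel planes of 96*96 uint8 values, column-major
    # within each plane: reshape to (N, 3, 96, 96), then swap the axes to
    # get standard (N, 96, 96, 3) row-major RGB images.
    raw = np.fromfile(path, dtype=np.uint8)
    return np.transpose(raw.reshape(-1, 3, 96, 96), (0, 3, 2, 1))

def read_stl10_labels(path):
    return np.fromfile(path, dtype=np.uint8)  # label values 1..10

train_x = read_stl10_images("train_X.bin")
train_y = read_stl10_labels("train_y.bin")
```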

Reference

* Please cite the following reference in papers using this dataset:

Adam Coates, Honglak Lee, Andrew Y. Ng An Analysis of Single Layer Networks in Unsupervised Feature Learning AISTATS, 2011.},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/a799a2845ac29a66c07cf74e2a2838b6c5698a6a</link>
</item>
<item>
<title>Georgia Tech face database (Dataset)</title>
<description>@article{,
title= {Georgia Tech face database},
journal= {},
author= {Ara V. Nefian},
year= {},
url= {http://www.anefian.com/research/face_reco.htm},
abstract= {Georgia Tech face database (128MB) contains images of 50 people taken in two or three sessions between 06/01/99 and 11/15/99 at the Center for Signal and Image Processing at Georgia Institute of Technology. All people in the database are represented by 15 color JPEG images with cluttered background taken at resolution 640x480 pixels. The average size of the faces in these images is 150x150 pixels. The pictures show frontal and/or tilted faces with different facial expressions, lighting conditions and scale. Each image is manually labeled to determine the position of the face in the image. The set of label files is available here. The Readme.txt file gives more details about the database.},
keywords= {Dataset},
terms= {}
}
</description>
<link>https://academictorrents.com/download/0848b2c9b40e49041eff85ac4a2da71ae13a3e4f</link>
</item>
<item>
<title>Columbia Object Image Library (COIL-100) (Dataset)</title>
<description>@article{,
title= {Columbia Object Image Library (COIL-100)},
journal= {},
author= {Sameer A. Nene and Shree K. Nayar and Hiroshi Murase},
year= {1996},
url= {http://www1.cs.columbia.edu/CAVE/software/softlib/coil-100.php},
abstract= {Columbia Object Image Library (COIL-100) is a database of color images of 100 objects. The objects were placed on a motorized turntable against a black background. The turntable was rotated through 360 degrees to vary object pose with respect to a fixed color camera. Images of the objects were taken at pose intervals of 5 degrees. This corresponds to 72 poses per object. The images were size normalized. COIL-100 is available online via ftp.

"Columbia Object Image Library (COIL-100),"
S. A. Nene, S. K. Nayar and H. Murase,
Technical Report CUCS-006-96, February 1996.},
keywords= {Dataset},
terms= {}
}

</description>
<link>https://academictorrents.com/download/ce39e4554b2207c7764a58acf190dd3ccfa227e2</link>
</item>
<item>
<title>CIFAR-100 (Canadian Institute for Advanced Research) (Dataset)</title>
<description>@article{,
title= {CIFAR-100 (Canadian Institute for Advanced Research)},
journal= {},
author= {Alex Krizhevsky and Vinod Nair and Geoffrey Hinton},
year= {},
url= {http://www.cs.toronto.edu/~kriz/cifar.html},
abstract= {This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
Here is the list of classes in the CIFAR-100:

|Superclass|Classes|
|-|-|
|aquatic mammals|beaver, dolphin, otter, seal, whale|
|fish|aquarium fish, flatfish, ray, shark, trout|
|flowers|orchids, poppies, roses, sunflowers, tulips|
|food containers|bottles, bowls, cans, cups, plates|
|fruit and vegetables|apples, mushrooms, oranges, pears, sweet peppers|
|household electrical devices|clock, computer keyboard, lamp, telephone, television|
|household furniture|bed, chair, couch, table, wardrobe|
|insects|bee, beetle, butterfly, caterpillar, cockroach|
|large carnivores|bear, leopard, lion, tiger, wolf|
|large man-made outdoor things|bridge, castle, house, road, skyscraper|
|large natural outdoor scenes|cloud, forest, mountain, plain, sea|
|large omnivores and herbivores|camel, cattle, chimpanzee, elephant, kangaroo|
|medium-sized mammals|fox, porcupine, possum, raccoon, skunk|
|non-insect invertebrates|crab, lobster, snail, spider, worm|
|people|baby, boy, girl, man, woman|
|reptiles|crocodile, dinosaur, lizard, snake, turtle|
|small mammals|hamster, mouse, rabbit, shrew, squirrel|
|trees|maple, oak, palm, pine, willow|
|vehicles 1|bicycle, bus, motorcycle, pickup truck, train|
|vehicles 2|lawn-mower, rocket, streetcar, tank, tractor|

Yes, I know mushrooms aren't really fruit or vegetables and bears aren't really carnivores. },
keywords= {Dataset},
terms= {}
}
</description>
<link>https://academictorrents.com/download/9adb30144cf53809ec0613fa869b0a65b4e81ff5</link>
</item>
<item>
<title>CIFAR-10 (Canadian Institute for Advanced Research) (Dataset)</title>
<description>@article{,
title= {CIFAR-10 (Canadian Institute for Advanced Research)},
journal= {},
author= {Alex Krizhevsky and Vinod Nair and Geoffrey Hinton},
year= {},
url= {http://www.cs.toronto.edu/~kriz/cifar.html},
abstract= {The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. 

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
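
A common way to read the Python-pickled version of these batches; a sketch assuming the python archive, in which each batch unpickles to a dict keyed by b'data' and b'labels':

```python
import pickle
import numpy as np

def load_batch(path):
    # b'data' is a (10000, 3072) uint8 array: each row is 1024 red, then
    # 1024 green, then 1024 blue values, each plane in row-major order.
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    images = batch[b"data"].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    return images, np.array(batch[b"labels"])

images, labels = load_batch("cifar-10-batches-py/data_batch_1")
```
},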
keywords= {Dataset},
terms= {}
}

</description>
<link>https://academictorrents.com/download/463ba7ec7f37ed414c12fbb71ebf6431eada2d7a</link>
</item>
<item>
<title>Caltech256 Image Dataset (Dataset)</title>
<description>@article{,
title= {Caltech256 Image Dataset},
journal= {},
author= {Greg Griffin and Alex Holub and Pietro Perona},
year= {2006},
url= {http://www.vision.caltech.edu/Image_Datasets/Caltech256/},
abstract= {==Overview
256 Object Categories + Clutter
At least 80 images per category
30608 images instead of 9144

==Caltech-101: Drawbacks
Smallest category size is 31 images:
Too easy?
    left-right aligned
    Rotation artifacts
    Soon will saturate performance

==Caltech-256 : New Features  
Smallest category size now 80 images
Harder
    Not left-right aligned
    No artifacts
    Performance is halved
    More categories
New and larger clutter category

==Collection Procedure
Similar to Caltech-101 (Li, Fergus, Perona)

Four sorters rate the images
1 good: a clear example
2 bad: confusing, occluded, cluttered, or artistic
3 not applicable: object category not present

92,652 Images from Google and Picsearch
    32.1% were rated good and kept

Some images borrowed from 29 of the largest Caltech-101 categories

==Acknowledgements
Rob Fergus and Fei Fei Li, Pierre Moreels for code and procedures developed for the Caltech-101 image set
Marco Ranzato and Claudio Fanti for miscellaneous help
Sorters: Lis Fano, Nick Lo, Julie May, Weiyu Xu for making this image set possible with their hard work

Please cite as: Griffin, G. Holub, AD. Perona, P. The Caltech 256. Caltech Technical Report. The technical report will be available shortly.},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/7de9936b060525b6fa7f5d8aabd316637d677622</link>
</item>
<item>
<title>Caltech101 Image Dataset (Dataset)</title>
<description>@article{,
title= {Caltech101 Image Dataset},
journal= {},
author= {Fei-Fei Li and Marco Andreetto and Marc 'Aurelio Ranzato},
year= {2003},
url= {http://www.vision.caltech.edu/Image_Datasets/Caltech101/},
abstract= {==Description

Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. Collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc 'Aurelio Ranzato.  The size of each image is roughly 300 x 200 pixels.
We have carefully clicked outlines of each object in these pictures; these are included under 'Annotations.tar'. There is also a MATLAB script to view the annotations, 'show_annotations.m'.

==How to use the dataset

If you are using the Caltech 101 dataset for testing your recognition algorithm you should try to make your results comparable to the results of others. We suggest training and testing on a fixed number of pictures and repeating the experiment with different random selections of pictures in order to obtain error bars. Popular numbers of training images: 1, 3, 5, 10, 15, 20, 30. Popular numbers of testing images: 20, 30. See also the discussion below.
When you report your results please keep track of which images you used and which were misclassified. We will soon publish a more detailed experimental protocol that allows you to report those details. See the Discussion section for more details.


==How to Reference this Dataset

We would appreciate it if you cite our works when using the dataset:
1. Images only:
L. Fei-Fei, R. Fergus and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. IEEE. CVPR 2004, Workshop on Generative-Model Based Vision. 2004

2. Images and annotations:
L. Fei-Fei, R. Fergus and P. Perona. One-Shot learning of object categories. IEEE Trans. Pattern Analysis and Machine Intelligence. In press.},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/410206b2624ab243b0fa87058f73927fc44a5b7c</link>
</item>
<item>
<title>MS Common Objects in Context (COCO2014) (Dataset)</title>
<description>@article{,
title= {MS Common Objects in Context (COCO2014)},
journal= {},
author= {Microsoft},
year= {2014},
url= {http://mscoco.org/},
abstract= {Microsoft COCO is a new image recognition, segmentation, and captioning dataset. Microsoft COCO has several features:

Object segmentation
Recognition in Context
Multiple objects per image
More than 300,000 images
More than 2 Million instances
80 object categories
5 captions per image


The 2014 Testing Images are for the MS COCO Captioning Challenge, while the 2015 Testing Images are for the MS COCO Detection Challenge. The train and val data are common to both challenges. Note also that as an alternative to downloading the large image zip files, individual images may be downloaded from the COCO website using the "coco_url" field specified in the image info struct.},
keywords= {},
terms= {You must agree to these terms: http://mscoco.org/terms_of_use/},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/f993c01f3c268b5d57219a38f8ec73ee7524421a</link>
</item>
<item>
<title>VQA: Visual Question Answering Dataset (Dataset)</title>
<description>@article{,
title= {VQA: Visual Question Answering Dataset},
journal= {},
author= {Stanislaw Antol and Aishwarya Agrawal and Jiasen Lu and Margaret Mitchell and Dhruv Batra and C. Lawrence Zitnick and Devi Parikh},
booktitle= {International Conference on Computer Vision (ICCV)},
year= {2015},
url= {http://visualqa.org/},
abstract= {254,721 images, 764,163 questions, 9,934,119 answers!

===What is VQA?
VQA is a new dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.
Over 250K images (MSCOCO and abstract scenes)
3 questions per image
10 ground truth answers per question
3 plausible (but likely incorrect) answers per question
Open-ended and multiple-choice answering tasks
Automatic evaluation metric

===Overview
For every image, we collected 3 free-form natural-language questions with 10 concise open-ended answers each. We provide two formats of the VQA task: open-ended and multiple-choice. For additional details, please see the VQA paper. 

The annotations we release are the result of the following post-processing steps on the raw crowdsourced data:
Spelling correction (using Bing Speller) of question and answer strings
Question normalization (first character uppercase, last character '?')
Answer normalization (all characters lowercase, no period except as a decimal point, number words converted to digits, articles (a, an, the) stripped; see the sketch below)
Adding an apostrophe if a contraction is missing it (e.g., converting "dont" to "don't")
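
A rough sketch of the answer-normalization step; this is not the official VQA evaluation code, and it omits the decimal-point and number-word handling listed above:

```python
ARTICLES = ("a", "an", "the")

def normalize_answer(answer):
    # Lowercase, drop periods, and strip articles. The real pipeline keeps
    # periods that act as decimal points and maps number words to digits.
    words = answer.lower().replace(".", "").split()
    return " ".join(w for w in words if w not in ARTICLES)

print(normalize_answer("The red Apples."))  # -> "red apples"
```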

Please follow the instructions in the README to download and setup the VQA data (annotations and images).

==October 2015: Full release (v1.0)
Real Images
204,721 MSCOCO images 
(all of current train/val/test)
614,163 questions
6,141,630 ground truth answers
1,842,489 plausible answers
Abstract Scenes
50,000 abstract scenes
150,000 questions
1,500,000 ground truth answers
450,000 plausible answers
250,000 captions

==July 2015: Beta v0.9 release
123,287 MSCOCO images (all of train/val)
369,861 questions
3,698,610 ground truth answers
1,109,583 plausible answers

==June 2015: Beta v0.1 release
10,000 MSCOCO images (from train)
30,000 questions
300,000 ground truth answers
90,000 plausible answers
},
keywords= {deeplearning},
terms= {View the terms here: http://visualqa.org/terms.html}
}

</description>
<link>https://academictorrents.com/download/f075ad12eccbbd665aec68db5d208dc68e7a384f</link>
</item>
<item>
<title>Enwiki Word2vec model 1000 Dimensions (Dataset)</title>
<description>@article{,
title= {Enwiki Word2vec model 1000 Dimensions},
journal= {},
author= {Idio},
year= {2015},
url= {https://github.com/idio/wiki2vec},
license= {},
abstract= {Gensim Word2vec model built on the english wikipedia, 1000dimensions, 10cbow, no stemming},
keywords= {Wikipedia, nlp, word2vec, english, gensim, deeplearning, natural language, wiki},
terms= {}
}

</description>
<link>https://academictorrents.com/download/5d18911e7036870197bf5e23cf1be96d3353518a</link>
</item>
<item>
<title>Netflix Prize Data Set  (Dataset)</title>
<description>@article{,
title= {Netflix Prize Data Set },
journal= {},
author= {Netflix},
year= {2009},
url= {http://archive.ics.uci.edu/ml/datasets/Netflix+Prize},
license= {},
abstract= {This is the official data set used in the Netflix Prize competition. The data consists of about 100 million movie ratings, and the goal is to predict missing entries in the movie-user rating matrix.

|Attribute| Value|
|----|---|
| Data Set Characteristics:  | Multivariate, Time-Series      |
| Attribute Characteristics: | Integer                      |
| Associated Tasks:          | Clustering, Recommender-Systems |
| Number of Instances:       | 100480507                     |
| Number of Attributes:      | 17770                    |     
| Missing Values?            | Yes                           |
| Area:                      | N/A                                   |          


#Data Set Information:

This dataset was constructed to support participants in the Netflix Prize. 

There are over 480,000 customers in the dataset, each identified by a unique integer id. 

The title and release year for each movie is also provided. There are over 17,000 movies in the dataset, each identified by a unique integer id. 

The dataset contains over 100 million ratings. The ratings were collected between October 1998 and December 2005 and reflect the distribution of all ratings received during this period. Each rating has a customer id, a movie id, the date of the rating, and the value of the rating. 

As part of the original Netflix Prize a set of ratings was identified whose rating values were not provided in the original dataset. The object of the Prize was to accurately predict the ratings from this 'qualifying' set. These missing ratings are now available in the grand_prize.tar.gz dataset file.


#Attribute Information:

The format of the data is described fully in the README files contained in the dataset tar files. 


|Attribute| Value|
|-|-|
|MovieID: | Arbitrarily assigned unique integer in the range [1 .. 17770]. |
|CustomerID:  |Arbitrarily assigned unique integer in the range [1..2649429] (with gaps). |
|Rating:  |Number of 'stars' assigned to a movie by a customer; an integer from 1 to 5. |
|Title: | English language title of the movie on the Netflix website. |
|YearOfRelease:  |Year a movie was released in the range [1890..2005]. May correspond to the release of the corresponding DVD, not necessarily its theatrical release. |
|Date: | Timestamp of a rating in the form YYYY-MM-DD, in the range 1998-11-01 to  2005-12-31. |
|NetflixID: | Integer ID of a movie as currently used in the Netflix developer API |
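
A minimal parsing sketch, assuming the per-movie text files of the original release, where each file opens with the movie id followed by a colon and then one CustomerID,Rating,Date line per rating (the path below is illustrative):

```python
import csv

def parse_movie_file(path):
    # Yields (movie_id, customer_id, rating, date) for one training file.
    with open(path) as f:
        movie_id = int(f.readline().strip().rstrip(":"))
        for customer_id, rating, date in csv.reader(f):
            yield movie_id, int(customer_id), int(rating), date

for row in parse_movie_file("training_set/mv_0000001.txt"):
    print(row)
    break
```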

#Relevant Papers:
James Bennett and Stan Lanning. 'The Netflix Prize', 2007. 
http://rexa.info/paper/4755326FDAE3929649348DC380A46D3882A98198},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/9b13183dc4d60676b773c9e2cd6de5e5542cee9a</link>
</item>
<item>
<title>The Extended Yale Face Database B (Dataset)</title>
<description>@article{,
title = {The Extended Yale Face Database B},
journal = {},
author = {Yale},
year = {2001},
url = {http://vision.ucsd.edu/~iskwak/ExtYaleDatabase/ExtYaleB.html},
abstract = {The extended Yale Face Database B contains 16128 images of 28 human subjects under 9 poses and 64 illumination conditions. The data format of this database is the same as that of the Yale Face Database B (http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html); please refer to its homepage for more detailed information on the data format.

You are free to use the extended Yale Face Database B for research purposes. All publications which use this database should acknowledge the use of "the Extended Yale Face Database B" and reference Athinodoros Georghiades, Peter Belhumeur, and David Kriegman's paper, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", PAMI, 2001.

The extended database, as opposed to the original Yale Face Database B with 10 subjects, was first reported by Kuang-Chih Lee, Jeffrey Ho, and David Kriegman in "Acquiring Linear Subspaces for Face Recognition under Variable Lighting", PAMI, May 2005 (http://vision.ucsd.edu/~leekc/papers/9pltsIEEE.pdf). All test image data used in the experiments are manually aligned, cropped, and then re-sized to 168x192 images.
If you publish your experimental results with the cropped images, please reference the PAMI2005 paper as well.}
}</description>
<link>https://academictorrents.com/download/06e479f338b56fa5948c40287b66f68236a14612</link>
</item>
<item>
<title>Yale YouTube Video Text (Dataset)</title>
<description>@article{,
title= {Yale YouTube Video Text},
journal= {},
author= {Yale},
year= {},
url= {http://vision.ucsd.edu/content/youtube-video-text},
abstract= {YouTube Video Text (YVT) contains 30 videos. Each video is 15 seconds long, at 30 frames per second in HD 720p quality, and was collected from YouTube. The text content in the dataset can be divided into two categories: overlay text (e.g., captions, song titles, logos) and scene text (e.g., street signs, business signs, words on shirts).},
keywords= {Dataset}
}

</description>
<link>https://academictorrents.com/download/156802226bcf5747e0bea4e4f14c03b3b952de80</link>
</item>
<item>
<title>MNIST Database (Dataset)</title>
<description>@article{,
title= {MNIST Database},
journal= {},
author= {Christopher J.C. Burges and Yann LeCun and Corinna Cortes },
year= {},
url= {http://yann.lecun.com/exdb/mnist/},
abstract= {The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications.

The MNIST database was constructed from NIST's Special Database 3 and Special Database 1 which contain binary images of handwritten digits. NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. The reason for this can be found in the fact that SD-3 was collected among Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test set among the complete set of samples. Therefore it was necessary to build a new database by mixing NIST's datasets.

The MNIST training set is composed of 30,000 patterns from SD-3 and 30,000 patterns from SD-1. Our test set was composed of 5,000 patterns from SD-3 and 5,000 patterns from SD-1. The 60,000 pattern training set contained examples from approximately 250 writers. We made sure that the sets of writers of the training set and test set were disjoint.

SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 are available and we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set. The remaining 250 writers were placed in our test set. Thus we had two sets with nearly 30,000 examples each. The new training set was completed with enough examples from SD-3, starting at pattern # 0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern # 35,000 to make a full set with 60,000 test patterns. Only a subset of 10,000 test images (5,000 from SD-1 and 5,000 from SD-3) is available on this site. The full 60,000 sample training set is available.

Many methods have been tested with this training set and test set. Here are a few examples. Details about the methods are given in an upcoming paper. Some of those experiments used a version of the database where the input images were deskewed (by computing the principal axis of the shape that is closest to the vertical, and shifting the lines so as to make it vertical). In some other experiments, the training set was augmented with artificially distorted versions of the original training samples. The distortions are random combinations of shifts, scaling, skewing, and compression. 

FILE FORMATS FOR THE MNIST DATABASE

The data is stored in a very simple file format designed for storing vectors and multidimensional matrices. General info on this format is given at the end of this page, but you don't need to read that to use the data files.
All the integers in the files are stored in the MSB first (high endian) format used by most non-Intel processors. Users of Intel processors and other low-endian machines must flip the bytes of the header.

There are 4 files:

train-images-idx3-ubyte: training set images 
train-labels-idx1-ubyte: training set labels 
t10k-images-idx3-ubyte:  test set images 
t10k-labels-idx1-ubyte:  test set labels

The training set contains 60000 examples, and the test set 10000 examples.

The first 5000 examples of the test set are taken from the original NIST training set. The last 5000 are taken from the original NIST test set. The first 5000 are cleaner and easier than the last 5000.

TRAINING SET LABEL FILE (train-labels-idx1-ubyte):

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000801(2049) magic number (MSB first) 
0004     32 bit integer  60000            number of items 
0008     unsigned byte   ??               label 
0009     unsigned byte   ??               label 
........ 
xxxx     unsigned byte   ??               label
The label values are 0 to 9.

TRAINING SET IMAGE FILE (train-images-idx3-ubyte):

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000803(2051) magic number 
0004     32 bit integer  60000            number of images 
0008     32 bit integer  28               number of rows 
0012     32 bit integer  28               number of columns 
0016     unsigned byte   ??               pixel 
0017     unsigned byte   ??               pixel 
........ 
xxxx     unsigned byte   ??               pixel
Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).

TEST SET LABEL FILE (t10k-labels-idx1-ubyte):

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000801(2049) magic number (MSB first) 
0004     32 bit integer  10000            number of items 
0008     unsigned byte   ??               label 
0009     unsigned byte   ??               label 
........ 
xxxx     unsigned byte   ??               label
The label values are 0 to 9.

TEST SET IMAGE FILE (t10k-images-idx3-ubyte):

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000803(2051) magic number 
0004     32 bit integer  10000            number of images 
0008     32 bit integer  28               number of rows 
0012     32 bit integer  28               number of columns 
0016     unsigned byte   ??               pixel 
0017     unsigned byte   ??               pixel 
........ 
xxxx     unsigned byte   ??               pixel
Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black). 
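
Putting the four layouts together (and anticipating the general IDX header described below), a minimal NumPy reader might look like this sketch:

```python
import struct
import numpy as np

def read_idx(path):
    # Header: two zero bytes, a type code, a dimension count, then one
    # big-endian 32-bit size per dimension; the data follows as a C array.
    with open(path, "rb") as f:
        _zeros, type_code, ndim = struct.unpack(">HBB", f.read(4))
        assert type_code == 0x08, "this sketch only handles unsigned bytes"
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(shape)

train_images = read_idx("train-images-idx3-ubyte")  # shape (60000, 28, 28)
train_labels = read_idx("train-labels-idx1-ubyte")  # shape (60000,)
```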
  
THE IDX FILE FORMAT

the IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.
The basic format is

magic number 
size in dimension 0 
size in dimension 1 
size in dimension 2 
..... 
size in dimension N 
data

The magic number is an integer (MSB first). The first 2 bytes are always 0.

The third byte codes the type of the data: 
0x08: unsigned byte 
0x09: signed byte 
0x0B: short (2 bytes) 
0x0C: int (4 bytes) 
0x0D: float (4 bytes) 
0x0E: double (8 bytes)

The 4-th byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices....

The sizes in each dimension are 4-byte integers (MSB first, high endian, like in most non-Intel processors).

The data is stored like in a C array, i.e. the index in the last dimension changes the fastest. },
keywords= {mnist},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/ce990b28668abf16480b8b906640a6cd7e3b8b21</link>
</item>
<item>
<title>MPEG-7 Core Experiment CE-Shape-1 [tar.gz] (Dataset)</title>
<description>@article{,
title = {MPEG-7 Core Experiment CE-Shape-1 [tar.gz]},
journal = {},
author = {Richard Ralph},
year = {1999},
url = {http://www.dabi.temple.edu/~shape/MPEG7/dataset.html},
license = {},
abstract = {Here are the first shapes from each class in the MPEG-7 Core Experiment CE-Shape-1 Test Set.

MPEG-7 Core Experiment CE-Shape-1 [?] is a popular database for shape matching evaluation consisting of 70 shape categories, where each category is represented by 20 different images with high intra-class variability. The shapes are defined by a binary mask outlining the objects.
The evaluation protocol for this retrieval task is the bullseye rating, in which each image is used as reference and compared to all of the other images. The mean percentage of correct images in the top 40 matches (the 40 images with the lowest shape similarity values) is taken as bullseye rating.
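
A sketch of the bullseye computation, assuming a precomputed (N, N) dissimilarity matrix and integer class labels; conventions such as whether the query itself counts among the 40 vary slightly across papers:

```python
import numpy as np

def bullseye_score(dissimilarity, labels, top_k=40):
    # For each query shape, count same-class shapes among its top_k nearest
    # neighbours; each class has 20 members, so n * 20 is the maximum.
    n = len(labels)
    hits = 0
    for i in range(n):
        nearest = np.argsort(dissimilarity[i])[:top_k]
        hits += int(np.sum(labels[nearest] == labels[i]))
    return hits / (n * 20)
```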

The Latecki group maintains an overview of recent results here:
http://knight.cis.temple.edu/~shape/MPEG7/results.html.

Download MPEG-7 Core Experiment CE-Shape-1
http://www.cis.temple.edu/~latecki/TestData/mpeg7shapeB.tar.gz

Note: it raises interesting questions about how to define the shape of an object: apples and device9 are very similar objects in two different categories, while the octopus category has much larger intra-class variance and is still a single category. 


$ tar -ztvf mpeg7shapeB.tar.gz 
-rwxrwxrwx  0 latecki users    1723 Nov 12  1999 original/Bone-1.gif
-rwxrwxrwx  0 latecki users    1819 Nov 12  1999 original/Bone-10.gif
-rwxrwxrwx  0 latecki users    1745 Nov 12  1999 original/Bone-11.gif
-rwxrwxrwx  0 latecki users    1738 Nov 12  1999 original/Bone-12.gif
-rwxrwxrwx  0 latecki users    1322 Nov 12  1999 original/Bone-13.gif
-rwxrwxrwx  0 latecki users    1720 Nov 12  1999 original/Bone-14.gif
-rwxrwxrwx  0 latecki users    1654 Nov 12  1999 original/Bone-15.gif
-rwxrwxrwx  0 latecki users    1759 Nov 12  1999 original/Bone-16.gif
-rwxrwxrwx  0 latecki users    1739 Nov 12  1999 original/Bone-17.gif
-rwxrwxrwx  0 latecki users    1489 Nov 12  1999 original/Bone-18.gif
-rwxrwxrwx  0 latecki users    1772 Nov 12  1999 original/Bone-19.gif
-rwxrwxrwx  0 latecki users    1714 Nov 12  1999 original/Bone-2.gif
-rwxrwxrwx  0 latecki users    1459 Nov 12  1999 original/Bone-20.gif
-rwxrwxrwx  0 latecki users    1759 Nov 12  1999 original/Bone-3.gif
...........
-rwxrwxrwx  0 latecki users    1664 Nov 12  1999 original/watch-17.gif
-rwxrwxrwx  0 latecki users    1873 Nov 12  1999 original/watch-18.gif
-rwxrwxrwx  0 latecki users    1881 Nov 12  1999 original/watch-19.gif
-rwxrwxrwx  0 latecki users    2720 Nov 12  1999 original/watch-2.gif
-rwxrwxrwx  0 latecki users    1889 Nov 12  1999 original/watch-20.gif
-rwxrwxrwx  0 latecki users    1699 Nov 12  1999 original/watch-3.gif
-rwxrwxrwx  0 latecki users    1757 Nov 12  1999 original/watch-4.gif
-rwxrwxrwx  0 latecki users    1802 Nov 12  1999 original/watch-5.gif
-rwxrwxrwx  0 latecki users    1765 Nov 12  1999 original/watch-6.gif
-rwxrwxrwx  0 latecki users    1840 Nov 12  1999 original/watch-7.gif
-rwxrwxrwx  0 latecki users    1927 Nov 12  1999 original/watch-8.gif
-rwxrwxrwx  0 latecki users    1719 Nov 12  1999 original/watch-9.gif}
}</description>
<link>https://academictorrents.com/download/0a8cb3446b0de5690fee29a2c68922ff691c7f9a</link>
</item>
<item>
<title>MPEG-7 Core Experiment CE-Shape-1 (Dataset)</title>
<description>@article{,
title = {MPEG-7 Core Experiment CE-Shape-1},
journal = {},
author = {Richard Ralph},
year = {1999},
url = {http://www.dabi.temple.edu/~shape/MPEG7/dataset.html},
license = {},
abstract = {Here are the first shapes from each class in the MPEG-7 Core Experiment CE-Shape-1 Test Set.

MPEG-7 Core Experiment CE-Shape-1 [?] is a popular database for shape matching evaluation consisting of 70 shape categories, where each category is represented by 20 different images with high intra-class variability. The shapes are defined by a binary mask outlining the objects.
The evaluation protocol for this retrieval task is the bullseye rating, in which each image is used as reference and compared to all of the other images. The mean percentage of correct images in the top 40 matches (the 40 images with the lowest shape similarity values) is taken as bullseye rating.

The Latecki group maintains an overview of recent results here:
http://knight.cis.temple.edu/~shape/MPEG7/results.html.

Download MPEG-7 Core Experiment CE-Shape-1
http://www.cis.temple.edu/~latecki/TestData/mpeg7shapeB.tar.gz

Note: it raises interesting questions about how to define the shape of an object: apples and device9 are very similar objects in two different categories, while the octopus category has much larger intra-class variance and is still a single category. }
}</description>
<link>https://academictorrents.com/download/0f9ac75f2d9e2ce2ef7b800aa23882915f4e31fa</link>
</item>
<item>
<title>iCubWorld1.0 dataset (Dataset)</title>
<description>@article{,
title = {iCubWorld1.0 dataset},
journal = {},
author = {Istituto Italiano di Tecnologia},
year = {},
url = {http://www.iit.it/it/projects/data-sets.html},
abstract = {This is the first release of the iCubWorld dataset. It consists of seven instances of objects acquired in two different modalities, human and robot, as defined below. The size of the images is 320x240, subsequently cropped to the bounding box size according to the following:
Human mode: the bounding box is set to 80x80;
Robot mode: the bounding box is set to 160x160.  

The kinematics of the robot is known and used to position the bounding box. The independent motion detector method is used to position the bounding box in the human mode.

We provide 500 images per class during the training phase and 500 images per class for the testing phase.

Archive:  iCubWorld1.0.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2013-03-20 16:23   iCubWorld1.0/human/
        0  2013-03-20 16:32   iCubWorld1.0/human/test/
        0  2013-03-20 16:29   iCubWorld1.0/human/test/bottle/
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000000.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000001.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000002.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000003.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000004.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000005.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000006.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000007.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000008.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000009.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000010.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000011.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000012.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000013.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000014.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000015.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000016.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000017.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000018.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000019.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000020.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000021.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000022.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000023.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000024.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000025.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000026.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000027.ppm
    19213  2012-09-11 09:52   iCubWorld1.0/human/test/bottle/00000028.ppm
...
    76815  2012-09-09 16:27   iCubWorld1.0/robot/train/turtle/00003497.ppm
    76815  2012-09-09 16:27   iCubWorld1.0/robot/train/turtle/00003498.ppm
    76815  2012-09-09 16:27   iCubWorld1.0/robot/train/turtle/00003499.ppm
        0  2013-03-20 16:33   iCubWorld1.0/
---------                     -------
651282550                     14035 files
}
}</description>
<link>https://academictorrents.com/download/40bc001de97101552a2974ed880bafa377e055f5</link>
</item>
<item>
<title>SUFR ver1.3 2014 synthetic image datasets (Dataset)</title>
<description>@article{,
title= {SUFR ver1.3 2014 synthetic image datasets},
journal= {},
author= {Qianli Liao and Joel Z Leibo},
year= {2014},
url= {http://sufr2014.wordpress.com/download-sufr-2014/},
abstract= {![](https://sufr2014.files.wordpress.com/2014/01/cropped-sufr_examples_merge3.png)

##SUFR_ver1.3
Joel Z. Leibo, Qianli Liao, and Tomaso Poggio

##Contents: 
1. SUFR-W
2. SUFR

##Description:
This package contains SUFR-W, a dataset of "in the wild" natural images of faces gathered from the internet. The protocol used to create the dataset is described in Leibo, Liao and  Poggio (2014). It also contains the full set of SUFR synthetic datasets, called the "Subtasks of Unconstrained Face Recognition Challenge" in Leibo, Liao and Poggio (2014). 

##Details:

##SUFR-W

** SUFR_in_the_wild/SUFR_in_the_wild_info.mat
matlab struct "info" contains two fields:
- id :   the ID of the person depicted by each image
- name :  the name of the person depicted by each image

** SUFR_in_the_wild/SUFR_in_the_wild_info.txt
Contains the same information as SUFR_in_the_wild_info.mat, but in plain text

** SUFR_in_the_wild/splits_10_folds.mat
i. matlab struct "sufr_train_val_test_names" contains three fields:
train
val
test
Each field contains a 1x10 cell. The i-th element of a cell contains the names (IDs) of the i-th training/validation/test fold.
The "names" run from 1 to 400; they are actually the IDs of the people.

ii. matlab struct "sufr_train_val_test" contains three fields:
train
val
test
Each field contains a 1x10 cell. The i-th element of a cell contains the training/test/validation pairs (and labels) of the i-th fold.
The first two columns are the image indices of the training/test/validation pairs. The last column is the label. 1: same person, -1: different people.

** SUFR_in_the_wild/splits_10_folds_text
a folder contains the text version of SUFR_in_the_wild/splits_10_folds.mat

** Note: Similar to the protocol of LFW (Huang et al. 2007), use 10-fold cross-validation. Training, validation, and test data are provided for each fold. They are not overlapping --- individual people appearing in the test set do not appear in the training or validation set. That is, if any image of person X appears in the training set, then no images of person X will appear in the test set.


##SUFR

  Each dataset contains the following annotations:
**** Information is provided in two formats: .txt and .mat

** info.mat
a matlab struct contains:
-- sku: 3D model names we used to build the dataset.
-- id : object ID of each image
-- angle: rotation angle of each image
-- ilum: illumination info
-- shift: translation
-- scale: size of the face
-- affine: the affine transformation matrix
-- background: background ID 

** info.txt
text version of info.mat

** bounding_box_info.txt
The bounding box of the face in each image

** splits.mat
-- sufr_train_test_sets
Training and testing pairs and labels
The first two columns are the image indices of the training/test/validation pairs. The last column is the label. 1: same person, -1: different people.

-- sufr_train_val_sets
Training and validation pairs and labels

-- sufr_train_val_test_names
Training, validation and testing IDs.
The IDs correspond to the "id" field in info.mat

** test.txt, test_names.txt, train.txt, train_names.txt, val.txt, val_names.txt
text version of splits.mat

** Note: given the large number of synthetic datasets, we do not require 10-fold cross-validation. The model should be developed using only the training and validation sets. The test set should be used only once, when reporting results.
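
Given a similarity score for each pair, evaluating against these splits is a thresholding exercise. A sketch, assuming the pair arrays described above with +1/-1 labels in the last column:

```python
import numpy as np

def pair_accuracy(pairs, scores, threshold):
    # pairs: (M, 3) array of image index, image index, label (+1 same
    # person, -1 different people); scores: (M,) similarity per pair.
    predictions = np.where(scores >= threshold, 1, -1)
    return float(np.mean(predictions == pairs[:, 2]))
```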
 


##Version history:

- This is ver1.3 of SUFR-W.  
- The following two papers report results on slightly older versions of the dataset. The differences between versions are minor: a few label mistakes were corrected and slightly different training/test splits were used. 

 1.1:  Liao Q, Leibo JZ, Poggio T.  Learning invariant representations and applications to face verification (2013).  Advances in Neural Information Processing Systems (NIPS). Lake Tahoe, NV.
 1.2:  Liao Q, Leibo JZ, Mroueh Y, Poggio T. Can a biologically-plausible hierarchy effectively replace face detection, alignment, and recognition pipelines? (2013) arXiv:1311.4082, November 16, 2013.
 
- There is only one version of the synthetic datasets (this one).
 
- We do NOT anticipate any further changes to either SUFR-W or SUFR. This version (1.3) is the first to be publicly released, so going forward all reported results will be on version 1.3. 
 
 
 

 ##Reference
 
Please cite as:

Leibo J. Z., Liao Q., and Poggio T. Subtasks of Unconstrained Face Recognition (2014). 9th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. (VISAPP). Lisbon, Portugal.  
  -- Available from: http://cbcl.mit.edu/publications/ps/Leibo_Liao_Poggio_VISAPP_2014.pdf
  -- Presentation available at: http://cbcl.mit.edu/publications/ps/Subtasks_Presentation_VISAPP2014.pdf 
  -- Bibtex at:  http://www.jzleibo.com/bio/subtasks
  
  
  

##Acknowledgment

This material is based upon work supported by the Center for Minds, Brains and Machines (CBMM), funded by NSF STC award CCF-1231216.
 
 },
keywords= {Dataset},
terms= {}
}

</description>
<link>https://academictorrents.com/download/032b2df1f6f0d75817b0f3af2af9bcdb3a415c37</link>
</item>
<item>
<title>Mnih Massachusetts Roads Dataset (Dataset)</title>
<description>@article{,
title = {Mnih Massachusetts Roads Dataset},
journal = {},
  year = {2013},
url = {http://www.cs.toronto.edu/~vmnih/data/},
abstract = {"The datasets introduced in Chapter 6 of my PhD thesis are below. See the thesis for more details." },
author = {Volodymyr Mnih},
}</description>
<link>https://academictorrents.com/download/3b17f08ed5027ea24db04f460b7894d913f86c21</link>
</item>
<item>
<title>Mnih Massachusetts Building Dataset (Dataset)</title>
<description>@article{,
title= {Mnih Massachusetts Building Dataset},
journal= {},
year= {2013},
url= {http://www.cs.toronto.edu/~vmnih/data/},
abstract= {"The datasets introduced in Chapter 6 of my PhD thesis are below. See the thesis for more details." },
author= {Volodymyr Mnih},
keywords= {},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/630d2c7e265af1d957cbee270f4328c54ccef333</link>
</item>
<item>
<title>Lerman Twitter 2010 Dataset (Dataset)</title>
<description>@article{,
title= {Lerman Twitter 2010 Dataset},
journal= {},
author= {Kristina Lerman },
year= {2010},
license= {This data is made available to the community for research purposes only},
url= {http://www.isi.edu/~lerman/downloads/twitter/twitter2010.html},
abstract= {The Twitter_2010 data set contains tweets containing URLs that were posted on Twitter during October 2010. In addition to tweets, we also provide the followee links of tweeting users, allowing reconstruction of the follower graph of active (tweeting) users.
|Attribute|Value|
|-|-|
|URLs|66,059|
|tweets|2,859,764|
|users|736,930|
|links|36,743,448|
Tweets

Table (in csv format) link_status_search_with_ordering_real_csv contains tweets with the following information

link: URL within the text of the tweet
id: tweet id
create_at: date added to the db
create_at_long
inreplyto_screen_name: screen name of user this tweet is replying to
inreplyto_user_id: user id of user this tweet is replying to
source: device from which the tweet originated
bad_user_id: alternate user id
user_screen_name: tweeting user screen name
order_of_users: tweet's index within sequence of tweets of the same URL
user_id: user id
Table (in csv format) distinct_users_from_search_table_real_map contains names of tweeting users, and the following information for each user:

user_id: user id
user_screen_name: user name
indegree: number of followers
outdegree: number of friends/followees
bad_user_id: alternate user id
Follower graph

File active_follower_real_sql contains zipped SQL dump of links between tweeting users in the form:

user_id: user id
follower_id: user id of the follower
Empirical characterization of this data is described in 
Kristina Lerman, Rumi Ghosh, Tawan Surachawala (2012) "Social Contagion: An Empirical Study of Information Spread on Digg and Twitter Follower Graphs." This data is made available to the community for research purposes only. If you use the data in a publication, please cite the above paper.},
keywords= {twitter},
terms= {}
}

</description>
<link>https://academictorrents.com/download/d8b3a315172c8d804528762f37fa67db14577cdb</link>
</item>
<item>
<title>Cat Annotation Dataset Merged (Dataset)</title>
<description>@article{,
title= {Cat Annotation Dataset Merged},
journal= {},
author= {Weiwei Zhang and Jian Sun and Xiaoou Tang},
year= {2008},
url= {http://137.189.35.203/WebUI/CatDatabase/catData.html},
license= {The CAT dataset is only for research purposes, we do not have any copyright of the images.},
abstract= {# Cat Annotation Dataset
The CAT dataset includes 10,000 cat images. For each image, we annotate the head of the cat with nine points: two for the eyes, one for the mouth, and six for the ears. The detailed configuration of the annotation is shown in Figure 6 of the original paper:

Weiwei Zhang, Jian Sun, and Xiaoou Tang, "Cat Head Detection - How to Effectively Exploit Shape and Texture Features", Proc. of European Conf. Computer Vision, vol. 4, pp.802-816, 2008.

### Format

The annotation data are stored in a file with the name of the corresponding cat image plus ".cat", one annotation file for each cat image. For each annotation file, the annotation data are stored in the following sequence:

 1.  Number of points (always 9)
 2.  Left Eye
 3.  Right Eye
 4.  Mouth
 5.  Left Ear-1
 6.  Left Ear-2
 7.  Left Ear-3
 8.  Right Ear-1
 9.  Right Ear-2
 10. Right Ear-3
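
A minimal reading sketch, assuming each .cat file is whitespace-separated integers in the order listed above (the filename below is illustrative):

```python
import numpy as np

def load_cat_annotation(path):
    # First value is the point count (always 9), followed by nine x, y
    # pairs in the order above: eyes, mouth, then the six ear points.
    values = np.loadtxt(path, dtype=int).ravel()
    n_points = values[0]
    return values[1:1 + 2 * n_points].reshape(n_points, 2)

points = load_cat_annotation("00000001_000.jpg.cat")  # (9, 2) array
```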

### Training, Validation, and Testing
We randomly divide the data into three sets: 5,000 images for training, 2,000 images for validation, and 3,000 images for testing.

![](https://i.imgur.com/TKEV2Ov.jpg)

},
keywords= {cats},
terms= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/c501571c29d16d7f41d159d699d0e7fb37092cbd</link>
</item>
<item>
<title>Twitter Data - NIPS 2012 (Dataset)</title>
<description>@article{,
title= {Twitter Data - NIPS 2012},
journal= {},
author= {J. McAuley and J. Leskovec},
year= {},
url= {http://snap.stanford.edu/data/egonets-Twitter.html},
license= {},
abstract= {This dataset consists of 'circles' (or 'lists') from Twitter. Twitter data was crawled from public sources. The dataset includes node features (profiles), circles, and ego networks.


##Dataset statistics

|Attribute|Value|
|---------|-------|
|Nodes|81306|
|Edges|1768149|
|Nodes in largest WCC|81306 (1.000)|
|Edges in largest WCC|1768149 (1.000)|
|Nodes in largest SCC|68413 (0.841)|
|Edges in largest SCC|1685163 (0.953)|
|Average clustering coefficient|0.5653|
|Number of triangles|13082506|
|Fraction of closed triangles|0.06415|
|Diameter (longest shortest path)|7|
|90-percentile effective diameter|4.5|

##Source (citation)

J. McAuley and J. Leskovec. Learning to Discover Social Circles in Ego Networks. NIPS, 2012.

##Files:

|File|Description|
|---------|-------|
|nodeId.edges |The edges in the ego network for the node 'nodeId'. Edges are undirected for facebook, and directed (a follows b) for twitter and gplus. The 'ego' node does not appear, but it is assumed that they follow every node id that appears in this file.|
|nodeId.circles |The set of circles for the ego node. Each line contains one circle, consisting of a series of node ids. The first entry in each line is the name of the circle.|
|nodeId.feat |The features for each of the nodes that appears in the edge file.|
|nodeId.egofeat |The features for the ego user.|
|nodeId.featnames |The names of each of the feature dimensions. Features are '1' if the user has this property in their profile, and '0' otherwise. This file has been anonymized for facebook users, since the names of the features would reveal private data.|
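
A minimal parsing sketch for the nodeId.circles files, following the line format described above:

```python
def read_circles(path):
    # Each line: a circle name followed by the node ids in that circle.
    circles = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            name, *members = line.split()
            circles.append((name, members))
    return circles
```
},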
keywords= {twitter, social networks, NIPS},
terms= {}
}

</description>
<link>https://academictorrents.com/download/046cf7a75db2a530b1505a4ce125fbe0031f4661</link>
</item>
<item>
<title>Arizona State University Twitter Data Set  (Dataset)</title>
<description>@article{,
title= {Arizona State University Twitter Data Set },
journal= {},
author= {R. Zafarani and H. Liu},
year= {2009},
institution= {Arizona State University, School of Computing, Informatics and Decision Systems Engineering},
url= {http://socialcomputing.asu.edu/datasets/Twitter},
abstract= {Twitter is a social news website. It can be viewed as a hybrid of email, instant messaging, and SMS messaging all rolled into one neat and simple package. It's a new and easy way to discover the latest news related to subjects you care about.

|Attribute|Value|
|-|-|
|Number of Nodes: |11316811|
|Number of Edges: |85331846|
|Missing Values? |no|
|Source:| N/A|

##Data Set Information:

1. nodes.csv
-- it's the file of all the users. This file works as a dictionary of all the users in this data set. It's useful for fast reference. It contains all the node ids used in the dataset.

2. edges.csv
-- this is the friendship/followership network among the users. The friends/followers are represented using edges. Edges are directed. 

Here is an example. 

1,2

This means the user with id "1" is following the user with id "2".
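
Loading the follower graph is then a one-pass read of edges.csv; a minimal sketch:

```python
import csv
from collections import defaultdict

followees = defaultdict(list)  # user id -> list of users they follow
with open("edges.csv") as f:
    for source, target in csv.reader(f):
        followees[source].append(target)
```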


},
keywords= {ASU, Twitter, Social, Graph},
terms= {}
}

</description>
<link>https://academictorrents.com/download/2399616d26eeb4ae9ac3d05c7fdd98958299efa9</link>
</item>
<item>
<title>Visual Object Classes Challenge 2012 Dataset (VOC2012) VOCtrainval_11-May-2012.tar (Dataset)</title>
<description>@article{,
title= {Visual Object Classes Challenge 2012 Dataset (VOC2012) VOCtrainval_11-May-2012.tar},
journal= {},
author= {Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
year= {2012},
url= {http://host.robots.ox.ac.uk/pascal/VOC/voc2012/},
abstract= {##Introduction
The main goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:

* Person: person
* Animal: bird, cat, cow, dog, horse, sheep
* Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
* Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

There are three main object recognition competitions: classification, detection, and segmentation, a competition on action classification, and a competition on large scale recognition run by ImageNet. In addition there is a "taster" competition on person layout.

##Classification/Detection Competitions

Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image.
Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image.
 
20 classes
![](http://i.imgur.com/WmLRN4p.png)

* aeroplane
* bicycle
* bird
* boat
* bottle
* bus
* car
* cat
* chair
* cow
* dining table
* dog
* horse
* motorbike
* person
* potted plant
* sheep
* sofa
* train
* tv/monitor
 
Participants may enter either (or both) of these competitions, and can choose to tackle any (or all) of the twenty object classes. The challenge allows for two approaches to each of the competitions:

1. Participants may use systems built or trained using any methods or data excluding the provided test sets.
2. Systems are to be built or trained using only the provided training/validation data.

The intention in the first case is to establish just what level of success can currently be achieved on these problems and by what method; in the second case the intention is to establish which method is most successful given a specified training set.

##Segmentation Competition

Segmentation: Generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise.

![](https://i.imgur.com/ek0NbVK.png)
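
For reference, the pixel-wise masks can be inspected with a few lines of Python. This is a minimal sketch only: the file path is illustrative, and it assumes the masks ship as palette PNGs under a SegmentationClass directory, with each pixel value a class index.

```python
from PIL import Image
import numpy as np

# Load one class mask (illustrative path) and list the class indices
# present in it; in the palette PNGs, 255 conventionally marks the
# "void"/boundary region rather than an object class.
mask = np.array(Image.open("VOCdevkit/VOC2012/SegmentationClass/2012_000004.png"))
print("class indices present:", np.unique(mask))
```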
 
##Action Classification Competition

Action Classification: Predicting the action(s) being performed by a person in a still image.
 
![](https://i.imgur.com/w8tr9hs.png)

* jumping
* phoning
* playinginstrument
* reading
* ridingbike
* ridinghorse
* running
* takingphoto
* usingcomputer
* walking
 
In 2012 there are two variations of this competition, depending on how the person whose actions are to be classified is identified in a test image: (i) by a tight bounding box around the person; (ii) by only a single point located somewhere on the body. The latter competition aims to investigate the performance of methods given only approximate localization of a person, as might be the output from a generic person detector.

##ImageNet Large Scale Visual Recognition Competition

The goal of this competition is to estimate the content of photographs for the purpose of retrieval and automatic annotation using a subset of the large hand-labeled ImageNet dataset (10,000,000 labeled images depicting 10,000+ object categories) as training. Test images will be presented with no initial annotation - no segmentation or labels - and algorithms will have to produce labelings specifying what objects are present in the images. In this initial version of the challenge, the goal is only to identify the main objects present in images, not to specify the location of objects.

Further details can be found at the ImageNet website.

##Person Layout Taster Competition
Person Layout: Predicting the bounding box and label of each part of a person (head, hands, feet).
 
![](https://i.imgur.com/Hphaauf.png)

##Data

To download the training/validation data, see the development kit.

The training data provided consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the twenty classes present in the image. Note that multiple objects from multiple classes may be present in the same image. Annotation was performed according to a set of guidelines distributed to all annotators.
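
As a sketch of how these per-image annotation files are typically read (the file name below is hypothetical, and the tag names assume the standard VOC XML annotation layout):

```python
import xml.etree.ElementTree as ET

# Read one VOC-style annotation file and list the annotated objects.
# The path is illustrative; annotations live under Annotations/ in the
# development kit.
tree = ET.parse("VOCdevkit/VOC2012/Annotations/2012_000003.xml")
for obj in tree.getroot().iter("object"):
    name = obj.findtext("name")
    box = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (int(float(box.findtext(tag)))
                              for tag in ("xmin", "ymin", "xmax", "ymax"))
    print(name, (xmin, ymin, xmax, ymax))
```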

A subset of images are also annotated with pixel-wise segmentation of each object present, to support the segmentation competition.

Images for the action classification task are disjoint from those of the classification/detection/segmentation tasks. They have been partially annotated with people, bounding boxes, reference points and their actions. Annotation was performed according to a set of guidelines distributed to all annotators.

Images for the person layout taster, where the test set is disjoint from the main tasks, have been additionally annotated with parts of the people (head/hands/feet).

The data will be made available in two stages; in the first stage, a development kit will be released consisting of training and validation data, plus evaluation software (written in MATLAB). One purpose of the validation set is to demonstrate how the evaluation software works ahead of the competition submission.

In the second stage, the test set will be made available for the actual competition. As in the VOC2008-2011 challenges, no ground truth for the test data will be released.

The data has been split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. Statistics of the database are online.
},
keywords= {VOC},
terms= {}
}
</description>
<link>https://academictorrents.com/download/df0aad374e63b3214ef9e92e178580ce27570e59</link>
</item>
<item>
<title>CVPR Indoor Scene Recognition (Dataset)</title>
<description>@article{,
title= {CVPR Indoor Scene Recognition},
journal= {},
author= {A. Quattoni and A. Torralba},
year= {},
url= {http://web.mit.edu/torralba/www/indoor.html},
abstract= {![](http://web.mit.edu/torralba/www/allIndoors.jpg)

Indoor scene recognition is a challenging open problem in high-level vision. Most scene recognition models that work well for outdoor scenes perform poorly in the indoor domain. The main difficulty is that while some indoor scenes (e.g., corridors) can be well characterized by global spatial properties, others (e.g., bookstores) are better characterized by the objects they contain. More generally, to address the indoor scene recognition problem we need a model that can exploit both local and global discriminative information.
 
##Database
The database contains 67 Indoor categories, and a total of 15620 images. The number of images varies across categories, but there are at least 100 images per category. All images are in jpg format. The images provided here are for research purposes only.

##Evaluation
For the results in the paper we use a subset of the dataset that has the same number of training and testing samples per class. The partition that we use is:

* TrainImages.txt: contains the file names of each training image, 67*80 = 5360 images in total
* TestImages.txt: contains the file names of each test image, 67*20 = 1340 images in total
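
A minimal sketch for loading this partition (assuming each .txt file lists one image file name per line, which is how the description above reads):

```python
# Hypothetical helper: read one split file into a list of image names.
def read_split(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

train_files = read_split("TrainImages.txt")
test_files = read_split("TestImages.txt")
assert len(train_files) == 67 * 80  # 80 training images per category
assert len(test_files) == 67 * 20   # 20 test images per category
```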

##Annotations

A subset of the images are segmented and annotated with the objects that they contain. The annotations are in LabelMe format.

 
##Paper
A. Quattoni and A. Torralba. Recognizing Indoor Scenes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
 
##Acknowledgments
Thanks to Aude Oliva for helping to create the database of indoor scenes.
Funding for this research was provided by NSF Career award (IIS 0747120)},
keywords= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/59aa0ad684e5d849f68bad9a6d43a9000a927164</link>
</item>
</channel>
</rss>
