Visual Object Classes Challenge 2012 Dataset (VOC2012) VOCtrainval_11-May-2012.tar
Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.

Type: Dataset
Tags: VOC

title= {Visual Object Classes Challenge 2012 Dataset (VOC2012) VOCtrainval_11-May-2012.tar},
journal= {},
author= {Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.},
year= {2012},
url= {},
abstract= {##Introduction
The main goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). It is fundamentally a supervised learning learning problem in that a training set of labelled images is provided. The twenty object classes that have been selected are:

* Person: person
* Animal: bird, cat, cow, dog, horse, sheep
* Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
* Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

There are three main object recognition competitions: classification, detection, and segmentation, a competition on action classification, and a competition on large scale recognition run by ImageNet. In addition there is a "taster" competition on person layout.

##Classification/Detection Competitions

Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image.
Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image.
20 classes

* aeroplane
* bicycle	
* bird	
* boat	
* bottle	
* bus	
* car	
* cat	
* chair
* cow
* dining table	
* dog	
* horse	
* motorbike
* 	person	
* potted plant
* sheep
* sofa
* train	
* tv/monitor
Participants may enter either (or both) of these competitions, and can choose to tackle any (or all) of the twenty object classes. The challenge allows for two approaches to each of the competitions:

1. Participants may use systems built or trained using any methods or data excluding the provided test sets.
2. Systems are to be built or trained using only the provided training/validation data.

The intention in the first case is to establish just what level of success can currently be achieved on these problems and by what method; in the second case the intention is to establish which method is most successful given a specified training set.

Segmentation Competition

Segmentation: Generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise.

##Action Classification Competition

Action Classification: Predicting the action(s) being performed by a person in a still image.

* jumping	
* phoning
* playinginstrument	
* reading	
* ridingbike
* ridinghorse	
* running	
* takingphoto	
* usingcomputer	
* walking
In 2012 there are two variations of this competition, depending on how the person whose actions are to be classified is identified in a test image: (i) by a tight bounding box around the person; (ii) by only a single point located somewhere on the body. The latter competition aims to investigate the performance of methods given only approximate localization of a person, as might be the output from a generic person detector.

##ImageNet Large Scale Visual Recognition Competition

The goal of this competition is to estimate the content of photographs for the purpose of retrieval and automatic annotation using a subset of the large hand-labeled ImageNet dataset (10,000,000 labeled images depicting 10,000+ object categories) as training. Test images will be presented with no initial annotation - no segmentation or labels - and algorithms will have to produce labelings specifying what objects are present in the images. In this initial version of the challenge, the goal is only to identify the main objects present in images, not to specify the location of objects.

Further details can be found at the ImageNet website.

##Person Layout Taster Competition
Person Layout: Predicting the bounding box and label of each part of a person (head, hands, feet).

To download the training/validation data, see the development kit.

The training data provided consists of a set of images; each image has an annotation file giving a bounding box and object class label for each object in one of the twenty classes present in the image. Note that multiple objects from multiple classes may be present in the same image. Annotation was performed according to a set of guidelines distributed to all annotators.

A subset of images are also annotated with pixel-wise segmentation of each object present, to support the segmentation competition.

Images for the action classification task are disjoint from those of the classification/detection/segmentation tasks. They have been partially annotated with people, bounding boxes, reference points and their actions. Annotation was performed according to a set of guidelines distributed to all annotators.

Images for the person layout taster, where the test set is disjoint from the main tasks, have been additionally annotated with parts of the people (head/hands/feet).

The data will be made available in two stages; in the first stage, a development kit will be released consisting of training and validation data, plus evaluation software (written in MATLAB). One purpose of the validation set is to demonstrate how the evaluation software works ahead of the competition submission.

In the second stage, the test set will be made available for the actual competition. As in the VOC2008-2011 challenges, no ground truth for the test data will be released.

The data has been split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. Statistics of the database are online.
keywords= {VOC},
terms= {}

10 day statistics (2 downloads)

Average Time 21 mins, 16 secs
Average Speed 1.57MB/s
Best Time 5 mins, 01 secs
Best Speed 6.64MB/s
Worst Time 37 mins, 32 secs
Worst Speed 887.94kB/s