Whale Shark ID Dataset
Wild Me

Type: Dataset
Tags: coco, identification, wildlife, whale shark

title= {Whale Shark ID Dataset},
journal= {},
author= {Wild Me},
year= {2020},
url= {https://www.wildme.org},
abstract= {Our released whale shark (Rhincodon typus) dataset is the product of a collaborative effort built on the data collection and population modeling conducted at Ningaloo Marine Park in Western Australia from 1995 to 2008 (Holmberg et al. 2008, 2009). A total of 7,888 photos and associated metadata from 2,441 whale shark encounters were collected from 464 individual contributors. Most notably, these came from the original research of Brad Norman and from members of the local whale shark tourism industry, who sight these animals annually from April to June. Each visible whale shark in the images was annotated with a bounding box, and its viewpoint was labeled (e.g., left, right). A total of 543 individual whale sharks were identified by their unique spot patterns, first with computer-assisted spot pattern recognition (Arzoumanian et al. 2005) and then by manual review and confirmation. A total of 7,693 named sightings were exported.

The dataset is released in the Microsoft COCO format (https://cocodataset.org/) and therefore uses flat image folders with associated JSON metadata files. We have collapsed the entire dataset into a single "train" split and left "val" and "test" empty. We do this to invite researchers to experiment with their own approaches to the unbalanced and chaotic distribution of sightings per individual. All images in the dataset have been resized to a maximum linear dimension of 3,000 pixels. Each animal sighting is defined by an axis-aligned bounding box. Its metadata also include the rotation of the box (theta), the viewpoint of the animal, a species (category) ID, a source image ID, an individual string ID name, and other miscellaneous values. The temporal ordering of the images and an anonymized ID for the original photographer can be determined from the metadata for each image.
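Because every sighting sits in the single "train" split, users have to group sightings by individual and build their own splits. The following Python code is a minimal sketch of that step, not Wild Me's own tooling. It makes several assumptions. The annotation JSON is taken to follow the standard COCO layout, with a top-level `annotations` list whose entries carry `id` and a `bbox` of [x, y, w, h]. The individual ID string is assumed to be stored under a key called `name`; check the key used in the released file. `theta` is treated as a rotation in radians about the box center. All helper names are hypothetical.

```python
import json
import math
import random
from collections import Counter, defaultdict


def load_coco(path):
    """Load a COCO-style annotation file (a single JSON document)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def sightings_by_individual(coco, name_key="name"):
    """Map each individual's ID string to the list of its annotation IDs.

    ASSUMPTION: the individual ID is stored under `name_key` in every
    annotation; verify the key used in the released JSON.
    """
    groups = defaultdict(list)
    for ann in coco["annotations"]:
        groups[ann[name_key]].append(ann["id"])
    return dict(groups)


def sightings_histogram(groups):
    """Counter mapping 'number of sightings' to 'number of individuals'."""
    return Counter(len(ids) for ids in groups.values())


def split_by_individual(groups, val_fraction=0.2, seed=0):
    """Hypothetical individual-disjoint split: all sightings of a given
    whale shark land in the same split, so no animal appears in both."""
    names = sorted(groups)
    random.Random(seed).shuffle(names)  # deterministic for a fixed seed
    n_val = int(round(len(names) * val_fraction))
    val = [ann_id for name in names[:n_val] for ann_id in groups[name]]
    train = [ann_id for name in names[n_val:] for ann_id in groups[name]]
    return train, val


def rotated_corners(bbox, theta):
    """Corners of a COCO [x, y, w, h] box rotated by `theta`.

    ASSUMPTION: theta is in radians and the rotation is about the box center.
    """
    x, y, w, h = bbox
    cx, cy = x + w / 2.0, y + h / 2.0
    c, s = math.cos(theta), math.sin(theta)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each corner offset about the center, then translate back.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]
```

For example, passing the loaded annotations through `sightings_by_individual` and then `sightings_histogram` shows how many individuals were sighted once, twice, and so on. That histogram is the imbalance the single-split release is meant to expose. The path of the annotation file inside the archive is not stated here, so it is left to the reader.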

For research or press inquiries, please direct all correspondence to Wild Me at info@wildme.org. Wild Me (https://www.wildme.org) is a registered 501(c)(3) not-for-profit based in Portland, Oregon, USA, that brings state-of-the-art computer vision tools to ecology researchers working on wildlife conservation around the globe.

Direct download mirror: https://wildbookiarepository.azureedge.net/datasets/whaleshark.coco.tar.gz},
keywords= {wildlife, identification, whale shark, coco},
terms= {Use of this dataset in scientific research must provide attribution under the CDLA-Permissive License (version 1.0) and must also cite the original research publication: 

  title={Estimating population size, structure, and residency time for whale sharks Rhincodon typus through collaborative photo-identification},
  author={Holmberg, Jason and Norman, Bradley and Arzoumanian, Zaven},
  journal={Endangered Species Research}},
license= {Community Data License Agreement – Permissive – Version 1.0 (https://cdla.io/permissive-1-0/)},
superseded= {}
