ImperialAIEchocardiographyDataset_2020-12-05
Unity Imaging Collaborative

ImperialAIEchocardiographyDataset_2020-12-05 (2 files)
labels.zip 6.29MB
png-cache.zip 1.27GB
Type: Dataset
Tags:

Bibtex:
@article{,
title= {ImperialAIEchocardiographyDataset_2020-12-05},
keywords= {},
author= {Unity Imaging Collaborative},
abstract= {These are the latest versions of the datasets and code, which are updated regularly. The code lives on GitHub.

Download 2020-12-05 release:

Unity Imaging Echocardiography Model Development Dataset Images: Download
Unity Imaging Echocardiography Model Development Dataset Labels: Download
Unity Imaging Code: https://github.com/UnityImaging
For reproducibility, specific snapshots of the datasets and code used for publication are below.




Images - png-cache.zip
1) We curate a collection of DICOM files that will contribute to a dataset.

2) Each DICOM file is assigned to a dataset class. Currently there are two:

01 - development - training / tuning / internal validation images
02 - external validation images
3) Each DICOM file is given a 64 character hexadecimal code, e.g. 4d44413619e0161c5ab795bc1b899f7fb4bd0b2f5ab2efc881ecfc663d3bfb66

4) Each image within a DICOM (typically an individual frame for echo) is given a number zero-padded to 4 digits, from 0000 to 9999.

5) These images are extracted from the DICOM file, any burnt-in metadata is masked, and each image is saved as a PNG with its code as the filename, e.g. 01-4d44413619e0161c5ab795bc1b899f7fb4bd0b2f5ab2efc881ecfc663d3bfb66-0000.png

6) The individual images that make up a dataset for a paper are saved in a folder called png-cache, with subdirectories for the dataset class (e.g. /01) and then for the first two pairs of hexadecimal digits of the code (e.g. /4d/44), i.e. /png-cache/01/4d/44/4d44413619e0161c5ab795bc1b899f7fb4bd0b2f5ab2efc881ecfc663d3bfb66-0000.png

7) This folder is then compressed to form png-cache.zip
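
The naming scheme in steps 2 to 6 can be sketched in Python as follows. This is a minimal illustration, not the collaborative's own code: the function names parse_image_name and cache_dir are hypothetical. The file name in step 5 carries the dataset class prefix while the example path in step 6 does not, so the sketch only derives the directory and leaves the exact file name to be checked against the archive.

```python
import re
from pathlib import PurePosixPath

# Names like 01-<64 hex characters>-0000.png, as in the step 5 example.
NAME_RE = re.compile(r"^(\d{2})-([0-9a-f]{64})-(\d{4})\.png$")

def parse_image_name(name):
    """Split an image file name into (dataset class, DICOM code, frame number)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected image name: {name!r}")
    dataset_class, code, frame = match.groups()
    return dataset_class, code, int(frame)

def cache_dir(dataset_class, code, root="png-cache"):
    """Directory for a DICOM's frames: root / class / first pair / second pair."""
    return PurePosixPath(root) / dataset_class / code[0:2] / code[2:4]

name = "01-4d44413619e0161c5ab795bc1b899f7fb4bd0b2f5ab2efc881ecfc663d3bfb66-0000.png"
dataset_class, code, frame = parse_image_name(name)
print(cache_dir(dataset_class, code))  # png-cache/01/4d/44
```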

Not every file has an associated label: e.g. all the frames of a video may be included, but only a few of them have expert labels.

Labels - labels.zip
These are stored as JSON files. The development dataset (provided as labels-all.json) is divided up into:

labels-train.json - training
labels-tune.json - tuning
labels-ival.json - internal validation
For each image file (which acts as the key), there is a dictionary for every possible label. Each label for an image has one of the following types:

"off": the structure is definitely not in the image, i.e. the outputs would be expected to be all zeros
"blurred": the structure might be in the image, but there is no label available (either it was too blurry, or no one has tried to label it), i.e. the output would need to be masked from the loss function
"point": the structure is a single point, with the x and y coordinates stored under the x and y keys
"curve": the structure is a curve, represented as a cubic spline, with the x and y coordinates of the control points stored under the x and y keys
For convenience, each of the .json files has an equivalent .txt file listing the images it contains.},
terms= {},
license= {https://creativecommons.org/licenses/by-nc-nd/4.0/},
superseded= {},
url= {https://data.unityimaging.net/}
}

