DeepLesion (10,594 CT scans with lesions)
Ke Yan (National Institutes of Health Clinical Center)

DeepLesion (59 files)
DL_info.csv 8.48MB
Images_png/Images_png_01.zip 4.30GB
Images_png/Images_png_02.zip 4.30GB
Images_png/Images_png_03.zip 4.29GB
Images_png/Images_png_04.zip 4.30GB
Images_png/Images_png_05.zip 4.30GB
Images_png/Images_png_06.zip 4.30GB
Images_png/Images_png_07.zip 4.30GB
Images_png/Images_png_08.zip 4.31GB
Images_png/Images_png_09.zip 4.30GB
Images_png/Images_png_10.zip 4.30GB
Images_png/Images_png_11.zip 4.31GB
Images_png/Images_png_12.zip 4.30GB
Images_png/Images_png_13.zip 4.33GB
Images_png/Images_png_14.zip 4.29GB
Images_png/Images_png_15.zip 4.29GB
Images_png/Images_png_16.zip 4.31GB
Images_png/Images_png_17.zip 4.30GB
Images_png/Images_png_18.zip 4.32GB
Images_png/Images_png_19.zip 4.30GB
Images_png/Images_png_20.zip 4.30GB
Images_png/Images_png_21.zip 4.32GB
Images_png/Images_png_22.zip 4.30GB
Images_png/Images_png_23.zip 4.31GB
Images_png/Images_png_24.zip 4.30GB
Images_png/Images_png_25.zip 4.30GB
Images_png/Images_png_26.zip 4.30GB
Images_png/Images_png_27.zip 4.30GB
Images_png/Images_png_28.zip 4.30GB
Images_png/Images_png_29.zip 4.31GB
Images_png/Images_png_30.zip 4.30GB
Images_png/Images_png_31.zip 4.32GB
Images_png/Images_png_32.zip 4.33GB
Images_png/Images_png_33.zip 4.30GB
Images_png/Images_png_34.zip 4.30GB
Images_png/Images_png_35.zip 4.30GB
Images_png/Images_png_36.zip 4.30GB
Images_png/Images_png_37.zip 4.30GB
Images_png/Images_png_38.zip 4.31GB
Images_png/Images_png_39.zip 4.31GB
Images_png/Images_png_40.zip 4.32GB
Images_png/Images_png_41.zip 4.30GB
Images_png/Images_png_42.zip 4.31GB
Images_png/Images_png_43.zip 4.33GB
Images_png/Images_png_44.zip 4.33GB
Images_png/Images_png_45.zip 4.29GB
Images_png/Images_png_46.zip 4.31GB
Images_png/Images_png_47.zip 4.30GB
Images_png/Images_png_48.zip 4.30GB
Type: Dataset

Bibtex:
@article{,
title= {DeepLesion (10,594 CT scans with lesions)},
keywords= {},
author= {Ke Yan (National Institutes of Health Clinical Center)},
abstract= {## Introduction

The DeepLesion dataset contains 32,120 axial computed tomography (CT) slices from 10,594 CT
scans (studies) of 4,427 unique patients. There are 1–3 lesions in each image with accompanying
bounding boxes and size measurements, adding up to 32,735 lesions altogether. The lesion
annotations were mined from NIH’s picture archiving and communication system (PACS). Some
meta-data are also provided. The contents include:
 - Folder “Images\_png”: png image files. Each slice is named “{patient
index}\_{study index}\_{series index}\_{slice index}.png”; in the extracted folders, the last
underscore is replaced by / or \ to indicate sub-folders. The images are stored as unsigned
16-bit integers. Subtract 32768 from the pixel intensity to obtain the original Hounsfield unit (HU) values.
 We provide not only the key CT slice that contains the lesion annotation, but also its 3D
context (30mm extra slices above and below the key slice). Due to the large size of the data
and the file size limit of the website, we packed them into 56 smaller zip files for downloading.
 - Key_slices.zip: key slices with overlaid lesion annotations for review purposes.
 - Folder “Key_slice_examples”: random image examples chosen from Key_slices.zip.
 - DL_info.csv: The annotations and meta-data. See Section “Annotations” below.
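
As a rough illustration of the naming and intensity conventions above, the following Python sketch maps a file name to its relative path inside the extracted archives and converts stored pixel values to HU. The helper names `slice_path` and `to_hu` are hypothetical, the example file name and pixel values are illustrative only, and NumPy is an assumed dependency. Reading the PNG itself (with any reader that preserves 16-bit data) is left out.

```python
import posixpath

import numpy as np


def slice_path(file_name, root="Images_png"):
    """Map 'patient_study_series_slice.png' to 'root/patient_study_series/slice.png'.

    Only the last underscore becomes a folder separator.
    `root` is an assumed extraction directory, not something the dataset mandates.
    """
    folder, slice_file = file_name.rsplit("_", 1)
    return posixpath.join(root, folder, slice_file)


def to_hu(pixels):
    """Convert stored unsigned 16-bit pixel values to Hounsfield units.

    The cast to a signed type comes first: subtracting 32768 while the data
    is still uint16 would wrap around instead of going negative.
    """
    return np.asarray(pixels).astype(np.int32) - 32768


# Illustrative file name and pixel values (not taken from the dataset).
print(slice_path("000001_01_01_109.png"))
stored = np.array([[31768, 32768], [32808, 33768]], dtype=np.uint16)
print(to_hu(stored))
```

The signed cast is the one step that is easy to miss; everything else is plain string handling.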

## Reference

Ke Yan, Xiaosong Wang, Le Lu, Ronald M. Summers, "DeepLesion: Automated Mining of
Large-Scale Lesion Annotations and Universal Lesion Detection with Deep Learning", Journal
of Medical Imaging 5(3), 036501 (2018), doi: 10.1117/1.JMI.5.3.036501



## Annotations
In DL_info.csv, each row describes one lesion in DeepLesion. The columns are:
1. File name. Please replace the last underscore with / or \ to indicate sub-folders.
2. Patient index starting from 1.
3. Study index for each patient starting from 1. There are 1~26 studies for each patient.
4. Series ID.
5. Slice index of the key slice containing the lesion annotation, starting from 1.
6. 8D vector, the image coordinates (in pixels) of the two RECIST diameters of the lesion. [x11,
y11, x12, y12, x21, y21, x22, y22]. The first 4 coordinates are for the long axis. Please see our paper
and its supplementary material for further explanation.
7. 4D vector, the bounding-box [x1, y1, x2, y2] of the lesion (in pixels) estimated from the RECIST diameters. See our paper for details.
8. 2D vector, the lengths of the long and short axes. The unit is pixels.
9. The relative body position of the center of the lesion. The z-coordinates were predicted by the
self-supervised body part regressor. See our paper for details. The coordinates are approximate
and just for reference.
10. The type of the lesion. Types 1~8 correspond to bone, abdomen, mediastinum, liver, lung,
kidney, soft tissue, and pelvis, respectively. See our paper for details. The lesion types are
coarsely defined and just for reference. Only the lesions in the val and test sets were annotated;
all others are denoted as -1.
11. This field is set to 1 if the annotation of this lesion is possibly noisy according to a manual check.
We have found 35 noisy annotations out of 32,735 so far.
12. Slice range. Context slices neighboring the key slice are provided in this dataset. For
example, for the first lesion, the key slice is 109 and the slice range is 103~115, meaning that
slices 103~115 are provided. For most lesions, we provide 30mm extra slices above and below
the key slice, unless the long axis of the lesion is larger than this thickness (then we provide
more) or the beginning or end of the volume is reached.
13. Spacing (mm per pixel) of the x, y, and z axes. The 3rd value is the slice interval, or the physical
distance between two slices.
14. Image size.
15. The windowing (min~max) in Hounsfield unit extracted from the original DICOM file.
16. Patient gender. F for female and M for male.
17. Patient age.
18. Official randomly generated patient-level data split, train=1, validation=2, test=3.
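
The column layout above could be read along the following lines. This is a minimal sketch under stated assumptions, not an official loader: it addresses columns by position (the header names are not given here), it assumes the file starts with a header row, and it assumes multi-valued fields are stored as comma-separated numbers inside a single quoted cell. The helper names `floats` and `parse_row` are hypothetical, and the sample row is synthetic. The millimetre conversion additionally assumes equal x and y spacing.

```python
import csv
import io

# Codes taken from the column descriptions (columns 10 and 18).
LESION_TYPES = {1: "bone", 2: "abdomen", 3: "mediastinum", 4: "liver",
                5: "lung", 6: "kidney", 7: "soft tissue", 8: "pelvis"}
SPLITS = {1: "train", 2: "validation", 3: "test"}


def floats(field):
    """Parse a comma-separated numeric field such as '1.5, 2, 3' (assumed format)."""
    return [float(v) for v in field.split(",")]


def parse_row(row):
    """Map one CSV row (a list of 18 strings) to a dict, by column position."""
    spacing = floats(row[12])   # column 13: mm per pixel in x, y; slice interval in z
    diam_px = floats(row[7])    # column 8: long and short axis lengths, in pixels
    first, last = (int(v) for v in row[11].split(","))  # column 12: slice range
    return {
        "file_name": row[0],
        "key_slice": int(row[4]),
        "recist": floats(row[5]),   # [x11, y11, x12, y12, x21, y21, x22, y22]
        "bbox": floats(row[6]),     # [x1, y1, x2, y2]
        # Assumes isotropic in-plane spacing, so the x spacing is used for both axes.
        "diameters_mm": [d * spacing[0] for d in diam_px],
        "type": LESION_TYPES.get(int(row[9]), "unlabeled"),  # -1 means not annotated
        "noisy": row[10].strip() == "1",
        "slices": list(range(first, last + 1)),
        "split": SPLITS[int(row[17])],
    }


# Synthetic header and row, invented for this example; the values are not real data.
header = ["col%d" % i for i in range(1, 19)]
sample = ["000001_01_01_109.png", "1", "1", "1", "109",
          "10, 20, 40, 60, 30, 30, 20, 45", "5, 15, 45, 65", "50, 18",
          "0.5, 0.5, 0.6", "-1", "0", "103, 115", "0.75, 0.75, 5",
          "512, 512", "-175, 275", "F", "62", "1"]
buf = io.StringIO()
csv.writer(buf).writerows([header, sample])
buf.seek(0)

reader = csv.reader(buf)
next(reader)  # skip the assumed header row
records = [parse_row(r) for r in reader]
print(records[0]["diameters_mm"], records[0]["type"], records[0]["split"])
```

Filtering on `noisy` and `split` from these records is then a one-line list comprehension. If the real file uses different separators inside the vector fields, only `floats` and the slice-range line need to change.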

## Applications
DeepLesion is a large-scale dataset that contains a variety of lesion types. It can be used for lesion
detection, classification, segmentation, retrieval, measurement, growth analysis, relationship mining
between different lesions, etc.

## Limitations

Since DeepLesion was mined from PACS, it has a few limitations:
 - DeepLesion contains only 2D diameter measurements and bounding-boxes of lesions. It has no lesion segmentation masks, 3D bounding-boxes, or fine-grained lesion types. Therefore,
some applications (e.g. lesion segmentation) may need extra manual annotations.
 - Not all lesions were annotated in the images. Radiologists typically mark only representative
lesions in each study. Therefore, some lesions remain unannotated.
 - According to manual examination, although most bookmarks represent abnormal findings or
lesions, a small proportion of the bookmarks are actually measurements of normal structures,
such as lymph nodes of normal size.

https://i.imgur.com/AuNDBbz.png

## Acknowledgments 

This research was supported by the Intramural Research Program of the NIH Clinical Center. We
thank NVIDIA for the donation of GPU cards. We thank our lab members Jiamin Liu, Yuxing Tang,
and Youbao Tang for their help in preparing the dataset.},
terms= {},
license= {"usage of the data set is unrestricted"},
superseded= {},
url= {https://nihcc.app.box.com/v/DeepLesion}
}

