Non-Small Cell Lung Cancer CT Scan Dataset (NSCLC-Radiomics-Genomics)

Folder: NSCLC-Radiomics-Genomics (183 files)
GSE58661_series_matrix.txt.gz (29.93 MB)
Lung3.csv (13.64 kB)
GSE58661_RAW/GSM1416616_LUNG3-89.CEL.gz (4.74 MB)
GSE58661_RAW/GSM1416614_LUNG3-87.CEL.gz (4.83 MB)
GSE58661_RAW/GSM1416615_LUNG3-88.CEL.gz (4.85 MB)
GSE58661_RAW/GSM1416613_LUNG3-86.CEL.gz (4.77 MB)
GSE58661_RAW/GSM1416611_LUNG3-84.CEL.gz (4.69 MB)
GSE58661_RAW/GSM1416612_LUNG3-85.CEL.gz (4.85 MB)
GSE58661_RAW/GSM1416610_LUNG3-83.CEL.gz (4.79 MB)
GSE58661_RAW/GSM1416608_LUNG3-81.CEL.gz (4.83 MB)
GSE58661_RAW/GSM1416609_LUNG3-82.CEL.gz (4.91 MB)
GSE58661_RAW/GSM1416607_LUNG3-80.CEL.gz (4.75 MB)
GSE58661_RAW/GSM1416605_LUNG3-78.CEL.gz (4.76 MB)
GSE58661_RAW/GSM1416606_LUNG3-79.CEL.gz (4.80 MB)
GSE58661_RAW/GSM1416604_LUNG3-77.CEL.gz (4.87 MB)
GSE58661_RAW/GSM1416602_LUNG3-75.CEL.gz (4.91 MB)
GSE58661_RAW/GSM1416603_LUNG3-76.CEL.gz (4.74 MB)
GSE58661_RAW/GSM1416601_LUNG3-74.CEL.gz (4.73 MB)
GSE58661_RAW/GSM1416599_LUNG3-72.CEL.gz (4.92 MB)
GSE58661_RAW/GSM1416600_LUNG3-73.CEL.gz (4.79 MB)
GSE58661_RAW/GSM1416598_LUNG3-71.CEL.gz (4.60 MB)
GSE58661_RAW/GSM1416596_LUNG3-69.CEL.gz (4.60 MB)
GSE58661_RAW/GSM1416597_LUNG3-70.CEL.gz (4.88 MB)
GSE58661_RAW/GSM1416595_LUNG3-68.CEL.gz (4.86 MB)
GSE58661_RAW/GSM1416593_LUNG3-66.CEL.gz (4.88 MB)
GSE58661_RAW/GSM1416594_LUNG3-67.CEL.gz (4.64 MB)
GSE58661_RAW/GSM1416592_LUNG3-65.CEL.gz (4.89 MB)
GSE58661_RAW/GSM1416590_LUNG3-63.CEL.gz (4.74 MB)
GSE58661_RAW/GSM1416591_LUNG3-64.CEL.gz (4.76 MB)
GSE58661_RAW/GSM1416589_LUNG3-62.CEL.gz (4.68 MB)
GSE58661_RAW/GSM1416587_LUNG3-60.CEL.gz (4.65 MB)
GSE58661_RAW/GSM1416588_LUNG3-61.CEL.gz (4.64 MB)
GSE58661_RAW/GSM1416586_LUNG3-59.CEL.gz (4.86 MB)
GSE58661_RAW/GSM1416584_LUNG3-57.CEL.gz (4.77 MB)
GSE58661_RAW/GSM1416585_LUNG3-58.CEL.gz (4.77 MB)
GSE58661_RAW/GSM1416583_LUNG3-56.CEL.gz (4.88 MB)
GSE58661_RAW/GSM1416581_LUNG3-54.CEL.gz (4.77 MB)
GSE58661_RAW/GSM1416582_LUNG3-55.CEL.gz (4.59 MB)
GSE58661_RAW/GSM1416580_LUNG3-53.CEL.gz (4.83 MB)
GSE58661_RAW/GSM1416578_LUNG3-51.CEL.gz (4.61 MB)
GSE58661_RAW/GSM1416579_LUNG3-52.CEL.gz (4.83 MB)
GSE58661_RAW/GSM1416577_LUNG3-50.CEL.gz (4.77 MB)
GSE58661_RAW/GSM1416575_LUNG3-48.CEL.gz (4.67 MB)
GSE58661_RAW/GSM1416576_LUNG3-49.CEL.gz (4.80 MB)
GSE58661_RAW/GSM1416574_LUNG3-47.CEL.gz (4.75 MB)
GSE58661_RAW/GSM1416572_LUNG3-45.CEL.gz (4.76 MB)
GSE58661_RAW/GSM1416573_LUNG3-46.CEL.gz (4.83 MB)
GSE58661_RAW/GSM1416571_LUNG3-44.CEL.gz (4.64 MB)
GSE58661_RAW/GSM1416569_LUNG3-42.CEL.gz (4.70 MB)
Type: Dataset

title= {Non-Small Cell Lung Cancer CT Scan Dataset (NSCLC-Radiomics-Genomics)},
license= {Creative Commons Attribution 3.0 Unported License},
abstract= {This collection contains images from 89 non-small cell lung cancer (NSCLC) patients who were treated with surgery. For these patients, pretreatment CT scans, gene-expression data, and clinical data are available. This dataset corresponds to the Lung3 dataset of the study published in Nature Communications.
In short, that publication applies a radiomic approach to computed tomography (CT) data from 1,019 patients with lung or head-and-neck cancer. Radiomics refers to the comprehensive quantification of tumour phenotypes by applying a large number of quantitative image features. In the present analysis, 440 features quantifying tumour image intensity, shape, and texture were extracted. We found that a large number of radiomic features have prognostic power in independent data sets, many of which had not previously been identified as significant. Radiogenomics analysis revealed that a prognostic radiomic signature, capturing intra-tumour heterogeneity, was associated with underlying gene-expression patterns. These data suggest that radiomics identifies a general prognostic phenotype existing in both lung and head-and-neck cancer. This may have a clinical impact because imaging is routinely used in clinical practice, providing an unprecedented opportunity to improve decision support in cancer treatment at low cost.
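To make "quantitative image features" concrete, here is a minimal Python sketch of a few first-order (intensity-histogram) features computed over the voxels inside a tumour mask. It assumes the CT volume and a binary segmentation mask are already loaded as NumPy arrays of the same shape. The function name `first_order_features`, the 25-bin histogram, and the specific feature definitions are illustrative assumptions, not the exact definitions or implementation used in the study; shape and texture features would need separate code.

```python
import numpy as np

def first_order_features(image: np.ndarray, mask: np.ndarray, n_bins: int = 25) -> dict:
    """Compute a few first-order intensity features inside a tumour mask.

    Illustrative only: generic textbook definitions (hypothetical helper),
    not the feature definitions used in the Nature Communications study.
    """
    voxels = image[mask.astype(bool)].astype(float)
    if voxels.size == 0:
        raise ValueError("mask selects no voxels")
    mean = float(voxels.mean())
    std = float(voxels.std())  # population standard deviation
    # Skewness of the intensity distribution (0 for a symmetric distribution).
    skewness = 0.0 if std == 0 else float(np.mean(((voxels - mean) / std) ** 3))
    # Energy: sum of squared intensities inside the mask.
    energy = float(np.sum(voxels ** 2))
    # Shannon entropy (bits) of the discretised intensity histogram.
    counts, _ = np.histogram(voxels, bins=n_bins)
    p = counts[counts > 0] / voxels.size
    entropy = float(-np.sum(p * np.log2(p)))
    return {"mean": mean, "std": std, "skewness": skewness,
            "energy": energy, "entropy": entropy}

# Usage on a synthetic volume (toy data, not from this dataset).
rng = np.random.default_rng(0)
ct = rng.normal(40.0, 10.0, size=(16, 16, 16))
tumour_mask = np.zeros(ct.shape, dtype=bool)
tumour_mask[4:12, 4:12, 4:12] = True
print(first_order_features(ct, tumour_mask))
```

On real scans the intensities would be Hounsfield units and the mask would come from the tumour delineation; histogram binning strongly affects entropy-type features, so any such choice should be kept fixed across patients.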

The dataset described here (Lung3) was used to investigate the association of radiomic imaging features with gene-expression profiles. The Lung1 dataset, which was used to train the radiomic biomarker and consists of 422 NSCLC CT scans with outcome data, can be found here: NSCLC-Radiomics.
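One simple way to quantify an association between a single radiomic feature and gene expression across the same patients is a Spearman rank correlation computed per gene. The sketch below assumes the feature values and an expression matrix (genes × patients) have already been aligned by patient name; it is a generic illustration, not necessarily the statistical method used in the study, and the helper names and toy values are hypothetical.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def feature_gene_correlations(feature, expression):
    """feature: shape (n_patients,); expression: shape (n_genes, n_patients)."""
    return np.array([spearman(feature, gene_row) for gene_row in expression])

# Toy values for 4 patients (hypothetical, not from this dataset).
feature = [0.2, 0.5, 0.9, 1.4]
expression = np.array([[1.0, 2.0, 3.0, 4.0],   # gene rising with the feature
                       [4.0, 3.0, 2.0, 1.0]])  # gene falling with the feature
print(feature_gene_correlations(feature, expression))
```

Because one test is run per gene or probe, a real analysis over thousands of probes would also need a multiple-testing correction; a gene with constant expression would yield an undefined correlation.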

For scientific inquiries about this dataset, please contact Dr. Hugo Aerts of the Dana-Farber Cancer Institute / Harvard Medical School.

### Gene-Expression Data
Corresponding microarray data acquired for the imaging samples are available at the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO), series GSE58661. The patient names used to identify the cases on GEO are identical to those used in the DICOM files on TCIA and in the clinical data spreadsheet.
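Because the GEO sample names match the TCIA and clinical patient names, the raw CEL files can be linked to patients by parsing their file names. A minimal sketch, assuming every raw file follows the `GSM<digits>_LUNG3-<number>.CEL.gz` pattern seen in the file listing; the regular expression and the helper name `patient_ids_from_cel_files` are illustrative assumptions, not part of the dataset.

```python
import re

# Assumed from the listed names, e.g. "GSE58661_RAW/GSM1416616_LUNG3-89.CEL.gz".
CEL_NAME = re.compile(r"^(?:.*/)?(GSM\d+)_(LUNG3-\d+)\.CEL\.gz$")

def patient_ids_from_cel_files(paths):
    """Map each LUNG3 patient name to its GEO sample accession (GSM...)."""
    mapping = {}
    for path in paths:
        match = CEL_NAME.match(path)
        if match is None:
            continue  # skip files that do not follow the naming convention
        gsm, patient = match.groups()
        mapping[patient] = gsm
    return mapping

files = [
    "GSE58661_RAW/GSM1416616_LUNG3-89.CEL.gz",
    "GSE58661_RAW/GSM1416614_LUNG3-87.CEL.gz",
    "GSE58661_series_matrix.txt.gz",  # not a raw CEL file, so it is skipped
]
print(patient_ids_from_cel_files(files))
# {'LUNG3-89': 'GSM1416616', 'LUNG3-87': 'GSM1416614'}
```

The resulting patient names can then serve as the join key against the DICOM patient names and the clinical data file.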

### Clinical Data
Corresponding clinical data can be found here: Lung3.metadata.xls.
Please note that survival time is measured in days from the start of treatment. DICOM patient names are identical in TCIA and in the clinical data file.
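Since survival is recorded in days, it is often converted to years for reporting. A minimal sketch using only the Python standard library; the column names `PatientID` and `survival_days` and the toy patients `PATIENT-A`/`PATIENT-B` are hypothetical placeholders, so check the header of the actual clinical file before use. Dividing by 365.25 is one common convention, not something the dataset prescribes.

```python
import csv
import io

DAYS_PER_YEAR = 365.25  # average year length, accounting for leap years

def load_survival(csv_file, patient_col="PatientID", days_col="survival_days"):
    """Return {patient name: survival in years} from a clinical CSV.

    Column names are hypothetical defaults; pass the real header names.
    """
    survival = {}
    for row in csv.DictReader(csv_file):
        days = (row.get(days_col) or "").strip()
        if not days:
            continue  # skip patients with missing survival time
        survival[row[patient_col].strip()] = float(days) / DAYS_PER_YEAR
    return survival

# Toy input (made-up values, not real patient data).
example = io.StringIO(
    "PatientID,survival_days\n"
    "PATIENT-A,730.5\n"
    "PATIENT-B,\n"
)
print(load_survival(example))  # {'PATIENT-A': 2.0}
```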


### Publications

Aerts, H. J. W. L., Velazquez, E. R., Leijenaar, R. T. H., Parmar, C., Grossmann, P., Carvalho, S., … Lambin, P. (2014, June 3). Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nature Communications. Nature Publishing Group.

