Integration of TCGA and CCLE data
Download TCGA clinical/subtype information
- Move to transcriptomic_data/ and start R.
    $ cd transcriptomic_data
    $ R
- Read integration.R.
    source("integration.R")
- Run outputClinical() or outputSubtype().
    outputClinical("BRCA")
    outputSubtype("BRCA")
  Output: <TCGA Study Abbreviation>_clinic.csv or <TCGA Study Abbreviation>_subtype.csv
Select samples in reference to clinical or subtype data
- You can select patients by their condition, based on the clinical or subtype data obtained above.
    patientSelection(type = subtype, ID = "patient", pathologic_stage %in% c("Stage_I", "Stage_II"), age_at_initial_pathologic_diagnosis < 60)
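To make the two conditions in the call above concrete, here is a minimal base-R sketch that applies them to a toy table. The patient IDs and values are invented for illustration, and `subset()` is only a stand-in: how `patientSelection()` evaluates its conditions internally is not described in this README.

```r
# Toy clinical table (hypothetical values); column names follow the call above.
clinical <- data.frame(
  patient = c("TCGA-A1-0001", "TCGA-A1-0002", "TCGA-A1-0003", "TCGA-A1-0004"),
  pathologic_stage = c("Stage_I", "Stage_III", "Stage_II", "Stage_II"),
  age_at_initial_pathologic_diagnosis = c(45, 50, 72, 58),
  stringsAsFactors = FALSE
)

# Keep stage I/II patients who were diagnosed before age 60.
selected <- subset(
  clinical,
  pathologic_stage %in% c("Stage_I", "Stage_II") &
    age_at_initial_pathologic_diagnosis < 60
)

selected$patient
```

Both conditions must hold for a patient to be kept, so a stage II patient aged 72 is excluded just like a stage III patient aged 50.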
Download TCGA gene expression data (HTSeq-Counts)
- Download the gene expression data of the specified sample types (Sample Type Codes) for the cancer type specified in outputClinical() or outputSubtype(). Running this code retrieves data only for the patients selected by patientSelection().
    downloadTCGA(cancertype = "BRCA", sampletype = c("01", "06"), outputresult = FALSE)
  Output: Number of selected samples
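For readers unfamiliar with Sample Type Codes: in a TCGA barcode, the two-digit code sits at characters 14-15, where "01" denotes a primary solid tumor, "06" a metastatic sample, and "11" solid tissue normal. The sketch below shows that convention in base R. The barcodes are made up, and whether `downloadTCGA()` filters in exactly this way is an assumption.

```r
# Example aliquot barcodes (made up for illustration).
barcodes <- c(
  "TCGA-A1-0001-01A-11R-A034-07",
  "TCGA-A1-0002-11A-11R-A034-07",
  "TCGA-A1-0003-06A-11R-A034-07"
)

# The two-digit sample type code occupies characters 14-15 of the barcode.
sample_type <- substr(barcodes, 14, 15)

# Keep primary solid tumor ("01") and metastatic ("06") samples.
keep <- sample_type %in% c("01", "06")
barcodes[keep]
```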
Download CCLE transcriptomic data
- Download CCLE transcriptomic data. You can select cell lines derived from one specific cancer type.
    downloadCCLE(cancertype = "BREAST", outputresult = FALSE)
  Output: Number of selected samples
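CCLE cell line identifiers follow the pattern `<CELL LINE>_<TISSUE>`, which is why the cancer type is given as an upper-case tissue label such as "BREAST". As a rough sketch of what selecting by cancer type amounts to, the base-R snippet below matches on that suffix. That `downloadCCLE()` works this way internally is an assumption.

```r
# A few CCLE-style identifiers of the form <CELL LINE>_<TISSUE>.
ccle_names <- c("MCF7_BREAST", "A549_LUNG", "T47D_BREAST", "HT29_LARGE_INTESTINE")

# Keep only the cell lines whose tissue suffix is BREAST.
is_breast <- grepl("_BREAST$", ccle_names)
ccle_names[is_breast]
```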
Merge TCGA and CCLE data
- Merge the TCGA data downloaded with downloadTCGA() and the CCLE data downloaded with downloadCCLE().
- Run the ComBat-seq program to remove batch effects between the TCGA and CCLE datasets.
- Output the total read counts of all samples, which are used to decide the cutoff values of total read counts for normalization().
    mergeTCGAandCCLE(outputresult = FALSE)
  Output: totalreadcounts.csv
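The total read count of a sample is the sum of its counts over all genes, and looking at the distribution of these totals is one way to pick cutoffs. The base-R sketch below computes them from a toy count matrix. The layout of totalreadcounts.csv is not documented here, so the snippet works on invented data rather than reading that file.

```r
# Toy count matrix (hypothetical values): genes in rows, samples in columns.
counts <- matrix(
  c(10, 200, 30,
     0, 150, 25,
     5, 900, 40,
    20,  50,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("sampleA", "sampleB", "sampleC"))
)

# Total read count per sample.
total_reads <- colSums(counts)

# Inspect the spread to choose lower and upper cutoffs.
summary(total_reads)
quantile(total_reads, probs = c(0.05, 0.95))
```

Samples with unusually low or high totals are the candidates that the min and max arguments of normalization() would exclude.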
Normalize RNA-seq counts data
- Normalize the RNA-seq count data.
- You can specify min and max values for truncation of total read counts.
- If you do not want to specify a value for truncation, set min=F or max=F.
    normalization(min=40000000, max=140000000)
  Output: TPM_RLE_postComBat_<TCGA>_<CCLE>.csv
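The output file name suggests that TPM and RLE normalization are applied after ComBat-seq. The sketch below illustrates only two of the ideas involved, on a toy matrix: truncation by total read counts, and RLE (median-of-ratios) size factors. It is not the implementation of `normalization()`. The cutoffs are scaled down to suit the toy data, treating the bounds as inclusive is an assumption, and the TPM step is omitted because it needs gene lengths.

```r
# Toy count matrix (hypothetical values) with no zero counts.
counts <- matrix(
  c(100, 200, 400, 10,
     50, 100, 200,  5,
     30,  60, 120,  3),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4))
)

# Hypothetical cutoffs, scaled down to suit the toy data.
min_total <- 100
max_total <- 1000

# Truncation: drop samples whose total read count lies outside the bounds.
total_reads <- colSums(counts)
keep <- total_reads >= min_total & total_reads <= max_total
filtered <- counts[, keep, drop = FALSE]

# RLE size factors: per-sample median of the ratios to the gene-wise
# geometric mean. Genes with zero counts would need to be excluded first.
log_geo_mean <- rowMeans(log(filtered))
size_factors <- apply(filtered, 2, function(x) exp(median(log(x) - log_geo_mean)))

# Divide each sample (column) by its size factor.
normalized <- sweep(filtered, 2, size_factors, "/")
```

In this toy case the three retained samples differ only in sequencing depth, so their normalized columns come out identical.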