I. Data download
1 Workflow: filter with GDCquery, download with GDCdownload, read the data with GDCprepare_clinic()
1) Download the indexed (simplified) data: usually this is enough
library(TCGAbiolinks)
data_cl <- GDCquery_clinic(project = "TCGA-PRAD", type = "clinical")
write.csv(data_cl, file = "data_cl_index.csv", row.names = FALSE)
2) Download the XML data:
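# Note: file.type = "xml" is the older form of this query; newer TCGAbiolinks
# documentation uses data.type = "Clinical Supplement" with data.format = "BCR XML".
query <- GDCquery(project = "TCGA-PRAD",
                  data.category = "Clinical",
                  file.type = "xml")
GDCdownload(query,
            method = "api",
            directory = "GDCdata",   # default folder name
            files.per.chunk = 6)     # use a smaller value on a slow connection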
# Loop over every clinical table and write each one to its own CSV
clinical.info <- c("drug", "follow_up", "radiation", "patient", "stage_event", "new_tumor_event", "admin")
for (i in clinical.info) {
  data_cl <- GDCprepare_clinic(query, clinical.info = i)
  write.csv(data_cl, file = paste0("TCGA-PRAD_clinical_", i, ".csv"), row.names = FALSE)
}
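If one table is empty or fails to parse, the loop above stops at that table. A minimal sketch of a more tolerant version follows. It assumes a hypothetical helper, export_tables(), which is not part of TCGAbiolinks, and a stand-in function fake_prepare() with made-up barcodes so the example runs offline. With real data, the `prepare` argument would be function(i) GDCprepare_clinic(query, clinical.info = i).

```r
# Hypothetical helper (not a TCGAbiolinks function): write one CSV per table,
# skipping any table whose prepare step errors or returns no rows.
export_tables <- function(tables, prepare,
                          prefix = "TCGA-PRAD_clinical_", out_dir = ".") {
  written <- character(0)
  for (i in tables) {
    data_cl <- tryCatch(prepare(i), error = function(e) {
      message("Skipping ", i, ": ", conditionMessage(e))
      NULL
    })
    if (is.null(data_cl) || nrow(data_cl) == 0) next
    out_file <- file.path(out_dir, paste0(prefix, i, ".csv"))
    write.csv(data_cl, file = out_file, row.names = FALSE)
    written <- c(written, out_file)
  }
  written
}

# Stand-in for GDCprepare_clinic(); barcodes are made up for illustration.
fake_prepare <- function(i) {
  if (i == "radiation") stop("no records")
  data.frame(bcr_patient_barcode = c("TCGA-XX-0001", "TCGA-XX-0002"),
             table = i, stringsAsFactors = FALSE)
}

written <- export_tables(c("drug", "radiation", "patient"),
                         fake_prepare, out_dir = tempdir())
basename(written)
```

The function returns the paths it actually wrote, so skipped tables are easy to spot by comparing against the requested names.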
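Notes on the clinical data:
Source: "TCGA clinical data download with TCGAbiolinks", 组学大讲堂 Q&A community
1 Differences among the three clinical data types in the GDC database: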
- indexed clinical: refined clinical data created from the XML files.
- XML files: the original source of the data (the most complete clinical data type)
- BCR Biotab: tsv files parsed from the XML files
2 Differences between the indexed and XML data:
- XML has more information: radiation, drug information, follow-ups, biospecimen, etc., so the indexed data is only a subset of the XML files.
- The indexed data is already updated with the follow-up information. For example, if the patient was alive when the clinical data was first collected and dead at the next follow-up, the indexed data will show dead. The XML will have two fields: one from the first collection saying the patient is alive (in the clinical part) and one in the follow-up saying the patient is dead.
Recommendation: use the XML clinical data (it has the most complete information)
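Because the XML keeps the baseline record and each follow-up separately, the latest status has to be derived by hand when working from the XML tables. A minimal sketch with toy data follows. The barcodes and values are made up, and followup_order is a hypothetical ordering column; real follow_up tables would need a date or days-based field chosen by the analyst.

```r
# Toy tables standing in for the "patient" and "follow_up" exports.
patient <- data.frame(
  bcr_patient_barcode = c("P1", "P2"),
  vital_status = c("Alive", "Alive"),
  stringsAsFactors = FALSE
)
follow_up <- data.frame(
  bcr_patient_barcode = c("P1", "P1", "P2"),
  followup_order = c(1, 2, 1),              # hypothetical ordering column
  vital_status = c("Alive", "Dead", "Alive"),
  stringsAsFactors = FALSE
)

# Sort by patient and follow-up order, then keep the last record per patient.
follow_up <- follow_up[order(follow_up$bcr_patient_barcode,
                             follow_up$followup_order), ]
latest <- follow_up[!duplicated(follow_up$bcr_patient_barcode, fromLast = TRUE), ]

# Use the latest follow-up status where one exists, else the baseline status.
idx <- match(patient$bcr_patient_barcode, latest$bcr_patient_barcode)
patient$latest_vital_status <- ifelse(is.na(idx),
                                      patient$vital_status,
                                      latest$vital_status[idx])
patient
```

In this toy case P1 ends up as Dead and P2 stays Alive, which mirrors what the indexed data would report directly.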
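3 GDCprepare_clinic(): values for the clinical.info argument (7 options): drug, follow_up, radiation, patient, stage_event, new_tumor_event, admin
Official documentation: TCGAbiolinks: Clinical data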