TCGA clinical data download

郁柳_Fudan

Last modified 2022-04-24 19:01:11

Category: TCGA. Tags: R

First published 2022-03-22 21:53:49

Article link: https://blog.csdn.net/weixin_59289660/article/details/123670374

I. Data download

1 Filter with GDCquery, download with GDCdownload, read the data with GDCprepare_clinic()

1) Download the indexed (simplified) data: this is usually enough

library(TCGAbiolinks)

# Fetch the indexed clinical table for the project as a data frame
data_cl <- GDCquery_clinic(project = "TCGA-PRAD", type = "clinical")

write.csv(data_cl, file = "data_cl_index.csv", row.names = FALSE)

2) Download the XML data:

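# Select the clinical XML files of the project
# Note: newer TCGAbiolinks releases may expect data.format instead of file.type; see ?GDCquery
query <- GDCquery(project = "TCGA-PRAD",
                  data.category = "Clinical",
                  file.type = "xml")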

GDCdownload(query,
            method = "api",
            directory = "GDCdata", # default folder name
            files.per.chunk = 6)   # use a smaller value on a slow connection

# Loop over every clinical.info type and write each table to its own CSV
clinical.info <- c("drug", "follow_up", "radiation", "patient", "stage_event", "new_tumor_event", "admin")
for (i in clinical.info) {
  data_cl <- GDCprepare_clinic(query, clinical.info = i)
  write.csv(data_cl, file = paste0("TCGA-PRAD_clinical_", i, ".csv"), row.names = FALSE)
}

Notes on the clinical data:

Reference: "TCGA clinical data download: TCGAbiolinks" (组学大讲堂问答社区)

1 Differences among the three clinical data types in the GDC database:

indexed clinical: refined clinical data created from the XML files.
XML files: the original source of the data (the most complete clinical data type).
BCR Biotab: TSV files parsed from the XML files.

2 Differences between the indexed and XML data:

The XML files hold more information: radiation, drug information, follow-ups, biospecimen, etc. The indexed data is therefore only a subset of the XML files.
The indexed data contains values updated with the follow-up information. For example, if a patient was alive when the clinical data was first collected and was dead at the next follow-up, the indexed data shows dead. The XML has two fields: one from the first collection saying the patient is alive (in the clinical part) and one from the follow-up saying the patient is dead.
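The alive/dead example above can be illustrated with a small base R sketch. The data frame, its column names (patient, record, days_to_form, vital_status) and its values are made up for illustration and are not the real TCGA field names; the point is only that an XML-style table keeps every record, while an indexed-style table keeps the latest status per patient.

```r
# Toy data only: column names and values are hypothetical, not real TCGA fields.
# "XML-style" records: one baseline (clinical) row plus one row per follow-up.
xml_like <- data.frame(
  patient      = c("P1", "P1", "P2"),
  record       = c("clinical", "follow_up", "clinical"),
  days_to_form = c(0, 400, 0),
  vital_status = c("Alive", "Dead", "Alive"),
  stringsAsFactors = FALSE
)

# "Indexed-style" view: keep only the most recent record per patient.
# Sort by patient, newest record first, then drop the older duplicates.
ordered <- xml_like[order(xml_like$patient, -xml_like$days_to_form), ]
indexed_like <- ordered[!duplicated(ordered$patient), c("patient", "vital_status")]
rownames(indexed_like) <- NULL

print(indexed_like)
```

In this toy table P1 appears twice in xml_like (Alive, then Dead) but only once in indexed_like, with the later status. The real indexed data is produced by GDC, so this is only a picture of the idea, not of how GDC builds it.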

The XML clinical data is recommended (it is the most complete).

3 GDCprepare_clinic(): settings for the clinical.info parameter (7 options)

Official documentation: TCGAbiolinks: Clinical data