Study notes: downloading and processing data with the TCGA Assembler tool

Download address provided by the package authors: https://github.com/compgenome365/TCGA-Assembler-2
Citation: TCGA-Assembler 2: software pipeline for retrieval and processing of TCGA/CPTAC data
Reference walkthrough: https://cloud.tencent.com/developer/article/1481868

Preparing the working environment


Installing the packages

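getwd()
# biocLite() has been retired; BiocManager is the current Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
# bitops and digest are added here because they are loaded below
BiocManager::install(c("httr", "bitops", "RCurl", "stringr",
                       "HGNChelper", "rjson", "digest"))
library(httr)
library(bitops)
library(RCurl)
library(stringr)
library(HGNChelper)
library(rjson)
library(digest)
## download clinical data
# https://tcga-data.nci.nih.gov/docs/publications/tcga/
# Look up the cancer type abbreviations
# Module_A.R and Module_B.R must be in the working directory shown by getwd()
source("Module_A.R")
source("Module_B.R")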
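Before installing anything, it can help to see which of the packages are still missing. The following is a minimal sketch, not part of TCGA-Assembler: `missing_packages` is a hypothetical helper name, and the package list is simply the one loaded in these notes.

```r
# Packages loaded in these notes (assumed to be the full set Module_A/Module_B need)
required_pkgs <- c("httr", "bitops", "RCurl", "stringr",
                   "HGNChelper", "rjson", "digest")

# Hypothetical helper: return the names of packages that are not installed.
# requireNamespace() returns FALSE instead of raising an error when a package is absent.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  message("Not installed yet: ", paste(to_install, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

The sketch only reports what is missing, so the installation step itself stays an explicit decision.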

Downloading data with Module_A

Taking the CPTAC data for COAD as an example:

filename_coad_CPTAC <- DownloadCPTACData(cancerType = "COAD",
                  assayPlatform = "proteome_iTRAQ",
                  saveFolderName = "./ManualExampleData/RawData.TCGA-Assembler")
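Since the download call above stores the returned file paths in filename_coad_CPTAC, a quick sanity check before moving on might look like the sketch below. `ensure_folder` and `check_downloaded_files` are hypothetical helpers written for these notes, not TCGA-Assembler functions, and the sketch assumes the return value is a character vector of file paths.

```r
# Hypothetical helper: create the folder (and parents) if it does not exist yet,
# in case the download function does not create it for you.
ensure_folder <- function(folder) {
  if (!dir.exists(folder)) {
    dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  }
  dir.exists(folder)
}

# Hypothetical helper: stop with a readable message if any expected file is absent.
check_downloaded_files <- function(paths) {
  paths <- as.character(paths)
  if (length(paths) == 0) {
    stop("No file paths were returned by the download step.")
  }
  found <- file.exists(paths)
  if (!all(found)) {
    stop("Missing download output: ", paste(paths[!found], collapse = ", "))
  }
  invisible(paths)
}

# Only runs when the download step has already been executed in this session
if (exists("filename_coad_CPTAC")) {
  check_downloaded_files(filename_coad_CPTAC)
}
```

Calling ensure_folder("./ManualExampleData/RawData.TCGA-Assembler") before the download, and the check afterwards, makes a failed or partial download visible right away instead of at the processing step.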

Processing data with Module_B


source("Module_B.R")
# Use the file path returned by DownloadCPTACData above
CPTACData <-
  ProcessCPTACData(inputFilePath = filename_coad_CPTAC[1],
                   outputFileName = "COAD_iTRAQData",
                   outputFileFolder = "./ManualExampleData/ProcessedData.TCGA-Assembler")
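To see what the processing step wrote, one option is to list the files in the output folder whose names start with the chosen outputFileName. This is a minimal sketch: `find_processed_files` is a hypothetical helper, and it assumes (not confirmed here) that the output file names begin with the outputFileName value.

```r
# Hypothetical helper: list files in `folder` whose base name starts with `prefix`.
# Returns an empty character vector when the folder does not exist.
find_processed_files <- function(folder, prefix) {
  if (!dir.exists(folder)) {
    return(character(0))
  }
  all_files <- list.files(folder, full.names = TRUE)
  all_files[startsWith(basename(all_files), prefix)]
}

processed <- find_processed_files(
  "./ManualExampleData/ProcessedData.TCGA-Assembler",
  "COAD_iTRAQData"
)
print(processed)
```

An empty result would suggest that the processing call did not finish or wrote to a different folder.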

The following is an R script for downloading and processing TCGA data. First, install these R packages: TCGAbiolinks, tidyverse, ggplot2, survival, survminer.

```R
# Install TCGAbiolinks
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("TCGAbiolinks")
# Install the other required packages
install.packages(c("tidyverse", "ggplot2", "survival", "survminer"))
```

Next, download the TCGA data. As an example, we download the RNA-seq and clinical data for lung squamous cell carcinoma (LUSC). Argument values such as workflow.type and legacy depend on the installed TCGAbiolinks version, so check them against its documentation before running.

```R
library(TCGAbiolinks)
# Set working directory
setwd("your_working_directory")
# Download RNA-seq data
query_rna <- GDCquery(project = "TCGA-LUSC",
                      data.category = "Transcriptome Profiling",
                      data.type = "Gene Expression Quantification",
                      workflow.type = "HTSeq - FPKM",
                      legacy = TRUE,
                      platform = "Illumina HiSeq",
                      file.type = "results",
                      experimental.strategy = "RNA-Seq")
GDCdownload(query_rna)
# Download clinical data (kept in a separate object so the RNA-seq query is not overwritten)
query_clin <- GDCquery(project = "TCGA-LUSC",
                       data.category = "Clinical",
                       file.type = "xml")
GDCdownload(query_clin)
```

Next, import the downloaded RNA-seq data into R and preprocess it. For example, log2-transform and normalize the data, then remove genes with low expression.

```R
library(limma)  # provides normalizeBetweenArrays(); install with BiocManager::install("limma")
# Load RNA-seq data
LUSC_rnaseq <- GDCprepare(query_rna, save = TRUE, save.filename = "LUSC_rnaseq")
# Log2 transformation and normalization
LUSC_rnaseq$log2 <- log2(LUSC_rnaseq$counts+1)
LUSC_rnaseq_norm <- normalizeBetweenArrays(LUSC_rnaseq$log2, method = "quantile")
# Remove low expressed genes
LUSC_rnaseq_norm_filter <- LUSC_rnaseq_norm[rowSums(LUSC_rnaseq_norm > 1) >= 20,]
```

Finally, use the survival and survminer packages to run a survival analysis on the clinical data and visualize it.

```R
library(survival)
library(survminer)
# Load clinical data
LUSC_clinical <- GDCprepare_clinic(query_clin, clinical.info = "patient")
# Merge RNA-seq and clinical data
LUSC_data <- merge(LUSC_rnaseq_norm_filter, LUSC_clinical, by = "bcr_patient_barcode")
# Survival analysis (time and vital_status stand for the follow-up time and event columns)
fit <- survfit(Surv(time, vital_status) ~ 1, data = LUSC_data)
ggsurvplot(fit, data = LUSC_data, pval = TRUE, conf.int = TRUE)
# Cox proportional hazards model (gene1, gene2, gene3 are placeholder column names)
model <- coxph(Surv(time, vital_status) ~ gene1 + gene2 + gene3, data = LUSC_data)
summary(model)
```