TCGA cancer projects:
| Project | Cancer type | Sample count |
|---|---|---|
| TCGA-BRCA | Breast Invasive Carcinoma | 1079 |
| TCGA-OV | Ovarian Serous Cystadenocarcinoma | 571 |
| TCGA-LUAD | Lung Adenocarcinoma | 563 |
| TCGA-UCEC | Uterine Corpus Endometrial Carcinoma | 542 |
| TCGA-HNSC | Head and Neck Squamous Cell Carcinoma | 523 |
| TCGA-KIRC | Kidney Renal Clear Cell Carcinoma | 523 |
| TCGA-GBM | Glioblastoma Multiforme | 522 |
| TCGA-LGG | Brain Lower Grade Glioma | 509 |
| TCGA-LUSC | Lung Squamous Cell Carcinoma | 501 |
| TCGA-THCA | Thyroid Carcinoma | 473 |
| TCGA-PRAD | Prostate Adenocarcinoma | 469 |
| TCGA-SKCM | Skin Cutaneous Melanoma | 469 |
| TCGA-COAD | Colon Adenocarcinoma | 458 |
| TCGA-STAD | Stomach Adenocarcinoma | 437 |
| TCGA-BLCA | Bladder Urothelial Carcinoma | 408 |
| TCGA-LIHC | Liver Hepatocellular Carcinoma | 375 |
| TCGA-CESC | Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma | 305 |
| TCGA-KIRP | Kidney Renal Papillary Cell Carcinoma | 289 |
| TCGA-TGCT | Testicular Germ Cell Tumors | 261 |
| TCGA-SARC | Sarcoma | 255 |
| TCGA-ESCA | Esophageal Carcinoma | 183 |
| TCGA-PAAD | Pancreatic Adenocarcinoma | 173 |
| TCGA-READ | Rectum Adenocarcinoma | 170 |
| TCGA-PCPG | Pheochromocytoma and Paraganglioma | 169 |
| TCGA-LAML | Acute Myeloid Leukemia | 135 |
| TCGA-THYM | Thymoma | 97 |
| TCGA-ACC | Adrenocortical Carcinoma | 92 |
| TCGA-MESO | Mesothelioma | 85 |
| TCGA-UVM | Uveal Melanoma | 80 |
| TCGA-KICH | Kidney Chromophobe | 66 |
| TCGA-UCS | Uterine Carcinosarcoma | 57 |
| TCGA-CHOL | Cholangiocarcinoma | 50 |
| TCGA-DLBC | Lymphoid Neoplasm Diffuse Large B-cell Lymphoma | 47 |
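Set the project name and run the script below to download the data. It fetches the STAR - Counts transcriptome quantification results; for any other data type, change the `GDCquery` arguments yourself: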
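library(TCGAbiolinks)
library(dplyr)
library(SummarizedExperiment)
library(msigdbr)  # loaded by the original script; not used in the steps below
# Choose the project (named `project` to avoid confusion with base::class())
project <- "TCGA-READ"
# Query and download the data
query <- GDCquery(
  project = project,
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts"
)
GDCdownload(query = query)
data <- GDCprepare(query = query)
# Create the output directory if it does not exist yet
out_dir <- file.path(".", project)
if (!dir.exists(out_dir)) {
  dir.create(out_dir)
}
Exp <- assay(data) %>% as.data.frame()      # expression matrix (first assay of the object)
ann <- rowRanges(data) %>% as.data.frame()  # gene annotation
rownames(ann) <- ann$gene_id
ann <- ann[rownames(Exp), ]                 # keep annotation rows in the same order as Exp
write.csv(ann, file.path(out_dir, "ann.csv"), row.names = FALSE)            # gene annotation
# Prepend gene symbols; row names (gene IDs) are not written, but row order matches ann.csv
Exp <- cbind(data.frame(Gene = ann$gene_name), Exp)
write.csv(Exp, file.path(out_dir, "exp.csv"), row.names = FALSE)            # expression matrix
clinical <- GDCquery_clinic(project = project, type = "clinical")           # clinical information
write.csv(clinical, file.path(out_dir, "clinical.csv"), row.names = FALSE)  # clinical annotation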
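The outputs are as follows:
▲ Count expression matrix (exp.csv)
▲ Sample clinical and survival information (clinical.csv)
▲ Gene annotation (ann.csv)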
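
A common next step is to link the sample columns of the expression matrix to rows of the clinical table. The sketch below assumes the standard TCGA barcode layout: the first 12 characters identify the patient, and characters 14-15 hold the sample type code (01 for primary tumor, 11 for solid tissue normal). The barcodes, the toy clinical table, and the helper name `parse_barcode` are all made up for illustration. With real data you would probably pass `colnames(Exp)[-1]` and match against the `submitter_id` column of `clinical`, assuming your TCGAbiolinks version returns that column.

```r
# Hypothetical helper: split TCGA sample barcodes into patient ID and sample type code.
parse_barcode <- function(barcodes) {
  data.frame(
    barcode     = barcodes,
    patient     = substr(barcodes, 1, 12),   # e.g. "TCGA-AA-0001"
    sample_code = substr(barcodes, 14, 15),  # "01" = primary tumor, "11" = solid tissue normal
    stringsAsFactors = FALSE
  )
}

# Made-up barcodes standing in for colnames(Exp)[-1]
toy_barcodes <- c(
  "TCGA-AA-0001-01A-01R-0001-07",
  "TCGA-AA-0001-11A-01R-0001-07",
  "TCGA-BB-0002-01A-01R-0002-07"
)

samples <- parse_barcode(toy_barcodes)
# Sample type codes 01-09 denote tumor samples; 10-19 denote normal samples
samples$is_tumor <- as.integer(samples$sample_code) < 10

# Made-up clinical table standing in for the output of GDCquery_clinic()
toy_clinical <- data.frame(
  submitter_id = c("TCGA-AA-0001", "TCGA-BB-0002"),
  vital_status = c("Alive", "Dead"),
  stringsAsFactors = FALSE
)

# Attach the clinical fields to every sample from the same patient
merged <- merge(samples, toy_clinical, by.x = "patient", by.y = "submitter_id")
print(merged)
```

If you reload `exp.csv` later, note that `read.csv` rewrites the hyphens in column names to dots by default; passing `check.names = FALSE` keeps the barcodes intact.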