library(readr)
library(dplyr)    # %>%, filter(), mutate(), select()
library(stringr)  # str_extract()
gtf <- read_delim("~/Program/wanggang/shihuang/SC/hum/GCF_000001405.40_GRCh38.p14_genomic.gtf",
                  delim = "\t", escape_double = FALSE,
                  col_names = FALSE, comment = "#", trim_ws = TRUE)
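# Keep only CDS records; column 9 (X9) holds the GTF attribute string
gene_info <- gtf %>% filter(X3 == "CDS") %>% dplyr::select(X9)
# Extract the quoted values that follow gene_id and protein_id in the attribute string
gene_info1 <- gene_info %>%
  mutate(gene_id = str_extract(X9, '(?<=gene_id ")[^"]+'),
         protein_id = str_extract(X9, '(?<=protein_id ")[^"]+'))
# Drop the raw attribute column and keep each gene_id/protein_id pair once
gene_info1 <- gene_info1 %>% dplyr::select(-X9) %>% distinct()
write_tsv(gene_info1, "~/Program/wanggang/shihuang/SC/hum/gene_to_protein.tsv")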
Extracting gene names and their corresponding protein_id from a GTF file in R