2021.05.11丨COG分析柱状图绘制

最新推荐文章于 2024-05-04 22:42:04 发布

穆易青

最新推荐文章于 2024-05-04 22:42:04 发布

阅读量4.6k

点赞数 1

分类专栏：生物信息数据处理读书笔记 RNA-seq 文章标签： r语言数据库 python

本文链接：https://blog.csdn.net/yangl7/article/details/116660853

版权

生物信息同时被 3 个专栏收录

88 篇文章 56 订阅

订阅专栏

RNA-seq

34 篇文章 15 订阅

订阅专栏

数据处理读书笔记

31 篇文章 3 订阅

订阅专栏

摘要

在RNA-seq项目中，需要将差异基因比对到各个数据库当中，生成相应的注释结果和图像，便于深度挖掘信息。COG（Cluster of Orthologous Groups ofproteins 同源蛋白簇）数据库可以帮助了解蛋白功能甚至进化关系（细/真菌）。此次记录一下COG分类图的绘制方法

环境与方法

R version 3.6.1 (2019-07-05)

文档准备

分类简称及描述

        # Code	Name
        J	Translation, ribosomal structure and biogenesis
        A	RNA processing and modification
        K	Transcription
        L	Replication, recombination and repair
        B	Chromatin structure and dynamics
        D	Cell cycle control, cell division, chromosome partitioning
        Y	Nuclear structure
        V	Defense mechanisms
        T	Signal transduction mechanisms
        M	Cell wall/membrane/envelope biogenesis
        N	Cell motility
        Z	Cytoskeleton
        W	Extracellular structures
        U	Intracellular trafficking, secretion, and vesicular transport
        O	Posttranslational modification, protein turnover, chaperones
        X	Mobilome: prophages, transposons
        C	Energy production and conversion
        G	Carbohydrate transport and metabolism
        E	Amino acid transport and metabolism
        F	Nucleotide transport and metabolism
        H	Coenzyme transport and metabolism
        I	Lipid transport and metabolism
        P	Inorganic ion transport and metabolism
        Q	Secondary metabolites biosynthesis, transport and catabolism
        R	General function prediction only
        S	Function unknown

比对结果

这里使用将差异基因的序列比对到eggnog数据库分析出来的结果
红色方框为COG分类注释结果

在这里插入图片描述

使用代码

对注释结果进行汇总

 import sys
        fun_file = sys.argv[1] #分类文件输入
        anno_list = sys.argv[2] #注释文件输入
        Describe_file = open(fun_file,'r')
        Class_file = open(anno_list,'r')
        Describe_line = Describe_file.readlines()
        Class_line = Class_file.readlines()
        newgtf_name = 'DrawAnnotationPic.R'
        desktop_path = './'
        file_path = desktop_path+newgtf_name+'.txt'
        file = open(file_path,'w')
        for i in Describe_line[1:]:
            anno_ID = i.split('\t')
            describe = anno_ID[1].replace('\n','')
            count = 0
            for j in Class_line:
                COG_ID = j.split('\t')
                #print(COG_ID)
                if len(COG_ID) > 2:
                    #print(COG_ID[-2])
                    #print(anno_ID[0])
                    if anno_ID[0] == COG_ID[-2]:
                        count += 1
                    else:
                        continue
            COG_list = anno_ID[0]+"\t"+describe+"\t"+str(count)+"\n"
            #print(COG_list)
            file.write(COG_list)
        file.close()
        Class_file.close()
        Describe_file.close()

得到汇总文件
在这里插入图片描述
绘制图像

        args<-commandArgs(T)
        if(length(args)<2){
            cat("[usage:]  ")
            quit("no")

        }
        cazyclass<-args[1]
        outfile<-args[2]
        library(ggplot2)
        dat<-read.table(cazyclass,header=F,sep='\t')
        pdf(file=paste(outfile,".pdf",sep=""),width=6*1.5,height=6)
        colnames(dat)<-c("Motif","Description","Number")
        ggplot(dat,aes(x=dat$Motif,y=dat$Number,fill=paste(dat$Motif,':',dat$Description)))+geom_bar(stat="identity")+theme(legend.title=element_blank(),legend.text=element_text(size=6),legend.key=element_rect(size=6))+labs(x='Function class',y='Number of matched genes')
        dev.off()

结果展示

在这里插入图片描述

总结

这里再提供另外两个思路供大家去探索，一是将eggnog的注释结果改为在本地对cog数据库进行比对（之前尝试过分析，时间较长，操作有些复杂）；二是将得到的统计结果放到一些生信公司提供的云工具上进行分析（联川、百迈客等），或许比自己分析更快。

在这里插入图片描述

穆易青

关注

1
点赞
踩
11

收藏

觉得还不错? 一键收藏
打赏
0
评论
2021.05.11丨COG分析柱状图绘制

目录摘要环境与方法文档准备分类简称及描述比对结果使用代码结果展示总结摘要在RNA-seq项目中，需要将差异基因比对到各个数据库当中，生成相应的注释结果和图像，便于深度挖掘信息。COG（Cluster of Orthologous Groups ofproteins 同源蛋白簇）数据库可以帮助了解蛋白功能甚至进化关系（细/真菌）。此次记录一下COG分类图的绘制方法环境与方法R version 3.6.1 (2019-07-05)文档准备分类简称及描述 # Code Name
复制链接

扫一扫