数据提取linux,提取Seurat数据做GSEA（Linux版）

最新推荐文章于 2024-08-01 20:00:27 发布

monte3card

最新推荐文章于 2024-08-01 20:00:27 发布

阅读量551

点赞数

文章标签：数据提取linux

在Linux上跑GSEA只是工程需要，在Windows上不能批量的跑；在单细胞数据上跑GSEA那是客户的需要。有需要就需要被满足，我们今天介绍一下如何提取Seurat数据做GSEA。这里的GSEA指的是GSEA官网软件，而不是fgsea, clusterProfiler等，该软件是由JAVA写的，所以应该安装合适的JAVA，为了能看懂一些常见的报错，建议学点JAVA(半年时间过去了)。

java安装与否

which java

/usr/bin/java

/usr/bin/java -version

openjdk version "1.8.0_181"

OpenJDK Runtime Environment (build 1.8.0_181-b13)

OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

gsea 安装与否

检查gsea是否安装成功

java -Xmx512m -cp gsea-3.0.jar xtools.gsea.Gsea -h

xtools.api.param.MissingReqdParamException:

Some required parameters (3) were not specified. The parameters are:

>gmx< Gene sets database (gmx or gmt files - one or more are allowed)

>res< Expression dataset - with rows as genes and columns as samples (for instance: res, gct, pcl files)

>cls< Phenotype labels for the samples in the expression dataset (cls file)

at xtools.api.param.ToolParamSet.check(ToolParamSet.java:340)

at xtools.api.AbstractTool.init(AbstractTool.java:169)

at xtools.api.AbstractTool.init(AbstractTool.java:193)

at xtools.gsea.Gsea.(Gsea.java:51)

at xtools.gsea.Gsea.main(Gsea.java:139)

选择一个通路(gene set)

在整理这份资料的时候，发现关于GSEA的大部分疑难杂症我们生信技能树都有文章，正所谓：

生信教程技能树

入门生信谁敢坑

提示：通路名字中尽量不要有/,如GLYCOLYSIS / GLUCONEOGENESIS ，因为在Linux中/意味着路径，在输出文件中有以通路名命名的文件。细胞命名也最好不要有这个符号呀。

提取数据做GSEA

SeuratRunGSEA

Idents(seuo)

seuo

seuo@assays$RNA@counts-> counts # 是用count还是data？

expr_data

# 整理GSEA需要的表达谱格式

write.table(rbind(c('symbols',colnames(expr_data)),

cbind(rownames(expr_data),expr_data)),

file='expr.txt',quote=F,sep='\t',col.names=F,row.names=F)

# 整理GSEA需要的分组格式

pheno

con

write(paste(length(pheno),'2 1'),con)

write(paste('# ',test1,' ',test2,sep=''),con)

classes

for (i in 1:length(pheno)){

classes

}

write(classes,con)

close(con)

# GSEA 命令

command

' -collapse false -nperm 1000 -permute gene_set -rnd_type no_balance -scoring_scheme weighted -rpt_label ',testname,

' -metric Diff_of_Classes -sort real -order descending -include_only_symbols false -make_sets true -median false -num 100',

' -plot_top_x 20 -rnd_seed 123456 -save_rnd_lists false -set_max 10000 -set_min 5 -zip_report false -out ', outdir, ' -gui false',sep='')

system(command)

com

write(command,com) # 保存运行的命令，以便在命令行中调参

close(com)

}

照例，请出pbmc3k数据集，测试一番：

library(Seurat)

library(SeuratData)

gmt

outdir = './'

base.name = './'

test1='B'

test2 = 'NK'

testname='seurat_annotations'

pbmc3k

An object of class Seurat

13714 features across 2700 samples within 1 assay

Active assay: RNA (13714 features, 0 variable features)

> head(pbmc3k@meta.data)

orig.ident nCount_RNA nFeature_RNA seurat_annotations

AAACATACAACCAC pbmc3k 2419 779 Memory CD4 T

AAACATTGAGCTAC pbmc3k 4903 1352 B

AAACATTGATCAGC pbmc3k 3147 1129 Memory CD4 T

AAACCGTGCTTCCG pbmc3k 2639 960 CD14+ Mono

AAACCGTGTATGCG pbmc3k 980 521 NK

AAACGCACTGGTAC pbmc3k 2163 781 Memory CD4 T

运行：

SeuratRunGSEA(pbmc3k,testname,test1,test2,gmt,outdir,testname)

结果在seurat_annotations.Gsea.XXXXXXXXXXXXX下，导出，用浏览器打开index.html文件即可：

e9b59353cd3a?utm_campaign=shakespeare

剩下的就是结合生物学意义解读结果了，网上有许多的介绍，这砖不搬了。

e9b59353cd3a?utm_campaign=shakespeare

两个参数注意一啊，我们用的都是默认的：

-create_svgs

Create SVG plot images along with PNGs (GZ compressed to save space as these are very large)

Default: false

Hints : true,false

-create_gcts

Create GCT files for the data backing the Gene Set Enrichment Heatmaps

Default: false

Hints : true,false

preRank 模式

对应细胞数较多的情况。

runGSEA_preRank

set.seed(1314)

Idents(seuo)

mk$gene

mk %>% filter(p_val_adj >0) ->mk1

mk$p_val_adj[which(mk$p_val_adj==0)]

mk %>% mutate(sign = ifelse(avg_logFC>0,1,-1),Pstatistics=-1*log(p_val_adj)*sign) -> mk

preRank.matrix

prerank

write.table(preRank.matrix,

file=prerank,

quote=F,

sep='\t',

col.names=F,

row.names=F)

#call java gsea version

command

' -scoring_scheme weighted -make_sets true -rnd_seed 123456 -set_max 500 -set_min 1 -zip_report false ',

' -out preRankResults -create_svgs true -gui false -rpt_label ',testname, sep='')

system(command)

com

write(command,com)

close(com)

}

monte3card

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫