SparCC

jxxxxxxxxx

Article link: https://blog.csdn.net/jxxxxxxxxx/article/details/120341940

SparCC is a network inference tool that was specifically designed to be robust to data compositionality. The method is described in PLoS Computational Biology 8(9): e1002687.

Step 1 - Compute correlations
SparCC is a Python program that runs on the command line. Use the input file arctic_soils_filtered.txt and compute the compositionality-robust correlations as the median of 10 iterations as follows:

python SparCC.py arctic_soils_filtered.txt -i 10 --cor_file=arctic_soils_sparcc.txt > sparcc.log
where -i gives the number of iterations over which the correlations are averaged. SparCC averages its results over several estimates of the true fractions, which it estimates from the counts using the Dirichlet distribution.
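
The Dirichlet step can be illustrated with a short NumPy sketch. This is not SparCC's own code: the function name `sample_fractions`, the pseudocount of 1, and the layout (rows are samples, columns are OTUs) are assumptions made for illustration. Each row of counts is turned into one random draw of fractions, and repeating the draw gives the several estimates that are later summarized.

```python
import numpy as np

def sample_fractions(counts, rng, pseudocount=1.0):
    """Draw one estimate of the true fractions for every sample (row)."""
    counts = np.asarray(counts, dtype=float)
    # One Dirichlet draw per sample; the pseudocount keeps zero counts valid.
    return np.vstack([rng.dirichlet(row + pseudocount) for row in counts])

rng = np.random.default_rng(0)
counts = np.array([[10, 0, 90], [5, 5, 40]])  # toy data: 2 samples x 3 OTUs
# Ten independent estimates, analogous to running with -i 10.
draws = [sample_fractions(counts, rng) for _ in range(10)]
```

Every draw has the same shape as the count table, and each row sums to 1, which is what makes the data compositional.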

Step 2 - Compute bootstraps
You can then generate bootstraps from the input data using the following command:

python MakeBootstraps.py arctic_soils_filtered.txt -n 100 -o Resamplings/boot
where Resamplings is a directory and boot is the prefix of all resampled data sets. You then have to run SparCC on each of the resampled data sets, which is best done in a script. As an example, this bash script generates 10 correlation matrices from the first 10 resampled data sets:

for i in 0 1 2 3 4 5 6 7 8 9
do
python SparCC.py Resamplings/boot_$i.txt -i 10 --cor_file=Bootstraps/sim_cor_$i.txt >> sparcc.log
done
where Bootstraps is the directory into which correlation matrices computed from the resampled data matrices will be written. To compute p-values, more than 10 iterations are needed. Precomputed bootstrap correlations for 100 iterations can be downloaded here.
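
Once more than a handful of resamplings are involved, writing out the indices by hand becomes impractical. Below is a minimal Python sketch of the same loop. It assumes the resampled files are named Resamplings/boot_<i>.txt, that the output goes to Bootstraps/sim_cor_<i>.txt, and that SparCC.py sits in the current directory. The helper names `sparcc_command` and `run_all` are hypothetical.

```python
import subprocess

def sparcc_command(i, iterations=10):
    """Build the SparCC call for resampled data set number i."""
    return [
        "python", "SparCC.py",
        f"Resamplings/boot_{i}.txt",
        "-i", str(iterations),
        f"--cor_file=Bootstraps/sim_cor_{i}.txt",
    ]

def run_all(n_boot, dry_run=True):
    """Run (or, with dry_run=True, only list) the SparCC calls for all resamplings."""
    commands = [sparcc_command(i) for i in range(n_boot)]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing run instead of continuing silently.
            subprocess.run(cmd, check=True)
    return commands

commands = run_all(100)  # dry run: nothing is executed here
```

Calling `run_all(100, dry_run=False)` would execute the 100 runs one after another. The dry run is the default so the command list can be inspected first.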

Step 3 - Compute p-values
Once the bootstrapped correlation scores have been computed, the p-values can be generated using the following command:

python PseudoPvals.py arctic_soils_sparcc.txt Bootstraps/sim_cor 10 -o pvals_two_sided.txt -t 'two_sided' >> sparcc.log
Step 4 - Visualization
You need to threshold the p-value matrix at the desired cut-off and convert it into a network with a script of your own. The simple R script below performs this task. In the script, the p-value matrix is converted into a matrix of significances (the negative log10 of the p-values), so smaller p-values give larger values. Do you know why? You can find the answer here.

#load R graph library igraph
library(igraph)
path="pvals_two_sided.txt"
pvals=read.table(path,header=TRUE,sep="\t")
pvals.mat=pvals[,2:ncol(pvals)]
#set p-values of 0 to a non-zero, small p-value so we can take the logarithm
pvals.mat[pvals.mat==0]=0.000000001
#convert into significance
sig.mat=-1*log10(pvals.mat)
#remove all edges with significance below 1 (that is, p-value above 0.1)
sig.mat[sig.mat<1]=0
sig.mat=as.matrix(sig.mat)
#convert adjacency matrix into a weighted graph (significance as edge weight, no self-loops)
sparcc.graph=graph_from_adjacency_matrix(sig.mat,mode="undirected",weighted=TRUE,diag=FALSE)
#display the graph with a force-directed layout
#(layout.spring is deprecated in current igraph, so Fruchterman-Reingold is used instead)
plot(sparcc.graph, layout=layout_with_fr(sparcc.graph))

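Consider doing the matrix operations in R: filter by correlation strength and significance level and keep only the significant strong correlations, as in the example below, which produces a network file in adjacency-matrix form.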

#correlation matrix of the observed data
cor_sparcc <- read.delim('cor_sparcc.out.txt', row.names = 1, sep = '\t', check.names = FALSE)

#pseudo p-value matrix
pvals <- read.delim('pvals.two_sided.txt', row.names = 1, sep = '\t', check.names = FALSE)

#keep values with |correlation| >= 0.8 and p < 0.01
cor_sparcc[abs(cor_sparcc) < 0.8] <- 0

pvals[pvals>=0.01] <- -1
pvals[pvals<0.01 & pvals>=0] <- 1
pvals[pvals==-1] <- 0

#adjacency matrix after filtering
adj <- as.matrix(cor_sparcc) * as.matrix(pvals)
diag(adj) <- 0 #set the diagonal of the correlation matrix (self-correlation) to 0
write.table(data.frame(adj, check.names = FALSE), 'neetwork.adj.txt', col.names = NA, sep = '\t', quote = FALSE)

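In the adjacency matrix, 0 means no correlation and any non-zero value means a correlation exists. Positive values indicate positive correlation, negative values indicate negative correlation, and the absolute value gives the correlation strength.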
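
The same filtering rule can be written compactly with NumPy. This is only a sketch on toy matrices. The function name `filter_adjacency` is hypothetical, and the cut-offs (|correlation| ≥ 0.8, p < 0.01) are the ones used in this post rather than general recommendations.

```python
import numpy as np

def filter_adjacency(cor, pvals, cor_cutoff=0.8, p_cutoff=0.01):
    """Keep correlations that are both strong and significant; zero out the rest."""
    cor = np.asarray(cor, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    keep = (np.abs(cor) >= cor_cutoff) & (pvals < p_cutoff)
    adj = np.where(keep, cor, 0.0)
    np.fill_diagonal(adj, 0.0)  # drop self-correlations
    return adj

# Toy 3 x 3 example: only the first pair is both strong and significant.
cor = np.array([[1.0, 0.9, -0.85],
                [0.9, 1.0, 0.2],
                [-0.85, 0.2, 1.0]])
pvals = np.array([[0.0, 0.001, 0.2],
                  [0.001, 0.0, 0.5],
                  [0.2, 0.5, 0.0]])
adj = filter_adjacency(cor, pvals)
```

In this toy case the pair with correlation -0.85 is dropped despite being strong, because its pseudo p-value of 0.2 is not below the cut-off.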

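Network operations with the R package igraph

Next, still in R, build the network from the adjacency matrix and convert it into other formats, so that more tools (such as Cytoscape and Gephi) can be used for statistical analysis and visualization.

The igraph package provides flexible network operations. Start by using it to convert the network format.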

##network format conversion
library(igraph)

#input data: the adjacency matrix
neetwork_adj <- read.delim('neetwork.adj.txt', row.names = 1, sep = '\t', check.names = FALSE)
head(neetwork_adj)[1:6] #network file in adjacency-matrix form

#adjacency matrix -> igraph adjacency list, giving a weighted undirected network
g <- graph_from_adjacency_matrix(as.matrix(neetwork_adj), mode = 'undirected', weighted = TRUE, diag = FALSE)
g #igraph adjacency list

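#with this conversion, the default edge weight is the correlation computed by sparcc (negative values occur)
#edge weights are usually positive, so take the absolute value and keep a copy of the correlation as a record
E(g)$sparcc <- E(g)$weight
E(g)$weight <- abs(E(g)$weight)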

#then convert to other types of network file, for example
#from the igraph adjacency list back to an adjacency matrix
adj_matrix <- as.matrix(as_adjacency_matrix(g, attr = 'sparcc'))
write.table(data.frame(adj_matrix, check.names = FALSE), 'network.adj_matrix.txt', col.names = NA, sep = '\t', quote = FALSE)

#graphml format, which can be opened and edited visually in gephi
write_graph(g, 'network.graphml', format = 'graphml')

#gml format, which can be opened and edited visually in cytoscape
write_graph(g, 'network.gml', format = 'gml')

#edge list, which can also be imported directly into network visualization software such as gephi or cytoscape
edge <- data.frame(as_edgelist(g))

edge_list <- data.frame(
source = edge[[1]],
target = edge[[2]],
weight = E(g)$weight,
sparcc = E(g)$sparcc
)
head(edge_list)

write.table(edge_list, 'network.edge_list.txt', sep = '\t', row.names = FALSE, quote = FALSE)

#node attribute list, matching the edge list and recording node attributes, for example
node_list <- data.frame(
nodes_id = V(g)$name, #node name
degree = degree(g) #node degree
)
head(node_list)

write.table(node_list, 'network.node_list.txt', sep = '\t', row.names = FALSE, quote = FALSE)
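
For readers who prefer to stay outside R, the conversion from adjacency matrix to edge list and node degrees can be sketched in a few lines of Python. The function name `adjacency_to_edges` and the OTU labels are hypothetical. The columns mirror the edge list above (source, target, weight as the absolute value, and sparcc as the signed correlation).

```python
import numpy as np

def adjacency_to_edges(adj, names):
    """List each undirected edge once as (source, target, weight, sparcc)."""
    adj = np.asarray(adj, dtype=float)
    edges = []
    # Upper triangle only: the matrix is symmetric, so each pair appears once.
    for i, j in zip(*np.triu_indices_from(adj, k=1)):
        value = float(adj[i, j])
        if value != 0:
            edges.append((names[i], names[j], abs(value), value))
    return edges

names = ["OTU_1", "OTU_2", "OTU_3"]
adj = np.array([[0.0, 0.9, 0.0],
                [0.9, 0.0, -0.85],
                [0.0, -0.85, 0.0]])
edges = adjacency_to_edges(adj, names)
# Node degree: the number of edges that touch each node.
degree = {n: sum(n in e[:2] for e in edges) for n in names}
```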
python SparCC.py home/Jiaxin/Sparcc/lx1/BAC_OTU_table_more_than_0.05_percent.txt -i 20 --cor_file=home/Jiaxin/Sparcc/lx1/cor_mat_sparcc.out
python MakeBootstraps.py home/Jiaxin/Sparcc/lx1/BAC_OTU_table_more_than_0.05_percent.txt -n 500 -t home/Jiaxin/Sparcc/lx1/Resampling/permutation_#.txt -p example/pvals/
python SparCC.py /home/Jiaxin/Sparcc/lx1/Resampling/Resamplingpermutation_1.txt -i 20 -a SparCC --cor_file=/home/Jiaxin/Sparcc/lx1/sim_cor_.txt
mkdir Bootstraps
for i in $(seq 0 499)
do
python SparCC.py /home/Jiaxin/Sparcc/lx1/Resampling/Resamplingpermutation_$i.txt -i 20 -a SparCC --cor_file=/home/Jiaxin/Sparcc/lx1/Bootstraps/sim_cor_$i.txt
done
python PseudoPvals.py home/Jiaxin/Sparcc/lx1/cor_mat_sparcc.out home/Jiaxin/Sparcc/lx1/Bootstraps/sim_cor_#.txt 20 -o /home/Jiaxin/Sparcc/lx1/pvals_onesided.txt -t one_sided
python PseudoPvals.py home/Jiaxin/Sparcc/lx/cor_mat_sparcc.out home/Jiaxin/Sparcc/lx1/Bootstraps/sim_cor_#.txt 20 -o /home/Jiaxin/Sparcc/lx1/pvals_twosided.txt -t two_sided

Example: python SparCC.py example/fake_data.txt -i 20 --cor_file=example/basis_corr/cor_mat_sparcc.out
python MakeBootstraps.py example/fake_data.txt -n 5 -t permutation_#.txt -p example/pvals/
python PseudoPvals.py example/basis_corr/cor_sparcc.out example/pvals/perm_cor_#.txt 5 -o pvals.txt -t one_sided
cor_to_network_from_Simon.py [-h] -i INPUT_COR [INPUT_COR ...]
                             [-o [OUTPUT_DIR]] [--full]
                             [--cor_cutoff [COR_CUTOFF]]
                             [--pval [PVAL_FILE]]
                             [--pval_cutoff [PVAL_CUTOFF]]

#Compute correlations
python /mnt/bai/public/bin/sparcc/SparCC.py arctic_soils_filtered.txt -i 10 --cor_file=arctic_soils_sparcc.txt > sparcc.log # -i is the number of iterations to average over

#Resampling: compute bootstraps
python /mnt/bai/public/bin/sparcc/MakeBootstraps.py arctic_soils_filtered.txt -n 100 -t boot_#.txt -p Resamplings/ # -n is the number of bootstraps

#Compute correlations on each resampled data set; the method defaults to SparCC, and pearson, spearman and kendall are also supported
mkdir Bootstraps
for i in $(seq 0 99)
do
python /mnt/bai/public/bin/sparcc/SparCC.py Resamplings/boot_$i.txt -i 10 -a SparCC --cor_file=Bootstraps/sim_cor_$i.txt >> sparcc.log
done

#Compute p-values
python /mnt/bai/public/bin/sparcc/PseudoPvals.py arctic_soils_sparcc.txt Bootstraps/sim_cor_#.txt 10 -o sparcc_pvals_two_sided.txt -t 'two_sided' >> sparcc.log

python PseudoPvals.py home/Jiaxin/Sparcc/lx/cor_sparcc.out home/Jiaxin/Sparcc/lx/Bootstraps/sim_cor_#.txt 5 -o pvals.txt -t one_sided

cd /opt/SparCC
python SparCC.py -h
python /opt/SparCC/SparCC.py -h
python SparCC.py example/fake_data.txt -i 20 --cor_file=example/basis_corr/cor_mat_sparcc.out
Options:
  -h, --help            show this help message and exit
  -c COR_FILE, --cor_file=COR_FILE
                        File to which correlation matrix will be written.
  -v COV_FILE, --cov_file=COV_FILE
                        File to which covariance matrix will be written.
  -a ALGO, --algo=ALGO  Name of algorithm used to compute correlations (SparCC
                        (default) | pearson | spearman | kendall)
  -i ITER, --iter=ITER  Number of inference iterations to average over (20
                        default).
  -x XITER, --xiter=XITER
                        Number of exclusion iterations to remove strongly
                        correlated pairs (10 default).
  -t TH, --thershold=TH
                        Correlation strength exclusion threshold (0.1
                        default).
conda remove -n name_of_my_env --all # delete the conda environment name_of_my_env
python MakeBootstraps.py -h
Usage: Make n simulated datasets used to get pseudo p-values.
Simulated datasets are generated by assigning each OTU in each sample an abundance that is randomly drawn (w. replacement) from the abundances of the OTU in all samples.
Simulated datasets are either written out as txt files.

Usage: python MakeBootstraps.py counts_file [options]
Example: python MakeBootstraps.py example/fake_data.txt -n 5 -t permutation_#.txt -p example/pvals/

Options:
  -h, --help            show this help message and exit
  -n N                  Number of simulated datasets to create (100 default).
  -t PERM_TEMPLATE, --template=PERM_TEMPLATE
                        The template for the permuted data file names. Should
                        not include the path, which is specified using the -p
                        option. The iteration number is indicated with a "#".
                        For example: 'permuted/counts.permuted_#.txt' If not
                        provided a '.permuted_#.txt' suffix will be added to
                        the counts file name.
  -p OUTPATH, --path=OUTPATH
                        The path to which permuted data will be written. If
                        not provided files will be written to the cwd.
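
The resampling rule in this help text (each OTU in each sample gets an abundance drawn with replacement from that OTU's abundances across all samples) can be sketched with NumPy. This is an illustration, not MakeBootstraps.py itself. The function name `resample_counts` is hypothetical, and the table layout (rows are OTUs, columns are samples) is an assumption.

```python
import numpy as np

def resample_counts(counts, rng):
    """Resample every OTU (row) independently, with replacement, across samples."""
    counts = np.asarray(counts)
    n_otus, n_samples = counts.shape
    # For each cell, pick a random sample index to copy the abundance from.
    idx = rng.integers(0, n_samples, size=(n_otus, n_samples))
    return np.take_along_axis(counts, idx, axis=1)

rng = np.random.default_rng(1)
counts = np.array([[1, 2, 3, 4],
                   [10, 20, 30, 40]])  # toy data: 2 OTUs x 4 samples
sim = resample_counts(counts, rng)
```

Because each OTU is shuffled on its own, any association between OTUs is broken, while each OTU keeps its own abundance distribution. That is what makes the simulated sets usable as a null reference.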
python PseudoPvals.py -h
Usage: Compute pseudo p-vals from a set correlations obtained from permuted data.
Pseudo p-vals are the percentage of times a correlation at least as extreme as the “real” one was observed in simulated datasets.
p-values can be either two-sided (considering only the correlation magnitude) or one-sided (accounting for the sign of correlations).
Files containing the permuted correlations should be named with a consistent template, where only the iteration number changes.
The permutation naming template is the second input argument with the iteration number replaced with a “#” character.
The template cannot contain additional “#” characters.
The total number of simulated sets is the third.

Usage: python PseudoPvals.py real_cor_file perm_template num_simulations [options]
Example: python PseudoPvals.py example/basis_corr/cor_sparcc.out example/pvals/perm_cor_#.txt 5 -o pvals.txt -t one_sided

Options:
  -h, --help            show this help message and exit
  -t TYPE, --type=TYPE  Type of p-values to computed. oned-sided | two-sided
                        (default).
  -o OUTFILE, --outfile=OUTFILE
                        Name of file to which p-values will be written.
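
The definition given in this help text (the fraction of simulated datasets in which a correlation at least as extreme as the real one was observed) can be written out in a few lines. The following is a sketch of that definition only, not the code of PseudoPvals.py. The function name `pseudo_pvals` and the toy numbers are made up for illustration.

```python
import numpy as np

def pseudo_pvals(real_cor, sim_cors, kind="two_sided"):
    """Fraction of simulations at least as extreme as the observed correlation."""
    real = np.asarray(real_cor, dtype=float)
    sims = np.asarray(sim_cors, dtype=float)  # shape: (n_simulations, n, n)
    if kind == "two_sided":
        # Magnitude only.
        extreme = np.abs(sims) >= np.abs(real)
    elif kind == "one_sided":
        # Same direction as the observed sign.
        extreme = np.where(real >= 0, sims >= real, sims <= real)
    else:
        raise ValueError("kind must be 'two_sided' or 'one_sided'")
    return extreme.mean(axis=0)

def cor2(off_diagonal):
    """Helper for the toy example: a 2 x 2 correlation matrix."""
    return np.array([[1.0, off_diagonal], [off_diagonal, 1.0]])

real = cor2(0.5)
sims = np.stack([cor2(v) for v in (0.6, -0.7, 0.1, 0.2)])
p_two = pseudo_pvals(real, sims, "two_sided")
p_one = pseudo_pvals(real, sims, "one_sided")
```

With only four simulations the smallest non-zero p-value is 0.25, which shows why a larger number of resamplings is needed before a cut-off such as 0.01 becomes meaningful.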
tail readme.rst

Now that we have all the correlations computed from the shuffled datasets, we’re ready to get the pseudo p-values.
Remember to make sure all the correlation files are in the same folder, are numbered sequentially, and have a ‘.txt’ extension.
The following will compute both one and two sided p-values.
::

    python PseudoPvals.py example/basis_corr/cor_sparcc.out example/pvals/perm_cor_#.txt 5 -o example/pvals/pvals.one_sided.txt -t one_sided
    python PseudoPvals.py example/basis_corr/cor_sparcc.out example/pvals/perm_cor_#.txt 5 -o example/pvals/pvals.two_sided.txt -t two_sided

python cor_to_network_from_Simon.py -h
usage: cor_to_network_from_Simon.py [-h] -i INPUT_COR [INPUT_COR ...]
                                    [-o [OUTPUT_DIR]] [--full]
                                    [--cor_cutoff [COR_CUTOFF]]
                                    [--pval [PVAL_FILE]]
                                    [--pval_cutoff [PVAL_CUTOFF]]

Retrives read-taxonomy hits from the parsed (blast/last) result files
..(blast/last)out.parsed.txt files, e.g.
my_sample.refseq.lastout.parsed.txt, files from the /blast_results/
directory and formats them as .csv files for import into MEGAN. Requires that
the database keep the taxonomy in within square braces, e.g. [E. coli K12].

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_COR [INPUT_COR ...]
                        input correlation matrix is csv format
  -o [OUTPUT_DIR]       result directory where results are placed
  --full                flag to calculate the full matrix
  --cor_cutoff [COR_CUTOFF]
                        absolute correlation cuttoff (default |0.3|)
  --pval [PVAL_FILE]    corresponding matrix of pvalues
  --pval_cutoff [PVAL_CUTOFF]
                        pval cuttoff (default <0.05)
