Phylogenetic tree inference_01

Latest recommended article published 2021-08-31 21:03:03

TIME_@

Category: Bioinformatics

Article link: https://blog.csdn.net/geekfocus/article/details/105421798

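This article looks at phylogenetic tree inference with the R package phangorn, in particular UPGMA hierarchical clustering and the parsimony ratchet. It covers maximum parsimony (MP), maximum likelihood (ML), and the steps for running the parsimony ratchet, emphasizing how the method searches tree space effectively and improves search efficiency. In practice, although the pscore stopped changing for the binary test data, the parsimony ratchet is better suited to large data sets.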

The phangorn package

Introduction to phylogenies in R

acctran ------- Parsimony tree

acctran(tree, data)
# tree --> tree to start the NNI search from.
# data --> an object of class phyDat containing sequences.

fitch(tree, data, site = "pscore")
# site --> return either "pscore" or "site"-wise parsimony scores.

random.addition(data, method = "fitch") # random.addition can be used to produce starting trees.
# method --> one of "fitch" or "sankoff".

parsimony(tree, data, method = "fitch", ...) # parsimony returns the parsimony score of a tree using either the sankoff or the fitch algorithm.

sankoff(tree, data, cost = NULL, site = "pscore")
#cost --> A cost matrix for the transitions between two states.

optim.parsimony(tree, data, method = "fitch", cost = NULL, trace = 1, rearrangements = "SPR", ...) # optim.parsimony tries to find the maximum parsimony tree using either Nearest Neighbor Interchange (NNI) rearrangements or subtree pruning and regrafting (SPR).
# trace --> defines how much information is printed during optimisation.
# rearrangements --> "SPR" or "NNI" rearrangements.

pratchet(data, start = NULL, method = "fitch", maxit = 1000, minit = 10, k = 10, trace = 1, all = FALSE, rearrangements = "SPR", perturbation = "ratchet", ...) # pratchet implements the parsimony ratchet (Nixon, 1999) and is the preferred way to search for the best tree.
# start --> a starting tree can be supplied.
# maxit --> maximum number of iterations in the ratchet.
# minit --> minimum number of iterations in the ratchet.
# k --> the ratchet stops after k rounds without improvement.
# all --> return all equally good trees or just one of them.
# perturbation --> whether to use "ratchet", "random_addition" or "stochastic" (NNI) for shuffling the tree.

The "SPR" rearrangements are so far only available for the "fitch" method; "sankoff" only uses "NNI". The "fitch" algorithm only works correctly for binary trees.
parsimony returns the parsimony score (pscore) of the given tree. optim.parsimony returns a tree after NNI or SPR rearrangements. pratchet returns a tree or a list of trees containing the best tree(s) found during the search. acctran returns a tree with edge lengths assigned according to the ACCTRAN criterion.
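
To make the pscore concrete, here is a minimal base R sketch of the Fitch counting pass on a toy data set. It is not phangorn's implementation: the nested-list tree representation, the helper names `fitch_site` and `fitch_score`, and the toy matrix are all assumptions made for illustration.

```r
# Fitch small-parsimony count for one character on a rooted binary tree.
# node:   a tip name (character) or a list of two child nodes (hypothetical format).
# states: named vector with the observed state of every tip.
fitch_site <- function(node, states) {
  if (is.character(node)) {
    # A tip contributes its observed state and no changes.
    return(list(set = states[[node]], score = 0L))
  }
  left  <- fitch_site(node[[1]], states)
  right <- fitch_site(node[[2]], states)
  common <- intersect(left$set, right$set)
  if (length(common) > 0) {
    # Children agree on at least one state: no extra change needed here.
    list(set = common, score = left$score + right$score)
  } else {
    # Children disagree: take the union and count one change.
    list(set = union(left$set, right$set),
         score = left$score + right$score + 1L)
  }
}

# Sum the per-site counts; rows of `data` are taxa, columns are characters.
fitch_score <- function(tree, data) {
  sum(vapply(seq_len(ncol(data)),
             function(j) fitch_site(tree, data[, j])$score,
             integer(1)))
}

# Toy binary data (made up for this example).
toy <- rbind(A = c(0, 0, 1),
             B = c(0, 1, 0),
             C = c(1, 0, 1),
             D = c(1, 1, 0))

tree1 <- list(list("A", "B"), list("C", "D"))  # ((A,B),(C,D))
tree2 <- list(list("A", "C"), list("B", "D"))  # ((A,C),(B,D))

score1 <- fitch_score(tree1, toy)
score2 <- fitch_score(tree2, toy)
```

On this toy matrix the second topology needs fewer changes than the first, which is the kind of comparison a parsimony search repeats over many candidate trees.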

# create the binary data
mylist <- list( Primary=c(),...)
# convert the data to phyDat
Regions <- phyDat(mylist, type = "USER", levels = c(0,1), return.index = TRUE)
# pairwise distances from sequences
# dist.hamming, dist.ml and dist.logDet compute pairwise distances for an object of class phyDat. dist.ml uses DNA / AA sequences to compute distances under different substitution models.
dm <- dist.hamming(Regions)
# upgma returns a rooted tree; UPGMAtree already groups the samples roughly, and only the edge lengths change in later steps
UPGMAtree <- upgma(dm)
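
The distance and clustering step can be approximated in base R, which helps when checking what the two calls above do. This is a sketch under two assumptions: that dist.hamming with its default settings reports the proportion of differing sites, and that upgma behaves like average-linkage hclust. The toy matrix is made up.

```r
# Toy binary matrix: rows are samples, columns are characters (made-up values).
toy <- rbind(A = c(0, 0, 1, 1, 0),
             B = c(0, 0, 1, 0, 0),
             C = c(1, 1, 0, 0, 1),
             D = c(1, 1, 0, 1, 1))

# For 0/1 data the Manhattan distance counts mismatches;
# dividing by the number of sites gives the proportion of differing sites.
hamming <- dist(toy, method = "manhattan") / ncol(toy)

# Average linkage on that distance matrix is the UPGMA clustering rule.
hc <- hclust(hamming, method = "average")

# Cut the dendrogram into two groups to see which samples cluster together.
groups <- cutree(hc, k = 2)
```

Here A and B differ at one of five sites, as do C and D, so the two pairs end up in separate clusters.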

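# parsimony returns the parsimony score of a tree (not a likelihood), using either the sankoff or the fitch algorithm; the score is the minimum number of state changes the tree requires. Here it reports the pscore of UPGMAtree.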
parsimony(UPGMAtree, Regions, method = "fitch")

# random.addition produces starting trees with either the sankoff or the fitch algorithm. In practice UPGMAtree is used as the starting tree here.
#treeRA <- random.addition(Regions)

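Question: the pscore is 169, while only 160 rows in the test data distinguish the 5 groups. After UPGMA clustering, neither SPR nor NNI changes the pscore. Is that because the optimal tree is easy to find for binary data?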

parsimony(UPGMAtree, Regions, method = "fitch")
[1] 169
treeSPR <- optim.parsimony(UPGMAtree, Regions, method = "fitch", rearrangements = "SPR")
Final p-score 169 after 0 nni operations
treeNNI <- optim.parsimony(UPGMAtree, Regions, method = "fitch", rearrangements = "NNI")
Final p-score 169 after 0 nni operations
treeNNI <- optim.parsimony(UPGMAtree, Regions)
Final p-score 169 after 0 nni operations
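
One way to approach the question above is to compare the pscore with a simple lower bound: a character with s observed states needs at least s - 1 changes on any tree, so a variable binary character needs at least one. The base R sketch below computes that bound; the helper name `min_steps` and the toy matrix are hypothetical. If the 160 rows correspond to 160 variable binary characters (an assumption about the test data), the bound would be 160, and a pscore of 169 would mean a few characters change more than once on the tree. A score that stays put after NNI or SPR shows only that the starting tree is a local optimum for those moves.

```r
# Lower bound on the parsimony score of any tree:
# each character needs at least (number of observed states - 1) changes.
min_steps <- function(data) {
  n_states <- apply(data, 2, function(col) length(unique(col)))
  sum(n_states - 1L)
}

# Toy matrix: rows are samples, columns are characters (made-up values).
# The last column is constant and therefore contributes nothing.
toy <- rbind(A = c(0, 0, 1, 1),
             B = c(0, 1, 0, 1),
             C = c(1, 0, 1, 1),
             D = c(1, 1, 0, 1))

bound <- min_steps(toy)
```

If a search returns a tree whose pscore equals this bound, no better tree exists for the data; a gap between the two leaves the question open.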

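# optim.parsimony tries to find the maximum parsimony tree using NNI or SPR rearrangements. SPR is only available for fitch. This step is skipped in the test because running it did not improve the pscore.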
#treeSPR <- optim.parsimony(UPGMAtree, Regions, method = "fitch", rearrangements = "SPR")

# pratchet searches for the best tree with the parsimony ratchet. It did not find a better pscore for the test data either, but this step must still be run.
treeRatchet <- pratchet(Regions, start=UPGMAtree, maxit=200,minit=10, k=10, trace=1, rearrangements = "SPR", perturbation =
