R包estimate评估肿瘤组织中基质及免疫细胞浸润水平

根据表达数据,ESTIMATE为研究人员提供了肿瘤纯度、存在的基质细胞水平和肿瘤组织中免疫细胞浸润水平的分数。https://bioinformatics.mdanderson.org/estimate/index.html​​​​​

1.  estimate安装

install.packages("estimate")

rforge <- "http://r-forge.r-project.org"
install.packages("estimate", repos=rforge, dependencies=TRUE)
library(estimate)
ls("package:estimate") 

in.file <- system.file("extdata", "sample_input.txt", package="estimate")
out.file <- tempfile(pattern="estimate", fileext=".gct")

2. estimate分析

#进行estimate分析
setwd("/Users/test")
filterCommonGenes(in.file, 
                  output.f=out.file,
                  id="GeneSymbol")

estimateScore(input.ds = out.file,
              output.ds="OV_estimate_score.gct", 
              platform="affymetrix")
# platform = c("affymetrix", "agilent", "illumina"))

plotPurity(scores="OV_estimate_score.gct", 
           platform="affymetrix",
           output.dir="estimated_purity_plots")
#生成plot文件

3. 累计分布曲线作图

scores=read.table("OV_estimate_score.gct",skip = 2,header = T)
rownames(scores)=scores[,1]
scores=t(scores[,3:ncol(scores)])
View(scores)
scores<- as.data.frame(scores)
scores$SampleID <- rownames(scores)

save(scores,file = 'BRCA_estimate_score.rdata') 
# m_score <- get(load('BRCA_estimate_score.rdata'))
class(scores)

# median(scores[,'ImmuneScore'])

## 画单条累计分布曲线
library(ggplot2)
p1<-ggplot(scores,aes(x=ImmuneScore)) +
      stat_ecdf(color = "green") +
      labs(y="accumulative propotion")

p2<-ggplot(scores,aes(x=StromalScore)) +
      stat_ecdf(color = "red") +
      labs(y="accumulative propotion")


## 画多条累计分布曲线
# 要先把数据框转化成长数据格式
library(reshape2)
scores2 <- scores[,-4] # 去掉不需要的列
dim(scores2)
long = melt(scores2, id=c("SampleID"),
            variable.name= 'Class', value.name = 'Value')
# 创建新的变量,含有分组信息
long$group <- rep(c(rep("g1",6),rep("g2",4)),3)
library(dplyr)
long<- mutate(long,group_score = paste(Class,group,sep="_"))

## 长格式转成宽格式数据框
#wide <- dcast(long,SampleID~Class,value.var='Value')

#作多条累计分布曲线,以不同颜色区分。
ggplot(long,aes(x=Value,color =Class,linetype=group)) +
  stat_ecdf(size =1) + # 线粗细
  labs(x="Score",
       y="Accumulative propotion",
       title="Accumulative Plotting") +
  theme(legend.position = c(0.15,0.7), #图例位置 ("none", "left", "right", "bottom", "top", or two-element numeric vector)
        legend.background =element_rect(fill = "white", colour = "grey50"),#图例背景
        panel.background=element_rect(fill = "grey95", # grey90
                                      colour = "black",
                                      size = 1), #画布背景颜色
        plot.title = element_text(hjust=0.5,size=16,vjust=0.5), #标题位置
        legend.text=element_text(size=10,colour='black'), #图例文字
        axis.text=element_text(size=8,colour="black"), #坐标轴文字
        axis.title.y = element_text(size = rel(1.3), angle = 90),#y坐标轴名称文字
        axis.title.x = element_text(size = rel(1.3)),#x坐标轴名称文字
        )

  • 1
    点赞
  • 34
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
由于estimate是一个非常常用的R,因此其源码可以在CRAN上找到:https://cran.r-project.org/src/contrib/estimate_1.0-0.tar.gz 该源码含了R代码和C代码,其R代码位于R文件夹下,C代码位于src文件夹下。以下是estimate的核心函数的R代码: ```R #' Estimate parameters of a distribution #' #' Fits the parameters of a distribution to a given data set using maximum #' likelihood estimation. The package estimates the parameters of the normal, #' log-normal, Weibull, and gamma distributions. In addition, it provides a #' generic function for estimating the parameters of any continuous distribution #' available in R. The package can handle right censored and interval censored #' data. The parameters of the distributions are estimated by numerical #' optimization using the \code{\link{nloptr}} package. #' #' @param data a numeric vector of data values. #' @param dist the name of the distribution to be estimated. The package #' currently supports the normal, log-normal, Weibull, and gamma distributions. #' @param control a list of control parameters for the optimization algorithm. #' See the documentation of \code{\link{nloptr}} for details. #' @param censor a vector of censoring values. If \code{NULL}, the data are #' assumed to be uncensored. If \code{censor} is a numeric vector of the same #' length as \code{data}, the values in \code{data} are assumed to be right #' censored with the corresponding values in \code{censor} as the censoring #' points. If \code{censor} is a matrix of two columns and the same number of #' rows as \code{data}, the values in \code{data} are assumed to be interval #' censored with the corresponding rows in \code{censor} as the lower and upper #' censoring points, respectively. #' @param weights a vector of weights for the data values. If \code{NULL}, all #' data values are assumed to have equal weight. Otherwise, the length of #' \code{weights} must be equal to the length of \code{data}. #' @param start a vector of starting values for the optimization algorithm. #' If \code{NULL}, reasonable starting values are computed automatically. #' @param lower a vector of lower bounds for the parameter values. If \code{NULL}, #' the parameter values are not constrained from below. #' @param upper a vector of upper bounds for the parameter values. If \code{NULL}, #' the parameter values are not constrained from above. #' #' @return An object of class \code{estimate} containing the following components: #' \item{estimate}{a named vector of estimated parameter values.} #' \item{vcov}{the estimated variance-covariance matrix of the parameter #' estimates.} #' \item{loglik}{the log-likelihood of the estimated parameters.} #' \item{converged}{a logical value indicating whether the optimization #' algorithm converged.} #' \item{message}{a character string containing a message from the optimization #' algorithm.} #' \item{data}{the input data.} #' \item{dist}{the name of the distribution estimated.} #' \item{censor}{the censoring information.} #' \item{weights}{the data weights.} #' \item{control}{the optimization control parameters.} #' \item{start}{the starting values for the optimization algorithm.} #' \item{lower}{the lower bounds for the parameter values.} #' \item{upper}{the upper bounds for the parameter values.} #' \item{nobs}{the number of observations in the data set.} #' \item{npar}{the number of estimated parameters.} #' #' @seealso \code{\link{nloptr}} #' @examples #' # Generate some data from a Weibull distribution #' set.seed(123) #' x <- rweibull(100, 2, 2) #' #' # Estimate the parameters of the Weibull distribution #' fit <- estimate(x, "weibull") #' fit #' plot(fit) #' #' # Estimate the parameters of the log-normal distribution with interval censored data #' y <- exp(rnorm(100, 1, 0.5)) #' c <- matrix(c(0.9, 1.1), nrow = 100, ncol = 2) #' fit <- estimate(y, "lnorm", censor = c) #' fit #' plot(fit) #' #' @export estimate <- function(data, dist, control = list(), censor = NULL, weights = NULL, start = NULL, lower = NULL, upper = NULL) { # Check input arguments if (!is.numeric(data) || is.na(sum(data))) stop("Invalid 'data' argument") if (!is.null(censor) && !is.numeric(censor)) stop("Invalid 'censor' argument") if (!is.null(censor) && (nrow(censor) != length(data) && ncol(censor) != 2)) { stop("Invalid 'censor' argument") } if (!is.null(weights) && length(weights) != length(data)) { stop("Invalid 'weights' argument") } if (!is.null(start) && length(start) != npar(dist)) { stop("Invalid 'start' argument") } if (!is.null(lower) && length(lower) != npar(dist)) { stop("Invalid 'lower' argument") } if (!is.null(upper) && length(upper) != npar(dist)) { stop("Invalid 'upper' argument") } # Compute starting values if not given if (is.null(start)) start <- start.default(data, dist) # Compute lower bounds if not given if (is.null(lower)) lower <- rep(-Inf, npar(dist)) # Compute upper bounds if not given if (is.null(upper)) upper <- rep(Inf, npar(dist)) # Set optimization options opt.control <- set.control(control) # Determine the appropriate log-likelihood function if (is.null(censor)) { if (is.null(weights)) { loglik <- get.loglik(dist) gradient <- get.gradient(dist) } else { loglik <- get.weighted.loglik(dist) gradient <- get.weighted.gradient(dist) } } else { if (is.null(weights)) { loglik <- get.censored.loglik(dist) gradient <- get.censored.gradient(dist) } else { loglik <- get.weighted.censored.loglik(dist) gradient <- get.weighted.censored.gradient(dist) } } # Perform optimization opt.result <- nloptr(x0 = start, eval_f = loglik, eval_grad_f = gradient, lb = lower, ub = upper, opts = opt.control) # Check convergence converged <- (opt.result$status == 0) # Compute variance-covariance matrix vcov <- vcov.default(opt.result, gradient, loglik, data, dist, censor, weights) # Return results list(estimate = opt.result$solution, vcov = vcov, loglik = -opt.result$objective, converged = converged, message = opt.result$message, data = data, dist = dist, censor = censor, weights = weights, control = opt.control, start = start, lower = lower, upper = upper, nobs = length(data), npar = npar(dist)) } ```

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值