FC, CPM, dispersion and p-value

Fold change is a measure describing how much a quantity changes going from an initial to a final value.

For example, an initial value of 30 and a final value of 60 corresponds to a fold change of 2, or in common terms, a two-fold increase. Fold change is calculated simply as the ratio of the final value to the initial value: if the initial value is A and the final value is B, the fold change is B/A. As another example, a change from 80 to 20 is a fold change of 0.25, while a change from 20 to 80 is a fold change of 4. Some practitioners replace a fold-change value that is less than 1 by the negative of its inverse, e.g. a change from 80 to 20 would be reported as a fold change of −4 (in common terms, a four-fold decrease).
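The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the definitions in the text; the function names are made up for this example, and the log2 variant is included because later sections work with log-fold-changes.

```python
import math

def fold_change(initial, final):
    """Fold change as the ratio final / initial (B/A in the text)."""
    if initial == 0:
        raise ValueError("fold change is undefined when the initial value is 0")
    return final / initial

def signed_fold_change(initial, final):
    """Alternative convention: ratios below 1 become the negative inverse."""
    fc = fold_change(initial, final)
    if fc <= 0:
        raise ValueError("signed fold change needs a positive ratio")
    return fc if fc >= 1 else -1.0 / fc

def log2_fold_change(initial, final):
    """log2 of the ratio, so increases and decreases are symmetric around 0."""
    return math.log2(fold_change(initial, final))

print(fold_change(30, 60))         # two-fold increase
print(fold_change(80, 20))         # ratio below 1
print(signed_fold_change(80, 20))  # same change as a "four-fold decrease"
print(log2_fold_change(20, 80), log2_fold_change(80, 20))
```

On the log2 scale a four-fold increase and a four-fold decrease have the same magnitude with opposite sign, which is one reason log-fold-changes are commonly preferred for plotting.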

A benefit of expressing a change as the ratio between an initial value and a final value – a fold change – is that the change itself is emphasized rather than the absolute values. For example, an absolute change of 100 is significant for an experiment with only 200 samples but negligible for an experiment with over a million samples. This property makes the fold change suitable for statistical tests that need to normalize data to eliminate systematic error. The distributional fold change test is based upon this idea.

Fold change is often used in the analysis of gene expression data from microarray and RNA-Seq experiments to measure the change in the expression level of a gene.[1] A disadvantage, and a serious risk, of using fold change in this setting is that it is biased [2] and may miss differentially expressed genes with large differences (B−A) but small ratios (B/A), leading to a high miss rate at high intensities.
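A small sketch may make the "large difference, small ratio" problem concrete. The expression values and the two-fold cutoff below are hypothetical and chosen only for illustration; they are not taken from any data set.

```python
# Hypothetical (initial, final) expression values; not real data.
genes = {
    "low_expressed": (10, 40),
    "high_expressed": (10000, 15000),
}

FC_CUTOFF = 2.0  # assumed threshold, for illustration only

def passes_cutoff(initial, final, cutoff=FC_CUTOFF):
    """True if the ratio shows at least a cutoff-fold increase or decrease."""
    ratio = final / initial
    return ratio >= cutoff or ratio <= 1.0 / cutoff

results = {}
for name, (a, b) in genes.items():
    results[name] = {
        "difference": b - a,
        "ratio": b / a,
        "selected": passes_cutoff(a, b),
    }
    print(name, results[name])
```

The highly expressed gene changes by 5000 units but is filtered out because its ratio is only 1.5, while the weakly expressed gene is kept with a difference of just 30.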

 

Reposted from: http://www.cnblogs.com/Acceptyly/p/4159230.html

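English abbreviation: FC

Chinese full name: 倍性变化 (fold change)

Category: biological sciences

Summary: a way of describing the difference in quantity between two objects being compared. For example, if the amounts in the first and second samples are 50 and 10, the FC (ratio) is 5; in the reverse direction it is 0.2.

Analyzing microarray data with this method can show:

1) that expression changes derived from the absolute values of gene expression are meaningful;

2) whether a change in gene expression is significant;

3) that the model can be used to screen for valid data.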

 

CPM (counts per million)

Inputting RNA-seq counts to clustering or heatmap routines designed for microarray data is not straightforward, and the best way to do this is still a matter of research. To draw a heatmap of individual RNA-seq samples, we suggest using moderated log-counts-per-million. This can be calculated by cpm with positive values for prior.count, for example

> y <- cpm(d, prior.count=2, log=TRUE)

 

where d is the normalized DGEList object. This produces a matrix of log2 counts-per-million (logCPM), with undefined values avoided and the poorly defined log-fold-changes for low counts shrunk towards zero. Larger values for prior.count produce more shrinkage. The logCPM values can optionally be converted to RPKM or FPKM by subtracting log2 of gene length, see rpkm(). The Arabidopsis case study of Section 4.6 gives two examples of this in conjunction with MDS plots, one example making a plot from the log-counts-per-million and another making a plot of shrunk log-fold-changes.
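To see why a prior count avoids undefined values and shrinks log-fold-changes, here is a simplified Python sketch. It is not a reimplementation of edgeR's cpm() or rpkm(): it assumes the prior count is simply added to each count (and twice the prior to the library size), it ignores normalization factors, and it assumes gene length is given in base pairs and converted to kilobases. The function names and the numbers are hypothetical.

```python
import math

def log_cpm(count, lib_size, prior_count=2.0):
    """Moderated log2 counts-per-million for one gene in one sample.

    Simplifying assumption: the prior count is added directly to the count,
    and 2 * prior_count is added to the library size.
    """
    return math.log2((count + prior_count) / (lib_size + 2.0 * prior_count) * 1e6)

def log_rpkm(count, lib_size, gene_length_bp, prior_count=2.0):
    """log2 RPKM obtained by subtracting log2 of the gene length in kilobases."""
    return log_cpm(count, lib_size, prior_count) - math.log2(gene_length_bp / 1000.0)

LIB = 1_000_000  # hypothetical library size, identical in both samples

# A low-count gene: 0 reads in sample 1, 4 reads in sample 2.
# Without a prior count, log2(0) would be undefined.
lfc_prior2 = log_cpm(4, LIB, 2.0) - log_cpm(0, LIB, 2.0)
lfc_prior8 = log_cpm(4, LIB, 8.0) - log_cpm(0, LIB, 8.0)

# A high-count gene with a four-fold change is barely affected by the prior.
lfc_high = log_cpm(400, LIB, 2.0) - log_cpm(100, LIB, 2.0)

print(lfc_prior2, lfc_prior8, lfc_high)
```

With the larger prior count the low-count log-fold-change moves closer to zero, while the high-count gene stays near its unmoderated value of 2.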

Statistical dispersion [dɪ'spɜː(r)ʃ(ə)n] (Chinese: 离差)

In statistics, dispersion (also called variability, scatter, or spread) denotes how stretched or squeezed[1] a distribution (theoretical or that underlying a statistical sample) is. Common examples of measures of statistical dispersion are the variance, standard deviation and interquartile range.

Dispersion is contrasted with location or central tendency, and together they are the most used properties of distributions.

A measure of statistical dispersion is a nonnegative real number that is zero if all the data are the same and increases as the data become more diverse.
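The three measures named above can be computed with Python's standard statistics module. The sketch below uses made-up data and the default quartile method of statistics.quantiles; other quartile conventions give slightly different interquartile ranges.

```python
import statistics

def dispersion_summary(data):
    """Sample variance, sample standard deviation and interquartile range."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return {
        "variance": statistics.variance(data),
        "stdev": statistics.stdev(data),
        "iqr": q3 - q1,
    }

# Hypothetical samples with the same centre but increasing spread.
constant = dispersion_summary([5, 5, 5, 5])
narrow = dispersion_summary([4, 5, 5, 6])
wide = dispersion_summary([1, 5, 5, 9])

for label, summary in (("constant", constant), ("narrow", narrow), ("wide", wide)):
    print(label, summary)
```

All three measures are zero for the constant sample and grow as the values spread out, which matches the defining property stated above.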

The square root of the common dispersion gives the coefficient of variation of biological variation, also called the biological coefficient of variation (BCV).
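As a quick numeric illustration of that relationship, the sketch below converts a dispersion value into a BCV. The dispersion of 0.16 is a hypothetical number, not an estimate from any data set, and the function name is made up for this example.

```python
import math

def bcv_from_dispersion(common_dispersion):
    """BCV as the square root of the common dispersion."""
    if common_dispersion < 0:
        raise ValueError("a dispersion cannot be negative")
    return math.sqrt(common_dispersion)

bcv = bcv_from_dispersion(0.16)  # hypothetical dispersion value
print(f"BCV = {bcv:.2f}")
```

A BCV of 0.4 would roughly mean that expression varies by about 40% between biological replicates.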


p-value

In statistics, the p-value is a function of the observed sample results (a statistic) that is used for testing a statistical hypothesis. More specifically, the p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming that the hypothesis under consideration (the null hypothesis) is true.[1][2] Here, "more extreme" depends on the way the hypothesis is tested. Before the test is performed, a threshold value is chosen, called the significance level of the test, traditionally 5% or 1% [3] and denoted as α.
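The definition can be made concrete with an exact one-sided binomial test. The coin-flip setting, the observed result of 9 heads in 10 flips and the choice of α = 0.05 are assumptions made for this sketch; "more extreme" is taken here to mean "at least as many heads as observed".

```python
from math import comb

def one_sided_binomial_p_value(successes, trials, p_null=0.5):
    """P(X >= successes) when X ~ Binomial(trials, p_null) under the null."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

ALPHA = 0.05  # significance level chosen before looking at the data

# Hypothetical observation: 9 heads in 10 flips of a coin assumed fair.
p_value = one_sided_binomial_p_value(9, 10)
reject_null = p_value <= ALPHA

print(f"p-value = {p_value:.4f}, reject null at alpha={ALPHA}: {reject_null}")
```

Only two outcomes (9 or 10 heads) are at least as extreme as the observation, so the p-value is 11/1024, which falls below the chosen threshold.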

 

False discovery rate (FDR)


 

Reposted from: https://www.cnblogs.com/AveryCh/articles/4954824.html
