Comparing expression microarray data from various platforms with mRNA-seq data
Paper: http://journals.plos.org/plosone ... ournal.pone.0078644 The cells studied were Human CCR6+ CD4 memory T cells, measured at 6 time points for a total of 12 samples. The expression arrays were Affymetrix GeneChip HT HG-U133+ PM arrays, and sequencing was done on the Illumina HiSeq™ 2000 platform, paired-end (PE): All reads were pair-end sequenced with an average insert size of 160 bp, and typical read-length of 90 bp.
Array overview: 41,796 of the 54,714 probe sets were mapped to 20,741 genes, with 10,837 genes having more than one representative probe set.
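The post does not say how genes with several probe sets were summarized before comparing against RNA-seq. A minimal sketch of one common option, assuming log2 array values keyed by probe set and a probe-set-to-gene mapping table (the function name and the "mean"/"max" choices are illustrative assumptions, not the paper's method):

```python
from collections import defaultdict
from statistics import mean


def collapse_probesets(expr, probe_to_gene, method="mean"):
    """Collapse probe-set-level log2 values to one value per gene (hypothetical helper).

    expr: dict probe_set_id -> log2 expression value
    probe_to_gene: dict probe_set_id -> gene symbol; unmapped probe sets are skipped
    method: "mean" averages all probe sets of a gene, "max" keeps the highest one
    """
    by_gene = defaultdict(list)
    for probe, value in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe set not mapped to any gene
        by_gene[gene].append(value)
    if method == "mean":
        return {g: mean(v) for g, v in by_gene.items()}
    if method == "max":
        return {g: max(v) for g, v in by_gene.items()}
    raise ValueError(f"unknown method: {method}")


# Toy example with made-up probe IDs: p1 and p2 both map to GENE1, p4 is unmapped.
expr = {"p1": 8.0, "p2": 6.0, "p3": 5.5, "p4": 3.0}
mapping = {"p1": "GENE1", "p2": "GENE1", "p3": "GENE2"}
print(collapse_probesets(expr, mapping))         # {'GENE1': 7.0, 'GENE2': 5.5}
print(collapse_probesets(expr, mapping, "max"))  # {'GENE1': 8.0, 'GENE2': 5.5}
```

Averaging log2 values corresponds to a geometric mean of the raw intensities; keeping the maximum instead favors the most strongly detected probe set.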
Before the comparison, the RPKM values and the array values were first normalized:
In summary, RNA-Seq based transcriptome expression was measured as RPKM for 36,004 transcripts, representing 22,300 unique genes. The median RPKM in all 12 samples was 0.49, and 28.6% to 32.5% (average = 30.3%) of genes had RPKM value of 0 in each sample. In order to make the transcriptome profiling comparable between both platforms (RNA-Seq vs. Microarray), the RPKM values were floored at 0.047, followed by log2 transformation. After the transformation, the difference between the median expression and the floored (minimal) expression by RNA-Seq is equal to the difference between the median expression and the minimal expression by microarray.
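A minimal sketch of the flooring step quoted above, assuming plain Python lists of per-gene values for one sample; the helper names are hypothetical. `matching_floor` reads "median expression" on the RNA-seq side as log2 of the median RPKM and assumes the array values are already log2-transformed, so the condition log2(median RPKM) - log2(floor) = median(array) - min(array) gives floor = median RPKM / 2^(median(array) - min(array)).

```python
import math
from statistics import median


def floor_and_log2(rpkm_values, floor=0.047):
    """Raise RPKM values below `floor` (including zeros) to `floor`, then take log2."""
    return [math.log2(max(v, floor)) for v in rpkm_values]


def matching_floor(rpkm_values, array_log2_values):
    """Pick a floor so the RNA-seq median-to-floor gap (in log2 units)
    equals the array median-to-minimum gap.

    Assumes `array_log2_values` are already on a log2 scale.
    """
    array_span = median(array_log2_values) - min(array_log2_values)
    return median(rpkm_values) / 2 ** array_span


# Zeros become log2(0.047) instead of -inf, so they can be plotted against array values.
print(floor_and_log2([0.0, 1.0, 4.0]))
# Toy data: median RPKM 0.5, array span 5 - 2 = 3, so floor = 0.5 / 2**3 = 0.0625.
print(matching_floor([0.0, 0.5, 2.0], [2.0, 5.0, 9.0]))
```

Under this reading, the reported median RPKM of 0.49 and floor of 0.047 imply a median-to-minimum span of about log2(0.49 / 0.047) ≈ 3.38 log2 units on the array side.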
A very interesting paper, worth reading closely: RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays http://genome.cshlp.org/content/18/9/1509.full
Another paper with a variety of comparisons between Affymetrix Exon arrays, custom NimbleGen arrays, and RNA-seq: Griffith, et al. Alternative expression analysis by RNA sequencing. Nature Methods. 2010 Oct;7(10):843-847. http://www.nature.com/nmeth/journal/v7/n10/full/nmeth.1503.html
This correlation figure in particular is very important: https://www.researchgate.net/fig ... or-RNA-seq-the-LOG2 It's the first time I've seen a figure legend longer than the paper itself!
Paper: https://genomebiology.biomedcent ... 6/s13059-015-0694-1 This time the samples are clinical: 498 primary neuroblastomas. The arrays were customized 4x44k oligonucleotide microarrays (Agilent Technologies), and sequencing was done on the Illumina HiSeq 2000 platform with the TruSeq PE Cluster Kit v3. All the data are available from NCBI: Microarray and RNA-seq data can be accessed from the GEO database (www.ncbi.nlm.nih.gov/geo/) with accession numbers GSE49710 and GSE49711, respectively, which are included in SEQC Project SuperSeries GSE47792.
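Cross-platform agreement like that in the correlation figure is usually summarized by correlating each gene's log2 RNA-seq value with its log2 array value. Which coefficient each paper used is not stated in this post; the sketch below is an assumption that computes both Pearson and Spearman correlation over the genes present on both platforms, with inputs as dicts mapping gene to log2 value (all names are hypothetical).

```python
import math


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) if either input is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 1-based position of this tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cross_platform_correlation(rnaseq_log2, array_log2):
    """Correlate genes measured on both platforms (dicts gene -> log2 value)."""
    shared = sorted(set(rnaseq_log2) & set(array_log2))
    x = [rnaseq_log2[g] for g in shared]
    y = [array_log2[g] for g in shared]
    return pearson(x, y), spearman(x, y)


# Toy data: only A, B and C are on both platforms.
seq = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 5.0}
arr = {"A": 2.0, "B": 4.0, "C": 6.0, "E": 0.0}
print(cross_platform_correlation(seq, arr))  # (1.0, 1.0)
```

Spearman is less sensitive to the pile-up of floored zero-RPKM genes at the bottom of the RNA-seq range, which is one reason to report it alongside Pearson.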