How hard bioinformatics is for a computing beginner: gene expression data analysis is too hard for me, still a beginner in molecular biology

The raw gene expression data were extracted using NimbleScan software v 2.4.

In other words: the raw gene expression data were extracted with the Ni.. software.

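(You start with a treated sample and a control sample. mRNA is extracted from each and reverse-transcribed into cDNA; during this step the two cDNA samples are labeled with the fluorescent dyes Cy3 and Cy5, one dye per sample. The two labeled samples are then hybridized together onto the chip, and a microarray scanner reads Cy3 at 532 nm and Cy5 at 635 nm to give the result. NimbleScan is the software that extracts the signal intensities from the scanned images.)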

The raw data of each gene were presented as the average signal intensity of 21–27 probes.

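In other words: the raw value for each gene is the average signal intensity of 21–27 probes. (That is, each gene has several probes on the chip that measure its fluorescence signal.)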
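A minimal sketch of the probe averaging described above. The function name `gene_signal` and the intensity values are made up for illustration; a real array would have 21–27 probes per gene, and this is not how NimbleScan itself is implemented.

```python
from statistics import mean


def gene_signal(probe_intensities):
    """Average the intensities of all probes that target one gene."""
    if not probe_intensities:
        raise ValueError("a gene needs at least one probe")
    return mean(probe_intensities)


# Hypothetical intensities for one gene (shortened to 4 probes for readability).
probes = [1200.0, 1350.0, 1280.0, 1170.0]
print(gene_signal(probes))
```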

Then, the raw data in all the arrays were normalized to the medium value. The t-test approach in the CyberT program (Baldi and Long 2001, Hatfield et al. 2003) was used to determine whether the difference in signal intensity between stressed and control samples was significant (P < 0.01). If the log2 ratio of the signal of WD/control for a particular gene was +1 or more or −1-fold or less while the P-value in the t test was <0.01, the gene was classified as up- or downregulated, respectively.

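In other words: the raw data on all arrays were then scaled to the middle (median) value. The t-test in the CyberT program was used to decide whether the difference in signal intensity between treated and control samples was significant. If the P-value of that t-test is <0.01 and the log2 ratio of the treated signal to the control signal for a gene is +1 or more, or −1 or less, the gene is classified as upregulated or downregulated, respectively. (After the software extracts the data, the signals are first normalized, and then the ratio between the Cy3 and Cy5 signals is computed; this ratio is the gene's expression level in the treated sample relative to the control. Taking log2 of the ratio puts increases and decreases on a symmetric scale: a twofold increase becomes +1 and a twofold decrease becomes −1. So the sign of the log2 ratio tells you whether the gene is expressed more or less in the treated sample than in the control.)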
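The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "normalized to the medium value" is read here as dividing each array by its median, the P-values are assumed to have been computed already by CyberT (the sketch does not reimplement its t-test), and the gene names and numbers are invented.

```python
import math
from statistics import median


def normalize_to_median(intensities):
    """Divide every intensity on one array by that array's median (assumed meaning)."""
    m = median(intensities)
    return [x / m for x in intensities]


def log2_ratio(treated, control):
    """log2 of treated/control: +1 means doubled, -1 means halved."""
    return math.log2(treated / control)


def classify(ratio_log2, p_value, fold_cutoff=1.0, p_cutoff=0.01):
    """Apply the quoted rule; p_value is assumed to come from CyberT."""
    if p_value < p_cutoff:
        if ratio_log2 >= fold_cutoff:
            return "up"
        if ratio_log2 <= -fold_cutoff:
            return "down"
    return "unchanged"


# Hypothetical normalized signals: (gene, WD signal, control signal, P-value).
genes = [
    ("geneA", 4.0, 1.0, 0.001),
    ("geneB", 0.5, 2.0, 0.005),
    ("geneC", 3.0, 1.0, 0.2),    # large change but not significant
    ("geneD", 1.2, 1.0, 0.001),  # significant but small change
]
for name, wd, ctrl, p in genes:
    r = log2_ratio(wd, ctrl)
    print(name, round(r, 2), classify(r, p))
```

Note that a gene needs both a large enough change and a small enough P-value; geneC and geneD each fail one of the two conditions.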

The reproducibility of the microarray data was presented with the volcano plots produced using Excel software. The categorization of gene expression was performed following hierarchical clustering and K-means clustering methods based on the MeV (Multi Experiment Viewer) software ( Saeed et al. 2003 ).

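In other words: the reproducibility of the microarray data is shown with volcano plots made in Excel. Genes were grouped by expression pattern using hierarchical clustering and K-means clustering in the MeV software. (You can think of hierarchical clustering like this: genes with similar expression end up in the same group, and genes with dissimilar expression end up in different groups.)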
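To make the idea of hierarchical clustering concrete, here is a small pure-Python sketch. It is not MeV's implementation: the Euclidean distance, the average linkage, the function names, and the expression profiles are all assumptions chosen for illustration. It starts with every gene in its own group and repeatedly merges the two closest groups until `k` groups remain.

```python
import math


def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(distance(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(profiles, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[name] for name in sorted(profiles)]
    while len(clusters) > k:
        best = None  # (distance, index of first cluster, index of second)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical log2 ratios of four genes across three arrays.
profiles = {
    "geneA": [2.0, 2.1, 1.9],
    "geneB": [1.8, 2.2, 2.0],
    "geneC": [-2.0, -1.9, -2.1],
    "geneD": [-1.7, -2.2, -2.0],
}
print(hierarchical_clusters(profiles, 2))
```

With these made-up profiles the two upregulated genes land in one group and the two downregulated genes in the other, which is the "similar goes with similar" behavior described above.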