[Bioinformatics Advanced Practice 1000days] Day 2: Working with summarized experimental data and down-stream analysis

Latest recommended article published on 2023-05-26 15:12:18

Candle_light

Category: 2.3 Bioinformatics Advanced Practice 1000days

Article link: https://blog.csdn.net/Candle_light/article/details/90443284

Chapter studied

https://bioconductor.github.io/BiocWorkshops/r-and-bioconductor-for-everyone-an-introduction.html#working-with-summarized-experimental-data

1. Working with summarized experimental data

1.1 Introduction

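This chapter covers the SummarizedExperiment package and the SummarizedExperiment object.
A SummarizedExperiment object behaves like a matrix: it can be subset by rows and by columns.
The experimental data in a SummarizedExperiment object is retrieved with assay(). Its rows represent the features of interest (for example, genes) and its columns represent the samples; each value in the matrix might be, for instance, the expression level of one gene in one sample.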

1.2 Building a SummarizedExperiment object

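About the data
The dataset contains 8 samples from an RNA-seq experiment that examined how airway smooth muscle cell lines from 4 people respond to dexamethasone treatment.
For a detailed description of the dataset and the experiment, run browseVignettes("airway").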

## input data
fname <- file.choose() # airway_colData.csv
fname
## treat the first column of the data as row names
colData <- read.csv(fname, row.names = 1)
colData
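
Because file.choose() opens an interactive dialog, the effect of row.names = 1 is awkward to try out in a script. The following is a minimal sketch, assuming a small made-up sample table written to a temporary file; the names toy, plain, and named and all values are hypothetical and are not taken from the real airway_colData.csv.

```r
# Hypothetical stand-in for airway_colData.csv (values made up for illustration)
toy <- data.frame(
  Run  = c("SRR0000001", "SRR0000002"),
  cell = c("lineA", "lineB"),
  dex  = c("untrt", "trt")
)
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)

# Without row.names, the first column stays an ordinary column
plain <- read.csv(tmp)
# With row.names = 1, the first column becomes the row names
named <- read.csv(tmp, row.names = 1)

colnames(plain)   # "Run" "cell" "dex"
rownames(named)   # "SRR0000001" "SRR0000002"
colnames(named)   # "cell" "dex"
```

Having the sample identifiers as row names matters later, because they are what lines the sample table up with the columns of the count matrix.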

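The data come from the Short Read Archive and include the columns SampleName, Run, Experiment, Sample, and BioSample. The table also contains the following columns:

cell: the cell line used; this dataset uses 4 cell lines
dex: whether the sample was treated with dexamethasone
albut: a second treatment, which we can ignore
avgLength: the average length of the RNA-seq reads in each sample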
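
A quick way to check how the cell and dex columns combine is to cross-tabulate them. This is only a sketch on a made-up table: it assumes each of the 4 cell lines has one treated and one untreated sample, and the labels line1 to line4, sample1 to sample8, and "untrt" are placeholders rather than values read from the real file.

```r
# Hypothetical sample table: 4 cell lines x 2 treatment states = 8 samples
toy_colData <- data.frame(
  cell = rep(c("line1", "line2", "line3", "line4"), each = 2),
  dex  = rep(c("untrt", "trt"), times = 4),
  row.names = paste0("sample", 1:8)
)

# Cross-tabulate cell line against treatment to check the layout
design_table <- table(toy_colData$cell, toy_colData$dex)
design_table
```

On the real data, the same call would be table(colData$cell, colData$dex).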

1.3 Assay data

Now import the assay data.

## import the assay data from the file "airway_counts.csv"
fname <- file.choose() # airway_counts.csv
fname

counts <- read.csv(fname, row.names=1)
## coerce data.frame() to matrix using as.matrix()
counts <- as.matrix(counts)
## We see the dimensions and first few rows of the counts matrix
dim(counts)
#> [1] 33469 8
head(counts)

Interpreting the data

Taking gene ENSG00000000003 as an example, 679 reads from sample SRR1039508 overlap it, and 448 reads from sample SRR1039509 overlap it.
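
The same lookup can be done by row and column name. The sketch below uses a toy matrix: the first row reuses the two counts quoted above, while the second row (geneB) and the name toy_counts are made up for illustration.

```r
# Toy count matrix: the first row reuses the two counts quoted above,
# the second row (geneB) is made up for illustration
toy_counts <- matrix(
  c(679L, 448L,
     10L,  25L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ENSG00000000003", "geneB"),
                  c("SRR1039508", "SRR1039509"))
)

# Look up one gene in one sample by row and column name
toy_counts["ENSG00000000003", "SRR1039508"]   # 679
# All counts for one gene across samples
toy_counts["ENSG00000000003", ]
```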

1.4 Creating a SummarizedExperiment object

## Attach the SummarizedExperiment library to our R session
library("SummarizedExperiment")
## Use the SummarizedExperiment() function to coordinate the assay and column data
se <- SummarizedExperiment(assays = list(counts = counts), colData = colData)
se
## use subset() on SummarizedExperiment to create subsets of the data in a coordinated way
## the object is two-dimensional, so subset() takes a row condition and a column condition;
## here the row condition is left empty and only the treated samples are kept
subset(se, , dex == "trt")
## use assay() to extract the count matrix, 
## colSums() to calculate the library size (total number of reads overlapping genes in each sample)
colSums(assay(se))
## store the library size as a new column of the column data
se$lib.size <- colSums(assay(se))
colData(se)
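
To see what "coordinated" subsetting saves us from, here is a base-R sketch of the same two steps done by hand. It is not how SummarizedExperiment is implemented; the objects toy_counts and toy_colData and all numbers are hypothetical.

```r
# Hypothetical counts (genes x samples) and matching sample table
toy_counts <- matrix(
  c(5L, 0L, 12L, 7L,
    3L, 8L,  1L, 4L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("gene1", "gene2"), paste0("s", 1:4))
)
toy_colData <- data.frame(
  dex = c("untrt", "trt", "untrt", "trt"),
  row.names = colnames(toy_counts)
)

# Library size: total counts per sample (column sums)
toy_colData$lib.size <- colSums(toy_counts)

# "Coordinated" subsetting by hand: the same logical index is applied
# to the columns of the counts and to the rows of the sample table
keep <- toy_colData$dex == "trt"
trt_counts  <- toy_counts[, keep, drop = FALSE]
trt_colData <- toy_colData[keep, , drop = FALSE]

# The two pieces must still describe the same samples, in the same order
stopifnot(identical(colnames(trt_counts), rownames(trt_colData)))
trt_colData
```

With a SummarizedExperiment, a single subset() or se[, keep] call keeps the assay and the column data in sync, so the two objects cannot drift apart.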

2. Down-stream analysis

We use the R package DESeq2 for the down-stream analysis.

## Down-stream analysis
library("DESeq2")
## including cell line as a covariate, 
## and dexamethasone treatment as the main factor that we are interested in
## build the DESeqDataSet (dds)
dds <- DESeqDataSet(se, design = ~ cell + dex)
dds
## performs advanced statistical analysis on the data in the dds object
dds <- DESeq(dds)
## A table summarizing measures of differential expression can be extracted from the object
## with results()
results(dds)
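
The results table has one row per gene, with columns such as log2FoldChange and padj. A common next step is to sort by adjusted p-value and keep the genes below a cutoff. The sketch below shows this on a made-up data frame standing in for as.data.frame(results(dds)); the numbers, the name toy_res, and the 0.05 cutoff are assumptions for illustration only.

```r
# Hypothetical stand-in for as.data.frame(results(dds)); numbers are made up
toy_res <- data.frame(
  log2FoldChange = c(2.1, -0.3, -1.8, 0.1),
  padj           = c(0.001, 0.60, 0.02, NA),
  row.names      = c("gene1", "gene2", "gene3", "gene4")
)

# Sort by adjusted p-value (NA values go last by default)
toy_sorted <- toy_res[order(toy_res$padj), ]

# Keep genes below an assumed cutoff of 0.05; which() drops NA comparisons
sig <- toy_sorted[which(toy_sorted$padj < 0.05), ]
rownames(sig)   # "gene1" "gene3"
```

Using which() matters because padj can be NA for some genes, and indexing with a logical vector that contains NA would produce all-NA rows.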
