Data
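wget "https://ndownloader.figshare.com/articles/3219685?private_link=1d788fd384d33e913a2a" -O 3219685.zip
# Unpack into 3219685/ (this step is implied by the directory listing that follows)
unzip 3219685.zip -d 3219685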
!ls -l 3219685/
total 2588
-rw-r--r-- 1 root root 1340161 Dec 9 07:43 GSE60450_Lactation-GenewiseCounts.txt
-rw-r--r-- 1 root root 1253364 Dec 9 07:43 mouse_c2_v5.rdata
-rw-r--r-- 1 root root 22483 Dec 9 07:43 mouse_H_v5.rdata
-rw-r--r-- 1 root root 4362 Dec 9 07:43 ResultsTable_small.txt
-rw-r--r-- 1 root root 733 Dec 9 07:43 SampleInfo_Corrected.txt
-rw-r--r-- 1 root root 733 Dec 9 07:43 SampleInfo.txt
-rw-r--r-- 1 root root 278 Dec 9 07:43 small_counts.txt
Data source:
EGF-mediated induction of Mcl-1 at the switch to lactation is essential for alveolar cell survival (Fu et al. 2015)
Raw sequencing data:
Gene Expression Omnibus database (GEO) under accession number GSE60450
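R basics
We use RStudio as the integrated development environment. It comes in a desktop edition and a server edition; download and install whichever fits your needs.
Installing and configuring RStudio Server
Download: wget https://download2.rstudio.org/rstudio-server-rhel-1.1.456-x86_64.rpm
Install: yum install rstudio-server-rhel-1.1.456-x86_64.rpm
Edit the configuration file
vi /etc/rstudio/rserver.conf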
# Server Configuration File
rsession-which-r=/home/sunchengquan/R-3.5.1/bin/R
www-port=8787
If R was also installed with Anaconda under your home directory, RStudio Server may report errors. Installation cannot be done as root.
Start: rstudio-server start
You can then log in and use RStudio from a web browser (port 8787 in the configuration above).
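R packages
Install with Bioconductor
# biocLite() is the installer for older Bioconductor releases (such as the one matching R 3.5.1 above).
# On current releases it has been replaced by BiocManager::install("limma").
source("https://bioconductor.org/biocLite.R")
biocLite("limma")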
Reading in data
# Read the data into R
small_counts <- read.table("3219685/small_counts.txt", header = TRUE)
print(small_counts)
Sample_1 Sample_2 Sample_3 Sample_4
Xkr4 438 300 65 237
Sox17 106 182 82 105
Mrpl15 309 234 337 300
Lypla1 652 515 948 935
Tcea1 1604 1495 1721 1317
Rgs20 4 2 14 4
Atp6v1h 769 752 1062 987
Rb1cc1 1494 1412 1157 967
Pcmtd1 1344 1242 1374 1593
Rrs1 1691 1808 2127 1653
dim(small_counts)
- 10
- 4
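A minimal sketch of a few companion functions for inspecting a freshly read table. It assumes an inline copy of the first three rows of small_counts.txt (as printed above) so that it runs without the downloaded file; the name demo_counts is only illustrative.

```r
# Inline copy of the first three rows of small_counts.txt, so no file is needed.
txt <- "Sample_1 Sample_2 Sample_3 Sample_4
Xkr4 438 300 65 237
Sox17 106 182 82 105
Mrpl15 309 234 337 300"

# The header line has one field fewer than the data rows, so read.table()
# uses the first column (the gene symbols) as row names.
demo_counts <- read.table(text = txt, header = TRUE)

dim(demo_counts)       # number of rows, then number of columns
nrow(demo_counts)      # number of genes
ncol(demo_counts)      # number of samples
rownames(demo_counts)  # gene symbols
colnames(demo_counts)  # sample names
```

Checking the dimensions and names right after reading is a cheap way to catch a wrong header setting or separator.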
Working with data frames
Subsetting
$ notation with the column name
# Get the data for Sample_1
small_counts$Sample_1
- 438
- 106
- 309
- 652
- 1604
- 4
- 769
- 1494
- 1344
- 1691
[row, column] notation with numeric indices
small_counts[, 1]
- 438
- 106
- 309
- 652
- 1604
- 4
- 769
- 1494
- 1344
- 1691
[row, column] notation using the column name (in a vector)
small_counts[, c("Sample_1")]
- 438
- 106
- 309
- 652
- 1604
- 4
- 769
- 1494
- 1344
- 1691
small_counts[1:3, c("Sample_1", "Sample_3")]
(row name) | Sample_1 | Sample_3 |
---|---|---|
Xkr4 | 438 | 65 |
Sox17 | 106 | 82 |
Mrpl15 | 309 | 337 |
All samples except the first
small_counts[1:3, -1]
(row name) | Sample_2 | Sample_3 | Sample_4 |
---|---|---|---|
Xkr4 | 300 | 65 | 237 |
Sox17 | 182 | 82 | 105 |
Mrpl15 | 234 | 337 | 300 |
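One detail the examples above pass over: selecting a single column with [row, column] returns a plain vector, not a data frame. The sketch below, on a small stand-in data frame (demo_counts is a hypothetical name, values copied from the table above), shows how drop = FALSE keeps the data frame structure, and how rows can be selected by name.

```r
# Small stand-in for small_counts, built inline so the example is self-contained.
demo_counts <- data.frame(
  Sample_1 = c(438, 106, 309),
  Sample_2 = c(300, 182, 234),
  row.names = c("Xkr4", "Sox17", "Mrpl15")
)

# Selecting one column drops the result to a plain numeric vector.
one_col_vector <- demo_counts[, "Sample_1"]

# drop = FALSE keeps a one-column data frame, including the row names.
one_col_frame <- demo_counts[, "Sample_1", drop = FALSE]

# Rows can be selected by row name as well as by position.
by_name <- demo_counts["Sox17", ]

is.data.frame(one_col_vector)
is.data.frame(one_col_frame)
```

Keeping the data frame is useful when later code relies on the gene names stored as row names.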
Vectorized operations
small_counts$Sample_1 * 2
- 876
- 212
- 618
- 1304
- 3208
- 8
- 1538
- 2988
- 2688
- 3382
log(small_counts[1:3,])
(row name) | Sample_1 | Sample_2 | Sample_3 | Sample_4 |
---|---|---|---|---|
Xkr4 | 6.082219 | 5.703782 | 4.174387 | 5.468060 |
Sox17 | 4.663439 | 5.204007 | 4.406719 | 4.653960 |
Mrpl15 | 5.733341 | 5.455321 | 5.820083 | 5.703782 |
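Count data often contains zeros, and log(0) is -Inf. A common workaround, sketched here on made-up counts, is to add a pseudocount before taking the logarithm. The choice of 1 as the pseudocount and of base 2 are assumptions for illustration, not something the data set prescribes.

```r
# Made-up counts; the zero is the case of interest.
counts <- c(0, 1, 3, 7)

# Taking the log directly turns the zero into -Inf.
raw_log <- log2(counts)

# Adding a pseudocount of 1 keeps every value finite.
log2_counts <- log2(counts + 1)

raw_log
log2_counts
```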
Compute the total counts for each sample
sum(small_counts$Sample_1)
8411
sum(small_counts$Sample_2)
7942
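With many samples, summing one column at a time becomes tedious, so use the apply function. It is more concise than an explicit loop, although not necessarily faster.
Note that MARGIN = 1 applies the function over rows, while MARGIN = 2 applies it over columns.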
sample_sums = apply(small_counts, MARGIN = 2, sum)
print(sample_sums)
Sample_1 Sample_2 Sample_3 Sample_4
8411 7942 8887 8098
The argument name MARGIN can be omitted when the value is passed by position
sample_sums = apply(small_counts, 2, sum)
print(sample_sums)
Sample_1 Sample_2 Sample_3 Sample_4
8411 7942 8887 8098
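For sums specifically, base R also provides colSums() and rowSums(), which give the same result as apply() with sum. A small sketch on a stand-in data frame (demo_counts is a hypothetical name, values copied from the first three genes and samples above):

```r
# Stand-in for small_counts: three genes, three samples.
demo_counts <- data.frame(
  Sample_1 = c(438, 106, 309),
  Sample_2 = c(300, 182, 234),
  Sample_3 = c(65, 82, 337),
  row.names = c("Xkr4", "Sox17", "Mrpl15")
)

# MARGIN = 2: one total per column (per sample).
col_totals_apply <- apply(demo_counts, 2, sum)
col_totals_fast  <- colSums(demo_counts)

# MARGIN = 1: one total per row (per gene).
gene_totals     <- apply(demo_counts, 1, sum)
row_totals_fast <- rowSums(demo_counts)

all.equal(col_totals_apply, col_totals_fast)
all.equal(gene_totals, row_totals_fast)
```

apply() stays the more general tool, since any function (median, max, a custom one) can be passed in place of sum.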
Data types
5 main types: doubles, integers, complex, logical and character.
typeof(3.14)
‘double’
typeof(1L)
‘integer’
typeof(1+1i)
‘complex’
typeof(TRUE)
‘logical’
typeof('banana')
‘character’
ResultsTable_small <- read.table("3219685/ResultsTable_small.txt", header=TRUE)
head(ResultsTable_small)
ENTREZID | SYMBOL | logFC | AveExpr | t | P.Value | adj.P.Val |
---|---|---|---|---|---|---|
24117 | Wif1 | 1.819943 | 2.975545 | 20.10780 | 1.063770e-10 | 1.01624e-06 |
381290 | Atp2b4 | -2.143885 | 3.944066 | -19.07495 | 1.982934e-10 | 1.01624e-06 |
78896 | 1500015O10Rik | 2.807548 | 3.036519 | 18.54773 | 2.758828e-10 | 1.01624e-06 |
226101 | Myof | -2.329744 | 6.223525 | -18.26861 | 3.297667e-10 | 1.01624e-06 |
16012 | Igfbp6 | -2.896115 | 1.978449 | -18.21525 | 3.413066e-10 | 1.01624e-06 |
231830 | Micall2 | 2.253400 | 4.760597 | 18.02627 | 3.858161e-10 | 1.01624e-06 |
str
Inspect the structure of ResultsTable_small.
str(ResultsTable_small)
'data.frame': 40 obs. of 7 variables:
$ ENTREZID : int 24117 381290 78896 226101 16012 231830 16669 55987 231991 14620 ...
$ SYMBOL : Factor w/ 40 levels "1500015O10Rik",..: 40 3 1 26 20 23 21 8 9 16 ...
$ logFC : num 1.82 -2.14 2.81 -2.33 -2.9 ...
$ AveExpr : num 2.98 3.94 3.04 6.22 1.98 ...
$ t : num 20.1 -19.1 18.5 -18.3 -18.2 ...
$ P.Value : num 1.06e-10 1.98e-10 2.76e-10 3.30e-10 3.41e-10 ...
$ adj.P.Val: num 1.02e-06 1.02e-06 1.02e-06 1.02e-06 1.02e-06 ...
What happens when a single vector contains several data types?
my_vector = c(1, "hello", TRUE)
print(my_vector)
[1] "1" "hello" "TRUE"
typeof(my_vector)
‘character’
R automatically coerces all elements to a single type, following this order of precedence:
logical -> integer -> double -> complex -> character.
my_vector = c(1,TRUE, TRUE)
print(my_vector)
typeof(my_vector)
[1] 1 1 1
‘double’
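The coercion order can be checked step by step, and the same conversions can be requested explicitly with the as.*() functions. This is a minimal sketch; the variable names are illustrative only.

```r
# Implicit coercion: each vector takes the "highest" type it contains.
mixed_lgl_int <- c(TRUE, 2L)    # logical + integer   -> integer
mixed_int_dbl <- c(1L, 2.5)     # integer + double    -> double
mixed_dbl_chr <- c(1.5, "a")    # double  + character -> character

typeof(mixed_lgl_int)
typeof(mixed_int_dbl)
typeof(mixed_dbl_chr)

# Explicit coercion with as.*() functions.
parsed_ok <- as.integer("42")

# A string that is not a number becomes NA (R also emits a warning,
# silenced here to keep the output clean).
parsed_bad <- suppressWarnings(as.numeric("hello"))

# Logical values count as 1/0 in arithmetic, so sum() counts the TRUEs.
n_true <- sum(c(TRUE, FALSE, TRUE))
```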
Factors
str(ResultsTable_small)
'data.frame': 40 obs. of 7 variables:
$ ENTREZID : int 24117 381290 78896 226101 16012 231830 16669 55987 231991 14620 ...
$ SYMBOL : Factor w/ 40 levels "1500015O10Rik",..: 40 3 1 26 20 23 21 8 9 16 ...
$ logFC : num 1.82 -2.14 2.81 -2.33 -2.9 ...
$ AveExpr : num 2.98 3.94 3.04 6.22 1.98 ...
$ t : num 20.1 -19.1 18.5 -18.3 -18.2 ...
$ P.Value : num 1.06e-10 1.98e-10 2.76e-10 3.30e-10 3.41e-10 ...
$ adj.P.Val: num 1.02e-06 1.02e-06 1.02e-06 1.02e-06 1.02e-06 ...
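Factors look like character data but carry categorical information: internally they are stored as a vector of integers, each one an index into the set of labels (levels).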
str(ResultsTable_small$SYMBOL)
Factor w/ 40 levels "1500015O10Rik",..: 40 3 1 26 20 23 21 8 9 16 ...
typeof(ResultsTable_small$SYMBOL)
‘integer’
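The integer codes and the labels behind a factor can be inspected separately. The sketch below uses a short made-up factor (a few gene symbols from the table, one repeated on purpose), and also shows why numbers stored as a factor should go through as.character() before being converted back.

```r
# A short factor with a repeated value.
symbols <- factor(c("Wif1", "Atp2b4", "Wif1", "Myof"))

levels(symbols)        # the distinct labels, sorted
as.integer(symbols)    # the underlying codes: positions within levels()
as.character(symbols)  # back to the original strings
table(symbols)         # counts per level

# Pitfall: converting a factor of numbers directly yields the codes,
# not the numbers. Convert to character first.
number_factor <- factor(c("10", "5", "10"))
recovered <- as.numeric(as.character(number_factor))
```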
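If you do not want factors, read all strings as character data instead (stringsAsFactors = FALSE has been the default since R 4.0.0)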
ResultsTable_small <- read.table("3219685/ResultsTable_small.txt", stringsAsFactors = FALSE, header=TRUE)
str(ResultsTable_small)
'data.frame': 40 obs. of 7 variables:
$ ENTREZID : int 24117 381290 78896 226101 16012 231830 16669 55987 231991 14620 ...
$ SYMBOL : chr "Wif1" "Atp2b4" "1500015O10Rik" "Myof" ...
$ logFC : num 1.82 -2.14 2.81 -2.33 -2.9 ...
$ AveExpr : num 2.98 3.94 3.04 6.22 1.98 ...
$ t : num 20.1 -19.1 18.5 -18.3 -18.2 ...
$ P.Value : num 1.06e-10 1.98e-10 2.76e-10 3.30e-10 3.41e-10 ...
$ adj.P.Val: num 1.02e-06 1.02e-06 1.02e-06 1.02e-06 1.02e-06 ...
Sorting
sort(x) sorts the vector x and returns the sorted vector
Sort a vector in ascending order
sort(ResultsTable_small$logFC)
- -6.07014263352471
- -5.82788863265927
- -5.14626842050727
- -3.31364787941005
- -3.21114827988465
- -2.89611515497497
- -2.65339801437433
- -2.59810458251622
- -2.59704434679791
- -2.5385964096814
- -2.32974392966638
- -2.31272074376764
- -2.17189594243266
- -2.14388533952125
- -2.07146867497747
- -2.01180757857908
- -1.7089733203604
- -1.56742438255758
- -1.52029112638995
- -1.51546863348474
- -1.33143737986022
- -1.258670931154
- -1.10915597439346
- 1.47467090878791
- 1.52240538027464
- 1.710379971561
- 1.7513404533859
- 1.78860123725529
- 1.81994310357102
- 1.88756079716885
- 1.97277123486981
- 2.18037012538574
- 2.25339982481145
- 2.27887939443659
- 2.34291447312525
- 2.76674499153781
- 2.80754753168061
- 2.83562374639041
- 3.60009376671151
- 3.73893325921556
Sort a vector in descending order
sort(ResultsTable_small$logFC, decreasing = TRUE)
- 3.73893325921556
- 3.60009376671151
- 2.83562374639041
- 2.80754753168061
- 2.76674499153781
- 2.34291447312525
- 2.27887939443659
- 2.25339982481145
- 2.18037012538574
- 1.97277123486981
- 1.88756079716885
- 1.81994310357102
- 1.78860123725529
- 1.7513404533859
- 1.710379971561
- 1.52240538027464
- 1.47467090878791
- -1.10915597439346
- -1.258670931154
- -1.33143737986022
- -1.51546863348474
- -1.52029112638995
- -1.56742438255758
- -1.7089733203604
- -2.01180757857908
- -2.07146867497747
- -2.14388533952125
- -2.17189594243266
- -2.31272074376764
- -2.32974392966638
- -2.5385964096814
- -2.59704434679791
- -2.59810458251622
- -2.65339801437433
- -2.89611515497497
- -3.21114827988465
- -3.31364787941005
- -5.14626842050727
- -5.82788863265927
- -6.07014263352471
It also works on character vectors
sort(ResultsTable_small$SYMBOL)
- '1500015O10Rik'
- 'Ak1'
- 'Atp2b4'
- 'Bhlhe41'
- 'Ccdc129'
- 'Ccdc153'
- 'Chil1'
- 'Cpxm2'
- 'Creb5'
- 'Csf1'
- 'Csn1s2b'
- 'Cyp2s1'
- 'Ddit4'
- 'Fam102b'
- 'Fam110a'
- 'Gjb3'
- 'Gpsm2'
- 'Hmcn1'
- 'Hs6st2'
- 'Igfbp6'
- 'Krt19'
- 'Lif'
- 'Micall2'
- 'Mrgprf'
- 'Mtmr11'
- 'Myof'
- 'Naaa'
- 'Nfatc2'
- 'Nr1d1'
- 'Pdzd3'
- 'Ppp2r3a'
- 'Serpinf1'
- 'Skil'
- 'Slit3'
- 'Smad7'
- 'Sox4'
- 'Tnni2'
- 'Tppp3'
- 'Trp53inp1'
- 'Wif1'
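Missing values deserve a note here: by default sort() silently removes NA, so the result can be shorter than the input. A minimal sketch with made-up values (the NA is a hypothetical missing fold change):

```r
# Made-up values with one missing entry.
fold_changes <- c(1.82, NA, -2.14, 3.74)

# Default: NA is dropped from the result.
sorted_default <- sort(fold_changes)

# na.last = TRUE keeps NA and places it at the end.
sorted_keep_na <- sort(fold_changes, na.last = TRUE)

# order() keeps every position and puts the NA position last by default.
positions <- order(fold_changes)

length(sorted_default)
length(sorted_keep_na)
positions
```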
Sorting a data frame
order() returns, for each rank, the position in the original vector of the element that holds that rank
order(ResultsTable_small$logFC)
- 19
- 18
- 11
- 23
- 14
- 5
- 37
- 9
- 31
- 30
- 4
- 7
- 32
- 2
- 39
- 36
- 17
- 27
- 33
- 8
- 21
- 28
- 38
- 24
- 40
- 12
- 35
- 34
- 1
- 16
- 20
- 13
- 6
- 29
- 26
- 15
- 3
- 25
- 10
- 22
ResultsTable_small$logFC[order(ResultsTable_small$logFC)]
- -6.07014263352471
- -5.82788863265927
- -5.14626842050727
- -3.31364787941005
- -3.21114827988465
- -2.89611515497497
- -2.65339801437433
- -2.59810458251622
- -2.59704434679791
- -2.5385964096814
- -2.32974392966638
- -2.31272074376764
- -2.17189594243266
- -2.14388533952125
- -2.07146867497747
- -2.01180757857908
- -1.7089733203604
- -1.56742438255758
- -1.52029112638995
- -1.51546863348474
- -1.33143737986022
- -1.258670931154
- -1.10915597439346
- 1.47467090878791
- 1.52240538027464
- 1.710379971561
- 1.7513404533859
- 1.78860123725529
- 1.81994310357102
- 1.88756079716885
- 1.97277123486981
- 2.18037012538574
- 2.25339982481145
- 2.27887939443659
- 2.34291447312525
- 2.76674499153781
- 2.80754753168061
- 2.83562374639041
- 3.60009376671151
- 3.73893325921556
ResultsTable_small[order(ResultsTable_small$logFC), ]
(row name) | ENTREZID | SYMBOL | logFC | AveExpr | t | P.Value | adj.P.Val |
---|---|---|---|---|---|---|---|
19 | 12992 | Csn1s2b | -6.070143 | 3.56295004 | -14.16565 | 6.377604e-09 | 5.131276e-06 |
18 | 21953 | Tnni2 | -5.827889 | 0.30207159 | -14.40327 | 5.265278e-09 | 4.622914e-06 |
11 | 211577 | Mrgprf | -5.146268 | -0.93683349 | -16.36573 | 1.196263e-09 | 1.718703e-06 |
23 | 170761 | Pdzd3 | -3.313648 | -0.06019306 | -13.62372 | 9.982985e-09 | 6.580512e-06 |
14 | 270150 | Ccdc153 | -3.211148 | -1.34083882 | -15.50126 | 2.249931e-09 | 2.539851e-06 |
5 | 16012 | Igfbp6 | -2.896115 | 1.97844876 | -18.21525 | 3.413066e-10 | 1.016240e-06 |
37 | 67971 | Tppp3 | -2.653398 | 4.90816305 | -12.22845 | 3.416616e-08 | 1.419445e-05 |
9 | 231991 | Creb5 | -2.598105 | 4.27592952 | -16.53634 | 1.059885e-09 | 1.718703e-06 |
31 | 232016 | Ccdc129 | -2.597044 | 5.00471484 | -13.02266 | 1.672195e-08 | 8.524957e-06 |
30 | 67111 | Naaa | -2.538596 | 3.29074575 | -13.04083 | 1.645823e-08 | 8.524957e-06 |
4 | 226101 | Myof | -2.329744 | 6.22352456 | -18.26861 | 3.297667e-10 | 1.016240e-06 |
7 | 16669 | Krt19 | -2.312721 | 8.74189184 | -17.07937 | 7.264548e-10 | 1.640127e-06 |
32 | 76123 | Gpsm2 | -2.171896 | 4.99093472 | -12.76344 | 2.102397e-08 | 1.015751e-05 |
2 | 381290 | Atp2b4 | -2.143885 | 3.94406593 | -19.07495 | 1.982934e-10 | 1.016240e-06 |
39 | 74134 | Cyp2s1 | -2.071469 | 1.40704575 | -12.20154 | 3.502805e-08 | 1.419445e-05 |
36 | 18019 | Nfatc2 | -2.011808 | 5.79499693 | -12.27561 | 3.271067e-08 | 1.419445e-05 |
17 | 194126 | Mtmr11 | -1.708973 | 2.50804119 | -14.48746 | 4.922928e-09 | 4.576586e-06 |
27 | 545370 | Hmcn1 | -1.567424 | 3.10302591 | -13.19053 | 1.444832e-08 | 8.306595e-06 |
33 | 329739 | Fam102b | -1.520291 | 4.18813047 | -12.75357 | 2.120968e-08 | 1.015751e-05 |
8 | 55987 | Cpxm2 | -1.515469 | 2.83451194 | -16.64333 | 9.829870e-10 | 1.718703e-06 |
21 | 20564 | Slit3 | -1.331437 | 3.44179493 | -13.88522 | 8.026279e-09 | 6.040348e-06 |
28 | 60599 | Trp53inp1 | -1.258671 | 6.11839605 | -13.16241 | 1.480464e-08 | 8.306595e-06 |
38 | 235542 | Ppp2r3a | -1.109156 | 6.50105941 | -12.22041 | 3.442139e-08 | 1.419445e-05 |
24 | 73847 | Fam110a | 1.474671 | 6.84086068 | 13.62251 | 9.993185e-09 | 6.580512e-06 |
40 | 20677 | Sox4 | 1.522405 | 7.46932835 | 12.13548 | 3.724389e-08 | 1.462109e-05 |
12 | 20317 | Serpinf1 | 1.710380 | 3.38883490 | 15.77280 | 1.838727e-09 | 2.356351e-06 |
35 | 50786 | Hs6st2 | 1.751340 | 0.53953600 | 12.43097 | 2.836919e-08 | 1.280991e-05 |
34 | 79362 | Bhlhe41 | 1.788601 | 6.18368494 | 12.70504 | 2.214908e-08 | 1.029541e-05 |
1 | 24117 | Wif1 | 1.819943 | 2.97554452 | 20.10780 | 1.063770e-10 | 1.016240e-06 |
16 | 20482 | Skil | 1.887561 | 8.49892507 | 14.65488 | 4.311334e-09 | 4.258521e-06 |
20 | 17131 | Smad7 | 1.972771 | 6.71751902 | 14.14348 | 6.493642e-09 | 5.131276e-06 |
13 | 74747 | Ddit4 | 2.180370 | 6.86479110 | 15.70145 | 1.938279e-09 | 2.356351e-06 |
6 | 231830 | Micall2 | 2.253400 | 4.76059697 | 18.02627 | 3.858161e-10 | 1.016240e-06 |
29 | 217166 | Nr1d1 | 2.278879 | 6.26087761 | 13.12885 | 1.524242e-08 | 8.306595e-06 |
26 | 12654 | Chil1 | 2.342914 | 5.57645724 | 13.21976 | 1.408760e-08 | 8.306595e-06 |
15 | 11636 | Ak1 | 2.766745 | 4.30347462 | 15.27694 | 2.664640e-09 | 2.807465e-06 |
3 | 78896 | 1500015O10Rik | 2.807548 | 3.03651950 | 18.54773 | 2.758828e-10 | 1.016240e-06 |
25 | 12977 | Csf1 | 2.835624 | 7.47759094 | 13.41902 | 1.187300e-08 | 7.505634e-06 |
10 | 14620 | Gjb3 | 3.600094 | 3.52528051 | 16.46627 | 1.113755e-09 | 1.718703e-06 |
22 | 16878 | Lif | 3.738933 | 6.68203417 | 13.73344 | 9.105708e-09 | 6.541210e-06 |
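The same pattern extends to other orderings. This sketch sorts a small stand-in table (demo_results is a hypothetical name; the five rows are copied from the results above) from largest to smallest logFC, and then by absolute logFC so that the strongest changes in either direction come first.

```r
# Five rows copied from ResultsTable_small, reduced to two columns.
demo_results <- data.frame(
  SYMBOL = c("Wif1", "Atp2b4", "Myof", "Lif", "Csn1s2b"),
  logFC  = c(1.819943, -2.143885, -2.329744, 3.738933, -6.070143),
  stringsAsFactors = FALSE
)

# Largest logFC first.
by_logfc_desc <- demo_results[order(demo_results$logFC, decreasing = TRUE), ]

# Largest absolute change first, regardless of direction.
by_abs_logfc <- demo_results[order(-abs(demo_results$logFC)), ]

by_logfc_desc$SYMBOL
by_abs_logfc$SYMBOL
```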
Subsetting operations
Subsetting using logical statements
ResultsTable_small$logFC > 3
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- TRUE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- TRUE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
ResultsTable_small$logFC[ResultsTable_small$logFC > 3]
- 3.60009376671151
- 3.73893325921556
Applied to a data frame
ResultsTable_small[ResultsTable_small$logFC > 3, ]
(row name) | ENTREZID | SYMBOL | logFC | AveExpr | t | P.Value | adj.P.Val |
---|---|---|---|---|---|---|---|
10 | 14620 | Gjb3 | 3.600094 | 3.525281 | 16.46627 | 1.113755e-09 | 1.718703e-06 |
22 | 16878 | Lif | 3.738933 | 6.682034 | 13.73344 | 9.105708e-09 | 6.541210e-06 |
ResultsTable_small[ResultsTable_small$logFC > 3 | ResultsTable_small$logFC < -3, ]
(row name) | ENTREZID | SYMBOL | logFC | AveExpr | t | P.Value | adj.P.Val |
---|---|---|---|---|---|---|---|
10 | 14620 | Gjb3 | 3.600094 | 3.52528051 | 16.46627 | 1.113755e-09 | 1.718703e-06 |
11 | 211577 | Mrgprf | -5.146268 | -0.93683349 | -16.36573 | 1.196263e-09 | 1.718703e-06 |
14 | 270150 | Ccdc153 | -3.211148 | -1.34083882 | -15.50126 | 2.249931e-09 | 2.539851e-06 |
18 | 21953 | Tnni2 | -5.827889 | 0.30207159 | -14.40327 | 5.265278e-09 | 4.622914e-06 |
19 | 12992 | Csn1s2b | -6.070143 | 3.56295004 | -14.16565 | 6.377604e-09 | 5.131276e-06 |
22 | 16878 | Lif | 3.738933 | 6.68203417 | 13.73344 | 9.105708e-09 | 6.541210e-06 |
23 | 170761 | Pdzd3 | -3.313648 | -0.06019306 | -13.62372 | 9.982985e-09 | 6.580512e-06 |
ResultsTable_small[abs(ResultsTable_small$logFC) > 3, ]
(row name) | ENTREZID | SYMBOL | logFC | AveExpr | t | P.Value | adj.P.Val |
---|---|---|---|---|---|---|---|
10 | 14620 | Gjb3 | 3.600094 | 3.52528051 | 16.46627 | 1.113755e-09 | 1.718703e-06 |
11 | 211577 | Mrgprf | -5.146268 | -0.93683349 | -16.36573 | 1.196263e-09 | 1.718703e-06 |
14 | 270150 | Ccdc153 | -3.211148 | -1.34083882 | -15.50126 | 2.249931e-09 | 2.539851e-06 |
18 | 21953 | Tnni2 | -5.827889 | 0.30207159 | -14.40327 | 5.265278e-09 | 4.622914e-06 |
19 | 12992 | Csn1s2b | -6.070143 | 3.56295004 | -14.16565 | 6.377604e-09 | 5.131276e-06 |
22 | 16878 | Lif | 3.738933 | 6.68203417 | 13.73344 | 9.105708e-09 | 6.541210e-06 |
23 | 170761 | Pdzd3 | -3.313648 | -0.06019306 | -13.62372 | 9.982985e-09 | 6.580512e-06 |
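Logical subsetting behaves differently when the tested column contains NA: the comparison yields NA, and indexing with NA produces a row filled with NA. The sketch below uses a small stand-in table in which one logFC has been replaced by NA purely for illustration (that missing value is not in the real data), and shows which() as one way to drop such rows, plus & for combining conditions.

```r
# Stand-in table; the NA for Sox4 is a hypothetical missing value.
demo_results <- data.frame(
  SYMBOL = c("Wif1", "Lif", "Csn1s2b", "Sox4"),
  logFC  = c(1.819943, 3.738933, -6.070143, NA),
  stringsAsFactors = FALSE
)

keep <- demo_results$logFC > 3   # FALSE TRUE FALSE NA

# Indexing with the raw logical vector returns an extra all-NA row.
with_na_row <- demo_results[keep, ]

# which() returns only the positions that are TRUE, so NA is skipped.
clean <- demo_results[which(keep), ]

# Conditions combine with & (and) just as they do with | (or).
moderate_up <- demo_results[which(demo_results$logFC > 1 & demo_results$logFC < 4), ]
```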
The %in% operator
my_genes <- c("Smad7", "Wif1", "Fam102b", "Tppp3")
ResultsTable_small$SYMBOL %in% my_genes
- TRUE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- TRUE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- FALSE
- TRUE
- FALSE
- FALSE
- FALSE
- TRUE
- FALSE
- FALSE
- FALSE
ResultsTable_small[ResultsTable_small$SYMBOL %in% my_genes, ]
(row name) | ENTREZID | SYMBOL | logFC | AveExpr | t | P.Value | adj.P.Val |
---|---|---|---|---|---|---|---|
1 | 24117 | Wif1 | 1.819943 | 2.975545 | 20.10780 | 1.063770e-10 | 1.016240e-06 |
20 | 17131 | Smad7 | 1.972771 | 6.717519 | 14.14348 | 6.493642e-09 | 5.131276e-06 |
33 | 329739 | Fam102b | -1.520291 | 4.188130 | -12.75357 | 2.120968e-08 | 1.015751e-05 |
37 | 67971 | Tppp3 | -2.653398 | 4.908163 | -12.22845 | 3.416616e-08 | 1.419445e-05 |
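The match function
The %in% operator returns only a logical vector of TRUE and FALSE values, and the result has the same length as the vector on its left-hand side. For C %in% B, it walks through each element of C, checks whether that element occurs in B, and returns TRUE or FALSE accordingly.
match(C, B) behaves differently. Its result also has the same length as its first argument, but instead of a logical vector it returns, for each element of C, the index of that element's first occurrence in B, or NA if the element does not occur in B.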
B<-seq(5,15,2)
C<-1:5
match(C,B)
C%in%B
- <NA>
- <NA>
- <NA>
- <NA>
- 1
- FALSE
- FALSE
- FALSE
- FALSE
- TRUE
match(my_genes, ResultsTable_small$SYMBOL)
- 20
- 1
- 33
- 37
The rows come back in the same order as my_genes
ResultsTable_small[match(my_genes, ResultsTable_small$SYMBOL), ]
(row name) | ENTREZID | SYMBOL | logFC | AveExpr | t | P.Value | adj.P.Val |
---|---|---|---|---|---|---|---|
20 | 17131 | Smad7 | 1.972771 | 6.717519 | 14.14348 | 6.493642e-09 | 5.131276e-06 |
1 | 24117 | Wif1 | 1.819943 | 2.975545 | 20.10780 | 1.063770e-10 | 1.016240e-06 |
33 | 329739 | Fam102b | -1.520291 | 4.188130 | -12.75357 | 2.120968e-08 | 1.015751e-05 |
37 | 67971 | Tppp3 | -2.653398 | 4.908163 | -12.22845 | 3.416616e-08 | 1.419445e-05 |
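When a requested gene is absent from the table, match() returns NA for it, and indexing a data frame with NA gives a row of NA. A minimal sketch of filtering those out first, using a short vector of symbols and one deliberately made-up name ("NotAGene"):

```r
# A few symbols standing in for ResultsTable_small$SYMBOL.
demo_symbols <- c("Wif1", "Atp2b4", "Myof", "Smad7")

# "NotAGene" is a made-up symbol that is not in the table.
wanted <- c("Smad7", "NotAGene", "Wif1")

idx <- match(wanted, demo_symbols)   # 4 NA 1

# Keep only the positions that were actually found.
idx_clean <- idx[!is.na(idx)]
found <- demo_symbols[idx_clean]     # same order as in 'wanted'

# %in% is defined in terms of match(): TRUE wherever a match exists.
present <- wanted %in% demo_symbols
same_as_match <- identical(present, match(wanted, demo_symbols, nomatch = 0) > 0)
```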
The subset function
a = subset(ResultsTable_small, abs(logFC) > 3)
a
(row name) | ENTREZID | SYMBOL | logFC | AveExpr | t | P.Value | adj.P.Val |
---|---|---|---|---|---|---|---|
10 | 14620 | Gjb3 | 3.600094 | 3.52528051 | 16.46627 | 1.113755e-09 | 1.718703e-06 |
11 | 211577 | Mrgprf | -5.146268 | -0.93683349 | -16.36573 | 1.196263e-09 | 1.718703e-06 |
14 | 270150 | Ccdc153 | -3.211148 | -1.34083882 | -15.50126 | 2.249931e-09 | 2.539851e-06 |
18 | 21953 | Tnni2 | -5.827889 | 0.30207159 | -14.40327 | 5.265278e-09 | 4.622914e-06 |
19 | 12992 | Csn1s2b | -6.070143 | 3.56295004 | -14.16565 | 6.377604e-09 | 5.131276e-06 |
22 | 16878 | Lif | 3.738933 | 6.68203417 | 13.73344 | 9.105708e-09 | 6.541210e-06 |
23 | 170761 | Pdzd3 | -3.313648 | -0.06019306 | -13.62372 | 9.982985e-09 | 6.580512e-06 |
Selecting specific columns
a = subset(ResultsTable_small, abs(logFC) > 3,select = c(SYMBOL, logFC))
a
(row name) | SYMBOL | logFC |
---|---|---|
10 | Gjb3 | 3.600094 |
11 | Mrgprf | -5.146268 |
14 | Ccdc153 | -3.211148 |
18 | Tnni2 | -5.827889 |
19 | Csn1s2b | -6.070143 |
22 | Lif | 3.738933 |
23 | Pdzd3 | -3.313648 |
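subset() is a convenience wrapper around bracket indexing. The sketch below, on a small stand-in table (demo_results is a hypothetical name, rows copied from the results above), shows the bracket form that gives the same result. One difference worth knowing is that subset() treats NA in the condition as FALSE, whereas plain logical indexing does not.

```r
# Stand-in table with four rows from ResultsTable_small.
demo_results <- data.frame(
  SYMBOL  = c("Wif1", "Gjb3", "Mrgprf", "Myof"),
  logFC   = c(1.819943, 3.600094, -5.146268, -2.329744),
  AveExpr = c(2.975545, 3.525281, -0.936833, 6.223525),
  stringsAsFactors = FALSE
)

# subset(): column names can be used unquoted inside the call.
via_subset <- subset(demo_results, abs(logFC) > 3, select = c(SYMBOL, logFC))

# Equivalent bracket form: rows by logical condition, columns by name.
via_bracket <- demo_results[abs(demo_results$logFC) > 3, c("SYMBOL", "logFC")]

all.equal(via_subset, via_bracket)
```

The bracket form is more verbose but is usually the safer choice inside functions, because subset() evaluates its arguments in a non-standard way.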