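Tips (Part 6)
20. Cell Cycle, Part II
This continues the earlier discussion of drawing the cell cycle. In the Part I example, each phase of the cycle was drawn as a separate axis. Here, we draw the entire cycle on a single axis.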
# cycle.txt
chr - cycle cycle 0 100 greys-6-seq-5
20.1 Using crops
We define each phase as a cropped region of the cycle axis.
karyotype = cycle.txt
chromosomes = cycle[g1]:0-45;cycle[s]:45-80;cycle[g2]:80-95;cycle[m]:95-100
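To check how the axis is divided, the crop specification can be parsed outside Circos. The following is a minimal Python sketch, not part of Circos; the function name parse_crops is hypothetical, and it assumes every entry has the form axis[tag]:start-end as in the line above.

```python
import re

def parse_crops(spec):
    """Return {tag: (start, end)} from a 'chromosomes' crop specification."""
    crops = {}
    for part in spec.split(";"):
        # Each entry is expected to look like cycle[g1]:0-45
        match = re.fullmatch(r"(\w+)\[(\w+)\]:(\d+)-(\d+)", part.strip())
        if match is None:
            raise ValueError(f"unrecognized crop: {part!r}")
        _axis, tag, start, end = match.groups()
        crops[tag] = (int(start), int(end))
    return crops

spec = "cycle[g1]:0-45;cycle[s]:45-80;cycle[g2]:80-95;cycle[m]:95-100"
crops = parse_crops(spec)
for tag, (start, end) in crops.items():
    print(f"{tag}: {end - start} of 100 axis units")
```

The four crops together cover the whole 0-100 axis defined in cycle.txt.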
Then set the break parameter in the <spacing> block to control the spacing between the cropped regions.
<ideogram>
<spacing>
default = 0.005r
break = 1r
</spacing>
</ideogram>
20.2 Coloring the phases
The colors are defined in the same way as in Part I.
palette = greys-6-seq
<phases>
g1 = 3
s = 4
g2 = 5
m = 6
</phases>
# g1, s, g2, m are tags defined in 'chromosomes' above
chromosomes_color = g1=conf(palette)-conf(phases,g1),
s=conf(palette)-conf(phases,s),
g2=conf(palette)-conf(phases,g2),
m=conf(palette)-conf(phases,m)
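To see which color names the expression above resolves to, the substitution can be reproduced in a few lines of Python. This is only an illustrative sketch: it assumes that conf(palette) and conf(phases,TAG) are substituted as plain text, and the helper phase_color is hypothetical.

```python
# Values copied from the configuration above
palette = "greys-6-seq"
phases = {"g1": 3, "s": 4, "g2": 5, "m": 6}

def phase_color(tag):
    """Build the color name for a phase tag, e.g. greys-6-seq-3 for g1."""
    return f"{palette}-{phases[tag]}"

chromosomes_color = ",".join(f"{tag}={phase_color(tag)}" for tag in phases)
print(chromosomes_color)
```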
The remaining tick parameters are also carried over from Part I.
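21. Nature cover image
Next, we show how to generate the cover image automatically.
21.1 Image elements
The image contains 23 segments, representing human chromosomes 1-22 and X.
The chromosome lengths shown in the image do not exactly match those of the hg19 build; here we use the assembly lengths.
The image uses a soft color scheme that cycles through orange, green, blue and purple. We use this scheme to redefine the default chromosome colors.
The data are shown as 6 concentric tracks, with the spacing between them decreasing slightly toward the center. Each track consists of fixed-size highlight bins, colored to match their chromosome.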
21.2 Colors
We sampled the following colors from the cover image. The * suffix overrides the existing definition of each color.
# circos.conf
<<include etc/colors_fonts_patterns.conf>>
<colors>
chr1* = 163,132,130
chr2* = 188,162,118
chr3* = 216,196,96
chr4* = 233,212,56
chr5* = 229,229,50
chr6* = 212,222,56
chr7* = 195,215,57
chr8* = 177,209,58
chr9* = 160,204,61
chr10* = 139,198,61
chr11* = 128,193,95
chr12* = 115,186,126
chr13* = 102,183,152
chr14* = 91,178,176
chr15* = 61,174,199
chr16* = 36,170,224
chr17* = 75,129,194
chr18* = 85,111,180
chr19* = 92,92,168
chr20* = 98,70,156
chr21* = 101,45,145
chr22* = 121,74,141
chrx* = 140,104,137
</colors>
Then set the image background in the <image> block.
<image>
<<include etc/image.conf>>
background* = black
</image>
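21.3 Track positions
Every track uses the same data file. The tracks look different because dynamic rules randomly change how the data are colored.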
# variables used in each plot.conf block
plot_width = 80
plot_padding = 25
num_plots = 6
<plots>
type = highlight
file = bins.txt
stroke_thickness = 0
<<include plot.conf>>
<<include plot.conf>>
<<include plot.conf>>
<<include plot.conf>>
<<include plot.conf>>
<<include plot.conf>>
</plots>
The plot.conf file is defined as follows:
<plot>
r1 = dims(ideogram,radius_inner)
- conf(plot_padding)*eval(remap(counter(plot),0,conf(num_plots),1,0.9))
- eval((conf(plot_width)+conf(plot_padding))*counter(plot)*eval(remap(counter(plot),0,conf(num_plots),1,0.9)))
r0 = conf(.,r1)
- conf(plot_width)*eval(remap(counter(plot),0,conf(num_plots),1,0.9))
post_increment_counter = plot:1
<<include rules.conf>>
</plot>
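The inner and outer radii of each track (r0, r1) are set through the plot_padding and plot_width parameters.
The value of counter(plot) is automatically incremented by 1 each time a track is drawn.
dims(ideogram,radius_inner) returns the inner radius of the ideogram.
remap(VAR,MIN,MAX,TARGETMIN,TARGETMAX) remaps the value of VAR from [MIN,MAX] to [TARGETMIN,TARGETMAX].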
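The radius arithmetic is easier to follow with concrete numbers. The sketch below reproduces the r1 and r0 expressions from plot.conf in Python. It rests on several assumptions: remap is treated as a plain linear mapping without clamping, counter(plot) is taken to run from 0 to 5 for the six tracks, and the ideogram inner radius of 1000 pixels is a made-up value chosen only for illustration.

```python
def remap(value, lo, hi, target_lo, target_hi):
    """Linearly map value from [lo, hi] to [target_lo, target_hi] (no clamping)."""
    return target_lo + (value - lo) * (target_hi - target_lo) / (hi - lo)

def track_radii(counter, inner_radius, plot_width=80, plot_padding=25, num_plots=6):
    """Return (r0, r1) for the track with the given counter value."""
    scale = remap(counter, 0, num_plots, 1, 0.9)
    r1 = (inner_radius
          - plot_padding * scale
          - (plot_width + plot_padding) * counter * scale)
    r0 = r1 - plot_width * scale
    return r0, r1

INNER_RADIUS = 1000  # hypothetical value for dims(ideogram,radius_inner)
radii = [track_radii(k, INNER_RADIUS) for k in range(6)]
for k, (r0, r1) in enumerate(radii):
    print(f"track {k}: r0={r0:.1f} r1={r1:.1f}")
```

Because the scale factor falls from 1 toward 0.9, both the track width and the gap between neighboring tracks shrink slightly toward the center.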
21.4 Track data
Every track uses the same data, which divides the genome into 7.5 Mb regions (bins).
hs1 0 7499999
hs1 7500000 14999999
hs1 15000000 22499999
hs1 22500000 29999999
...
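A file like bins.txt can be produced with a short script. The following Python sketch is one possible way to do it, not the method used for the original figure; the function name make_bins is hypothetical, the 30 Mb length is a placeholder rather than a real chromosome length, and truncating the last bin at the chromosome end is an assumption.

```python
BIN_SIZE = 7_500_000  # 7.5 Mb, as in the data above

def make_bins(chrom, length, bin_size=BIN_SIZE):
    """Split [0, length) into consecutive bins with inclusive end coordinates."""
    bins = []
    for start in range(0, length, bin_size):
        # The last bin is cut off at the end of the chromosome
        end = min(start + bin_size, length) - 1
        bins.append((chrom, start, end))
    return bins

# Placeholder length, used only to reproduce the first four lines shown above
for chrom, start, end in make_bins("hs1", 30_000_000):
    print(chrom, start, end)
```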
Rules in the plot block then change the colors dynamically.
# rules.conf
<rules>
<rule>
# The first condition tests that bins are further than 5 Mb from the
# start and end of each ideogram. This ensures that the color
# for the first/last bin will be the same as the ideogram.
condition = var(start) >= 5e6 && var(end) < chrlen(var(chr))-5e6
# The probability that the second condition is true is proportional to
# the track counter. Bins in inner tracks are more likely to trigger
# this rule. Here, rand() is a uniformly distributed random number in
# the range [0,1).
condition = rand() < remap(counter(plot),0,conf(num_plots)-1,1/conf(num_plots),1)
# If this rule is true, the color of the bin is changed to that of a
# random ideogram.
fill_color = eval("chr" . (sort {rand() <=> rand()} (1..22,"x"))[0])
</rule>
<rule>
condition = 1
fill_color = eval("chr" . lc substr(var(chr),2))
</rule>
</rules>
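The two conditions of the first rule can be checked numerically. The Python sketch below mirrors them under the assumptions that remap is a plain linear mapping and that counter(plot) runs from 0 (outermost track) to 5 (innermost track); the function names and the 100 Mb chromosome length in the example are hypothetical.

```python
def remap(value, lo, hi, target_lo, target_hi):
    """Linearly map value from [lo, hi] to [target_lo, target_hi] (no clamping)."""
    return target_lo + (value - lo) * (target_hi - target_lo) / (hi - lo)

def recolor_probability(counter, num_plots=6):
    """Chance that an eligible bin in this track receives a random chromosome color."""
    return remap(counter, 0, num_plots - 1, 1 / num_plots, 1)

def is_interior(start, end, chr_length, margin=5_000_000):
    """First condition: the bin lies at least `margin` bases from both ends."""
    return start >= margin and end < chr_length - margin

for k in range(6):
    print(f"track {k}: p = {recolor_probability(k):.3f}")

# The first bin of a (hypothetical) 100 Mb chromosome keeps its own color
print(is_interior(0, 7_499_999, 100_000_000))
```

Under these assumptions the probability rises from 1/6 in the outermost track to 1 in the innermost one, so the inner tracks look increasingly scrambled.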
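22. Not just genomes
Circos is not limited to genomic regions; it can draw any kind of axis.
Here, the segments of the axis correspond to the total number of words spoken by US presidential candidates in debates.
We define these segments in the karyotype file. For example, suppose Obama spoke 2,000 words, Richardson spoke 1,000 words, and so on.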
# karyotype.txt
chr - obama obama 0 2000 dem
chr - richardson richardson 0 1000 dem
chr - clinton clinton 0 1500 dem
chr - mccain mccain 0 1000 rep
chr - romney romney 0 1750 rep
chr - huckabee huckabee 0 1250 rep
The last field assigns a color by party, Republican or Democrat, using the classic red/blue scheme.
<<include etc/colors_fonts_patterns.conf>>
# append to the colors block
<colors>
rep = 211,121,111
dem = 85,143,190
</colors>
22.1 Segment slices
Each segment is divided into slices, each representing the number of words spoken in a particular debate.
# slices.txt
obama 0 300 # Obama's 1st debate words
obama 301 750 # 2nd
obama 751 950 # 3rd
obama 951 1250 # 4th
obama 1251 1500 # 5th
obama 1501 2000 # 6th
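Slice coordinates like these can be derived from per-debate word counts. The Python sketch below is an illustration only; the helper make_slices is hypothetical, and the word counts are inferred from the slice boundaries above (300, 450, 200, 300, 250 and 500 words), following the convention that each slice starts one position after the previous one ends.

```python
def make_slices(name, word_counts):
    """Turn per-debate word counts into (name, start, end) slices."""
    slices = []
    total = 0
    for count in word_counts:
        # The first slice starts at 0; later ones start just past the previous end
        start = total + 1 if slices else 0
        total += count
        slices.append((name, start, total))
    return slices

# Word counts inferred from the slice boundaries in slices.txt
obama_slices = make_slices("obama", [300, 450, 200, 300, 250, 500])
for name, start, end in obama_slices:
    print(name, start, end)
```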
These slices are drawn on top of the ideogram as unfilled highlights with a thick white outline.
<plot>
file = slices.txt
type = highlight
r0 = dims(ideogram,radius_inner)
r1 = dims(ideogram,radius_outer)
fill_color = undef
stroke_color = white
stroke_thickness = 5
</plot>
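22.2 Naming names
When a candidate mentions another candidate by name during a debate, we draw a link.
The link starts at the debate slice in which the name was mentioned and ends at the center of the mentioned candidate's segment.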
# links.txt
# Obama mentions Clinton in his 1st debate
obama 150 150 clinton 750 750
# McCain mentions Clinton in his 3rd debate
mccain 875 875 clinton 750 750
# Huckabee mentions Clinton in his 2nd debate
huckabee 525 525 clinton 750 750
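A line of links.txt can be computed from a slice and the target segment's length. The sketch below is a hypothetical helper, not part of Circos. It assumes the link starts at the midpoint of the speaker's debate slice, which matches the Obama example (slice 0-300, start 150), and ends at the middle of the target segment (Clinton's segment is 1500 long, so 750).

```python
def link_line(speaker, debate_slice, target, target_length):
    """Format one links.txt line from a debate slice to the target's center."""
    start, end = debate_slice
    source_pos = (start + end) // 2   # assumed: midpoint of the debate slice
    target_pos = target_length // 2   # center of the mentioned candidate's segment
    return f"{speaker} {source_pos} {source_pos} {target} {target_pos} {target_pos}"

# Obama mentions Clinton in his 1st debate (slice 0-300; Clinton's segment is 0-1500)
print(link_line("obama", (0, 300), "clinton", 1500))
```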
By default, the link color is set to rep, the Republican red.
<link>
file = links.txt
radius = dims(ideogram,radius_inner)
bezier_radius = 0r
thickness = 5
color = rep
...
</link>
If the candidate making the mention is a Democrat, a rule changes the link color to dem.
<rules>
<rule>
# set dem color if start is on a democrat
condition = var(chr1) =~ /obama|richardson|clinton/
color = dem
</rule>
</rules>
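22.3 Focusing on a candidate
To show only the links from a given candidate, use the from() function, which tests the name of the segment at which the link starts.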
<rule>
# only links from obama are shown (all others are hidden by setting show=no)
# the condition test is equivalent to
# var(chr1) ne "obama"
condition = ! from(obama)
show = no
</rule>
Alternatively, to test the segment at which the link ends, use the to() function.
<rule>
# only links to mccain are shown (all others are hidden by setting show=no)
# the condition test is equivalent to
# var(chr2) ne "mccain"
condition = ! to(mccain)
show = no
</rule>
Or use fromto() to test both ends of the link.
<rule>
# only links from obama to mccain are shown (all others are hidden by setting show=no)
# the condition test is equivalent to
# var(chr1) ne "obama" || var(chr2) ne "mccain"
condition = ! fromto(obama,mccain)
show = no
</rule>
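The effect of the three rules can be previewed on a list of links before running Circos. The Python sketch below only emulates the filtering logic described in the rule comments; it does not call Circos, the function visible is hypothetical, and the obama-to-mccain link is an invented example added so that the last filter returns something.

```python
# (start segment, end segment) pairs; the last one is a made-up example
links = [
    ("obama", "clinton"),
    ("mccain", "clinton"),
    ("huckabee", "clinton"),
    ("obama", "mccain"),
]

def visible(links, source=None, target=None):
    """Keep links whose start and/or end segment match; others are hidden."""
    return [
        (start, end)
        for start, end in links
        if (source is None or start == source)
        and (target is None or end == target)
    ]

print(visible(links, source="obama"))                   # like ! from(obama) -> show = no
print(visible(links, target="mccain"))                  # like ! to(mccain) -> show = no
print(visible(links, source="obama", target="mccain"))  # like ! fromto(obama,mccain)
```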