1. File preparation and usage
Tau (tissue-specific gene expression) is a tissue-specificity score: it rates how specifically a gene is expressed across tissues on a scale from 0 to 1, and the closer the value is to 1, the more tissue-specific the gene.
The tau index indicates how specific or broadly expressed a gene or transcript is within the studied tissues. Genes with a tau index close to 1 are more specifically expressed in one tissue, while genes with a tau index closer to 0 are equally expressed across all tissues studied [1].
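The input file is formatted as follows:
The first column is the gene name, and the remaining columns are the expression matrix. Columns are separated by whitespace. Replicates of the same tissue must sit in adjacent columns, because the script averages every r consecutive columns into one value per tissue.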
The options are as follows:
Options:
-h, --help Display this help message
-i, --in <file> Specify an input file
-o, --out <file> Specify an output file
-r, --replicates <int> number of replicates
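-i is the name of the input file.
-o is the name of the output file; the default is "tau.txt".
-r is the number of biological replicates per tissue in the matrix; the default is 3. If the replicates in each group have already been averaged, use 1.
For example, if the input file is input.txt, the output file is output.txt, and there are 4 biological replicates, the command is: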
perl tau.pl -i input.txt -o output.txt -r 4
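To make the effect of -r concrete, here is a minimal standalone sketch of what happens to a single row. It is not part of tau.pl: the helper names replicate_means and tau_from_means are hypothetical, and the row for geneA (three tissues, two replicates each, as with -r 2) is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum max);

# Hypothetical helper: average every $r consecutive values,
# mirroring how -r groups adjacent columns into one tissue.
sub replicate_means {
    my ($r, @values) = @_;
    die "column count must be a multiple of r\n" if scalar(@values) % $r;
    my @means;
    while (my @group = splice(@values, 0, $r)) {
        push @means, sum(@group) / $r;
    }
    return @means;
}

# Hypothetical helper: tau from per-tissue means.
# Each mean is divided by the largest mean, then (1 - ratio) is summed
# and divided by (number of tissues - 1).
sub tau_from_means {
    my @means = @_;
    return undef if @means < 2;    # tau needs at least two tissues
    my $max = max(@means);
    return undef if $max <= 0;     # gene not expressed anywhere
    my $total = 0;
    $total += 1 - $_ / $max for @means;
    return $total / (@means - 1);
}

# Invented example row: gene name, then 3 tissues x 2 replicates.
my @row   = qw(geneA 9 11 4 6 0 0);
my $gene  = shift @row;
my @means = replicate_means(2, @row);    # tissue means: 10, 5, 0
printf "%s\t%.2f\n", $gene, tau_from_means(@means);
```

With these made-up values the tissue means are 10, 5 and 0, and the sketch prints a tau of 0.75 for geneA. Passing a replicate count of 1 leaves the values untouched, which is why -r 1 suits a matrix that has already been averaged.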
The score is computed with the following formula [1]:
$$\tau = \frac{\sum_{i=1}^{N} (1 - x_i)}{N - 1}$$
where N is the number of tissues being studied and x_i is the expression profile component for a given tissue, normalised by the maximal component value for that gene (i.e. the expression of that gene in the tissue in which it is most highly expressed).
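As a quick sanity check, consider a hypothetical gene (values invented for illustration) whose mean expression in four tissues is 8, 2, 2 and 0. Dividing by the maximum, 8, gives normalised components of 1, 0.25, 0.25 and 0. Summing 1 minus each component and dividing by N - 1 = 3 gives:

```latex
\tau = \frac{(1 - 1) + (1 - 0.25) + (1 - 0.25) + (1 - 0)}{4 - 1}
     = \frac{2.5}{3} \approx 0.83
```

By the same arithmetic, a gene with identical expression in every tissue scores 0, and a gene expressed in only one tissue scores 1.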
2. Code
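#!/usr/bin/perl
# Tau.pl
# HYG
# Compute the tau tissue-specificity index for each gene in an expression matrix.
use strict;
use warnings;
use List::Util qw(sum max);
use Getopt::Long;

my $help_requested;
my $file;
my $out = "tau.txt";
my $r   = 3;

sub usage {
    print "Usage: $0 [options]\n";
    print "Options:\n";
    print " -h, --help Display this help message\n";
    print " -i, --in <file> Specify an input file\n";
    print " -o, --out <file> Specify an output file default: tau.txt\n";
    print " -r, --replicates <int> number of replicates default: 3\n";
}

if (@ARGV == 0) {
    usage();
    exit;
}

# Stop with the usage text on unknown or malformed options.
GetOptions(
    'h|help'         => \$help_requested,
    'i|in=s'         => \$file,
    'o|out=s'        => \$out,
    'r|replicates=i' => \$r,
) or do { usage(); exit 1; };

if ($help_requested) {
    usage();
    exit;
}

# -i is required, and -r is used as a divisor, so it must be at least 1.
die "no input file given (use -i)\n" unless defined $file;
die "the number of replicates (-r) must be at least 1\n" if $r < 1;

open my $in_fh,  '<', $file or die "cannot open the file $file: $!\n";
open my $out_fh, '>', $out  or die "cannot write to $out: $!\n";

while (my $line = <$in_fh>) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip blank lines
    my @array = split /\s+/, $line;
    my $gene  = shift @array;
    my $n     = scalar @array;

    # The expression columns must split evenly into groups of $r replicates.
    unless ($n % $r == 0) {
        die "line $.: $n expression columns cannot be divided evenly into groups of $r replicates\n";
    }
    # At least two tissues are needed, otherwise N - 1 would be 0.
    unless ($n > $r) {
        die "line $.: the number of expression columns must be greater than the number of replicates\n";
    }

    # Average each run of $r consecutive columns to get one value per tissue.
    my @tpm_values;
    my @groups;
    foreach my $element (@array) {
        push @groups, $element;
        if (@groups == $r) {
            push @tpm_values, sum(@groups) / $r;
            @groups = ();
        }
    }

    my $max_expression = max(@tpm_values);
    next if $max_expression <= 0;    # skip genes with no expression in any tissue

    # tau = sum(1 - x_i) / (N - 1), where x_i = tissue mean / maximum tissue mean.
    my $tau = 0;
    for my $tpm (@tpm_values) {
        my $xi = $tpm / $max_expression;
        $tau += (1 - $xi);
    }
    $tau = $tau / (@tpm_values - 1);
    print $out_fh "$gene\t$tau\n";
}

close $in_fh;
close $out_fh or die "cannot finish writing $out: $!\n";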
References
1. Palmer, D., Fabris, F., Doherty, A., Freitas, A.A. & de Magalhaes, J.P. Ageing transcriptome meta-analysis reveals similarities and differences between key mammalian tissues. Aging (Albany NY) 13, 3313-3341 (2021).