Tau (Tau index score) calculation

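1. File preparation and usage

Tau (tissue-specific gene expression) [tissue specificity score]: scores how tissue-specific a gene's expression is across different tissues. The score ranges from 0 to 1; the closer it is to 1, the more tissue-specific the expression.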

The tau index indicates how specific or broadly expressed a gene or transcript is, within studied tissues. Genes with a tau index close to 1 are more specifically expressed in one tissue, while genes with a tau index closer to 0 are equally expressed across all tissues studied[1].

The input file format is as follows:

The first column is the gene name, and the remaining columns are the expression matrix.
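To make the layout concrete, the sketch below splits one whitespace-separated line into a gene name and its expression values, in the same way the script reads its input. The helper name `parse_line`, the gene name `GeneA`, and the numbers are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: split one input line into the gene name (first
# column) and its expression values (all remaining columns).
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($gene, @values) = split ' ', $line;
    return ($gene, @values);
}

# Made-up example line: one gene, two tissues with three replicates each.
my ($gene, @values) = parse_line("GeneA\t10.0\t12.0\t11.0\t0.5\t0.4\t0.6\n");
print "$gene has ", scalar(@values), " expression values\n";
```

Every column after the first is treated as a number, so a header row of sample names would not be a valid data line.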

The parameters are as follows:

Options:

-h, --help Display this help message

-i, --in <file> Specify an input file

-o, --out <file> Specify an output file

-r, --replicates <int> number of replicates

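-i is the name of the input file.

-o is the name of the output file; the default is "tau.txt".

-r is the number of biological replicates per group in the matrix; the default is 3. If the replicates in each group have already been averaged, use 1.

For example, if the input file is input.txt, the output file is output.txt, and there are 4 biological replicates, the command is: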

perl tau.pl -i input.txt -o output.txt -r 4
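The script averages every `-r` consecutive columns into one value per tissue, so the replicates of one tissue need to sit in adjacent columns. A minimal sketch of that grouping step follows; the helper name `average_replicates` and the numbers are hypothetical and only illustrate the idea.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: average each run of $r consecutive values,
# giving one mean per tissue.
sub average_replicates {
    my ($r, @values) = @_;
    die "replicate count must be at least 1\n" unless $r >= 1;
    die "value count is not a multiple of $r\n" if @values % $r;
    my @means;
    # splice removes the next $r values until none are left
    while (my @group = splice(@values, 0, $r)) {
        push @means, sum(@group) / $r;
    }
    return @means;
}

# Made-up values: two tissues with four replicates each (as with -r 4).
my @means = average_replicates(4, 8, 10, 12, 10, 1, 1, 1, 1);
print join("\t", @means), "\n";
```

With these values the two tissue means are 10 and 1. Passing 1 as the replicate count returns the values unchanged, which corresponds to using `-r 1` on a pre-averaged matrix.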

The calculation follows the formula below [1]:

$$\tau = \frac{\sum_{i=1}^{N} (1 - x_i)}{N - 1}$$

where N is the number of tissues being studied and x_i is the expression profile component for a given tissue, normalised by the maximal component value for that gene (i.e. the expression of that gene in the tissue it is most highly expressed in).
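As a worked check of this definition, the sketch below computes tau for a few made-up expression profiles, taking one mean value per tissue. The helper name `tau_index` is hypothetical. It returns `undef` when the maximum is not positive, mirroring the script, which skips such genes.

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Hypothetical helper: tau for one gene, given one mean expression
# value per tissue.
sub tau_index {
    my @expr = @_;
    die "need at least two tissues\n" if @expr < 2;
    my $max = max(@expr);
    return undef if $max <= 0;    # not expressed anywhere: tau is undefined
    # x_i = value / max; tau = sum(1 - x_i) / (N - 1)
    return sum(map { 1 - $_ / $max } @expr) / (@expr - 1);
}

printf "%.2f\n", tau_index(10, 0, 0);   # expressed in a single tissue
printf "%.2f\n", tau_index(5, 5, 5);    # equal expression in all tissues
printf "%.2f\n", tau_index(10, 5, 0);   # intermediate case
```

The three calls print 1.00, 0.00 and 0.75. In the last case the normalised values are 1, 0.5 and 0, so the sum of (1 - x_i) is 1.5 and dividing by N - 1 = 2 gives 0.75.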

2. Code

#!/usr/bin/perl
#Tau.pl
#HYG
use strict;
use warnings;
use List::Util qw(sum);
use Getopt::Long;

my $help_requested;
my $file;
my $out="tau.txt";
my $r=3;

sub usage {
        print "Usage: $0 [options]\n";
        print "Options:\n";
        print "  -h, --help              Display this help message\n";
        print "  -i, --in <file>         Specify an input file\n";
        print "  -o, --out <file>        Specify an output file    default: tau.txt\n";
        print "  -r, --replicates <int>  Number of replicates      default: 3\n";
}
if (@ARGV == 0) {
        usage();
        exit;
}

GetOptions(
        'h|help' => \$help_requested,
        'i|in=s' => \$file,
        'o|out=s' => \$out,
        'r|replicates=i' => \$r,
);

if ($help_requested) {
        usage();
        exit;
}

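die "no input file given (use -i)\n" unless defined $file;
die "-r must be a positive integer\n" unless $r >= 1;
open(FL, '<', $file) or die "cannot open the file $file: $!\n";
open(OUT, '>', $out) or die "cannot write to the file $out: $!\n";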
while(my $line = <FL>){
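        chomp $line;
        next unless $line =~ /\S/;      # skip blank lines
        my @array = split ' ', $line;
        my $gene = shift @array;
        my $n = scalar @array;
        my $tau = 0;
        # The value columns must split evenly into groups of $r replicates.
        unless ($n % $r == 0 ){
                die "gene $gene has $n values, which cannot be split evenly into groups of $r replicates\n";
        }
        # At least two tissues are needed, otherwise the N - 1 denominator is zero.
        unless ($n > $r){
                die "gene $gene needs at least two tissues to compute tau\n";
        }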
        my @tpm_values;
        my @groups;
        foreach my $element (@array){
                push @groups, $element;
                if (@groups == $r) {
                        my $average = sum(@groups) / $r;

                        push @tpm_values, $average;
                        @groups = ();
                }
        }
        my $max_expression = max(@tpm_values);
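        # Genes with no expression in any tissue have no defined tau; skip them.
        next if($max_expression <= 0);
        # Sum (1 - x_i), where x_i is each tissue mean scaled by the gene's maximum.
        for my $tpm (@tpm_values){
                my $xi = $tpm / $max_expression;
                $tau += (1 - $xi);
        }
        # Divide by N - 1, where N is the number of tissues.
        $tau = $tau / (@tpm_values - 1);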
        print OUT "$gene\t$tau\n";
}
close FL;
close OUT;
sub max {
    return (sort {$a <=> $b} @_)[-1];
}

References

1. Palmer, D., Fabris, F., Doherty, A., Freitas, A.A. & de Magalhaes, J.P. Ageing transcriptome meta-analysis reveals similarities and differences between key mammalian tissues. Aging (Albany NY) 13, 3313-3341 (2021).
