Tau (Tau index score) Calculation

lucien_H

Article link: https://blog.csdn.net/weixin_42058223/article/details/139634395

1. File preparation and usage

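Tau (tissue-specific gene expression) [tissue-specificity score]: a score for how tissue-specific a gene's expression is across tissues. It ranges from 0 to 1, and the closer it is to 1, the more tissue-specific the gene.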

The tau index indicates how specific or broadly expressed a gene or transcript is, within studied tissues. Genes with a tau index close to 1 are more specifically expressed in one tissue, while genes with a tau index closer to 0 are equally expressed across all tissues studied[1].

The input file format is as follows:

The first column is the gene name, and the remaining columns are the expression matrix.
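As a rough illustration of that layout, the sketch below splits one whitespace-delimited line into a gene name and its expression values. The gene name `GeneA` and the numbers are made up for the example; the real script reads such lines from the file passed with -i.

```perl
use strict;
use warnings;

# Hypothetical input line: a gene name followed by six expression values
# (for example, two tissues with three replicates each).
my $line = "GeneA\t10.0\t12.0\t11.0\t0.5\t0.7\t0.6";

# Split on runs of whitespace: the first field is the gene name,
# the remaining fields are the expression values.
my ($gene, @values) = split /\s+/, $line;

printf "%s has %d expression columns\n", $gene, scalar @values;
```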

The options are:

Options:

-h, --help Display this help message

-i, --in <file> Specify an input file

-o, --out <file> Specify an output file

-r, --replicates <int> number of replicates

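-i is the name of the input file.

-o is the name of the output file; the default is "tau.txt".

-r is the number of biological replicates per tissue in the matrix; the default is 3. If the replicates in each group have already been averaged, use 1.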
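To make the effect of -r concrete, here is a minimal sketch of the replicate averaging, assuming that replicates of the same tissue sit in adjacent columns and that the column count is a multiple of the replicate number. The helper name `average_replicates` and the sample values are illustrative and not part of the original script.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Average every $r consecutive columns into one value per tissue.
# Assumes replicates of the same tissue are adjacent and that the
# number of columns is a multiple of $r.
sub average_replicates {
    my ($r, @values) = @_;
    die "column count is not a multiple of $r\n" if @values % $r;
    my @means;
    # splice removes the next $r values; the loop ends when none are left
    while (my @group = splice(@values, 0, $r)) {
        push @means, sum(@group) / $r;
    }
    return @means;
}

# Hypothetical values: two tissues, three replicates each.
my @means = average_replicates(3, 9, 12, 15, 1, 2, 3);
print "@means\n";    # prints "12 2"
```

With a replicate number of 1, every column is treated as its own tissue and the values pass through unchanged.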

For example, if the input file is input.txt, the output file is output.txt, and there are 4 biological replicates, the command is:

perl tau.pl -i input.txt -o output.txt -r 4

The calculation follows the formula below [1]:

$$\tau = \frac{\sum_{i=1}^{N} (1 - x_i)}{N - 1}$$

where N is the number of tissues being studied and x_i is the expression profile component for a given tissue, normalised by the maximal component value for that gene (i.e. the expression of that gene in the tissue in which it is most highly expressed).
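The formula can be checked by hand on a few small cases. The sketch below is a standalone helper, assuming one expression value per tissue (replicates already averaged). The name `tau_index` and the sample values are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Tau for one gene, given one expression value per tissue.
# Returns nothing (undef in scalar context) when the index is not
# defined: fewer than two tissues, or no expression in any tissue.
sub tau_index {
    my @expr = @_;
    return if @expr < 2;
    my $max = max(@expr);
    return if $max <= 0;
    # x_i is the expression in tissue i divided by the maximum;
    # tau is the sum of (1 - x_i) divided by (N - 1)
    return sum(map { 1 - $_ / $max } @expr) / (@expr - 1);
}

printf "%.3f\n", tau_index(8, 2, 0);   # 0.875: mostly one tissue
printf "%.3f\n", tau_index(5, 5, 5);   # 0.000: equal in all tissues
printf "%.3f\n", tau_index(7, 0, 0);   # 1.000: a single tissue
```

For the first case, the normalised values are 1, 0.25 and 0, so the sum of (1 - x_i) is 1.75 and tau is 1.75 / 2 = 0.875.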

2. Code

#!/usr/bin/perl
# tau.pl: compute the Tau tissue-specificity index for each gene
# Author: HYG
use strict;
use warnings;
use List::Util qw(sum max);
use Getopt::Long;

my $help_requested;
my $file;
my $out = "tau.txt";
my $r   = 3;

sub usage {
        print "Usage: $0 [options]\n";
        print "Options:\n";
        print "  -h, --help              Display this help message\n";
        print "  -i, --in <file>         Specify an input file\n";
        print "  -o, --out <file>        Specify an output file    default: tau.txt\n";
        print "  -r, --replicates <int>  Number of replicates      default: 3\n";
}
if (@ARGV == 0) {
        usage();
        exit;
}

GetOptions(
        'h|help'         => \$help_requested,
        'i|in=s'         => \$file,
        'o|out=s'        => \$out,
        'r|replicates=i' => \$r,
) or do { usage(); exit 1 };

if ($help_requested) {
        usage();
        exit;
}
die "no input file given (use -i)\n" unless defined $file;
die "the number of replicates must be a positive integer\n" unless $r > 0;

open my $in_fh,  '<', $file or die "cannot open the file $file: $!\n";
open my $out_fh, '>', $out  or die "cannot write to $out: $!\n";
while (my $line = <$in_fh>) {
        chomp $line;
        next unless $line =~ /\S/;      # skip blank lines
        my @array = split /\s+/, $line;
        my $gene  = shift @array;
        my $n     = scalar @array;
        # the expression columns must split evenly into groups of $r replicates
        unless ($n % $r == 0) {
                die "$gene: $n expression columns are not divisible by $r replicates\n";
        }
        # at least two tissues are needed, otherwise N - 1 is zero
        unless ($n > $r) {
                die "$gene: need more than one tissue ($n columns, $r replicates)\n";
        }
        # average each run of $r consecutive columns into one value per tissue
        my @tpm_values;
        my @groups;
        foreach my $element (@array) {
                push @groups, $element;
                if (@groups == $r) {
                        push @tpm_values, sum(@groups) / $r;
                        @groups = ();
                }
        }
        my $max_expression = max(@tpm_values);
        next if ($max_expression <= 0);  # gene not expressed in any tissue
        # tau = sum(1 - x_i) / (N - 1), with x_i = expression / maximum
        my $tau = 0;
        for my $tpm (@tpm_values) {
                my $xi = $tpm / $max_expression;
                $tau += (1 - $xi);
        }
        $tau = $tau / (@tpm_values - 1);
        print $out_fh "$gene\t$tau\n";
}
close $in_fh;
close $out_fh;

References

1. Palmer, D., Fabris, F., Doherty, A., Freitas, A.A. & de Magalhaes, J.P. Ageing transcriptome meta-analysis reveals similarities and differences between key mammalian tissues. Aging (Albany NY) 13, 3313-3341 (2021).