Intersecting multiple files with awk and perl

weixin_30426065

Original link: http://www.cnblogs.com/huang6894/p/3472592.html

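I have five files, all in the same format, and I want their intersection: whenever columns 1, 2, 3, 6 and 7 are identical across the files, output the file name, a tab, and then the whole line ($0). I tried doing it with awk, but the results were incomplete. How should I do this?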
1. 505.txt
WINGS 1000 4000 3 3/18_707 2 3
ANNY 4000 7000 4 4/18_707 3 4
MOLLY 3000 4300 5 5/18_707 4 5
TINAG 8000 10000 6 6/18_707 5 6

2. 707.txt
WINGS 1000 4000 3 3/20_505 2 3
WINGS 5000 6000 8 8/20_505 3 3
SANLY 2000 4000 9 9/20_505 2 2
TINAG 8000 10000 11 11/20_505 5 6

3. 808.txt
WINGS 1000 4000 3 1/20_808 2 3
WINGS 5000 6000 5 5/20_808 3 3
ANNY 4000 7000 9 9/20_808 3 3
TINAG 8000 10000 4 4/20_808 5 6

4. 909.txt
WINGS 1000 4000 3 3/20_909 2 3
MKEA 1000 6200 1 1/30_909 3 3
TNLY 2000 4000 9 9/20_909 2 2
TINAG 8000 10000 11 11/20_909 5 6

5. 202.txt
WINGS 1000 4000 3 1/20_202 2 3
WINGS 5000 6000 5 5/20_202 3 3
ANNY 4000 7000 9 9/20_202 3 3
TINAG 8000 10000 4 4/20_202 5 6

__________________________________________________________________________________________
Desired result:
505.txt WINGS 1000 4000 3 3/18_707 2 3
707.txt WINGS 1000 4000 3 3/20_505 2 3
808.txt WINGS 1000 4000 3 1/20_808 2 3
909.txt WINGS 1000 4000 3 3/20_909 2 3
202.txt WINGS 1000 4000 3 1/20_202 2 3
505.txt TINAG 8000 10000 6 6/18_707 5 6
707.txt TINAG 8000 10000 11 11/20_505 5 6
808.txt TINAG 8000 10000 4 4/20_808 5 6
909.txt TINAG 8000 10000 11 11/20_909 5 6
202.txt TINAG 8000 10000 4 4/20_202 5 6
——————————————————————————————————————————

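# key = columns 1,2,3,6,7; a key is printed only if it occurs in every file (each file counted once per key)
awk -v D=',' 'FNR==1{f++} NF{n=$1 D $2 D $3 D $6 D $7; a[n]=a[n] FILENAME" "$0"\n"; if(!((n,f) in s)){s[n,f];c[n]++}} END{for(n in c)if(c[n]==f)printf("%s",a[n])}' 505.txt 707.txt 808.txt 909.txt 202.txt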
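To try the awk approach on the sample data, here is a minimal sketch that writes the five files from the question into a throwaway `mktemp -d` directory and runs a variant of the one-liner. The variant counts each file at most once per key through a `seen[key, file]` array, so a key repeated inside one file cannot be mistaken for a key present in several files. The order of the key groups in the output is not guaranteed, because it follows awk's `for (k in c)` loop.

```shell
#!/bin/sh
# Scratch directory for the sample files (a throwaway created only for this demo).
tmp=$(mktemp -d) || exit 1

# The five sample files from the question.
printf '%s\n' 'WINGS 1000 4000 3 3/18_707 2 3' 'ANNY 4000 7000 4 4/18_707 3 4' \
  'MOLLY 3000 4300 5 5/18_707 4 5' 'TINAG 8000 10000 6 6/18_707 5 6' > "$tmp/505.txt"
printf '%s\n' 'WINGS 1000 4000 3 3/20_505 2 3' 'WINGS 5000 6000 8 8/20_505 3 3' \
  'SANLY 2000 4000 9 9/20_505 2 2' 'TINAG 8000 10000 11 11/20_505 5 6' > "$tmp/707.txt"
printf '%s\n' 'WINGS 1000 4000 3 1/20_808 2 3' 'WINGS 5000 6000 5 5/20_808 3 3' \
  'ANNY 4000 7000 9 9/20_808 3 3' 'TINAG 8000 10000 4 4/20_808 5 6' > "$tmp/808.txt"
printf '%s\n' 'WINGS 1000 4000 3 3/20_909 2 3' 'MKEA 1000 6200 1 1/30_909 3 3' \
  'TNLY 2000 4000 9 9/20_909 2 2' 'TINAG 8000 10000 11 11/20_909 5 6' > "$tmp/909.txt"
printf '%s\n' 'WINGS 1000 4000 3 1/20_202 2 3' 'WINGS 5000 6000 5 5/20_202 3 3' \
  'ANNY 4000 7000 9 9/20_202 3 3' 'TINAG 8000 10000 4 4/20_202 5 6' > "$tmp/202.txt"

# Key = columns 1,2,3,6,7. seen[key, file] ensures each file is counted at most
# once per key. Lines within one key keep file order; key order is unspecified.
result=$(cd "$tmp" && awk -v D=',' '
  FNR == 1 { f++ }                                 # a new file starts
  NF {                                             # skip blank lines
    k = $1 D $2 D $3 D $6 D $7
    a[k] = a[k] FILENAME " " $0 "\n"               # collect "file line"
    if (!((k, f) in seen)) { seen[k, f]; c[k]++ }  # files containing k
  }
  END { for (k in c) if (c[k] == f) printf "%s", a[k] }
' 505.txt 707.txt 808.txt 909.txt 202.txt)

rm -rf "$tmp"
printf '%s\n' "$result"
```

Run with `sh`: it prints the WINGS and TINAG lines of all five files, ten lines in total.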
---------------------------------------------------------------------------------------------

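#!/usr/bin/perl
use strict;
use warnings;

# A line is kept when its columns 1,2,3,6,7 occur in every file.
my @files = qw/505.txt 202.txt 707.txt 808.txt 909.txt/;
my ( $N, %A );    # $N: files read so far; %A: key => { file => [lines] }

for my $C (@files) {
    open my $F, '<', $C or die "Cannot open $C: $!";
    $N++;
    while (<$F>) {
        next unless /\S/;                               # skip blank lines
        my $key = join ' ', ( split )[ 0, 1, 2, 5, 6 ];
        next if $N > 1 && !$A{$key};                    # after file 1, only extend surviving keys
        push @{ $A{$key}{$C} }, "$C $_";
    }
    close $F;
    # keep only the keys that have appeared in every file read so far
    %A = map { $_ => $A{$_} } grep { keys %{ $A{$_} } == $N } keys %A;
}

# every surviving key occurs in all files; print its lines in file order
for my $h ( values %A ) {
    print map { @{ $h->{$_} } } @files;
}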
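Both the awk one-liner and the Perl script collect matches in a hash, so the order in which the keys come out is unspecified. If the output should follow the input order instead, one option (an alternative technique, not the author's code) is to read the files twice: the first pass counts in how many files each key occurs, and the second pass prints the qualifying lines as they are read. A minimal sketch; `intersect_in_order` is a hypothetical name, and it assumes none of the files is empty, because the `FNR == 1` rule is how it tells the two passes apart.

```shell
#!/bin/sh
# intersect_in_order FILE...: hypothetical helper. Prints "FILE line" for every
# line whose columns 1,2,3,6,7 occur in all FILEs, in input order.
# Assumes no FILE is empty (FNR == 1 is used to count files).
intersect_in_order() {
  awk -v n="$#" '
    FNR == 1 { file++ }                    # 1..n in pass 1, n+1..2n in pass 2
    !NF { next }                           # ignore blank lines
    { k = $1 SUBSEP $2 SUBSEP $3 SUBSEP $6 SUBSEP $7 }
    file <= n {                            # pass 1: count files per key
      if (!((k, file) in seen)) { seen[k, file]; cnt[k]++ }
      next
    }
    cnt[k] == n { print FILENAME " " $0 }  # pass 2: key was found in all n files
  ' "$@" "$@"
}
```

For example, `intersect_in_order 505.txt 707.txt 808.txt 909.txt 202.txt` lists the matching lines file by file rather than key by key.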

Another example:
#!/bin/sh
#$ -S /bin/sh
dir="$1"
date_start=$(date +%s)    # seconds since the epoch
for i in "$dir"/*_vcfanno.bed.gz
do
p=$(basename "$i")
zcat $i|awk -F "\t" '{if($32!~/^utr$`/)print}' |awk -F "\t" '{if($32!~/ncRNA/)print}'|awk -F "\t" '{if($32!~/unknown/)print}'|awk -F "\t" '{if($32!~/abnormal/)print}'|a
done
# Join columns 1,2,3,11,13,14 with "@"; print a joined row once its key n has been seen more than 11 times
awk -F "\t" '{print $1"@"$2"@"$3"@"$11"@"$13"@"$14}' *.bed.gz | awk '{n=$1$2$3; if(++a[n]>11) print}' | tr '@' '\t' > middle

for k in ./*_vcfanno.bed.gz
do
# ($1,$2) is joined with SUBSEP, so e.g. chr1/12 and chr11/2 do not collide
awk 'NR==FNR{a[$1,$2]=FILENAME"\t"$0;next} ($1,$2) in a {print a[$1,$2]}' "$k" middle >> result_for_bed
done

#rm *_vcfanno.bed.gz
rm middle
date_end=$(date +%s)
time=$((date_end - date_start))
echo "This program took $time seconds"
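The second loop uses a common awk idiom: `NR==FNR` is only true while the first file argument is being read, so those lines are stored in an array and the second file is then checked against it. A minimal sketch of the idiom, assuming tab-separated input; `lookup_join` is a hypothetical name. It keys on `($1, $2)`, which awk joins with `SUBSEP`, so `chr1`/`12` and `chr11`/`2` stay distinct, whereas plain concatenation `$1$2` turns both into `chr112`.

```shell
#!/bin/sh
# lookup_join FILE1 FILE2: hypothetical helper. For every line of FILE2 whose
# first two tab-separated columns also occur in FILE1, print
# "FILE1<TAB>matching FILE1 line". FILE1 must be non-empty, otherwise
# NR == FNR would also hold while FILE2 is read.
lookup_join() {
  awk -F '\t' '
    NR == FNR { a[$1, $2] = FILENAME "\t" $0; next }  # first file: remember lines
    ($1, $2) in a { print a[$1, $2] }                 # second file: look keys up
  ' "$1" "$2"
}
```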
----------------------------------------------------------------------------

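#!/usr/bin/perl -w
##Usage:
##perl $0 $dir > result.txt
use strict;

@ARGV or die "Usage: perl $0 DIR > result.txt\n";
my @files = glob "$ARGV[0]/*vcfanno.bed.gz";
my ( $N, %A );    # $N: files read so far; %A: key => { file => [lines] }

for my $C (@files) {
    # the inputs are gzip-compressed; gzip -dcf also passes plain text through
    open my $F, '-|', 'gzip', '-dcf', $C or die "Cannot read $C: $!";
    $N++;
    while (<$F>) {
        next unless /\S/;                               # skip blank lines
        chomp( my $line = $_ );
        # key = columns 1,2,3,11,13,14, split on tabs like the awk -F "\t" above
        my $key = join "\t", ( split /\t/, $line )[ 0, 1, 2, 10, 12, 13 ];
        next if $N > 1 && !$A{$key};                    # after file 1, only extend surviving keys
        push @{ $A{$key}{$C} }, "$C $_";
    }
    close $F;
    # keep only the keys that have appeared in every file read so far
    %A = map { $_ => $A{$_} } grep { keys %{ $A{$_} } == $N } keys %A;
}

# every surviving key occurs in all files; print its lines in file order
for my $h ( values %A ) {
    print map { @{ $h->{$_} } } @files;
}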
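When the files are large, a different technique (a swapped-in one, not the hash approach of the scripts above) can find the shared keys with `sort` and `uniq -c`: each file contributes each of its distinct keys once, so a key whose count equals the number of files occurs in all of them. A minimal sketch, assuming tab-separated, gzip-compressed input and the key columns 1,2,3,11,13,14 used above; `common_keys` is a hypothetical name. It prints only the shared keys, not the full per-file lines, so a lookup such as the `NR==FNR` join in the shell script would still be needed to recover those.

```shell
#!/bin/sh
# common_keys FILE...: hypothetical helper. Prints each key (columns
# 1,2,3,11,13,14, tab-separated) that occurs in every FILE.
common_keys() {
  for f in "$@"; do
    # decompress, cut out the key, and list each key only once per file
    gzip -dcf "$f" | cut -f 1,2,3,11,13,14 | sort -u
  done | sort | uniq -c |
    # uniq -c prefixes "<spaces><count> "; keep keys seen in all files, strip the count
    awk -v n="$#" '$1 == n { sub(/^ *[0-9]+ /, ""); print }'
}
```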
