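I have five files on hand, all in the same format. I want to find their intersection: if columns 1, 2, 3, 6 and 7 are all identical across the files, print the file name, "\t", and $0. I tried to do this with awk, but the result is incomplete. How should I do it?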
1.505.txt
WINGS 1000 4000 3 3/18_707 2 3
ANNY 4000 7000 4 4/18_707 3 4
MOLLY 3000 4300 5 5/18_707 4 5
TINAG 8000 10000 6 6/18_707 5 6
2.707.txt
WINGS 1000 4000 3 3/20_505 2 3
WINGS 5000 6000 8 8/20_505 3 3
SANLY 2000 4000 9 9/20_505 2 2
TINAG 8000 10000 11 11/20_505 5 6
3.808.txt
WINGS 1000 4000 3 1/20_808 2 3
WINGS 5000 6000 5 5/20_808 3 3
ANNY 4000 7000 9 9/20_808 3 3
TINAG 8000 10000 4 4/20_808 5 6
4.909.txt
WINGS 1000 4000 3 3/20_909 2 3
MKEA 1000 6200 1 1/30_909 3 3
TNLY 2000 4000 9 9/20_909 2 2
TINAG 8000 10000 11 11/20_909 5 6
5.202.txt
WINGS 1000 4000 3 1/20_202 2 3
WINGS 5000 6000 5 5/20_202 3 3
ANNY 4000 7000 9 9/20_202 3 3
TINAG 8000 10000 4 4/20_202 5 6
__________________________________________________________________________________________
The expected result is:
505.txt WINGS 1000 4000 3 3/18_707 2 3
707.txt WINGS 1000 4000 3 3/20_505 2 3
808.txt WINGS 1000 4000 3 1/20_808 2 3
909.txt WINGS 1000 4000 3 3/20_909 2 3
202.txt WINGS 1000 4000 3 1/20_202 2 3
505.txt TINAG 8000 10000 6 6/18_707 5 6
707.txt TINAG 8000 10000 11 11/20_505 5 6
808.txt TINAG 8000 10000 4 4/20_808 5 6
909.txt TINAG 8000 10000 11 11/20_909 5 6
202.txt TINAG 8000 10000 4 4/20_202 5 6
——————————————————————————————————————————
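awk -v D=',' '
FNR == 1 { f++ }                          # count the input files
{
    n = $1 D $2 D $3 D $6 D $7            # key: columns 1, 2, 3, 6, 7
    a[n] = a[n] FILENAME " " $0 "\n"
    if (!s[n, FILENAME]++) c[n]++         # count each file once per key
}
END { for (n in c) if (c[n] == f) printf("%s", a[n]) }
' 505.txt 707.txt 808.txt 909.txt 202.txt
---------------------------------------------------------------------------------------------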
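The intersection described in the question can also be sketched as a small shell function around awk. This is only a sketch under stated assumptions, not the original poster's method: the function name `intersect_by_key` is hypothetical, the number of files is passed in explicitly so that an empty file still counts, each file is counted at most once per key, and the file name is joined with a tab as the question asks. Output keeps the order in which keys are first seen.

```shell
#!/bin/sh
# intersect_by_key (hypothetical name): print "<file>\t<line>" for every line
# whose key (columns 1, 2, 3, 6, 7) appears in ALL of the files given.
intersect_by_key() {
    awk -v nfiles="$#" '
    {
        key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $6 SUBSEP $7
        if (!seen[key, FILENAME]++) count[key]++      # count each file once per key
        if (!(key in lines)) order[++nkeys] = key     # remember first-seen order
        lines[key] = lines[key] FILENAME "\t" $0 "\n"
    }
    END {
        for (i = 1; i <= nkeys; i++)
            if (count[order[i]] == nfiles) printf "%s", lines[order[i]]
    }' "$@"
}

# Small demo with the first two lines of two sample files from the question.
demo_dir=$(mktemp -d)
printf '%s\n' 'WINGS 1000 4000 3 3/18_707 2 3' 'ANNY 4000 7000 4 4/18_707 3 4' > "$demo_dir/505.txt"
printf '%s\n' 'WINGS 1000 4000 3 3/20_505 2 3' 'SANLY 2000 4000 9 9/20_505 2 2' > "$demo_dir/707.txt"
(cd "$demo_dir" && intersect_by_key 505.txt 707.txt)
rm -rf "$demo_dir"
```

Run on the five sample files as `intersect_by_key 505.txt 707.txt 808.txt 909.txt 202.txt`, this should print the ten WINGS 1000 4000 and TINAG 8000 10000 lines, grouped by key.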
#!/usr/bin/perl
my @files = qw/505.txt 202.txt 707.txt 808.txt 909.txt/;
my ( $N, %A );    # $N: files read so far; %A: key => { file => [lines] }

for my $C (@files) {
    open my $F, '<', $C or die "Cannot open $C: $!";
    unless ( $N++ ) {
        # First file: every key (columns 1, 2, 3, 6, 7) is a candidate.
        while (<$F>) {
            my @B = (split)[ 0, 1, 2, 5, 6 ];
            push @{ $A{"@B"}{$C} }, "$C $_";
        }
        next;
    }

    # Later files: only extend keys that are still candidates.
    while (<$F>) {
        my @B = (split)[ 0, 1, 2, 5, 6 ];
        $A{"@B"} and push @{ $A{"@B"}{$C} }, "$C $_";
    }
    # Drop keys that were not seen in every file read so far.
    %A = map { $_, $A{$_} } grep keys %{ $A{$_} } == $N, keys %A;
}

for ( values %A ) {
    keys %$_ == @files and print map @$_, values %$_;
}
Another example:
#!/bin/sh
#$ -S /bin/sh
dir="$1"
date_start=`date|awk -F"[ :]" '{print $4*3600 + $5*60 +$6}'`
for i in `ls $dir/*_vcfanno.bed.gz`
do
    p=`basename $i`
    zcat $i|awk -F "\t" '{if($32!~/^utr$`/)print}' |awk -F "\t" '{if($32!~/ncRNA/)print}'|awk -F "\t" '{if($32!~/unknown/)print}'|awk -F "\t" '{if($32!~/abnormal/)print}'|a
done
awk -F "\t" '{print $1"@"$2"@"$3"@"$11"@"$13"@"$14}' *.bed.gz|awk '{n=$1$2$3;a[n]++==1;b[n]=$0;if(a[n]>11)print b[n]}'|sed 's/@/\t/g' > middle

for k in `ls ./*_vcfanno.bed.gz`
do
    awk 'NR==FNR{a[$1$2]=FILENAME"\t"$0;next}{if($1$2 in a)print a[$1$2];}' $k middle >>result_for_bed
done

#rm *_vcfanno.bed.gz
rm middle
date_end=`date|awk -F"[ :]" '{print $4*3600 + $5*60 +$6}'`
time=`expr "$date_end" - "$date_start"`
echo "This program has taken $time seconds"
----------------------------------------------------------------------------
#!/usr/bin/perl -w
##Usage:
##perl $0 $dir > result.txt

my @files = glob "$ARGV[0]/*vcfanno.bed.gz";
my ( $N, %A );    # $N: files read so far; %A: key => { file => [lines] }

for my $C (@files) {
    # Note: the file is read as-is; nothing is decompressed here.
    open my $F, '<', $C or die "Cannot open $C: $!";
    unless ( $N++ ) {
        # First file: every key (columns 1, 2, 3, 11, 13, 14) is a candidate.
        while (<$F>) {
            my @B = (split)[ 0, 1, 2, 10, 12, 13 ];
            push @{ $A{"@B"}{$C} }, "$C $_";
        }
        next;
    }

    # Later files: only extend keys that are still candidates.
    while (<$F>) {
        my @B = (split)[ 0, 1, 2, 10, 12, 13 ];
        $A{"@B"} and push @{ $A{"@B"}{$C} }, "$C $_";
    }
    # Drop keys that were not seen in every file read so far.
    %A = map { $_, $A{$_} } grep keys %{ $A{$_} } == $N, keys %A;
}

for ( values %A ) {
    keys %$_ == @files and print map @$_, values %$_;
}
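If the inputs really are gzip-compressed, one way to sketch the same intersection in shell is to decompress each file with `gzip -dc`, tag every line with its file name, and let a single awk pass do the counting. The function name `intersect_gz` is hypothetical; the key columns 1, 2, 3, 11, 13 and 14 are taken from the scripts above, and tab-separated input is an assumption.

```shell
#!/bin/sh
# intersect_gz (hypothetical name): print "<file>\t<line>" for every line whose
# key (columns 1, 2, 3, 11, 13, 14) appears in ALL of the given .gz files.
# Assumes tab-separated, gzip-compressed input.
intersect_gz() {
    for f in "$@"; do
        # Prefix each decompressed line with its source file name and a tab.
        gzip -dc "$f" | awk -v name="$f" '{ print name "\t" $0 }'
    done |
    awk -F '\t' -v nfiles="$#" '
    {
        # After the prefix, original column k is field k+1.
        key = $2 SUBSEP $3 SUBSEP $4 SUBSEP $12 SUBSEP $14 SUBSEP $15
        if (!seen[key, $1]++) count[key]++     # count each file once per key
        lines[key] = lines[key] $0 "\n"
    }
    END {
        for (key in count)
            if (count[key] == nfiles) printf "%s", lines[key]
    }'
}
```

A call such as `intersect_gz "$dir"/*_vcfanno.bed.gz > result.txt` would mirror the usage line of the Perl script; the order of keys in the output is unspecified because it comes from awk's `for (key in count)` loop.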