【问题】
I want to match column 1 of file 1 with column 1 of file 2 and then column 2 of file 1 with column 1 of file 3 and then print the matches. The columns in the files are separated by tabs. For example:
file 1:
fji01dde AIDJFMGKG dlp02sle VMCFIJGM cmr03lsp CKEIFJ
file 2:
fji01dde 25 30 dlp02sle 40 50 cmr03lsp 60 70
file 3:
AIDJFMGKG CKEIFJ
output needs to be:
fji01dde AIDJFMGKG 25 30 cmr03lsp CKEIFJ 60 70
I only want lines that are common in all three files.
The below code works well for the first two files, but I need to incorporate the third file. Any ideas?
#!/usr/bin/env perl use strict; my (%file1,%file2); ## Open the 1st file open(A,"file1"); while(<A>){ chomp; ## Split the current line on tabs into the @F array. my @F=split(/\t/); push @{$file1{$F[0]}},@F[1..$#F]; } ## Open the 2nd file open(B,"file2"); while(<B>){ chomp; ## Split the current line on tabs into the @F array. my @F=split(/\t/); if (defined($file1{$F[0]})) { foreach my $col (@{$file1{$F[0]}}) { print "$F[0]\t$col\t@F[1..$#F]\n"; } } }
What would the output be if file3 did not have CKEIFJ
It would be just fji01dde AIDJFMGKG 25 30
别人给出的解答:
use strict; use warnings; use Text::CSV_XS qw(csv); my @csv_files = @ARGV; # Parse all the CSV files into arrays of arrays. my $data1 = csv( in => $csv_files[0], sep_char => "\t" ); # Parse the other CSV files into hashes of rows keyed on the columns we're going to search on. my $data2 = csv( in => $csv_files[1], sep_char => "\t", headers => ["code", "num1", "num2"\], key => "code" ); my $data3 = csv( in => $csv_files[2], sep_char => "\t", headers => \["CODE"], key => "CODE" ); for my $row1 (@$data1) { my $row2 = $data2->{$row1->[0]}; my $row3 = $data3->{$row1->[1]}; if( $row2 && $row3 ) { print join "\t", $row1->[0], $row1->[1], $row2->{num1}, $row2->{num2}; print "\n"; } }
【回答】
这是典型的结构化计算,file1 的 1 号字段和 file2 的 1 号字段存在左关联,file3 的 1 号字段和 file1 的 2 号字段存在左关联,用支持结构化算法的语言很容易描述并解答问题,Perl 缺乏相关函数,虽能实现但难以理解。
可以选用集算器 SPL 实现上述算法,代码简单易懂:
A | |
1 | =file("D:/file1.txt").import() |
2 | =file("D:/file2.txt").import() |
3 | =file("D:/file3.txt").import() |
4 | =A1.switch(#1,A2:#1) |
5 | =A3.switch(#1,A1:#2) |
6 | =A5.new(#1.#1.#1, #1.#2, #1.#1.#2, #1.#1.#3) |
A4-A5:建立表间关联
A6:按关联路径取数,其中 #1.#1.#1 表示 file3 的 1 号字段,指向 file1 记录里的 1 号字段,再指向 file2 里的 1 号字段。