Key-Based Association Across Multiple Text Files

润乾软件

Published 2022-07-05 12:01:46


Category column: JAVA计算. Article tags: java, text, Excel

Article link: https://blog.csdn.net/raqsoft/article/details/125616772


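This post discusses how to match the contents of specific columns across three files: column 1 of file 1 against column 1 of file 2, and column 2 of file 1 against column 1 of file 3, printing only the rows that match in all three files. When file 3 lacks an entry, the output changes accordingly. The author poses the question and shares a solution written in esProc SPL, noting that the language is concise and easy to read for this kind of structured computation.

Summary generated by CSDN using intelligent technology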

【Question】
I want to match column 1 of file 1 with column 1 of file 2 and then column 2 of file 1 with column 1 of file 3 and then print the matches. The columns in the files are separated by tabs. For example:

file 1:

fji01dde AIDJFMGKG
dlp02sle VMCFIJGM
cmr03lsp CKEIFJ

file 2:

fji01dde 25  30
dlp02sle 40  50
cmr03lsp 60  70

file 3:

AIDJFMGKG
CKEIFJ

output needs to be:

fji01dde AIDJFMGKG 25  30
cmr03lsp CKEIFJ 60  70

I only want lines that are common in all three files.

The below code works well for the first two files, but I need to incorporate the third file. Any ideas?

#!/usr/bin/env perl
use strict;
my (%file1,%file2);
## Open the 1st file
open(A,"file1");
while(<A>){
    chomp;
    ## Split the current line on tabs into the @F array.
    my @F=split(/\t/);
    push @{$file1{$F[0]}},@F[1..$#F];
}
## Open the 2nd file
open(B,"file2");
while(<B>){
    chomp;
    ## Split the current line on tabs into the @F array.
    my @F=split(/\t/);
    if (defined($file1{$F[0]})) {
        foreach my $col (@{$file1{$F[0]}}) {
            print "$F[0]\t$col\t@F[1..$#F]\n";
        }
    }
}

What would the output be if file3 did not have CKEIFJ?

It would be just fji01dde AIDJFMGKG 25 30

An answer given by someone else:

use strict;
use warnings;
use Text::CSV_XS qw(csv);
my @csv_files = @ARGV;
# Parse the first CSV file into an array of arrays.
my $data1 = csv( in => $csv_files[0], sep_char => "\t" );
# Parse the other CSV files into hashes of rows keyed on the columns we're going to search on.
my $data2 = csv( in       => $csv_files[1],
                 sep_char => "\t",
                 headers  => ["code", "num1", "num2"],
                 key      => "code"
);
my $data3 = csv( in       => $csv_files[2],
                 sep_char => "\t",
                 headers  => ["CODE"],
                 key      => "CODE"
);
for my $row1 (@$data1) {
    # Look up the file2 row by column 1 and the file3 row by column 2.
    my $row2 = $data2->{$row1->[0]};
    my $row3 = $data3->{$row1->[1]};
    if( $row2 && $row3 ) {
        print join "\t", $row1->[0], $row1->[1], $row2->{num1}, $row2->{num2};
        print "\n";
    }
}

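【Answer】
This is a typical structured computation: field 1 of file1 is left-joined to field 1 of file2, and field 1 of file3 is left-joined to field 2 of file1. A language that supports structured algorithms can describe and solve the problem easily. Perl lacks such functions, so although it can implement the logic, the resulting code is hard to follow.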
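The two key associations described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in this post: the function names `read_tsv`, `match_three_files`, and `demo` are hypothetical, and it assumes every row of file1 has at least two tab-separated columns. The sample data is copied from the question and written to temporary files so the block runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def read_tsv(path):
    """Read a tab-separated file into a list of rows, skipping empty lines."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


def match_three_files(path1, path2, path3):
    """Keep file1 rows whose column 1 is in file2 and whose column 2 is in file3."""
    # Index file2 by its first column; the value is the rest of the row.
    file2_by_code = {row[0]: row[1:] for row in read_tsv(path2)}
    # file3 only contributes membership, so a set of its first column is enough.
    file3_keys = {row[0] for row in read_tsv(path3)}

    matches = []
    for code, sequence, *_ in read_tsv(path1):
        if code in file2_by_code and sequence in file3_keys:
            matches.append([code, sequence, *file2_by_code[code]])
    return matches


def demo():
    """Write the question's sample data to temporary files and match them."""
    samples = {
        "file1": "fji01dde\tAIDJFMGKG\ndlp02sle\tVMCFIJGM\ncmr03lsp\tCKEIFJ\n",
        "file2": "fji01dde\t25\t30\ndlp02sle\t40\t50\ncmr03lsp\t60\t70\n",
        "file3": "AIDJFMGKG\nCKEIFJ\n",
    }
    with tempfile.TemporaryDirectory() as folder:
        paths = []
        for name, text in samples.items():
            path = Path(folder) / name
            path.write_text(text, encoding="utf-8")
            paths.append(path)
        return match_three_files(*paths)


for row in demo():
    print("\t".join(row))
```

With the sample data, the rows for fji01dde and cmr03lsp are kept and the dlp02sle row is dropped, because VMCFIJGM does not appear in file3.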

esProc SPL can be used to implement the algorithm above; the code is simple and easy to follow:

	A
1	=file("D:/file1.txt").import()
2	=file("D:/file2.txt").import()
3	=file("D:/file3.txt").import()
4	=A1.switch(#1,A2:#1)
5	=A3.switch(#1,A1:#2)
6	=A5.new(#1.#1.#1, #1.#2, #1.#1.#2, #1.#1.#3)

A4-A5: establish the associations between the tables.

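A6: fetch data along the association path. Here #1.#1.#1 starts from field 1 of file3, which points to a file1 record; field 1 of that record in turn points to a file2 record, whose field 1 is the value returned.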
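The idea of replacing a key with a reference to the matching record, and then reading values by following those references, can be imitated in Python. The sketch below is an analogy only and does not reproduce SPL itself: the helper `switch` is a hypothetical function written for this example, rows are plain lists, and column positions are zero-based, so `row[0][0][0]` plays the role of #1.#1.#1. Storing `None` for an unmatched key and skipping such rows is a choice made in this sketch, not a statement about how SPL behaves.

```python
def switch(rows, column, targets, target_column):
    """Replace the key in `column` of each row with a reference to the target
    row whose `target_column` holds the same key (None when nothing matches)."""
    index = {target[target_column]: target for target in targets}
    for row in rows:
        row[column] = index.get(row[column])
    return rows


# Sample data from the question, one list per record.
file1 = [["fji01dde", "AIDJFMGKG"], ["dlp02sle", "VMCFIJGM"], ["cmr03lsp", "CKEIFJ"]]
file2 = [["fji01dde", "25", "30"], ["dlp02sle", "40", "50"], ["cmr03lsp", "60", "70"]]
file3 = [["AIDJFMGKG"], ["CKEIFJ"]]

switch(file1, 0, file2, 0)  # analogous to A4: file1 column 1 now refers to file2 rows
switch(file3, 0, file1, 1)  # analogous to A5: file3 column 1 now refers to file1 rows

result = []
for row in file3:
    file1_record = row[0]  # None if file3's key is absent from file1
    file2_record = file1_record[0] if file1_record else None
    if file1_record is None or file2_record is None:
        continue  # this sketch drops rows that do not match in all three files
    # Analogous to A6: follow the references to collect the output columns.
    result.append([file2_record[0], file1_record[1], file2_record[1], file2_record[2]])

for record in result:
    print("\t".join(record))
```

Because the loop is driven by file3, only sequences listed there can appear in the result, which mirrors why A6 starts from A5.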
