Merging Field Values Across Records After Grouping

【Question】

For example, I have data in a CSV file with these column names: "people", "committers", and "repositoryCommitters".

The "people" column holds ids from 1 to 5923, and I want to match ids that share a repository in the "repositoryCommitters" column, for example:

people | repositoryCommitters
1 | x
2 | x
3 | y

People ids 1 and 2 share the repository "x". How do I get these ids and print them in the output like this:

*Edges
1 2

This means 1 and 2 are linked because they have a common repository.

For now, the code I have is:

package network;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Scanner;

public class Read {
 static String line;
 static BufferedReader br1 = null, br2 =null;
 static ArrayList<String> pList = new ArrayList<String>();
 static ArrayList<String> rList = new ArrayList<String>();
 static File fileName = new File("networkBuilder.txt");

 public static void main(String[] args) throws IOException
 { String fileContent = "*Vertices " ;

System.out.println("Enter your current directory: ");
 Scanner scanner = new Scanner(System.in);
 String directory = scanner.nextLine();

try {
 br1 = new BufferedReader(new FileReader(directory + "//people.csv"));
 br2 = new BufferedReader(new FileReader(directory + "//repo.csv"));

} catch(FileNotFoundException e)
 {
System.out.println(e.getMessage() + " \n file not found re-run and try again");
 System.exit(0);
 }
 int count = 0;
 try {
 br1.readLine(); // skip the header line
 while((line = br1.readLine()) != null)
 {
 pList.add(line); // add to array list
 count++ ;

 }

} catch (IOException error) {
 System.out.println(error.getMessage() + "Error reading file");
 }
 /** Vertices **/
System.out.println("\n"); // new line
 System.out.println(fileContent + count); //print out vertices
 //print out each item in the ArrayList
 int size = pList.size();
 for(int i=0; i < size; i++){
 String[] data=(pList.get(i)).split(",");
 System.out.println(data[1]);

} 
// Save the console output in a text file
 try{
 PrintStream myconsole = new PrintStream(new File(directory + "//network.txt")); // separator added so the file lands inside the directory
 System.setOut(myconsole);
 //print out each item in the ArrayList
int sz = pList.size(); System.out.println(fileContent + count); //print out vertices
 for(int i=0; i < sz; i++){
 String[] data=(pList.get(i)).split(",");
 System.out.println(data[1]);
 }
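 } catch(Exception er){
 System.err.println(er.getMessage() + " Error writing file"); // report instead of silently ignoring
 }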

 /* try{
 FileWriter fw = new FileWriter(fileName);
 Writer output = new BufferedWriter(fw);
 int size = pList.size();
 for(int j=0; j<size; j++){

 output.write(fileContent + count);
 ((BufferedWriter) output).newLine();
 output.write(pList.get(j) + "\n");
 ((BufferedWriter) output).newLine();
 }
output.close(); 

 } */

 /** Edges**/
 fileContent = "\n*Edges";
 System.out.println(fileContent);
 // peopleCSV();
 // repoCSV();

 } // end of main
}

And the output is:

Enter your current directory:

C:\Users\StudentDoubts\Documents

*Vertices 5923
1
2
3 . . .

【Answer】

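The task is to group by the second column and, within each group, merge the values of the first column into a single row. Hard-coding this algorithm is rather involved; esProc handles it more conveniently, and the SPL code is short and easy to follow: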

  | A
1 | =file("people.txt").import@t(;,"|")
2 | =A1.group(repositoryCommitters).new(~.(people).concat(" "):*Edges)
3 | =file("D:/result.txt").export@t(A2)
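
For comparison, the group-and-concatenate step can also be sketched in plain Java. This is only a minimal illustration, not the answer's method: the class name `GroupByRepo` and the helpers `groupByRepo` and `joinGroups` are hypothetical, and the input is assumed to be the two-column, pipe-separated rows shown in the question, with the header already removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByRepo {

    // Groups person ids by repository, keeping repositories in first-seen order.
    // Each row is assumed to look like "1 | x" (people | repositoryCommitters).
    static Map<String, List<String>> groupByRepo(List<String> rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            String[] parts = row.split("\\|");
            if (parts.length < 2) {
                continue; // skip blank or malformed rows
            }
            String person = parts[0].trim();
            String repo = parts[1].trim();
            groups.computeIfAbsent(repo, k -> new ArrayList<>()).add(person);
        }
        return groups;
    }

    // Joins the ids of each group with a space, one output line per repository.
    static List<String> joinGroups(Map<String, List<String>> groups) {
        List<String> lines = new ArrayList<>();
        for (List<String> people : groups.values()) {
            lines.add(String.join(" ", people));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample rows taken from the question.
        List<String> rows = Arrays.asList("1 | x", "2 | x", "3 | y");
        System.out.println("*Edges");
        for (String line : joinGroups(groupByRepo(rows))) {
            System.out.println(line);
        }
    }
}
```

A `LinkedHashMap` is used here so that the output order follows the order in which repositories first appear in the input; a `TreeMap` would be the choice if sorted keys were preferred.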

To include repositoryCommitters on each output line, just change A2 to:

=A1.group(repositoryCommitters).new(~.(people).string(" "):*Edges,repositoryCommitters:repositoryCommitters)

esProc provides a JDBC interface, so it can be used like a database; see "How to call an SPL script from Java".
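
The grouped output above puts every id of a repository on one line. If the *Edges section is instead meant to hold one line per pair of linked people (an assumption about the intended network format, suggested by the question's "1 2" example), each group can be expanded into its pairs. The sketch below is illustrative only; `PairwiseEdges` and `pairs` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairwiseEdges {

    // Expands one group of ids into every unordered pair,
    // so a group of three ids yields three edge lines.
    static List<String> pairs(List<String> people) {
        List<String> edges = new ArrayList<>();
        for (int i = 0; i < people.size(); i++) {
            for (int j = i + 1; j < people.size(); j++) {
                edges.add(people.get(i) + " " + people.get(j));
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Hypothetical group: three people committing to the same repository.
        for (String edge : pairs(Arrays.asList("1", "2", "3"))) {
            System.out.println(edge);
        }
    }
}
```

The number of pairs grows quadratically with group size, so a repository with many committers produces a long edge list.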

 
