We need to compare two CSV files. Say the first file has a few rows, and the second file has the same number of rows or more. Most rows are likely to be identical in both files. I'm looking for the best approach to diff the two files and read only the rows in the second file that differ from the first. The application processing the files is written in Java.
What are the best approaches for this?
Note: it would be great if we could tell whether a row was updated, inserted, or deleted in the second file.
Requirements:
There won't be any duplicate records
File 1 and file 2 could have the same number of records, with a few rows having updated values in file 2 (records updated)
File 2 could have a few rows removed (treated as records deleted)
File 2 could have a few new rows added (treated as records inserted)
One of the columns can be treated as the primary key of the record; it won't change between the two files.
Solution
One way to do this is with Java's Set interface: read each line as a string, add it to a set, then call removeAll() on the first set with the second set as the argument. That leaves only the rows of the first file that do not appear in the second. This, of course, assumes that there are no duplicate rows in the files.
// using Apache Commons IO's FileUtils to read in the files
Set<String> f1 = new HashSet<>(FileUtils.readLines(new File("file1.csv"), StandardCharsets.UTF_8));
Set<String> f2 = new HashSet<>(FileUtils.readLines(new File("file2.csv"), StandardCharsets.UTF_8));
f1.removeAll(f2); // f1 now contains only the lines which are not in f2
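Note that removeAll() only gives one direction of the diff. Here is a minimal sketch that runs it both ways using only the JDK; the class name, the helper onlyIn, and the sample rows are made up for illustration and stand in for the lines of the two files.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LineSetDiff {
    // Lines present in 'from' but absent from 'other', kept in their original order.
    static Set<String> onlyIn(List<String> from, List<String> other) {
        Set<String> result = new LinkedHashSet<>(from);
        result.removeAll(new HashSet<>(other));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for file1.csv and file2.csv
        List<String> file1 = Arrays.asList("1,alice,10", "2,bob,20", "3,carol,30");
        List<String> file2 = Arrays.asList("1,alice,10", "2,bob,25", "4,dave,40");

        System.out.println("only in file1: " + onlyIn(file1, file2));
        System.out.println("only in file2: " + onlyIn(file2, file1));
    }
}
```

With this data the updated row (key 2) shows up in both results, once with its old value and once with its new one, so whole-line comparison alone cannot tell an update from a delete plus an insert.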
Update
Okay, so you have a PK field. I'll assume you know how to extract it from each row; use openCSV, a regex, or whatever you prefer. Use a HashMap instead of the HashSet above, with the PK as the key and the row as the value.
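Map<String, String> f1 = new HashMap<>();
Map<String, String> f2 = new HashMap<>();
// read f1, f2; use the PK field as the key and the whole row as the value
List<String> deleted = new ArrayList<>();
List<String> updated = new ArrayList<>();
for (Map.Entry<String, String> entry : f1.entrySet()) {
    if (!f2.containsKey(entry.getKey())) {
        // key is missing from file 2: the row was deleted
        deleted.add(entry.getValue());
    } else if (!f2.get(entry.getKey()).equals(entry.getValue())) {
        // key is in both files but the rows differ: the row was updated
        // (this stores file 1's version; use f2.get(entry.getKey()) for the new one)
        updated.add(entry.getValue());
    }
}
// drop every key that also exists in file 1
for (String key : f1.keySet()) {
    f2.remove(key);
}
// f2 now contains only "new" rows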
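To tie the pieces together, here is a self-contained sketch of the same idea. It rests on two assumptions that are not in the question: the primary key is the first column, and no field contains a quoted comma (use a real CSV parser otherwise). The class and method names are made up for illustration, and unlike the snippet above it records the file 2 version of each updated row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvKeyDiff {
    // Index rows by primary key. Assumption: the key is the first column
    // and fields contain no quoted commas.
    static Map<String, String> indexByKey(List<String> rows) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String row : rows) {
            byKey.put(row.split(",", 2)[0], row);
        }
        return byKey;
    }

    // Classify rows as "inserted", "updated" or "deleted" in newRows relative to oldRows.
    static Map<String, List<String>> diff(List<String> oldRows, List<String> newRows) {
        Map<String, String> oldByKey = indexByKey(oldRows);
        Map<String, String> newByKey = indexByKey(newRows);

        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("inserted", new ArrayList<>());
        result.put("updated", new ArrayList<>());
        result.put("deleted", new ArrayList<>());

        for (Map.Entry<String, String> e : newByKey.entrySet()) {
            String oldRow = oldByKey.get(e.getKey());
            if (oldRow == null) {
                result.get("inserted").add(e.getValue());  // key only in file 2
            } else if (!oldRow.equals(e.getValue())) {
                result.get("updated").add(e.getValue());   // same key, different row
            }
        }
        for (Map.Entry<String, String> e : oldByKey.entrySet()) {
            if (!newByKey.containsKey(e.getKey())) {
                result.get("deleted").add(e.getValue());   // key only in file 1
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the two files
        List<String> file1 = Arrays.asList("1,alice,10", "2,bob,20", "3,carol,30");
        List<String> file2 = Arrays.asList("1,alice,10", "2,bob,25", "4,dave,40");
        System.out.println(diff(file1, file2));
    }
}
```

For the sample data this reports row 4 as inserted, row 2 as updated, and row 3 as deleted. Both files are held in memory, so for very large files a sorted-merge pass over the two files may be a better fit.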