I have a requirement to delete data from HBase. I want to delete the latest version of each cell for a given row key.
My approach was to get the column names and the latest timestamp of each column for the given row key, then perform the delete operation iteratively for each column and its timestamp.
But I am not able to get the column names, so I cannot do it.
Please share any thoughts or working code you have.
Solution
Deletes work by creating tombstone markers. For example, let's suppose
we want to delete a row. For this you can specify a version, or else
by default the currentTimeMillis is used. What this means is “delete
all cells where the version is less than or equal to this version”.
HBase never modifies data in place, so for example a delete will not
immediately delete (or mark as deleted) the entries in the storage
file that correspond to the delete condition. Rather, a so-called
tombstone is written, which will mask the deleted values[17]. If the
version you specified when deleting a row is larger than the version
of any value in the row, then you can consider the complete row to be
deleted.
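To make the quoted semantics concrete, here is a toy in-memory model of a single cell with several versions and one tombstone. It is only a sketch of the idea "a delete writes a marker that masks every version less than or equal to it"; the class `TombstoneModel` and its methods are made up for illustration and are not part of the HBase API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Toy model of one versioned cell masked by a tombstone; not HBase code. */
final class TombstoneModel {
    // version (timestamp) -> value, sorted so the newest version is the last entry
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // highest version covered by a delete marker; nothing is masked initially
    private long tombstone = Long.MIN_VALUE;

    void put(long version, String value) {
        versions.put(version, value);
    }

    /** Records a marker meaning "delete every version <= this one"; stored data is untouched. */
    void delete(long version) {
        tombstone = Math.max(tombstone, version);
    }

    /** Returns the newest value that the tombstone does not mask, if there is one. */
    Optional<String> read() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        if (newest == null || newest.getKey() <= tombstone) {
            return Optional.empty(); // every stored version is at or below the marker
        }
        return Optional.of(newest.getValue());
    }

    /** Number of versions still physically stored, masked or not. */
    int storedVersions() {
        return versions.size();
    }

    public static void main(String[] args) {
        TombstoneModel cell = new TombstoneModel();
        cell.put(10L, "a");
        cell.put(20L, "b");
        cell.delete(15L);                          // masks version 10 only
        System.out.println(cell.read());           // version 20 is still visible
        cell.delete(25L);                          // marker now covers every version
        System.out.println(cell.read());           // nothing visible any more
        System.out.println(cell.storedVersions()); // both versions are still stored
    }
}
```

In this model, as in the quoted description, the cell counts as deleted once the marker's version is at least as large as the newest stored version, even though nothing has been physically removed.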
So I don't see a problem with following the standard Delete procedure.
However, if you want to delete only the latest versions of your cells, you could use the setTimestamp method of the Scan class to select the cells written at that timestamp. So, what you could do is:
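// imports: java.io.IOException, java.util.*, org.apache.hadoop.hbase.Cell,
// org.apache.hadoop.hbase.CellUtil, org.apache.hadoop.hbase.client.*
List<Delete> deletes = new ArrayList<>();
Scan scan = new Scan();
scan.setTimestamp(latestVersionTimeStamp); // long: return only cells written at exactly this timestamp
// set your filters here
try (ResultScanner rscanner = table.getScanner(scan)) {
    for (Result rs : rscanner) {
        Delete d = new Delete(rs.getRow());
        for (Cell cell : rs.rawCells()) {
            // delete only the version that was returned, not the whole row
            d.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell), cell.getTimestamp());
        }
        deletes.add(d);
    }
    table.delete(deletes);
} catch (IOException e) {
    e.printStackTrace();
}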
However, if the timestamp isn't the same across cells, this will not match all of them. Supplying one timestamp per row, as below, probably will:
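// imports: java.io.IOException, java.util.*, org.apache.hadoop.hbase.client.*
List<Delete> deletes = new ArrayList<>();
List<Long> timestamps = new ArrayList<>(); // your list of timestamps, one per scanned row, in scan order
Scan scan = new Scan();
// set your filters here
try (ResultScanner rscanner = table.getScanner(scan)) {
    // Java has no built-in zip, so walk the timestamps alongside the scanner
    Iterator<Long> ts = timestamps.iterator();
    for (Result rs : rscanner) {
        if (!ts.hasNext()) {
            break; // ran out of timestamps
        }
        Delete d = new Delete(rs.getRow());
        d.setTimestamp(ts.next()); // a row-level Delete masks versions at or before this timestamp
        deletes.add(d);
    }
    table.delete(deletes);
} catch (IOException e) {
    e.printStackTrace();
}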
I can't guarantee this will work, however. The official guides are fairly vague and I may have misinterpreted something. If I did, let me know and I will delete this answer.
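The step the question got stuck on, finding each column and its latest timestamp, can be sketched in plain Java as follows. `CellVersion` is a hypothetical stand-in for a stored cell version, not an HBase type. As far as I know, with the HBase Java client the input would come from `Result.rawCells()` after a `Get` on the row key, and each entry returned here would map to one `Delete.addColumn(family, qualifier, timestamp)` call, which deletes exactly that version. Treat this as a minimal sketch under those assumptions rather than a tested HBase program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LatestVersions {
    /** Hypothetical stand-in for one stored version of a cell; not an HBase type. */
    static final class CellVersion {
        final String family;
        final String qualifier;
        final long timestamp;

        CellVersion(String family, String qualifier, long timestamp) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Keeps, for every (family, qualifier) column, only the entry with the highest timestamp. */
    static List<CellVersion> latestPerColumn(List<CellVersion> cells) {
        // LinkedHashMap keeps columns in the order they were first seen
        Map<List<String>, CellVersion> newest = new LinkedHashMap<>();
        for (CellVersion c : cells) {
            List<String> column = Arrays.asList(c.family, c.qualifier);
            newest.merge(column, c, (a, b) -> a.timestamp >= b.timestamp ? a : b);
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        List<CellVersion> cells = Arrays.asList(
                new CellVersion("cf", "a", 10L),
                new CellVersion("cf", "a", 30L),
                new CellVersion("cf", "b", 20L));
        for (CellVersion c : latestPerColumn(cells)) {
            // each line is one (column, timestamp) pair that a per-version delete would target
            System.out.println(c.family + ":" + c.qualifier + " @ " + c.timestamp);
        }
    }
}
```

A default `Get` or `Scan` already returns only the newest version of each column, so the grouping matters mainly when more than one version per column was requested.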