This is the problem that comes right after the previous one, and it is a fairly interesting one.
Question 2
The data set is here. The format is:
[# of nodes] [# of bits for each node's label]
[first bit of node 1] ... [last bit of node 1]
[first bit of node 2] ... [last bit of node 2]
...
For example, the third line of the file "0 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1" denotes the 24 bits associated with node #2.
The distance between two nodes u and v in this problem is defined as the Hamming distance (the number of differing bits) between the two nodes' labels. For example, the Hamming distance between the 24-bit label of node #2 above and the label "0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 0 1 0 0 1 0 1" is 3 (since they differ in the 3rd, 7th, and 21st bits).
The question is: what is the largest value of k such that there is a k-clustering with spacing at least 3? That is, how many clusters are needed to ensure that no pair of nodes with all but 2 bits in common get split into different clusters?
NOTE: The graph implicitly defined by the data file is so big that you probably can't write it out explicitly, let alone sort the edges by cost. So you will have to be a little creative to complete this part of the question. For example, is there some way you can identify the smallest distances without explicitly looking at every pair of nodes?
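To make the distance definition concrete, here is a small sketch that packs each label into an int and computes the Hamming distance as the number of 1-bits in the XOR of two labels. Packing into an int is an assumption of this sketch (the code later in the post uses BitSet), and `HammingDemo`, `parseLabel` and `distance` are hypothetical names. The bit order does not matter for the distance as long as every label is packed the same way.

```java
import java.util.StringTokenizer;

// Hypothetical helper: packs a space-separated bit label into an int
// (first token = most significant bit) and measures Hamming distance.
class HammingDemo {
    // Parse "0 1 1 ..." into an int; assumes the label fits in an int (24 bits here).
    static int parseLabel(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        int label = 0;
        while (tokens.hasMoreTokens()) {
            label = (label << 1) | Integer.parseInt(tokens.nextToken());
        }
        return label;
    }

    // The differing bits are exactly the 1-bits of a XOR b.
    static int distance(int a, int b) {
        return Integer.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        int node2 = parseLabel("0 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1");
        int other = parseLabel("0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 0 1 0 0 1 0 1");
        System.out.println(distance(node2, other)); // prints 3
    }
}
```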
First, each node's label is a string of bits, so representing every node with a BitSet is a natural choice.
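Using a BitSet as a HashMap key works because `BitSet.equals` and `BitSet.hashCode` depend only on which bits are set, so two nodes with the same label map to the same entry. The sketch below (hypothetical names `BitSetKeyDemo` and `parse`, made-up three-line input) shows duplicates collapsing onto the index of the first node, mirroring what the post's `build()` does. One caveat: a BitSet must not be mutated after it is used as a key, which is why the post clones before flipping bits.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

class BitSetKeyDemo {
    // Bit i of the BitSet is set iff the i-th token on the line is "1".
    static BitSet parse(String line) {
        BitSet bits = new BitSet();
        StringTokenizer tokens = new StringTokenizer(line);
        for (int i = 0; tokens.hasMoreTokens(); i++) {
            if (tokens.nextToken().equals("1")) {
                bits.set(i);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<BitSet, Integer> firstIndex = new HashMap<>();
        String[] lines = {"0 1 1", "1 0 0", "0 1 1"};
        for (int i = 0; i < lines.length; i++) {
            // putIfAbsent keeps the index of the first node with this label.
            firstIndex.putIfAbsent(parse(lines[i]), i);
        }
        System.out.println(firstIndex.size());              // prints 2
        System.out.println(firstIndex.get(parse("0 1 1"))); // prints 0
    }
}
```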
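The question can be restated as: merge every pair of nodes whose Hamming distance is 0, 1 or 2, and count how many clusters remain. This maps naturally onto Union Find: union all pairs of nodes at distance 0, 1 or 2, and the number of remaining components is the answer.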
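The post's `UnionFind` class is not shown (it refers back to the previous article). The code below only needs a constructor taking the number of nodes, `union(p, q)` and `count()`. Here is a hypothetical stand-in with that interface, using weighted quick-union with path halving; the author's version may differ in details.

```java
// Hypothetical stand-in for the UnionFind class from the previous post.
class UnionFind {
    private final int[] parent;
    private final int[] size;
    private int count; // number of components

    UnionFind(int n) {
        parent = new int[n];
        size = new int[n];
        count = n;
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    int find(int p) {
        while (p != parent[p]) {
            parent[p] = parent[parent[p]]; // path halving
            p = parent[p];
        }
        return p;
    }

    void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) {
            return; // already in the same cluster
        }
        // Attach the smaller tree under the larger one.
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
        count--;
    }

    int count() {
        return count;
    }
}

class UnionFindDemo {
    public static void main(String[] args) {
        UnionFind uf = new UnionFind(5);
        uf.union(0, 1);
        uf.union(1, 2);
        uf.union(0, 2); // no effect: 0 and 2 are already connected
        System.out.println(uf.count()); // prints 3
    }
}
```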
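The remaining problem is how to find the pairs at distance 0, 1 and 2 quickly. Brute force over every pair is clearly impractical: the file contains 200000 nodes with 24 bits each, which means roughly 2 × 10^10 pairs. On reflection, though, for each node we only need to look for nodes whose labels are identical, differ in exactly one bit, or differ in exactly two bits. There are not many such candidate labels: 1 + 24 + 276 = 301 (where 276 = C(24, 2)). So we can bring out the "cheat code", a hash map, and "brute-force" over just these 301 candidates per node, about 6 × 10^7 lookups in total, which finishes quickly.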
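The 1 + 24 + 276 count can be checked by enumerating the candidate XOR masks directly: XOR-ing a label with every mask that has at most two set bits produces every label within Hamming distance 2. This sketch assumes labels packed into an int; `DistanceMasks` and `masksUpTo2` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class DistanceMasks {
    // All masks over the low `bits` positions with at most two set bits.
    static List<Integer> masksUpTo2(int bits) {
        List<Integer> masks = new ArrayList<>();
        masks.add(0); // distance 0
        for (int i = 0; i < bits; i++) {
            masks.add(1 << i); // distance 1
        }
        for (int i = 0; i < bits; i++) {
            for (int j = i + 1; j < bits; j++) { // each unordered pair once
                masks.add((1 << i) | (1 << j));  // distance 2
            }
        }
        return masks;
    }

    public static void main(String[] args) {
        System.out.println(masksUpTo2(24).size()); // prints 301
    }
}
```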
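Build a hash map with each BitSet as the key and the corresponding index (the node's number in the union-find structure) as the value, and the answer comes out quickly. The UnionFind class used here is the same as in the previous post.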
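The whole hash-map approach fits in a short, self-contained sketch. It deliberately swaps in one thing the post does not do: each label is packed into an int, so the map becomes `HashMap<Integer, Integer>` and flipping bits is a single XOR with a mask instead of cloning a BitSet. `HashClustering` and `countClusters` are hypothetical names, and the six-node input in `main` is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of the post's approach with int-packed labels.
class HashClustering {
    private static int find(int[] parent, int p) {
        while (parent[p] != p) {
            parent[p] = parent[parent[p]]; // path halving
            p = parent[p];
        }
        return p;
    }

    // Merges the components of p and q; returns true if they were separate.
    private static boolean union(int[] parent, int p, int q) {
        int a = find(parent, p);
        int b = find(parent, q);
        if (a == b) {
            return false;
        }
        parent[a] = b;
        return true;
    }

    // labels[k] is node k's label using the low `bits` bits.
    // Returns the number of clusters once every pair at distance <= 2 is merged.
    static int countClusters(int[] labels, int bits) {
        int n = labels.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        int clusters = n;

        // label -> index of the first node with that label; duplicates are distance 0.
        Map<Integer, Integer> indexOf = new HashMap<>();
        for (int k = 0; k < n; k++) {
            Integer first = indexOf.putIfAbsent(labels[k], k);
            if (first != null && union(parent, k, first)) {
                clusters--;
            }
        }

        // For each distinct label, probe the bits + C(bits, 2) labels at distance 1 or 2.
        for (Map.Entry<Integer, Integer> entry : indexOf.entrySet()) {
            int label = entry.getKey();
            for (int i = 0; i < bits; i++) {
                for (int j = i; j < bits; j++) {
                    // j == i flips one bit (distance 1); j > i flips two (distance 2)
                    int mask = (1 << i) | (1 << j);
                    Integer other = indexOf.get(label ^ mask);
                    if (other != null && union(parent, entry.getValue(), other)) {
                        clusters--;
                    }
                }
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        int[] labels = {0b000000, 0b000000, 0b000011, 0b111000, 0b111100, 0b010101};
        System.out.println(countClusters(labels, 6)); // prints 3
    }
}
```

In the example, nodes 0, 1 and 2 merge (a duplicate plus a distance-2 neighbour), nodes 3 and 4 merge (distance 1), and node 5 is at distance 3 or more from everything, leaving 3 clusters. Boxed Integer keys cost some memory, but each probe avoids allocating a cloned BitSet.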
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Set;
import java.util.StringTokenizer;
public class Clustering2 {
    // label -> index (in the union-find structure) of the first node carrying that label
    private HashMap<BitSet, Integer> dataSet;
    private UnionFind uf;
    private int numberOfBits;
    private int numberOfNodes;
    private String filename = "/Users/Zhusong/Documents/Study/AlgorithmsDesignAndAnalysis/Assignments/Ass2/Ass2Prob2/clustering_big.txt";

    public static void main(String[] args) {
        Clustering2 clustering2 = new Clustering2();
        clustering2.run();
    }

    private void run() {
        dataSet = new HashMap<BitSet, Integer>();
        build();   // distance 0: merge duplicate labels while reading the file
        calDis1(); // distance 1
        calDis2(); // distance 2
        // the clusters left over are the largest k with spacing at least 3
        System.out.println(uf.count());
    }
    /**
     * Find nodes that have a distance of 2: flip every unordered pair of bits
     * (i, j) with j > i, i.e. C(numberOfBits, 2) candidates per label.
     */
    private void calDis2() {
        for (BitSet bitSet : dataSet.keySet()) {
            int index = dataSet.get(bitSet);
            for (int i = 0; i < numberOfBits; i++) {
                // start at i + 1 so each pair is tried once and no bit is flipped twice
                for (int j = i + 1; j < numberOfBits; j++) {
                    // flip a copy: mutating a BitSet used as a HashMap key would corrupt the map
                    BitSet temp = (BitSet) bitSet.clone();
                    temp.flip(i);
                    temp.flip(j);
                    Integer other = dataSet.get(temp);
                    if (other != null) {
                        uf.union(index, other);
                    }
                }
            }
        }
    }
    /**
     * Find nodes that have a distance of 1: flip each of the numberOfBits bits in turn.
     */
    private void calDis1() {
        for (BitSet bitSet : dataSet.keySet()) {
            int index = dataSet.get(bitSet);
            for (int i = 0; i < numberOfBits; i++) {
                // flip a copy so the key stored in the map is left untouched
                BitSet temp = (BitSet) bitSet.clone();
                temp.flip(i);
                Integer other = dataSet.get(temp);
                if (other != null) {
                    uf.union(index, other);
                }
            }
        }
    }
    /**
     * 1. read in the text file
     * 2. create a union find structure
     * 3. build the hash map and union the nodes that have 0 distance (duplicate labels)
     */
    private void build() {
        File file = new File(filename);
        // try-with-resources closes the reader even if parsing throws
        try (BufferedReader rd = new BufferedReader(new FileReader(file))) {
            String line = rd.readLine();
            StringTokenizer tokenizer = new StringTokenizer(line);
            numberOfNodes = Integer.parseInt(tokenizer.nextToken());
            uf = new UnionFind(numberOfNodes);
            numberOfBits = Integer.parseInt(tokenizer.nextToken());
            int index = 0;
            while ((line = rd.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines instead of failing on nextToken()
                }
                tokenizer = new StringTokenizer(line);
                // create a bit set representing the current node: bit i is set iff token i is "1"
                BitSet bitSet = new BitSet(numberOfBits);
                for (int i = 0; i < numberOfBits; i++) {
                    if (tokenizer.nextToken().equals("1")) {
                        bitSet.set(i);
                    }
                }
                Integer first = dataSet.get(bitSet);
                if (first == null) {
                    // first node with this label: store it in the hash map
                    dataSet.put(bitSet, index);
                } else {
                    // duplicate label (distance 0): union with the node already in the map
                    uf.union(index, first);
                }
                index++;
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
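The all-pairs brute force ruled out above for 200000 nodes is still handy as a reference on small inputs, for example to cross-check the hash-map version on a few hundred random labels. A minimal sketch, again assuming int-packed labels; `BruteForceClustering` is a hypothetical name and the input in `main` is made up.

```java
// Hypothetical O(n^2) reference: compare every pair directly. Small inputs only.
class BruteForceClustering {
    private static int find(int[] parent, int p) {
        while (parent[p] != p) {
            parent[p] = parent[parent[p]]; // path halving
            p = parent[p];
        }
        return p;
    }

    static int countClusters(int[] labels) {
        int n = labels.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        int clusters = n;
        for (int u = 0; u < n; u++) {
            for (int v = u + 1; v < n; v++) {
                // spacing >= 3 means every pair at distance <= 2 shares a cluster
                if (Integer.bitCount(labels[u] ^ labels[v]) <= 2) {
                    int a = find(parent, u);
                    int b = find(parent, v);
                    if (a != b) {
                        parent[a] = b;
                        clusters--;
                    }
                }
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        int[] labels = {0b000000, 0b000000, 0b000011, 0b111000, 0b111100, 0b010101};
        System.out.println(countClusters(labels)); // prints 3
    }
}
```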