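Problem:
Many words are similar to other words. For example, by changing the first letter, the word wine can become dine, fine, line, mine, and so on. By changing the third letter, wine can become wide, wife, wipe, or wire. By changing the fourth letter, wine can become wind, wing, wink, or wins.
Suppose we have a dictionary of roughly 89,000 distinct words of varying lengths. In practice, the most changeable words are the three-, four-, and five-letter ones, while longer words take more time to check.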
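Analysis 1:
The most straightforward strategy is to use a Map in which the keys are words and each key's value is the list of words that can be obtained from the key by a one-letter substitution. (Running time: 91s)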
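//Imports needed by the code in this post
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

//Given a map whose keys are words and whose values are lists of words that
//differ from the key in exactly one letter, print every word that has at
//least minWords such neighbors
public static void printHighChangeables(Map<String, List<String>> adjWords, int minWords) {
    for (Map.Entry<String, List<String>> entry : adjWords.entrySet()) {
        List<String> words = entry.getValue();
        if (words.size() >= minWords) {
            //print rather than println keeps "word (count): ..." on one line
            System.out.print(entry.getKey() + " (");
            System.out.print(words.size() + "):");
            for (String w : words) {
                System.out.print(" " + w);
            }
            System.out.println();
        }
    }
}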
//Return true if word1 and word2 are the same length
//and differ in only one character
private static boolean oneCharOff(String word1, String word2) {
    //Words of different lengths can never be one substitution apart
    if (word1.length() != word2.length()) {
        return false;
    }
    int diffs = 0;
    for (int i = 0; i < word1.length(); i++) {
        if (word1.charAt(i) != word2.charAt(i)) {
            //Stop as soon as a second difference is found
            if (++diffs > 1) {
                return false;
            }
        }
    }
    return diffs == 1;
}
//Add value to the list stored under key, creating the list on first use
private static <KeyType> void update(Map<KeyType, List<String>> m, KeyType key, String value) {
    List<String> list = m.get(key);
    if (list == null) {
        list = new ArrayList<String>();
        m.put(key, list);
    }
    list.add(value);
}
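On Java 8 or later, the same get-or-create pattern can be written with Map.computeIfAbsent. The sketch below is only an alternative, not the version used for the timings in this post; the class name UpdateSketch and the method name updateCompact are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UpdateSketch {
    // Hypothetical compact variant of update: creates the list on first use, then appends.
    static <KeyType> void updateCompact(Map<KeyType, List<String>> m, KeyType key, String value) {
        m.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        Map<String, List<String>> adjWords = new TreeMap<>();
        updateCompact(adjWords, "wine", "dine");
        updateCompact(adjWords, "wine", "wide");
        updateCompact(adjWords, "dine", "wine");
        System.out.println(adjWords); // prints {dine=[wine], wine=[dine, wide]}
    }
}
```

The behavior matches update: one map lookup decides whether a new list is needed, and the value is appended either way.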
//Computes a map in which the keys are words and values are Lists of words
//that differ in only one character from the corresponding key
//Uses a quadratic algorithm (with appropriate Map)
public static Map<String, List<String>> computeAdjacentWords(List<String> theWords) {
    Map<String, List<String>> adjWords = new TreeMap<String, List<String>>();
    //Copy the list into an array for fast indexed access
    String[] words = new String[theWords.size()];
    theWords.toArray(words);
    for (int i = 0; i < words.length; i++) {
        for (int j = i + 1; j < words.length; j++) {
            //Record the pair in both directions if the words differ in exactly one letter
            if (oneCharOff(words[i], words[j])) {
                update(adjWords, words[i], words[j]);
                update(adjWords, words[j], words[i]);
            }
        }
    }
    return adjWords;
}
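The cost of this version follows from the number of pairs the double loop visits. As a rough sketch (the class and method names are hypothetical, and 89,000 is the approximate dictionary size given in the problem statement), the pair count for n words is n(n-1)/2:

```java
class PairCount {
    // Number of (i, j) pairs with i < j, i.e. how many times oneCharOff is called.
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(pairs(89_000)); // prints 3960455500
    }
}
```

That is almost four billion calls. Every pair of words with different lengths is still visited, even though oneCharOff rejects it on its first test.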
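Analysis 2:
Idea: group the words by length, so that only words of the same length are compared with each other. Compared with Analysis 1, this cuts the running time roughly in half (51s).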
//Solution 2: group the words by length
public static Map<String, List<String>> computeAdjacentWords2(List<String> theWords) {
    Map<String, List<String>> adjWords = new TreeMap<String, List<String>>();
    Map<Integer, List<String>> wordsByLength = new TreeMap<Integer, List<String>>();
    //Group the words by length
    for (String s : theWords) {
        update(wordsByLength, s.length(), s);
    }
    //Work on each group separately
    for (List<String> groupsWords : wordsByLength.values()) {
        String[] words = new String[groupsWords.size()];
        groupsWords.toArray(words);
        for (int i = 0; i < words.length; i++) {
            for (int j = i + 1; j < words.length; j++) {
                //Same length is guaranteed here, but the one-letter check is still required
                if (oneCharOff(words[i], words[j])) {
                    update(adjWords, words[i], words[j]);
                    update(adjWords, words[j], words[i]);
                }
            }
        }
    }
    return adjWords;
}
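To see why grouping helps, the sketch below counts how many pairs are compared with and without grouping on a tiny, made-up word list. The class GroupingSketch, its method names, and the sample words are all hypothetical, and computeIfAbsent stands in for update only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupingSketch {
    // Group words by length, as the wordsByLength map does.
    static Map<Integer, List<String>> groupByLength(List<String> words) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String w : words) {
            groups.computeIfAbsent(w.length(), k -> new ArrayList<>()).add(w);
        }
        return groups;
    }

    // Pairs compared when every word is checked against every other word.
    static long allPairs(List<String> words) {
        long n = words.size();
        return n * (n - 1) / 2;
    }

    // Pairs compared when only words of the same length are checked against each other.
    static long groupedPairs(List<String> words) {
        long total = 0;
        for (List<String> group : groupByLength(words).values()) {
            long n = group.size();
            total += n * (n - 1) / 2;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("wine", "dine", "fine", "wind", "cat", "bat", "to");
        System.out.println(groupByLength(words)); // prints {2=[to], 3=[cat, bat], 4=[wine, dine, fine, wind]}
        System.out.println(allPairs(words));      // prints 21
        System.out.println(groupedPairs(words));  // prints 7
    }
}
```

The saving on a real dictionary depends on how its words are distributed across lengths, so the ratio here is only illustrative.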
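Analysis 3:
This algorithm is somewhat more complex and uses additional maps. As in Analysis 2, the words are grouped by length and each group is processed separately. For each character position i in a group, every word is mapped to the representative obtained by deleting its character at position i; words that share a representative differ only at position i, so they are adjacent to one another. (Running time: 4s)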
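//Solution 3: group by length, then by the word with one position removed
//(method name assumed by analogy with computeAdjacentWords2)
public static Map<String, List<String>> computeAdjacentWords3(List<String> theWords) {
    Map<String, List<String>> adjWords = new TreeMap<String, List<String>>();
    Map<Integer, List<String>> wordsByLength = new TreeMap<Integer, List<String>>();
    //Group the words by length
    for (String s : theWords) {
        update(wordsByLength, s.length(), s);
    }
    //Work on each group separately
    for (Map.Entry<Integer, List<String>> entry : wordsByLength.entrySet()) {
        List<String> groupsWords = entry.getValue();
        //The key is the word length shared by the whole group
        int groupNum = entry.getKey();
        //Work on each character position in turn
        for (int i = 0; i < groupNum; i++) {
            //Remove the character at position i and group the words by what remains
            Map<String, List<String>> repToWord = new TreeMap<String, List<String>>();
            for (String str : groupsWords) {
                String rep = str.substring(0, i) + str.substring(i + 1);
                update(repToWord, rep, str);
            }
            //and then look for map values with more than one string
            for (List<String> wordClique : repToWord.values()) {
                if (wordClique.size() >= 2) {
                    for (String s1 : wordClique) {
                        for (String s2 : wordClique) {
                            //Do not pair a word with itself
                            if (!s1.equals(s2)) {
                                update(adjWords, s1, s2);
                            }
                        }
                    }
                }
            }
        }
    }
    return adjWords;
}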
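The representative step is easier to follow on a concrete group. The sketch below builds the repToWord map for one position at a time on a small, made-up list of four-letter words. The class RepresentativeSketch and the sample words are hypothetical, and computeIfAbsent stands in for update only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RepresentativeSketch {
    // Map each "word with position i deleted" to the words that produce it.
    static Map<String, List<String>> repToWord(List<String> sameLengthWords, int i) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String str : sameLengthWords) {
            String rep = str.substring(0, i) + str.substring(i + 1);
            result.computeIfAbsent(rep, k -> new ArrayList<>()).add(str);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("wine", "dine", "fine", "wind", "wing");
        System.out.println(repToWord(words, 0)); // prints {ind=[wind], ine=[wine, dine, fine], ing=[wing]}
        System.out.println(repToWord(words, 3)); // prints {din=[dine], fin=[fine], win=[wine, wind, wing]}
    }
}
```

Any list with two or more entries, such as [wine, dine, fine], is a clique of mutually adjacent words, so no call to oneCharOff is needed.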