First, the implementation of the GraphPoet constructor:
public GraphPoet(File corpus) throws IOException {
    // Collect the whole corpus, joining lines with a single space.
    StringBuilder fulltxt = new StringBuilder();
    // try-with-resources closes the reader even if readLine() throws.
    try (BufferedReader br = new BufferedReader(new FileReader(corpus))) {
        String line;
        while ((line = br.readLine()) != null) {
            fulltxt.append(line).append(' ');
        }
    }
    // Lowercase everything, then split on runs of whitespace only, so that
    // punctuation stays attached to its word and no empty "words" appear.
    String[] wordset = fulltxt.toString().toLowerCase().trim().split("\\s+");
    for (int i = 0; i < wordset.length - 1; i++) {
        String front = wordset[i];
        String next = wordset[i + 1];
        if (!graph.vertices().contains(front) || !graph.targets(front).containsKey(next)) {
            // front is not a vertex yet, or has no edge to next: add the edge with weight 1.
            graph.set(front, next, 1);
        } else {
            // The edge already exists: increment its weight.
            graph.set(front, next, graph.targets(front).get(next) + 1);
        }
    }
    checkRep();
}
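The input text is first collected into fulltxt and then processed with split. Splitting on whitespace alone is enough; punctuation must not be treated as a delimiter or stripped off. Punctuation attached to a word helps identify its context: in "Hi, nice to meet you.", stripping the comma would make Hi and nice count as adjacent context words, which is not what we want. Keeping the punctuation also means the generated poem preserves the original punctuation.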
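To make the effect of whitespace-only splitting concrete, here is a minimal sketch that counts adjacent word pairs the same way the constructor does. It is an illustration under assumptions, not the lab's code: the class AdjacencyCountSketch and the method countAdjacentPairs are hypothetical names, and a plain nested Map stands in for the Graph<String> used above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; a nested Map replaces the lab's Graph<String>.
class AdjacencyCountSketch {

    /** Counts how often each whitespace-separated word is directly followed by another. */
    static Map<String, Map<String, Integer>> countAdjacentPairs(String text) {
        Map<String, Map<String, Integer>> weights = new HashMap<>();
        String normalized = text.toLowerCase().trim();
        if (normalized.isEmpty()) {
            return weights; // no words, so no pairs
        }
        // Split on whitespace only: "Hi," keeps its comma and stays distinct from "hi".
        String[] words = normalized.split("\\s+");
        for (int i = 0; i < words.length - 1; i++) {
            weights.computeIfAbsent(words[i], k -> new HashMap<>())
                   .merge(words[i + 1], 1, Integer::sum);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> weights =
                countAdjacentPairs("Hi, nice to meet you. Hi, nice day.");
        System.out.println(weights.get("hi,"));        // {nice=2}
        System.out.println(weights.containsKey("hi")); // false
    }
}
```

Because the comma is kept, the vertex is "hi," rather than "hi", so a later input containing a bare Hi does not pick up this context.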
Next is the implementation of poem:
public String poem(String input) {
    String[] wordset = input.split(" "); // split the input into words
    String str = wordset[0];
    for (int i = 1; i < wordset.length; i++) {
        // Labels of all vertices the previous word points to (its targets).
        Set<String> frontset = graph.targets(wordset[i - 1].toLowerCase()).keySet();
        // Labels of all vertices that point to the current word (its sources).
        Set<String> nextset = graph.sources(wordset[i].toLowerCase()).keySet();
        int temp = 0;        // number of bridge words found
        String tempstr = ""; // the bridge word found last
        for (String front : frontset) {
            // A bridge word appears in both sets. contains() compares with
            // equals(); comparing strings with == only checks object identity.
            if (nextset.contains(front)) {
                temp++;
                tempstr = front;
            }
        }
        if (temp == 1) {
            // Exactly one bridge word links the two words: insert it.
            str = str + " " + tempstr + " " + wordset[i];
        } else {
            str = str + " " + wordset[i];
        }
    }
    checkRep();
    return str;
}
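My approach is to split input into words first and then loop over each pair of adjacent words. If exactly one bridge word is found between the two, the bridge word is inserted between them in the output string.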
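The bridge lookup can also be tried in isolation. The sketch below is a variation rather than the code above: when several bridge words exist it picks the one whose two edges have the largest total weight, instead of inserting a bridge only when exactly one exists. BridgeWordSketch, bestBridge, and the nested Map that replaces Graph<String> are all hypothetical names and assumptions made for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; a nested Map replaces the lab's Graph<String>.
class BridgeWordSketch {
    // weights.get(a).get(b) = number of times b directly follows a in the corpus
    private final Map<String, Map<String, Integer>> weights = new HashMap<>();

    BridgeWordSketch(String corpus) {
        String normalized = corpus.toLowerCase().trim();
        if (normalized.isEmpty()) {
            return;
        }
        String[] words = normalized.split("\\s+");
        for (int i = 0; i < words.length - 1; i++) {
            weights.computeIfAbsent(words[i], k -> new HashMap<>())
                   .merge(words[i + 1], 1, Integer::sum);
        }
    }

    /** Returns the bridge b maximizing w(from, b) + w(b, to), or null if there is none. */
    String bestBridge(String from, String to) {
        String best = null;
        int bestWeight = 0;
        Map<String, Integer> outgoing =
                weights.getOrDefault(from.toLowerCase(), Collections.emptyMap());
        for (Map.Entry<String, Integer> edge : outgoing.entrySet()) {
            Integer second = weights
                    .getOrDefault(edge.getKey(), Collections.emptyMap())
                    .get(to.toLowerCase());
            // Ties keep whichever candidate was seen first (HashMap order, unspecified).
            if (second != null && edge.getValue() + second > bestWeight) {
                bestWeight = edge.getValue() + second;
                best = edge.getKey();
            }
        }
        return best;
    }

    /** Inserts the best bridge word, if any, between each pair of adjacent input words. */
    String poem(String input) {
        String[] words = input.trim().split("\\s+");
        StringBuilder out = new StringBuilder(words[0]);
        for (int i = 1; i < words.length; i++) {
            String bridge = bestBridge(words[i - 1], words[i]);
            if (bridge != null) {
                out.append(' ').append(bridge);
            }
            out.append(' ').append(words[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        BridgeWordSketch poet = new BridgeWordSketch(
                "This is a test of the Mugar Omni Theater sound system.");
        System.out.println(poet.poem("Test the system.")); // Test of the system.
    }
}
```

Here "of" is inserted because the corpus contains "test of the", while nothing is inserted between "the" and "system." because no single word links them. The input words keep their original case, and only the lookups are lowercased.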