An Implementation of GraphPoet in HIT Software Construction Lab 2

Article link: https://blog.csdn.net/m0_46991095/article/details/118526769

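GraphPoet is a text-processing algorithm that builds a graph of the relationships between words and uses it to generate new lines of poetry. It first reads the text from a file, converts it to lowercase, and splits it into words. It then checks for and creates edges between consecutive words to record that they are adjacent. When generating a poem, it looks in the graph for a "bridge" word between each pair of adjacent input words, that is, a word lying on a two-edge path from the first word to the second, and inserts it between them. The generated lines stay coherent and keep the original punctuation.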


First, the implementation of the first method, the GraphPoet constructor:

public GraphPoet(File corpus) throws IOException {
    // Read the whole corpus, joining the lines with single spaces.
    StringBuilder sb = new StringBuilder();
    try (BufferedReader br = new BufferedReader(new FileReader(corpus))) {
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append(' ');
        }
    }
    String fulltxt = sb.toString().toLowerCase(); // full text, converted to lowercase
    // Split on whitespace only; punctuation stays attached to its word.
    String[] wordset = fulltxt.trim().split("\\s+");
    for (int i = 0; i < wordset.length - 1; i++) {
        String front = wordset[i];
        String next = wordset[i + 1];
        if (!graph.vertices().contains(front) || !graph.targets(front).containsKey(next)) {
            // front is not a vertex yet, or has no edge to next: add the edge with weight 1
            graph.set(front, next, 1);
        } else {
            // the edge already exists: increase its weight by 1
            graph.set(front, next, graph.targets(front).get(next) + 1);
        }
    }
    checkRep();
}
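
The constructor relies on a `graph` field whose type is not shown in this post. As a rough sketch only, assuming it behaves like a weighted directed graph of strings offering `set`, `vertices`, `targets` and `sources`, a stand-in could look like the following. The class name `SimpleGraph` and its internals are hypothetical and are not the lab's actual graph implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the lab's graph type; only the calls used in this post are modeled.
final class SimpleGraph {
    // edges.get(a).get(b) is the weight of the edge a -> b
    private final Map<String, Map<String, Integer>> edges = new HashMap<>();

    // Adds or overwrites the edge source -> target. Assumes weight > 0.
    void set(String source, String target, int weight) {
        edges.computeIfAbsent(source, k -> new HashMap<>()).put(target, weight);
        edges.computeIfAbsent(target, k -> new HashMap<>()); // target becomes a vertex too
    }

    // All vertex labels seen so far.
    Set<String> vertices() {
        return Collections.unmodifiableSet(edges.keySet());
    }

    // Outgoing edges of source with their weights; empty if source is unknown.
    Map<String, Integer> targets(String source) {
        Map<String, Integer> out = edges.get(source);
        return out == null ? new HashMap<String, Integer>() : new HashMap<>(out);
    }

    // Incoming edges of target with their weights; empty if nothing points to it.
    Map<String, Integer> sources(String target) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : edges.entrySet()) {
            Integer weight = e.getValue().get(target);
            if (weight != null) {
                result.put(e.getKey(), weight);
            }
        }
        return result;
    }
}
```

Returning copies from `targets` and `sources` keeps callers from modifying the graph's internal maps by accident, at the cost of an extra allocation per call.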

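The input text is first read into the string fulltxt and then processed with split. Splitting on whitespace alone is enough; punctuation must not be stripped or used as a delimiter, because a word with its trailing punctuation attached carries information about its context. Take "Hi, nice to meet you." as an example: if the punctuation were removed, the bare word Hi would be recorded as the direct predecessor of nice, which is not what we want. We also need to keep the original punctuation when generating the poem.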
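
To make the difference concrete, here is a small sketch that tokenizes the example sentence both ways. The class `TokenizeDemo` and its method names are made up for this illustration and are not part of the lab code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class TokenizeDemo {
    // Lowercases the text and splits on whitespace only, so "Hi," stays one token.
    static List<String> tokenize(String text) {
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // For contrast: drop every character that is not a letter, digit or whitespace first.
    static List<String> tokenizeWithoutPunctuation(String text) {
        return tokenize(text.replaceAll("[^\\p{L}\\p{N}\\s]", ""));
    }

    public static void main(String[] args) {
        String text = "Hi, nice to meet you.";
        System.out.println(tokenize(text));                   // [hi,, nice, to, meet, you.]
        System.out.println(tokenizeWithoutPunctuation(text)); // [hi, nice, to, meet, you]
    }
}
```

With whitespace-only splitting the graph gets the vertex "hi," (comma included), so the plain word "hi" in a later input does not match it. With punctuation stripped, "hi" and "nice" would be linked directly.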

Next is the implementation of poem:

public String poem(String input) {
    String[] wordset = input.split(" "); // split the input into words
    StringBuilder str = new StringBuilder(wordset[0]);
    for (int i = 1; i < wordset.length; i++) {
        // labels of all vertices that word i-1 has an edge to
        Set<String> frontset = graph.targets(wordset[i - 1].toLowerCase()).keySet();
        // labels of all vertices that have an edge to word i
        Set<String> nextset = graph.sources(wordset[i].toLowerCase()).keySet();
        int temp = 0;        // number of bridge words found
        String tempstr = ""; // the most recently found bridge word
        for (String front : frontset) { // look for "bridge" words between the two words
            // contains() compares strings with equals(); comparing with == only checks references
            if (nextset.contains(front)) {
                temp++;
                tempstr = front;
            }
        }
        if (temp == 1) { // exactly one bridge word lies between the two words
            str.append(' ').append(tempstr);
        }
        str.append(' ').append(wordset[i]);
    }
    checkRep();
    return str.toString();
}
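
The poem method above inserts a bridge word only when exactly one candidate exists. If your lab handout instead asks for the bridge whose two-edge path has the largest total weight (an assumption here, so check your own spec), the selection step could be sketched as below. `BridgePicker` and `bestBridge` are hypothetical names; the two maps stand for the results of `graph.targets(prev)` and `graph.sources(next)`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original post's code.
final class BridgePicker {
    // targetsOfPrev: weights of edges prev -> x; sourcesOfNext: weights of edges x -> next.
    // Returns the x that maximizes weight(prev -> x) + weight(x -> next), or null if
    // there is no common x. Assumes positive weights; ties go to whichever candidate
    // the map happens to iterate over first.
    static String bestBridge(Map<String, Integer> targetsOfPrev,
                             Map<String, Integer> sourcesOfNext) {
        String best = null;
        int bestWeight = 0;
        for (Map.Entry<String, Integer> e : targetsOfPrev.entrySet()) {
            Integer secondEdge = sourcesOfNext.get(e.getKey());
            if (secondEdge == null) {
                continue; // not a bridge: no edge from this word to next
            }
            int total = e.getValue() + secondEdge;
            if (total > bestWeight) {
                best = e.getKey();
                bestWeight = total;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> targetsOfPrev = new HashMap<>();
        targetsOfPrev.put("a", 1);
        targetsOfPrev.put("b", 3);
        targetsOfPrev.put("c", 9); // "c" has no edge to next, so it is ignored
        Map<String, Integer> sourcesOfNext = new HashMap<>();
        sourcesOfNext.put("a", 1);
        sourcesOfNext.put("b", 2);
        System.out.println(bestBridge(targetsOfPrev, sourcesOfNext)); // b
    }
}
```

Inside poem, the inner loop and the `temp == 1` check would then be replaced by a single call, appending the returned word whenever it is not null.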