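I have been struggling with this for a few days, and I was hoping someone could help me.
What I am trying to accomplish is to process a text file that contains a set of questions and answers. The content of the file (.doc or .docx) looks like this: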
Document Name
1. Question one:
a. Answer one to question one
b. Answer two to question one
c. Answer three to question one
2. Question two:
a. Answer one to question two
c. Answer two to question two
e. Answer three to question two
What I have tried so far:
Reading the content of the document through Apache POI, as follows:
fis = new FileInputStream(new File(FilePath));
XWPFDocument doc = new XWPFDocument(fis);
XWPFWordExtractor extract = new XWPFWordExtractor(doc);
String extractorText = extract.getText();
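So at this point I have the content of the file. Next, I tried to create a regex pattern that matches the number and dot at the start of a question (1., 12.) and continues until it matches a colon: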
Pattern regexPattern = Pattern.compile("^(\\d|\\d\\d)+\\.[^:]+:\\s*$", Pattern.MULTILINE);
Matcher regexMatcher = regexPattern.matcher(extractorText);
However, when I try to loop over the result set, I cannot get at any of the question text:
while (regexMatcher.find()) {
System.out.println("Found");
for (int i = 0; i < regexMatcher.groupCount() - 2; i += 2) {
map.put(regexMatcher.group(i + 1), regexMatcher.group(i + 2));
System.out.println("#" + regexMatcher.group(i + 1) + " >> " + regexMatcher.group(i + 2));
}
}
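A note on the loop above: the pattern contains a single capturing group, `(\\d|\\d\\d)`, so `groupCount()` returns 1 and the condition `i < regexMatcher.groupCount() - 2` is false on the first check. The full matched line is available as `group()` (group 0). The following is a minimal sketch of that behaviour using only the standard library; the class name `QuestionHeaderDemo`, the helper `questionHeaders`, and the hard-coded sample text are hypothetical stand-ins for the text extracted by POI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionHeaderDemo {
    // Same pattern as in the question; it has exactly one capturing group.
    static final Pattern QUESTION =
            Pattern.compile("^(\\d|\\d\\d)+\\.[^:]+:\\s*$", Pattern.MULTILINE);

    // Collects the whole matched text (group 0) of every question line.
    static List<String> questionHeaders(String text) {
        List<String> headers = new ArrayList<>();
        Matcher m = QUESTION.matcher(text);
        while (m.find()) {
            // trim() because the trailing \s* may swallow a line break
            headers.add(m.group().trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the extracted document text.
        String sample = "Document Name\n"
                + "1. Question one:\n"
                + "a. Answer one to question one\n"
                + "2. Question two:\n"
                + "a. Answer one to question two\n";
        System.out.println("groupCount = " + QUESTION.matcher(sample).groupCount());
        questionHeaders(sample).forEach(System.out::println);
    }
}
```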
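I am not sure where I went wrong; I am new to Java and hope someone can help me.
Also, if anyone has a better approach for building a map of the questions and the answers associated with them, it would be greatly appreciated.
Thanks in advance.
EDIT: I am trying to end up with something like a map whose keys are the question text and whose values are lists of strings representing the answers to that question, for example: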
Map<String, List<String>> desiredResult = new HashMap<>();
desiredResult.entrySet().forEach((entry) -> {
String questionText = entry.getKey();
List<String> answersList = entry.getValue();
System.out.println("Now at question: " + questionText);
answersList.forEach((answerText) -> {
System.out.println("Now at answer: " + answerText);
});
});
This would produce the following output:
Now at question: 1. Question one:
Now at answer: a. Answer one to question one
Now at answer: b. Answer two to question one
Now at answer: c. Answer three to question one
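One possible way to build such a map is to go through the text line by line, offered here as a minimal sketch rather than a definitive solution. It assumes each question and each answer sits on its own line, that questions start with digits and a dot and end with a colon, and that answers start with a single lowercase letter and a dot. The class name `QuestionMapSketch` and the method `parse` are hypothetical. A `LinkedHashMap` is used instead of a `HashMap` so that the questions keep their document order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class QuestionMapSketch {
    // Assumed line shapes: "12. Some question:" and "a. Some answer".
    private static final Pattern QUESTION = Pattern.compile("^\\d+\\.\\s.*:$");
    private static final Pattern ANSWER = Pattern.compile("^[a-z]\\.\\s.*$");

    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> currentAnswers = null; // answers of the most recent question
        for (String rawLine : text.split("\\R")) {
            String line = rawLine.trim();
            if (QUESTION.matcher(line).matches()) {
                currentAnswers = new ArrayList<>();
                result.put(line, currentAnswers);
            } else if (currentAnswers != null && ANSWER.matcher(line).matches()) {
                currentAnswers.add(line);
            }
            // Any other line (e.g. the document title) is ignored.
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the text extracted by POI.
        String sample = "Document Name\n"
                + "1. Question one:\n"
                + "a. Answer one to question one\n"
                + "b. Answer two to question one\n"
                + "c. Answer three to question one\n"
                + "2. Question two:\n"
                + "a. Answer one to question two\n";
        parse(sample).forEach((question, answers) -> {
            System.out.println("Now at question: " + question);
            answers.forEach(a -> System.out.println("Now at answer: " + a));
        });
    }
}
```

In this sketch the string returned by `extract.getText()` would be passed to `parse` in place of the sample.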