CET-4/CET-6 Sentence Filtering Algorithm
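After several days of trial and error, I finally got the algorithm for extracting and cleaning CET-4/CET-6 sentences working. Here is the code first: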
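// FileReader here is presumably Hutool's cn.hutool.core.io.file.FileReader;
// java.io.FileReader has no readString() method
String file = "F:/static/2018年12月六级.txt";
FileReader fileReader = new FileReader(file);
String result = fileReader.readString();
// Split the text into lines; each line is treated as one paragraph
String[] split1 = result.split("\\r?\\n");
List<String> lists = new ArrayList<>(Arrays.asList(split1));
// Cleaned sentences, each stored as a list of words
List<List<String>> sentensList = new ArrayList<>();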
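for (String listobj : lists) {
    // Split the paragraph at every punctuation mark
    // (\pP also matches commas, hyphens and apostrophes)
    String[] split = listobj.split("[\\pP]");
    // list is the collection of sentences in this paragraph
    List<String> list = new ArrayList<>(Arrays.asList(split));
    // o is a single sentence
    for (String o : list) {
        // Split the sentence into individual words
        String[] split2 = o.split("\\s+");
        // list1 is the collection of words in this sentence
        List<String> list1 = new ArrayList<>(Arrays.asList(split2));
        // tmp collects the words to discard
        List<String> tmp = new ArrayList<>();
        // o1 is a single word
        for (String o1 : list1) {
            // Run the cheap checks first so the database is queried only when needed
            boolean discard = isNumeric(o1) || isChinese(o1);
            if (!discard) {
                QueryWrapper<Words> queryWrapper = new QueryWrapper<>();
                queryWrapper.eq("word", o1);
                // Discard words that are not in the vocabulary table
                discard = wordsService.list(queryWrapper).isEmpty();
            }
            if (discard) {
                tmp.add(o1);
            }
        }
        list1.removeAll(tmp);
        // Keep only sentences with 5 to 15 remaining words
        if (list1.size() >= 5 && list1.size() <= 15) {
            sentensList.add(list1);
        }
    }
}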
for (List<String> objects : sentensList) {
    String s = String.join(" ", objects);
    QuestionBank questionBank = new QuestionBank();
    questionBank.setSentens(s);
    questionBank.setTitle("2018-12 CET-6");
    // Split the full sentence back into words
    String[] s1 = s.split(" ");
    List<String> list = new ArrayList<>(Arrays.asList(s1));
    // Exclude common function words (prepositions, articles, auxiliary verbs)
    // so they do not take part in the average
    String[] jieci = {"as", "is", "be", "are", "were", "was", "the", "in", "for", "on", "to", "by", "at", "of", "have", "has", "had"};
    list.removeAll(Arrays.asList(jieci));
    // Difficulty ranks of the remaining words
    List<Integer> wordSort = new ArrayList<>();
    for (String s2 : list) {
        QueryWrapper<Words> queryWrapper = new QueryWrapper<>();
        queryWrapper.eq("word", s2);
        queryWrapper.eq("book_id", "CET6Full");
        /* .or()
           .eq("book_id","high").or()
           .eq("book_id","middle").or()
           .eq("book_id","primary");*/
        List<Words> list1 = wordsService.list(queryWrapper);
        if (list1 != null && !list1.isEmpty()) {
            Words words = list1.get(0);
            wordSort.add(words.getSort());
        }
    }
    if (!wordSort.isEmpty()) {
        // Record the extremes before trimming so maxrank still reflects the hardest word
        Integer sortMax = Collections.max(wordSort);
        Integer sortMin = Collections.min(wordSort);
        // With more than 7 ranked words, drop one highest and one lowest rank before averaging
        if (wordSort.size() > 7) {
            wordSort.remove(sortMax);
            wordSort.remove(sortMin);
        }
        int sum = 0;
        for (Integer integer : wordSort) {
            sum += integer;
        }
        // Divide as double: int division would truncate before Math.round is applied
        int avg = (int) Math.round((double) sum / wordSort.size());
        questionBank.setAvgrank(avg);
        questionBank.setMaxrank(sortMax);
    } else {
        // No ranked word found: fall back to the word count as maxrank and 1 as avgrank
        questionBank.setMaxrank(s1.length);
        questionBank.setAvgrank(1);
    }
    questionBankService.save(questionBank);
}
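The averaging step at the end of the loop can be isolated and checked without a database. The following is only a minimal sketch: the class name RankStats and the method trimmedAverage are hypothetical and not part of the project. It applies the same rule as above (drop one highest and one lowest rank when there are more than 7) and divides as double, because dividing two int values in Java truncates the result before Math.round ever sees it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the original project.
final class RankStats {

    /**
     * Rounded mean of the ranks. When there are more than 7 ranks,
     * one maximum and one minimum are dropped first.
     */
    static int trimmedAverage(List<Integer> ranks) {
        if (ranks.isEmpty()) {
            throw new IllegalArgumentException("ranks must not be empty");
        }
        // Work on a copy so the caller's list is left untouched
        List<Integer> copy = new ArrayList<>(ranks);
        if (copy.size() > 7) {
            // remove(Object) deletes the first occurrence of the value
            copy.remove(Collections.max(copy));
            copy.remove(Collections.min(copy));
        }
        long sum = 0;
        for (int rank : copy) {
            sum += rank;
        }
        // Floating-point division keeps the fraction for Math.round
        return (int) Math.round((double) sum / copy.size());
    }

    public static void main(String[] args) {
        // (1 + 2) / 2 = 1.5, which rounds to 2; int division would give 1
        System.out.println(trimmedAverage(Arrays.asList(1, 2))); // prints 2
    }
}
```

Keeping the arithmetic in a pure function like this makes it easy to test the rounding and trimming rules separately from the database lookups.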
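In short, the approach is to break a big problem into small ones: split the text into paragraphs, split each paragraph into sentences, look up the difficulty rank of each word in the database, and then compute the average rank and the maximum rank of the sentence.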
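The splitting and filtering steps described above can be tried out without a database. This is only a sketch under stated assumptions: the class SentenceFilterSketch, the method extract, and the in-memory Map standing in for the words table are all hypothetical, and the lookup here is an exact, case-sensitive match, which may differ from how the database compares strings. Numbers and Chinese tokens are dropped simply because they are not in the map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; a Map replaces the database lookup.
final class SentenceFilterSketch {

    /** Returns the cleaned sentences that contain 5 to 15 known words. */
    static List<String> extract(String text, Map<String, Integer> ranks) {
        List<String> sentences = new ArrayList<>();
        // 1. One paragraph per line
        for (String line : text.split("\\r?\\n")) {
            // 2. Cut the paragraph at every punctuation mark
            for (String fragment : line.split("\\p{P}")) {
                List<String> words = new ArrayList<>();
                // 3. Keep only the words that have a rank in the dictionary
                for (String word : fragment.trim().split("\\s+")) {
                    if (ranks.containsKey(word)) {
                        words.add(word);
                    }
                }
                // 4. Keep the sentence only if its length is in range
                if (words.size() >= 5 && words.size() <= 15) {
                    sentences.add(String.join(" ", words));
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        Map<String, Integer> ranks = new HashMap<>();
        String[] vocab = {"students", "should", "read", "more", "books", "every", "day"};
        for (int i = 0; i < vocab.length; i++) {
            ranks.put(vocab[i], i + 1); // made-up ranks for illustration
        }
        String text = "2018年12月 students should read more books every day. Too short!\n第二段";
        // prints [students should read more books every day]
        System.out.println(extract(text, ranks));
    }
}
```

Loading the vocabulary into memory once, instead of issuing one query per word, is a design choice of this sketch rather than something the original code does.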
Learning
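I mainly worked through Docker systematically; it will be useful for deploying later projects.
I also got a first look at DevOps. So far it is only a surface-level understanding, but it is very powerful and I would really like to pick it up quickly.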