CET-4/6 Filtering Algorithm

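After several days of hitting pitfalls and feeling my way forward, I finally finished the algorithm that extracts and cleans sentences from CET-4/6 exam papers. Here is the code first: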

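        // FileReader.readString() and Convert.toList() are not JDK methods; they appear to come from Hutool
        // (cn.hutool.core.io.file.FileReader and cn.hutool.core.convert.Convert).
        String file="F:/static/2018年12月六级.txt";
        FileReader fileReader=new FileReader(file);
        String result = fileReader.readString();
        // Split the whole text into lines (paragraphs).
        String[] split1 = result.split("\\r?\\n");
        List<String> lists = (List<String>) Convert.toList(split1);
        // Every accepted sentence, stored as a list of words.
        List<List<String>> sentensList =new ArrayList<>();
        for (String listobj : lists){
            // Split each line on any Unicode punctuation character.
            // Note that this also cuts at commas, apostrophes and hyphens.
            String[] split = listobj.split("[\\pP]");
            List<String> list = (List<String>) Convert.toList(split);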
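
            // list holds the sentence fragments of this line; o is a single fragment.
            for (String o : list) {
                // Split the fragment into individual words.
                String[] split2 = o.split("\\s+");
                // list1 holds the words of this fragment.
                List<String> list1 = (List<String>) Convert.toList(split2);
                // tmp collects the tokens to drop: numbers, Chinese text,
                // and words that are missing from the dictionary table.
                List<String> tmp = new ArrayList<>();
                for (String o1 : list1) {
                    // Skip the database lookup when the token is already known to be unwanted.
                    if (isNumeric(o1) || isChinese(o1)) {
                        tmp.add(o1);
                        continue;
                    }
                    QueryWrapper<Words> queryWrapper=new QueryWrapper<>();
                    queryWrapper.eq("word",o1);
                    List<Words> list2 = wordsService.list(queryWrapper);
                    if (list2.size()==0) {
                        tmp.add(o1);
                    }
                }

                list1.removeAll(tmp);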

                if (list1.size() >= 5 && list1.size() <= 15) {
                    sentensList.add(list1);
                }
            }
        }
        for (List<String> objects : sentensList) {
            String s = String.join(" ",objects);
            QuestionBank questionBank=new QuestionBank();
            questionBank.setSentens(s);
            questionBank.setTitle("2018-12 CET-6");
            // Split the joined sentence back into words.
            String[] s1 = s.split(" ");

            List<String> list = (List<String>) Convert.toList(s1);

            // Exclude common function words (prepositions, auxiliary verbs and "the") from the average.
            // The comparison is case-sensitive, so a capitalized "The" is not removed.
            String[] jieci={"as", "is", "be", "are", "were", "was", "the" ,"in", "for", "on", "to", "by", "at",  "of" ,"have" , "has", "had"};
            list.removeAll(Arrays.asList(jieci));

            List<Integer> wordSort=new ArrayList<>();

            for (String s2 : list) {
                QueryWrapper<Words> queryWrapper=new QueryWrapper<>();
                queryWrapper.eq("word",s2);
                queryWrapper.eq("book_id","CET6Full");
                /*      .or()
                        .eq("book_id","high").or()
                        .eq("book_id","middle").or()
                        .eq("book_id","primary");*/
                List<Words> list1 = wordsService.list(queryWrapper);
                if (list1!=null&&list1.size()>0){
                    Words words = list1.get(0);
                    wordSort.add(words.getSort());
                }
            }
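            if (!wordSort.isEmpty()){
                Integer sortMax = Collections.max(wordSort);
                Integer sortMin = Collections.min(wordSort);
                // With more than 7 ranked words, drop one highest and one lowest rank before averaging.
                if (wordSort.size()>7){
                    wordSort.remove(sortMax);
                    wordSort.remove(sortMin);
                }
                int sum=0;
                for (Integer integer : wordSort) {
                    sum+=integer;
                }
                // Divide as double: sum/wordSort.size() on ints truncates before Math.round is applied.
                int avg=(int) Math.round((double) sum/wordSort.size());
                questionBank.setAvgrank(avg);
                questionBank.setMaxrank(sortMax);
            }else {
                // No word was found in the CET6Full book: fall back to the word count and an average rank of 1.
                questionBank.setMaxrank(s1.length);
                questionBank.setAvgrank(1);
            }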

            questionBankService.save(questionBank);

        }
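
The listing calls isNumeric and isChinese without showing them. The sketch below is one possible shape for those two helpers, not the original implementation: the class name TextChecks is hypothetical, "numeric" is assumed to mean ASCII digits only, and "Chinese" is assumed to mean that the token contains at least one Han character.

```java
final class TextChecks {
    private TextChecks() {}

    /** Assumption: a token is numeric when it is non-empty and made of ASCII digits only. */
    static boolean isNumeric(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    /** Assumption: a token counts as Chinese when any of its code points is in the Han script. */
    static boolean isChinese(String s) {
        if (s == null) {
            return false;
        }
        return s.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }
}
```

Under these assumptions a mixed token such as "2018年" is treated as Chinese, so it is dropped along with pure numbers.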

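In short, the approach is to break a big problem into small ones: split the text into paragraphs, split each paragraph into sentences, look up the difficulty rank of every word in the database, and then compute the average rank and the maximum rank of each sentence.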
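
The same steps can be tried without a database. The following is a minimal sketch, assuming the word ranks are already loaded into an in-memory Map; the class SentenceFilterSketch and its methods are hypothetical names, and a single map stands in for both the dictionary lookup and the CET6Full rank lookup. Numbers and Chinese tokens are dropped simply because they are not keys of that map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SentenceFilterSketch {
    // Function words left out of the average, copied from the listing.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "as", "is", "be", "are", "were", "was", "the", "in", "for", "on",
            "to", "by", "at", "of", "have", "has", "had"));

    /** Lines -> punctuation-delimited fragments -> known words; keeps fragments with 5 to 15 words. */
    static List<List<String>> extract(String text, Map<String, Integer> ranks) {
        List<List<String>> sentences = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            for (String fragment : line.split("\\p{P}")) {
                List<String> words = new ArrayList<>();
                for (String token : fragment.trim().split("\\s+")) {
                    if (ranks.containsKey(token)) {
                        words.add(token);
                    }
                }
                if (words.size() >= 5 && words.size() <= 15) {
                    sentences.add(words);
                }
            }
        }
        return sentences;
    }

    /** Returns {average rank, maximum rank}, or null when no ranked content word is left. */
    static int[] score(List<String> words, Map<String, Integer> ranks) {
        List<Integer> sorted = new ArrayList<>();
        for (String w : words) {
            if (!STOP_WORDS.contains(w) && ranks.containsKey(w)) {
                sorted.add(ranks.get(w));
            }
        }
        if (sorted.isEmpty()) {
            return null;
        }
        Collections.sort(sorted);
        int max = sorted.get(sorted.size() - 1);
        // With more than 7 ranks, drop one lowest and one highest before averaging.
        if (sorted.size() > 7) {
            sorted = sorted.subList(1, sorted.size() - 1);
        }
        long sum = 0;
        for (int r : sorted) {
            sum += r;
        }
        int avg = (int) Math.round((double) sum / sorted.size());
        return new int[] {avg, max};
    }

    public static void main(String[] args) {
        // Made-up ranks, for illustration only.
        Map<String, Integer> ranks = new HashMap<>();
        ranks.put("committee", 1200);
        ranks.put("approved", 900);
        ranks.put("new", 50);
        ranks.put("budget", 1500);
        ranks.put("education", 600);
        ranks.put("has", 10);
        ranks.put("the", 1);
        ranks.put("for", 5);
        String text = "The committee has approved the new budget for education, 2018年 ok.\nShort one here.";
        for (List<String> sentence : extract(text, ranks)) {
            int[] s = score(sentence, ranks);
            System.out.println(String.join(" ", sentence) + " avg=" + s[0] + " max=" + s[1]);
        }
    }
}
```

With the made-up ranks in main, only the first fragment survives the 5 to 15 word filter, and its five content words give an average rank of 850 and a maximum of 1500. Loading the ranks once into a map also avoids one query per word, which may matter for long papers.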

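Learning

I mainly worked through Docker systematically, which will be useful when deploying later projects.
I also got a first look at DevOps. It is only a surface-level understanding so far, but the practice is very powerful and I want to get proficient with it quickly.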
