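This post records how I parse a zip archive of Word documents saved as HTML. There are many ways to analyze a Word document, but converting it to HTML is more flexible. If you know a better approach, leave me a comment; I'd be glad to compare notes.
There are many ways to convert Word to HTML. The most convenient one I have used is docx4j, which I won't cover here. The example here uploads and parses a multi-dimensional assessment, similar to a psychological test paper.
Below is simplified sample paper 1. Step one: upload the archive and validate its contents.
I. Upload and extract the archive.
1. The method that extracts the archive: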
/**
 * For Word-to-HTML archives: extracts the ZIP file into descFileName and returns
 * the paths of the HTML files it contains.
 * Uses org.apache.tools.zip.ZipFile / ZipEntry (Apache Ant), which accept an encoding for entry names.
 * @param zipFileName the ZIP file to extract
 * @param descFileName the target directory
 * @return the top-level .htm/.html paths, or null if extraction fails or the number of
 *         HTML files differs from the number of ".files" resource folders
 */
public static List<String> unZipFilesReturnPath(String zipFileName, String descFileName) {
    // One archive may contain several papers (HTML files)
    List<String> htmlFilePath = Lists.newArrayList();
    Set<String> filesFilePath = new HashSet<>();
    String descFileNames = descFileName;
    if (!descFileNames.endsWith(File.separator)) {
        descFileNames = descFileNames + File.separator;
    }
    ZipFile zipFile = null;
    try {
        // Entry names are read as GBK, the encoding Chinese Windows archivers usually write
        zipFile = new ZipFile(zipFileName, "GBK");
        String targetRoot = new File(descFileNames).getCanonicalPath() + File.separator;
        byte[] buf = new byte[4096];
        int readByte;
        @SuppressWarnings("rawtypes")
        Enumeration enums = zipFile.getEntries();
        while (enums.hasMoreElements()) {
            ZipEntry entry = (ZipEntry) enums.nextElement();
            String entryName = entry.getName();
            String lowerName = entryName.toLowerCase();
            // Top-level HTML file (".htm" also matches ".html"); replaceImgs relies on "/" in this path
            if (lowerName.contains(".htm") && !entryName.contains("/")) {
                htmlFilePath.add(descFileName + "/" + entryName);
            }
            // Resource folder Word writes next to each HTML file, e.g. "paper.files/image001.png"
            if (lowerName.contains(".files") && entryName.contains("/")) {
                filesFilePath.add(entryName.split("/")[0]);
            }
            File target = new File(descFileNames + entryName);
            // Refuse entries such as "../x" that would land outside the target directory
            if (!target.getCanonicalPath().startsWith(targetRoot)) {
                throw new IOException("Illegal entry path: " + entryName);
            }
            if (entry.isDirectory()) {
                target.mkdirs();
                continue;
            }
            target.getParentFile().mkdirs();
            // try-with-resources closes both streams even when copying fails
            try (InputStream is = zipFile.getInputStream(entry);
                 OutputStream os = new FileOutputStream(target)) {
                while ((readByte = is.read(buf)) != -1) {
                    os.write(buf, 0, readByte);
                }
            }
        }
        // Every HTML file must come with its ".files" folder
        if (filesFilePath.size() != htmlFilePath.size()) {
            return null;
        }
        logger.debug("Archive extracted successfully");
        return htmlFilePath;
    } catch (Exception e) {
        logger.error("Failed to extract archive: " + e.getMessage(), e);
        return null;
    } finally {
        if (zipFile != null) {
            try {
                zipFile.close();
            } catch (IOException ignored) {
                // nothing useful to do if closing fails
            }
        }
    }
}
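If you would rather not depend on Ant's ZipFile, the JDK's own java.util.zip.ZipFile has accepted a charset for entry names since Java 7. The following is a minimal stdlib sketch under my own assumptions, not the code used above. The class name HtmlZipExtractor is hypothetical. The sketch throws an IOException instead of returning null, and it pairs each HTML file with a folder named `<name>.files` by name rather than comparing counts. The `.files` suffix is what the method above checks for; Word in other locales may write `<name>_files` instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class HtmlZipExtractor {
    /**
     * Extracts the archive into targetDir and returns the top-level .htm/.html files.
     * Throws IOException if an entry escapes targetDir or an HTML file has no "<name>.files" folder.
     */
    static List<Path> extract(Path zipPath, Path targetDir, Charset entryCharset) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        List<Path> htmlFiles = new ArrayList<>();
        Set<String> resourceDirs = new HashSet<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile(), entryCharset)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                Path target = root.resolve(name).normalize();
                if (!target.startsWith(root)) { // "zip slip": an entry such as "../evil.htm"
                    throw new IOException("Entry escapes target directory: " + name);
                }
                String lower = name.toLowerCase();
                if (!name.contains("/") && (lower.endsWith(".htm") || lower.endsWith(".html"))) {
                    htmlFiles.add(target);
                }
                int slash = name.indexOf('/');
                if (slash > 0 && name.substring(0, slash).toLowerCase().endsWith(".files")) {
                    resourceDirs.add(name.substring(0, slash));
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    try (InputStream in = zip.getInputStream(entry)) {
                        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                    }
                }
            }
        }
        // Each HTML file needs its own resource folder, matched by name
        for (Path html : htmlFiles) {
            String file = html.getFileName().toString();
            String folder = file.substring(0, file.lastIndexOf('.')) + ".files";
            if (!resourceDirs.contains(folder)) {
                throw new IOException("Missing resource folder " + folder);
            }
        }
        return htmlFiles;
    }
}
```

Call it with `Charset.forName("GBK")` for archives packed on Chinese Windows. Entries written by most modern tools are UTF-8, which is also java.util.zip's default.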
II. Next, parse the HTML content.
// Entry method
@Transactional(readOnly = false)
public String totalAnalysisEvaluationHtml(MultiPaperVo multiPaperVo, List<String> htmlUrlList) {
    // Note: returning an error string, or swallowing the exception in the catch block,
    // does not roll anything back; by default Spring rolls back only when a
    // RuntimeException or Error propagates out of the method
    try {
        if (multiPaperVo == null || htmlUrlList == null || htmlUrlList.isEmpty()) {
            return "No file information found";
        }
        // Validate the paper data
        String checkPaperInfo = this.checkPaperInfo(multiPaperVo);
        if (StringUtils.isNotBlank(checkPaperInfo)) {
            return checkPaperInfo;
        }
        // Store the paper information
        QtPaper qtPaper = this.insertPaperInfo(multiPaperVo, new QtPaper());
        // Parse each HTML file
        for (String htmlUrl : htmlUrlList) {
            if (StringUtils.isBlank(htmlUrl)) {
                continue;
            }
            File input = new File(htmlUrl);
            if (!input.exists()) {
                logger.debug("HTML file not found: " + htmlUrl);
                continue;
            }
            // Arguments: the file, its charset, and the base URI used to resolve relative links
            Document doc = Jsoup.parse(input, "GBK", "");
            // Only the body holds the document content
            Element body = doc.body();
            // Remove the junk that images leave in the HTML
            body = this.refiningBody(body);
            // Reject images that point to external sites
            String cheackImg = this.cheackImg(body);
            if (StringUtils.isNotBlank(cheackImg)) {
                return cheackImg;
            }
            // Split the body into paragraphs, starting at the paper-name label
            List<Element> analysisQtPaper = this.analysisQtPaper(body);
            if (analysisQtPaper == null || analysisQtPaper.isEmpty()) {
                return "The paper contains no <Ф试卷名称Ф> label";
            }
            // Sort the paragraphs into sections
            TestPaperVo vo = this.arrangePaper(analysisQtPaper);
            // Validate the sections
            String checkExamPaper = this.checkExamPaper(vo, analysisQtPaper, qtPaper);
            if (StringUtils.isNotBlank(checkExamPaper)) {
                return checkExamPaper;
            }
            // Delete the images of the previous upload
            String deletePic = this.deletePic(qtPaper);
            if (StringUtils.isNotBlank(deletePic)) {
                logger.error("Failed to delete old images on the server, path: " + deletePic);
            }
            // Copy the images and rewrite their src attributes
            body = this.replaceImgs(body, htmlUrl, qtPaper);
            // If the document contains images, split and sort it again
            Elements links = body.getElementsByTag("img");
            if (links != null && links.size() > 0) {
                analysisQtPaper = this.analysisQtPaper(body);
                vo = this.arrangePaper(analysisQtPaper);
            }
            // Build the entities and save them
            String insertExamPaperInfo = this.insertExamPaperInfo(vo, qtPaper);
            if (StringUtils.isNotBlank(insertExamPaperInfo)) {
                return insertExamPaperInfo;
            }
        }
        return null;
    } catch (Exception e) {
        logger.error("Failed to parse the paper HTML", e);
        return RestCode.FAILURE.getCode();
    }
}
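1. Let's start with parsing the file. Parsing turns the file into a Jsoup Document that the later steps work on; from here on it is mostly business logic. We then take only the body, because the document content is all we need and everything else can be discarded.
2. When the document contains images, the HTML source is littered with junk comments, and they are extremely long. They are reportedly markup for old versions of Internet Explorer (Word's conditional comments such as <!--[if gte vml 1]>...<![endif]-->), so we remove them.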
/**
 * Removes the junk that images bring into Word's HTML.
 * Requires org.jsoup.nodes.Comment and org.jsoup.nodes.Node.
 *
 * @param body the document body
 * @return the same body with all comment nodes removed
 */
public Element refiningBody(Element body) {
    // Word wraps the VML version of each image in conditional comments such as
    // <!--[if gte vml 1]>...<![endif]-->. Removing the Comment nodes directly avoids
    // re-parsing HTML strings and cannot fail on a stray or unterminated "<!--".
    for (Element el : body.getAllElements()) {
        // Iterate over a copy, because remove() changes the parent's child list
        for (Node child : new ArrayList<>(el.childNodes())) {
            if (child instanceof Comment) {
                child.remove();
            }
        }
    }
    return body;
}
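The same clean-up can also be done on the raw HTML string before it reaches Jsoup. Here is a minimal sketch with a hypothetical class name. It removes every `<!--...-->` block with a non-greedy regex and leaves an unterminated `<!--` untouched instead of throwing. It assumes the markup has no `<!--` inside script or style content that you want to keep.

```java
import java.util.regex.Pattern;

final class WordCommentStripper {
    // Non-greedy and DOTALL: each match runs from "<!--" to the nearest "-->",
    // across line breaks, which covers "<!--[if gte vml 1]>...<![endif]-->" blocks
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    static String strip(String html) {
        return COMMENT.matcher(html).replaceAll("");
    }
}
```

For example, `strip("<p>a<!--[if gte vml 1]><v:shape/><![endif]-->b</p>")` yields `<p>ab</p>`.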
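3. Image validation mainly looks for external images. Many documents are downloaded straight from the web, and their images are just links to other sites. We reject those for now. If you need them, download each flagged image from its URL and upload it yourself.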
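/**
 * Checks the images for external links
 *
 * @param body the document body
 * @return an error message naming the first external image, or null if all images are local
 */
public String cheackImg(Element body) {
    Elements links = body.getElementsByTag("img");
    for (Element link : links) {
        String linkHref = link.attr("src").trim();
        String lower = linkHref.toLowerCase();
        // An absolute http(s) URL means an external image, which is rejected for now.
        // Checking the prefix avoids flagging local names such as "http_logo.png".
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return "The paper contains an image linked from an external site: " + linkHref;
        }
    }
    return null;
}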
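The external-link test can be isolated into a small predicate. Here is a minimal sketch with a hypothetical class name. It also treats protocol-relative `//host/...` sources as external, which is my own addition, and anything else, such as `paper.files/image001.png`, as local.

```java
import java.util.regex.Pattern;

final class ImageSrcCheck {
    // Matches "http://", "https://" or a protocol-relative "//" at the start, ignoring case
    private static final Pattern EXTERNAL = Pattern.compile("^(?:https?:)?//", Pattern.CASE_INSENSITIVE);

    /** True if the img src points to another site rather than the local ".files" folder. */
    static boolean isExternal(String src) {
        return src != null && EXTERNAL.matcher(src.trim()).find();
    }
}
```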
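4. Splitting the data: the body is broken into paragraphs that are stored in a list. That is just my habit; you can also work on the elements directly. The method looks for the element containing the paper-name label, climbs up to the enclosing <p>, and collects that paragraph together with all of its following siblings.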
/**
 * Collects the paragraphs of the paper, starting at the paper-name label
 *
 * @param body the document body
 * @return the <p> containing the paper-name label followed by all its sibling elements,
 *         or null if there is no such label inside a <p>
 */
public List<Element> analysisQtPaper(Element body) {
    // First element whose own text contains the paper-name label
    Element firstElement = body.getElementsContainingOwnText(LabelEnum.PAPERS_NAME.getDesc()).first();
    // Walk up to the enclosing <p>; parent() returns null above the root, which ends the loop
    while (firstElement != null && !"p".equals(firstElement.tagName())) {
        firstElement = firstElement.parent();
    }
    if (firstElement == null) {
        return null;
    }
    // The <p> and all of its following siblings, in document order
    List<Element> bodyElementList = new ArrayList<>();
    for (Element e = firstElement; e != null; e = e.nextElementSibling()) {
        bodyElementList.add(e);
    }
    return bodyElementList;
}
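* One of the helpers is a label enum. These are the custom labels written into the Word document: the key is the marker itself (in the document it appears inside angle brackets, e.g. <Ф试卷名称Ф>), and the desc is its name.
public enum LabelEnum {
    PAPERS_NAME("Ф试卷名称Ф", "试卷名称"),                       // paper name
    PAPERS_DESC("Ф试卷简介Ф", "试卷简介"),                       // paper description
    PAPERS_DEMAND("Ф作答要求Ф", "作答要求"),                     // answering instructions
    PAPERS_INFO_ONE("<Ⅰ卷>", "Ⅰ卷"),                           // paper part I
    PAPERS_INFO_TWO("<Ⅱ卷>", "Ⅱ卷"),                           // paper part II
    QUESTION_FATHER_CONTENT("Ф父题内容Ф", "父题内容"),           // start of a parent question
    QUESTION_FATHER_CONTENT_END("Ф父题结束Ф", "父题结束"),       // end of a parent question
    QUESTION_CONTENT("Ф题目内容Ф", "题目内容"),                  // question text
    QUESTION_MODULE("Ф试题所属模块Ф", "试题所属模块"),            // module the question belongs to
    QUESTION_SCORE("Ф分值Ф", "分值"),                            // score
    QUESTION_SUBJECTIVE("Ф主观题Ф", "主观题"),                   // marks a subjective question
    QUESTION_ANALYZE("Ф试题解析Ф", "试题解析"),                  // question analysis
    QUESTION_OPTION_APPRAISE("Ф选项评析Ф", "选项评析"),          // option commentary
    QUESTION_ANSWER_APPRAISE("Ф答案评析Ф", "答案评析"),          // answer commentary
    QUESTION_CLASSIFICATION("Ф试题所属分类Ф", "试题所属分类"),    // classification
    OPTION_COMMENTARY("<★>", "★");                              // option commentary marker

    // Enum constants are shared singletons, so the fields are final and have no setters
    private final String key;
    private final String desc;

    LabelEnum(String key, String desc) {
        this.key = key;
        this.desc = desc;
    }

    /** Returns the label whose key equals the given text, or null if there is none */
    public static LabelEnum getEnumByKey(String key) {
        for (LabelEnum em : values()) {
            if (em.getKey().equals(key)) {
                return em;
            }
        }
        return null;
    }

    public String getKey() {
        return key;
    }

    public String getDesc() {
        return desc;
    }
}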
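5. Analyzing the data and sorting it into categories. The number of paragraphs under each label varies a lot, so the code works with flags. Most labels switch on the flag for their own category and switch off the others, and every paragraph is added to the list of whichever category is currently active.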
/**
* Sort the paragraphs into sections by label
*
* @param bodyList
* @return
*/
public TestPaperVo arrangePaper(List<Element> bodyList){
// paper name
List<Element> exNameList = new ArrayList<>();
// paper description
List<Element> exDescList = new ArrayList<>();
// answering instructions
List<Element> exDemandList = new ArrayList<>();
// paper-part labels (<Ⅰ卷>/<Ⅱ卷>) and the paragraph that follows each one
List<Element> exInfoList = new ArrayList<>();
List<Element> exInfoNextList = new ArrayList<>();
// questions: one inner list of paragraphs per question
List<List<Element>> exQuestionList = new ArrayList<>();
List<Element> questionList = new ArrayList<>();
// score paragraphs
List<Element> questionScoreList = new ArrayList<>();
// module paragraphs
List<Element> questionMouduleList = new ArrayList<>();
// number of questions
Integer questionNum = 0;
// flags: which section the current paragraph belongs to
Boolean isExName = Boolean.FALSE;
Boolean isExDesc = Boolean.FALSE;
Boolean isDemand = Boolean.FALSE;
Boolean isExInfo = Boolean.FALSE;
Boolean isNew = Boolean.FALSE;
Boolean isAdd = Boolean.FALSE;
Boolean isfather = Boolean.FALSE;
Boolean isMoudule = Boolean.FALSE;
int paperInfo=0;
for(int f=0;f< bodyList.size() ; f++){
// paper name label?
if(bodyList.get(f).text().contains(LabelEnum.PAPERS_NAME.getKey())){
isExName= Boolean.TRUE;
}
// description label?
if(bodyList.get(f).text().contains(LabelEnum.PAPERS_DESC.getKey())){
isExDesc= Boolean.TRUE;
isExName= Boolean.FALSE;
isDemand= Boolean.FALSE;
isExInfo= Boolean.FALSE;
}
// answering instructions label?
if(bodyList.get(f).text().contains(LabelEnum.PAPERS_DEMAND.getKey())){
isDemand=Boolean.TRUE;
isExName= Boolean.FALSE;
isExDesc= Boolean.FALSE;
isExInfo= Boolean.FALSE;
}
// paper-part label?
if(bodyList.get(f).text().contains(LabelEnum.PAPERS_INFO_ONE.getKey())||bodyList.get(f).text().contains(LabelEnum.PAPERS_INFO_TWO.getKey())){
isExInfo = Boolean.TRUE;
isExName= Boolean.FALSE;
isExDesc= Boolean.FALSE;
isDemand= Boolean.FALSE;
paperInfo++;
}
// question content
// start of a parent question?
if(bodyList.get(f).text().contains(LabelEnum.QUESTION_FATHER_CONTENT.getKey())){
isNew =Boolean.TRUE;
isAdd =Boolean.TRUE;
isfather =Boolean.TRUE;
isExName= Boolean.FALSE;
isExDesc= Boolean.FALSE;
isDemand= Boolean.FALSE;
isExInfo= Boolean.FALSE;
}
// end of a parent question
if(bodyList.get(f).text().contains(LabelEnum.QUESTION_FATHER_CONTENT_END.getKey())){
isfather =Boolean.FALSE;
isExName= Boolean.FALSE;
isExDesc= Boolean.FALSE;
isDemand= Boolean.FALSE;
isExInfo= Boolean.FALSE;
}
// standalone question (not inside a parent question)
if(bodyList.get(f).text().contains(LabelEnum.QUESTION_CONTENT.getKey()) && !isfather){
isNew =Boolean.TRUE;
isAdd =Boolean.TRUE;
isExName= Boolean.FALSE;
isExDesc= Boolean.FALSE;
isDemand= Boolean.FALSE;
isExInfo= Boolean.FALSE;
}
if(bodyList.get(f).text().contains(LabelEnum.QUESTION_MODULE.getKey()) ){
isMoudule =Boolean.TRUE;
}
// from the second paper-part label on, the label also starts a new question group
if(paperInfo>1&&(bodyList.get(f).text().contains(LabelEnum.PAPERS_INFO_ONE.getKey()) || bodyList.get(f).text().contains(LabelEnum.PAPERS_INFO_TWO.getKey()))){
isNew =Boolean.TRUE;
isAdd =Boolean.TRUE;
isExName= Boolean.FALSE;
isExDesc= Boolean.FALSE;
isDemand= Boolean.FALSE;
// isExInfo= Boolean.FALSE;
}
// append the paragraph to every active section
if(isExName){
exNameList.add(bodyList.get(f));
}
if(isExDesc){
exDescList.add(bodyList.get(f));
}
if(isDemand){
exDemandList.add(bodyList.get(f));
}
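if(isExInfo){
exInfoList.add(bodyList.get(f));
// guard: a paper-part label on the last paragraph has no following paragraph
if(f+1 < bodyList.size()){
exInfoNextList.add(bodyList.get(f+1));
}
}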
if(isMoudule){
questionMouduleList.add(bodyList.get(f));
isMoudule = false;
}
if(isAdd){
if(isNew){
isNew=false;
if(questionList.size()>0){
exQuestionList.add(questionList);
}
questionList = new ArrayList<>();
}
questionList.add(bodyList.get(f));
// last paragraph: flush the current question
if(f>(bodyList.size()-2)){
if(questionList.size()>0){
exQuestionList.add(questionList);
}
}
}
// count questions
if(bodyList.get(f).text().contains(LabelEnum.QUESTION_CONTENT.getKey())){
questionNum++;
}
// collect score paragraphs
if(bodyList.get(f).text().contains(LabelEnum.QUESTION_SCORE.getKey())){
questionScoreList.add(bodyList.get(f));
}
}
// assemble the result
TestPaperVo vo = new TestPaperVo();
vo.setExNameList(exNameList);
vo.setExDescList(exDescList);
vo.setExDemandList(exDemandList);
vo.setExInfoList(exInfoList);
vo.setExQuestionList(exQuestionList);
vo.setQuestionNum(questionNum);
vo.setQuestionScoreList(questionScoreList);
vo.setExInfoNextList(exInfoNextList);
vo.setQuestionMouduleList(questionMouduleList);
return vo;
}
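The flag juggling in arrangePaper boils down to a small state machine. Every label switches the "current section", and each following paragraph is appended to that section until the next label appears. Below is a simplified sketch of that idea on plain strings rather than Jsoup elements; the class and method names are hypothetical. It leaves out parent/child questions and the <Ⅰ卷>/<Ⅱ卷> handling, and it assumes at most one label per line.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LabelSectionSplitter {
    /**
     * Groups lines by the most recent label seen; lines before the first label are skipped.
     * Every occurrence of questionLabel opens a new group, so questions come back one list each.
     */
    static Map<String, List<List<String>>> split(List<String> lines, List<String> labels, String questionLabel) {
        Map<String, List<List<String>>> sections = new LinkedHashMap<>();
        List<String> current = null; // the group that receives the following lines
        for (String line : lines) {
            for (String label : labels) {
                if (line.contains(label)) {
                    List<List<String>> groups = sections.computeIfAbsent(label, k -> new ArrayList<>());
                    if (groups.isEmpty() || label.equals(questionLabel)) {
                        groups.add(new ArrayList<>()); // first occurrence, or a new question
                    }
                    current = groups.get(groups.size() - 1);
                    break; // assumes one label per line, as the validation step requires
                }
            }
            if (current != null) {
                current.add(line);
            }
        }
        return sections;
    }
}
```

With a single "current" pointer, there is no way to leave two sections switched on at once, which is what the long runs of `= Boolean.FALSE` assignments guard against.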
6. Validation. I won't explain it here; the rules come from business requirements.
public String checkExamPaper(TestPaperVo vo, List<Element> analysisQtPaper, QtPaper qtPaper) {
    // Every <Ф父题内容Ф> needs a matching <Ф父题结束Ф>
    String checkFatherLabel = this.checkFatherLabel(analysisQtPaper);
    if (StringUtils.isNotBlank(checkFatherLabel)) {
        // StringUtils.left keeps at most the first 50 characters of the offending text
        return "<Ф父题内容Ф> and <Ф父题结束Ф> labels do not match; offending text: " + StringUtils.left(checkFatherLabel, 50);
    }
    // At most one label per line
    String checkHtmlLabelNum = this.checkHtmlLabelNum(analysisQtPaper);
    if (StringUtils.isNotBlank(checkHtmlLabelNum)) {
        return "Several labels on one line; offending text: " + StringUtils.left(checkHtmlLabelNum, 50);
    }
    // Labels must be written correctly
    String checkHtmlLabelNorm = this.checkHtmlLabelNorm(analysisQtPaper);
    if (StringUtils.isNotBlank(checkHtmlLabelNorm)) {
        return "The paper contains an invalid label; offending text: " + StringUtils.left(checkHtmlLabelNorm, 50);
    }
    // The total score must equal the sum of the question scores
    String checkExamScore = this.checkExamScore(qtPaper, vo);
    if (StringUtils.isNotBlank(checkExamScore)) {
        return "The paper's total score does not equal the sum of the question scores";
    }
    // The question count entered on the page must equal the number of <Ф题目内容Ф> labels.
    // Objects.equals rather than !=, which compares references when both sides are Integer objects
    if (!Objects.equals(qtPaper.getQuestionCount(), vo.getQuestionNum())) {
        return "The number of questions does not match the number of <Ф题目内容Ф> labels";
    }
    // Paper-part labels: either none or exactly two (<Ⅰ卷> and <Ⅱ卷>)
    if (vo.getExInfoList().size() > 0) {
        if (vo.getExInfoList().size() != 2) {
            return "<Ⅰ卷> and <Ⅱ卷> labels do not match";
        }
        // The first paper-part label must come directly before the first question
        if (vo.getExInfoNextList().isEmpty() || vo.getExQuestionList().isEmpty()
                || !vo.getExInfoNextList().get(0).text().equals(vo.getExQuestionList().get(0).get(0).text())) {
            return "<Ⅰ卷> and <Ⅱ卷> labels do not match";
        }
        // The second one must be followed by a question or a parent question
        if (vo.getExInfoNextList().size() == 2) {
            String next = vo.getExInfoNextList().get(1).text();
            if (!next.contains(LabelEnum.QUESTION_FATHER_CONTENT.getKey()) && !next.contains(LabelEnum.QUESTION_CONTENT.getKey())) {
                return "<Ⅰ卷> and <Ⅱ卷> labels do not match";
            }
        }
    }
    // The number of question attributes in the paper must equal the number defined in the dictionary
    String checkMudule = checkMudule(vo, qtPaper);
    if (checkMudule != null) {
        return checkMudule;
    }
    // Per-question rules
    String checkExamTopic = this.checkExamTopic(vo);
    if (StringUtils.isNotBlank(checkExamTopic)) {
        return checkExamTopic;
    }
    return null;
}
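checkHtmlLabelNum itself is not shown. Judging only from its error message, it rejects lines that carry more than one label. The following is a hypothetical stdlib version of such a check, not the author's implementation. It counts every occurrence of every label key in a line, and it assumes no key is a substring of another, which holds for the keys in LabelEnum.

```java
import java.util.List;

final class LabelLineCheck {
    /** Returns the first line containing more than one label occurrence, or null if every line is fine. */
    static String firstLineWithSeveralLabels(List<String> lines, List<String> labels) {
        for (String line : lines) {
            int count = 0;
            for (String label : labels) {
                int from = 0;
                int idx;
                // count repeated occurrences too, e.g. the same label written twice
                while ((idx = line.indexOf(label, from)) != -1) {
                    count++;
                    from = idx + label.length();
                }
            }
            if (count > 1) {
                return line;
            }
        }
        return null;
    }
}
```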
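7. Deleting images: each type of paper can exist only once, so uploading it again means deleting the images from the previous upload. Decide for yourself whether you need this.
/**
 * Deletes all images uploaded earlier for this type of paper
 *
 * @param qtPaper the paper being uploaded
 * @return null if the deletion succeeded, otherwise the path that could not be deleted
 */
public String deletePic(QtPaper qtPaper) {
    UploadFileUtil uploadFileUtil = SpringContextHolder.getBean(UploadFileUtil.class);
    // One folder per paper type: grade_classDivideType_testType
    String paperPice = qtPaper.getGrade() + "_" + qtPaper.getClassDivideType() + "_" + qtPaper.getTestType();
    String imagePath = uploadFileUtil.getImagePath(UploadFileUtil.FileType.paper, "temc", paperPice);
    // Cut the path right after the paper-type folder name
    // (indexOf instead of split, which would treat paperPice as a regular expression)
    int idx = imagePath.indexOf(paperPice);
    if (idx >= 0) {
        imagePath = imagePath.substring(0, idx + paperPice.length());
    }
    boolean deleteDir = FileUtils.deleteDir(new File(imagePath));
    return deleteDir ? null : imagePath;
}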
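FileUtils.deleteDir is the project's own helper and is not shown. If you need an equivalent, here is a minimal java.nio sketch with a hypothetical class name. It walks the tree and deletes children before their parent directories. It treats a missing folder as success, which suits the first upload of a paper type; that is my choice, not necessarily what deleteDir does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class DirCleaner {
    /** Deletes dir and everything under it; returns true if dir no longer exists afterwards. */
    static boolean deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return true; // nothing to delete
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits "a/b/f.png" before "a/b" and "a/b" before "a"
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            return true;
        } catch (IOException | UncheckedIOException e) {
            return false;
        }
    }
}
```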
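8. Replacing images puts the URLs of the uploaded images back into the document content. Images of 3 KB or less are inlined as Base64 data URIs instead.
/**
 * Copies the images to the upload folder and rewrites their src attributes
 * @param body    the document body
 * @param htmlUrl path of the HTML file; img src values are relative to its folder
 * @param qtPaper the paper, used to build the image folder name
 * @return the body with rewritten img tags
 */
public Element replaceImgs(Element body, String htmlUrl, QtPaper qtPaper) {
    String paperPice = qtPaper.getGrade() + "_" + qtPaper.getClassDivideType() + "_" + qtPaper.getTestType();
    Elements links = body.getElementsByTag("img");
    UploadFileUtil uploadFileUtil = SpringContextHolder.getBean(UploadFileUtil.class);
    for (Element link : links) {
        String linkHref = link.attr("src");
        // Target file: a UUID prefix keeps names from different uploads apart
        String fileName = uploadFileUtil.getImagePath(UploadFileUtil.FileType.paper, "temc", paperPice,
                IdGen.uuid() + linkHref.substring(linkHref.lastIndexOf("/") + 1));
        // Copy from the temporary extraction folder to the permanent one
        FileUtils.copyFile(htmlUrl.substring(0, htmlUrl.lastIndexOf("/") + 1) + linkHref, fileName);
        String picurl = uploadFileUtil.getImageUrl(fileName);
        // getImageStr returns "0" for files over 3 KB (keep the URL), otherwise a Base64 data URI
        String base64 = FileUtils.getImageStr(fileName);
        link.attr("src", "0".equals(base64) ? picurl : base64);
    }
    return body;
}
/**
 * @Description: converts an image file into a Base64 data URI. Files over 3 KB
 *               return "0", which tells replaceImgs to keep the uploaded URL instead
 * @param imgFile path of the image file
 * @return the data URI, or "0" if the file is too large or cannot be read
 */
public static String getImageStr(String imgFile) {
    // Assumption: "3 KB" means 3 * 1024 bytes
    final long maxInlineBytes = 3 * 1024;
    try {
        java.nio.file.Path path = java.nio.file.Paths.get(imgFile);
        if (java.nio.file.Files.size(path) > maxInlineBytes) {
            return "0";
        }
        // readAllBytes reads the whole file; a single read() into an available()-sized buffer is not guaranteed to
        byte[] data = java.nio.file.Files.readAllBytes(path);
        // MIME type guessed from the extension; anything unrecognized is labeled PNG
        String lower = imgFile.toLowerCase();
        String mime = (lower.endsWith(".jpg") || lower.endsWith(".jpeg")) ? "image/jpeg"
                : lower.endsWith(".gif") ? "image/gif" : "image/png";
        return "data:" + mime + ";base64," + java.util.Base64.getEncoder().encodeToString(data);
    } catch (java.io.IOException e) {
        // Unreadable file: fall back to the uploaded URL
        return "0";
    }
}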
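replaceImgs builds the source path by cutting htmlUrl at the last "/", which only works when the path uses forward slashes. A sketch with java.nio.file resolves the img src against the HTML file's folder regardless of the separator. The class name is hypothetical. UUID.randomUUID() stands in for the project's IdGen.uuid(), and I am assuming that helper returns 32 hex characters without dashes.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

final class ImagePaths {
    /** Resolves an img src such as "paper.files/image001.png" against the HTML file's folder. */
    static Path sourceImage(String htmlPath, String src) {
        // resolveSibling resolves against the parent of htmlPath, i.e. the extraction folder
        return Paths.get(htmlPath).resolveSibling(src).normalize();
    }

    /** Unique target name: a dash-free random UUID followed by the original file name. */
    static String uniqueName(String src) {
        String fileName = src.substring(src.lastIndexOf('/') + 1);
        return UUID.randomUUID().toString().replace("-", "") + fileName;
    }
}
```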
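9. Because step 7 deletes all existing images of this paper type, a document that contains images is split and sorted again after the image replacement. I no longer remember exactly why. Judging from the code, the second pass may be unnecessary: replaceImgs only rewrites the src attributes of img elements that already sit inside the collected paragraphs, so the existing vo should already see the new values.
10. Building and saving the data. Code only, no explanation; a few helper methods follow.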
/**
 * Fills the paper from the sorted sections, builds its questions and saves everything
 *
 * @param vo      the sections produced by arrangePaper
 * @param qtPaper the paper to fill in
 * @return null on success, otherwise an error message
 */
private String insertExamPaperInfo(TestPaperVo vo, QtPaper qtPaper) {
    // Paper information
    List<String> labelList = new ArrayList<>();
    if (vo.getExNameList() != null && vo.getExNameList().size() > 0) {
        labelList.add("<" + LabelEnum.PAPERS_NAME.getKey() + ">");
        qtPaper.setPaperName(this.deleteLabelString(vo.getExNameList(), labelList)); // paper name
    }
    qtPaper.setClassify(Constants.TEST_ASSESSMENT_CLASSIFY_MULTI); // assessment category
    qtPaper.setPaperState(Constants.UNPUBLISH_PAPER_STATE);        // unpublished
    qtPaper.setDelFlag(Constants.DEL_FLAG_NORMAL);                 // not deleted
    if (vo.getExDescList() != null && vo.getExDescList().size() > 0) {
        labelList.clear();
        labelList.add("<" + LabelEnum.PAPERS_DESC.getKey() + ">");
        qtPaper.setDescription(this.deleteLabelString(vo.getExDescList(), labelList)); // description
    }
    if (vo.getExDemandList() != null && vo.getExDemandList().size() > 0) {
        labelList.clear();
        labelList.add("<" + LabelEnum.PAPERS_DEMAND.getKey() + ">");
        qtPaper.setDemand(this.deleteLabelString(vo.getExDemandList(), labelList)); // answering instructions
    }
    // The first paper-part label decides which part the first questions belong to
    TransferVo tVo = new TransferVo();
    if (vo.getExInfoList() != null && vo.getExInfoList().size() > 0) {
        if (vo.getExInfoList().get(0).text().contains(LabelEnum.PAPERS_INFO_ONE.getKey())) {
            tVo.setPaperInfo(Constants.PAPER_INFO_ONE);
        } else {
            tVo.setPaperInfo(Constants.PAPER_INFO_TWO);
        }
    }
    // Build the questions group by group
    List<QtQuestion> questions = new ArrayList<>();
    for (List<Element> elListFor : vo.getExQuestionList()) {
        String first = elListFor.get(0).text();
        // A group that starts with a paper-part label only switches the current part
        if (first.contains(LabelEnum.PAPERS_INFO_ONE.getKey()) || first.contains(LabelEnum.PAPERS_INFO_TWO.getKey())) {
            if (first.contains(LabelEnum.PAPERS_INFO_ONE.getKey())) {
                tVo.setPaperInfo(Constants.PAPER_INFO_ONE);
            } else {
                tVo.setPaperInfo(Constants.PAPER_INFO_TWO);
            }
            continue;
        }
        if (first.contains(LabelEnum.QUESTION_FATHER_CONTENT.getKey())) {
            // parent question with children
            questions.add(this.combingFatherTopic(elListFor, tVo));
        } else {
            // standalone question
            questions.add(this.combingAloneTopic(elListFor, tVo, null, Constants.Level_SINGLE));
        }
    }
    qtPaper.setQuestions(questions);
    // Save the paper together with its questions
    boolean savePaperWithQuestions = ptPaperService.saveWithQuestions(qtPaper);
    if (!savePaperWithQuestions) {
        return "Failed to save the paper, please try again later";
    }
    return null;
}
/**
* Builds a parent question together with its child questions
* @param elListFor
* @param tVo
* @return
*/
public QtQuestion combingFatherTopic(List<Element> elListFor, TransferVo tVo){
// parent question text
boolean isFatherTitle = false;
List<Element> fatherTitleList = new ArrayList<>();
// parent question analysis
boolean isFatherAnalyze = false;
List<Element> fatherAnalyzeList = new ArrayList<>();
// all child questions
boolean isQuestion = false;
boolean isNew = false;
List<List<Element>> questionList = new ArrayList<>();
// the child question currently being collected
List<Element> questionSingleList = new ArrayList<>();
for(Element elFor : elListFor){
if(elFor.text().contains(LabelEnum.QUESTION_FATHER_CONTENT.getKey())){
isFatherTitle=true;
}
if(!isQuestion && elFor.text().contains(LabelEnum.QUESTION_ANALYZE.getKey()) ){
isFatherAnalyze=true;
isFatherTitle=false;
}
if(elFor.text().contains(LabelEnum.QUESTION_CONTENT.getKey())){
isNew =true;
isQuestion=true;
isFatherTitle=false;
isFatherAnalyze=false;
}
if(elFor.text().contains(LabelEnum.QUESTION_FATHER_CONTENT_END.getKey())){
if(questionSingleList.size()>0){
questionList.add(questionSingleList);
}
isQuestion=false;
}
if(isFatherTitle){
fatherTitleList.add(elFor);
}
if(isFatherAnalyze){
fatherAnalyzeList.add(elFor);
}
if(isQuestion){
if(isNew){
isNew=false;
if(questionSingleList.size()>0){
questionList.add(questionSingleList);
}
questionSingleList = new ArrayList<>();
}
questionSingleList.add(elFor);
}
}
// assemble the parent and its children
List<String> labellist =new ArrayList<>();
labellist.add(LabelEnum.QUESTION_FATHER_CONTENT.getKey());
String description =this.deleteLabelString(fatherTitleList , labellist);
// parent question text
QtQuestion insertQtQuestion = this.insertQtQuestion( tVo,null, description, null,"",Constants.Level_ParentChild_PARENT,0);
List<QtQuestionAttribute> attributes = new ArrayList<>(); // question attributes
List<QtQuestion> children = new ArrayList<>(); // child questions; each needs the matching level
// attach the parent's analysis, if any
if(fatherAnalyzeList.size() > 0 ){
labellist.clear();
labellist.add(LabelEnum.QUESTION_ANALYZE.getKey());
QtQuestionAttribute insertQtQuestionAttribute = this.insertQtQuestionAttribute("stsxAnalysis", this.deleteLabelString(fatherAnalyzeList , labellist));
attributes.add(insertQtQuestionAttribute);
insertQtQuestion.setAttributes(attributes);
}
for(int f=0 ; f<questionList.size() ; f++){
QtQuestion combingAloneTopic = combingAloneTopic(questionList.get(f), tVo, f+1,Constants.Level_ParentChild_CHILD);
children.add(combingAloneTopic);
}
insertQtQuestion.setChildren(children);
return insertQtQuestion;
}
/**
* Builds a standalone question, or one child of a parent question
*
* @param elListFor
* @param tVo
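* @param orderNumber position within the parent question; only passed for child questions
* @param level question level (single, or child of a parent question)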
*/
public QtQuestion combingAloneTopic(List<Element> elListFor ,TransferVo tVo,Integer orderNumber,String level){
if(elListFor == null || elListFor.size() < 1){
return null;
}
// question text
Boolean isTitleContent =Boolean.FALSE;
List<Element> titleContentList = new ArrayList<>();
// module
Boolean isModule =Boolean.FALSE;
List<Element> moduleList = new ArrayList<>();
// classification
Boolean isClassification =Boolean.FALSE;
List<Element> classificationList = new ArrayList<>();
// score
Boolean isScore =Boolean.FALSE;
List<Element> scoreList = new ArrayList<>();
// objective (choice) question unless a Ф主观题Ф label marks it as subjective
Boolean isObjective =Boolean.TRUE;
// question analysis
Boolean isQuestionAnalyze =Boolean.FALSE;
List<Element> questionAnalyzeList = new ArrayList<>();
// option commentary
Boolean isQuestionOptionAppraise =Boolean.FALSE;
List<Element> questionOptionAppraiseList = new ArrayList<>();
// answer commentary
Boolean isQuestionAnswerAppraise =Boolean.FALSE;
List<Element> questionAnswerAppraiseList = new ArrayList<>();
// each label switches the active section; the following paragraphs go into its list
for(Element elFor : elListFor){
if(elFor.text().contains(LabelEnum.QUESTION_CONTENT.getKey())){
isTitleContent =Boolean.TRUE;
isModule =Boolean.FALSE;
isClassification =Boolean.FALSE;
isScore =Boolean.FALSE;
isQuestionAnalyze =Boolean.FALSE;
isQuestionOptionAppraise =Boolean.FALSE;
isQuestionAnswerAppraise =Boolean.FALSE;
}
if(elFor.text().contains(LabelEnum.QUESTION_MODULE.getKey())){
isModule =Boolean.TRUE;
isTitleContent =Boolean.FALSE;
isClassification =Boolean.FALSE;
isScore =Boolean.FALSE;
isQuestionAnalyze =Boolean.FALSE;
isQuestionOptionAppraise =Boolean.FALSE;
isQuestionAnswerAppraise =Boolean.FALSE;
}
if(elFor.text().contains(LabelEnum.QUESTION_CLASSIFICATION.getKey())){
isClassification =Boolean.TRUE;
isTitleContent =Boolean.FALSE;
isModule =Boolean.FALSE;
isScore =Boolean.FALSE;
isQuestionAnalyze =Boolean.FALSE;
isQuestionOptionAppraise =Boolean.FALSE;
isQuestionAnswerAppraise =Boolean.FALSE;
}
if(elFor.text().contains(LabelEnum.QUESTION_SCORE.getKey())){
isScore =Boolean.TRUE;
isTitleContent =Boolean.FALSE;
isModule =Boolean.FALSE;
isClassification =Boolean.FALSE;
isQuestionAnalyze =Boolean.FALSE;
isQuestionOptionAppraise =Boolean.FALSE;
isQuestionAnswerAppraise =Boolean.FALSE;
}
if(elFor.text().contains(LabelEnum.QUESTION_SUBJECTIVE.getKey())){
isObjective =Boolean.FALSE;
isTitleContent =Boolean.FALSE;
isModule =Boolean.FALSE;
isClassification =Boolean.FALSE;
isScore =Boolean.FALSE;
isQuestionAnalyze =Boolean.FALSE;
isQuestionOptionAppraise =Boolean.FALSE;
isQuestionAnswerAppraise =Boolean.FALSE;
}
if(elFor.text().contains(LabelEnum.QUESTION_ANALYZE.getKey())){
isQuestionAnalyze =Boolean.TRUE;
isTitleContent =Boolean.FALSE;
isModule =Boolean.FALSE;
isClassification =Boolean.FALSE;
isScore =Boolean.FALSE;
isQuestionOptionAppraise =Boolean.FALSE;
isQuestionAnswerAppraise =Boolean.FALSE;
}
if(elFor.text().contains(LabelEnum.QUESTION_OPTION_APPRAISE.getKey())){
isQuestionOptionAppraise =Boolean.TRUE;
isTitleContent =Boolean.FALSE;
isModule =Boolean.FALSE;
isClassification =Boolean.FALSE;
isScore =Boolean.FALSE;
isQuestionAnalyze =Boolean.FALSE;
isQuestionAnswerAppraise =Boolean.FALSE;
}
if(elFor.text().contains(LabelEnum.QUESTION_ANSWER_APPRAISE.getKey())){
isQuestionAnswerAppraise =Boolean.TRUE;
isTitleContent =Boolean.FALSE;
isModule =Boolean.FALSE;
isClassification =Boolean.FALSE;
isScore =Boolean.FALSE;
isQuestionAnalyze =Boolean.FALSE;
isQuestionOptionAppraise =Boolean.FALSE;
}
if(isTitleContent){
titleContentList.add(elFor);
}
if(isModule){
moduleList.add(elFor);
}
if(isClassification){
classificationList.add(elFor);
}
if(isScore){
scoreList.add(elFor);
}
if(isQuestionAnalyze){
questionAnalyzeList.add(elFor);
}
if(isQuestionOptionAppraise){
questionOptionAppraiseList.add(elFor);
}
if(isQuestionAnswerAppraise){
questionAnswerAppraiseList.add(elFor);
}
}
// turn each section into a string without its label
List<String> labellist =new ArrayList<>();
// question text
labellist.add(LabelEnum.QUESTION_CONTENT.getKey());
String description =this.deleteLabelString(titleContentList , labellist);
// module
labellist.clear();
labellist.add(LabelEnum.QUESTION_MODULE.getKey());
String module =this.deleteLabelString(moduleList , labellist);
// classification
labellist.clear();
labellist.add(LabelEnum.QUESTION_CLASSIFICATION.getKey());
String classification =this.deleteLabelString(classificationList , labellist);
// score: the largest number on the Ф分值Ф line (0 if the line is missing)
List<Integer> elementScore = scoreList.isEmpty() ? new ArrayList<Integer>() : this.getElementScore(scoreList.get(0).text());
Integer topScore = (elementScore == null || elementScore.isEmpty()) ? 0 : Collections.max(elementScore);
// question analysis
String questionAnalyze ="";
if(questionAnalyzeList.size()>0){
labellist.clear();
labellist.add(LabelEnum.QUESTION_ANALYZE.getKey());
questionAnalyze =this.deleteLabelString(questionAnalyzeList , labellist);
}
// answer commentary
String questionAnswerAppraise = "";
if(questionAnswerAppraiseList.size()>0){
labellist.clear();
labellist.add(LabelEnum.QUESTION_ANSWER_APPRAISE.getKey());
questionAnswerAppraise =this.deleteLabelString(questionAnswerAppraiseList , labellist);
}
// map the sections onto the entities, starting with QtQuestion
QtQuestion insertQtQuestion = this.insertQtQuestion(tVo, isObjective?Constants.QuestionTypeChoice:Constants.QuestionTypeEssay, description, orderNumber,"", level, topScore);
// QtQuestionAttribute: up to four attributes
List<QtQuestionAttribute> attributes = new ArrayList<>(); // question attributes
if(questionAnalyzeList.size()>0){
QtQuestionAttribute insertQtQuestionAttribute = this.insertQtQuestionAttribute("stsxAnalysis", questionAnalyze); // question analysis
attributes.add(insertQtQuestionAttribute);
}
if(questionAnswerAppraiseList.size()>0){
QtQuestionAttribute insertQtQuestionAttribute2 = this.insertQtQuestionAttribute("stsxEvaluate", questionAnswerAppraise); // answer commentary
attributes.add(insertQtQuestionAttribute2);
}
if(StringUtils.isNotBlank(module)){
Integer modules = charAtStringReturnDouble(module);
if(null != modules){
QtQuestionAttribute insertQtQuestionAttribute3 = this.insertQtQuestionAttribute("stsxModule", modules.toString()); // module
attributes.add(insertQtQuestionAttribute3);
}
}
if(StringUtils.isNotBlank(classification)){
QtQuestionAttribute insertQtQuestionAttribute4 = this.insertQtQuestionAttribute("stsxType", classification); // classification
attributes.add(insertQtQuestionAttribute4);
}
//insertQtQuestionChoice
if(isObjective){
List<QtQuestionAnswer> choices = this.insertQtQuestionChoice(questionOptionAppraiseList, elementScore, topScore);
insertQtQuestion.setAnswers(choices);
}
// attach the attributes and return the question
insertQtQuestion.setAttributes(attributes);
return insertQtQuestion;
}
11. Below are a few helper methods I wrote.
/**
 * Removes the labels from a section and joins its paragraphs into one string
 *
 * @param elementList the paragraphs of one section
 * @param labellist   the labels to strip
 * @return the content without labels: plain text for paper names, modules and
 *         classifications, otherwise <p> paragraphs; images are kept as <img> tags
 */
public String deleteLabelString(List<Element> elementList, List<String> labellist) {
    if (elementList == null || elementList.isEmpty() || labellist == null || labellist.isEmpty()) {
        return "";
    }
    StringBuilder returnData = new StringBuilder();
    // Paper names, modules and classifications are returned as plain text without <p>
    boolean isTitle = false;
    for (String leFor : labellist) {
        if (leFor.contains(LabelEnum.PAPERS_NAME.getKey()) || leFor.contains(LabelEnum.QUESTION_MODULE.getKey())
                || leFor.contains(LabelEnum.QUESTION_CLASSIFICATION.getKey())) {
            isTitle = true;
        }
    }
    for (Element elFor : elementList) {
        // Does this paragraph carry one of the labels?
        boolean isConstains = false;
        for (String laFor : labellist) {
            if (elFor.text().contains(laFor)) {
                isConstains = true;
            }
        }
        String resultData = this.stripLabel(elFor.text(), isConstains);
        Elements elementsByTag = elFor.getElementsByTag("img");
        if (elementsByTag.size() > 0) { // the paragraph contains images
            // Is there any text besides the images? isSpaceChar also treats
            // no-break and full-width spaces as blank
            boolean isNull = true;
            if (StringUtils.isNotBlank(resultData)) {
                charArrayFor:
                for (char charFor : resultData.toCharArray()) {
                    if (!Character.isSpaceChar(charFor)) {
                        isNull = false;
                        break charArrayFor;
                    }
                }
            }
            if (isNull) { // images only
                returnData.append("<p>");
                for (Element imgFor : elementsByTag) {
                    returnData.append(imgFor.toString());
                }
                returnData.append("</p>");
            } else { // images and text
                // Append a text placeholder to each image's parent (usually the <span> wrapping it)
                // so the image can be put back after text() has dropped it
                for (int i = 0; i < elementsByTag.size(); i++) {
                    Element parent = elementsByTag.get(i).parent();
                    parent.appendElement("span").text("<Ф图片" + i + "Ф>");
                }
                resultData = this.stripLabel(elFor.text(), isConstains);
                // Swap the placeholders for the <img> tags; replace() is literal, whereas
                // replaceAll() treats "$" and "\" in the replacement specially
                for (int i = 0; i < elementsByTag.size(); i++) {
                    resultData = resultData.replace("<Ф图片" + i + "Ф>", elementsByTag.get(i).toString());
                }
                returnData.append("<p>").append(resultData).append("</p>");
            }
        } else if (StringUtils.isNotBlank(resultData)) { // text only
            if (isTitle) {
                returnData.append(resultData);
            } else {
                returnData.append("<p>").append(resultData).append("</p>");
            }
        }
    }
    return returnData.toString();
}

/**
 * Drops everything up to and including the first '>' (the end of the label) when the text carries a label
 */
private String stripLabel(String text, boolean hasLabel) {
    if (!hasLabel) {
        return text;
    }
    int end = text.indexOf('>');
    return end >= 0 ? text.substring(end + 1) : "";
}
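In this method, the charArrayFor loop that runs once images are found checks whether the remaining text is only spaces, including full-width (U+3000) and no-break (U+00A0) spaces. It relies on Character.isSpaceChar because Word pads paragraphs with &nbsp;, which Jsoup's text() returns as U+00A0. StringUtils.isBlank is built on Character.isWhitespace, which does not count U+00A0 as whitespace.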
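Here is a stricter variant of the blank check in deleteLabelString as a standalone sketch (the name is hypothetical). It accepts a character as blank when either Character.isWhitespace or Character.isSpaceChar says so. Ordinary tabs (which isSpaceChar rejects), U+00A0 and U+3000 all count as blank.

```java
final class BlankCheck {
    /**
     * True if s is null or consists only of whitespace, where whitespace also includes
     * Unicode space separators such as U+00A0 (no-break space) and U+3000 (ideographic space).
     */
    static boolean isBlankText(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isWhitespace(c) && !Character.isSpaceChar(c)) {
                return false;
            }
        }
        return true;
    }
}
```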
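/**
 * Extracts the first number (digits, optionally with a decimal part) from a string
 *
 * @param str the text to search, e.g. the content of the module label
 * @return the integer part of that number, or null if the string contains no digits
 */
public Integer charAtStringReturnDouble(String str) {
    if (str == null) {
        return null;
    }
    Matcher m = Pattern.compile("\\d+(?:\\.\\d+)?").matcher(str.trim());
    if (!m.find()) {
        return null;
    }
    // Integer.parseInt("1.5") would throw NumberFormatException, so parse as a double
    // and keep the integer part, since the caller stores the module as an integer
    return (int) Double.parseDouble(m.group());
}
That's all for now. In the next post I'll write up the Jsoup methods used here, in case they come in handy later.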