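Many students paste content from Word when they publish a post. The paste brings a lot of Word styling with it, including half-width (ASCII) double quotes, which are not allowed unescaped inside a JSON string. As a result, the post content exceeds the system's character limit and the post data is displayed incorrectly.
Solution: use regular expressions to filter the styling out of the post. The code is as follows: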
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String tagsProcessor(String inputString) {
    String processedString = "";
    Pattern js_pattern, style_pattern, html_pattern, particular_pattern;
    Matcher js_matcher, style_matcher, html_matcher, particular_matcher;
    // Regex for <script> blocks; [\s\S]*? is lazy and also matches line breaks
    String js_str = "<script[^>]*>[\\s\\S]*?</script>";
    // Regex for <style> blocks
    String style_str = "<style[^>]*>[\\s\\S]*?</style>";
    // Regex for any remaining HTML tag
    String html_str = "<[^>]+>";
    // Regex for leftover HTML character entities
    String particular_str = "&gt;|&amp;|&nbsp;|&quot;";
    // Remove script blocks
    js_pattern = Pattern.compile(js_str, Pattern.CASE_INSENSITIVE);
    js_matcher = js_pattern.matcher(inputString);
    processedString = js_matcher.replaceAll("");
    // Remove style blocks
    style_pattern = Pattern.compile(style_str, Pattern.CASE_INSENSITIVE);
    style_matcher = style_pattern.matcher(processedString);
    processedString = style_matcher.replaceAll("");
    // Remove HTML tags
    html_pattern = Pattern.compile(html_str, Pattern.CASE_INSENSITIVE);
    html_matcher = html_pattern.matcher(processedString);
    processedString = html_matcher.replaceAll("");
    // Remove character entities
    particular_pattern = Pattern.compile(particular_str, Pattern.CASE_INSENSITIVE);
    particular_matcher = particular_pattern.matcher(processedString);
    processedString = particular_matcher.replaceAll("");
    return processedString;
}
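
To see the filter at work on something that looks like a Word paste, here is a minimal, self-contained sketch. The class name TagsProcessorDemo and the sample input are made up for illustration, and the patterns are assumptions rather than the exact ones used in production: they use a lazy `[\s\S]*?` so that a script or style block spanning several lines is removed as a unit, and they treat the "particular" characters as the HTML entities &gt;, &amp;, &nbsp; and &quot;. The patterns are compiled once as constants instead of on every call.

```java
import java.util.regex.Pattern;

public class TagsProcessorDemo {
    // Assumed patterns (not taken from the production system).
    // [\s\S]*? is lazy and also matches line breaks, so multi-line blocks are removed.
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*>[\\s\\S]*?</script>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE =
            Pattern.compile("<style[^>]*>[\\s\\S]*?</style>", Pattern.CASE_INSENSITIVE);
    // Any remaining tag, opening or closing.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Character entities that survive tag removal.
    private static final Pattern ENTITY =
            Pattern.compile("&gt;|&amp;|&nbsp;|&quot;", Pattern.CASE_INSENSITIVE);

    public static String tagsProcessor(String input) {
        String s = SCRIPT.matcher(input).replaceAll("");
        s = STYLE.matcher(s).replaceAll("");
        s = TAG.matcher(s).replaceAll("");
        return ENTITY.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical fragment in the style of a Word paste.
        String pasted = "<style>\np.MsoNormal { margin: 0cm; }\n</style>"
                + "<p class=\"MsoNormal\"><span style=\"font-size:12pt\">"
                + "Hello, &quot;world&quot;</span></p>";
        System.out.println(tagsProcessor(pasted)); // prints: Hello, world
    }
}
```

Note that the entities are deleted, not decoded, so text such as `a&nbsp;b` comes out as `ab`. If the spacing matters, replacing &nbsp; with a space before the last step may be a better fit.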