Java: Strip Any (script, html, style) Tags and Return Plain Text (Utility Class)


su苏哥


Column: Java漏洞 (Java Vulnerabilities)

Original link: https://blog.csdn.net/qq_35174791/article/details/79541148

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Strips tags from a string and returns the plain text.
 */
public class ChangePlainText {

    public static void main(String[] args) {

        String test = "<b>hi</b></br><h1>hello~</h1><哈哈>";

        String b = ChangePlainText.Html2Text(test);

        System.out.println(b); // prints: hihello~
    }

    public static String Html2Text(String inputString) {
        String htmlStr = inputString; // string that still contains HTML tags
        String textStr = "";
        Pattern p_script;
        Matcher m_script;
        Pattern p_style;
        Matcher m_style;
        Pattern p_html;
        Matcher m_html;

        try {
            // regex for a <script> element including its body
            // (simpler alternative: <script[^>]*?>[\\s\\S]*?<\\/script>)
            String regEx_script = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>";
            // regex for a <style> element including its body
            // (simpler alternative: <style[^>]*?>[\\s\\S]*?<\\/style>)
            String regEx_style = "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>";
            // regex for any remaining HTML tag
            String regEx_html = "<[^>]+>";

            p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
            m_script = p_script.matcher(htmlStr);
            htmlStr = m_script.replaceAll(""); // remove <script> elements

            p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
            m_style = p_style.matcher(htmlStr);
            htmlStr = m_style.replaceAll(""); // remove <style> elements

            p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
            m_html = p_html.matcher(htmlStr);
            htmlStr = m_html.replaceAll(""); // remove all remaining HTML tags

            textStr = htmlStr;

        } catch (Exception e) {
            // reached e.g. when inputString is null; textStr then stays ""
            System.err.println("Html2Text: " + e.getMessage());
        }

        return textStr; // return the plain text
    }
}
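The method above compiles all three patterns on every call, and the reason for the separate script/style passes is easy to miss: the generic `<[^>]+>` pattern deletes only the tags themselves, so without the first two passes the JavaScript and CSS between `<script>`/`<style>` and their closing tags would be left in the output. The sketch below is a hypothetical rewrite, not the author's code (the class `PlainTextSketch` and method `toPlainText` are names invented for illustration). It assumes the same three-pass approach, compiles each regex once into a `static final` field, and returns an empty string for `null` input, which is what `Html2Text` ends up doing through its catch block.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical variant of ChangePlainText: same three regex passes,
 * but each Pattern is compiled once instead of on every call.
 */
public class PlainTextSketch {

    // <script ...> ... </script>, any case; [\s\S]*? lets the body span newlines
    private static final Pattern SCRIPT = Pattern.compile(
            "<\\s*script[^>]*>[\\s\\S]*?<\\s*/\\s*script\\s*>", Pattern.CASE_INSENSITIVE);

    // <style ...> ... </style>, any case
    private static final Pattern STYLE = Pattern.compile(
            "<\\s*style[^>]*>[\\s\\S]*?<\\s*/\\s*style\\s*>", Pattern.CASE_INSENSITIVE);

    // any remaining tag: '<', one or more non-'>' characters, '>'
    // (no letters in the pattern, so CASE_INSENSITIVE is not needed)
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    public static String toPlainText(String html) {
        if (html == null) {
            return ""; // mirrors the original, which returns "" after catching the NPE
        }
        String s = SCRIPT.matcher(html).replaceAll(""); // drop script elements with their bodies
        s = STYLE.matcher(s).replaceAll("");             // drop style elements with their bodies
        return TAG.matcher(s).replaceAll("");            // drop every other tag, keep the text
    }

    public static void main(String[] args) {
        String page = "<p>Hi</p><script type=\"text/javascript\">alert(1);</script>"
                + "<STYLE>p{color:red}</STYLE>";
        System.out.println(toPlainText(page));
        // Tag-only pass for comparison: the script and style bodies leak through
        System.out.println(TAG.matcher(page).replaceAll(""));
    }
}
```

Running `main` should print `Hi` for the three-pass version and `Hialert(1);p{color:red}` for the tag-only pass. Like the original, this is a regex heuristic rather than an HTML parser: a `>` inside an attribute value or an HTML comment, for example, can cut a tag short and leave fragments in the output.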
