Removing HTML Tags from a String

蒙奇奇的故事


Article link: https://blog.csdn.net/weixin_39330443/article/details/81186866


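When working with a WYSIWYG editor, you sometimes need to modify the editor's content on the back end, and that content is HTML-formatted data. Locating a particular tag is harder there than in front-end JavaScript, which can fetch an element directly from the page. This post looks at how to work with HTML-formatted strings in Java.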

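    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    /**
     * Removes HTML tags from a string.
     *
     * @param line the HTML-formatted input; may be null
     * @return the input with script/style blocks, tags, and named entities removed,
     *         or null if the input is null
     */
    private static String removeHtmlTag(String line) {
        if (line == null) {
            return null;
        }
        String htmlStr = line;
        // Regex for script blocks (a simpler alternative: <script[^>]*?>[\\s\\S]*?<\\/script>)
        String regEx_script = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>";
        // Regex for style blocks (a simpler alternative: <style[^>]*?>[\\s\\S]*?<\\/style>)
        String regEx_style = "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>";
        // Regex for any remaining HTML tag
        String regEx_html = "<[^>]+>";
        // Regex for named character entities such as &nbsp; (numeric entities are not matched)
        String regEx_special = "\\&[a-zA-Z]{1,10};";

        // Remove script blocks, including their contents
        Pattern p_script = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
        Matcher m_script = p_script.matcher(htmlStr);
        htmlStr = m_script.replaceAll("");

        // Remove style blocks, including their contents
        Pattern p_style = Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
        Matcher m_style = p_style.matcher(htmlStr);
        htmlStr = m_style.replaceAll("");

        // Remove all remaining HTML tags
        Pattern p_html = Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
        Matcher m_html = p_html.matcher(htmlStr);
        htmlStr = m_html.replaceAll("");

        // Remove named character entities
        Pattern p_special = Pattern.compile(regEx_special, Pattern.CASE_INSENSITIVE);
        Matcher m_special = p_special.matcher(htmlStr);
        htmlStr = m_special.replaceAll("");

        // String.replace is literal: "\\n" and "\\t" are the two-character sequences
        // backslash-n and backslash-t, not control characters. Real tabs become spaces;
        // real newline characters are left in place.
        return htmlStr.replace("\\n", "").replace("\\t", " ").replace("\t", " ");
    }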
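
To make the behavior of these expressions easier to check, here is a minimal, self-contained sketch that applies the same four regular expressions to a sample string. The class name `HtmlTagStripperDemo`, the method name `stripHtml`, and the sample input are hypothetical and chosen only for illustration. The sketch compiles each pattern once as a constant and leaves out the final `\n`/`\t` replacements. Regex-based stripping is only an approximation of HTML parsing, so treat this as a demonstration rather than a robust sanitizer.

```java
import java.util.regex.Pattern;

public class HtmlTagStripperDemo {
    // Same expressions as in the method above, compiled once (constant names are hypothetical).
    private static final Pattern SCRIPT = Pattern.compile(
            "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>",
            Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE = Pattern.compile(
            "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>",
            Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern ENTITY = Pattern.compile("\\&[a-zA-Z]{1,10};");

    /** Strips script/style blocks, tags, and named entities; returns null for null input. */
    static String stripHtml(String html) {
        if (html == null) {
            return null;
        }
        // Order matters: script and style blocks go first so their contents are
        // removed together with the tags instead of being left behind as text.
        String text = SCRIPT.matcher(html).replaceAll("");
        text = STYLE.matcher(text).replaceAll("");
        text = TAG.matcher(text).replaceAll("");
        text = ENTITY.matcher(text).replaceAll("");
        return text;
    }

    public static void main(String[] args) {
        String html = "<style>p{color:red}</style>"
                + "<p>Hello, <B>world</B>!&nbsp;</p>"
                + "<SCRIPT>alert('x');</SCRIPT>";
        System.out.println(stripHtml(html)); // prints: Hello, world!
    }
}
```

Note that named entities such as `&nbsp;` are deleted rather than converted to the characters they stand for, and that text containing a lone `<` with no closing `>` (for example `1 < 2`) passes through unchanged.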
