Filtering HTML elements with Java regular expressions

Below is a small routine that filters HTML elements out of a string; it may be of some help.
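// Imports needed at the top of the enclosing class file:
// import java.util.regex.Matcher;
// import java.util.regex.Pattern;

/**
 * Filters all HTML elements out of a string.
 * For example: <a href="www.sohu.com/test">hello!</a>
 * The filter result is: hello!
 * Notice: this method removes the text between "<" and ">",
 * but keeps an empty pair such as "<>" or "< >" as literal text.
 * @param element the text that may contain HTML markup
 * @return the text with tags removed and common entities decoded;
 *         null or blank input is returned unchanged
 */
public static String getTxtWithoutHTMLElement(String element)
{
    // Simpler alternative (removes "< >" as well and does not decode entities):
    // String reg="<[^<>]+>";
    // return element.replaceAll(reg,"");

    if(null==element||"".equals(element.trim()))
    {
        return element;
    }

    // Inside a character class '|' and a non-leading '^' are literals,
    // so the class only needs to exclude the two angle brackets.
    Pattern pattern=Pattern.compile("<[^<>]*>");
    Matcher matcher=pattern.matcher(element);
    StringBuffer txt=new StringBuffer();
    while(matcher.find())
    {
        String group=matcher.group();
        if(group.matches("<\\s*>"))
        {
            // Empty or whitespace-only brackets are not tags: keep them verbatim.
            matcher.appendReplacement(txt,Matcher.quoteReplacement(group));
        }
        else
        {
            // Anything else between "<" and ">" is treated as a tag and dropped.
            matcher.appendReplacement(txt,"");
        }
    }
    matcher.appendTail(txt);

    // repaceEntities is a helper of the enclosing class that is not shown in this post.
    // "&amp;" is decoded last so that "&amp;lt;" becomes "&lt;" rather than "<".
    repaceEntities(txt,"&lt;","<");
    repaceEntities(txt,"&gt;",">");
    repaceEntities(txt,"&quot;","\"");
    repaceEntities(txt,"&nbsp;","");
    repaceEntities(txt,"&amp;","&");

    return txt.toString();
}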
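The method above calls a helper named repaceEntities that the post does not show. The following is only a minimal sketch of what such a helper might look like, assuming it takes the StringBuffer, the entity text, and its replacement, and edits the buffer in place. The class name EntityReplaceSketch and the parameter names are made up for illustration; the author's real helper may differ.

```java
class EntityReplaceSketch {

    // Hypothetical helper (not from the original post): replaces every
    // occurrence of `entity` in `txt` with `replacement`, in place.
    static void repaceEntities(StringBuffer txt, String entity, String replacement) {
        if (txt == null || entity == null || entity.isEmpty()) {
            return;
        }
        int pos = txt.indexOf(entity);
        while (pos >= 0) {
            txt.replace(pos, pos + entity.length(), replacement);
            // Resume after the inserted text so the replacement is never rescanned.
            pos = txt.indexOf(entity, pos + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuffer txt = new StringBuffer("a &lt; b &amp;&amp; b &gt; c");
        repaceEntities(txt, "&lt;", "<");
        repaceEntities(txt, "&gt;", ">");
        repaceEntities(txt, "&amp;", "&"); // decoded last to avoid double decoding
        System.out.println(txt); // prints: a < b && b > c
    }
}
```

Resuming the search after the inserted text matters when the replacement contains the start of another entity: replacing "&amp;" with "&" must not let the new "&" combine with following text and be decoded a second time within the same pass.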

 

Here are the test cases:
public void testGetTxtWithoutHTMLElement ()
     {
       
         assertEquals("test",ExcelHssfView.getTxtWithoutHTMLElement("<a href='a/test'>test</a>"));
       
         assertEquals("test",ExcelHssfView.getTxtWithoutHTMLElement("<a href='a/test'>test"));
       
         assertEquals("test",ExcelHssfView.getTxtWithoutHTMLElement("<input type='text'>test</input>"));
       
         assertEquals("test",ExcelHssfView.getTxtWithoutHTMLElement("<p>test"));
       
         assertEquals("test",ExcelHssfView.getTxtWithoutHTMLElement("<table><tr><td>test</td></tr></table>"));
       
         assertEquals("te<st",ExcelHssfView.getTxtWithoutHTMLElement("<p>te<st"));
       
         assertEquals("te>st",ExcelHssfView.getTxtWithoutHTMLElement("<p>te>st"));
       
         assertEquals("tst",ExcelHssfView.getTxtWithoutHTMLElement("<p>t<e>st"));
       
         assertEquals("t<st",ExcelHssfView.getTxtWithoutHTMLElement("<p>t<<e>st"));
       
         assertEquals("<>test",ExcelHssfView.getTxtWithoutHTMLElement("<p><>test"));
       
         assertEquals("< >test",ExcelHssfView.getTxtWithoutHTMLElement("<p>< >test"));
       
         assertEquals("<<>test",ExcelHssfView.getTxtWithoutHTMLElement("<p><<>test"));
       
         assertEquals("test",ExcelHssfView.getTxtWithoutHTMLElement("<table><tr><td>&nbsp;test</td></tr></table>"));
       
     }
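None of the test cases above covers a tag whose attributes contain '|' or '^'. Inside a regex character class these two characters are plain literals (only a leading '^' negates), so a class written as [^<|^>] excludes four characters, not two. The sketch below compares the two spellings on one input; the class name CharClassSketch and the strip method are illustrative only, and strip simply deletes every match without the special handling of "<>" that the filter method performs.

```java
import java.util.regex.Pattern;

class CharClassSketch {
    // The class [^<|^>] excludes four characters: '<', '|', '^' and '>'.
    static final Pattern WITH_EXTRA_LITERALS = Pattern.compile("<[^<|^>]*>");
    // The class [^<>] excludes only the two angle brackets.
    static final Pattern ANGLE_BRACKETS_ONLY = Pattern.compile("<[^<>]*>");

    // Illustrative helper: deletes every match of tagPattern from html.
    static String strip(Pattern tagPattern, String html) {
        return tagPattern.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<a href='a|b'>test</a>";
        System.out.println(strip(WITH_EXTRA_LITERALS, html)); // prints: <a href='a|b'>test
        System.out.println(strip(ANGLE_BRACKETS_ONLY, html)); // prints: test
    }
}
```

With the four-character class the opening tag survives because the match cannot cross the '|' in the attribute value, which is why excluding only '<' and '>' is the safer spelling for this kind of filter.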

 
