Filtering HTML Elements with Java Regular Expressions

Reposted from: http://pmh905001.iteye.com/blog/239900

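// Imports required at the top of the enclosing class file:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

    /**
     * Strips all HTML elements from a string.
     * For example: <a href="www.sohu.com/test">hello!</a>
     * The filtered result is: hello!
     * Note: this method removes the text between "<" and ">", brackets included.
     * An empty or whitespace-only pair such as "<>" or "< >" is kept unchanged.
     * @param element the HTML fragment to clean; may be null
     * @return the text without tags, or the input itself if it is null or blank
     */
    public static String getTxtWithoutHTMLElement(String element)
    {
        // A one-liner such as element.replaceAll("<[^<>]+>", "") would also
        // remove whitespace-only pairs like "< >"; the loop below keeps them.

        if (null == element || "".equals(element.trim()))
        {
            return element;
        }

        // "<", then any run of characters other than "<" or ">", then ">".
        // Writing the class as [^<|^>] would also exclude the literal
        // characters "|" and "^", so tags containing them would not match.
        Pattern pattern = Pattern.compile("<[^<>]*>");
        Matcher matcher = pattern.matcher(element);
        StringBuffer txt = new StringBuffer();
        while (matcher.find())
        {
            String group = matcher.group();
            if (group.matches("<\\s*>"))
            {
                // Keep the pair as-is; quoteReplacement stops "$" and "\"
                // from being treated as replacement syntax.
                matcher.appendReplacement(txt, Matcher.quoteReplacement(group));
            }
            else
            {
                matcher.appendReplacement(txt, "");
            }
        }
        matcher.appendTail(txt);
        // Decode a few common entities. "&amp;" is handled last so that
        // "&amp;lt;" becomes "&lt;" instead of being decoded twice to "<".
        replaceEntities(txt, "&lt;", "<");
        replaceEntities(txt, "&gt;", ">");
        replaceEntities(txt, "&quot;", "\"");
        replaceEntities(txt, "&nbsp;", "");
        replaceEntities(txt, "&amp;", "&");
        return txt.toString();
    }

    /**
     * Replaces every occurrence of entity in txt with replace, in place.
     */
    private static void replaceEntities(StringBuffer txt, String entity, String replace)
    {
        int pos = txt.indexOf(entity);
        while (-1 != pos)
        {
            txt.replace(pos, pos + entity.length(), replace);
            // Resume after the inserted text so it is never scanned again.
            pos = txt.indexOf(entity, pos + replace.length());
        }
    }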
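
Replacing entities one after another makes the result depend on the order of the calls and on whether already-replaced text gets scanned again: for instance, "&amp;lt;" should decode to "&lt;", not to "<". Below is a minimal sketch of a single-pass alternative in which each entity found in the input is decoded exactly once. The class name EntityDecodeSketch and the method decodeOnce are hypothetical and not part of the original post; the entity table simply mirrors the five entities handled above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecodeSketch {
    // Same five entities as in the method above (hypothetical helper class).
    private static final Map<String, String> ENTITIES = new HashMap<String, String>();
    static {
        ENTITIES.put("&amp;", "&");
        ENTITIES.put("&lt;", "<");
        ENTITIES.put("&gt;", ">");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&nbsp;", "");
    }

    // Matches exactly the keys of ENTITIES.
    private static final Pattern ENTITY = Pattern.compile("&(?:amp|lt|gt|quot|nbsp);");

    // Decodes each entity once; the decoded output is never rescanned.
    public static String decodeOnce(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String decoded = ENTITIES.get(m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeOnce("a &amp;lt; b"));     // prints "a &lt; b"
        System.out.println(decodeOnce("&lt;p&gt;&nbsp;x")); // prints "<p>x"
    }
}
```

Because the matcher only ever walks the original input, the order of the table entries no longer matters.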

 

Test cases:
    // Assumes a JUnit test class in which assertEquals is in scope.
    public void testGetTxtWithoutHTMLElement()
    {
        // Well-formed and unclosed elements
        assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<a href='a/test'>test</a>"));
        assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<a href='a/test'>test"));
        assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<input type='text'>test</input>"));
        assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<p>test"));
        assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<table><tr><td>test</td></tr></table>"));

        // Stray "<" or ">" that do not form a tag are kept
        assertEquals("te<st", ExcelHssfView.getTxtWithoutHTMLElement("<p>te<st"));
        assertEquals("te>st", ExcelHssfView.getTxtWithoutHTMLElement("<p>te>st"));
        assertEquals("tst", ExcelHssfView.getTxtWithoutHTMLElement("<p>t<e>st"));
        assertEquals("t<st", ExcelHssfView.getTxtWithoutHTMLElement("<p>t<<e>st"));

        // Empty or whitespace-only pairs are kept
        assertEquals("<>test", ExcelHssfView.getTxtWithoutHTMLElement("<p><>test"));
        assertEquals("< >test", ExcelHssfView.getTxtWithoutHTMLElement("<p>< >test"));
        assertEquals("<<>test", ExcelHssfView.getTxtWithoutHTMLElement("<p><<>test"));

        // &nbsp; is removed rather than turned into a space
        assertEquals("test", ExcelHssfView.getTxtWithoutHTMLElement("<table><tr><td>&nbsp;test</td></tr></table>"));
    }
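
One case the tests above do not cover is a tag whose attributes contain "|" or "^". Inside a character class both are literal characters ("^" only negates when it comes first), so a class written as [^<|^>] excludes them along with the angle brackets. Below is a minimal sketch comparing that class with [^<>]. The class name TagPatternSketch, its constants, and the strip helper are hypothetical, and plain replaceAll is used here instead of the loop above purely to keep the comparison short.

```java
import java.util.regex.Pattern;

public class TagPatternSketch {
    // Excludes '<', '|', '^' and '>' between the angle brackets.
    static final Pattern WITH_EXTRA_CHARS = Pattern.compile("<[^<|^>]*>");
    // Excludes only '<' and '>' between the angle brackets.
    static final Pattern BRACKETS_ONLY = Pattern.compile("<[^<>]*>");

    // Removes every match of the given tag pattern from html.
    public static String strip(Pattern tag, String html) {
        return tag.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<a title='x|y'>test</a>";
        // The opening tag contains '|', so only "</a>" is removed.
        System.out.println(strip(WITH_EXTRA_CHARS, html)); // prints "<a title='x|y'>test"
        // Both tags are removed.
        System.out.println(strip(BRACKETS_ONLY, html));    // prints "test"
    }
}
```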
