Logging in to Websites with HtmlUnit

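Posts about logging in to websites with httpclient keep popping up on the forum lately, so I'll join in and share some login experience based on htmlunit.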

This post is dedicated to my recently departed mouse.

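----------------------------------------------divider---------------------------------------------------
HtmlUnit: the latest version is currently 2.7 (confirmed by Foxswily on 2010-04-15).
It wraps httpclient (and is already prepared for the move to httpclient4) and simulates browser behaviour. Its JavaScript support is fairly complete, including mainstream libraries such as jQuery. That is where its strength lies: the JS-based blocking used by ordinary sites can be bypassed easily.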

An example

    // Create a browser; IE, FF and others can be chosen
    WebClient client = new WebClient(BrowserVersion.INTERNET_EXPLORER_7);

    // Fetch a page from some site
    HtmlPage page = client.getPage("http://xxx.com");

    // Look up an element on the page, by id or by name (there are many other ways --Foxswily)
    HtmlElement elmt = page.getElementById("someid");
    //HtmlElement elmt = page.getElementByName("somename");

    // This example uses a text box: click first, then type, exactly as a real browser would
    elmt.click();
    elmt.type("somewords");

    // Get the button
    HtmlButton loginBtn = (HtmlButton)page.getElementById("btnId");
    // Click it and get the resulting page
    Page resultPage = loginBtn.click();
    // You have the result now, do whatever you like with it
    log.debug(resultPage.getWebResponse().getContentAsString());
    

 


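Taking this idea further: simulating a login no longer requires reverse-engineering any JS logic. The code simply mimics whatever the user actually does, which is much easier.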

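One extra friendly tip: while logging in to a discuz forum (a forum platform with a very large user base), Foxswily ran into a small problem (a bug has already been submitted)

that breaks the redirect after login. If you hit the same thing, use the fix here as a reference.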

Problem description
    HtmlPage.executeRefreshIfNeeded()
When the HTML header contains a meta tag like
<META HTTP-EQUIV="Refresh" CONTENT="3 URL=http://www.some.org/some.html">
it throws a NumberFormatException, because there is no ";" after the "3" in the content.
Some forum sites serve badly written HTML pages like this.
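
To see why the missing ";" matters, here is a minimal standalone sketch. It is not HtmlUnit's code: `NaiveRefreshDelay` and `delaySeconds` are hypothetical names made up for illustration, and the sketch only assumes a parser that treats any content without a ";" as a bare number.

```java
// Hypothetical stand-in for a refresh parser, NOT taken from HtmlUnit.
class NaiveRefreshDelay {

    // Assumes: no ';' in the content means the whole string is the delay.
    static double delaySeconds(String content) {
        int index = content.indexOf(';');
        if (index == -1) {
            // "3 URL=http://..." ends up here and is not a valid number
            return Double.parseDouble(content);
        }
        return Double.parseDouble(content.substring(0, index).trim());
    }

    public static void main(String[] args) {
        System.out.println(delaySeconds("3; URL=http://www.some.org/some.html"));
        try {
            delaySeconds("3 URL=http://www.some.org/some.html");
        } catch (NumberFormatException e) {
            System.out.println("NumberFormatException: " + e.getMessage());
        }
    }
}
```

With the ";" present the delay parses normally. Without it, the whole string is handed to `Double.parseDouble`, which rejects it.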

In short: the auto-redirect format is malformed, htmlunit cannot parse it and fails with an exception. After rewriting one method of HtmlPage, it works.

    private void executeRefreshIfNeeded() throws IOException {
        // If this page is not in a frame then a refresh has already happened,
        // most likely through the JavaScript onload handler, so we don't do a
        // second refresh.
        final WebWindow window = getEnclosingWindow();
        if (window == null) {
            return;
        }

        final String refreshString = getRefreshStringOrNull();
        if (refreshString == null || refreshString.length() == 0) {
            return;
        }

        final double time;
        final URL url;

        int index = refreshString.indexOf(";");
        final boolean timeOnly = (index == -1);

        if (timeOnly && refreshString.indexOf(" ") == -1) {
            // Format: <meta http-equiv='refresh' content='10'>
            try {
                time = Double.parseDouble(refreshString);
            } catch (final NumberFormatException e) {
                if (LOG.isErrorEnabled()) {
                    LOG.error("Malformed refresh string (no ';' but not a number): "
                            + refreshString, e);
                }
                return;
            }
            url = getWebResponse().getRequestSettings().getUrl();
        } else {
            if (refreshString.indexOf(";") == -1) {
                index = refreshString.indexOf(" ");
            }
            // Format: <meta http-equiv='refresh'
            // content='10;url=http://www.blah.com'>
            try {
                time = Double.parseDouble(refreshString.substring(0, index).trim());
            } catch (final NumberFormatException e) {
                if (LOG.isErrorEnabled()) {
                    LOG.error("Malformed refresh string (no valid number before ';') "
                            + refreshString, e);
                }
                return;
            }
            index = refreshString.toLowerCase().indexOf("url=", index);
            if (index == -1) {
                if (LOG.isErrorEnabled()) {
                    LOG.error("Malformed refresh string (found ';' but no 'url='): "
                            + refreshString);
                }
                return;
            }
            final StringBuilder buffer = new StringBuilder(refreshString
                    .substring(index + 4));
            if (buffer.toString().trim().length() == 0) {
                // content='10; URL=' is treated as content='10'
                url = getWebResponse().getRequestSettings().getUrl();
            } else {
                if (buffer.charAt(0) == '"' || buffer.charAt(0) == 0x27) {
                    buffer.deleteCharAt(0);
                }
                if (buffer.charAt(buffer.length() - 1) == '"'
                        || buffer.charAt(buffer.length() - 1) == 0x27) {
                    buffer.deleteCharAt(buffer.length() - 1);
                }
                final String urlString = buffer.toString();
                try {
                    url = getFullyQualifiedUrl(urlString);
                } catch (final MalformedURLException e) {
                    if (LOG.isErrorEnabled()) {
                        LOG.error("Malformed URL in refresh string: " + refreshString, e);
                    }
                    throw e;
                }
            }
        }

        final int timeRounded = (int) time;
        getWebClient().getRefreshHandler().handleRefresh(this, url, timeRounded);
    }
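
The parsing rule used in the patch can also be tried outside HtmlUnit. The following is a rough standalone sketch under stated assumptions: `MetaRefresh` and `parse` are hypothetical names, a `null` URL stands for "reload the current page", and malformed input returns `null` instead of logging an error. It mirrors the idea of falling back to the first space when no ";" is present.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of HtmlUnit.
class MetaRefresh {
    final int seconds;
    final String url; // null means "refresh the current page"

    MetaRefresh(int seconds, String url) {
        this.seconds = seconds;
        this.url = url;
    }

    // Returns null when the content cannot be understood.
    static MetaRefresh parse(String content) {
        if (content == null || content.trim().isEmpty()) {
            return null;
        }
        String s = content.trim();
        int sep = s.indexOf(';');
        if (sep == -1) {
            sep = s.indexOf(' '); // tolerate "3 URL=..." without ';'
        }
        String timePart = (sep == -1) ? s : s.substring(0, sep);
        double time;
        try {
            time = Double.parseDouble(timePart.trim());
        } catch (NumberFormatException e) {
            return null;
        }
        if (sep == -1) {
            return new MetaRefresh((int) time, null); // content='10'
        }
        int urlIndex = s.toLowerCase(Locale.ROOT).indexOf("url=", sep);
        if (urlIndex == -1) {
            return null; // separator found but no 'url='
        }
        String target = s.substring(urlIndex + 4).trim();
        // Strip one leading and one trailing quote, if present
        if (!target.isEmpty() && (target.charAt(0) == '"' || target.charAt(0) == '\'')) {
            target = target.substring(1);
        }
        if (!target.isEmpty()) {
            char last = target.charAt(target.length() - 1);
            if (last == '"' || last == '\'') {
                target = target.substring(0, target.length() - 1);
            }
        }
        return new MetaRefresh((int) time, target.isEmpty() ? null : target);
    }

    public static void main(String[] args) {
        MetaRefresh r = parse("3 URL=http://www.some.org/some.html");
        System.out.println(r.seconds + " -> " + r.url);
    }
}
```

Like the patched method, this sketch looks for ";" first, so a semicolon-less refresh string whose URL itself contains a ";" would still be rejected.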
 



