Author: 夏纯中 (ID: danny_xcz)
    [Original] Using HttpClient and HtmlParser together to log in to a system automatically and extract page information


     

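    The HtmlParser API has changed quite a lot between versions, so here is an example written against the latest one. Without further ado, here is the code for everyone to use.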

    /*
     * Main.java
     *
     * Created on January 19, 2007, 9:14 AM
     *
     * To change this template, choose Tools | Template Manager
     * and open the template in the editor.
     */

    package wapproxy;

    import org.apache.commons.httpclient.*;
    import org.apache.commons.httpclient.methods.*;
    import org.apache.commons.httpclient.params.HttpMethodParams;

    import java.io.*;

    import org.htmlparser.Node;
    import org.htmlparser.NodeFilter;
    import org.htmlparser.Parser;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.tags.*;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.ParserException;

     

    /**
     *
     * @author xcz
     */
    public class Main {
       
       
        /** Creates a new instance of Main */
        public Main() {
        }
       
        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) throws Exception  {
            // Create an instance of HttpClient.
            HttpClient client = new HttpClient();
           
            // Create a method instance.
            PostMethod post_method = new PostMethod("http://localhost/rcpq/");
           
            NameValuePair[] data = {
                new NameValuePair("username", "admin"),
                new NameValuePair("password", "admin"),
                new NameValuePair("dologin", "1"),
            };
           
            post_method.setRequestBody(data);
           
           
           
            try {
                // Execute the method.
                int statusCode = client.executeMethod(post_method);
               
                if (statusCode != HttpStatus.SC_OK) {
                    System.err.println("Method failed: " + post_method.getStatusLine());
                }
               
                // Read the response body.
                //byte[] responseBody = post_method.getResponseBody();
               
                // Deal with the response.
                // Use caution: ensure correct character encoding and is not binary data
                //System.out.println(new String(responseBody));
               
            } catch (HttpException e) {
                System.err.println("Fatal protocol violation: " + e.getMessage());
                e.printStackTrace();
            } catch (IOException e) {
                System.err.println("Fatal transport error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                // Release the connection.
                post_method.releaseConnection();
            }
           
            byte[] responseBody = null;
           
            GetMethod get_method = new GetMethod("http://localhost/rcpq/unit.php");
           
            // Provide a custom retry handler if necessary
            get_method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
                    new DefaultHttpMethodRetryHandler(3, false));
           
            try {
                // Execute the method.
                int statusCode = client.executeMethod(get_method);
               
                if (statusCode != HttpStatus.SC_OK) {
                    System.err.println("Method failed: " + get_method.getStatusLine());
                }
               
                // Read the response body.
                //responseBody = get_method.getResponseBody();

                // Read the page through a stream here
                
                InputStream in = get_method.getResponseBodyAsStream();
                if (in != null) {
                    byte[] tmp = new byte[4096];
                    int bytesRead = 0;
                    ByteArrayOutputStream buffer = new ByteArrayOutputStream(1024);
                    while ((bytesRead = in.read(tmp)) != -1) {
                        buffer.write(tmp, 0, bytesRead);
                    }
                    responseBody = buffer.toByteArray();
                }
               
               
                // Deal with the response.
                // Use caution: ensure correct character encoding and is not binary data
                //System.out.println(new String(responseBody));
               
            } catch (HttpException e) {
                System.err.println("Fatal protocol violation: " + e.getMessage());
                e.printStackTrace();
            } catch (IOException e) {
                System.err.println("Fatal transport error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                // Release the connection.
                get_method.releaseConnection();
            }
           
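            // If the GET request failed, there is nothing to parse.
            if (responseBody == null) {
                System.err.println("No response body received; nothing to parse.");
                return;
            }
           
            // Decode the bytes as GBK (the encoding this code assumes for the page)
            // and pass the same charset name to the parser.
            Parser parser = Parser.createParser(new String(responseBody, "GBK"), "GBK");
           
            // Collect every <table> element in the page.
            String filterStr = "table";
            NodeFilter filter = new TagNameFilter(filterStr);
           
            NodeList tables = parser.extractAllNodesThatMatch(filter);
           
            //System.out.println(tables.elementAt(17).toString());

            // Locate the table that holds the unit list. The indexes used below
            // (table index 17, row index 3, column index 2) are specific to this page's layout.
            TableTag tabletag = (TableTag) tables.elementAt(17);
           
            TableRow row = tabletag.getRow(3);
           
            TableColumn[] cols = row.getColumns();
            //System.out.println("Unit name: " + cols[2].toHtml());
            System.out.println("Unit name: " + cols[2].childAt(0).getText());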
           
        }
       
    }
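
The listing above hard-codes GBK when it turns the response bytes into a string. As a rough sketch of one alternative, the helper below drains a stream with the same read loop and picks the charset from a `Content-Type` header value, falling back to a default when none is declared. The class `ResponseDecoder` and its methods `readAll` and `charsetOf` are hypothetical names introduced for illustration, and only the JDK is used; how the header value is obtained from the HTTP response is left out. It also assumes the JDK at hand ships the GBK charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HttpClient or HtmlParser.
class ResponseDecoder {
    // Finds a "charset=..." parameter inside a Content-Type header value.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*\"?([A-Za-z0-9][\\w.:+-]*)\"?", Pattern.CASE_INSENSITIVE);

    // Reads the stream to the end, using the same loop as the listing above.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] tmp = new byte[4096];
        int bytesRead;
        while ((bytesRead = in.read(tmp)) != -1) {
            buffer.write(tmp, 0, bytesRead);
        }
        return buffer.toByteArray();
    }

    // Returns the charset named in the header value, or the fallback when the
    // header is missing, has no charset parameter, or names an unsupported one.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType != null) {
            Matcher m = CHARSET.matcher(contentType);
            if (m.find() && Charset.isSupported(m.group(1))) {
                return Charset.forName(m.group(1));
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK");
        // Stand-in for a response body: a table cell containing two Chinese characters.
        byte[] page = "<td>\u5355\u4f4d</td>".getBytes(gbk);
        byte[] body = readAll(new ByteArrayInputStream(page));
        Charset cs = charsetOf("text/html; charset=gbk", Charset.forName("UTF-8"));
        System.out.println(new String(body, cs));
    }
}
```

With something like this in place, the decoded string and `cs.name()` could be handed to `Parser.createParser` instead of the two literal "GBK" arguments.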

    Posted on 2007-01-19 13:55:00


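    Comments

    #gehantao posted on 2007-01-19 14:53:41  IP: 210.77.134.*
    Hello, 纯月! I am 葛涵涛, an editor with the CSDN community. I would like to talk with you about the community leaderboard (社区风云榜). My email is ght@csdn.net; I hope you can reply as soon as possible. Thank you!
    #sukyboor posted on 2007-01-19 15:59:39  IP: 59.40.98.*
    Last year I used this approach to scrape weather data from all over the place.
    Haha
    #porsper posted on 2007-10-12 15:13:41  IP: 60.28.32.*
    Hello. I have recently been working on a program of this kind and need to scrape data from web pages. I don't quite understand the parser part of your program above; could you add some comments to it?
    2007-10-15 15:23:38 Author's reply
    There are many ways to use the parser; what I did is just a shortcut. In practice you can extract the data with regular expressions, which is very convenient.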
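
The reply above suggests regular expressions as an alternative to walking the parsed tree. A minimal sketch of that idea, using only `java.util.regex` from the JDK, might look like the following. The class `RegexCellExtractor`, the method `cellTexts`, and the sample row are hypothetical and not taken from the original page; the pattern assumes simple, non-nested `<td>` cells and would likely need adjusting for real markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating regex-based extraction of table cells.
class RegexCellExtractor {
    // Matches one <td ...>...</td> cell; DOTALL lets the content span several lines.
    private static final Pattern CELL = Pattern.compile(
            "<td\\b[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches any tag, so that markup nested inside a cell can be stripped.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the trimmed text of every cell found in the HTML fragment, in order.
    static List<String> cellTexts(String html) {
        List<String> cells = new ArrayList<String>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            cells.add(TAG.matcher(m.group(1)).replaceAll("").trim());
        }
        return cells;
    }

    public static void main(String[] args) {
        // Made-up row standing in for one line of the unit list.
        String row = "<tr><td>1</td><td class=\"c\">A01</td>"
                + "<td><a href=\"u.php?id=1\">Sample Unit</a></td></tr>";
        List<String> cells = cellTexts(row);
        System.out.println("Unit name: " + cells.get(2)); // prints "Unit name: Sample Unit"
    }
}
```

Compared with indexing into the parsed tables, a pattern like this does not depend on the position of the table within the page, but it breaks on nested tables or on cells whose closing tag is omitted.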
    Copyright © 纯月