Simulating a User Login with HttpClient to Fetch Files from a Server

Latest recommended article published on 2022-01-28 14:42:55

FutureMet

Category: Study Notes

Article link: https://blog.csdn.net/futuremet/article/details/80687572

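Requirement: merge the electronic records held in two unrelated systems, including their database information (the table schemas differ, and the projects' source code cannot be modified), and re-synchronize automatically at a regular interval. Until now this was done by manual entry, which is slow and prone to omissions. I decided to build a small project with HttpClient to do the job.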
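The periodic part of the requirement is not shown in code in this post. As a rough sketch only, it could be driven by the JDK's `ScheduledExecutorService`. The class name `SyncScheduler`, the `start` helper, and the interval are assumptions made for illustration; the real task body would be the login and download steps described in the later sections.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SyncScheduler {
    /**
     * Runs syncTask once immediately, then again `period` units after each run finishes.
     * Runtime exceptions are caught here because an uncaught exception would
     * silently cancel all later runs of a scheduled task.
     */
    public static ScheduledExecutorService start(Runnable syncTask, long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                syncTask.run();
            } catch (RuntimeException e) {
                e.printStackTrace();
            }
        }, 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder task; a real one would log in and download the files
        ScheduledExecutorService scheduler =
                start(() -> System.out.println("syncing..."), 500, TimeUnit.MILLISECONDS);
        Thread.sleep(1200);      // let a few runs happen
        scheduler.shutdown();    // stop scheduling further runs
    }
}
```

`scheduleWithFixedDelay` is used rather than `scheduleAtFixedRate` so that a slow sync never overlaps with or queues up behind the next one.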

1. A brief introduction to HttpClient

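HttpClient began as a subproject of Apache Jakarta Commons and is now part of the Apache HttpComponents project. It is a client-side programming toolkit for the HTTP protocol that aims to be efficient, up to date, and feature-rich, and it tracks the latest HTTP versions and recommendations. Official site: http://hc.apache.org/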

Maven dependency:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.5</version>
</dependency>

HTTP is probably the most widely used and most important protocol on the Internet today, and more and more Java applications need to access network resources directly over HTTP. The JDK's java.net package already provides basic HTTP access, but for most applications it is not rich or flexible enough, and that is the gap HttpClient fills. HttpClient is used in many projects; for example, Cactus and HTMLUnit, two other well-known open-source projects, both use it. At the time of writing (2018-06-14), the latest version is HttpClient 4.5.5.

2. Fetching a web page with HttpClient

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

public class GetWebPageContent {
    /**
     * Fetches a web page with a GET request.
     * @param args unused
     * @throws IOException if the request fails
     */
    public static void main(String[] args) throws IOException {
        // Create the GET request
        HttpGet httpGet = new HttpGet("http://www.baidu.com");
        // try-with-resources closes the response and the client even if an exception is thrown
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            HttpEntity entity = response.getEntity();   // the page content
            // "UTF-8" is only the fallback; a charset declared in Content-Type takes precedence
            String result = EntityUtils.toString(entity, "UTF-8");
            System.out.println("Page content: " + result);
        }
    }
}

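The code above is enough to fetch the Baidu home page. Note that some pages containing Chinese may come out garbled; the body has to be decoded with the charset the page actually uses (for example GBK rather than UTF-8).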
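To make the encoding point concrete, here is a small JDK-only sketch that picks the charset out of a `Content-Type` header value and decodes the raw body bytes with it. The class `CharsetSniffer` and its method names are made up for this example, and it deliberately ignores charsets declared only inside the HTML itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetSniffer {
    /** Returns the charset named in a Content-Type value, or fallback if none is usable. */
    public static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by ';' (e.g. "text/html; charset=GBK")
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback;    // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    /** Decodes the raw body with the charset declared in the header, defaulting to UTF-8. */
    public static String decode(byte[] body, String contentType) {
        return new String(body, charsetFrom(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(body, "text/html; charset=ISO-8859-1"));
        // No charset declared: the UTF-8 default is wrong for these bytes, so the accent is lost
        System.out.println(decode(body, "text/html"));
    }
}
```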

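3. Simulating a browser login to obtain the session ID

First, use the browser's developer tools (F12) to capture the login request URL and the parameters it needs, which go into a List<NameValuePair>. In my project the target is an internal company server, so no CAPTCHA is required; if the login needs a CAPTCHA you would also have to fetch and solve it, which is not covered here.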

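import org.apache.http.Header;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

    /**
     * Logs in and returns the JSESSIONID cookie.
     * @param loginUrl  login URL
     * @param username  user name
     * @param password  password
     * @return  the "JSESSIONID=..." pair, or null if the server set no cookie
     * @throws IOException on I/O error
     */
    public static String getJeseeion(String loginUrl, String username, String password) throws IOException {
        HttpPost httpost = new HttpPost(loginUrl);
        List<NameValuePair> nvp = new ArrayList<NameValuePair>();
        nvp.add(new BasicNameValuePair("username", username));
        nvp.add(new BasicNameValuePair("password", password));
        String sCharSet = "UTF-8";
        httpost.setEntity(new UrlEncodedFormEntity(nvp, sCharSet));
        // Request headers copied from the browser
        httpost.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        httpost.setHeader("Accept-Encoding", "gzip, deflate, sdch");
        httpost.setHeader("Accept-Language", "zh-CN,zh;q=0.8");
        httpost.setHeader("Connection", "keep-alive");
        httpost.setHeader("Cache-Control", "max-age=0");
        httpost.setHeader("Upgrade-Insecure-Requests", "1");
        httpost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.61 Safari/537.36");
        // DefaultHttpClient is deprecated since 4.3; try-with-resources closes client and response
        try (CloseableHttpClient httpclient = HttpClients.createDefault();
             CloseableHttpResponse response = httpclient.execute(httpost)) {
            Header setCookie = response.getFirstHeader("Set-Cookie");
            if (setCookie == null) {
                return null;    // login failed or the server created no session
            }
            // "JSESSIONID=...; Path=/; HttpOnly" -> keep only the first name=value pair
            String jession = setCookie.getValue().split(";")[0];    // this system only needs the JSESSIONID
            System.out.println(jession);
            return jession;
        }
    }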
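Taking the first `name=value` pair of the first `Set-Cookie` header works when `JSESSIONID` happens to be the first cookie the server sets. A slightly more defensive variant, sketched here with the JDK's `java.net.HttpCookie` rather than HttpClient's own cookie classes, looks the cookie up by name across all `Set-Cookie` values. `SessionCookies` and `findCookie` are hypothetical names introduced for this example.

```java
import java.net.HttpCookie;
import java.util.Arrays;
import java.util.List;

public class SessionCookies {
    /**
     * Searches Set-Cookie header values for a cookie with the given name.
     * @return "name=value", ready to send as a Cookie request header, or null if absent
     */
    public static String findCookie(List<String> setCookieValues, String name) {
        for (String headerValue : setCookieValues) {
            List<HttpCookie> parsed;
            try {
                parsed = HttpCookie.parse(headerValue);
            } catch (IllegalArgumentException e) {
                continue;    // skip header values that are not valid cookies
            }
            for (HttpCookie cookie : parsed) {
                if (cookie.getName().equalsIgnoreCase(name)) {
                    return cookie.getName() + "=" + cookie.getValue();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Example header values; the second mirrors the session cookie from this post
        List<String> headers = Arrays.asList(
                "lang=zh-CN; Path=/",
                "JSESSIONID=D812E3E4E6E4A124079F154ADD160095; Path=/system; HttpOnly");
        System.out.println(findCookie(headers, "JSESSIONID"));
    }
}
```

With HttpClient, the input list could be built from the values of `response.getHeaders("Set-Cookie")`.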

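4. Downloading a file from the server to the local disk

Put the session cookie obtained above into the request's Cookie header, then request the file from the server.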

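import org.apache.http.Header;
import org.apache.http.HeaderElement;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

    // Assumed values: the original post uses these fields without showing their definitions
    private static final int cache = 10 * 1024;       // copy buffer size in bytes; tune to actual performance
    private static final String root = "D:\\test";    // default download directory (assumption)
    private static final String splash = File.separator;

    /**
     * Downloads the file at strUrl and saves it to filepath.
     * @param strUrl    URL of the file on the server
     * @param filepath  local path to save to; if null, a path is derived from the response
     * @param jession   cookie value to send, e.g. "JSESSIONID=..."
     * @return the path the file was saved to, or null if the download failed
     */
    public static String download(String strUrl, String filepath, String jession) {
        HttpGet httpget = new HttpGet(strUrl);
        if (jession != null && !jession.isEmpty()) {
            httpget.setHeader("Cookie", jession);
        }
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(httpget)) {
            // Anything other than 200 (e.g. an error page) should not be saved as the file
            if (response.getStatusLine().getStatusCode() != 200) {
                System.err.println("Download failed: " + response.getStatusLine());
                return null;
            }
            if (filepath == null) {
                filepath = getFilePath(response);
            }
            File file = new File(filepath);
            File parent = file.getParentFile();
            if (parent != null) {
                parent.mkdirs();
            }
            // Both streams are closed automatically, even if the copy fails midway
            try (InputStream is = response.getEntity().getContent();
                 FileOutputStream fileout = new FileOutputStream(file)) {
                byte[] buffer = new byte[cache];
                int ch;
                while ((ch = is.read(buffer)) != -1) {
                    fileout.write(buffer, 0, ch);
                }
            }
            // Prints "File saved successfully, path: ..."
            System.out.println("文件保存成功，保存路径为：" + filepath);
            return filepath;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        }
    }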
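    /**
     * Builds the default local path for the file being downloaded.
     * @param response the HTTP response
     * @return root directory plus the server-supplied file name, or a generated name if none is given
     */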
    public static String getFilePath(HttpResponse response) {
        String filepath = root + splash;
        String filename = getFileName(response);

        if (filename != null) {
            filepath += filename;
        } else {
            filepath += getRandomFileName();
        }
        return filepath;
    }

    /**
     * Reads the filename parameter from the response's Content-Disposition header.
     * @param response HttpResponse
     * @return the file name, or null if the header or parameter is missing
     */
    public static String getFileName(HttpResponse response) {
        Header contentHeader = response.getFirstHeader("Content-Disposition");
        if (contentHeader == null) {
            return null;
        }
        HeaderElement[] values = contentHeader.getElements();
        if (values.length != 1) {
            return null;
        }
        NameValuePair param = values[0].getParameterByName("filename");
        // The value is used as sent; a percent-encoded name would need URLDecoder.decode(value, "utf-8")
        return param != null ? param.getValue() : null;
    }
    /**
     * Generates a fallback file name from the current time in milliseconds.
     * @return the file name
     */
    public static String getRandomFileName() {
        return String.valueOf(System.currentTimeMillis());
    }
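Servers often send non-ASCII file names in percent-encoded form, either inside the plain `filename` parameter or in the extended `filename*=charset''value` form. The following is a JDK-only sketch of extracting and decoding the name from a `Content-Disposition` header value; `DispositionNames` is a hypothetical helper, the regular expressions cover only the common cases, and it is not a full header parser.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DispositionNames {
    // Extended form: filename*=charset'language'percent-encoded-value
    private static final Pattern EXTENDED = Pattern.compile(
            "filename\\*\\s*=\\s*([^']+)'[^']*'([^;]+)", Pattern.CASE_INSENSITIVE);
    // Plain form: filename="value" or filename=value
    private static final Pattern PLAIN = Pattern.compile(
            "filename\\s*=\\s*(?:\"([^\"]*)\"|([^;]+))", Pattern.CASE_INSENSITIVE);

    /** Returns the file name carried by a Content-Disposition value, or null if there is none. */
    public static String fileName(String contentDisposition) {
        if (contentDisposition == null) {
            return null;
        }
        Matcher ext = EXTENDED.matcher(contentDisposition);
        if (ext.find()) {
            String charset = ext.group(1).trim();
            // URLDecoder turns '+' into a space, but in this form '+' is a literal plus sign
            String encoded = ext.group(2).trim().replace("+", "%2B");
            try {
                return URLDecoder.decode(encoded, charset);
            } catch (UnsupportedEncodingException | IllegalArgumentException e) {
                // Unknown charset or malformed escape: fall back to the plain parameter
            }
        }
        Matcher plain = PLAIN.matcher(contentDisposition);
        if (plain.find()) {
            return plain.group(1) != null ? plain.group(1) : plain.group(2).trim();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fileName("attachment; filename=\"1126.xlsx\""));
        System.out.println(fileName("attachment; filename*=UTF-8''%E6%8A%A5%E8%A1%A8.xlsx"));
    }
}
```

The extended form is tried first because, when both parameters are present, it is the one that carries the full Unicode name.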

Let's run it and see the result:

    public static void main(String[] args) throws IOException {
        // Login URL
        String loginUrl = "http://192.168.3.215:8080/system/login/loginUserAjax";

        // File download link. It is pure ASCII, so the original GBK-to-UTF-8 conversion of the
        // string was a no-op and is dropped; non-ASCII parameter values must be percent-encoded.
        String strUrl = "http://192.168.3.215:8080/system/report/projectFileManage/downLoadFileAjax?aliasName=20180611223232.xlsx&directory=201806&fileName=1126.xlsx&r=1528778758118";

        // Local path to save to
        String filepath = "D:\\test\\111.xlsx";

        String username = "admin";
        String password = "123456";

        String sessionId = getJeseeion(loginUrl, username, password);
        download(strUrl, filepath, sessionId);
    }

Console output:

JSESSIONID=D812E3E4E6E4A124079F154ADD160095
文件保存成功，保存路径为：D:\test\111.xlsx

Local file (screenshot not preserved)

That roughly completes the feature: simulating an administrator login and fetching electronic files from the server.
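One detail worth a closer look is the download link. If a file name in the query string contains non-ASCII characters such as Chinese, the parameter values need to be percent-encoded before the request is sent; converting the whole URL string between GBK and UTF-8 does not achieve that. Below is a minimal sketch using the JDK's `URLEncoder`. `DownloadUrlBuilder` is a hypothetical helper, and it assumes the server decodes query parameters as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class DownloadUrlBuilder {
    /** Appends the parameters to baseUrl as a UTF-8, form-encoded query string. */
    public static String build(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                url.append(separator)
                   .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                   .append('=')
                   .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                separator = '&';
            }
        } catch (UnsupportedEncodingException ex) {
            // Cannot happen: every Java platform supports UTF-8
            throw new IllegalStateException(ex);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("aliasName", "20180611223232.xlsx");
        params.put("directory", "201806");
        params.put("fileName", "1126.xlsx");
        System.out.println(build(
                "http://192.168.3.215:8080/system/report/projectFileManage/downLoadFileAjax", params));
    }
}
```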
