Simulating JD login in Java | Crawler series: two ways to implement a simulated JD login | 学步园

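This article presents two ways to simulate logging in to JD with Java: one based on the httpClient and htmlunit libraries, and one based on Selenium2. It explains how each approach is implemented and discusses problems you may run into and how to solve them. The author also shares code samples and how to obtain the resources, and invites readers interested in crawlers to get in touch.
Summary generated automatically by CSDN.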

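I recently needed to build a crawler that collects website data, and I started with JD.

Because I develop in Java, my first idea was to use httpClient and htmlunit, so I started experimenting with them.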

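Note: after repeated logins from the same IP, JD starts requiring a captcha. I am still working on this; if anyone has solved it, please leave a comment, or better, add me on QQ: 369768231. Many thanks.

PS: I later thought of a workaround: get a phone line, go online via ADSL, and use IP-switching software that changes the IP at a fixed interval. That way the restriction no longer applies.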

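A long time ago an example of logging in to Renren circulated online. I copied it as-is and found that it did not work; only later did I realize I had not understood its essence. I then tried simulating the login with htmlunit and found that JD's JavaScript is fairly complex. A friend with years of crawler experience told me that htmlunit's JavaScript support is poor and that it simply does not work on some sites. With no other option, I had to figure it out myself.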

(1) Opening JD's login page and reading its source, I found that login is performed by an AJAX request to: https://passport.jd.com/uc/loginService?uuid=f5c0dd5a-762c-4230-b8c0-f70589b7dbdb&ReturnUrl=http://order.jd.com/center/list.action&r=0.66408410689742&loginname=username&nloginpwd=xxxxxx&loginpwd=xxxxxx&machineNet=&machineCpu=&machineDisk=&authcode=&saHrhnkIIX=GXgVo
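The URL above is simply the loginService endpoint followed by URL-encoded query parameters. As a minimal sketch (plain JDK 10+, no network access), the class below assembles such a URL from an ordered parameter map. The parameter names and sample values are copied from the captured URL; the class name `LoginServiceUrl` and its methods are hypothetical, and reading `r` as a random cache-busting number is an assumption based only on its format.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: assembles a loginService-style URL from query parameters.
class LoginServiceUrl {

    // URL-encodes every key and value as UTF-8 and joins them as k=v&k=v,
    // keeping the map's iteration order.
    static String buildQuery(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // Appends the encoded query string to the endpoint.
    static String build(String endpoint, Map<String, String> params) {
        return endpoint + "?" + buildQuery(params);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        // uuid and the randomly named last field change on every page load,
        // so real values must be read from the current login page
        params.put("uuid", "f5c0dd5a-762c-4230-b8c0-f70589b7dbdb");
        params.put("ReturnUrl", "http://order.jd.com/center/list.action");
        params.put("r", String.valueOf(Math.random())); // assumption: cache-buster
        params.put("loginname", "username");
        params.put("nloginpwd", "xxxxxx");
        params.put("loginpwd", "xxxxxx");
        params.put("machineNet", "");
        params.put("machineCpu", "");
        params.put("machineDisk", "");
        params.put("authcode", "");
        params.put("saHrhnkIIX", "GXgVo");
        System.out.println(build("https://passport.jd.com/uc/loginService", params));
    }
}
```

Using a `LinkedHashMap` keeps the parameters in the same order as the captured URL, which makes the output easy to compare with what the browser sends.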

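The uuid and the last parameter are different every time the page is refreshed. If I open the login page in Firefox, assemble the parameters into the URL and visit it in Firefox, the login succeeds. But if I open the login page in Firefox, assemble the URL and then open it in IE, it fails. So the page evidently stores something in a cookie that is checked later.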
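The Firefox/IE experiment suggests that the login page sets cookies which `loginService` checks, so the login request has to come from the same cookie-holding session that loaded the page. The sketch below uses only the JDK's `java.net.CookieManager` to show that mechanism offline: a `Set-Cookie` header recorded for the login page is replayed for a later request to the same host, but not for a different host. The cookie name `sessionToken`, the class `CookieReplay` and its method are hypothetical; JD's real cookie names are not given in this post.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.List;
import java.util.Map;

// Hypothetical demo: cookies received from one response are sent again
// on later requests to the same host.
class CookieReplay {

    // Records setCookie as if it arrived in the response for firstUrl, then
    // returns the Cookie header values the manager would attach to nextUrl.
    static List<String> cookiesFor(String setCookie, String firstUrl, String nextUrl) {
        try {
            CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
            manager.put(URI.create(firstUrl), Map.of("Set-Cookie", List.of(setCookie)));
            Map<String, List<String>> headers = manager.get(URI.create(nextUrl), Map.of());
            return headers.getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String setCookie = "sessionToken=abc123; Path=/"; // hypothetical cookie
        System.out.println(cookiesFor(setCookie,
                "https://passport.jd.com/uc/login",
                "https://passport.jd.com/uc/loginService"));
    }
}
```

A `DefaultHttpClient` keeps its own cookie store in a similar way, which is why the first version of the code reuses one client instance for every request.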

Based on this analysis, I wrote a first version of the code.

The core code is as follows:

package com.lkb.test;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;

public class JD {

    // The configuration items
    private static String userName = "xxx";
    private static String password = "yyy";
    private static String redirectURL = "http://order.jd.com/center/list.action";
    private static String loginUrl = "http://passport.jd.com/uc/login";

    // Don't change the following URL (the variable name is left over from the Renren example)
    private static String renRenLoginURL = "https://passport.jd.com/uc/loginService";

    // One HttpClient is used for the whole session, so cookies set by the
    // login page are sent along with the login request
    private HttpResponse response;
    private DefaultHttpClient httpclient = new DefaultHttpClient();

    // Scrapes the uuid and the randomly named hidden field from the login page
    public Map<String, String> getParams() {
        Map<String, String> map = new HashMap<String, String>();
        String str = getText(loginUrl);
        if (str == null) {
            return map;
        }
        String[] strs1 = str.split("name=\"uuid\" value=\"");
        String[] strs2 = strs1[1].split("\"/>");
        String uuid = strs2[0];
        map.put("uuid", uuid);
        System.out.println(uuid);

        // The hidden input after uuid has a random name and value, e.g.
        // <input type="hidden" name="saHrhnkIIX" value="GXgVo" />
        // (the split delimiter was lost from the original post; this one is assumed)
        String[] str3s = strs1[1].split("<input type=\"hidden\" name=\"");
        String[] strs4 = str3s[1].split("/>");
        String[] strs5 = strs4[0].trim().split("\"");
        String key = strs5[0];
        String value = strs5[2];
        map.put(key, value);
        return map;
    }

    private boolean login() {
        Map<String, String> map = getParams();
        HttpPost httpost = new HttpPost(renRenLoginURL);

        // All the parameters posted to the web site
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        nvps.add(new BasicNameValuePair("ReturnUrl", redirectURL));
        nvps.add(new BasicNameValuePair("loginname", userName));
        nvps.add(new BasicNameValuePair("nloginpwd", password));
        nvps.add(new BasicNameValuePair("loginpwd", password));
        for (Map.Entry<String, String> entry : map.entrySet()) {
            nvps.add(new BasicNameValuePair(entry.getKey(), entry.getValue()));
        }

        try {
            httpost.setEntity(new UrlEncodedFormEntity(nvps, "UTF-8"));
            response = httpclient.execute(httpost);
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        } finally {
            httpost.abort();
        }
        return true;
    }

    // Returns the Location header of the login response, if any
    private String getRedirectLocation() {
        Header locationHeader = response.getFirstHeader("Location");
        if (locationHeader == null) {
            return null;
        }
        return locationHeader.getValue();
    }

    // GETs a page with the shared client and returns the body, or null on error
    private String getText(String redirectLocation) {
        HttpGet httpget = new HttpGet(redirectLocation);
        ResponseHandler<String> responseHandler = new BasicResponseHandler();
        String responseBody = "";
        try {
            responseBody = httpclient.execute(httpget, responseHandler);
        } catch (Exception e) {
            e.printStackTrace();
            responseBody = null;
        } finally {
            httpget.abort();
            //httpclient.getConnectionManager().shutdown();
        }
        return responseBody;
    }

    public void printText() {
        if (login()) {
            System.out.println(getText(redirectURL));
            String redirectLocation = getRedirectLocation();
            if (redirectLocation != null) {
                System.out.println(getText(redirectLocation));
            }
        }
    }

    public static void main(String[] args) {
        JD jd = new JD();
        //jd.getParams();
        jd.printText();
    }
}
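`getParams()` depends on exact string fragments such as `"/>`, so any change in attribute order or spacing on the login page makes it throw `ArrayIndexOutOfBoundsException`. A somewhat more tolerant sketch is to collect every hidden `<input>` with regular expressions. It assumes double-quoted attribute values; `HiddenFieldParser` is a hypothetical name, and an HTML parser such as jsoup would be more robust than regexes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the name/value of every <input type="hidden">.
class HiddenFieldParser {
    // One whole <input ...> tag declaring type="hidden", in any attribute order
    private static final Pattern HIDDEN_INPUT =
            Pattern.compile("<input[^>]*\\stype=\"hidden\"[^>]*>", Pattern.CASE_INSENSITIVE);
    // Assumes double-quoted attribute values preceded by whitespace
    private static final Pattern NAME = Pattern.compile("\\sname=\"([^\"]*)\"");
    private static final Pattern VALUE = Pattern.compile("\\svalue=\"([^\"]*)\"");

    static Map<String, String> parse(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher tag = HIDDEN_INPUT.matcher(html);
        while (tag.find()) {
            String input = tag.group();
            Matcher name = NAME.matcher(input);
            if (!name.find()) {
                continue; // an input without a name is not submitted with the form
            }
            Matcher value = VALUE.matcher(input);
            fields.put(name.group(1), value.find() ? value.group(1) : "");
        }
        return fields;
    }
}
```

Inside `JD`, `getParams()` could then return `HiddenFieldParser.parse(getText(loginUrl))` after a null check, which picks up `uuid` and the randomly named field together. It may also collect hidden fields from other forms on the page, so check the result before posting it.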

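(2) Later, while working on this, I started wondering: if every site is this complicated, what happens when they change their implementation? So I turned to Selenium2, which turned out to be a very handy tool for simulating login. One drawback is that it has to open a browser window; since I have only just started experimenting with it, I am not yet familiar with it. Another issue is that each step needs a sleep pause, otherwise things go wrong. I would appreciate help improving that part. The core code is as follows: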

package com.lkb;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebDriver.Navigation;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class JDTest {

    public static void main(String[] args) {
        JDTest jd = new JDTest();
        jd.connection();
    }

    public void connection() {
        // Opens a visible Firefox window
        WebDriver driver = new FirefoxDriver();
        waitForSeconds(5);

        Navigation navigation = driver.navigate();
        navigation.to("https://passport.360buy.com/new/login.aspx");
        waitForSeconds(4);

        WebElement loginName = driver.findElement(By.id("loginname"));
        waitForSeconds(5);
        // Constant is the author's own configuration class (not shown in this post)
        loginName.sendKeys(Constant.USERNAME);
        waitForSeconds(5);

        WebElement loginPwd = driver.findElement(By.id("nloginpwd"));
        waitForSeconds(5);
        loginPwd.sendKeys(Constant.password);
        waitForSeconds(5);

        WebElement loginButton = driver.findElement(By.id("loginsubmit"));
        waitForSeconds(5);
        loginButton.click();
        waitForSeconds(1);

        navigation.to("http://order.jd.com/center/list.action");
        System.out.println(driver.getPageSource());
        //driver.close();
    }

    // Pauses the current thread for the given number of seconds
    public void waitForSeconds(int seconds) {
        try {
            Thread.sleep(seconds * 1000L);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing it
            Thread.currentThread().interrupt();
        }
    }
}
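The fixed sleeps above are the part the author asked to improve. A common alternative is to poll until the page is actually ready and continue as soon as it is. Selenium ships this as `WebDriverWait` together with `ExpectedConditions`, for example `new WebDriverWait(driver, 10).until(ExpectedConditions.presenceOfElementLocated(By.id("loginname")))` in Selenium 2/3 (Selenium 4 takes a `Duration` instead of a number of seconds). To show the idea without a browser, here is a plain-JDK sketch of such a polling wait; `PollingWait` is a hypothetical helper, not part of Selenium.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: waits for a condition instead of sleeping a fixed time.
class PollingWait {

    // Checks `condition` every `interval` until it returns true (result: true)
    // or `timeout` has elapsed (result: false). An interrupt also ends the wait
    // with false, after restoring the thread's interrupt flag.
    static boolean until(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In `connection()`, a call such as `PollingWait.until(() -> !driver.findElements(By.id("loginname")).isEmpty(), Duration.ofSeconds(10), Duration.ofMillis(250))` could stand in for a run of fixed pauses before `findElement`: the wait ends as soon as the field exists instead of always taking several seconds, and a `false` result tells you the page never became ready.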

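If you need the jar files and source code above, contact me on QQ: 369768231

If you are interested in crawlers, please join my QQ group: 101526096

Next I will work on a captcha solution. If you have done this or are about to, please also join the QQ group so we can discuss it.

Only openness brings progress; I hope we can all help each other and improve together.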
