Fixing an SSL exception in htmlunit

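Anyone who has written a crawler in Java has probably used htmlunit. One day you may hit an exception that leaves you baffled, asking yourself why a crawler that passes debugging on your own machine fails when the same code runs in someone else's environment. Is it the operating system? The environment? The network?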

Here is the exception:
javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
    at com.sun.net.ssl.internal.ssl.SSLSessionImpl.getPeerCertificates(SSLSessionImpl.java:352)
    at org.apache.http.conn.ssl.AbstractVerifier.verify(AbstractVerifier.java:126)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:572)
    at com.gargoylesoftware.htmlunit.HtmlUnitSSLSocketFactory.connectSocket(HtmlUnitSSLSocketFactory.java:171)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:172)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1486)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1536)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1403)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:305)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:374)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:359)
    at org.sunvalley.app.service.util.WebClientService.getPage(WebClientService.java:88)
    at test.TestUrl.Test4(TestUrl.java:17)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:73)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:46)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
    at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:46)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

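At first glance this is an SSL certificate problem. That raises several questions. htmlunit simulates a browser's behavior, so how would you construct a certificate for it? Is it a jar incompatibility, or something overlooked? Does the httpclient need to be rebuilt? And why do some systems never throw this exception: is the other person's system at fault? The scope keeps widening. Of course, two identical environments running the same code will most likely give the same result, but two different environments running the same code may not.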
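One way to compare two environments is to print the JVM's TLS-related settings on each machine and diff the output. The sketch below uses only the JDK. It assumes the difference lies in the JVM's SSL/TLS setup, which is not confirmed here, and the class name SslEnvironmentReport and the method collect are made up for illustration.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.net.ssl.SSLContext;

// Hypothetical diagnostic helper: gathers JVM settings that commonly
// influence whether an SSL/TLS handshake succeeds.
final class SslEnvironmentReport {

    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        report.put("java.version", System.getProperty("java.version"));
        report.put("java.vendor", System.getProperty("java.vendor"));
        report.put("java.home", System.getProperty("java.home"));
        // If unset, the JVM falls back to its bundled trust store.
        report.put("javax.net.ssl.trustStore",
                System.getProperty("javax.net.ssl.trustStore", "(JDK default)"));
        report.put("https.protocols",
                System.getProperty("https.protocols", "(not set)"));
        try {
            SSLContext context = SSLContext.getDefault();
            report.put("enabled protocols",
                    Arrays.toString(context.getDefaultSSLParameters().getProtocols()));
            report.put("supported protocols",
                    Arrays.toString(context.getSupportedSSLParameters().getProtocols()));
        } catch (NoSuchAlgorithmException e) {
            report.put("default SSLContext", "unavailable: " + e.getMessage());
        }
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Run it on the machine where the crawler works and on the one where it fails. A different Java version, trust store, or protocol list is a reasonable place to start looking.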

Here is how I worked through it:

1. Rework htmlunit. Since htmlunit is itself built on httpclient, I could target httpclient directly, so I googled it:

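import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import org.apache.http.client.HttpClient;
import org.apache.http.conn.ClientConnectionManager;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.conn.scheme.SchemeRegistry;
import org.apache.http.conn.ssl.SSLSocketFactory;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.SingleClientConnManager;

SSLContext sslContext = SSLContext.getInstance("SSL");

// set up a TrustManager that trusts everything
sslContext.init(null, new TrustManager[] { new X509TrustManager() {
            public X509Certificate[] getAcceptedIssuers() {
                    System.out.println("getAcceptedIssuers =============");
                    // the interface contract expects a non-null array
                    return new X509Certificate[0];
            }

            public void checkClientTrusted(X509Certificate[] certs,
                            String authType) {
                    System.out.println("checkClientTrusted =============");
            }

            public void checkServerTrusted(X509Certificate[] certs,
                            String authType) {
                    System.out.println("checkServerTrusted =============");
            }
} }, new SecureRandom());

// register the trust-all socket factory for https
SSLSocketFactory sf = new SSLSocketFactory(sslContext);
Scheme httpsScheme = new Scheme("https", 443, sf);
SchemeRegistry schemeRegistry = new SchemeRegistry();
schemeRegistry.register(httpsScheme);

// Apache HttpClient 4.2 and later should use BasicClientConnectionManager
ClientConnectionManager cm = new SingleClientConnManager(schemeRegistry);
HttpClient httpClient = new DefaultHttpClient(cm);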

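This had no effect... As written, the snippet builds a standalone HttpClient and nothing hands it to htmlunit's WebClient, which is presumably why the exception did not change.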

2. More searching on Google and Baidu turned up the following thread:

how to ignore ssl certificate error

The suggested solution:

For future reference if someone wants to do the same thing its fairly 
straight forward. 

1)Create a package called 
org.apache.commons.httpclient.contrib.ssl 
add the two files. 
EasySSLProtocolSocketFactory.java 
EasyX509TrustManager.java 
Compile. 

2)Add the following lines of code to your htmlclient before making any 
htmlunit calls 

Protocol easyhttps = new Protocol("https", new 
EasySSLProtocolSocketFactory(), 443); 
Protocol.registerProtocol("https", easyhttps); 



The two Java files can be found here:

http://svn.apache.org/viewvc/httpcomponents/oac.hc3x/trunk/src/contrib/org/apache/commons/httpclient/contrib/ssl/EasySSLProtocolSocketFactory.java?view=markup

http://svn.apache.org/viewvc/httpcomponents/oac.hc3x/trunk/src/contrib/org/apache/commons/httpclient/contrib/ssl/EasyX509TrustManager.java?view=markup


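Following these steps still had no effect. Note that org.apache.commons.httpclient is the HttpClient 3.x API, while the stack trace above goes through org.apache.http (HttpClient 4.x), so registering a 3.x Protocol would not be expected to reach the connection that actually fails.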

Finally I came across this note from the author:

FYI, starting with version 1.14 you can use WebClient.setUseInsecureSSL(true) instead, and it will take care of all the HttpClient configuration behind the scenes.

Very exciting...

3. When I tried to use it, WebClient had no such method. It turns out the method has been moved into the WebClientOptions class:



/**
 * If set to <code>true</code>, the client will accept connections to any host, regardless of
 * whether they have valid certificates or not. This is especially useful when you are trying to
 * connect to a server with expired or corrupt certificates.
 * @param useInsecureSSL whether or not to use insecure SSL
 */
public void setUseInsecureSSL(final boolean useInsecureSSL) {
    useInsecureSSL_ = useInsecureSSL;
}

So in init(), that is, when constructing the WebClient, uncommenting the last commented line resolves the exception:

public void init() throws Exception {
    webclient = new WebClient(BrowserVersion.FIREFOX_17);
    webclient.getOptions().setJavaScriptEnabled(true);
    webclient.getOptions().setThrowExceptionOnScriptError(false);
    webclient.getOptions().setCssEnabled(false);
    webclient.getCookieManager().clearCookies();
    webclient.getCache().clear();
    webclient.setRefreshHandler(new ImmediateRefreshHandler());
    // timeouts are in milliseconds
    webclient.getOptions().setTimeout(600*1000);
    webclient.setJavaScriptTimeout(600*1000);
    webclient.setAjaxController(new NicelyResynchronizingAjaxController());
    webclient.getOptions().setRedirectEnabled(true);
    webclient.waitForBackgroundJavaScript(60*1000);
    webclient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    // uncomment the next line to accept hosts with invalid certificates
//    webclient.getOptions().setUseInsecureSSL(true);
}
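
For readers wondering what an "insecure SSL" switch amounts to at the JSSE level, the Javadoc above describes a client that accepts any host regardless of certificate validity. In plain JDK terms, that behavior typically comes down to a trust manager whose check methods never throw. The following is a minimal JDK-only sketch of that idea. It is not htmlunit's actual implementation, and the names InsecureSslSketch, trustEverything and insecureContext are made up for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical illustration of a "trust everything" TLS setup.
final class InsecureSslSketch {

    // A trust manager that never rejects a certificate chain.
    static X509TrustManager trustEverything() {
        return new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // intentionally empty: every client chain is accepted
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // intentionally empty: every server chain is accepted
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                // the interface contract expects a non-null array
                return new X509Certificate[0];
            }
        };
    }

    // Builds an SSLContext whose only trust manager accepts everything.
    static SSLContext insecureContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] { trustEverything() }, new SecureRandom());
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not build SSLContext", e);
        }
    }

    public static void main(String[] args) {
        SSLContext context = insecureContext();
        System.out.println("protocol: " + context.getProtocol());
        System.out.println("socket factory: " + context.getSocketFactory().getClass().getName());
    }
}
```

A context like this only matters if the client that actually opens the connection uses it, which is why letting htmlunit configure its own HttpClient through setUseInsecureSSL(true) is the simpler route. Because certificate validation is switched off entirely, it suits hosts with expired or broken certificates that you have already decided to trust.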
