建立https链接的SLL验证证书失效问题

爬取网页遇到的目标站点证书不合法问题。

使用jsoup爬取解析网页时,出现了如下的异常情况。

javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
        at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1627)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:204)
        at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:198)
        at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:994)
        at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:142)
        at sun.security.ssl.Handshaker.processLoop(Handshaker.java:533)
        at sun.security.ssl.Handshaker.process_record(Handshaker.java:471)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:904)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1132)
        at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:643)

查明是无效的SSL证书问题。由于现在很多网站由http全站升级到https,可能是原站点SSL没有部署好,导致证书无效,也有可能是其证书本身就不被认可。 对于爬取其网页就会出现证书验证出错的问题。
对于使用Jsoup自带接口来下载网页的,最新版本的1.9.2有validateTLSCertificates(boolean false)接口即可。
Jsoup.connect(url).timeout(30000).userAgent(UA).validateTLSCertificates(false).get()
java默认的证书集合里面不存在对于多数自注册的证书,对于不使用第三方库来做http请求的话,我们可以手动
创建 TrustManager 来解决。确定要建立的链接的站点,否则不推荐这种方式
public static InputStream getByDisableCertValidation(String url) {
		TrustManager[] trustAllCerts = new TrustManager[] {new X509TrustManager() {
			public X509Certificate[] getAcceptedIssuers() {
				return new X509Certificate[0];
			}
			public void checkClientTrusted(X509Certificate[] certs, String authType) {
			}
			public void checkServerTrusted(X509Certificate[] certs, String authType) {
			}
		} };

		HostnameVerifier hv = new HostnameVerifier() {
			public boolean verify(String hostname, SSLSession session) {
				return true;
			}
		};

		try {
			SSLContext sc = SSLContext.getInstance("SSL");
			sc.init(null, trustAllCerts, new SecureRandom());
			HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
			HttpsURLConnection.setDefaultHostnameVerifier(hv);

			URL uRL = new URL(url);
			HttpsURLConnection urlConnection = (HttpsURLConnection) uRL.openConnection();
			InputStream is = urlConnection.getInputStream();
			return is;
		} catch (Exception e) {
		}
		return null;
	}


refer:

http://snowolf.iteye.com/blog/391931

http://stackoverflow.com/questions/1828775/how-to-handle-invalid-ssl-certificates-with-apache-httpclient

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值