Background: our company crawls data with WebMagic, and downloading images fails with this error:
sun.security.validator.ValidatorException: PKIX path building failed
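When a crawler logs many download errors, it can help to separate this certificate failure from ordinary IO errors such as timeouts or blocked requests before deciding which sites need special handling. The following is a minimal sketch under that assumption. `PkixErrors` is a hypothetical name. It matches the message text shown above and also walks the cause chain, which in the JDK typically ends in a `java.security.cert.CertPathBuilderException`.

```java
import java.security.cert.CertPathBuilderException;

public class PkixErrors {
    /**
     * Returns true if the exception, or any of its causes, looks like the
     * JSSE "PKIX path building failed" certificate-path error.
     */
    static boolean isPkixFailure(Throwable t) {
        int depth = 0;
        // Depth limit guards against pathological cause chains.
        for (Throwable c = t; c != null && depth < 20; c = c.getCause(), depth++) {
            // The innermost cause is typically a CertPathBuilderException subclass.
            if (c instanceof CertPathBuilderException) {
                return true;
            }
            String msg = c.getMessage();
            if (msg != null && msg.contains("PKIX path building failed")) {
                return true;
            }
        }
        return false;
    }
}
```

A download loop could call `PkixErrors.isPkixFailure(e)` in its `catch` block and record the host only when it returns true.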
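1. First I ruled out the possibility that the site blocks image downloads. Then I searched for the error, and most of the answers I found attributed it to a certificate problem.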
2. Next I tried downloading images from other sites. I picked one HTTP image and one HTTPS image at random from the web:
String url1 = "http://mogicwula.com/img/mogic.jpg";
String url2 = "https://ss3.bdstatic.com/70cFv8Sh_Q1YnxGkpoWK1HF6hhy/it/u=2534506313,1688529724&fm=26&gp=0.jpg";
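Both downloaded fine, so HTTPS in our own environment was working in general. That ruled out the certificate problem as those answers described it (strictly speaking, this is not accurate, because it still turned out to be a certificate issue).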
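3. Since our own setup was fine, the problem was probably related to the target site's server certificate, so I tried downloading the same image with wget.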
4. Sure enough, the target server's certificate was the cause: it was unusual, issued by an uncommon certificate provider.
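5. From there the fix is simple: either download the issuer's certificate from the issuing site and use it to verify the server's certificate, or turn off certificate verification.
I chose to turn off certificate verification.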
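For completeness, the first option (keeping verification) can be sketched roughly as follows, assuming the issuer's certificate has been saved as a PEM or DER file. The path `issuer.crt` and the class name `IssuerTrust` are hypothetical. The certificate is loaded with `CertificateFactory`, placed into an in-memory `KeyStore`, and turned into an `SSLContext` through a `TrustManagerFactory`.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import java.util.Collection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class IssuerTrust {
    /** Reads every X.509 certificate (PEM or DER) contained in the stream. */
    static Collection<? extends Certificate> readCertificates(InputStream in) throws Exception {
        return CertificateFactory.getInstance("X.509").generateCertificates(in);
    }

    /** Builds an SSLContext whose trust anchors are exactly the given certificates. */
    static SSLContext contextTrusting(Collection<? extends Certificate> anchors) throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null); // empty in-memory keystore, nothing is read from disk
        int i = 0;
        for (Certificate c : anchors) {
            ks.setCertificateEntry("issuer-" + i++, c);
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("issuer.crt")) { // hypothetical path
            SSLContext ctx = contextTrusting(readCertificates(in));
            System.out.println("issuer loaded, protocol " + ctx.getProtocol());
        }
    }
}
```

This context trusts only the supplied certificates, so it fits best on the one affected connection, for example `conn.setSSLSocketFactory(ctx.getSocketFactory())` on an `HttpsURLConnection`, rather than as a JVM-wide default.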
package javatest;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Paths;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import org.apache.commons.io.FileUtils;

public class Img {
    public static void main(String[] args) throws Exception {
        String url = "xxx.jpg";
        solvePKIX();
        // download the image...
    }

    /**
     * Call this once before accessing the URL.
     * Note: it changes the JVM-wide defaults used by HttpsURLConnection.
     * @throws Exception if the SSLContext cannot be initialized
     */
    private static void solvePKIX() throws Exception {
        trustAllHttpsCertificates();
        HttpsURLConnection.setDefaultHostnameVerifier(hv);
    }

    protected final String retrieveResponseFromServer(final URL validationUrl, final String ticket) {
        // note: ticket is not used in this method
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) validationUrl.openConnection();
            // try-with-resources closes the reader, which the original never did
            try (BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
                final StringBuilder sb = new StringBuilder(255);
                String line;
                while ((line = in.readLine()) != null) {
                    sb.append(line).append('\n');
                }
                return sb.toString();
            }
        } catch (final IOException | RuntimeException e) {
            return null;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    // Consulted when the URL host name does not match the certificate; accepts it anyway.
    static HostnameVerifier hv = new HostnameVerifier() {
        @Override
        public boolean verify(String urlHostName, SSLSession session) {
            System.out.println("Warning: URL Host: " + urlHostName + " vs. " + session.getPeerHost());
            return true;
        }
    };

    private static void trustAllHttpsCertificates() throws Exception {
        TrustManager[] trustAllCerts = new TrustManager[] { new miTM() };
        // "SSL" also works here, but "TLS" states the intent more clearly
        SSLContext sc = SSLContext.getInstance("TLS");
        sc.init(null, trustAllCerts, null);
        HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
    }

    // Trust manager that accepts every certificate chain without any validation.
    static class miTM implements X509TrustManager {
        @Override
        public X509Certificate[] getAcceptedIssuers() {
            // empty array rather than null, which some callers do not expect
            return new X509Certificate[0];
        }

        @Override
        public void checkServerTrusted(X509Certificate[] certs, String authType) {
            // accept everything
        }

        @Override
        public void checkClientTrusted(X509Certificate[] certs, String authType) {
            // accept everything
        }
    }
}
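The code above changes JVM-wide defaults through `HttpsURLConnection.setDefaultSSLSocketFactory` and `setDefaultHostnameVerifier`, so every later `HttpsURLConnection` in the process also skips verification. If only the one problematic site needs relaxed checks, a sketch along these lines applies an all-trusting factory and host name verifier to a single connection instead. `ScopedTrustAll` and `openInsecure` are hypothetical names. Either way, these settings only affect connections made through `HttpsURLConnection`; HTTP clients that build their own TLS configuration are not changed by them.

```java
import java.net.URI;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class ScopedTrustAll {
    // Accepts every certificate chain (no validation at all).
    private static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    };

    // Accepts any host name when it does not match the certificate.
    private static final HostnameVerifier ACCEPT_ANY_HOST = (host, session) -> true;

    static SSLSocketFactory trustAllFactory() throws Exception {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[] {TRUST_ALL}, null);
        return ctx.getSocketFactory();
    }

    /** Opens (but does not yet connect) a connection that skips certificate and host name checks. */
    static HttpsURLConnection openInsecure(String url) throws Exception {
        HttpsURLConnection conn = (HttpsURLConnection) URI.create(url).toURL().openConnection();
        conn.setSSLSocketFactory(trustAllFactory()); // this connection only
        conn.setHostnameVerifier(ACCEPT_ANY_HOST);   // this connection only
        return conn;
    }
}
```

The image can then be read from `openInsecure(url).getInputStream()`, while other HTTPS connections in the same JVM keep the default checks.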
Summary
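Many tutorials online only give code without explaining the cause, so to stress it once more: if images from other sites download fine but images from one particular site do not, use wget to check whether that site's certificate is the problem. If it is, you can consider disabling certificate verification.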
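The same check can also be run from Java, which shows exactly what the JVM receives. This is a diagnostic sketch: it connects once with a trust manager that records the server's chain instead of validating it, then prints each certificate's subject and issuer. An unfamiliar issuer at the end of the chain points to the situation described above. `ChainInspector` and `RecordingTrustManager` are hypothetical names. Run it as `java ChainInspector <https-url>`. Because it accepts any chain, it is meant for diagnosis only.

```java
import java.io.IOException;
import java.net.URI;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class ChainInspector {
    /** Accepts any chain but remembers the last one the server presented. */
    static final class RecordingTrustManager implements X509TrustManager {
        volatile X509Certificate[] lastChain = new X509Certificate[0];
        @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        @Override public void checkServerTrusted(X509Certificate[] chain, String authType) {
            lastChain = chain.clone();
        }
        @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    }

    /** One line per certificate: position, subject and issuer. */
    static String describe(X509Certificate[] chain) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chain.length; i++) {
            sb.append(i)
              .append(": subject=").append(chain[i].getSubjectX500Principal().getName())
              .append(" | issuer=").append(chain[i].getIssuerX500Principal().getName())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        RecordingTrustManager tm = new RecordingTrustManager();
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[] {tm}, null);
        HttpsURLConnection conn = (HttpsURLConnection) URI.create(args[0]).toURL().openConnection();
        conn.setSSLSocketFactory(ctx.getSocketFactory()); // affects this connection only
        try {
            conn.connect();
        } catch (IOException e) {
            // If the handshake reached our trust manager, the chain was recorded anyway.
            System.err.println("connect failed: " + e);
        } finally {
            conn.disconnect();
        }
        System.out.print(describe(tm.lastChain));
    }
}
```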
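Tip: if you must keep verification, first download the issuer's certificate from the issuing site and use it to verify the site's certificate. Also note that the site's certificate can make access slow.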
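If verification must stay on, one possible sketch accepts a chain when either the JDK's default CA list or the extra issuer validates it, so other HTTPS sites keep working as usual. It assumes the issuer's certificate has already been loaded into a `KeyStore`, for example via `CertificateFactory` and `KeyStore.setCertificateEntry`. `DefaultPlusIssuer` and its methods are hypothetical names. A common alternative is importing the certificate into the JVM's `cacerts` with `keytool -importcert`.

```java
import java.security.KeyStore;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

public class DefaultPlusIssuer {
    /** X509TrustManager backed by ks, or by the JDK default CA store when ks is null. */
    static X509TrustManager trustManagerFor(KeyStore ks) throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                return (X509TrustManager) tm;
            }
        }
        throw new IllegalStateException("no X509TrustManager available");
    }

    /** Trusts a server chain if either the primary or the extra trust manager accepts it. */
    static X509TrustManager composite(X509TrustManager primary, X509TrustManager extra) {
        return new X509TrustManager() {
            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType)
                    throws CertificateException {
                try {
                    primary.checkServerTrusted(chain, authType);
                } catch (CertificateException e) {
                    extra.checkServerTrusted(chain, authType); // fall back to the extra issuer
                }
            }

            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType)
                    throws CertificateException {
                primary.checkClientTrusted(chain, authType);
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                X509Certificate[] a = primary.getAcceptedIssuers();
                X509Certificate[] b = extra.getAcceptedIssuers();
                X509Certificate[] all = Arrays.copyOf(a, a.length + b.length);
                System.arraycopy(b, 0, all, a.length, b.length);
                return all;
            }
        };
    }

    /** SSLContext that trusts the JDK defaults plus the certificates in extraIssuers. */
    static SSLContext contextFor(KeyStore extraIssuers) throws Exception {
        X509TrustManager tm = composite(trustManagerFor(null), trustManagerFor(extraIssuers));
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[] {tm}, null);
        return ctx;
    }
}
```

The resulting context can be installed per connection with `setSSLSocketFactory(ctx.getSocketFactory())`. Trying the defaults first keeps ordinary sites on the normal validation path.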