Writing a web crawler in Java

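This post describes an SSL certificate problem that a Java program ran into when accessing an HTTPS site, and works around it with a custom TrustManager and HostnameVerifier that trust every certificate and skip hostname verification. The sample code shows how to configure HttpsURLConnection for such unrestricted HTTPS connections.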

 Caused by: java.lang.RuntimeException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

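This error is caused by a certificate problem: the JVM cannot build a chain from the server's certificate to any CA in its trust store.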

The fix used here is to make the connection trust the server's SSL certificate when HttpURLConnection requests an https URL.

Reference: https://blog.csdn.net/zz153417230/article/details/80271155
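
The spider listing in this post imports novel.spider.junit.MyX509TrustManager but never shows it. For readers following along, here is a minimal sketch of what a trust-all manager of that kind usually looks like. It is an assumption, not the author's actual class: only the class name comes from the import, and the package declaration and `public` modifier are left out so the snippet compiles on its own.

```java
import java.security.cert.X509Certificate;
import javax.net.ssl.X509TrustManager;

// Hypothetical reconstruction of the class the post imports as
// novel.spider.junit.MyX509TrustManager. To use it with the listing,
// declare it public and place it in that package.
class MyX509TrustManager implements X509TrustManager {

    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
        // Intentionally empty: returning without an exception means "trusted".
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
        // Intentionally empty: every server certificate chain is accepted,
        // which is what makes the PKIX path building error disappear.
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        // No issuers are advertised; an empty array is a valid return value.
        return new X509Certificate[0];
    }
}
```

Because neither check method ever throws CertificateException, an SSLContext initialized with this manager accepts any certificate, including self-signed and expired ones.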

Spider implementation (AbstractChapterSpider.java):

package novel.spider.impl;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.TrustManager;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import novel.spider.entitys.Chapter;
import novel.spider.interfaces.IChapterSpider;
import novel.spider.junit.MyX509TrustManager;

public class AbstractChapterSpider implements IChapterSpider {
	// Downloads the page at the given URL and returns its body as one string.
	// Note: the two setDefault* calls disable certificate and hostname checks
	// for every HttpsURLConnection in the JVM, not only for this request.
	protected String crawl(String url) throws Exception {
		// First argument is the protocol, second is the provider (may be omitted).
		SSLContext sslcontext = SSLContext.getInstance("TLS", "SunJSSE");
		TrustManager[] tm = {new MyX509TrustManager()};
		sslcontext.init(null, tm, new SecureRandom());
		HostnameVerifier ignoreHostnameVerifier = new HostnameVerifier() {
			public boolean verify(String s, SSLSession sslsession) {
				System.out.println("WARNING: Hostname is not matched for cert.");
				return true;
			}
		};
		HttpsURLConnection.setDefaultHostnameVerifier(ignoreHostnameVerifier);
		HttpsURLConnection.setDefaultSSLSocketFactory(sslcontext.getSocketFactory());
		URL url2 = new URL(url);
		HttpURLConnection conn = (HttpURLConnection) url2.openConnection();
		StringBuilder resp = new StringBuilder();
		// The response is assumed to be UTF-8; try-with-resources closes the stream.
		try (InputStream in = conn.getInputStream();
				BufferedReader breader = new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
			String str;
			while ((str = breader.readLine()) != null) {
				// readLine() strips the line terminator, so put it back.
				resp.append(str).append('\n');
			}
		}
		return resp.toString();
	}

	@Override
	public List<Chapter> getsChapter(String url) {
		try {
			String result = crawl(url);
			Document doc = Jsoup.parse(result);
			//System.err.println(doc);
			Elements as = doc.select("div li a");
			List<Chapter> chapters = new ArrayList<>();
			for (Element a : as) {
				Chapter chapter = new Chapter();
				chapter.setTitle(a.text());
				chapter.setUrl("http://www.bxwx8.org" + a.attr("href"));
				chapters.add(chapter);
			}
			return chapters;
		} catch (Exception e) {
			throw new RuntimeException(e);
		}
	}

}

Test case (Testcase.java):

package novel.spider.junit;

import java.util.List;

import org.junit.Test;

import novel.spider.entitys.Chapter;
import novel.spider.impl.DefaultChapterSpider;
import novel.spider.interfaces.IChapterSpider;

public class Testcase {

	@Test
	public void test1() throws Exception {
		IChapterSpider spider = new DefaultChapterSpider();
		List<Chapter> chapters = spider.getsChapter("https://www.266ks.com/0_5/");
		for (Chapter chapter : chapters) {
			System.out.println(chapter);
		}
	}
}
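
The crawl method above installs its permissive settings through HttpsURLConnection.setDefaultSSLSocketFactory and setDefaultHostnameVerifier, so they apply to the whole JVM. As a hedged variant of the same technique, the sketch below scopes them to a single connection using the instance setters. The class name ScopedTrust and the method openTrusting are made up for illustration and are not part of the original project.

```java
import java.io.IOException;
import java.net.URL;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper: same trust-all idea, but limited to one connection.
class ScopedTrust {

    // Opens (but does not yet connect) an HTTPS connection that accepts any
    // certificate and any hostname. Only https:// URLs are supported; the
    // cast below fails for other schemes.
    static HttpsURLConnection openTrusting(String url) {
        try {
            TrustManager[] trustAll = { new X509TrustManager() {
                public void checkClientTrusted(X509Certificate[] chain, String authType) { }
                public void checkServerTrusted(X509Certificate[] chain, String authType) { }
                public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
            } };
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, trustAll, new SecureRandom());

            HttpsURLConnection conn = (HttpsURLConnection) new URL(url).openConnection();
            // Instance-level setters leave the JVM-wide defaults untouched.
            conn.setSSLSocketFactory(ctx.getSocketFactory());
            conn.setHostnameVerifier((host, session) -> true);
            return conn;
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Could not set up connection to " + url, e);
        }
    }
}
```

With this helper, crawl would call ScopedTrust.openTrusting(url) and read from conn.getInputStream() as before, while other HTTPS requests in the same process keep normal certificate validation.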
