Step 1: Build an HttpClient that skips HTTPS certificate verification
// Assumed: cookieStore is a shared static field (its declaration is not shown in the original).
private static final CookieStore cookieStore = new BasicCookieStore();

public static CloseableHttpClient getHttpClient() throws Exception {
    SSLContextBuilder builder = new SSLContextBuilder();
    // Trust every certificate; no identity check is performed
    builder.loadTrustMaterial(null, new TrustStrategy() {
        @Override
        public boolean isTrusted(X509Certificate[] x509Certificates, String s) throws CertificateException {
            return true;
        }
    });
    SSLConnectionSocketFactory sslsf = new SSLConnectionSocketFactory(builder.build(), new String[]{"SSLv2Hello", "SSLv3", "TLSv1", "TLSv1.2"}, null, NoopHostnameVerifier.INSTANCE);
    // No connection manager is set explicitly (the original passed null), so the builder creates its own.
    CloseableHttpClient httpClient = HttpClients.custom()
            .setSSLSocketFactory(sslsf)
            .setConnectionManagerShared(true)
            .setDefaultCookieStore(cookieStore)
            .build();
    return httpClient;
}
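To make clear what "trust everything" means underneath, here is a minimal sketch that builds an equivalent trust-all SSLContext with the JDK alone. The class name TrustAllSslContext is hypothetical and not part of the original code; it is an illustration under that assumption, not a replacement for the method above. Such a context should only be used against hosts you already trust.

```java
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper class, for illustration only.
final class TrustAllSslContext {

    // Builds an SSLContext whose trust manager accepts every certificate chain.
    static SSLContext create() throws Exception {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // Not throwing means the chain is treated as trusted.
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // Not throwing means the chain is treated as trusted.
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        } };
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, new SecureRandom());
        return context;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(create().getProtocol());
    }
}
```

The returned context could be handed to SSLConnectionSocketFactory in place of builder.build().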
Step 2: Make a first request to obtain the cookie
public static void getCookie() throws Exception {
    HttpClient httpClient = getHttpClient();
    // HTTP request
    HttpUriRequest request = new HttpGet("http://888.by3322.com:8088/Login");
    setHeader(request);
    HttpResponse response = httpClient.execute(request);
    // The cookie ends up in the client's cookie store; consume the body to release the connection.
    EntityUtils.consume(response.getEntity());
}
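The point of this step is that a cookie stored on the first request is sent again on later requests to the same host. The sketch below shows that idea with the JDK's own java.net.CookieManager rather than HttpClient. The class name SharedCookieDemo and the cookie name JSESSIONID are assumptions made for the example; the real site may use a different cookie.

```java
import java.net.CookieManager;
import java.net.HttpCookie;
import java.net.URI;
import java.util.List;

// Hypothetical demo class, for illustration only.
final class SharedCookieDemo {

    // One store shared by all requests of the crawl.
    static final CookieManager COOKIES = new CookieManager();

    // Simulates the login page setting a cookie for the given URI.
    static void remember(URI uri, String name, String value) {
        COOKIES.getCookieStore().add(uri, new HttpCookie(name, value));
    }

    // Builds the Cookie header value that a later request to uri would carry.
    static String cookieHeaderFor(URI uri) {
        List<HttpCookie> cookies = COOKIES.getCookieStore().get(uri);
        StringBuilder header = new StringBuilder();
        for (HttpCookie cookie : cookies) {
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(cookie.getName()).append('=').append(cookie.getValue());
        }
        return header.toString();
    }

    public static void main(String[] args) {
        remember(URI.create("http://888.by3322.com:8088/Login"), "JSESSIONID", "abc123");
        System.out.println(cookieHeaderFor(URI.create("http://888.by3322.com:8088/Login/authnum")));
    }
}
```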
Step 3: Add request headers so that the target site does not filter the request
public static void setHeader(HttpUriRequest request) {
    request.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0");
    request.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    request.setHeader("Accept-Encoding", "gzip, deflate");
    request.setHeader("Accept-Language", "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3");
    request.setHeader("Cache-Control", "max-age=0");
    request.setHeader("Connection", "keep-alive");
    request.setHeader("Host", "888.by3322.com:8088");
    request.setHeader("Upgrade-Insecure-Requests", "1");
}
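The Host value above is hard-coded, so it can drift out of sync if the target URL changes. As a small sketch, the value could instead be derived from the request URL. The class name HostHeader is hypothetical and not part of the original code.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only.
final class HostHeader {

    // Returns "host:port" for the URL, or just "host" when the URL has no explicit port.
    static String of(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort();
        return port == -1 ? uri.getHost() : uri.getHost() + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(of("http://888.by3322.com:8088/Login"));
    }
}
```

With this helper, the Host line could read request.setHeader("Host", HostHeader.of(request.getURI().toString())), assuming the request was built from an absolute URL.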
Step 4: Download the captcha image
public static String getAuthNum() throws Exception {
    String authnum = "";
    HttpGet httpGet = new HttpGet("http://888.by3322.com:8088/Login/authnum");
    setHeader(httpGet);
    httpGet.setHeader("Referer", "http://888.by3322.com:8088/Login");
    HttpResponse response = getHttpClient().execute(httpGet);
    if (response.getStatusLine().getStatusCode() == 200) {
        // Download the image
        String filepath = "E:" + File.separator + "authNum.png";
        // try-with-resources closes the file before the OCR step reads it
        try (OutputStream outputStream = new FileOutputStream(filepath)) {
            response.getEntity().writeTo(outputStream);
        }
        EntityUtils.consume(response.getEntity());
        // Recognize the letters and digits in the image
        authnum = ImageUtil.readImgText(new File(filepath));
    }
    return authnum;
}
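The file-writing part of this step can also be done with java.nio, which closes the stream and overwrites a stale image in one call. The sketch below is an assumption-laden illustration: CaptchaSaver is a hypothetical class, and the input stream stands in for what response.getEntity().getContent() would return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, for illustration only.
final class CaptchaSaver {

    // Copies the response body to target, replacing any existing file,
    // and returns the number of bytes written. The stream is always closed.
    static long save(InputStream body, Path target) throws IOException {
        try (InputStream in = body) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file is used here instead of a fixed drive path.
        Path target = Files.createTempFile("authNum", ".png");
        long written = save(new ByteArrayInputStream(new byte[] {1, 2, 3}), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```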
Step 5: Recognize the text in the image with tess4j
public static String readImgText(File file) {
    String result = "";
    ITesseract instance = new Tesseract();
    File tessDataFolder = LoadLibs.extractTessResources("tessdata");
    instance.setLanguage("eng"); // the English data set recognizes digits fairly accurately
    instance.setDatapath(tessDataFolder.getAbsolutePath());
    try {
        result = instance.doOCR(file);
        System.out.println(result);
    } catch (TesseractException e) {
        System.err.println(e.getMessage());
    }
    return result;
}
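OCR output often carries stray whitespace or line breaks around the recognized characters. Since the captcha consists of letters and digits, a small cleanup pass before submitting the value may help. This is a sketch only: CaptchaText is a hypothetical class, and it assumes that every non-alphanumeric character is noise.

```java
// Hypothetical helper class, for illustration only.
final class CaptchaText {

    // Keeps ASCII letters and digits and drops everything else (spaces, newlines, punctuation).
    static String clean(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(clean(" a B\n12 \n"));
    }
}
```

It would be applied to the return value of readImgText before the text is used in the login request.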
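Notes:
When HttpClient requests a URL, the request may fail with a no-response error even though the same URL works in a browser. The likely cause is the request headers, or the value of the Referer header.
Image text recognition uses tess4j. One problem encountered was the error "The specified module could not be found". The main cause is that on Windows the three files gsdll64.dll, liblept170.dll and libtesseract304.dll are compiled with VC2013, so the runtime libraries they depend on (the Visual C++ 2013 redistributable) must be installed. Download and install them from https://www.microsoft.com/zh-cn/download/default.aspx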