While crawling an HTTPS page from Java code with HttpClient, I ran into this error: javax.net.ssl.SSLException: Received fatal alert: protocol_version
Here is the initial code:
    package com.tcl.mibc.weathercrawler;

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpException;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpRequestBase;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.protocol.BasicHttpContext;
    import org.apache.http.protocol.HttpContext;
    import org.apache.http.util.EntityUtils;

    /**
     * @author yanzhou
     */
    public class PageOld {

        public static void main(String[] args) {
            // Print the full JSSE handshake trace.
            System.setProperty("javax.net.debug", "all");
            String url = "https://www.timeanddate.com/weather/";
            // Default client: the TLS settings come from the JVM defaults.
            HttpClient client = HttpClients.createDefault();
            HttpRequestBase http = new HttpGet(url);
            HttpContext context = new BasicHttpContext();
            try {
                HttpResponse response = client.execute(http, context);
                int statusCode = response.getStatusLine().getStatusCode();

                switch (statusCode) {
                    case 200:
                    case 400: // business-level error, the body is still read
                        break;
                    default:
                        throw new HttpException(url + " Status Code:" + statusCode);
                }

                HttpEntity entity = response.getEntity();
                String reStr = EntityUtils.toString(entity);
                System.out.println(reStr);
            } catch (Exception e) {
                System.out.println(e.toString());
            }
        }

    }
Note: the line System.setProperty("javax.net.debug", "all"); is there only to print debugging information.
The debug output is as follows:
    trigger seeding of SecureRandom
    done seeding SecureRandom
    16:21:43.798 [main] DEBUG org.apache.http.client.protocol.RequestAddCookies - CookieSpec selected: default
    16:21:43.810 [main] DEBUG org.apache.http.client.protocol.RequestAuthCache - Auth cache not set in the context
    16:21:43.810 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection request: [route: {s}->https://www.timeanddate.com:443][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]
    16:21:43.821 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection leased: [id: 0][route: {s}->https://www.timeanddate.com:443][total kept alive: 0; route allocated: 1 of 2; total allocated: 1 of 20]
    16:21:43.823 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Opening connection {s}->https://www.timeanddate.com:443
    16:21:43.831 [main] DEBUG org.apache.http.impl.conn.DefaultHttpClientConnectionOperator - Connecting to www.timeanddate.com/151.101.228.69:443
    16:21:43.831 [main] DEBUG org.apache.http.conn.ssl.SSLConnectionSocketFactory - Connecting socket to www.timeanddate.com/151.101.228.69:443 with timeout 0
    Ignoring unavailable cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
    Ignoring unavailable cipher suite: TLS_DHE_RSA_WITH_AES_256_CBC_SHA
    Ignoring unavailable cipher suite: TLS_ECDH_RSA_WITH_AES_256_CBC_SHA
    Ignoring unsupported cipher suite: TLS_DHE_DSS_WITH_AES_128_CBC_SHA256
    Ignoring unsupported cipher suite: TLS_DHE_DSS_WITH_AES_256_CBC_SHA256
    Ignoring unsupported cipher suite: TLS_DHE_RSA_WITH_AES_128_CBC_SHA256
    Ignoring unsupported cipher suite: TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256
    Ignoring unsupported cipher suite: TLS_DHE_RSA_WITH_AES_256_CBC_SHA256
    Ignoring unsupported cipher suite: TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
    Ignoring unsupported cipher suite: TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384
    Ignoring unsupported cipher suite: TLS_RSA_WITH_AES_256_CBC_SHA256
    Ignoring unavailable cipher suite: TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA
    Ignoring unsupported cipher suite: TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
    Ignoring unsupported cipher suite: TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384
    Ignoring unavailable cipher suite: TLS_DHE_DSS_WITH_AES_256_CBC_SHA
    Ignoring unsupported cipher suite: TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384
    Ignoring unsupported cipher suite: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
    Ignoring unsupported cipher suite: TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256
    Ignoring unavailable cipher suite: TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA
    Ignoring unavailable cipher suite: TLS_RSA_WITH_AES_256_CBC_SHA
    Ignoring unsupported cipher suite: TLS_RSA_WITH_AES_128_CBC_SHA256
    Allow unsafe renegotiation: false
    Allow legacy hello messages: true
    Is initial handshake: true
    Is secure renegotiation: false
    16:21:44.048 [main] DEBUG org.apache.http.conn.ssl.SSLConnectionSocketFactory - Enabled protocols: [TLSv1]
    16:21:44.048 [main] DEBUG org.apache.http.conn.ssl.SSLConnectionSocketFactory - Enabled cipher suites:[TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA, TLS_ECDHE_ECDSA_WITH_RC4_128_SHA, TLS_ECDHE_RSA_WITH_RC4_128_SHA, SSL_RSA_WITH_RC4_128_SHA, TLS_ECDH_ECDSA_WITH_RC4_128_SHA, TLS_ECDH_RSA_WITH_RC4_128_SHA, TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_RSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA, SSL_RSA_WITH_RC4_128_MD5, TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
    16:21:44.048 [main] DEBUG org.apache.http.conn.ssl.SSLConnectionSocketFactory - Starting handshake
    %% No cached client session
    *** ClientHello, TLSv1
    RandomCookie: GMT: 1513239448 bytes = { 31, 89, 18, 56, 97, 0, 186, 78, 114, 129, 23, 167, 49, 218, 158, 250, 131, 200, 216, 78, 186, 70, 7, 144, 6, 254, 239, 98 }
    Session ID: {}
    Cipher Suites: [TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA, TLS_ECDHE_ECDSA_WITH_RC4_128_SHA, TLS_ECDHE_RSA_WITH_RC4_128_SHA, SSL_RSA_WITH_RC4_128_SHA, TLS_ECDH_ECDSA_WITH_RC4_128_SHA, TLS_ECDH_RSA_WITH_RC4_128_SHA, TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_RSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA, SSL_RSA_WITH_RC4_128_MD5, TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
    Compression Methods: { 0 }
    Extension elliptic_curves, curve names: {secp256r1, sect163k1, sect163r2, secp192r1, secp224r1, sect233k1, sect233r1, sect283k1, sect283r1, secp384r1, sect409k1, sect409r1, secp521r1, sect571k1, sect571r1, secp160k1, secp160r1, secp160r2, sect163r1, secp192k1, sect193r1, sect193r2, secp224k1, sect239k1, secp256k1}
    Extension ec_point_formats, formats: [uncompressed]
    Extension server_name, server_name: [host_name: www.timeanddate.com]
    ***
    [write] MD5 and SHA1 hashes: len = 177
    0000: 01 00 00 AD 03 01 5A 32 34 98 1F 59 12 38 61 00  ......Z24..Y.8a.
    0010: BA 4E 72 81 17 A7 31 DA 9E FA 83 C8 D8 4E BA 46  .Nr...1......N.F
    0020: 07 90 06 FE EF 62 00 00 2A C0 09 C0 13 00 2F C0  .....b..*...../.
    0030: 04 C0 0E 00 33 00 32 C0 07 C0 11 00 05 C0 02 C0  ....3.2.........
    0040: 0C C0 08 C0 12 00 0A C0 03 C0 0D 00 16 00 13 00  ................
    0050: 04 00 FF 01 00 00 5A 00 0A 00 34 00 32 00 17 00  ......Z...4.2...
    0060: 01 00 03 00 13 00 15 00 06 00 07 00 09 00 0A 00  ................
    0070: 18 00 0B 00 0C 00 19 00 0D 00 0E 00 0F 00 10 00  ................
    0080: 11 00 02 00 12 00 04 00 05 00 14 00 08 00 16 00  ................
    0090: 0B 00 02 01 00 00 00 00 18 00 16 00 00 13 77 77  ..............ww
    00A0: 77 2E 74 69 6D 65 61 6E 64 64 61 74 65 2E 63 6F  w.timeanddate.co
    00B0: 6D  m
    main, WRITE: TLSv1 Handshake, length = 177
    [Raw write]: length = 182
    0000: 16 03 01 00 B1 01 00 00 AD 03 01 5A 32 34 98 1F  ...........Z24..
    0010: 59 12 38 61 00 BA 4E 72 81 17 A7 31 DA 9E FA 83  Y.8a..Nr...1....
    0020: C8 D8 4E BA 46 07 90 06 FE EF 62 00 00 2A C0 09  ..N.F.....b..*..
    0030: C0 13 00 2F C0 04 C0 0E 00 33 00 32 C0 07 C0 11  .../.....3.2....
    0040: 00 05 C0 02 C0 0C C0 08 C0 12 00 0A C0 03 C0 0D  ................
    0050: 00 16 00 13 00 04 00 FF 01 00 00 5A 00 0A 00 34  ...........Z...4
    0060: 00 32 00 17 00 01 00 03 00 13 00 15 00 06 00 07  .2..............
    0070: 00 09 00 0A 00 18 00 0B 00 0C 00 19 00 0D 00 0E  ................
    0080: 00 0F 00 10 00 11 00 02 00 12 00 04 00 05 00 14  ................
    0090: 00 08 00 16 00 0B 00 02 01 00 00 00 00 18 00 16  ................
    00A0: 00 00 13 77 77 77 2E 74 69 6D 65 61 6E 64 64 61  ...www.timeandda
    00B0: 74 65 2E 63 6F 6D  te.com
    [Raw read]: length = 5
    0000: 15 03 01 00 02  .....
    [Raw read]: length = 2
    0000: 02 46  .F
    main, READ: TLSv1 Alert, length = 2
    main, RECV TLSv1 ALERT: fatal, protocol_version
    main, called closeSocket()
    main, handling exception: javax.net.ssl.SSLException: Received fatal alert: protocol_version
    16:21:45.478 [main] DEBUG org.apache.http.impl.conn.DefaultManagedHttpClientConnection - http-outgoing-0: Shutdown connection
    16:21:45.478 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Connection discarded
    16:21:45.478 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection released: [id: 0][route: {s}->https://www.timeanddate.com:443][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]
    javax.net.ssl.SSLException: Received fatal alert: protocol_version
The key line is this one:
*** ClientHello, TLSv1
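It shows that the handshake was attempted with TLSv1 (TLS 1.0), the TLS protocol version the client used by default. The protocol_version alert means the server does not accept that TLS version.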
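The raw bytes near the end of the debug log carry the same message, and decoding them by hand can help when the textual trace is not available. The following is a minimal sketch, not part of the original crawler: the class name TlsAlertDecoder and its describe method are made up for illustration, and only the two values that appear in the log (alert description 70 and level 2) are given names.

```java
class TlsAlertDecoder {

    /** Describes a 5-byte TLS record header followed by a 2-byte alert body. */
    static String describe(int[] header, int[] body) {
        // Content type 0x15 (21) marks an alert record.
        String type = header[0] == 0x15 ? "alert" : "type " + header[0];
        // Record version 3.1 is the wire encoding of TLS 1.0.
        String version = header[1] + "." + header[2];
        int length = (header[3] << 8) | header[4];
        // Alert level: 1 is warning, 2 is fatal.
        String level = body[0] == 2 ? "fatal" : "warning";
        // Alert description 70 (0x46) is protocol_version.
        String description = body[1] == 70 ? "protocol_version" : "code " + body[1];
        return type + ", record version " + version + ", length " + length
                + ", " + level + ", " + description;
    }

    public static void main(String[] args) {
        int[] header = {0x15, 0x03, 0x01, 0x00, 0x02}; // "[Raw read]: length = 5"
        int[] body = {0x02, 0x46};                      // "[Raw read]: length = 2"
        System.out.println(describe(header, body));
    }
}
```

In other words, the server answered the ClientHello with a fatal alert whose description code is protocol_version, which matches the line "RECV TLSv1 ALERT: fatal, protocol_version" in the trace.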
So, following an analysis I found on Stack Overflow, I added "SSLv2Hello" to the enabled protocols.
The revised code:
    package com.tcl.mibc.weathercrawler;

    import java.text.SimpleDateFormat;
    import java.util.Date;

    import javax.net.ssl.SSLContext;

    import org.apache.http.HttpEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClientBuilder;
    import org.apache.http.ssl.SSLContexts;
    import org.apache.http.util.EntityUtils;

    /**
     * @author yanzhou
     */
    public class PageNew {

        public static void main(String[] args) {
            // Print the full JSSE handshake trace.
            System.setProperty("javax.net.debug", "all");
            String url = "https://www.timeanddate.com/weather/";
            CloseableHttpClient httpclient;
            try {
                SSLContext ctx = SSLContexts.createSystemDefault();
                // Enable the SSLv2-format hello in addition to TLSv1.
                // Note: ALLOW_ALL_HOSTNAME_VERIFIER turns off hostname verification.
                SSLConnectionSocketFactory fac =
                        new SSLConnectionSocketFactory(ctx, new String[] {"SSLv2Hello", "TLSv1"}, null,
                                SSLConnectionSocketFactory.ALLOW_ALL_HOSTNAME_VERIFIER);
                httpclient = HttpClientBuilder.create().setSSLSocketFactory(fac).build();
                HttpPost httpPost = new HttpPost(url);
                CloseableHttpResponse resp = httpclient.execute(httpPost);
                HttpEntity entity = resp.getEntity();
                String reStr = EntityUtils.toString(entity);
                // HH is the 24-hour clock; hh would print 12-hour values without AM/PM.
                SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                Date now = new Date();
                String nowStr = sdf.format(now);
                System.out.println("======>WeatherCrawler getCurrentWeather time=" + nowStr + ", body=" + reStr);
            } catch (Exception e) {
                System.out.println("======>WeatherCrawler getCurrentWeather error" + e);
            }
        }
    }
Debug output:
    main, READ: TLSv1 Alert, length = 2
    main, RECV TLSv1 ALERT: fatal, protocol_version
    main, called closeSocket()
    main, handling exception: javax.net.ssl.SSLException: Received fatal alert: protocol_version
    16:43:28.205 [main] DEBUG org.apache.http.impl.conn.DefaultManagedHttpClientConnection - http-outgoing-0: Shutdown connection
    16:43:28.205 [main] DEBUG org.apache.http.impl.execchain.MainClientExec - Connection discarded
    16:43:28.205 [main] DEBUG org.apache.http.impl.conn.PoolingHttpClientConnectionManager - Connection released: [id: 0][route: {s}->https://www.timeanddate.com:443][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]
    ======>WeatherCrawler getCurrentWeather errorjavax.net.ssl.SSLException: Received fatal alert: protocol_version
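At this point the problem was still not solved: the error was exactly the same as before, so no progress.
So I went through the table below and kept trying protocol combinations until {"SSLv2Hello", "TLSv1.2"} finally fetched the page's HTML. I later found that "TLSv1.2" on its own is enough to fetch the page.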
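Instead of trying combinations blindly, it can help to ask the JVM which protocol names it supports and which ones a client enables when nothing is configured. The sketch below uses only the JDK's javax.net.ssl API; the class name TlsProtocolCheck and its two helper methods are hypothetical names chosen for this example. It is an assumption, not something the log proves, that the JVM used above was an older release whose client default was TLSv1 only.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

class TlsProtocolCheck {

    /** Protocol names the default SSLContext is able to use on this JVM. */
    static List<String> supported() {
        try {
            return Arrays.asList(
                    SSLContext.getDefault().getSupportedSSLParameters().getProtocols());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
    }

    /** Protocol names that are enabled when the application configures nothing. */
    static List<String> enabledByDefault() {
        try {
            return Arrays.asList(
                    SSLContext.getDefault().getDefaultSSLParameters().getProtocols());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("no default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("supported:          " + supported());
        System.out.println("enabled by default: " + enabledByDefault());
    }
}
```

If "TLSv1.2" shows up in the first list but not in the second, the JVM can speak it but the client has to enable it explicitly, which is the situation the working combination points to.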
SSLContext Algorithms
The algorithm names in this section can be specified when generating an instance of SSLContext.
Algorithm Name | Description |
---|---|
SSL | Supports some version of SSL; may support other versions |
SSLv2 | Supports SSL version 2 or later; may support other versions |
SSLv3 | Supports SSL version 3; may support other versions |
TLS | Supports some version of TLS; may support other versions |
TLSv1 | Supports RFC 2246: TLS version 1.0 ; may support other versions |
TLSv1.1 | Supports RFC 4346: TLS version 1.1 ; may support other versions |
TLSv1.2 | Supports RFC 5246: TLS version 1.2 ; may support other versions |
SSLv2Hello | Currently, the SSLv3, TLSv1, and TLSv1.1 protocols allow you to send SSLv3, TLSv1, and TLSv1.1 hellos encapsulated in an SSLv2 format hello. For more details on the reasons for allowing this compatibility in these protocols, see Appendix E in the appropriate RFCs (previously listed). Note that some SSL/TLS servers do not support the v2 hello format and require that client hellos conform to the SSLv3 or TLSv1 client hello formats. The SSLv2Hello option controls the SSLv2 encapsulation. If SSLv2Hello is disabled on the client, then all outgoing messages will conform to the SSLv3/TLSv1 client hello format. If SSLv2Hello is disabled on the server, then all incoming messages must conform to the SSLv3/TLSv1 client hello format. |
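To see how these names are applied in practice, here is a minimal sketch that restricts a client-side SSLEngine to a given list of protocols without opening any network connection. It relies only on the JDK; the class name EnabledProtocolsSketch and the helper restrictTo are invented for illustration and are not part of HttpClient or of the original crawler.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

class EnabledProtocolsSketch {

    /** Creates a client-mode engine, enables only the given protocols, and reports them. */
    static List<String> restrictTo(String... protocols) {
        try {
            // "TLS" is one of the SSLContext algorithm names from the table.
            SSLContext ctx = SSLContext.getInstance("TLS");
            // Nulls select the default key managers, trust managers and randomness.
            ctx.init(null, null, null);
            SSLEngine engine = ctx.createSSLEngine();
            engine.setUseClientMode(true);
            // Throws IllegalArgumentException if a name is not supported by this JVM.
            engine.setEnabledProtocols(protocols);
            return Arrays.asList(engine.getEnabledProtocols());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("could not create SSLContext", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(restrictTo("TLSv1.2"));
    }
}
```

The String array passed to SSLConnectionSocketFactory in the crawler plays the same role as the argument of setEnabledProtocols here: it decides which protocol versions the client is allowed to offer in its ClientHello.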
Final version of the code:
    package com.tcl.mibc.weathercrawler;

    import java.text.SimpleDateFormat;
    import java.util.Date;

    import javax.net.ssl.SSLContext;

    import org.apache.http.HttpEntity;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;