Java Jsoup IP settings: using an IP address and host name with Jsoup

I would like to access a web page using its IP address and host name, so that I can skip DNS lookups by storing resolved addresses for the domains I use. With plain sockets, this means opening a connection to the IP address:

Socket s = new Socket([string_ip_address], 80);

Then transmitting:

GET [file_name] HTTP/1.1\r\n

Host: [some_name]
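For reference, the full request text can be sketched as a small helper. This is only an illustration: the class name `RawRequest`, the method `build`, and the `Connection: close` header are assumptions added here, not part of the original request. The blank line at the end terminates the header section.

```java
public class RawRequest {

    // Builds a minimal HTTP/1.1 GET request for the given path and host.
    // "Connection: close" is an added assumption so that a simple
    // read-until-EOF loop on the socket terminates.
    static String build(String path, String host) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n"; // empty line marks the end of the headers
    }

    public static void main(String[] args) {
        System.out.print(build("/", "www.example.com"));
    }
}
```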

But I would like to use Jsoup.

The equivalent command to retrieve a page, say www.google.com, in Jsoup is:

Jsoup.connect("http://www.google.com").get();

But the provided site name must be the actual host name, not the IP address (because, if my limited understanding is correct, many domains can share the same IP address). So I figured I might try to alter the request made by Jsoup so that it includes both the site name and the IP. Jsoup uses HttpURLConnection in its underlying code. Here is a snippet from the Jsoup library itself, found at https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/helper/HttpConnection.java:

    HttpURLConnection conn = (HttpURLConnection) req.url().openConnection();
    conn.setRequestMethod(req.method().name());
    conn.setInstanceFollowRedirects(false);
    conn.setConnectTimeout(req.timeout());
    conn.setReadTimeout(req.timeout());
    if (conn instanceof HttpsURLConnection) {
        if (!req.validateTLSCertificates()) {
            initUnSecureTSL();
            ((HttpsURLConnection)conn).setSSLSocketFactory(sslSocketFactory);
            ((HttpsURLConnection)conn).setHostnameVerifier(getInsecureVerifier());
        }
    }
    if (req.method().hasBody())
        conn.setDoOutput(true);
    if (req.cookies().size() > 0)
        conn.addRequestProperty("Cookie", getRequestCookieString(req));
    // Request headers set via Jsoup end up as plain HttpURLConnection request properties.
    for (Map.Entry<String, String> header : req.headers().entrySet()) {
        conn.addRequestProperty(header.getKey(), header.getValue());
    }

I thought about writing something like this:

Jsoup.connect(ip).header("Host", host);

But this doesn't seem to work.

So, is there a known way to use IP + host in Jsoup requests (to avoid DNS lookups), or is there some other way to skip the DNS lookup when using Jsoup?

Thanks!

EDIT -

Just to be clear:

Using sockets with an IP address and host name works. For example, fetching the main page of BuzzFeed via its IP in the following way:

    Socket s = new Socket("23.34.229.118", 80);
    BufferedReader reader = new BufferedReader(new InputStreamReader(s.getInputStream()));
    PrintStream writer = new PrintStream(s.getOutputStream());
    writer.println("GET / HTTP/1.0\r\nHost: www.buzzfeed.com\r\n");
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
    s.close();

Works perfectly fine. But I am unable to access the page via

Jsoup.connect("http://23.34.229.118");

And I am quite sure that's because I need to specify the host somehow, if that's even possible. My attempt with

Jsoup.connect("http://23.34.229.118").header("Host", "buzzfeed.com");

failed and I got a 400 error.

Solution

I believe I have found the solution.

The following line needs to be added to the code -

System.setProperty("sun.net.http.allowRestrictedHeaders", "true");

This is closely related to this question, since the implementation of Jsoup uses HttpURLConnection:

Can I override the Host header where using java's HttpUrlConnection class?

Apparently, Java by default blocks changes to certain "restricted" headers, and the Host header is one of them.
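Here is a minimal sketch of the idea using plain HttpURLConnection, which is what Jsoup delegates to. To keep it self-contained, it starts a throwaway local server (the JDK's built-in `com.sun.net.httpserver.HttpServer`) that echoes back the Host header it received. The class name `HostOverrideDemo`, the helper names, and the host `www.example.com` are made up for illustration. As far as I can tell, the property is read once when the JDK's HTTP handler is first initialized, so it should be set before any HTTP connection is opened (or passed as `-Dsun.net.http.allowRestrictedHeaders=true`).

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HostOverrideDemo {

    // Sends a GET to `url` (which may use a raw IP) with an explicit Host header
    // and returns the response body as a string.
    static String fetchWithHost(String url, String host) throws IOException {
        // Without this property, HttpURLConnection ignores attempts to set Host.
        System.setProperty("sun.net.http.allowRestrictedHeaders", "true");
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Host", host);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Starts a local server that replies with the Host header it saw,
    // then requests it by IP while claiming a different host name.
    static String demo() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = exchange.getRequestHeaders().getFirst("Host")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            return fetchWithHost("http://127.0.0.1:" + port + "/", "www.example.com");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

With Jsoup the pattern would be the same: call System.setProperty(...) first, then use Jsoup.connect("http://23.34.229.118").header("Host", "www.buzzfeed.com").get() as attempted in the question.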
