1. Problem description
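Recently, for research purposes, I tried to access a SPARQL endpoint over the SPARQL protocol. When I used Jena's ARQ component to query the DBpedia endpoint, I got a Connection Reset exception. After giving up on Jena ARQ and sending the HTTP request directly, the exception persisted. The failing query was "Select ?p ?o where { <http://dbpedia.org/resource/Alabama> ?p ?o . }". Oddly, when I replaced it with the endpoint's sample query "select * where { ?s ?p ?o .} LIMIT 100", the program raised no error and printed the results normally. Interestingly, the query "Select ?p ?o where { <http://dbpedia.org/resource/Alabama> ?p ?o . }" also runs fine when entered on the endpoint's web page. The DBpedia endpoint is at http://dbpedia.org/sparql.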
Below is the code that sends the HTTP request:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.List;
import java.util.Map;

public class HttpTest {
    public static String sendGet(String url, String param) {
        String result = "";
        BufferedReader in = null;
        try {
            String urlNameString = url + "?" + param;
            URL realUrl = new URL(urlNameString);
            // Open a connection to the URL
            URLConnection connection = realUrl.openConnection();
            // Set the common request headers
            connection.setRequestProperty("accept", "*/*");
            connection.setRequestProperty("connection", "Keep-Alive");
            connection.setRequestProperty("user-agent",
                    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
            // Establish the actual connection
            connection.connect();
            // Fetch all response header fields
            Map<String, List<String>> map = connection.getHeaderFields();
            // Print every response header field
            for (String key : map.keySet()) {
                System.out.println(key + "--->" + map.get(key));
            }
            // Read the response body through a BufferedReader
            in = new BufferedReader(new InputStreamReader(
                    connection.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                result += line;
            }
        } catch (Exception e) {
            // The message below means "Exception while sending GET request!"
            System.out.println("发送GET请求出现异常!" + e);
            e.printStackTrace();
        }
        // Close the input stream in a finally block
        finally {
            try {
                if (in != null) {
                    in.close();
                }
            } catch (Exception e2) {
                e2.printStackTrace();
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // String query = "query=" + URLEncoder.encode("select * where { ?s ?p ?o .} LIMIT 100", "UTF-8");
        String query = "query=" + URLEncoder.encode("select ?p ?o where { <http://dbpedia.org/resource/Alabama> ?p ?o .}", "UTF-8");
        // sendGet is static, so no HttpTest instance is needed
        System.out.println(sendGet("http://dbpedia.org/sparql", query));
        // System.out.println(URLEncoder.encode("query= select distinct ?Concept where {[] a ?Concept} LIMIT 100", "UTF-8"));
    }
}
The exception thrown by the program:
发送GET请求出现异常!java.net.SocketException: Software caused connection abort: recv failed
java.net.SocketException: Software caused connection abort: recv failed
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1890)
    at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1885)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1884)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1457)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
    at name.dxliu.test.HttpTest.sendGet(HttpTest.java:39)
    at name.dxliu.test.HttpTest.main(HttpTest.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.net.SocketException: Software caused connection abort: recv failed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:188)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:675)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
    at sun.net.www.protocol.http.HttpURLConnection.getHeaderFields(HttpURLConnection.java:2966)
    at name.dxliu.test.HttpTest.sendGet(HttpTest.java:32)
    ... 6 more
Process finished with exit code 0
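To see what such a GET request actually carries, the failing query and the sample query can be turned into request URLs without contacting the endpoint. This is only a minimal sketch, assuming the same endpoint and the same URLEncoder-based encoding as the program in this section; the class name SparqlUrlDemo and the helper buildUrl are illustrative names, not part of the original program.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SparqlUrlDemo {
    // Hypothetical helper: endpoint + "?query=" + URL-encoded SPARQL text.
    static String buildUrl(String endpoint, String sparql) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String endpoint = "http://dbpedia.org/sparql";
        String failing = "select ?p ?o where { <http://dbpedia.org/resource/Alabama> ?p ?o .}";
        String sample = "select * where { ?s ?p ?o .} LIMIT 100";
        // Only prints the URLs; no network access happens here.
        System.out.println(buildUrl(endpoint, failing));
        System.out.println(buildUrl(endpoint, sample));
    }
}
```

In the URL printed for the failing query, the resource IRI appears percent-encoded as http%3A%2F%2Fdbpedia.org%2Fresource%2FAlabama, while the URL for the sample query contains no DBpedia IRI in its query string.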
2. Solution
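After asking my labmates without success, I contacted DBpedia Support, but the problem remained unsolved. Later a labmate pointed out that the Great Firewall (GFW) might be responsible, so I packaged the program and ran it on Google Cloud Shell. (I won't cover how to use Google Cloud Shell here; roughly, you need to register an account on GAE and create a GAE application before you can use it. A possibly useful link is https://console.cloud.google.com/.) There the failing query executed normally, so I concluded that the firewall was to blame.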
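I then posted this conclusion in the DBpedia Support community, and a community member suggested testing whether a particular entity in the query triggered the firewall block or whether the DBpedia namespace itself was blocked. So I changed the query to the following two:
"select ?p ?o where { <http://dbpedia.org/resource/China> ?p ?o .}"
and
"select ?s where { ?s a <http://www.w3.org/2002/07/owl#Thing> } LIMIT 100"
The first query replaces 'Alabama' with 'China'; the second does not contain the DBpedia namespace `http://dbpedia.org/resource/` at all.
The first query still threw the Connection Reset exception, while the second executed normally. So the GFW apparently blocks the entire DBpedia namespace rather than individual entities.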
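The comparison above can be repeated as a small probe that sends each query and reports whether the connection survives. This is only a sketch under assumptions: the class NamespaceProbe and the helpers mentionsNamespace, buildUrl and probe are hypothetical names introduced for illustration, the timeouts are arbitrary, and the outcome depends entirely on the network the program runs on.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.SocketException;
import java.net.URL;
import java.net.URLEncoder;

class NamespaceProbe {
    static final String DBPEDIA_NS = "http://dbpedia.org/resource/";

    // True if the SPARQL text mentions the DBpedia resource namespace.
    static boolean mentionsNamespace(String sparql) {
        return sparql.contains(DBPEDIA_NS);
    }

    // Builds the GET URL: endpoint + "?query=" + URL-encoded SPARQL text.
    static String buildUrl(String endpoint, String sparql) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Sends one GET request and summarises the outcome as a short string.
    static String probe(String endpoint, String sparql) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(buildUrl(endpoint, sparql)).openConnection();
            conn.setConnectTimeout(10_000); // arbitrary values chosen for the sketch
            conn.setReadTimeout(30_000);
            conn.setRequestProperty("accept", "*/*");
            int status = conn.getResponseCode();
            conn.disconnect();
            return "HTTP " + status;
        } catch (SocketException e) {
            // Connection resets and aborts are reported as SocketException
            return "socket error: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e;
        }
    }

    public static void main(String[] args) {
        String endpoint = "http://dbpedia.org/sparql";
        String[] queries = {
            "select ?p ?o where { <http://dbpedia.org/resource/Alabama> ?p ?o .}",
            "select ?p ?o where { <http://dbpedia.org/resource/China> ?p ?o .}",
            "select ?s where { ?s a <http://www.w3.org/2002/07/owl#Thing> } LIMIT 100"
        };
        for (String q : queries) {
            System.out.println("namespace=" + mentionsNamespace(q)
                    + "\t" + probe(endpoint, q) + "\t" + q);
        }
    }
}
```

If the namespace is what triggers the block, the lines with namespace=true should show a socket error on an affected network and an HTTP status elsewhere, which matches what I observed locally versus on Google Cloud Shell.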
Reference: the question I posted on DBpedia Support: https://dbpedia.atlassian.net/wiki/questions/7441078/problem-occurred-during-making-http-request-to-dbpedia-endpoint