Blocking when using BufferedReader to read HttpURLConnection.getInputStream()

想改名的小雄鹿

Article link: https://blog.csdn.net/u013300579/article/details/79084180

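While trying to access Baidu Baike through HttpURLConnection and process the response, the author ran into BufferedReader's readLine() blocking because of network problems. The article discusses the unknown data length, the uncertain network stability, and a failed attempt to solve the problem by setting timeouts, and shares the successful experience of using HttpClient as an alternative. The author also plans to use HtmlUnit to parse dynamically rendered pages.

Summary generated automatically by CSDN.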

Background:

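I have a list of terms and want to check whether each one has an entry on Baidu Baike. This means requesting specific URLs that contain Chinese characters. (As an aside: because the URL contains Chinese, requesting it directly produces garbled text, so the Chinese part has to be URL-encoded.) Baidu Baike also redirects entries heavily (301, 302, and so on), so redirects have to be handled as well. (That is not the focus of this post, so I skip it.) I wrap the returned input stream in a BufferedReader, but readLine() is a blocking method. Because of network problems, readLine() may never receive a line terminator or the end of the stream, and then it blocks.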
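The URL-encoding step mentioned in passing can be sketched as follows. This is only an illustration: `BaikeUrlBuilder` and `buildSearchUrl` are hypothetical names that do not appear in the original code, and the base URL is simply the one used in the example in this post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class BaikeUrlBuilder {
    // Base search URL copied from the example in this post.
    private static final String BASE = "https://baike.baidu.com/search/word?word=";

    /** Percent-encodes the term as UTF-8 and appends it to the base URL. */
    static String buildSearchUrl(String term) throws UnsupportedEncodingException {
        // URLEncoder performs form encoding (a space becomes '+'),
        // which is acceptable for a query-string value like this one.
        return BASE + URLEncoder.encode(term, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // The two Chinese characters of "Baidu", written as Unicode escapes
        // so the example does not depend on the source file encoding.
        System.out.println(buildSearchUrl("\u767e\u5ea6"));
        // prints https://baike.baidu.com/search/word?word=%E7%99%BE%E5%BA%A6
    }
}
```

Only the query value is encoded here; encoding the whole URL would also escape the `:` and `/` characters and break it.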

For example, code like the following can block in readLine():

		// Needs: java.io.BufferedReader, java.io.InputStreamReader, java.net.HttpURLConnection,
		// java.net.URL, java.nio.charset.StandardCharsets
		URL url = new URL("https://baike.baidu.com/search/word?word=Lamy");
		HttpURLConnection httpUrlConn = (HttpURLConnection) url.openConnection();
		httpUrlConn.connect();
		StringBuilder sb = new StringBuilder();
		// try-with-resources closes the reader and the streams it wraps, even if an exception is thrown
		try (BufferedReader bufferedReader = new BufferedReader(
				new InputStreamReader(httpUrlConn.getInputStream(), StandardCharsets.UTF_8))) {
			String str;
			// readLine() is a blocking method: if no line terminator (or end of stream) arrives,
			// it keeps blocking and the program stays on the while line
			while ((str = bufferedReader.readLine()) != null) {
				sb.append(str); // note: readLine() strips the line terminator
			}
		} finally {
			httpUrlConn.disconnect();
		}
		String res = sb.toString();

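Many people online suggest removing the while loop and reading only once, or having the server close the socket so that the client's BufferedReader is notified. Neither fits my current needs.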

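First, the length of the data returned by Baidu Baike is unknown, so a single readLine() call cannot do the job. (Most of the advice online is aimed at people running their own server, where client and server have agreed on the data length in advance.)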

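Second, as for closing the socket: because the network is unstable, I cannot tell whether the connection has dropped or the transfer is just slow.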

Third, the commonly suggested fix of setting timeouts, for example:

httpUrlConn.setConnectTimeout(300); // connect timeout, in milliseconds

httpUrlConn.setReadTimeout(100); // once connected, timeout while waiting for data to arrive, in milliseconds

did not help in my case: the blocking occurred after part of the data had already been received, so ReadTimeout had no effect.
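One way to bound the whole read, independently of how the read timeout behaves, is to run the read loop on a worker thread and wait for it with an overall deadline. The following is a minimal sketch under that assumption, not the approach taken in this post; `DeadlineReader` and `readAll` are hypothetical names that exist neither in the JDK nor in the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class DeadlineReader {
    /**
     * Reads the whole stream as UTF-8 text, giving up after timeoutMillis in total.
     * Hypothetical helper for illustration only.
     */
    static String readAll(InputStream in, long timeoutMillis)
            throws IOException, TimeoutException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "deadline-reader");
            t.setDaemon(true); // a read that stays stuck must not keep the JVM alive
            return t;
        });
        Future<String> task = executor.submit(() -> {
            StringBuilder sb = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append('\n'); // restore the terminator readLine() strips
                }
            }
            return sb.toString();
        });
        try {
            // Wait for the worker, but never longer than the overall deadline.
            return task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker; the caller should also abort the connection,
            // because an interrupt alone may not unblock a socket read.
            task.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would pass `httpUrlConn.getInputStream()` and, on `TimeoutException`, call `httpUrlConn.disconnect()` to release the connection. Whether that promptly unblocks the worker thread depends on the implementation, which is why the worker is created as a daemon thread.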

void java.net.URLConnection.setReadTimeout(int timeout)


Sets the read timeout to a specified timeout, in milliseconds. A non-zero value specifies the timeout when reading from Input stream when a connection is established to a resource. If the timeout expires before there is data available for read, a java.net.SocketTimeoutException is raised. A timeout of zero is interpreted as an infinite timeout. 

Some non-standard implementation of this method ignores the specified timeout. To see the read timeout set, please call getReadTimeout().
Parameters: timeout - an int that specifies the timeout value to be used in milliseconds
Throws: IllegalArgumentException -