You have to set the correct encoding. You can find the encoding in the HTTP header:
Content-Type: text/html; charset=ISO-8859-1
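As a rough sketch of how that header value could be turned into a usable charset with nothing but the JDK: the class name `ContentTypeCharset`, the helper `charsetOf`, and the UTF-8 fallback passed in `main` are all made up for illustration. It also only handles the simple `type; charset=name` form, not every parameter syntax the HTTP specification allows.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {

    // Hypothetical helper: returns the charset named in a Content-Type
    // header value, or the given fallback if none is present or usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type (e.g. "text/html"); parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset cs = charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        byte[] body = {(byte) 0xFC}; // the single byte for "ü" in ISO-8859-1
        // Decode the raw bytes with the charset announced by the server.
        System.out.println(new String(body, cs));
    }
}
```

Decoding the same byte as UTF-8 would give a replacement character instead, which is the kind of garbling you get when the encoding is wrong.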
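I can imagine there are many other issues you would have to handle to parse web pages without errors. However, there are several HTTP client libraries available for Java, such as org.apache.httpcomponents. The code would look like this: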
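import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

DefaultHttpClient httpclient = new DefaultHttpClient();
HttpGet httpGet = new HttpGet("http://www.spiegel.de");
try
{
    HttpResponse response = httpclient.execute(httpGet);
    HttpEntity entity = response.getEntity();
    if (entity != null)
    {
        // Decodes the body using the charset from the Content-Type header.
        System.out.println(EntityUtils.toString(entity));
    }
}
catch (ClientProtocolException e) { e.printStackTrace(); }
catch (IOException e) { e.printStackTrace(); }
finally
{
    httpclient.getConnectionManager().shutdown(); // release connections
}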
Here is the Maven artifact:
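<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.1.1</version>
    <type>jar</type>
    <scope>compile</scope>
</dependency>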