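Some sohu pages are returned gzip-compressed, so after receiving the response you need to check the Content-Encoding header first and decompress the body if necessary before parsing it. The following example fetches the sohu home page:

import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;

import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

public class SohuGzipExample {
    public static void main(String[] args) {
        HttpClient client = new HttpClient();
        GetMethod method = new GetMethod("http://www.sohu.com");
        try {
            int statusCode = client.executeMethod(method);
            // Look the header up once instead of calling getResponseHeader three times.
            Header encoding = method.getResponseHeader("Content-Encoding");
            if (encoding != null && encoding.getValue() != null
                    && encoding.getValue().toLowerCase().indexOf("gzip") != -1) {
                // The body is gzip-compressed: decompress it before decoding the text.
                GZIPInputStream gin = new GZIPInputStream(method.getResponseBodyAsStream());
                ByteArrayOutputStream os = new ByteArrayOutputStream();
                byte[] bs = new byte[1024];
                int len = -1;
                while ((len = gin.read(bs)) != -1) {
                    os.write(bs, 0, len);
                }
                gin.close();
                // Decode the decompressed bytes as gb2312, the charset this example uses for the page.
                System.out.println(os.toString("gb2312"));
            } else {
                // Not compressed: HttpClient can return the body as a string directly.
                System.out.println(method.getResponseBodyAsString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // Always release the connection, whether or not the request succeeded.
                method.releaseConnection();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

Below are the status line and response headers the server returned when the sohu home page was requested:

HTTP/1.1 200 OK
Content-Type: text/html
Connection: keep-alive
Date: Sat, 04 Dec 2010 15:22:41 GMT
Server: Apache
Vary: Accept-Encoding,X-Up-Calling-Line-id,X-Source-ID,X-Up-Bearer-Type
Cache-Control: max-age=105
Expires: Sat, 04 Dec 2010 15:24:26 GMT
Last-Modified: Sat, 04 Dec 2010 15:01:20 GMT
Content-Encoding: gzip
Content-Length: 64524
FSS-Cache: HIT from 10231717.19079087.10957868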
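
The Content-Encoding check described above can be tried without a network connection. The following is a minimal sketch using only JDK classes, not part of the original example: the class name GzipBodyDecoder and the helpers isGzip, decode, and gzip are hypothetical names introduced for illustration, the body is an in-memory byte array rather than a live HttpClient response, and the demo uses UTF-8 instead of gb2312.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBodyDecoder {

    /** Returns true when the Content-Encoding header value mentions gzip (hypothetical helper). */
    static boolean isGzip(String contentEncoding) {
        return contentEncoding != null
                && contentEncoding.toLowerCase(Locale.ROOT).contains("gzip");
    }

    /** Decompresses the body only if the header says gzip, then decodes it with the given charset. */
    static String decode(String contentEncoding, byte[] body, Charset charset) {
        try (InputStream raw = new ByteArrayInputStream(body);
             InputStream in = isGzip(contentEncoding) ? new GZIPInputStream(raw) : raw) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
            return new String(out.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses bytes with gzip; used here only to simulate a compressed response body. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "hello sohu".getBytes(StandardCharsets.UTF_8);
        // Simulated compressed response: header says gzip, body is gzip data.
        System.out.println(decode("gzip", gzip(plain), StandardCharsets.UTF_8));
        // Simulated uncompressed response: no Content-Encoding header at all.
        System.out.println(decode(null, plain, StandardCharsets.UTF_8));
    }
}
```

With a real response, the header value and the body stream would come from the HTTP client, and the charset would be whatever the page actually uses (gb2312 in the example above).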