First, sending a POST request to a web site takes only a few simple steps.
Note that no third-party packages need to be imported here.
- package com.test;
- import java.io.BufferedReader;
- import java.io.IOException;
- import java.io.InputStream;
- import java.io.InputStreamReader;
- import java.io.OutputStreamWriter;
- import java.net.URL;
- import java.net.URLConnection;
- public class TestPost {
- public static void testPost() throws IOException {
- /**
- * First, obtain a URLConnection, which is easy to get from a URL
- * (using java.net.URL and java.net.URLConnection).
- */
- URL url = new URL("http://www.faircanton.com/message/check.asp");
- URLConnection connection = url.openConnection();
- /**
- * Next, put the connection into output mode. A URLConnection is usually used for input,
- * for example to download a web page. Enabling output lets you send data to the page:
- */
- connection.setDoOutput(true);
- /**
- * Finally, get the OutputStream, wrap it in a Writer for convenience,
- * and write the POST data into it, for example: ...
- */
- OutputStreamWriter out = new OutputStreamWriter(connection
- .getOutputStream(), "8859_1");
- out.write("username=kevin&password=*********"); // this is the body of the POST, the key step
- // remember to clean up
- out.flush();
- out.close();
- /**
- * This sends a POST request that looks something like this:
- * POST /jobsearch/jobsearch.cgi HTTP/1.0
- * Accept: text/plain
- * Content-Type: application/x-www-form-urlencoded
- * Content-Length: 99
- *
- * username=bob&password=someword
- */
- // Once the request has been sent, the server's response can be read as follows:
- String sCurrentLine;
- String sTotalString;
- sCurrentLine = "";
- sTotalString = "";
- InputStream l_urlStream;
- l_urlStream = connection.getInputStream();
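- // the classic three layers of wrapping: InputStream -> InputStreamReader -> BufferedReader
- BufferedReader l_reader = new BufferedReader(new InputStreamReader(
- l_urlStream));
- while ((sCurrentLine = l_reader.readLine()) != null) {
- sTotalString += sCurrentLine + "\r\n"; // backslashes, not "/r/n", for a real line break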
- }
- System.out.println(sTotalString);
- }
- public static void main(String[] args) throws IOException {
- testPost();
- }
- }
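Output of the run (it really does return the HTML produced after the login check; this page reports that the login failed because the account has been temporarily frozen):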
- <html>
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
- <title>账户已经冻结</title>
- <style type="text/css">
- <!--
- .temp {
- font-family: Arial, Helvetica, sans-serif;
- font-size: 14px;
- font-weight: bold;
- color: #666666;
- margin: 10px;
- padding: 10px;
- border: 1px solid #999999;
- }
- .STYLE1 {color: #FF0000}
- -->
- </style>
- </head>
- <body>
- <p> </p>
- <p> </p>
- <p> </p>
- <table width="700" border="0" align="center" cellpadding="0" cellspacing="0" class="temp">
- <tr>
- <td width="135" height="192"><div align="center"><img src="images/err.jpg" width="54" height="58"></div></td>
- <td width="563"><p><span class="STYLE1">登录失败</span><br>
- <br>
- 您的帐户活跃指数低于系统限制,您的帐户已经被暂时冻结。<br>
- 请您联系网络主管或者人事主管重新激活您的帐户。</p>
- </td>
- </tr>
- </table>
- <p> </p>
- </body>
- </html>
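Some web sites use POST rather than GET because POST can carry more data and keeps that data out of the URL, so the URL stays short. With roughly the code listed above, a Java program can talk to such sites easily.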
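The response-reading loop in the listing above can be factored into a small helper. What follows is only a sketch under a few assumptions: the class name `ResponseReader` and method `readAll` are hypothetical, the caller is assumed to know the charset of the page, and an in-memory stream stands in for `connection.getInputStream()` so that the example runs without a network. It uses a `StringBuilder` instead of repeated `String` concatenation and passes the charset explicitly rather than relying on the platform default.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original listing.
class ResponseReader {

    /** Reads the whole stream as text, decoding it with the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream) when done
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append("\r\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream(), so no network access is needed.
        byte[] fake = "<html>\n<body>ok</body>\n</html>".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(fake), StandardCharsets.UTF_8));
    }
}
```

With a real connection the call would look like `readAll(connection.getInputStream(), Charset.forName("GBK"))`, assuming the server really sends GBK.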
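Once you have the HTML, analyzing its content is comparatively easy. This is where htmlparser comes in. Below is a simple example program; I won't explain it at length, since the code should speak for itself.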
- package com.test;
- import org.htmlparser.Node;
- import org.htmlparser.NodeFilter;
- import org.htmlparser.Parser;
- import org.htmlparser.filters.TagNameFilter;
- import org.htmlparser.tags.TableTag;
- import org.htmlparser.util.NodeList;
- /**
- * Title: an example of extracting the plain text of a web page with htmlparser
- */
- public class TestHTMLParser {
- public static void testHtml() {
- try {
- String sCurrentLine;
- String sTotalString;
- sCurrentLine = "";
- sTotalString = "";
- java.io.InputStream l_urlStream;
- java.net.URL l_url = new java.net.URL("http://www.ideagrace.com/html/doc/2006/07/04/00929.html");
- java.net.HttpURLConnection l_connection = (java.net.HttpURLConnection) l_url.openConnection();
- l_connection.connect();
- l_urlStream = l_connection.getInputStream();
- java.io.BufferedReader l_reader = new java.io.BufferedReader(new java.io.InputStreamReader(l_urlStream));
- while ((sCurrentLine = l_reader.readLine()) != null) {
- sTotalString += sCurrentLine+"\r\n";
- // System.out.println(sTotalString);
- }
- String testText = extractText(sTotalString);
- System.out.println( testText );
- } catch (Exception e) {
- e.printStackTrace();
- }
- }
- public static String extractText(String inputHtml) throws Exception {
- StringBuffer text = new StringBuffer();
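- // inputHtml is already a decoded String, so it is passed to the parser as-is;
- // re-encoding it through getBytes() would corrupt text on non-GBK platforms
- Parser parser = Parser.createParser(inputHtml, "GBK");
- // visit every node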
- NodeList nodes = parser.extractAllNodesThatMatch(new NodeFilter() {
- public boolean accept(Node node) {
- return true;
- }
- });
- System.out.println(nodes.size()); // print the number of nodes
- for (int i=0;i<nodes.size();i++){
- Node nodet = nodes.elementAt(i);
- //System.out.println(nodet.getText());
- text.append(nodet.toPlainTextString()).append("\r\n");
- }
- return text.toString();
- }
- public static void test5(String resource) throws Exception {
- Parser myParser = new Parser(resource);
- myParser.setEncoding("GBK");
- String filterStr = "table";
- NodeFilter filter = new TagNameFilter(filterStr);
- NodeList nodeList = myParser.extractAllNodesThatMatch(filter);
- TableTag tabletag = (TableTag) nodeList.elementAt(11);
- }
- public static void main(String[] args) throws Exception {
- // test5("http://www.ggdig.com");
- testHtml();
- }
- }
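The most common HTTP requests are GET and POST. A GET request can fetch a static page, and it can also pass parameters to a servlet by appending them to the URL string. POST differs from GET in that its parameters are not placed in the URL string but in the body of the HTTP request.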
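To make the difference concrete, here is a minimal sketch of where the parameters end up in each case. The class `FormEncoding`, its method names, the URL `http://example.com/servlet`, and the parameter values are all assumptions made for illustration. The same encoded string is appended to the URL for GET and written to the request body for POST.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
class FormEncoding {

    /** Encodes parameters as application/x-www-form-urlencoded text. */
    static String encode(Map<String, String> params) throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&'); // separator between pairs only
            }
            sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    /** GET: the encoded parameters become the query string of the URL. */
    static String getUrl(String baseUrl, Map<String, String> params)
            throws UnsupportedEncodingException {
        return params.isEmpty() ? baseUrl : baseUrl + "?" + encode(params);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("firstname", "big fat man"); // hypothetical value
        params.put("page", "1");

        // GET: parameters travel in the URL
        System.out.println(getUrl("http://example.com/servlet", params));
        // prints http://example.com/servlet?firstname=big+fat+man&page=1

        // POST: the URL stays bare and this string is written to the request body
        System.out.println(encode(params));
        // prints firstname=big+fat+man&page=1
    }
}
```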
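In Java, both kinds of request can be issued with HttpURLConnection. Understanding this class helps a great deal when learning SOAP and when writing automated test code for servlets.
The code for this section shows, in brief, how to issue both kinds of request with HttpURLConnection and how to pass parameters:
The readContentFromPost() function issues a POST request that passes the servlet a firstname parameter whose value is "一个大肥人".
HttpURLConnection.connect() only establishes a TCP connection to the server; it does not send the HTTP request itself. For both POST and GET, with the default buffered output, the request is not actually sent until HttpURLConnection.getInputStream() (or another method that reads the response, such as getResponseCode()) is called.
In readContentFromPost(), ordering is crucial. All configuration of the connection object (the batch of set methods) must be completed before connect() runs, and writes to the output stream must come before reads from the input stream. This ordering is in fact dictated by the format of an HTTP request.
An HTTP request consists of two parts: the header, which defines all the settings for the request, and the body (content). connect() generates the HTTP header from the HttpURLConnection object's settings, so every setting must be in place before connect() is called.
The body follows immediately after the header and is written through the output stream. By default this output stream is not really a network stream; it is closer to an in-memory buffer. What is written to it is not sent to the network immediately; instead, the HTTP body is generated from the buffered content once the stream is closed.
At this point everything for the HTTP request is ready. When getInputStream() is called, the prepared request is sent to the server and an input stream is returned for reading the server's response. Because the request (header and body) has already been sent by then, configuring the connection object (changing the header) or writing to the output stream (changing the body) after getInputStream() is pointless, and doing so may even throw an exception.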
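The ordering described above (configure, write the body, then read the response) can be sketched as follows. This is not the original readContentFromPost(); the class `PostOrderDemo`, the `/echo` path, and the body `firstname=test` are assumptions for illustration. To keep the example self-contained, it starts a throwaway local server with the JDK's built-in `com.sun.net.httpserver.HttpServer` that simply echoes the request body back.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names and the echo endpoint are illustrative only.
class PostOrderDemo {

    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    static String post(String urlString, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(urlString).openConnection();
        // 1. Configure the connection first: these settings become the request header.
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        // 2. Write the body; with default settings it is buffered, not yet sent.
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        // 3. Ask for the response last; this is the step that sends the request.
        try (InputStream in = conn.getInputStream()) {
            return new String(readFully(in), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** Starts a local echo server on a free port, posts to it, and returns the reply. */
    static String demo() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/echo", exchange -> {
            byte[] received = readFully(exchange.getRequestBody());
            exchange.sendResponseHeaders(200, received.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(received);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            return post("http://127.0.0.1:" + port + "/echo", "firstname=test");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints firstname=test
    }
}
```

Moving a call such as `setRequestProperty` after step 3 in this sketch throws an IllegalStateException, because the connection has already been used by then.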
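The JDK provides some support for requests over a stateless protocol (HTTP). Below I describe a small example (component) that I wrote.
First, let's build a request class (HttpRequester).
This class wraps the Java code needed to make simple requests, as follows: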
- import java.io.BufferedReader;
- import java.io.IOException;
- import java.io.InputStream;
- import java.io.InputStreamReader;
- import java.net.HttpURLConnection;
- import java.net.URL;
- import java.nio.charset.Charset;
- import java.util.Map;
- import java.util.Vector;
- /**
- * HTTP request object
- *
- * @author YYmmiinngg
- */
- public class HttpRequester {
- private String defaultContentEncoding;
- public HttpRequester() {
- this.defaultContentEncoding = Charset.defaultCharset().name();
- }
- /**
- * Sends a GET request.
- *
- * @param urlString
- * the URL to request
- * @return the response object
- * @throws IOException
- */
- public HttpRespons sendGet(String urlString) throws IOException {
- return this.send(urlString, "GET", null, null);
- }
- /**
- * Sends a GET request.
- *
- * @param urlString
- * the URL to request
- * @param params
- * the request parameters
- * @return the response object
- * @throws IOException
- */
- public HttpRespons sendGet(String urlString, Map<String, String> params)
- throws IOException {
- return this.send(urlString, "GET", params, null);
- }
- /**
- * Sends a GET request.
- *
- * @param urlString
- * the URL to request
- * @param params
- * the request parameters
- * @param propertys
- * the request properties (headers)
- * @return the response object
- * @throws IOException
- */
- public HttpRespons sendGet(String urlString, Map<String, String> params,
- Map<String, String> propertys) throws IOException {
- return this.send(urlString, "GET", params, propertys);
- }
- /**
- * Sends a POST request.
- *
- * @param urlString
- * the URL to request
- * @return the response object
- * @throws IOException
- */
- public HttpRespons sendPost(String urlString) throws IOException {
- return this.send(urlString, "POST", null, null);
- }
- /**
- * Sends a POST request.
- *
- * @param urlString
- * the URL to request
- * @param params
- * the request parameters
- * @return the response object
- * @throws IOException
- */
- public HttpRespons sendPost(String urlString, Map<String, String> params)
- throws IOException {
- return this.send(urlString, "POST", params, null);
- }
- /**
- * Sends a POST request.
- *
- * @param urlString
- * the URL to request
- * @param params
- * the request parameters
- * @param propertys
- * the request properties (headers)
- * @return the response object
- * @throws IOException
- */
- public HttpRespons sendPost(String urlString, Map<String, String> params,
- Map<String, String> propertys) throws IOException {
- return this.send(urlString, "POST", params, propertys);
- }
- /**
- * Sends an HTTP request.
- *
- * @param urlString
- * @return the response object
- * @throws IOException
- */
- private HttpRespons send(String urlString, String method,
- Map<String, String> parameters, Map<String, String> propertys)
- throws IOException {
- HttpURLConnection urlConnection = null;
- if (method.equalsIgnoreCase("GET") && parameters != null) {
- StringBuffer param = new StringBuffer();
- int i = 0;
- for (String key : parameters.keySet()) {
- if (i == 0)
- param.append("?");
- else
- param.append("&");
- param.append(key).append("=").append(parameters.get(key));
- i++;
- }
- urlString += param;
- }
- URL url = new URL(urlString);
- urlConnection = (HttpURLConnection) url.openConnection();
- urlConnection.setRequestMethod(method);
- urlConnection.setDoOutput(true);
- urlConnection.setDoInput(true);
- urlConnection.setUseCaches(false);
- if (propertys != null)
- for (String key : propertys.keySet()) {
- urlConnection.addRequestProperty(key, propertys.get(key));
- }
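- if (method.equalsIgnoreCase("POST") && parameters != null) {
- // NOTE: keys and values are written as-is; URL-encode them first if they may
- // contain characters such as '&', '=' or non-ASCII text.
- StringBuffer param = new StringBuffer();
- for (String key : parameters.keySet()) {
- if (param.length() > 0)
- param.append("&"); // separator only between pairs, not before the first one
- param.append(key).append("=").append(parameters.get(key));
- }
- // fetch the stream once, then write, flush and close it
- java.io.OutputStream out = urlConnection.getOutputStream();
- out.write(param.toString().getBytes());
- out.flush();
- out.close();
- }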
- return this.makeContent(urlString, urlConnection);
- }
- /**
- * Builds the response object.
- *
- * @param urlConnection
- * @return the response object
- * @throws IOException
- */
- private HttpRespons makeContent(String urlString,
- HttpURLConnection urlConnection) throws IOException {
- HttpRespons httpResponser = new HttpRespons();
- try {
- InputStream in = urlConnection.getInputStream();
- BufferedReader bufferedReader = new BufferedReader(
- new InputStreamReader(in));
- httpResponser.contentCollection = new Vector<String>();
- StringBuffer temp = new StringBuffer();
- String line = bufferedReader.readLine();
- while (line != null) {
- httpResponser.contentCollection.add(line);
- temp.append(line).append("\r\n");
- line = bufferedReader.readLine();
- }
- bufferedReader.close();
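- // NOTE: getContentEncoding() returns the Content-Encoding header (e.g. "gzip"), not the
- // character set; the charset, if any, is the "charset" parameter of Content-Type.
- String ecod = urlConnection.getContentEncoding();
- if (ecod == null)
- ecod = this.defaultContentEncoding;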
- httpResponser.urlString = urlString;
- httpResponser.defaultPort = urlConnection.getURL().getDefaultPort();
- httpResponser.file = urlConnection.getURL().getFile();
- httpResponser.host = urlConnection.getURL().getHost();
- httpResponser.path = urlConnection.getURL().getPath();
- httpResponser.port = urlConnection.getURL().getPort();
- httpResponser.protocol = urlConnection.getURL().getProtocol();
- httpResponser.query = urlConnection.getURL().getQuery();
- httpResponser.ref = urlConnection.getURL().getRef();
- httpResponser.userInfo = urlConnection.getURL().getUserInfo();
- httpResponser.content = new String(temp.toString().getBytes(), ecod);
- httpResponser.contentEncoding = ecod;
- httpResponser.code = urlConnection.getResponseCode();
- httpResponser.message = urlConnection.getResponseMessage();
- httpResponser.contentType = urlConnection.getContentType();
- httpResponser.method = urlConnection.getRequestMethod();
- httpResponser.connectTimeout = urlConnection.getConnectTimeout();
- httpResponser.readTimeout = urlConnection.getReadTimeout();
- return httpResponser;
- } catch (IOException e) {
- throw e;
- } finally {
- if (urlConnection != null)
- urlConnection.disconnect();
- }
- }
- /**
- * Returns the default charset used for responses.
- */
- public String getDefaultContentEncoding() {
- return this.defaultContentEncoding;
- }
- /**
- * Sets the default charset used for responses.
- */
- public void setDefaultContentEncoding(String defaultContentEncoding) {
- this.defaultContentEncoding = defaultContentEncoding;
- }
- }
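One detail in the listing above deserves care: `URLConnection.getContentEncoding()` returns the `Content-Encoding` header, which names a compression scheme such as gzip, whereas the character set normally arrives as the `charset` parameter of `Content-Type`. A minimal sketch of extracting it is shown below. The class `ContentTypeCharset` and method `charsetOf` are hypothetical names, and the parsing is deliberately simple rather than a full implementation of the header grammar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original HttpRequester.
class ContentTypeCharset {

    /**
     * Returns the charset named in a Content-Type header value,
     * or the fallback if the header is missing or names no usable charset.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // case-insensitive check for a "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("text/html", StandardCharsets.UTF_8));
    }
}
```

In makeContent() the result could be passed to `new InputStreamReader(in, charset)`, assuming the value of `urlConnection.getContentType()` is handed to `charsetOf`, so that the bytes are decoded once with the right charset.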
Next, let's look at the response object (HttpRespons). It is really just a data bean that holds the result data of a request's response, as follows:
- import java.util.Vector;
- /**
- * Response object
- */
- public class HttpRespons {
- String urlString;
- int defaultPort;
- String file;
- String host;
- String path;
- int port;
- String protocol;
- String query;
- String ref;
- String userInfo;
- String contentEncoding;
- String content;
- String contentType;
- int code;
- String message;
- String method;
- int connectTimeout;
- int readTimeout;
- Vector<String> contentCollection;
- public String getContent() {
- return content;
- }
- public String getContentType() {
- return contentType;
- }
- public int getCode() {
- return code;
- }
- public String getMessage() {
- return message;
- }
- public Vector<String> getContentCollection() {
- return contentCollection;
- }
- public String getContentEncoding() {
- return contentEncoding;
- }
- public String getMethod() {
- return method;
- }
- public int getConnectTimeout() {
- return connectTimeout;
- }
- public int getReadTimeout() {
- return readTimeout;
- }
- public String getUrlString() {
- return urlString;
- }
- public int getDefaultPort() {
- return defaultPort;
- }
- public String getFile() {
- return file;
- }
- public String getHost() {
- return host;
- }
- public String getPath() {
- return path;
- }
- public int getPort() {
- return port;
- }
- public String getProtocol() {
- return protocol;
- }
- public String getQuery() {
- return query;
- }
- public String getRef() {
- return ref;
- }
- public String getUserInfo() {
- return userInfo;
- }
- }
Finally, let's write an application class to test whether the code above works:
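- // Assumes HttpRequester and HttpRespons are declared in package com.yao.http
- // (the listings above omit the package line).
- import com.yao.http.HttpRequester;
- import com.yao.http.HttpRespons;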
- public class Test {
- public static void main(String[] args) {
- try {
- HttpRequester request = new HttpRequester();
- HttpRespons hr = request.sendGet("http://www.csdn.net");
- System.out.println(hr.getUrlString());
- System.out.println(hr.getProtocol());
- System.out.println(hr.getHost());
- System.out.println(hr.getPort());
- System.out.println(hr.getContentEncoding());
- System.out.println(hr.getMethod());
- System.out.println(hr.getContent());
- } catch (Exception e) {
- e.printStackTrace();
- }
- }
- }