HTTP Programming

掉头发的胡程序员

Article link: https://blog.csdn.net/m0_46502752/article/details/125857921

HTTP Protocol Overview

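What is HTTP? HTTP is the foundational protocol behind most web applications today: a browser loading a website and a mobile app calling its backend server both do so over HTTP. HTTP stands for HyperText Transfer Protocol. It is a request-response protocol built on top of TCP.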

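Let's look at the HTTP request and response exchanged when a browser visits a website. The browser first opens a TCP connection to the web server, which by default listens on port 80 for plain HTTP and on port 443 for encrypted HTTPS. The browser then sends an HTTP request; the server receives it and returns an HTTP response containing the HTML of the page, which the browser parses and renders for the user. A complete HTTP request-response exchange looks like this: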

            GET / HTTP/1.1
            Host: www.sina.com.cn
            User-Agent: Mozilla/5 MSIE
            Accept: */*                ┌────────┐
┌─────────┐ Accept-Language: zh-CN,en  │░░░░░░░░│
│O ░░░░░░░│───────────────────────────>├────────┤
├─────────┤<───────────────────────────│░░░░░░░░│
│         │ HTTP/1.1 200 OK            ├────────┤
│         │ Content-Type: text/html    │░░░░░░░░│
└─────────┘ Content-Length: 133251     └────────┘
  Browser   <!DOCTYPE html>              Server
            <html><body>
            <h1>Hello</h1>
            ...

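The format of an HTTP request is fixed: it consists of an HTTP header section and an HTTP body. The first line is always the request method, the path, and the HTTP version, separated by spaces. For example, GET / HTTP/1.1 means the method is GET, the path is /, and the version is HTTP/1.1.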

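Every following line has the fixed form Header: Value; these lines are the HTTP headers. The server relies on certain headers to interpret the client's request, for example:

Host: the domain name being requested. One server may host several websites, so Host is needed to tell which site the request is for;

User-Agent: the client's self-identification. Different browsers send different values, and the server uses User-Agent to tell whether the client is IE or Chrome, Firefox or a Python crawler;

Accept: the response formats the client can handle. */* means any format, text/* means any text format, and image/png means a PNG image;

Accept-Language: the languages the client accepts, in order of preference. The server uses this field to return the page in a suitable language.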
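As a rough illustration of the request line and header format described above, the sketch below splits the head of a request into its method, path, version, and headers. The class name HttpRequestHead and its parse method are made up for this example, and it assumes well-formed input with CRLF line endings; a real server has to be far more defensive.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of any library.
class HttpRequestHead {
    final String method;
    final String path;
    final String version;
    // Header names are case-insensitive, so they are stored in lower case.
    final Map<String, String> headers = new LinkedHashMap<>();

    HttpRequestHead(String method, String path, String version) {
        this.method = method;
        this.path = path;
        this.version = version;
    }

    static HttpRequestHead parse(String raw) {
        String[] lines = raw.split("\r\n");
        // First line: "<method> <path> <version>"
        String[] parts = lines[0].split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + lines[0]);
        }
        HttpRequestHead head = new HttpRequestHead(parts[0], parts[1], parts[2]);
        // Remaining lines: "Header: Value", up to the first empty line
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Malformed header: " + lines[i]);
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = lines[i].substring(colon + 1).trim();
            head.headers.put(name, value);
        }
        return head;
    }
}
```

Calling HttpRequestHead.parse on the request shown in the diagram would give method GET, path /, version HTTP/1.1, and a headers map in which the key host maps to www.sina.com.cn.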

A GET request normally has only headers and no body. A POST request carries a body:

POST /login HTTP/1.1
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 30

username=hello&password=123456

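A POST request usually sets Content-Type to describe the type of the body and Content-Length to give the length of the body in bytes, so that the server can read the headers and body correctly and respond appropriately.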
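To make the relationship between the body and its headers concrete, here is a minimal sketch that builds a form-encoded POST request as text and computes Content-Length from the encoded body. FormRequestBuilder, encodeForm, and buildPost are names invented for this example; only URLEncoder and the other java.* classes are real APIs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only.
class FormRequestBuilder {
    // Encodes fields as key1=value1&key2=value2 with percent-encoding.
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> f : fields.entrySet()) {
                body.add(URLEncoder.encode(f.getKey(), "UTF-8")
                        + "=" + URLEncoder.encode(f.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }

    // Builds the full request text; Content-Length counts bytes, not characters.
    static String buildPost(String host, String path, Map<String, String> fields) {
        String body = encodeForm(fields);
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"
                + body;
    }
}
```

With the fields username=hello and password=123456 (in that order), the encoded body is username=hello&password=123456 and the computed Content-Length is 30, matching the form example above.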

The same kind of request with a JSON body:

POST /login HTTP/1.1
Content-Type: application/json
Content-Length: 38

{"username":"bob","password":"123456"}

An HTTP response also consists of headers and a body. A typical HTTP response looks like this:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 133251

<!DOCTYPE html>
<html><body>
<h1>hello</h1>
...

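The first line of a response is always the HTTP version, the response code, and a reason phrase. For example, HTTP/1.1 200 OK means the version is HTTP/1.1, the response code is 200, and the reason phrase is OK. Clients rely only on the response code to decide whether a response succeeded. HTTP response codes fall into fixed classes: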

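1xx: an informational response; for example, 101 means the protocol is being switched, which is common when establishing a WebSocket connection;

2xx: a successful response; for example, 200 means success and 206 means only part of the content was sent;

3xx: a redirect; for example, 301 is a permanent redirect and 303 tells the client to issue a new request to the specified location;

4xx: an error caused by the client; for example, 400 means the request was invalid (because of a bad Content-Type or many other reasons) and 404 means the requested path does not exist;

5xx: an error caused by the server; for example, 500 means an internal server failure and 503 means the server is temporarily unable to respond.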
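The five classes above depend only on the first digit of the code, which can be sketched as follows. StatusClass and its two methods are hypothetical names chosen for this example.

```java
// Hypothetical helper for illustration only.
class StatusClass {
    // Maps a response code to the class it belongs to, based on its first digit.
    static String describe(int code) {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default:
                throw new IllegalArgumentException("Not an HTTP status code: " + code);
        }
    }

    // A client deciding whether a request succeeded only needs the 2xx check.
    static boolean isSuccess(int code) {
        return code >= 200 && code < 300;
    }
}
```

For instance, StatusClass.describe(404) returns "client error" and StatusClass.isSuccess(206) returns true.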

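With the earliest HTTP/1.0 protocol, the client had to open a new TCP connection for every HTTP request and close it after receiving the server's response. Because establishing a TCP connection is relatively expensive, HTTP/1.1 allows many request-response exchanges to be sent over a single TCP connection, which greatly improves efficiency.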

                       ┌─────────┐
┌─────────┐            │░░░░░░░░░│
│O ░░░░░░░│            ├─────────┤
├─────────┤            │░░░░░░░░░│
│         │            ├─────────┤
│         │            │░░░░░░░░░│
└─────────┘            └─────────┘
     │      request 1       │
     │─────────────────────>│
     │      response 1      │
     │<─────────────────────│
     │      request 2       │
     │─────────────────────>│
     │      response 2      │
     │<─────────────────────│
     │      request 3       │
     │─────────────────────>│
     │      response 3      │
     │<─────────────────────│
     ▼                      ▼

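Because HTTP is a request-response protocol, a client on an HTTP/1.1 connection in practice waits for the server's response before sending the next request, so one slow response blocks every request behind it. To speed things up further, HTTP/2 lets the client send several requests without waiting for earlier responses, and the server need not return the responses in order. As long as both sides can tell which response belongs to which request, requests and responses can be sent and received in parallel.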

                       ┌─────────┐
┌─────────┐            │░░░░░░░░░│
│O ░░░░░░░│            ├─────────┤
├─────────┤            │░░░░░░░░░│
│         │            ├─────────┤
│         │            │░░░░░░░░░│
└─────────┘            └─────────┘
     │      request 1       │
     │─────────────────────>│
     │      request 2       │
     │─────────────────────>│
     │      response 1      │
     │<─────────────────────│
     │      request 3       │
     │─────────────────────>│
     │      response 3      │
     │<─────────────────────│
     │      response 2      │
     │<─────────────────────│
     ▼                      ▼
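The key idea in the diagram above is that each response carries an identifier that ties it back to its request (HTTP/2 uses stream identifiers for this). The toy model below shows only that matching step, using in-memory futures; it is not an HTTP/2 implementation, and ResponseMatcher with its send, onResponse, and responseFor methods is invented for this sketch.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Toy model for illustration only: matches out-of-order responses to requests by id.
class ResponseMatcher {
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private int nextId = 1;

    // "Sends" a request: assigns it an id and registers a future for its response.
    // The request text is ignored here because nothing goes over the network.
    synchronized int send(String request) {
        int id = nextId++;
        pending.put(id, new CompletableFuture<>());
        return id;
    }

    // Called when a response arrives, in any order; completes the matching future.
    void onResponse(int id, String body) {
        CompletableFuture<String> future = pending.get(id);
        if (future == null) {
            throw new IllegalArgumentException("No request with id " + id);
        }
        future.complete(body);
    }

    // Lets the caller wait for, or check on, the response to a given request.
    CompletableFuture<String> responseFor(int id) {
        return pending.get(id);
    }
}
```

If three requests are sent and the responses arrive in the order 1, 3, 2 as in the diagram, the future for request 2 simply stays incomplete until its own response arrives, while the callers waiting on requests 1 and 3 are not held up.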

Example: a server that handles a client's HTTP request over a TCP connection

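import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.UUID;

public class SimpleHttpServer {
    public static void main(String[] args) {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                // Accept a connection from the client browser
                try (Socket browserClient = server.accept();
                     BufferedReader reader = new BufferedReader(
                             new InputStreamReader(browserClient.getInputStream()));
                     BufferedWriter writer = new BufferedWriter(
                             new OutputStreamWriter(browserClient.getOutputStream()))) {
                    // Read the client's request: print the headers up to the blank line.
                    // Reading until null would block, because the browser keeps the connection open.
                    String line;
                    while ((line = reader.readLine()) != null && !line.isEmpty()) {
                        System.out.println(line);
                    }
                    // Simulate the server's response: status line, headers, blank line, body
                    String body = UUID.randomUUID().toString();
                    writer.write("HTTP/1.1 200 OK\r\n");
                    writer.write("Content-Type: text/plain\r\n");
                    writer.write("Content-Length: " + body.length() + "\r\n");
                    writer.write("Connection: close\r\n\r\n");
                    writer.write(body);
                    writer.flush();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}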

Web Crawler

Downloading a single image from a web page

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ImageDownloader {
    public static void main(String[] args) {
        try {
            URL imageUrl = new URL("https://img9.doubanio.com/view/photo/l/public/p2580910036.webp");
            HttpURLConnection connection = (HttpURLConnection) imageUrl.openConnection();
            connection.setRequestMethod("GET");
            // Identify as a regular browser
            connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36 SE 2.X MetaSr 1.0");
            // The target directory must already exist
            String target = "c:\\test\\img\\doubanjiang\\" + System.currentTimeMillis() + ".webp";
            try (BufferedInputStream bis = new BufferedInputStream(connection.getInputStream());
                 BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(target))) {
                // Copy the response body to the file in 1 KB chunks
                byte[] buff = new byte[1024];
                int len;
                while ((len = bis.read(buff)) != -1) {
                    bos.write(buff, 0, len);
                }
            }
        } catch (IOException e) {
            // Also covers MalformedURLException, which is a subclass of IOException
            e.printStackTrace();
        }
    }
}

Downloading all images on a web page

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PageImageCrawler {
    private static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36 SE 2.X MetaSr 1.0";

    public static void main(String[] args) {
        try {
            URL pageUrl = new URL("https://movie.douban.com/");
            HttpURLConnection connection = (HttpURLConnection) pageUrl.openConnection();
            connection.setRequestMethod("GET");
            connection.setRequestProperty("User-Agent", USER_AGENT);

            // Parse the HTML with jsoup:
            //   Jsoup    - entry point that parses the raw HTML
            //   Document - the parsed page (contains every tag that was found)
            //   Elements - a collection of Element objects (extends ArrayList)
            //   Element  - a single HTML element
            // The whole page is parsed at once; parsing line by line fails on lines without an <img> tag.
            Document doc;
            try (InputStream in = connection.getInputStream()) {
                doc = Jsoup.parse(in, "UTF-8", pageUrl.toString());
            }

            // For each image, extract its URL (src) and the movie title (alt)
            for (Element img : doc.getElementsByTag("img")) {
                String src = img.absUrl("src");
                String alt = img.attr("alt");
                if (src.isEmpty() || alt.isEmpty()) {
                    continue; // skip images without a usable URL or title
                }
                download(src, alt);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void download(String src, String alt) {
        // Replace characters that are not allowed in Windows file names
        String fileName = alt.replaceAll("[\\\\/:*?\"<>|]", "_") + ".jpg";
        try {
            HttpURLConnection imageConnection = (HttpURLConnection) new URL(src).openConnection();
            imageConnection.setRequestProperty("User-Agent", USER_AGENT);
            // The target directory must already exist
            try (BufferedInputStream in = new BufferedInputStream(imageConnection.getInputStream());
                 BufferedOutputStream out = new BufferedOutputStream(
                         new FileOutputStream("c:\\test\\img\\doubanjiang\\" + fileName))) {
                byte[] buff = new byte[1024];
                int len;
                while ((len = in.read(buff)) != -1) {
                    out.write(buff, 0, len);
                }
            }
        } catch (IOException e) {
            // One failed image should not stop the whole crawl
            System.err.println("Failed to download " + src + ": " + e.getMessage());
        }
    }
}
