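Request handling is the business logic that runs after route dispatch. On the request side it covers parsing request parameters, validating forms, and handling file uploads; on the response side it covers setting response headers, serving file downloads, and rendering view templates.
Go Web Programming | Request Handling
Reading HTTP request messages in Go through the Request object
Let's start with the HTTP request. As anyone familiar with the HTTP protocol knows, a complete HTTP request message consists of three parts: the request line, the request headers (header fields), and the request body (entity). The request line contains the request method, the URL, and the HTTP protocol version, and the request headers carry the HTTP header fields. A GET request submits no form data, so its body is empty, whereas a POST request carries a body that contains the form data. If you are not familiar with this, read up on the HTTP protocol online, or read the article on HTTP message structure (HTTP 报文简介及组成结构) for a deeper look at how the protocol works underneath.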
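To make the three parts concrete, here is a minimal sketch that parses a hand-written GET message with http.ReadRequest from the standard library. The host, path, and User-Agent value in rawGET are placeholders chosen for illustration, and parseRaw is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// rawGET is a hand-written GET request message: a request line, two header
// lines, a blank line, and an empty body. All values are placeholders.
const rawGET = "GET /books?page=2 HTTP/1.1\r\n" +
	"Host: example.com\r\n" +
	"User-Agent: demo-client\r\n" +
	"\r\n"

// parseRaw parses a raw HTTP/1.x request message into an *http.Request.
func parseRaw(raw string) (*http.Request, error) {
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	req, err := parseRaw(rawGET)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	defer req.Body.Close()

	// Request line: method, request target, protocol version.
	fmt.Println(req.Method, req.RequestURI, req.Proto)
	// Request headers: the Host header is exposed as req.Host.
	fmt.Println(req.Host, req.Header.Get("User-Agent"))
	// Request body: empty for this GET request.
	body, _ := io.ReadAll(req.Body)
	fmt.Println("body bytes:", len(body))
}
```

The same parser is what the server side of net/http applies to incoming connections, so the fields printed here are the ones a handler sees.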
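The Request struct
Go represents an HTTP request message with a Request struct, which we already saw when writing handlers earlier. The struct lives in the built-in net/http package and holds all the information about an HTTP request, including the request URL, request headers, request body, and form data. The most commonly used and important fields are:
- URL: the request URL
- Method: the request method
- Proto: the HTTP protocol version
- Header: the request headers (a map of key-value pairs)
- Body: the request body (a read-only value that implements the io.ReadCloser interface)
- Form, PostForm, MultipartForm: form-related fields that hold the parsed form data
There are many other fields as well, such as Host, RemoteAddr, and ContentLength, which we won't list one by one; look them up if you are curious. You don't need to understand the Request struct in detail here. Just skim the fields it contains so that they ring a bell when you write code.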
// A Request represents an HTTP request received by a server
// or to be sent by a client.
//
// The field semantics differ slightly between client and server
// usage. In addition to the notes on the fields below, see the
// documentation for Request.Write and RoundTripper.
type Request struct {
// Method specifies the HTTP method (GET, POST, PUT, etc.).
// For client requests, an empty string means GET.
//
// Go's HTTP client does not support sending a request with
// the CONNECT method. See the documentation on Transport for
// details.
Method string
// URL specifies either the URI being requested (for server
// requests) or the URL to access (for client requests).
//
// For server requests, the URL is parsed from the URI
// supplied on the Request-Line as stored in RequestURI. For
// most requests, fields other than Path and RawQuery will be
// empty. (See RFC 7230, Section 5.3)
//
// For client requests, the URL's Host specifies the server to
// connect to, while the Request's Host field optionally
// specifies the Host header value to send in the HTTP
// request.
URL *url.URL
// The protocol version for incoming server requests.
//
// For client requests, these fields are ignored. The HTTP
// client code always uses either HTTP/1.1 or HTTP/2.
// See the docs on Transport for details.
Proto string // "HTTP/1.0"
ProtoMajor int // 1
ProtoMinor int // 0
// Header contains the request header fields either received
// by the server or to be sent by the client.
//
// If a server received a request with header lines,
//
// Host: example.com
// accept-encoding: gzip, deflate
// Accept-Language: en-us
// fOO: Bar
// foo: two
//
// then
//
// Header = map[string][]string{
// "Accept-Encoding": {"gzip, deflate"},
// "Accept-Language": {"en-us"},
// "Foo": {"Bar", "two"},
// }
//
// For incoming requests, the Host header is promoted to the
// Request.Host field and removed from the Header map.
//
// HTTP defines that header names are case-insensitive. The
// request parser implements this by using CanonicalHeaderKey,
// making the first character and any characters following a
// hyphen uppercase and the rest lowercase.
//
// For client requests, certain headers such as Content-Length
// and Connection are automatically written when needed and
// values in Header may be ignored. See the documentation
// for the Request.Write method.
Header Header
// Body is the request's body.
//
// For client requests, a nil body means the request has no
// body, such as a GET request. The HTTP Client's Transport
// is responsible for calling the Close method.
//
// For server requests, the Request Body is always non-nil
// but will return EOF immediately when no body is present.
// The Server will close the request body. The ServeHTTP
// Handler does not need to.
//
// Body must allow Read to be called concurrently with Close.
// In particular, calling Close should unblock a Read waiting
// for input.
Body io.ReadCloser
// GetBody defines an optional func to return a new copy of
// Body. It is used for client requests when a redirect requires
// reading the body more than once. Use of GetBody still
// requires setting Body.
//
// For server requests, it is unused.
GetBody func() (io.ReadCloser, error)
// ContentLength records the length of the associated content.
// The value -1 indicates that the length is unknown.
// Values >= 0 indicate that the given number of bytes may
// be read from Body.
//
// For client requests, a value of 0 with a non-nil Body is
// also treated as unknown.
ContentLength int64
// TransferEncoding lists the transfer encodings from outermost to
// innermost. An empty list denotes the "identity" encoding.
// TransferEncoding can usually be ignored; chunked encoding is
// automatically added and removed as necessary when sending and
// receiving requests.
TransferEncoding []string
// Close indicates whether to close the connection after
// replying to this request (for servers) or after sending this
// request and reading its response (for clients).
//
// For server requests, the HTTP server handles this automatically
// and this field is not needed by Handlers.
//
// For client requests, setting this field prevents re-use of
// TCP connections between requests to the same hosts, as if
// Transport.DisableKeepAlives were set.
Close bool
// For server requests, Host specifies the host on which the
// URL is sought. For HTTP/1 (per RFC 7230, section 5.4), this
// is either the value of the "Host" header or the host name
// given in the URL itself. For HTTP/2, it is the value of the
// ":authority" pseudo-header field.
// It may be of the form "host:port". For international domain
// names, Host may be in Punycode or Unicode form. Use
// golang.org/x/net/idna to convert it to either format if
// needed.
// To prevent DNS rebinding attacks, server Handlers should
// validate that the Host header has a value for which the
// Handler considers itself authoritative. The included
// ServeMux supports patterns registered to particular host
// names and thus protects its registered Handlers.
//
// For client requests, Host optionally overrides the Host
// header to send. If empty, the Request.Write method uses
// the value of URL.Host. Host may contain an international
// domain name.
Host string
// Form contains the parsed form data, including both the URL
// field's query parameters and the PATCH, POST, or PUT form data.
// This field is only available after ParseForm is called.
// The HTTP client ignores Form and uses Body instead.
Form url.Values
// PostForm contains the parsed form data from PATCH, POST
// or PUT body parameters.
//
// This field is only available after ParseForm is called.
// The HTTP client ignores PostForm and uses Body instead.
PostForm url.Values
// MultipartForm is the parsed multipart form, including file uploads.
// This field is only available after ParseMultipartForm is called.
// The HTTP client ignores MultipartForm and uses Body instead.
MultipartForm *multipart.Form
// Trailer specifies additional headers that are sent after the request
// body.
//
// For server requests, the Trailer map initially contains only the
// trailer keys, with nil values. (The client declares which trailers it
// will later send.) While the handler is reading from Body, it must
// not reference Trailer. After reading from Body returns EOF, Trailer
// can be read again and will contain non-nil values, if they were sent
// by the client.
//
// For client requests, Trailer must be initialized to a map containing
// the trailer keys to later send. The values may be nil or their final
// values. The ContentLength must be 0 or -1, to send a chunked request.
// After the HTTP request is sent the map values can be updated while
// the request body is read. Once the body returns EOF, the caller must
// not mutate Trailer.
//
// Few HTTP clients, servers, or proxies support HTTP trailers.
Trailer Header
// RemoteAddr allows HTTP servers and other software to record
// the network address that sent the request, usually for
// logging. This field is not filled in by ReadRequest and
// has no defined format. The HTTP server in this package
// sets RemoteAddr to an "IP:port" address before invoking a
// handler.
// This field is ignored by the HTTP client.
RemoteAddr string
// RequestURI is the unmodified request-target of the
// Request-Line (RFC 7230, Section 3.1.1) as sent by the client
// to a server. Usually the URL field should be used instead.
// It is an error to set this field in an HTTP client request.
RequestURI string
// TLS allows HTTP servers and other software to record
// information about the TLS connection on which the request
// was received. This field is not filled in by ReadRequest.
// The HTTP server in this package sets the field for
// TLS-enabled connections before invoking a handler;
// otherwise it leaves the field nil.
// This field is ignored by the HTTP client.
TLS *tls.ConnectionState
// Cancel is an optional channel whose closure indicates that the client
// request should be regarded as canceled. Not all implementations of
// RoundTripper may support Cancel.
//
// For server requests, this field is not applicable.
//
// Deprecated: Set the Request's context with NewRequestWithContext
// instead. If a Request's Cancel field and context are both
// set, it is undefined whether Cancel is respected.
Cancel <-chan struct{}
// Response is the redirect response which caused this request
// to be created. This field is only populated during client
// redirects.
Response *Response
// ctx is either the client or server context. It should only
// be modified via copying the whole Request using WithContext.
// It is unexported to prevent people from using Context wrong
// and mutating the contexts held by callers of the same request.
ctx context.Context
}
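As a quick illustration of the commonly used fields, the following sketch builds a request in memory with httptest.NewRequest and summarizes a few of them. The describe and sampleRequest helpers, the path, and the form data are assumptions made for this example, not part of goblog.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// describe is a hypothetical helper that summarizes a few Request fields.
func describe(r *http.Request) string {
	return fmt.Sprintf("%s %s %s length=%d type=%s",
		r.Method, r.URL.Path, r.Proto, r.ContentLength, r.Header.Get("Content-Type"))
}

// sampleRequest builds an in-memory POST request with a small form body.
func sampleRequest() *http.Request {
	r := httptest.NewRequest(http.MethodPost, "/post/add", strings.NewReader("title=hello"))
	r.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return r
}

func main() {
	fmt.Println(describe(sampleRequest()))
}
```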
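Request URL
For a client HTTP request, the most important part of the request line is the URL; without it there is no way to send a request to a server. For example, to search on Google we first enter the URL of the Google home page in the browser's address bar. Here, https://www.google.com is the request URL.
In Go's http.Request object, the URL field that represents the request URL is a pointer to url.URL: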
// A URL represents a parsed URL (technically, a URI reference).
//
// The general form represented is:
//
// [scheme:][//[userinfo@]host][/]path[?query][#fragment]
//
// URLs that do not start with a slash after the scheme are interpreted as:
//
// scheme:opaque[?query][#fragment]
//
// Note that the Path field is stored in decoded form: /%47%6f%2f becomes /Go/.
// A consequence is that it is impossible to tell which slashes in the Path were
// slashes in the raw URL and which were %2f. This distinction is rarely important,
// but when it is, the code should use RawPath, an optional field which only gets
// set if the default encoding is different from Path.
//
// URL's String method uses the EscapedPath method to obtain the path. See the
// EscapedPath method for more details.
type URL struct {
Scheme string
Opaque string // encoded opaque data
User *Userinfo // username and password information
Host string // host or host:port
Path string // path (relative paths may omit leading slash)
RawPath string // encoded path hint (see EscapedPath method)
OmitHost bool // do not emit empty host (authority)
ForceQuery bool // append a query ('?') even if RawQuery is empty
RawQuery string // encoded query values, without '?'
Fragment string // fragment for references, without '#'
RawFragment string // encoded fragment hint (see EscapedFragment method)
}
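Here is a brief introduction to the common fields:
- Scheme indicates whether the protocol is HTTPS or HTTP; in the example above it is https;
- User carries the credentials for applications that require authentication;
- Host is the domain name or host. If the server listens on a port other than the scheme's default (80 for HTTP, 443 for HTTPS), the port is appended as :port. In the example above it is www.google.com;
- Path is the HTTP request path; for an application's home page it is usually an empty string or /;
- The query-related fields (RawQuery, ForceQuery) describe the query string, the part of the URL after ?;
- Fragment is the anchor, the part of the URL after #.
So the common full format of a URL is:
scheme://[user@]host/path[?query][#fragment]
If the part after the scheme does not start with a slash, the URL is interpreted as:
scheme:opaque[?query][#fragment]
For example, for https://xueyuanjun.com/books/golang-tutorials?page=2#comments, Scheme is https, Host is xueyuanjun.com, Path is /books/golang-tutorials, RawQuery is page=2 (we will show later how to parse query-string parameters through Form), and Fragment is comments.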
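The breakdown above can be checked with url.Parse from net/url. This is a small sketch; mustParse is a hypothetical helper, and the mailto address is a made-up example of the opaque form.

```go
package main

import (
	"fmt"
	"net/url"
)

// mustParse parses raw and panics on error; acceptable for a short demo.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	u := mustParse("https://xueyuanjun.com/books/golang-tutorials?page=2#comments")
	fmt.Println(u.Scheme, u.Host, u.Path, u.RawQuery, u.Fragment)
	// Query decodes RawQuery into url.Values for per-parameter access.
	fmt.Println(u.Query().Get("page"))

	// No slash after the scheme, so the rest is stored in Opaque.
	m := mustParse("mailto:user@example.com")
	fmt.Println(m.Scheme, m.Opaque)
}
```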
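Interestingly, when a request is sent from a browser, the server cannot obtain the Fragment of the URL. This is not a Go problem: the browser simply never sends it to the server. Then why provide the field? Because not all requests come from browsers, and Request can also be used in client libraries.
We can write some test code to demonstrate this. Continuing with github.com/xueyuanjun/goblog, add the following test code to routes/router.go: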
package routes

import (
	"encoding/json"
	"log"
	"net/http"

	"github.com/gorilla/mux"
)

// NewRouter returns a *mux.Router, which can be used as a handler.
func NewRouter() *mux.Router {
	// Create a mux.Router instance
	router := mux.NewRouter().StrictSlash(true)
	// Apply the request logging middleware
	router.Use(loggingRequestInfo)
	// Iterate over all webRoutes defined in web.go
	for _, route := range webRoutes {
		// Register each web route on the router
		router.Methods(route.Method).
			Path(route.Pattern).
			Name(route.Name).
			Handler(route.HandlerFunc)
	}
	return router
}

// loggingRequestInfo is a middleware that logs request information.
func loggingRequestInfo(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Log the details of the request URL as JSON
		url, err := json.Marshal(r.URL)
		if err != nil {
			log.Println("failed to encode request URL:", err)
		} else {
			log.Println(string(url))
		}
		next.ServeHTTP(w, r)
	})
}
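We add a loggingRequestInfo middleware that logs the details of every request URL, encoding the URL object as JSON to make it easier to read.
Start the HTTP server from the goblog directory with go run main.go, then open a new terminal window and send a few test requests with curl. The request log appears in the window running the HTTP server.
In the log, Scheme, Host, and Fragment are all empty. Fragment is empty for the reason given above. Scheme is not filled in for server requests; whether the request used HTTPS has to be determined separately, for example by checking whether r.TLS is non-nil. Host is empty because a client normally puts only the path and query string in the request line and sends the host name in the Host header, which Go exposes as r.Host; r.URL.Host is only filled in when the request line contains an absolute URL, as it does for requests sent through a proxy.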
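Since the server leaves r.URL.Scheme and r.URL.Host empty for ordinary requests, a handler that needs the absolute URL has to rebuild it. The sketch below is one way to do that under simple assumptions: requestScheme, fullURL, and newRequest are hypothetical helpers, the hosts and ports are placeholders, and a deployment behind a TLS-terminating reverse proxy would need a different check because r.TLS is nil there.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/url"
)

// requestScheme reports "https" when the request arrived over TLS.
func requestScheme(r *http.Request) string {
	if r.TLS != nil {
		return "https"
	}
	return "http"
}

// fullURL rebuilds an absolute URL from r.Host and the path/query in r.URL.
func fullURL(r *http.Request) string {
	u := *r.URL // copy so the original request is left untouched
	u.Scheme = requestScheme(r)
	u.Host = r.Host
	return u.String()
}

// newRequest builds a request shaped like one a server handler receives:
// r.URL has only a path and query, and the host lives in r.Host.
func newRequest(host, path, rawQuery string, overTLS bool) *http.Request {
	r := &http.Request{Host: host, URL: &url.URL{Path: path, RawQuery: rawQuery}}
	if overTLS {
		r.TLS = &tls.ConnectionState{}
	}
	return r
}

func main() {
	fmt.Println(fullURL(newRequest("localhost:8080", "/books", "page=2", false)))
	fmt.Println(fullURL(newRequest("localhost:8443", "/", "", true)))
}
```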
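Request headers
Both request headers and response headers are represented by a Header field. Header is a key-value map whose keys are strings and whose values are string slices. Header provides methods to add, delete, modify, and look up entries, so headers can be both read and set.
Reading/printing request headers
Getting the value of a request header is simple: call the Get method on the Header object with the field name. For example, to get the User-Agent field of the request headers:
r.Header.Get("User-Agent")
To print all the request headers, pass the whole r.Header object to a print function.
Let's modify the loggingRequestInfo middleware in routes/router.go to print the request headers as well, and change the code that printed the URL struct so that it prints the URL string instead: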
// loggingRequestInfo is a middleware that logs request information.
// It uses fmt, so "fmt" must be added to the import list.
func loggingRequestInfo(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Printf("Request URL: %s\n", r.URL)
		fmt.Printf("User Agent: %s\n", r.Header.Get("User-Agent"))
		fmt.Printf("Request Header: %v\n", r.Header)
		next.ServeHTTP(w, r)
	})
}
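Restart the HTTP server and request the application's home page from both command-line curl and a browser. In the resulting log, the curl request sets no extra request headers, so it shows very little information, while the browser adds many request headers, so its entry is much richer.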
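The lookup behaviour of Header can be tried without a running server. In this sketch the header values are made-up samples, and sampleHeader and headerNames are hypothetical helpers; it shows that Get is case-insensitive, that Get returns only the first value, and that Values returns all of them.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
)

// sampleHeader builds a small header set with made-up values.
func sampleHeader() http.Header {
	h := http.Header{}
	h.Set("user-agent", "curl/8.0") // stored under the canonical key "User-Agent"
	h.Add("Accept", "text/html")
	h.Add("Accept", "application/json")
	return h
}

// headerNames returns the header field names in sorted order, because
// iterating over a map directly yields keys in no fixed order.
func headerNames(h http.Header) []string {
	names := make([]string, 0, len(h))
	for name := range h {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	h := sampleHeader()
	fmt.Println(h.Get("USER-AGENT")) // lookup ignores case
	fmt.Println(h.Get("Accept"))     // first value only
	fmt.Println(h.Values("Accept"))  // every value for the key
	for _, name := range headerNames(h) {
		fmt.Printf("%s: %v\n", name, h[name])
	}
}
```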
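Adding/modifying/deleting request headers
In addition, we can add a request header with the Add method provided by Header:
r.Header.Add("test", "value1")
modify a request header with the Set method:
r.Header.Set("test", "value2")
and delete a request header with the Del method:
r.Header.Del("test")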
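The difference between Add and Set is easy to miss: Add appends a value to whatever is already stored under the key, while Set replaces all existing values. A minimal sketch, where editHeader is a hypothetical helper and "value1b" is an extra sample value introduced for the demonstration:

```go
package main

import (
	"fmt"
	"net/http"
)

// editHeader applies Add, Set, and Del in turn and records a copy of the
// values stored under the key after each step.
func editHeader() [][]string {
	h := http.Header{}
	snapshot := func() []string {
		return append([]string(nil), h.Values("test")...)
	}
	var steps [][]string

	h.Add("test", "value1")
	h.Add("test", "value1b") // Add appends to the existing values
	steps = append(steps, snapshot())

	h.Set("test", "value2") // Set replaces every existing value
	steps = append(steps, snapshot())

	h.Del("test") // Del removes the key entirely
	steps = append(steps, snapshot())
	return steps
}

func main() {
	for i, values := range editHeader() {
		fmt.Printf("step %d: %v\n", i+1, values)
	}
}
```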
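Request body
Both the request body and the response body are represented by a Body field of the io.ReadCloser interface type. As the name suggests, this interface combines the io.Reader and io.Closer interfaces.
io.Reader provides the Read method, which reads data into the byte slice passed to it and returns the number of bytes read along with any error. io.Closer provides the Close method. So you can call Read on Body to read the content of the request body and call Close to release resources (for server requests the server closes the body itself, so a handler does not need to).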
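A single Read call is allowed to return fewer bytes than the buffer holds, so reading a body by hand means looping until io.EOF. The following sketch shows that loop on an in-memory body; readInChunks and stringBody are hypothetical helpers and the form data is a made-up sample. In real handlers io.ReadAll does the same work.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// stringBody wraps a string as an io.ReadCloser, like a request body.
func stringBody(s string) io.ReadCloser {
	return io.NopCloser(strings.NewReader(s))
}

// readInChunks reads rc to the end through a buffer of the given size and
// returns the content plus the number of Read calls that returned data.
func readInChunks(rc io.ReadCloser, size int) (string, int, error) {
	defer rc.Close()
	if size <= 0 {
		size = 512 // a zero-length buffer would never make progress
	}
	var sb strings.Builder
	buf := make([]byte, size)
	calls := 0
	for {
		n, err := rc.Read(buf)
		if n > 0 {
			sb.Write(buf[:n]) // only the first n bytes are valid
			calls++
		}
		if err == io.EOF {
			return sb.String(), calls, nil
		}
		if err != nil {
			return sb.String(), calls, err
		}
	}
}

func main() {
	data, calls, err := readInChunks(stringBody("title=hello&content=world"), 8)
	fmt.Println(data, calls, err)
}
```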
For the request body, the access path is http.Request.Body. Let's write some test code to demonstrate reading it: add an AddPost handler function to goblog/handlers/post.go:
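func AddPost(w http.ResponseWriter, r *http.Request) {
	// Read the whole request body. A single Read call may return fewer bytes
	// than ContentLength, and ContentLength is -1 when the length is unknown.
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "failed to read request body", http.StatusBadRequest)
		return
	}
	w.Write(body) // Return the request body as the response body
}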
Since GET requests usually carry no request body, we need to test with a request such as POST, PUT, or DELETE. Add a new web route in routes/web.go:
WebRoute{
"NewPost",
"POST",
"/post/add",
handlers.AddPost,
},
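Restart the HTTP server. Testing this code requires a POST request, which can be sent with curl and the -id flags. -id is a combination of two options: -i includes the HTTP response headers in the output, and -d passes the form data to send. In the response message, the headers and the body are separated by a blank line, and the response body contains exactly the request body that was sent.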
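The same round trip can also be exercised in-process with net/http/httptest, without starting the server. This is a sketch under assumptions: echoBody stands in for the AddPost handler, postForm is a hypothetical helper, and the form data is a made-up sample.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// echoBody stands in for AddPost: it writes the request body back.
func echoBody(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "failed to read request body", http.StatusBadRequest)
		return
	}
	w.Write(body)
}

// postForm sends data to the handler in-process and returns the response
// status code and body.
func postForm(data string) (int, string) {
	r := httptest.NewRequest(http.MethodPost, "/post/add", strings.NewReader(data))
	r.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	w := httptest.NewRecorder()
	echoBody(w, r)
	return w.Code, w.Body.String()
}

func main() {
	status, body := postForm("title=hello&content=world")
	fmt.Println(status, body)
}
```

Calling the handler directly like this bypasses the router and middleware, which keeps the check focused on body handling alone.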
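Usually we don't read the whole request body at once like this; instead we fetch each request parameter with methods such as FormValue. The next tutorial covers in detail how to get HTTP form request data.
Once requests and responses both work end to end, you can go on to write a crawler.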