RESTful Service Best Practices (Part 8)

cleardo

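Querying, Filtering and Pagination

For large data sets, limiting the amount of data returned matters from a bandwidth standpoint. It matters just as much from a UI processing standpoint, because a UI can usually display only a small portion of a large data set. When a data set can grow without bound, it helps to limit the amount of data returned by default. Take Twitter as an example: when fetching a user’s tweets (via the home timeline), the service returns up to 20 items unless the request specifies otherwise, and even then it returns at most 200.

Besides limiting the amount of data returned, we also need to consider how to “page” or scroll through a large data set when more than the first subset is needed. Creating “pages” of data, returning known sections of a larger list, and being able to move to the “previous” and “next” page is called pagination. In addition, we may want to specify which fields or properties of a resource are included in the response, which limits the amount of data returned, and eventually we want to query for specific values and sort the returned data.

There are two main ways to combine limiting query results with pagination. The indexing scheme that determines where the returned data starts is either page-oriented (the request gives a page number and the number of items per page) or item-oriented (the request gives the first and last item numbers directly). For example, the two options read as “give me page 5, assuming 20 items per page” or “give me items 100 through 120.”

Service providers are split on how this should work. Some UI tools, such as the Dojo JSON Datastore object, choose to mimic the HTTP specification’s use of byte ranges. It is very helpful if your services support this out of the box, so that no translation is needed between the UI toolkit and the back-end services.

The recommendations below support both the Dojo pagination model (the range of items is given in the Range request header) and query-string parameters. Supporting both makes services more flexible: they are usable from advanced UI toolkits such as Dojo as well as from simple, direct links and anchor tags, and doing so should not add much complexity to the development effort. However, if your services do not support UI functionality directly, consider dropping support for the Range header option.

It is important to note that querying, filtering and pagination are not recommended for all services. This behavior is resource-specific and should not be supported on all resources by default. The documentation for services and resources should state which end-points support these more complex capabilities.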

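Limiting Results

The “give me items 3 through 55” way of requesting data is more consistent with how the HTTP specification uses the Range header for bytes, so we use that metaphor with the Range header. The “starting with item 2, give me at most 20 items” form is easier for humans to read, formulate and understand, so we use that metaphor for the query-string parameters.

As mentioned above, the recommendation is to support both the HTTP Range header and the query-string parameters offset and limit for limiting results in responses. Note that when both options are supported, the query-string parameters should override the Range header.

One of the first questions you will probably ask is: “Why support two metaphors for such similar functions when the numbers in the requests will never match? Isn’t that confusing?” Well, that is two questions. To answer them: yes, it may be confusing. The point is that we want the query string to be especially clear, easy to understand, and easy to construct and parse. The Range header, on the other hand, is more machine-oriented, and its usage is dictated by the HTTP specification.

In short, the items value of the Range header must be parsed, which adds complexity, and the client has to perform some computation to construct the request. Separate limit and offset parameters are easy to understand and create, and place little demand on the human element.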

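Limiting via the Range Header

When a request asks for a range of items using an HTTP header instead of query-string parameters, include a Range header that specifies the range as follows: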

Range: items=0-24

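Note that items are zero-based, consistent with how the HTTP specification uses the Range header to request bytes. In other words, to request the first item in the data set, the range specifier starts at zero (0). The request above returns the first 25 items, assuming the data set contains at least 25 items.

On the server side, inspect the Range header of the request to determine which items to return. Once a Range header is found, it can be parsed with a simple regular expression (e.g. “items=(\d+)-(\d+)”) to retrieve the individual range values.

Limiting via Query-String Parameters

As the query-string alternative to the Range header, use the parameter names offset and limit, where offset is the number of the first item to return (it matches the first number in the items string of the Range header above) and limit is the maximum number of items to return. The request that matches the Range header example above is: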

GET http://api.example.com/resources?offset=0&limit=25

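The offset value is zero-based, like the items in the Range header. The limit value is the maximum number of items to return. Services can impose their own default and maximum values for limit when it is not specified in the query string, but these “invisible” settings must be documented.

Note that when query-string parameters are used, their values should override those provided in the Range header.

Range-Based Responses

For a range-based request, whether made via the HTTP Range header or via query-string parameters, the server should respond with a Content-Range header that indicates which items are being returned and how many items are available in total: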

Content-Range: items 0-24/66

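Note that the total number of items available (66 in this case) is not zero-based. Therefore, a request for the last few items in this data set returns a Content-Range header like this: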

Content-Range: items 40-65/66

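According to the HTTP specification, it is also valid to replace the total number of items (66 in this case) with an asterisk (“*”) if the number is unknown at response time or too expensive to calculate. In that case the response header looks like this: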

Content-Range: items 40-65/*

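Note, however, that Dojo and other UI tools may not support this notation.

Pagination

The response-limiting schemes above support pagination by letting requesters specify which items of a data set they want. In the previous example, with 66 items in total, retrieving the second “page” of data with a page size of 25 uses the following Range header: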

Range: items=25-49

Expressed with query-string parameters, the same request is:

GET …?offset=25&limit=25

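The server then returns the data together with the following Content-Range header:

Content-Range: items 25-49/66

This works well for most cases. Occasionally, however, item numbers do not translate directly to rows in the data set. Also, for a very active data set where new items are regularly added to the top of the list, apparent “paging issues” such as duplicate items can occur.

Date-ordered data sets, such as a Twitter feed, are a common case. You can still page through the data using item numbers, but sometimes it is clearer to use an “after” or “before” query-string parameter, optionally combined with the Range header (or with the query-string parameters offset and limit).

For example, to retrieve up to 20 remarks around a given timestamp: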

GET http://www.example.com/remarks/home_timeline?after=<timestamp>
Range: items=0-19
GET http://www.example.com/remarks/home_timeline?before=<timestamp>
Range: items=0-19

Equivalently, using query-string parameters:

GET http://www.example.com/remarks/home_timeline?after=<timestamp>&offset=0&limit=20
GET http://www.example.com/remarks/home_timeline?before=<timestamp>&offset=0&limit=20

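For timestamp formatting and handling in different cases, see the Date Handling section below.

If a service returns a default subset of data or a maximum number of items even when the requester does not set a Range header, the server should also respond with a Content-Range header to communicate the limit to the client. In the home_timeline example above, the service may only ever return 20 items at a time, whether or not the requester sets the Range header. In that case, the server should respond with a Content-Range header such as:

Content-Range: items 0-19/4125
or Content-Range: items 0-19/*

Filtering and Sorting Results

Another consideration for the results is filtering and ordering data on the server, that is, retrieving a subset of the data and retrieving it in a specified order. These concepts work together with pagination and result limiting, and use the query-string parameters filter and sort respectively.

Again, filtering and sorting are complex operations and do not need to be supported by default on all resources. Document the resources that offer filtering and sorting.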

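Filtering

Here, filtering is defined as reducing the number of results returned by specifying criteria that the data must meet before it is returned. Filtering can become quite complex if services support a complete set of comparison operators and complex criteria matching. It is often acceptable, however, to keep things manageable by supporting only simple equality, “starts-with” or contains comparisons.

Before discussing what goes into the filter query-string parameter, it is important to understand why a single parameter is used rather than multiple query-string parameters. Basically, it reduces the possibility of parameter name clashes. We already use the offset, limit and sort (see below) parameters. Then there is jsonp if you choose to support it, the format specifier, and possibly the after and before parameters, and those are only the query-string parameters discussed in this document. The more parameters the query string uses, the more likely name clashes or overlaps become. A single filter parameter keeps that risk to a minimum.

In addition, it is easier for the server to determine whether filtering is requested if it only has to check for the presence of that single filter parameter. As query requirements grow more complex, a single parameter also gives more flexibility for building your own fully functional query syntax (see the OData comments below or http://www.odata.org).

By adopting a set of common, accepted delimiters, equality comparison can be implemented in a straightforward way. Setting the value of the filter query-string parameter to a string that uses those delimiters creates a list of name/value pairs that the server can parse easily and use to refine database queries as needed. The delimiters that have worked as conventions are the vertical bar (“|”) to separate individual filter phrases and the double colon (“::”) to separate names from values. This set of delimiters is unique enough to support most use cases and yields a readable query-string parameter. A simple example clarifies the technique. Suppose we want to request users named “Todd” who live in Denver and have the title “Grand Poobah”. The request URI, including the query string, might look like this: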

GET http://www.example.com/users?filter=“name::todd|city::denver|title::grand poobah”

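The double-colon (“::”) delimiter separates the property name from the comparison value, which allows the comparison value to contain spaces and makes it easier for the server to separate the delimiter from the value.

Note that the property names in the name/value pairs match the names of the properties that the service returns in the payload.

Simple but effective. Case sensitivity is open to debate on a case-by-case basis, but in general filtering works best when case is ignored. You can also offer wildcards as needed by using the asterisk (“*”) as the value portion of a name/value pair.

For queries that need more than simple equality or wildcard comparisons, operators must be introduced. In that case the operators should be part of the value and parsed on the server side, rather than part of the property name. When complex query-language-style functionality is needed, consider the query concepts of the Open Data Protocol (OData) Filter System Query Option specification (see http://www.odata.org/documentation/uriconventions#FilterSystemQueryOption).

Sorting

For our purposes, sorting is defined as determining the order in which the items in a payload are returned from a service, in other words the sort order of multiple items in a response payload.

Again, the convention is to keep things simple. The recommended approach is a sort query-string parameter that contains a delimited set of property names. Each property name is sorted in ascending order, and each property prefixed with a dash (“-”) is sorted in descending order. Separate the property names with a vertical bar (“|”), consistent with the separation of name/value pairs in filtering above.

For example, to retrieve users ordered by last name (ascending), first name (ascending) and hire date (descending), the request might look like this: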

GET http://www.example.com/users?sort=last_name|first_name|-hire_date

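Note again that the property names match the names of the properties that the service returns in the payload. Also, because of its complexity, offer sorting case by case and only for resources that need it. Small collections of resources can be ordered on the client if needed.

The original text follows.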

Querying, Filtering and Pagination

For large data sets, limiting the amount of data returned is important from a bandwidth standpoint. But it’s also important from a UI processing standpoint as a UI often can only display a small portion of a huge data set. In cases where the dataset grows indefinitely, it’s helpful to limit the amount of data returned by default. For instance, in the case of Twitter returning a person’s tweets (via their home timeline), it returns up to 20 items unless otherwise specified in the request and even then will return a maximum of 200.

Aside from limiting the amount of data returned, we also need to consider how to “page” or scroll through that large data set if more than that first subset needs retrieval. This is referred to as pagination: creating “pages” of data, returning known sections of a larger list and being able to page “forward” and “backward” through that large data set. Additionally, we may want to specify the fields or properties of a resource to be included in the response, thereby limiting the amount of data that comes back, and we eventually want to query for specific values and/or sort the returned data.

There are two primary ways to limit query results and perform pagination, and they can be combined. First, the indexing scheme is either page-oriented or item-oriented. That is, incoming requests specify where to begin returning data either with a “page” number plus a number of items per page, or with a first and last item number given directly as a range. In other words, the two options are “give me page 5 assuming 20 items per page” or “give me items 100 through 120.”
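
The relationship between the two indexing schemes can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes pages are numbered from 1 while items are zero-based and the range is inclusive (the zero-based convention is the one this document adopts for the Range header).

```python
def page_to_item_range(page: int, per_page: int) -> tuple[int, int]:
    """Convert a 1-based page number into a zero-based, inclusive item range.

    Hypothetical helper: the 1-based page numbering is an assumption,
    not something the text prescribes.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    first = (page - 1) * per_page   # index of the first item on the page
    last = first + per_page - 1     # index of the last item on the page
    return first, last


if __name__ == "__main__":
    print(page_to_item_range(5, 20))  # page 5 with 20 items per page
```

Under these assumptions, page 5 with 20 items per page corresponds to items 80 through 99, so the two quoted requests in the text are separate examples rather than the same request written two ways.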

Service providers are split on how this should work. However, some UI tools, such as the Dojo JSON Datastore object, choose to mimic the HTTP specification’s use of byte ranges. It’s very helpful if your services support that right out of the box so no translation is necessary between your UI toolkit and back-end services.

The recommendations below support both the Dojo model for pagination, which is to specify the range of items being requested using the Range header, and utilization of query-string parameters. By supporting both, services are more flexible—usable from both advanced UI toolkits, like Dojo, as well as by simple, straight-forward links and anchor tags. It shouldn’t add much complexity to the development effort to support both options. However, if your services don’t support UI functionality directly, consider eliminating support for the Range header option.

It’s important to note that querying, filtering and pagination are not recommended for all services. This behavior is resource specific and should not be supported on all resources by default. Documentation for the services and resources should mention which end-points support these more complex capabilities.

Limiting Results

The “give me items 3 through 55” way of requesting data is more consistent with how the HTTP spec utilizes the Range header for bytes so we use that metaphor with the Range header. However, the “starting with item 2 give me a maximum of 20 items” is easier for humans to read, formulate and understand so we use that metaphor in supporting the query-string parameters.

As mentioned above, the recommendation is to support use of both the HTTP Range header and the query-string parameters, offset and limit, in our services to limit results in responses. Note that, given support for both options, the query-string parameters should override the Range header.

One of the first questions you’re going to ask is, “Why are we supporting two metaphors with these similar functions as the numbers in the requests will never match? Isn’t that confusing?” Um… That’s two questions. Well, to answer your question, it may be confusing. The thing is, we want to make things in the query-string especially clear, easily-understood, human readable and easy to construct and parse. The Range header, however, is more machine-based with usage dictated to us via the HTTP specification.

In short, the Range header items value must be parsed, which increases the complexity, plus the client side has to perform some computation in order to construct the request. Using the individual limit and offset parameters is easily understood and easy to do, usually without much demand on the human element.

Limiting via the Range Header

When a request is made for a range of items using an HTTP header instead of query-string parameters, include a Range header specifying the range as follows:

Range: items=0-24

Note that items are zero-based to be consistent with the HTTP specification in how it uses the Range header to request bytes. In other words, the first item in the dataset would be requested by a beginning range specifier of zero (0). The above request would return the first 25 items, assuming there were at least 25 items in the data set.

On the server side, inspect the Range header in the request to know which items to return. Once a Range header is determined to exist, it can be simply parsed using a regular expression (e.g. “items=(\d+)-(\d+)”) to retrieve the individual range values.
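
As a minimal sketch of that parsing step, the Python snippet below applies the regular expression quoted above. The function name and the decision to return None for a missing or malformed header are assumptions made for illustration; a real service would decide for itself how to report bad input.

```python
import re
from typing import Optional

# The pattern quoted in the text, anchored by fullmatch() below.
ITEMS_RANGE = re.compile(r"items=(\d+)-(\d+)")


def parse_items_range(header_value: Optional[str]) -> Optional[tuple[int, int]]:
    """Return (first, last) from a 'Range: items=first-last' header value.

    Hypothetical helper: returns None when the header is absent, uses a
    different unit, or has first > last.
    """
    if header_value is None:
        return None
    match = ITEMS_RANGE.fullmatch(header_value.strip())
    if match is None:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    if first > last:
        return None
    return first, last


if __name__ == "__main__":
    print(parse_items_range("items=0-24"))
```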

Limiting via Query-String Parameters

For the query-string alternative to the Range header, use parameter names of offset and limit, where offset is the beginning item number (matches the first digit in the items string for the Range header above) and limit is the maximum number of items to return. A request using query-string parameters that matches the example in the Range Header section above is:

GET http://api.example.com/resources?offset=0&limit=25

The offset value is zero-based, just like the items in the Range header. The value for limit is the maximum number of items to return. Services can impose their own default and maximum values for limit for when it’s not specified in the query string. But please document those “invisible” settings.

Note that when the query-string parameters are used, the values should override those provided in the Range header.
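
One possible way to apply this precedence rule on the server is sketched below in Python. Everything named here is hypothetical: the function, the DEFAULT_LIMIT and MAX_LIMIT values (borrowed from the Twitter figures mentioned earlier, purely as placeholders), and the choice to clamp rather than reject out-of-bounds values. It also assumes the Range header has already been parsed into a (first, last) tuple.

```python
from typing import Mapping, Optional

DEFAULT_LIMIT = 20   # assumed service default; document it for clients
MAX_LIMIT = 200      # assumed service maximum; document it for clients


def resolve_window(
    query: Mapping[str, str],
    item_range: Optional[tuple[int, int]],
) -> tuple[int, int]:
    """Return (offset, limit), letting query-string values override the Range header.

    `query` holds decoded query-string parameters; `item_range` is the
    already-parsed (first, last) pair from the Range header, or None.
    Non-numeric parameter values raise ValueError from int().
    """
    if "offset" in query or "limit" in query:
        # Query-string parameters win over any Range header.
        offset = int(query.get("offset", 0))
        limit = int(query.get("limit", DEFAULT_LIMIT))
    elif item_range is not None:
        first, last = item_range
        offset, limit = first, last - first + 1   # inclusive range -> count
    else:
        offset, limit = 0, DEFAULT_LIMIT
    # Clamp to the service's bounds instead of rejecting the request.
    return max(offset, 0), max(1, min(limit, MAX_LIMIT))


if __name__ == "__main__":
    print(resolve_window({"offset": "0", "limit": "25"}, (10, 19)))
```

Clamping keeps the example short; returning an error for an oversized limit would be an equally reasonable design, as long as the behavior is documented.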

Range-Based Responses

For a range-based request, whether via Range HTTP header or query-string parameters, the server should respond with a Content-Range header to indicate how many items are being returned and how many total items exist yet to be retrieved:

Content-Range: items 0-24/66

Note that the total items available (e.g. 66 in this case) is not zero-based. Hence, requesting the last few items in this data set would return a Content-Range header as follows:

Content-Range: items 40-65/66

According to the HTTP specification, it is also valid to replace the total items available (66 in this case) with an asterisk (“*”) if the number of items is unknown at response time, or if the calculation of that number is too expensive. In this case the response header would look like this:

Content-Range: items 40-65/*

However, note that Dojo or other UI tools may not support this notation.
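
The header values shown above can be produced with a small formatting helper. The following Python sketch is illustrative only: the function name and parameters are hypothetical, and passing None for the total is simply this example’s way of signalling that the total is unknown or too expensive to compute.

```python
from typing import Optional


def content_range(offset: int, returned: int, total: Optional[int]) -> str:
    """Build the value of a Content-Range header for an items range.

    offset   -- zero-based index of the first item returned
    returned -- number of items actually returned (must be at least 1)
    total    -- total number of items available, or None if unknown
    """
    if returned < 1:
        raise ValueError("at least one item must be returned")
    last = offset + returned - 1            # inclusive, zero-based
    total_part = "*" if total is None else str(total)
    return f"items {offset}-{last}/{total_part}"


if __name__ == "__main__":
    print("Content-Range:", content_range(40, 26, 66))
    print("Content-Range:", content_range(40, 26, None))
```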

Pagination

The above response-limiting schemes work for pagination by allowing requesters to specify the items within a dataset in which they’re interested. Using the above example where 66 total items are available, retrieving the second “page” of data using a page size of 25 would use a Range header as follows:

Range: items=25-49

Via query-string parameters, this would be equivalent to:

GET …?offset=25&limit=25

Whereupon, the server (given our example) would return the data, along with a Content-Range header as follows:

Content-Range: items 25-49/66
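
To make the page arithmetic concrete, here is a small Python sketch that slices an in-memory list and reports the matching Content-Range value, using the items unit from the Range-Based Responses section. The function name is hypothetical and the in-memory list stands in for whatever data store a real service would query.

```python
from typing import Optional, Sequence


def paginate(items: Sequence, offset: int, limit: int) -> tuple[list, Optional[str]]:
    """Return one page of `items` plus the Content-Range value describing it.

    Returns (empty list, None) when the offset lies past the end of the data.
    """
    page = list(items[offset:offset + limit])
    if not page:
        return page, None
    last = offset + len(page) - 1            # the final page may be short
    return page, f"items {offset}-{last}/{len(items)}"


if __name__ == "__main__":
    data = list(range(66))                   # 66 items, as in the example
    for offset in (0, 25, 50):
        page, header = paginate(data, offset, 25)
        print(len(page), header)
```

With 66 items and a page size of 25, the third page holds only 16 items, so its range ends at 65 rather than 74.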

This works great for most things. However, occasionally there are cases where item numbers don’t translate directly to rows in the data set. Also, for an extremely active data set where new items are regularly added to the top of the list, apparent “paging issues” with what look like duplicates can occur.

Date-ordered data sets, such as a Twitter feed, are a common case. While you can still page through the data using item numbers, sometimes it’s more beneficial and understandable to use an “after” or “before” query-string parameter, optionally in conjunction with the Range header (or query-string parameters, offset and limit).

For example, to retrieve up to 20 remarks around a given timestamp:

GET http://www.example.com/remarks/home_timeline?after=<timestamp>
Range: items=0-19
GET http://www.example.com/remarks/home_timeline?before=<timestamp>
Range: items=0-19

Equivalently, using query-string parameters:

GET http://www.example.com/remarks/home_timeline?after=<timestamp>&offset=0&limit=20
GET http://www.example.com/remarks/home_timeline?before=<timestamp>&offset=0&limit=20
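
A rough Python sketch of how a service might evaluate these requests follows. Several details are assumptions, since the text does not specify them: the remarks are held in memory as (created_at, text) tuples ordered newest first, the after and before bounds are exclusive, and the function and sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical sample data: 50 remarks, one per minute, newest first.
REMARKS = [
    (datetime(2024, 1, 1, 12, 0) - timedelta(minutes=i), f"remark {i}")
    for i in range(50)
]


def timeline_window(
    remarks: list,
    after: Optional[datetime] = None,
    before: Optional[datetime] = None,
    offset: int = 0,
    limit: int = 20,
) -> list:
    """Filter by timestamp first, then apply offset and limit to what remains."""
    selected = [
        remark
        for remark in remarks
        if (after is None or remark[0] > after)       # exclusive bound (assumed)
        and (before is None or remark[0] < before)    # exclusive bound (assumed)
    ]
    return selected[offset:offset + limit]


if __name__ == "__main__":
    cutoff = datetime(2024, 1, 1, 11, 50)
    print(len(timeline_window(REMARKS, after=cutoff)))
    print(len(timeline_window(REMARKS, before=cutoff)))
```

Because the window is anchored to a timestamp rather than to a position in the list, items added to the top of the list after the first request do not shift the results of a later before request.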

For timestamp formatting and handling in different cases, please see the Date Handling section below.

If a service returns a subset of data by default or a maximum number of items even when the requester does not set a Range header, have the server respond with a Content-Range header to communicate the limit to the client. For example, in the home_timeline example above, that service call may only ever return 20 items at a time whether the requester sets the Range header or not. In that case, the server should always respond with a Content-Range header such as:

Content-Range: items 0-19/4125
or Content-Range: items 0-19/*

Filtering and Sorting Results

Another consideration for affecting results is the act of filtering data and/or ordering it on the server, retrieving a subset of data and/or in a specified order. These concepts work in conjunction with pagination and results-limiting and utilize query-string parameters, filter and sort respectively, to do their magic.

Again, filtering and sorting are complex operations and don’t need to be supported by default on all resources. Document those resources that offer filtering and sorting.

Filtering

In this case, filtering is defined as reducing the number of results returned by specifying some criteria that must be met on the data before it is returned. Filtering can get quite complex if services support a complete set of comparison operators and complex criteria matching. However, it is quite often acceptable to keep things sane by supporting a simple equality, ‘starts-with’ or contains comparison.

Before we get started discussing what goes in the filter query-string parameter, it’s important to understand why a single parameter vs. multiple query-string parameters is used. Basically, it comes down to reducing the possibility of parameter name clashes. We’re already embracing the use of offset, limit, and sort (see below) parameters. Then there’s jsonp if you choose to support it, the format specifier and possibly after and before parameters. And that’s just the query-string parameters discussed in this document. The more parameters we use on the query-string the more possibilities we have to have name clashes or overlap. Using a single filter parameter minimizes that.

Plus, it’s easier from the server-side to determine if filtering functionality is requested by simply checking for the presence of that single filter parameter. Also, as complexity of your querying requirements increases, this single parameter option provides more flexibility in the future—for creating your own fully-functional query syntax (see OData comments below or at http://www.odata.org).

By embracing a set of common, accepted delimiters, equality comparison can be implemented in straightforward fashion. Setting the value of the filter query-string parameter to a string using those delimiters creates a list of name/value pairs which can be parsed easily on the server-side and utilized to enhance database queries as needed. The delimiters that have worked as conventions are the vertical bar (“|”) to separate individual filter phrases and a double colon (“::”) to separate the names and values. This provides a unique-enough set of delimiters to support the majority of use cases and creates a user-readable query-string parameter. A simple example will serve to clarify the technique. Suppose we want to request users with the name “Todd” who live in Denver and have the title of “Grand Poobah”. The request URI, complete with query-string might look like this:

GET http://www.example.com/users?filter=“name::todd|city::denver|title::grand poobah”

The delimiter of the double colon (“::”) separates the property name from the comparison value, enabling the comparison value to contain spaces—making it easier to parse the delimiter from the value on the server.

Note that the property names in the name/value pairs match the name of the properties that would be returned by the service in the payload.

Simple but effective. Case sensitivity is certainly up for debate on a case-by-case basis, but in general, filtering works best when case is ignored. You can also offer wild-cards as needed using the asterisk (“*”) as the value portion of the name/value pair.
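
The filter convention described above can be parsed with plain string operations. The Python sketch below is a minimal illustration under stated assumptions: the value has already been URL-decoded and stripped of surrounding quotes, comparisons ignore case, a lone asterisk matches any value as long as the property exists, and both function names are hypothetical.

```python
def parse_filter(value: str) -> dict[str, str]:
    """Split 'name::value|name::value' into a dict of filter criteria."""
    criteria = {}
    for phrase in value.split("|"):
        name, separator, wanted = phrase.partition("::")
        if not separator or not name.strip():
            raise ValueError(f"malformed filter phrase: {phrase!r}")
        criteria[name.strip()] = wanted.strip()
    return criteria


def matches(record: dict, criteria: dict[str, str]) -> bool:
    """Return True when `record` satisfies every criterion, ignoring case."""
    for name, wanted in criteria.items():
        if name not in record:
            return False
        if wanted == "*":                     # wildcard: any value is accepted
            continue
        if str(record[name]).lower() != wanted.lower():
            return False
    return True


if __name__ == "__main__":
    criteria = parse_filter("name::todd|city::denver|title::grand poobah")
    user = {"name": "Todd", "city": "Denver", "title": "Grand Poobah"}
    print(criteria, matches(user, criteria))
```

In a real service the parsed criteria would more likely be translated into a parameterized database query than applied to records in memory.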

For queries that require more than simple equality or wild-card comparisons, introduction of operators is necessary. In this case, the operators themselves should be part of the value and parsed on the server side, rather than part of the property name. When complex query-language-style functionality is needed, consider introducing query concepts from the Open Data Protocol (OData) Filter System Query Option specification (see http://www.odata.org/documentation/uriconventions#FilterSystemQueryOption).

Sorting

For our purposes, sorting is defined as determining the order in which items in a payload are returned from a service. In other words, the sort order of multiple items in a response payload.

Again, convention here says to do something simple. The recommended approach is to utilize a sort query-string parameter that contains a delimited set of property names. Behavior is, for each property name, sort in ascending order, and for each property prefixed with a dash (“-”) sort in descending order. Separate each property name with a vertical bar (“|”), which is consistent with the separation of the name/value pairs in filtering, above.

For example, if we want to retrieve users in order of their last name (ascending), first name (ascending) and hire date (descending), the request might look like this:

GET http://www.example.com/users?sort=last_name|first_name|-hire_date
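
As an illustration of the sort convention, the Python sketch below parses the parameter and orders a list of records in memory. The function names and sample records are hypothetical, and hire dates are assumed to be ISO 8601 strings so that plain string comparison orders them correctly. It relies on Python’s sort being stable, applying the keys from last to first.

```python
def parse_sort(value: str) -> list[tuple[str, bool]]:
    """Split 'a|b|-c' into (property_name, descending) pairs."""
    keys = []
    for token in value.split("|"):
        token = token.strip()
        descending = token.startswith("-")
        name = token[1:] if descending else token
        if not name:
            raise ValueError(f"malformed sort key: {token!r}")
        keys.append((name, descending))
    return keys


def sort_records(records: list, sort_value: str) -> list:
    """Return a new list ordered by every key in `sort_value`."""
    result = list(records)
    # Stable sort: apply the least significant key first, the most significant last.
    for name, descending in reversed(parse_sort(sort_value)):
        result.sort(key=lambda record: record[name], reverse=descending)
    return result


if __name__ == "__main__":
    users = [
        {"last_name": "Smith", "first_name": "Ann", "hire_date": "2019-05-01"},
        {"last_name": "Jones", "first_name": "Bob", "hire_date": "2021-03-15"},
        {"last_name": "Smith", "first_name": "Ann", "hire_date": "2022-01-10"},
        {"last_name": "Smith", "first_name": "Al", "hire_date": "2020-07-04"},
    ]
    for user in sort_records(users, "last_name|first_name|-hire_date"):
        print(user)
```

A service backed by a database would normally translate the parsed keys into an ORDER BY clause instead, after checking each name against the properties it is willing to sort on.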

Note that again the property names match the name of the properties that would be returned by the service in the payload. Additionally, because of its complexity, offer sorting on a case-by-case basis for only resources that need it. Small collections of resources can be ordered on the client, if needed.