Big Data Tag Data Validation: Dynamic Data Validation with JSON Schema

Latest recommended article published on 2023-12-22 03:05:34

weixin_39525118

Tags: big data tag data validation

Article link: https://blog.csdn.net/weixin_39525118/article/details/113453087

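Summary (auto-generated by CSDN): This article describes how, when an online service was hit by illegitimate requests, JSON Schema and the ajv library were introduced to validate data more efficiently and safely. JSON Schema defines data structures and validation rules, which helps keep data consistent and interfaces secure, and it suits scenarios such as API validation and form validation. With ajv, request parameters are validated as soon as they arrive, so invalid data never reaches the business logic and CPU usage drops.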

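Background

One day our production Node service raised an alert. On investigation, we found a flood of illegitimate requests scanning the service. They replaced normal URL parameters with all kinds of probing values: some attempted SQL injection, some attempted XSS, and others were stuffed with random strings. All of these requests were filtered out, step by step, by the checks scattered through our handlers, so no serious damage was done. The only problem was high CPU usage: because we did not validate requests the moment they arrived, the illegitimate ones got into our business logic and triggered unnecessary computation.

To keep such requests out of the business logic, every endpoint needed parameter validation that returns immediately for any request with invalid parameters. The parameters are available through ctx.req.query or ctx.req.body. At first I planned to write a simple validation function, but with so many endpoints it had to be generic, and along the way it became clear that covering every case (deeply nested objects, arrays, and so on) was a lot of work.

Rather than reinvent the wheel, I searched Google and found JSON Schema, together with ajv, a JavaScript library for JSON Schema, which fit this scenario well. So I used ajv to validate ctx.req.query and ctx.req.body as soon as a request comes in, which saved a lot of time.

JSON Schema can be used to validate API data, to validate forms before submission, and to share validation rules between front end and back end. If you have an intermediate data-processing layer that receives data from multiple parties, using JSON Schema to keep that data consistent is also good practice.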
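The "validate first, reject early" idea from the background can be sketched as a Koa-style middleware. This is only a minimal sketch under assumptions, not the author's actual code: `createValidationMiddleware`, `isValidQuery`, and the shape of the `validators` map are hypothetical names, `ctx.req.query` follows the article's wording, and each validator is assumed to be any function returning true or false (a compiled ajv validator would fit that shape).

```javascript
// Hypothetical helper: builds a middleware that rejects invalid requests
// before any business logic runs.
// `validators` maps a source name ("query" or "body") to a function that
// returns true when the data is acceptable.
function createValidationMiddleware(validators) {
  return async function validateRequest(ctx, next) {
    for (const source of Object.keys(validators)) {
      const data = ctx.req[source];
      if (!validators[source](data)) {
        // Reject immediately; next() is never called for this request.
        ctx.status = 400;
        ctx.body = { error: `invalid ${source}` };
        return;
      }
    }
    await next();
  };
}

// Stand-in validator for illustration only: id must be a string of digits.
function isValidQuery(query) {
  return (
    typeof query === 'object' &&
    query !== null &&
    typeof query.id === 'string' &&
    /^\d+$/.test(query.id)
  );
}

const validationMiddleware = createValidationMiddleware({ query: isValidQuery });
```

Registering such a middleware ahead of the route handlers means a probing request like `?id=1 OR 1=1` is answered with a 400 before any expensive work starts.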

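JSON Schema overview

JSON Schema describes which fields a piece of JSON data should have and which rules constrain them, such as being non-empty, having a maximum or minimum length, matching a regular expression, or being one of a few constants.

This article introduces the basic concepts of JSON Schema, how to write constraints for the common data types, and how to use JSON Schema in JavaScript with the ajv library. Let's get started.

First, a look at a complete JSON Schema:

{
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "name": { "type": "string" },
    "phone": { "type": "string" },
    "hobby": {
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  },
  "required": ["id", "name"]
}

The following data conforms to the schema above:

{
  "id": 0,
  "name": "zhangsan",
  "phone": "18814166666",
  "hobby": ["coding", "music", "game"]
}

If id or name is missing, or if any of these fields has the wrong type, the data fails validation against the schema.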

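ajv, a JSON Schema library for JavaScript

ajv has the most complete JSON Schema support of the available libraries, and its performance is also excellent, ranking second among them. The top-ranked library, djv, does not implement the latest JSON Schema features, while ajv comes very close to djv in performance. At the time of writing, ajv had 6.7k stars versus only 236 for djv.

That makes ajv the best choice.

Basic usage

const Ajv = require('ajv');
const ajv = new Ajv();
const isValidate = ajv.validate({ type: 'string' }, 123);
if (!isValidate) {
    console.log(ajv.errors);
    console.log(ajv.errorsText(ajv.errors));
}

Output:

[
  {
    keyword: 'type',
    dataPath: '',
    schemaPath: '#/type',
    params: { type: 'string' },
    message: 'should be string'
  }
]
data should be string

This is the error format of ajv v6; ajv v8 reports instancePath instead of dataPath and words its messages as "must be ...".

Usage is straightforward: validate(schema, data) only needs a schema and the data. The schema is the kind of schema described above, and data is the value being validated.

Next, let's look at the rules a schema can contain.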

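string

{
    "type": "string",
    "minLength": 1,
    "maxLength": 100
}

The schema above requires a string with a minimum length of 1 and a maximum length of 100. You can also specify a regular expression to match against:

{
   "type": "string",
   "pattern": "^\\w+$"
}

This regular expression allows only English letters, digits, and underscores.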
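To see what the pattern above accepts, the same regular expression can be run directly in JavaScript. This is a small sketch that assumes the validator builds an ordinary `RegExp` from the pattern string; `matchesPattern` is a hypothetical helper, not part of ajv. It also shows why the backslash is doubled in the JSON text, and that a pattern only matches the whole string when it is anchored with `^` and `$`.

```javascript
// The schema as JSON text, exactly as written in the article.
// String.raw keeps the two backslashes that JSON needs to encode one.
const patternSchemaText = String.raw`{ "type": "string", "pattern": "^\\w+$" }`;
const patternSchema = JSON.parse(patternSchemaText);

// After JSON parsing, the pattern holds a single backslash: ^\w+$
console.log(patternSchema.pattern);

// Hypothetical helper: checks a string value against schema.pattern.
function matchesPattern(schema, value) {
  return typeof value === 'string' && new RegExp(schema.pattern).test(value);
}

console.log(matchesPattern(patternSchema, 'user_01')); // true
console.log(matchesPattern(patternSchema, 'user-01')); // false: "-" is not matched by \w
console.log(matchesPattern(patternSchema, ''));        // false: "+" needs at least one character

// Without ^ and $, a partial match is enough.
console.log(matchesPattern({ pattern: '\\w+' }, 'user-01')); // true
```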

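A string can also be restricted to one of JSON Schema's built-in formats, for example:

{
    "type": "string",
    "format": "ipv4"
}

This schema restricts the value to an IPv4 address. Other built-in formats include:

format	Meaning
date-time	Date and time, e.g. 2019-12-08T13:19:35.327Z
email	Email address
hostname	Host name
uri	Uniform Resource Identifier, e.g. a URL
regex	Regular expression

There are more built-in formats, but they are less commonly used and are not covered here. If you are interested, see the reference links at the end of the article.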

enum

{
    "enum": ["shenzhen", "guangzhou", "beijing"]
}

With the schema above, "shenzhen", "guangzhou", and "beijing" all satisfy the rule, while "" or "somewhere" do not.

enum values are not limited to strings; they can be of any type, for example:

{
    "enum": [2, "foo", { "foo": "bar" }, [1, 2, 3]]
}

Valid: 2, "foo", {"foo": "bar"}, [1, 2, 3]
Invalid: 1, "bar", and so on

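number

There are two numeric types: integer and number.

{ "type": "integer" } means an integer.

{ "type": "number" } means any numeric value, either floating-point or integer.

{
   "type": "number",
   "multipleOf": 10,
   "minimum": 10,
   "maximum": 100
}

Here multipleOf requires a multiple of 10, minimum requires a value >= 10, and maximum requires a value <= 100.

Besides minimum and maximum there are exclusiveMinimum and exclusiveMaximum, which set a lower and an upper bound that exclude the bound itself.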
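The numeric keywords can be illustrated with a small hand-rolled check. This is a sketch of the semantics only, not ajv's implementation: `checkNumber` is a hypothetical function, and it assumes exclusiveMinimum and exclusiveMaximum are numbers, as in draft-06 and later (draft-04 used booleans instead).

```javascript
// Hypothetical helper: checks a value against the numeric keywords of a schema.
function checkNumber(schema, value) {
  if (typeof value !== 'number' || !Number.isFinite(value)) return false;
  if (schema.type === 'integer' && !Number.isInteger(value)) return false;
  // Note: the % operator is only reliable here for integer multipleOf values.
  if (schema.multipleOf !== undefined && value % schema.multipleOf !== 0) return false;
  if (schema.minimum !== undefined && value < schema.minimum) return false;
  if (schema.maximum !== undefined && value > schema.maximum) return false;
  if (schema.exclusiveMinimum !== undefined && value <= schema.exclusiveMinimum) return false;
  if (schema.exclusiveMaximum !== undefined && value >= schema.exclusiveMaximum) return false;
  return true;
}

const rangeSchema = { type: 'number', multipleOf: 10, minimum: 10, maximum: 100 };

console.log(checkNumber(rangeSchema, 10));  // true: minimum is inclusive
console.log(checkNumber(rangeSchema, 100)); // true: maximum is inclusive
console.log(checkNumber(rangeSchema, 15));  // false: not a multiple of 10
console.log(checkNumber(rangeSchema, 110)); // false: above maximum

// With an exclusive bound, the bound itself is rejected.
console.log(checkNumber({ type: 'number', exclusiveMinimum: 10 }, 10)); // false
```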

boolean

{
    "type": "boolean"
}

Only boolean values, that is true or false, are allowed. Values such as "false" or 0 fail validation.

null

{
    "type": "null"
}

With type null, the value can only be null; any other value fails validation.

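object

{
    "type": "object"
}

If only the type above is specified, any JavaScript object literal passes validation, for example:

{}

{
    "key": "value",
    "anotherKey": "anotherValue"
}

Specifying only the object type is not very useful on its own, though. The object type is meant to be combined with the following keywords:

properties
additionalProperties
required
propertyNames
minProperties
maxProperties
dependencies
patternProperties

We will go through them one by one: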

properties && additionalProperties && required

{
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "name": { "type": "string" },
    "phone": { "type": "string" }
  },
  "additionalProperties": false,
  "required": ["id", "name"]
}

In the schema above, properties says that the id, name, and phone fields of the object must satisfy their respective rules (integer, string, and string), and required says that id and name must be present.

Note that if required and additionalProperties are not specified, then even though properties declares rules for three fields, the object being validated may omit those fields or contain other fields. But whenever one of those three fields does appear, it must satisfy its rule, or validation fails.

additionalProperties controls whether fields other than those listed in properties may appear; it defaults to true.

Let's look at some data:

{
    "id": 0,
    "name": "zhangsan"
}

The data above is valid.

{
    "id": 0,
    "name": "zhangsan",
    "someKey": "someValue"
}

The data above is invalid: additionalProperties is false, so the extra someKey field is not allowed.
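How properties, required, and additionalProperties interact can be made concrete with a simplified checker. This is a sketch under assumptions, not how ajv works internally: `checkObject`, `matchesType`, and `jsonTypeOf` are hypothetical helpers, and each property rule is assumed to carry only a `type` keyword.

```javascript
function hasOwn(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Maps a JavaScript value to a JSON Schema type name.
function jsonTypeOf(value) {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  if (Number.isInteger(value)) return 'integer';
  return typeof value; // 'string', 'number', 'boolean', 'object'
}

function matchesType(type, value) {
  const actual = jsonTypeOf(value);
  // Every integer is also a valid "number".
  return actual === type || (type === 'number' && actual === 'integer');
}

// Hypothetical helper: returns a list of error strings (empty when valid).
function checkObject(schema, data) {
  if (jsonTypeOf(data) !== 'object') return ['should be object'];
  const errors = [];
  const properties = schema.properties || {};
  for (const key of schema.required || []) {
    if (!hasOwn(data, key)) errors.push(`missing required property "${key}"`);
  }
  for (const [key, value] of Object.entries(data)) {
    if (hasOwn(properties, key)) {
      const rule = properties[key];
      if (rule.type !== undefined && !matchesType(rule.type, value)) {
        errors.push(`"${key}" should be ${rule.type}`);
      }
    } else if (schema.additionalProperties === false) {
      errors.push(`unexpected property "${key}"`);
    }
  }
  return errors;
}

const userSchema = {
  type: 'object',
  properties: { id: { type: 'integer' }, name: { type: 'string' }, phone: { type: 'string' } },
  additionalProperties: false,
  required: ['id', 'name'],
};

console.log(checkObject(userSchema, { id: 0, name: 'zhangsan' }).length); // 0
console.log(checkObject(userSchema, { id: 0, name: 'zhangsan', someKey: 'someValue' }).length); // 1
```

Dropping required and additionalProperties from `userSchema` makes `{}` and `{ other: 1 }` pass as well, while `{ id: "x" }` still fails, which matches the behavior described above.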

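propertyNames

{
    "type": "object",
    "propertyNames": {
        "pattern": "^\\w+$"
    }
}

The schema above requires every property name of the object to consist of English letters, digits, or underscores, for example:

{
    "some_key_00": "someValue"
}

The data above conforms to the schema.

{
    "$some_key_00": "someValue"
}

The data above does not conform to the schema, because the name contains $, which is not a letter, digit, or underscore.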

minProperties & maxProperties

{
  "type": "object",
  "minProperties": 1,
  "maxProperties": 3
}

The schema above requires the object to have at least 1 field and at most 3 fields.

dependencies

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "phone": { "type": "string" },
    "operator": { "type": "string" }
  },
  "dependencies": {
    "phone": ["operator"]
  }
}

The schema above says that if the phone field appears in the object, the operator field must appear too: whoever fills in a phone number must also fill in the carrier. This is only a one-way dependency, though. To also require a phone number whenever the carrier is filled in, declare the schema like this:

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "phone": { "type": "string" },
    "operator": { "type": "string" }
  },
  "dependencies": {
    "phone": ["operator"],
    "operator": ["phone"]
  }
}
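The difference between the one-way and the two-way declaration can be shown with a short sketch. `checkDependencies` is a hypothetical helper that handles only the array form of dependencies used in this section.

```javascript
function hasOwn(obj, key) {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Hypothetical helper: returns one message per unmet dependency.
function checkDependencies(dependencies, data) {
  const unmet = [];
  for (const [trigger, requiredKeys] of Object.entries(dependencies)) {
    // A dependency only applies when its trigger field is present.
    if (!hasOwn(data, trigger)) continue;
    for (const key of requiredKeys) {
      if (!hasOwn(data, key)) unmet.push(`"${trigger}" requires "${key}"`);
    }
  }
  return unmet;
}

const oneWay = { phone: ['operator'] };
const twoWay = { phone: ['operator'], operator: ['phone'] };

const phoneOnly = { name: 'zhangsan', phone: '18814166666' };
const operatorOnly = { name: 'zhangsan', operator: 'some carrier' };

console.log(checkDependencies(oneWay, phoneOnly).length);    // 1: phone without operator
console.log(checkDependencies(oneWay, operatorOnly).length); // 0: operator alone is fine
console.log(checkDependencies(twoWay, operatorOnly).length); // 1: operator now requires phone
```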

patternProperties

{
  "type": "object",
  "patternProperties": {
    "^a_": { "type": "string" },
    "^b_": { "type": "string" }
  },
  "additionalProperties": false
}

The schema above allows the object to contain only fields named like a_xxx or b_xxx, for example:

{
    "a_xxx": "1",
    "b_xxx": "2",
    "b_yyy": "3"
}

patternProperties and properties can be used together, for example:

{
  "type": "object",
  "properties": {
    "key1": { "type": "string" }
  },
  "patternProperties": {
    "^a_": { "type": "string" },
    "^b_": { "type": "string" }
  },
  "additionalProperties": false
}

The following data then passes validation:

{
    "a_xxx": "1",
    "b_xxx": "2",
    "key1": "3"
}

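array

{
    "type": "array"
}

The schema above describes an array, which may be empty or contain elements of any kind.

items && additionalItems && contains

{
    "type": "array",
    "items": {
        "type": "number"
    }
}

The schema above describes an array whose elements must all be numbers.

{
    "type": "array",
    "contains": {
        "type": "number"
    }
}

The schema above describes an array in which at least one element is a number.

{
    "type": "array",
    "items": [
        {
            "type": "string"
        },
        {
            "type": "number"
        }
    ],
    "additionalItems": false
}

items can also constrain elements by position. The schema above requires the first element to be a string and the second to be a number, and additionalItems set to false forbids any further elements.

minItems && maxItems && uniqueItems

{
    "type": "array",
    "minItems": 3,
    "maxItems": 5,
    "uniqueItems": true
}

The schema above requires the array to have at least 3 and at most 5 elements, and every element must be unique.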
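The length and uniqueness rules can be sketched as follows. `checkArray` is a hypothetical helper, and using Node's `util.isDeepStrictEqual` to compare elements is my own choice for the sketch; it is meant to approximate the deep equality that uniqueItems implies, so that two objects with the same content count as duplicates.

```javascript
const { isDeepStrictEqual } = require('util');

// Hypothetical helper: checks minItems, maxItems and uniqueItems only.
function checkArray(schema, data) {
  if (!Array.isArray(data)) return false;
  if (schema.minItems !== undefined && data.length < schema.minItems) return false;
  if (schema.maxItems !== undefined && data.length > schema.maxItems) return false;
  if (schema.uniqueItems) {
    // Pairwise comparison: simple, and fine for the short arrays in an example.
    for (let i = 0; i < data.length; i++) {
      for (let j = i + 1; j < data.length; j++) {
        if (isDeepStrictEqual(data[i], data[j])) return false;
      }
    }
  }
  return true;
}

const sizeSchema = { type: 'array', minItems: 3, maxItems: 5, uniqueItems: true };

console.log(checkArray(sizeSchema, [1, 2, 3]));              // true
console.log(checkArray(sizeSchema, [1, 2]));                 // false: fewer than 3 elements
console.log(checkArray(sizeSchema, [1, 2, 2]));              // false: 2 appears twice
console.log(checkArray(sizeSchema, [{ a: 1 }, { a: 1 }, 3])); // false: equal objects are duplicates
```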

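Combining schemas

allOf

{
  "allOf": [
    { "type": "string" },
    { "maxLength": 6 }
  ]
}

The schema above requires the data to satisfy both constraints at once: it must be a string, and its length must not exceed 6.

anyOf

{
  "anyOf": [
    { "type": "string" },
    { "type": "object" }
  ]
}

anyOf means the data must satisfy one or more of the listed schemas. For example, "1" and { "a": 1 } are both valid, while the number 1 is not.

oneOf

{
  "oneOf": [
    { "type": "string", "maxLength": 3 },
    { "type": "string", "maxLength": 5 }
  ]
}

oneOf differs from anyOf: the data must satisfy exactly one of the schemas, never several at once. With the schema above, the string "abc" is invalid because it satisfies both rules, while "abcd" is valid because it satisfies only the second.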
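The difference between anyOf and oneOf comes down to counting how many branches match. The sketch below uses plain predicate functions as stand-ins for the two subschemas above; `countMatches`, `isStringUpTo`, `passesAnyOf`, and `passesOneOf` are hypothetical names.

```javascript
// Counts how many branch predicates accept the value.
function countMatches(branches, value) {
  return branches.filter((branch) => branch(value)).length;
}

// Stand-in for { "type": "string", "maxLength": n }.
function isStringUpTo(maxLength) {
  return (value) => typeof value === 'string' && value.length <= maxLength;
}

const lengthBranches = [isStringUpTo(3), isStringUpTo(5)];

function passesAnyOf(value) {
  return countMatches(lengthBranches, value) >= 1; // at least one branch matches
}

function passesOneOf(value) {
  return countMatches(lengthBranches, value) === 1; // exactly one branch matches
}

console.log(passesAnyOf('abc'), passesOneOf('abc'));       // true false: both branches match
console.log(passesAnyOf('abcd'), passesOneOf('abcd'));     // true true: only the second matches
console.log(passesAnyOf('abcdef'), passesOneOf('abcdef')); // false false: no branch matches
```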

not

{
    "not": {
        "type": "number"
    }
}

The schema above accepts a value of any type except a number.

Organizing complex schemas

Consider a scenario where we have defined person.json and fatherAndSon.json in advance, as follows:

person.json:

{
    "definitions": {
        "base": {
            "type": "object",
            "properties": {
                "firstName": {
                    "type": "string"
                },
                "lastName": {
                    "type": "string"
                },
                "phone": {
                    "type": "string"
                }
            },
            "required": [
                "firstName",
                "lastName"
            ]
        }
    }
}

fatherAndSon.json:

{
    "type": "object",
    "properties": {
        "father": {
            "$ref": "person#/definitions/base"
        },
        "son": {
            "$ref": "person#/definitions/base"
        }
    }
}

As you can see, $ref is used here to reference the base schema defined in person.json. How does ajv tie the two files together for validation? Let's look at the code:

const Ajv = require('ajv');
const ajv = new Ajv();
const personSchema = require('./person.json');
const fatherAndSonSchema = require('./fatherAndSon.json');

ajv.addSchema(personSchema, 'person');
ajv.addSchema(fatherAndSonSchema, 'fatherAndSon');

const isValidate = ajv.validate('fatherAndSon', {
    "father": {
        "firstName": "zhang"
    },
    "son": {
        "firstName": "zhang",
        "lastName": "sanfeng"
    }
});
if (!isValidate) {
    console.log(ajv.errors);
    console.log(ajv.errorsText(ajv.errors));
}

In ajv.addSchema(personSchema, 'person'), the second argument to addSchema is the key that the person part of the $ref refers to.

Let's look at the output:

[
  {
    keyword: 'required',
    dataPath: '.father',
    schemaPath: 'person#/definitions/base/required',
    params: { missingProperty: 'lastName' },
    message: "should have required property 'lastName'"
  }
]
data.father should have required property 'lastName'
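To make the role of the key and the `#/definitions/base` part of the $ref more concrete, here is a deliberately simplified resolver. It is a sketch of the idea only, not ajv's resolution algorithm (which also handles ids and base URIs): `resolveRef` and `refRegistry` are hypothetical, and the registry is assumed to be keyed by the name passed to addSchema.

```javascript
// Registry keyed by schema name; the base schema is trimmed for brevity.
const refRegistry = {
  person: {
    definitions: {
      base: { type: 'object', required: ['firstName', 'lastName'] },
    },
  },
};

// Hypothetical helper: resolves refs of the form "<key>#/<json/pointer>".
function resolveRef(registry, ref) {
  const [key, pointer = ''] = ref.split('#');
  let node = registry[key];
  if (node === undefined) throw new Error(`unknown schema "${key}"`);
  for (const rawToken of pointer.split('/').filter(Boolean)) {
    // JSON Pointer escapes: "~1" stands for "/", "~0" stands for "~".
    const token = rawToken.replace(/~1/g, '/').replace(/~0/g, '~');
    node = node[token];
    if (node === undefined) throw new Error(`cannot resolve "${ref}"`);
  }
  return node;
}

const baseSchema = resolveRef(refRegistry, 'person#/definitions/base');

// The father object from the article lacks lastName.
const father = { firstName: 'zhang' };
const missingFields = baseSchema.required.filter(
  (key) => !Object.prototype.hasOwnProperty.call(father, key)
);
console.log(missingFields); // [ 'lastName' ]
```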

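Summary

This article introduced JSON Schema and ajv, a JavaScript library for it, and covered some common usage. JSON Schema is useful for validation on both the front end and the back end. We can even write a single schema and use it on every side to keep data consistent, which greatly improves validation efficiency.

What are you waiting for? Give it a try!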

References

Understanding JSON Schema: https://json-schema.org/understanding-json-schema/index.html
ajv: https://github.com/epoberezkin/ajv
