0.0973585?探究ts_rank的score为什么这么低

        最近在使用postgres利用ts_rank进行排序找到最符合关键词要求得内容时发现: 即使是相似的内容,得分也是非常非常得低(其中一个case是0.0973585)。看起来很奇怪,非常不可行。于是我又做了一个简单测的测试:

SELECT ts_rank(to_tsvector('english', 'skirt'), to_tsquery('skirt'));

        讲道理,这已经是完全匹配得内容了,预期得分应该非常高(如果按匹配度来看应该有1了),但实际得rank分数却是非常得低:

        这就引出了一个问题:为什么分数这么低?这么低的分数是不是对的?ts_rank时得计算逻辑是怎样的?带着这个疑问,我们深入探究一下。

postgres全文搜索

ts_vector

        ts_vector是postgres中用于全文检索的主要结构,在这个结构中,会将文本信息转换为词位 + 位置信息的格式。

SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog'); -- 结果: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

ts_query

表示搜索的查询条件

SELECT to_tsquery('english', 'jumping & quick'); -- 结果: 'jump' & 'quick'

全文搜索流程

        基于上面的示例,我们来看一下这里全文搜索的流程。整个处理过程如上图所示。

预处理阶段

        预处理阶段首先会对文本内容进行分词,得到一个个token,然后会对这里的分词结果进行一系列标准化操作,比如转小写、移除一些停用词、词干提取等。再之后,就会生成包含词位和位置信息的ts_vector。之后,为了加速检索,还可以为这部分内容建立gin索引。

查询阶段

        查询时会先对查询内容进行解析,然后生成对应的ts_query结构,最后执行索引匹配,获取相关相关性评分,最终得到一个最终的评分结果。前文提到的ts_rank发生在相关性评分处。

ts_rank

        calc_rank是ts_rank的核心方法,它根据输入的权重 (weights)、文本向量 (TSVector)、查询 (TSQuery) 和指定的归一化方法 (method) 来计算相关性评分。顺着源码来看下

calc_rank(const float *w, TSVector t, TSQuery q, int32 method)
{
        QueryItem  *item = GETQUERY(q);
        float                res = 0.0;
        int                        len;

        if (!t->size || !q->size)
                return 0.0;

        /* XXX: What about NOT? */
        res = (item->type == QI_OPR && (item->q
相同要求 修改下列语句 :CREATE TABLE "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG" ( "F_ID" VARCHAR(100) NOT NULL, "DB_CONFIG_ID" VARCHAR(100) NOT NULL, "BIND_FIELD" VARCHAR(100), "PARAM_TYPE" VARCHAR(50), "DESCRIPTION" VARCHAR(500), "SORT_TYPE" VARCHAR(20), "STATUS" INT DEFAULT 0, "SORT_CODE" INT DEFAULT 0, "CREATE_BY" VARCHAR(100), "CREATE_TIME" TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "UPDATE_BY" VARCHAR(100), "UPDATE_TIME" TIMESTAMP, "DELETE_MARK" INT DEFAULT 0, PRIMARY KEY ("F_ID") ); COMMENT ON TABLE "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG" IS '外源-排序字段配置表'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."F_ID" IS '主键ID'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."DB_CONFIG_ID" IS '数据库配置表ID'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."BIND_FIELD" IS '绑定字段'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."PARAM_TYPE" IS '参数类型'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."DESCRIPTION" IS '描述'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."SORT_TYPE" IS '排序类型 ASC:升序 DESC:降序'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."STATUS" IS '状态 0:正常 1:禁用'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."SORT_CODE" IS '排序字段'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."CREATE_BY" IS '创建人'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."CREATE_TIME" IS '创建时间'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."UPDATE_BY" IS '更新人'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."UPDATE_TIME" IS '更新时间'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_SORT_FIELD_CONFIG"."DELETE_MARK" IS '删除标记;1:删除 0:未删除'; -- 外源-查询字段配置表 CREATE TABLE "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG" ( "F_ID" VARCHAR(100) NOT NULL, "DB_CONFIG_ID" VARCHAR(100) NOT NULL, "PARAM_NAME" VARCHAR(100) NOT NULL, "BIND_FIELD" VARCHAR(100), "PARAM_TYPE" VARCHAR(50), "OPERATOR" VARCHAR(20), "IS_REQUIRED" INT DEFAULT 0, "DESCRIPTION" VARCHAR(500), "IS_PAGINATION" INT DEFAULT 0, "STATUS" INT DEFAULT 0, "SORT_CODE" INT DEFAULT 0, "CREATE_BY" VARCHAR(100), "CREATE_TIME" TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "UPDATE_BY" VARCHAR(100), "UPDATE_TIME" TIMESTAMP , "DELETE_MARK" INT DEFAULT 0, PRIMARY KEY ("F_ID") ); COMMENT ON TABLE "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG" IS '外源-查询字段配置表'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."F_ID" IS '主键ID'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."DB_CONFIG_ID" IS '数据库配置表ID'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."PARAM_NAME" IS '参数名称'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."BIND_FIELD" IS '绑定字段'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."PARAM_TYPE" IS '参数类型'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."OPERATOR" IS '操作符'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."IS_REQUIRED" IS '是否必填 0:否 1:是'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."DESCRIPTION" IS '描述'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."IS_PAGINATION" IS '是否为分页设置 0:否 1:是'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."STATUS" IS '状态 0:正常 1:禁用'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."SORT_CODE" IS '排序字段'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."CREATE_BY" IS '创建人'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."CREATE_TIME" IS '创建时间'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."UPDATE_BY" IS '更新人'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."UPDATE_TIME" IS '更新时间'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_QUERY_FIELD_CONFIG"."DELETE_MARK" IS '删除标记;1:删除 0:未删除'; -- 外源-返回字段配置表 CREATE TABLE "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG" ( "F_ID" VARCHAR(100) NOT NULL, "DB_CONFIG_ID" VARCHAR(100) NOT NULL, "PARAM_NAME" VARCHAR(100) NOT NULL, "BIND_FIELD" VARCHAR(100), "PARAM_TYPE" VARCHAR(50), "DESCRIPTION" VARCHAR(500), "STATUS" INT DEFAULT 0, "SORT_CODE" INT DEFAULT 0, "CREATE_BY" VARCHAR(100), "CREATE_TIME" TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "UPDATE_BY" VARCHAR(100), "UPDATE_TIME" TIMESTAMP , "DELETE_MARK" INT DEFAULT 0, PRIMARY KEY ("F_ID") ); COMMENT ON TABLE "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG" IS '外源-返回字段配置表'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."F_ID" IS '主键ID'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."DB_CONFIG_ID" IS '数据库配置表ID'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."PARAM_NAME" IS '参数名称'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."BIND_FIELD" IS '绑定字段'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."PARAM_TYPE" IS '参数类型'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."DESCRIPTION" IS '描述'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."STATUS" IS '状态 0:正常 1:禁用'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."SORT_CODE" IS '排序字段'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."CREATE_BY" IS '创建人'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."CREATE_TIME" IS '创建时间'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."UPDATE_BY" IS '更新人'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."UPDATE_TIME" IS '更新时间'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_FIELD_CONFIG"."DELETE_MARK" IS '删除标记;1:删除 0:未删除'; -- 外源-返回参数配置表 CREATE TABLE "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG" ( "F_ID" VARCHAR(100) NOT NULL, "DB_CONFIG_ID" VARCHAR(100) NOT NULL, "PARAM_NAME" VARCHAR(100) NOT NULL, "PARAM_CODE" VARCHAR(100), "DATA_TYPE" VARCHAR(50), "PARSE_METHOD" VARCHAR(100), "PARAM_DESC" VARCHAR(500), "STATUS" INT DEFAULT 0, "SORT_CODE" INT DEFAULT 0, "CREATE_BY" VARCHAR(100), "CREATE_TIME" TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "UPDATE_BY" VARCHAR(100), "UPDATE_TIME" TIMESTAMP , "DELETE_MARK" INT DEFAULT 0, PRIMARY KEY ("F_ID") ); COMMENT ON TABLE "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG" IS '外源-返回参数配置表'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."F_ID" IS '主键ID'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."DB_CONFIG_ID" IS '数据库配置表ID'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."PARAM_NAME" IS '参数名称'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."PARAM_CODE" IS '参数编码'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."DATA_TYPE" IS '数据类型'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."PARSE_METHOD" IS '解析方式'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."PARAM_DESC" IS '参数说明'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."STATUS" IS '状态 0:正常 1:禁用'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."SORT_CODE" IS '排序字段'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."CREATE_BY" IS '创建人'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."CREATE_TIME" IS '创建时间'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."UPDATE_BY" IS '更新人'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."UPDATE_TIME" IS '更新时间'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_RETURN_PARAM_CONFIG"."DELETE_MARK" IS '删除标记;1:删除 0:未删除'; -- 外源-状态码配置表 CREATE TABLE "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG" ( "F_ID" VARCHAR(100) NOT NULL, "DB_CONFIG_ID" VARCHAR(100) NOT NULL, "STATUS_FIELD" VARCHAR(100), "STATUS_VALUE" VARCHAR(100), "STATUS_CATEGORY" VARCHAR(50), "STATUS_DESC" VARCHAR(500), "STATUS" INT DEFAULT 0, "SORT_CODE" INT DEFAULT 0, "CREATE_BY" VARCHAR(100), "CREATE_TIME" TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "UPDATE_BY" VARCHAR(100), "UPDATE_TIME" TIMESTAMP , "DELETE_MARK" INT DEFAULT 0, PRIMARY KEY ("F_ID") ); COMMENT ON TABLE "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG" IS '外源-状态码配置表'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."F_ID" IS '主键ID'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."DB_CONFIG_ID" IS '数据库配置表ID'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."STATUS_FIELD" IS '状态码字段'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."STATUS_VALUE" IS '状态码值'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."STATUS_CATEGORY" IS '状态码类别'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."STATUS_DESC" IS '状态码说明'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."STATUS" IS '状态 0:正常 1:禁用'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."SORT_CODE" IS '排序字段'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."CREATE_BY" IS '创建人'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."CREATE_TIME" IS '创建时间'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."UPDATE_BY" IS '更新人'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."UPDATE_TIME" IS '更新时间'; COMMENT ON COLUMN "LX_ZW_DATASHARE"."TS_EXTERNAL_STATUS_CODE_CONFIG"."DELETE_MARK" IS '删除标记;1:删除 0:未删除';
最新发布
09-24
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

风生水气

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值