GPtext的基本应用

前面已经讲解了greenplum与gptext的安装方式,下面对gptext进行使用进行讲解。
greenplum的安装:https://blog.csdn.net/weixin_43315211/article/details/87934840
gptext的安装:https://blog.csdn.net/weixin_43315211/article/details/86488751

1、示例表结构

traffic=# select * from people_info limit 5;
 id |     id_number      |  name  | birthday | gender |    phone    |    birth_place     
----+--------------------+--------+----------+--------+-------------+--------------------
  3 | 440981186302229802 | 苏谢曹 | 18630222 | M      | 15594777587 | 广东省茂名市高州市
  7 | 220105188308148382 | 毕蒋华 | 18830814 | M      | 18852544751 | 吉林省长春市二道区
 11 | 450327191912085209 | 柳金屈 | 19191208 | M      | 15253025503 | 广西省桂林市灌阳县
 15 | 140902190406219970 | 任苏谈 | 19040621 | F      | 13763393746 | 山西省忻州市忻府区
 19 | 511802197904181641 | 计明曹 | 19790418 | M      | 15745568282 | 四川省雅安市雨城区
(5 rows)

2、创建空的index

gptext.create_index(<schema_name>, <table_name>, <id_col_name>, 
    <def_search_col_name> [, <if_check_id_uniqueness>]);
    
SELECT * FROM gptext.create_index('public','people_info','id','name');

3、查询索引状态

traffic=# SELECT * FROM gptext.index_status('traffic.public.people_info');
 content_id |         index_name         | shard_name | shard_state | replica_name | replica_state |                     core                      |    node_name    |        base_url        | is_leader | partitioned | extern
al_index 
------------+----------------------------+------------+-------------+--------------+---------------+-----------------------------------------------+-----------------+------------------------+-----------+-------------+-------
---------
          0 | traffic.public.people_info | shard0     | active      | core_node3   | active        | traffic.public.people_info_shard0_replica_n1  | sdw2:18987_solr | http://sdw2:18987/solr | f         | f           | f
          0 | traffic.public.people_info | shard0     | active      | core_node5   | active        | traffic.public.people_info_shard0_replica_n2  | sdw2:18988_solr | http://sdw2:18988/solr | t         | f           | f
          1 | traffic.public.people_info | shard1     | active      | core_node7   | active        | traffic.public.people_info_shard1_replica_n4  | sdw1:18987_solr | http://sdw1:18987/solr | t         | f           | f
          1 | traffic.public.people_info | shard1     | active      | core_node9   | active        | traffic.public.people_info_shard1_replica_n6  | sdw1:18988_solr | http://sdw1:18988/solr | f         | f           | f
          2 | traffic.public.people_info | shard2     | active      | core_node11  | active        | traffic.public.people_info_shard2_replica_n8  | sdw2:18987_solr | http://sdw2:18987/solr | f         | f           | f
          2 | traffic.public.people_info | shard2     | active      | core_node13  | active        | traffic.public.people_info_shard2_replica_n10 | sdw2:18988_solr | http://sdw2:18988/solr | t         | f           | f
          3 | traffic.public.people_info | shard3     | active      | core_node15  | active        | traffic.public.people_info_shard3_replica_n12 | sdw1:18987_solr | http://sdw1:18987/solr | t         | f           | f
          3 | traffic.public.people_info | shard3     | active      | core_node16  | active        | traffic.public.people_info_shard3_replica_n14 | sdw1:18988_solr | http://sdw1:18988/solr | f         | f           | f
(8 rows)

4、自定义索引(可选)

gptext-config edit -i traffic.public.people_info -f managed-schema -e vim
	-i 指定修改的索引名
	-f 修改的文件名
	-e 打开文件的方式(可选)

在这里插入图片描述
stored:是否进行存储,默认false
type:分词的规则,默认text_intl

5、填充索引

traffic=# SELECT * FROM gptext.index(TABLE(SELECT * FROM "public".people_info),'traffic.public.people_info');
 dbid | num_docs 
------+----------
    5 |     3751
    3 |     3749
    4 |     3750
    2 |     3750
(4 rows)

6、提交索引

traffic=# SELECT * FROM gptext.commit_index('traffic.public.people_info');
 commit_index 
--------------
 t
(1 row)

7、查询

对默认字段(name)单独查询:

SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'traffic.public.people_info','曹魏柏', null,'rows=2') ORDER BY score desc;
id    | score
------+----------
4     |    10.359904		
124   |	3.8320003		
50    |	3.3604743		
10    |	3.2373455		
19    |	3.227208		
3	  |    3.227208		
17    |	3.2214155		
9     |	3.2214155		

关联查询:

SELECT a.id,a.name,a.birth_place,q.score FROM people_info a, gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'traffic.public.people_info',
       '曹魏柏', null,'rows=2') q WHERE a.id = q.id::INT ORDER BY q.score DESC;
id  name    birth_place         score
 4	曹魏柏	浙江省宁波市镇海区	10.359904
124	柏蒋唐	广西省北海市北海	    3.8320003
50	曹秦金	河南省洛阳市吉利区	3.3604743
10	酆豫魏	云南省昆明市东川区	3.2373455
19	计明曹	四川省雅安市雨城区	3.227208
3	苏谢曹	广东省茂名市高州市	3.227208
17	云凤曹	黑龙江省牡丹江市阳明区	3.2214155
9	曹接郎	福建省厦门市海沧区	3.2214155

查看分词效果:

SELECT * FROM gptext.analyzer('traffic.public.people_info','query','text_zhsmart','曹魏柏');
class					tokens
HMMChineseTokenizer		{{"曹"},{"魏"},{"柏"}}
LowerCaseFilter			{{"曹"},{"魏"},{"柏"}}
StopFilter				{{"曹"},{"魏"},{"柏"}}
SetKeywordMarkerFilter	{{"曹"},{"魏"},{"柏"}}
PorterStemFilter		{{"曹"},{"魏"},{"柏"}}

指定字段查询

SELECT a.id,a.name,a.birth_place,q.score FROM people_info a, gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'traffic.public.people_info',
       'birth_place:湖南省郴州市北湖区', null) q WHERE a.id = q.id::INT ORDER BY q.score DESC;
    id      name    birth_place         score
    1095	赵接冯	湖南省郴州市北湖区	20.21499
    847	    云孔奚	湖南省郴州市北湖区	19.68612
    10043	毕邹项	湖南省郴州市北湖区	19.68612
    13056	郝毛豫	湖南省郴州市北湖区	19.52099
    924	    皮金屈	湖南省郴州市北湖区	19.52099
    10718	齐豫茅	湖南省郴州市北湖区	19.52099
    9896	史酆袁	湖南省郴州市永兴县	10.6771965
    12964	邹施柏	湖南省郴州市永兴县	10.6771965
    14210	潘狄飞	湖南省郴州市桂东县	10.159038
    14861	金豫王	湖南省郴州市桂东县	10.159038
    10692	汪贝王	湖南省郴州市郴州	10.159038
    8877	花葛郑	湖南省郴州市桂东县	10.159038

进行关联查询时,会消耗大量的资源去原表进行关联,影响查询效率,可直接返回存储字段。

SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'traffic.public.people_info',
       'birth_place:重庆渝北', null,'fl=id,name') ORDER BY score desc
id      score                   rf
5298	13.248789		{"columnValue":[{"name":"id","value":"5298"},{"name":"name","value":"葛臧秦"}]}
1787	13.150803		{"columnValue":[{"name":"id","value":"1787"},{"name":"name","value":"卫云唐"}]}
8774	9.434903		{"columnValue":[{"name":"id","value":"8774"},{"name":"name","value":"史三严"}]}
14795	9.434903		{"columnValue":[{"name":"id","value":"14795"},{"name":"name","value":"严地分"}]}
  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值