前面已经讲解了greenplum与gptext的安装方式,下面对gptext进行使用进行讲解。
greenplum的安装:https://blog.csdn.net/weixin_43315211/article/details/87934840
gptext的安装:https://blog.csdn.net/weixin_43315211/article/details/86488751
1、示例表结构
traffic=# select * from people_info limit 5;
id | id_number | name | birthday | gender | phone | birth_place
----+--------------------+--------+----------+--------+-------------+--------------------
3 | 440981186302229802 | 苏谢曹 | 18630222 | M | 15594777587 | 广东省茂名市高州市
7 | 220105188308148382 | 毕蒋华 | 18830814 | M | 18852544751 | 吉林省长春市二道区
11 | 450327191912085209 | 柳金屈 | 19191208 | M | 15253025503 | 广西省桂林市灌阳县
15 | 140902190406219970 | 任苏谈 | 19040621 | F | 13763393746 | 山西省忻州市忻府区
19 | 511802197904181641 | 计明曹 | 19790418 | M | 15745568282 | 四川省雅安市雨城区
(5 rows)
2、创建空的index
gptext.create_index(<schema_name>, <table_name>, <id_col_name>,
<def_search_col_name> [, <if_check_id_uniqueness>]);
SELECT * FROM gptext.create_index('public','people_info','id','name');
3、查询索引状态
traffic=# SELECT * FROM gptext.index_status('traffic.public.people_info');
content_id | index_name | shard_name | shard_state | replica_name | replica_state | core | node_name | base_url | is_leader | partitioned | extern
al_index
------------+----------------------------+------------+-------------+--------------+---------------+-----------------------------------------------+-----------------+------------------------+-----------+-------------+-------
---------
0 | traffic.public.people_info | shard0 | active | core_node3 | active | traffic.public.people_info_shard0_replica_n1 | sdw2:18987_solr | http://sdw2:18987/solr | f | f | f
0 | traffic.public.people_info | shard0 | active | core_node5 | active | traffic.public.people_info_shard0_replica_n2 | sdw2:18988_solr | http://sdw2:18988/solr | t | f | f
1 | traffic.public.people_info | shard1 | active | core_node7 | active | traffic.public.people_info_shard1_replica_n4 | sdw1:18987_solr | http://sdw1:18987/solr | t | f | f
1 | traffic.public.people_info | shard1 | active | core_node9 | active | traffic.public.people_info_shard1_replica_n6 | sdw1:18988_solr | http://sdw1:18988/solr | f | f | f
2 | traffic.public.people_info | shard2 | active | core_node11 | active | traffic.public.people_info_shard2_replica_n8 | sdw2:18987_solr | http://sdw2:18987/solr | f | f | f
2 | traffic.public.people_info | shard2 | active | core_node13 | active | traffic.public.people_info_shard2_replica_n10 | sdw2:18988_solr | http://sdw2:18988/solr | t | f | f
3 | traffic.public.people_info | shard3 | active | core_node15 | active | traffic.public.people_info_shard3_replica_n12 | sdw1:18987_solr | http://sdw1:18987/solr | t | f | f
3 | traffic.public.people_info | shard3 | active | core_node16 | active | traffic.public.people_info_shard3_replica_n14 | sdw1:18988_solr | http://sdw1:18988/solr | f | f | f
(8 rows)
4、自定义索引(可选)
gptext-config edit -i traffic.public.people_info -f managed-schema -e vim
-i 指定修改的索引名
-f 修改的文件名
-e 打开文件的方式(可选)
stored:是否进行存储,默认false
type:分词的规则,默认text_intl
5、填充索引
traffic=# SELECT * FROM gptext.index(TABLE(SELECT * FROM "public".people_info),'traffic.public.people_info');
dbid | num_docs
------+----------
5 | 3751
3 | 3749
4 | 3750
2 | 3750
(4 rows)
6、提交索引
traffic=# SELECT * FROM gptext.commit_index('traffic.public.people_info');
commit_index
--------------
t
(1 row)
7、查询
对默认字段(name)单独查询:
SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'traffic.public.people_info','曹魏柏', null,'rows=2') ORDER BY score desc;
id | score
------+----------
4 | 10.359904
124 | 3.8320003
50 | 3.3604743
10 | 3.2373455
19 | 3.227208
3 | 3.227208
17 | 3.2214155
9 | 3.2214155
关联查询:
SELECT a.id,a.name,a.birth_place,q.score FROM people_info a, gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'traffic.public.people_info',
'曹魏柏', null,'rows=2') q WHERE a.id = q.id::INT ORDER BY q.score DESC;
id name birth_place score
4 曹魏柏 浙江省宁波市镇海区 10.359904
124 柏蒋唐 广西省北海市北海 3.8320003
50 曹秦金 河南省洛阳市吉利区 3.3604743
10 酆豫魏 云南省昆明市东川区 3.2373455
19 计明曹 四川省雅安市雨城区 3.227208
3 苏谢曹 广东省茂名市高州市 3.227208
17 云凤曹 黑龙江省牡丹江市阳明区 3.2214155
9 曹接郎 福建省厦门市海沧区 3.2214155
查看分词效果:
SELECT * FROM gptext.analyzer('traffic.public.people_info','query','text_zhsmart','曹魏柏');
class tokens
HMMChineseTokenizer {{"曹"},{"魏"},{"柏"}}
LowerCaseFilter {{"曹"},{"魏"},{"柏"}}
StopFilter {{"曹"},{"魏"},{"柏"}}
SetKeywordMarkerFilter {{"曹"},{"魏"},{"柏"}}
PorterStemFilter {{"曹"},{"魏"},{"柏"}}
指定字段查询
SELECT a.id,a.name,a.birth_place,q.score FROM people_info a, gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'traffic.public.people_info',
'birth_place:湖南省郴州市北湖区', null) q WHERE a.id = q.id::INT ORDER BY q.score DESC;
id name birth_place score
1095 赵接冯 湖南省郴州市北湖区 20.21499
847 云孔奚 湖南省郴州市北湖区 19.68612
10043 毕邹项 湖南省郴州市北湖区 19.68612
13056 郝毛豫 湖南省郴州市北湖区 19.52099
924 皮金屈 湖南省郴州市北湖区 19.52099
10718 齐豫茅 湖南省郴州市北湖区 19.52099
9896 史酆袁 湖南省郴州市永兴县 10.6771965
12964 邹施柏 湖南省郴州市永兴县 10.6771965
14210 潘狄飞 湖南省郴州市桂东县 10.159038
14861 金豫王 湖南省郴州市桂东县 10.159038
10692 汪贝王 湖南省郴州市郴州 10.159038
8877 花葛郑 湖南省郴州市桂东县 10.159038
进行关联查询时,会消耗大量的资源去原表进行关联,影响查询效率,可直接返回存储字段。
SELECT * FROM gptext.search(TABLE(SELECT 1 SCATTER BY 1), 'traffic.public.people_info',
'birth_place:重庆渝北', null,'fl=id,name') ORDER BY score desc
id score rf
5298 13.248789 {"columnValue":[{"name":"id","value":"5298"},{"name":"name","value":"葛臧秦"}]}
1787 13.150803 {"columnValue":[{"name":"id","value":"1787"},{"name":"name","value":"卫云唐"}]}
8774 9.434903 {"columnValue":[{"name":"id","value":"8774"},{"name":"name","value":"史三严"}]}
14795 9.434903 {"columnValue":[{"name":"id","value":"14795"},{"name":"name","value":"严地分"}]}