- Install and deploy the Key-Value Store Indexer and Solr services on CDH.
- Enable HBase replication
    # Enable HBase replication in CDH
    hbase.replication=true
    # For a table that already exists, open the HBase shell and change the column family attribute
    disable 'hbase_solr'
    alter 'hbase_solr', {NAME => 'cf', REPLICATION_SCOPE => 1}
    enable 'hbase_solr'
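When several existing tables need replication, the three HBase shell statements above can be generated per table instead of typed by hand. The following is a minimal sketch: the function name `replication_statements` is hypothetical, and the column family `cf` and table `hbase_solr` are simply the ones used in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the HBase shell statements that set
# REPLICATION_SCOPE => 1 on one column family of each given table.
# Usage: replication_statements <column_family> <table>...
replication_statements() {
  local cf="$1"
  shift
  local table
  for table in "$@"; do
    printf "disable '%s'\n" "$table"
    printf "alter '%s', {NAME => '%s', REPLICATION_SCOPE => 1}\n" "$table" "$cf"
    printf "enable '%s'\n" "$table"
  done
}

# Print the statements for review; nothing is executed against HBase here.
replication_statements cf hbase_solr
```

Once reviewed, the output can be fed to the HBase shell, for example by piping it into `hbase shell`; the exact invocation depends on the cluster setup.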
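- Generate the configuration directory on the machine where Solr runs
    solrctl instancedir --generate /opt/solr/hbase_packet_indexer
In the conf directory of the generated configuration, edit schema.xml and add the following field below the id field. Its name is the Solr index field that the VIN stored in HBase is mapped to.
    <field name="index_packet_vin" type="string" indexed="true" stored="true" required="true" multiValued="false" />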
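Because the morphline's outputField must match a field declared in schema.xml, a quick check after editing can catch a mistyped name early. This is only a sketch: `schema_has_field` is a hypothetical helper, and the schema path assumes the instance directory generated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if <schema_file> declares a <field> with the given name.
# Usage: schema_has_field <schema_file> <field_name>
schema_has_field() {
  local schema_file="$1" field_name="$2"
  grep -q "<field[[:space:]][^>]*name=\"${field_name}\"" "$schema_file"
}

# Assumed location of the schema inside the generated instance directory.
schema=/opt/solr/hbase_packet_indexer/conf/schema.xml
if [ -f "$schema" ]; then
  if schema_has_field "$schema" index_packet_vin; then
    echo "index_packet_vin is declared"
  else
    echo "index_packet_vin is missing" >&2
  fi
fi
```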
- Initialize the instance directory for the collection
    solrctl instancedir --create packet_index /opt/solr/hbase_packet_indexer
    # Command to delete an instance directory
    solrctl instancedir --delete packet_17691_index
- Create the collection
    solrctl collection --create packet_index -s 1 -c packet_index
    # Delete a collection
    solrctl collection --delete packet_17691_index
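- Add the morphlines configuration in CDH
In the Morphlines File setting of the Key-Value Store Indexer service, add the following configuration. Each table gets its own morphline, and the tables are told apart by morphlineId.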
    SOLR_LOCATOR : {
      # collection : realinfo_17691_index
      zkHost : "$ZK_HOST"
    }

    morphlines : [
      {
        id : morphline_realinfo_17691
        importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]
        commands : [
          {
            extractHBaseCells {
              mappings : [
                {
                  inputColumn : "cf:VIN"
                  outputField : "index_realinfo_17691_vin"
                  type : string
                  source : value
                }
              ]
            }
          }
          {
            convertTimestamp {
              field : createTime
              inputFormats : ["yyyy-MM-dd HH:mm:ss"]
              inputTimezone : Asia/Shanghai
              outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
              outputTimezone : Asia/Shanghai
            }
          }
          { logDebug { format : "output record: {}", args : ["@{}"] } }
        ]
      },
      {
        id : morphline_realinfo
        importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]
        commands : [
          {
            extractHBaseCells {
              mappings : [
                {
                  inputColumn : "cf:VIN"
                  outputField : "index_realinfo_vin"
                  type : string
                  source : value
                }
              ]
            }
          }
          {
            convertTimestamp {
              field : createTime
              inputFormats : ["yyyy-MM-dd HH:mm:ss"]
              inputTimezone : Asia/Shanghai
              outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
              outputTimezone : Asia/Shanghai
            }
          }
          { logDebug { format : "output record: {}", args : ["@{}"] } }
        ]
      },
      {
        id : morphline_packet
        importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]
        commands : [
          {
            extractHBaseCells {
              mappings : [
                {
                  inputColumn : "cf:VIN"
                  outputField : "index_packet_vin"
                  type : string
                  source : value
                }
              ]
            }
          }
          {
            convertTimestamp {
              field : createTime
              inputFormats : ["yyyy-MM-dd HH:mm:ss"]
              inputTimezone : Asia/Shanghai
              outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
              outputTimezone : Asia/Shanghai
            }
          }
          { logDebug { format : "output record: {}", args : ["@{}"] } }
        ]
      },
      {
        id : morphline_packet_17691
        importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]
        commands : [
          {
            extractHBaseCells {
              mappings : [
                {
                  inputColumn : "cf:VIN"
                  outputField : "index_packet_17691_vin"
                  type : string
                  source : value
                }
              ]
            }
          }
          {
            convertTimestamp {
              field : createTime
              inputFormats : ["yyyy-MM-dd HH:mm:ss"]
              inputTimezone : Asia/Shanghai
              outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
              outputTimezone : Asia/Shanghai
            }
          }
          { logDebug { format : "output record: {}", args : ["@{}"] } }
        ]
      }
    ]
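The four morphlines above differ only in their id and in the outputField name, so the file can be generated from a list of table suffixes rather than copied by hand. The sketch below assumes every table uses the same `cf:VIN` column and the same timestamp conversion as in the configuration above; `morphline_entry` and `morphlines_conf` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one morphline entry for the table suffix <name>,
# following the pattern above (id morphline_<name>, field index_<name>_vin).
morphline_entry() {
  local name="$1"
  cat <<EOF
  {
    id : morphline_${name}
    importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]
    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "cf:VIN"
              outputField : "index_${name}_vin"
              type : string
              source : value
            }
          ]
        }
      }
      {
        convertTimestamp {
          field : createTime
          inputFormats : ["yyyy-MM-dd HH:mm:ss"]
          inputTimezone : Asia/Shanghai
          outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          outputTimezone : Asia/Shanghai
        }
      }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
    ]
  }
EOF
}

# Hypothetical helper: print a complete morphlines file for the given suffixes.
morphlines_conf() {
  # Single quotes keep $ZK_HOST literal, as in the original configuration.
  printf 'SOLR_LOCATOR : {\n  zkHost : "$ZK_HOST"\n}\n\nmorphlines : [\n'
  local i=0 name
  for name in "$@"; do
    i=$((i + 1))
    # Command substitution strips the trailing newline, so a comma can follow "}".
    printf '%s' "$(morphline_entry "$name")"
    if [ "$i" -lt "$#" ]; then printf ',\n'; else printf '\n'; fi
  done
  printf ']\n'
}

# Print the configuration for the four suffixes used in this article.
morphlines_conf realinfo_17691 realinfo packet packet_17691
```

The printed text can be reviewed and then pasted into the Morphlines File setting; adding a table then means adding one suffix to the last line.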
- Edit the morphlines files
The morphline-hbase-mapper.xml file:
    <?xml version="1.0"?>
    <indexer table="#HBase table name" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">
      <!-- The relative or absolute path on the local file system to the morphline configuration file. -->
      <!-- Use relative path "morphlines.conf" for morphlines managed by Cloudera Manager -->
      <param name="morphlineFile" value="morphlines.conf"/>
      <!-- The optional morphlineId identifies a morphline if there are multiple morphlines in morphlines.conf -->
      <param name="morphlineId" value="morphline_packet"/>
    </indexer>
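With several tables, each one needs its own mapper file that differs only in the table attribute and the morphlineId. A minimal sketch of generating it follows; `mapper_xml` is a hypothetical helper, and pairing the table `hbase_solr` with `morphline_packet` is an assumption made only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a morphline-hbase-mapper.xml for one HBase table.
# Usage: mapper_xml <hbase_table> <morphline_id>
mapper_xml() {
  local table="$1" morphline_id="$2"
  cat <<EOF
<?xml version="1.0"?>
<indexer table="${table}" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">
  <param name="morphlineFile" value="morphlines.conf"/>
  <param name="morphlineId" value="${morphline_id}"/>
</indexer>
EOF
}

# Print the mapper for an assumed table/morphline pair; redirect to a file as needed.
mapper_xml hbase_solr morphline_packet
```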
The morphlines.conf file:
    SOLR_LOCATOR : {
      zkHost : "$ZK_HOST"
    }

    morphlines : [
      {
        id : morphline_packet
        importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]
        commands : [
          {
            extractHBaseCells {
              mappings : [
                {
                  inputColumn : "cf:VIN"
                  outputField : "index_packet_vin"
                  type : string
                  source : value
                }
              ]
            }
          }
          {
            convertTimestamp {
              field : createTime
              inputFormats : ["yyyy-MM-dd HH:mm:ss"]
              inputTimezone : Asia/Shanghai
              outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
              outputTimezone : Asia/Shanghai
            }
          }
          { logDebug { format : "output record: {}", args : ["@{}"] } }
        ]
      }
    ]
- Create the indexer
    hbase-indexer add-indexer -n packet_indexer -c /opt/solr/hbase_packet_indexer/morphline-hbase-mapper.xml -z xxx.xxx.xxx.xx -cp solr.zk=xxx.xxx.xxx.xx:xxxx/solr -cp solr.collection=packet_index
    # Delete an hbase-indexer
    hbase-indexer delete-indexer --name 'realinfo_17691_indexer' -z 172.25.8.78
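The commands in the steps above follow one naming pattern per table: instance directory `/opt/solr/hbase_<name>_indexer`, collection `<name>_index`, indexer `<name>_indexer`. Assuming that pattern holds for every table, the sketch below prints the command sequence for one table without running anything. `indexer_setup_commands` is a hypothetical helper, and the ZooKeeper addresses are left as placeholders to be supplied by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the setup commands for one table.
# Usage: indexer_setup_commands <name> <zk_host> <solr_zk>
indexer_setup_commands() {
  local name="$1" zk_host="$2" solr_zk="$3"
  local dir="/opt/solr/hbase_${name}_indexer"
  local collection="${name}_index"
  # Note: schema.xml and the morphline files must be edited in ${dir}
  # between the --generate step and the --create steps.
  cat <<EOF
solrctl instancedir --generate ${dir}
solrctl instancedir --create ${collection} ${dir}
solrctl collection --create ${collection} -s 1 -c ${collection}
hbase-indexer add-indexer -n ${name}_indexer -c ${dir}/morphline-hbase-mapper.xml -z ${zk_host} -cp solr.zk=${solr_zk} -cp solr.collection=${collection}
EOF
}

# ZK_HOST and SOLR_ZK are assumed environment variables; placeholders are printed if unset.
indexer_setup_commands packet "${ZK_HOST:-<zk-host>}" "${SOLR_ZK:-<zk-host>:<port>/solr}"
```

Printing the commands first makes it easy to compare them across tables before running them one at a time.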
- Map the historical data into Solr
    sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/jars/hbase-indexer-mr-1.5-cdh5.16.1-job.jar \
      --conf /etc/hbase/conf/hbase-site.xml \
      --zk-host 172.25.8.78:2181/solr \
      --hbase-indexer-file /opt/solr/hbase_realinfo_17691_indexer/morphline-hbase-mapper.xml \
      --collection realinfo_17691_index \
      --morphline-file /opt/solr/hbase_realinfo_17691_indexer/morphlines.conf \
      --hbase-indexer-name realinfo_17691_indexer \
      --reducers 0
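The job jar name above embeds the CDH version, so it changes between releases. The sketch below looks the jar up by pattern and prints the batch indexing command for one table instead of hard-coding the version. Both helper names are hypothetical, the directory layout is taken from the command above, and the ZooKeeper address is a placeholder unless `SOLR_ZK` (an assumed variable) is set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the first hbase-indexer MapReduce job jar in <dir>.
find_indexer_job_jar() {
  local dir="$1" jar
  for jar in "$dir"/hbase-indexer-mr-*-job.jar; do
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: print (not run) the batch indexing command for one table.
# Usage: batch_index_command <name> <job_jar> <solr_zk>
batch_index_command() {
  local name="$1" job_jar="$2" solr_zk="$3"
  local dir="/opt/solr/hbase_${name}_indexer"
  echo "sudo -u hdfs hadoop jar ${job_jar}" \
    "--conf /etc/hbase/conf/hbase-site.xml" \
    "--zk-host ${solr_zk}" \
    "--hbase-indexer-file ${dir}/morphline-hbase-mapper.xml" \
    "--collection ${name}_index" \
    "--morphline-file ${dir}/morphlines.conf" \
    "--hbase-indexer-name ${name}_indexer" \
    "--reducers 0"
}

jar_dir=/opt/cloudera/parcels/CDH/jars
if job_jar="$(find_indexer_job_jar "$jar_dir")"; then
  batch_index_command realinfo_17691 "$job_jar" "${SOLR_ZK:-<zk-host>:2181/solr}"
else
  echo "no hbase-indexer-mr job jar found in $jar_dir" >&2
fi
```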
CDH-HBase: Creating Solr-Based Secondary Indexes for Multiple Tables