First pull the ES image; I used version 6.7.0:
docker pull elasticsearch:6.7.0
Start the ES container:
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.7.0
If the container has already been created, just restart it (893 is the first few characters of my container ID):
docker restart 893
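Open a bash shell inside the ES container and install the smartcn analyzer with the elasticsearch-plugin command. After the installation succeeds, restart ES so that the plugin takes effect: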
docker exec -ti 893 /bin/bash
./bin/elasticsearch-plugin install analysis-smartcn
After the restart, the ES log should show the smartcn plugin being loaded.
You can then test the analyzer.
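One way to run that test is to call the `_analyze` API with `analyzer` set to `smartcn`. The following is a minimal sketch using only the JDK; the class name, the sample sentence, and the base URL `http://localhost:9200` (taken from the port mapping above) are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class SmartcnAnalyzeCheck {

    // Builds the JSON body for the _analyze API. Only backslashes and quotes
    // are escaped, which is enough for plain sample sentences.
    static String buildBody(String analyzer, String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"analyzer\":\"" + analyzer + "\",\"text\":\"" + escaped + "\"}";
    }

    // POSTs the body to <baseUrl>/_analyze and returns the raw JSON response.
    static String analyze(String baseUrl, String analyzer, String text) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(baseUrl + "/_analyze").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(buildBody(analyzer, text).getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    public static void main(String[] args) {
        try {
            // Arbitrary sample sentence; the response lists the tokens smartcn produced.
            System.out.println(analyze("http://localhost:9200", "smartcn", "我爱北京天安门"));
        } catch (IOException e) {
            System.out.println("Elasticsearch not reachable: " + e.getMessage());
        }
    }
}
```

If the plugin is loaded, the response should contain a `tokens` array with multi-character Chinese words rather than one token per character; if the plugin is missing, ES rejects the request because it does not know the analyzer.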
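Download logstash from the official site, in the same version as ES (6.7.0). Since 5.x, logstash ships with the logstash-input-jdbc plugin; for the detailed steps, see the article "Mysql数据同步Elasticsearch方案总结" (a summary of approaches for syncing MySQL data to Elasticsearch). Once the plugin is available, the remaining work is writing the configuration files, mainly jdbc.conf and jdbc.sql.
jdbc.conf looks like this (make sure the MySQL driver jar is present under lib):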
input {
  stdin {}
  jdbc {
    jdbc_driver_library => "E:/software/logstash-6.7.0/lib/mysql-connector-java-5.1.30.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://ip_address:3306/wei"
    jdbc_user => "heart"
    jdbc_password => "******"
    statement_filepath => "E:/software/logstash-6.7.0/config/jdbc.sql"
    type => "es"
  }
}
output {
  elasticsearch {
    index => "search_all"
    hosts => "localhost:9200"
    template => "E:/software/logstash-6.7.0/config/index_template.json"
  }
  stdout {
    codec => json_lines
  }
}
jdbc.sql:
select distinct PERSON_ID AS PATIENT_ID, CONCEPT_NAME as txt from condition_occurrence co INNER JOIN CONCEPT C ON co.CONDITION_CONCEPT_ID = C.CONCEPT_ID
UNION ALL
select distinct PERSON_ID AS PATIENT_ID, CONCEPT_NAME as txt from drug_exposure d INNER JOIN CONCEPT C ON d.DRUG_CONCEPT_ID = C.CONCEPT_ID
UNION ALL
select distinct PERSON_ID AS PATIENT_ID, CONCEPT_NAME as txt from measurement m INNER JOIN CONCEPT C ON m.MEASUREMENT_CONCEPT_ID = C.CONCEPT_ID
UNION ALL
select distinct PERSON_ID AS PATIENT_ID, CONCEPT_NAME as txt from procedure_occurrence p INNER JOIN CONCEPT C ON p.PROCEDURE_CONCEPT_ID = C.CONCEPT_ID
UNION ALL
select PERSON_ID AS PATIENT_ID, NOTE_TEXT as txt from NOTE
index_template.json:
{
  "template": "search-all",
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "smartcn"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "txt": {
          "type": "text",
          "analyzer": "smartcn"
        }
      }
    }
  }
}
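With the configuration in place I ran the import to create the index, but smartcn had not become the index's analyzer. (Note that the template pattern "search-all" uses a hyphen while the index is named search_all with an underscore, so the pattern does not match the index; this may be why the template was never applied.) I then tried setting the default analyzer in the ES configuration file,
that is, by adding this line to ES's config/elasticsearch.yml: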
index.analysis.analyzer.default.type: "smartcn"
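That did not work either. It turns out that since ES 5.x, index-level settings such as the default analyzer can no longer be set in the configuration file (elasticsearch.yml); they have to be set per index through the API.
Create the index search_all with a PUT request whose body is: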
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "smartcn"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "txt": {
          "type": "text",
          "analyzer": "smartcn"
        }
      }
    }
  }
}
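The request that carries this body is not shown above. As a minimal sketch, assuming ES is reachable at `http://localhost:9200` and the index is named `search_all` (both taken from jdbc.conf), the body could be sent with a plain JDK HTTP call. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CreateSearchAllIndex {

    // Same settings and mappings as the request body above, in compact form.
    static final String BODY =
            "{\"settings\":{\"analysis\":{\"analyzer\":{\"default\":{\"type\":\"smartcn\"}}}},"
          + "\"mappings\":{\"doc\":{\"properties\":{\"txt\":{\"type\":\"text\",\"analyzer\":\"smartcn\"}}}}}";

    // The create-index endpoint is simply <baseUrl>/<index>.
    static String indexUrl(String baseUrl, String index) {
        return baseUrl + "/" + index;
    }

    // Sends PUT <baseUrl>/<index> with BODY and returns the HTTP status code.
    static int createIndex(String baseUrl, String index) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(indexUrl(baseUrl, index)).openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(BODY.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        try {
            int status = createIndex("http://localhost:9200", "search_all");
            System.out.println("HTTP status: " + status);
        } catch (IOException e) {
            System.out.println("Elasticsearch not reachable: " + e.getMessage());
        }
    }
}
```

A 200 status means the index was created. If an index with that name already exists, ES answers with a 400 error, in which case the old index has to be deleted first.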
After the index has been created with its default analyzer, you can check in a browser whether the setting really took effect.
Once the default analyzer is in effect, the template attribute in jdbc.conf is no longer needed (and it did not work anyway). The new jdbc.conf is therefore:
input {
  stdin {}
  jdbc {
    jdbc_driver_library => "E:/software/logstash-6.7.0/lib/mysql-connector-java-5.1.30.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://ip_address:3306/wei"
    jdbc_user => "heart"
    jdbc_password => "******"
    statement_filepath => "E:/software/logstash-6.7.0/config/jdbc.sql"
    type => "es"
  }
}
output {
  elasticsearch {
    index => "search_all"
    hosts => "localhost:9200"
  }
  stdout {
    codec => json_lines
  }
}
Now run logstash to load the data into the empty index search_all:
E:\software\logstash-6.7.0\bin>logstash -f ../config/jdbc.conf --config.reload.automatic
If nothing goes wrong, the index search_all is ready to be searched.
------------------Next: querying ES from SpringBoot----------------
The very first query after SpringBoot started failed with this error:
10:38:36 ERROR [c.h.r.person.controller.GlobalExceptionHandler] - java.lang.IllegalStateException: availableProcessors is already set to [4], rejecting [4]
at io.netty.util.NettyRuntime$AvailableProcessorsHolder.setAvailableProcessors(NettyRuntime.java:51)
at io.netty.util.NettyRuntime.setAvailableProcessors(NettyRuntime.java:87)
at org.elasticsearch.transport.netty4.Netty4Utils.setAvailableProcessors(Netty4Utils.java:79)
at org.elasticsearch.transport.netty4.Netty4Transport.<init>(Netty4Transport.java:112)
at org.elasticsearch.transport.Netty4Plugin.lambda$getTransports$0(Netty4Plugin.java:85)
at org.elasticsearch.client.transport.TransportClient.buildTemplate(TransportClient.java:192)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:288)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:128)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:114)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:104)
at com.hebta.retrieval.person.util.ElasticSearchClientBuilder.<init>(ElasticSearchClientBuilder.java:23)
This error is easy to look up online; the fix is to set a system property in the SpringBoot startup class.
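The property in question is `es.set.netty.runtime.available.processors`. Setting it to `false` before any Elasticsearch client is constructed stops `Netty4Utils` from calling `setAvailableProcessors` a second time. The sketch below uses plain Java so that it runs without Spring on the classpath; the class name is hypothetical, and in the real startup class the call would sit just before `SpringApplication.run(...)`.

```java
public class NettyProcessorsFix {

    // System property read by Elasticsearch's Netty4Utils.
    static final String PROPERTY = "es.set.netty.runtime.available.processors";

    // Must run before the first TransportClient is created.
    static void apply() {
        System.setProperty(PROPERTY, "false");
    }

    public static void main(String[] args) {
        apply();
        // In a SpringBoot startup class, SpringApplication.run(...) would follow here.
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```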
The ES dependencies also have to be added to pom.xml.
Test:
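The application in this post talks to ES through TransportClient, and its query code is not reproduced here. As a stand-in that needs no ES client library, the sketch below sends a match query on the `txt` field over the REST port. The class name and the search term are made up for illustration, and the base URL and index name are the ones from jdbc.conf.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class SearchAllQuery {

    // Builds a match query on one field. ES analyzes the query text with the
    // analyzer configured for that field (smartcn for txt).
    static String buildMatchQuery(String field, String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\":{\"match\":{\"" + field + "\":\"" + escaped + "\"}}}";
    }

    // POSTs the query to <baseUrl>/<index>/_search and returns the raw JSON response.
    static String search(String baseUrl, String index, String body) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(baseUrl + "/" + index + "/_search").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    public static void main(String[] args) {
        try {
            // Hypothetical search term; replace it with a word that occurs in your data.
            String body = buildMatchQuery("txt", "高血压");
            System.out.println(search("http://localhost:9200", "search_all", body));
        } catch (IOException e) {
            System.out.println("Elasticsearch not reachable: " + e.getMessage());
        }
    }
}
```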
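Summary:
1. Download ES and logstash in the same version; ES needs the smartcn plugin installed
2. Create an empty ES index with a PUT request, setting the default analyzer
3. Write the configuration files required by logstash's logstash-input-jdbc plugin and put them in the config directory
4. Run the logstash command to load the data into the index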