Elasticsearch Installation, Basic Usage, and Data Synchronization

kogwang

Article link: https://blog.csdn.net/weixin_42662249/article/details/103709734

I only started using Elasticsearch recently, so if you spot any mistakes, please point them out and I will correct them promptly.

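Before using Elasticsearch, check that a JDK is installed on this machine (Elasticsearch 7.x bundles its own JDK, but Logstash, used later for data sync, needs one). Command: java -version

# Output like the following means the JDK is installed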
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

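Download Elasticsearch. Here we use the Linux version.

Official site: https://www.elastic.co/cn/products/ Download page: https://www.elastic.co/cn/downloads/elasticsearch
The downloaded file is elasticsearch-7.5.1-linux-x86_64.tar.gz. Extract it: tar -zxvf elasticsearch-7.5.1-linux-x86_64.tar.gz
Rename the extracted directory to ES, cd into it, and run ./bin/elasticsearch. Watch the log output for the keywords initialized, starting ...
When started appears, the node has started successfully. Open localhost:9200 in a local browser; if it returns JSON data, Elasticsearch is running. Next comes the configuration.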
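The browser check can also be scripted. The following is a minimal Python sketch, assuming the node listens on http://localhost:9200 without authentication. The helper names summarize_root and fetch_root are made up for this illustration, and only the standard library is used.

```python
import json
import urllib.request


def summarize_root(body: str) -> str:
    """Turn the JSON returned by GET / into a one-line summary."""
    info = json.loads(body)
    version = info.get("version", {}).get("number", "unknown")
    cluster = info.get("cluster_name", "unknown")
    return f"cluster={cluster} version={version}"


def fetch_root(url: str = "http://localhost:9200", timeout: float = 5.0) -> str:
    """Fetch the root endpoint; raises URLError if the node is not reachable."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Running print(summarize_root(fetch_root())) on the Elasticsearch host should print the cluster name and version once the node has started; a connection error means it is not up yet (or is bound to a different address).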

Configure remote access: vim config/elasticsearch.yml (I am working on a cloud server; if you are testing on your own machine you can skip this)

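# Uncomment and change to this machine's IP address; 127.0.0.1 only accepts local connections
network.host: 127.0.0.1   # your_ip
# Uncomment this; the names listed must match the node.name of the master-eligible node(s)
# (node.name is set to master further down)
cluster.initial_master_nodes: ["master"]

# Add at the end of the file; note the space required after each colon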
http.cors.enabled: true
http.cors.allow-origin: "*"

To make it easier to browse the data, I use elasticsearch-head.

Search GitHub for elasticsearch-head and download mobz/elasticsearch-head
cd elasticsearch-head
npm install
npm run start
open http://localhost:9100/
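Connect to your_ip:9200

Access your_ip:9200 remotely; if it returns JSON data, the service is running normally. Continue configuring elasticsearch.yml.

# Elasticsearch supports distributed clusters; for easier management, configure your master ES node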
cluster.name: esserver
node.name: master
node.master: true

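The following is cluster configuration; skip it if you do not need a cluster.

# Cluster configuration for the secondary ES node; set this in that node's own config/elasticsearch.yml
network.host: 127.0.0.1   # again, your IP

http.port: 8200           # must not clash with the master node's port

cluster.initial_master_nodes: ["master"]   # uncomment; must name the master-eligible node(s) by node.name

# Point this node at the master ES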
cluster.name: esserver
node.name: slave1
# In 7.x this setting is deprecated in favour of discovery.seed_hosts, which takes the same list
discovery.zen.ping.unicast.hosts: ["192.168.0.52"]
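Once both nodes are running, one way to confirm that the secondary node actually joined is to query the cluster health endpoint. A minimal Python sketch, assuming the master answers on http://127.0.0.1:9200 and that two nodes are expected; cluster_formed and fetch_health are hypothetical helper names, not part of any Elasticsearch client.

```python
import json
import urllib.request


def cluster_formed(health_body: str, expected_nodes: int = 2) -> bool:
    """Return True if the health response reports enough nodes and a non-red status."""
    health = json.loads(health_body)
    return (
        health.get("number_of_nodes", 0) >= expected_nodes
        and health.get("status") in ("green", "yellow")
    )


def fetch_health(base_url: str = "http://127.0.0.1:9200", timeout: float = 5.0) -> str:
    """Fetch GET /_cluster/health from the given node as raw JSON text."""
    url = f"{base_url}/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

If cluster_formed(fetch_health()) stays False, a common cause is that cluster.name differs between the nodes or that the secondary node cannot reach the master's transport port.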

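That covers installation and configuration. Next comes basic usage.

Importing data into Elasticsearch requires two pieces, Logstash and its logstash-input-jdbc plugin. Installation is very simple, but you may need a proxy to reach the download servers.

Download Logstash from the official site. Download page: https://www.elastic.co/cn/downloads/logstash
Extract it and cd into its bin directory
Install the logstash-input-jdbc plugin: ./logstash-plugin install logstash-input-jdbc

In the config folder of the extracted Logstash, create jdbc.conf with the following content (I copied this from another web page and could not find the original again; if there is a copyright concern, contact me and I will remove it)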

# Input section
input {
  stdin {}
  jdbc {
    # MySQL JDBC driver
    jdbc_driver_library => "/usr/local/logstash-6.4.2/config/mysql-connector-java-5.1.30.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # MySQL connection string, including the database name
    jdbc_connection_string => "jdbc:mysql://localhost:3306/octopus"
    # MySQL user name and password
    jdbc_user => "root"
    jdbc_password => "12345678"
    # Polling schedule in cron syntax (minute, hour, day of month, month, day of week); all * means run once a minute
    schedule => "* * * * *"
    # Enable paging
    jdbc_paging_enabled => "true"
    # Page size
    jdbc_page_size => "50000"
    # File containing the SQL statement; alternatively use statement => 'select * from t_employee' directly
    statement_filepath => "/usr/local/logstash-6.4.2/config/jdbc.sql"
    # Value of the type field, used in the output as the Elasticsearch type name
    type => "t_employee"
  }
}

# Filter section (optional)
filter {
    json {
        source => "message"
        remove_field => ["message"]
    }
}

# Output section
output {
    elasticsearch {
        # Elasticsearch index name
        index => "octopus"
        # Use the type set in the input as the type name under the index
        # (mapping types are deprecated in Elasticsearch 7.x, so this option may be unnecessary there)
        document_type => "%{type}"
        # Elasticsearch host and port
        hosts => "localhost:9200"
        # Use the id column of each MySQL row as the Elasticsearch document ID
        document_id => "%{id}"
    }
    stdout {
        codec => json_lines
    }
}

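# Note: remove the comments from this file before use, otherwise it reports an error.
# (Logstash's config syntax does accept # comments, so non-ASCII characters in the comments are a likely cause.)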
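The document_id => "%{id}" setting is what keeps the once-a-minute sync from duplicating data: because each MySQL row always maps to the same document ID, re-indexing a row overwrites the existing document. The Python sketch below illustrates that idea by building a Bulk API payload from rows. It is not Logstash's actual implementation; the function name rows_to_bulk and the sample rows in the usage note are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def rows_to_bulk(rows: Iterable[Dict], index: str = "octopus") -> str:
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each row becomes an "index" action whose _id is the row's id column,
    so indexing the same row twice replaces the document instead of adding one.
    """
    lines: List[str] = []
    for row in rows:
        action = {"index": {"_index": index, "_id": str(row["id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(row, ensure_ascii=False))
    if not lines:
        return ""
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

For example, rows_to_bulk([{"id": 1, "name": "Ann"}]) yields two lines: an action line carrying _index and _id, followed by the row itself as the document source.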

Create a jdbc.sql file in the logstash-6.4.2/config directory

select * from t_employee

Run

cd logstash-6.4.2
# Check that the config file syntax is valid
bin/logstash -f config/jdbc.conf --config.test_and_exit
# Start
bin/logstash -f config/jdbc.conf --config.reload.automatic
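After Logstash has run for a minute or so, a quick way to see whether rows arrived is to ask Elasticsearch how many documents the index holds. A minimal Python sketch, assuming the index is octopus on http://localhost:9200 as configured above; the helper names count_url, parse_count and fetch_count are made up for this illustration.

```python
import json
import urllib.request


def count_url(base_url: str, index: str) -> str:
    """Build the URL of the _count endpoint for one index."""
    return f"{base_url.rstrip('/')}/{index}/_count"


def parse_count(body: str) -> int:
    """Extract the document count from a _count response."""
    return int(json.loads(body)["count"])


def fetch_count(base_url: str = "http://localhost:9200",
                index: str = "octopus",
                timeout: float = 5.0) -> int:
    """Ask Elasticsearch how many documents the index currently holds."""
    with urllib.request.urlopen(count_url(base_url, index), timeout=timeout) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Comparing fetch_count() with the result of select count(*) from t_employee in MySQL gives a rough check that the sync is complete.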