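Introduction to Elasticsearch
Full-text search is one of the most common requirements, and Elasticsearch is currently the go-to full-text search engine.
Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface.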
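Elasticsearch is written in Java and was originally released as open source under the Apache License; starting with version 7.11 it is distributed under the Server Side Public License (SSPL) and the Elastic License instead. It is designed for cloud environments, offers near-real-time search, and is stable, reliable, fast, and easy to install and use.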
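It can store, search, and analyze large volumes of data quickly. Well-known sites such as Wikipedia, Stack Overflow, and GitHub use it.
Kibana is an analytics and visualization platform designed for Elasticsearch. You can use Kibana to search, view, and interact with the data stored in Elasticsearch indices, and easily perform advanced data analysis and present the results as charts.
Elasticsearch official site: https://www.elastic.co/cn/products/elasticsearch/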
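A Java environment is needed before installation (Elasticsearch 7.x also ships with a bundled JDK, so for Elasticsearch itself this step mainly matters if you want to use your own Java).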
1. Install the JDK
yum install -y java
Check that the installation succeeded:
java -version
2. Install Elasticsearch (default port 9200)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.12.0-linux-x86_64.tar.gz
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.12.0-linux-x86_64.tar.gz.sha512
shasum -a 512 -c elasticsearch-7.12.0-linux-x86_64.tar.gz.sha512
tar -xzf elasticsearch-7.12.0-linux-x86_64.tar.gz
cd elasticsearch-7.12.0/
Running ./bin/elasticsearch fails with: java.lang.RuntimeException: can not run elasticsearch as root
Newer versions enforce stricter security and refuse to start as root, so create a dedicated user:
groupadd elsearch
useradd elsearch -g elsearch
chown -R elsearch:elsearch ./*
Start Elasticsearch as the elsearch user:
su elsearch
./bin/elasticsearch
Verify that it is running: curl localhost:9200
Check that the port is listening:
netstat -apn | grep 9200
To run Elasticsearch as a daemon, pass -d on the command line and use the -p option to record the process ID in a file:
./bin/elasticsearch -d -p pid
Log messages can be found in the $ES_HOME/logs/ directory.
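Because -d returns as soon as the process is forked while the node keeps starting up in the background, a setup script that continues with further steps may need to wait until port 9200 answers. The sketch below is a generic POSIX sh retry loop; the function name wait_until and the 60-second timeout in the usage line are hypothetical choices, not something Elasticsearch provides.

```shell
#!/bin/sh
# wait_until TIMEOUT CMD [ARGS...]
# Hypothetical helper: re-runs CMD once per second until it exits 0
# or TIMEOUT seconds have elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    _wu_timeout=$1
    shift
    _wu_elapsed=0
    while [ "$_wu_elapsed" -lt "$_wu_timeout" ]; do
        # Discard the command's output; only its exit status matters here
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        _wu_elapsed=$((_wu_elapsed + 1))
    done
    return 1
}
```

Usage, run as elsearch from the Elasticsearch directory (curl -s exits non-zero while the connection is still refused):
./bin/elasticsearch -d -p pid && wait_until 60 curl -s http://localhost:9200 && echo "Elasticsearch is up"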
To shut down Elasticsearch, kill the process whose ID is recorded in the pid file:
pkill -F pid
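pkill -F pid sends the signal and returns immediately, while Elasticsearch takes a moment to shut down cleanly. A script that must wait for the process to actually exit could use something like the following sketch. stop_by_pidfile is a hypothetical helper; it sends SIGTERM (pkill's default signal) and then polls with kill -0 until the process is gone or the timeout expires.

```shell
#!/bin/sh
# stop_by_pidfile PIDFILE [TIMEOUT]
# Hypothetical helper: sends SIGTERM to the PID stored in PIDFILE, then waits
# up to TIMEOUT seconds (default 30) for that process to disappear.
stop_by_pidfile() {
    _pidfile=$1
    _timeout=${2:-30}
    if [ ! -r "$_pidfile" ]; then
        echo "pid file not found: $_pidfile" >&2
        return 1
    fi
    _pid=$(cat "$_pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    if ! kill -0 "$_pid" 2>/dev/null; then
        echo "process $_pid is not running" >&2
        return 1
    fi
    kill -TERM "$_pid"
    _waited=0
    while kill -0 "$_pid" 2>/dev/null; do
        if [ "$_waited" -ge "$_timeout" ]; then
            echo "process $_pid still running after ${_timeout}s" >&2
            return 1
        fi
        sleep 1
        _waited=$((_waited + 1))
    done
    return 0
}
```

Usage, from the Elasticsearch directory: stop_by_pidfile pid 60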
Errors that can prevent startup:
Error 0:
bootstrap check failure [1] of [2]: max number of threads [3796] for user [elsearch] is too low, increase to at least [4096]
bootstrap check failure [2] of [2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
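Edit /etc/security/limits.conf
and append the following two lines at the end of the file (the new limits take effect for new login sessions, so log in again, or restart the server):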
* soft nproc 65535
* hard nproc 65535
Edit /etc/sysctl.conf (vi /etc/sysctl.conf),
add vm.max_map_count=262144, then load the setting with: sysctl -p
References: https://blog.csdn.net/zeorg/article/details/115209262, https://blog.csdn.net/x356982611/article/details/89178866
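Before restarting, it can help to confirm that both limits actually took effect for the user that runs Elasticsearch. The sketch below compares the current values with the minimums quoted in the bootstrap check messages above (4096 processes, vm.max_map_count 262144). The helper names check_min and check_es_limits are hypothetical, and the process limit is read from /proc/self/limits, so it should be run in a fresh login session as elsearch.

```shell
#!/bin/sh
# check_min LABEL ACTUAL MINIMUM
# Prints whether ACTUAL meets MINIMUM; "unlimited" always passes.
# Returns 0 if the value is sufficient, 1 otherwise.
check_min() {
    if [ "$2" = "unlimited" ] || [ "$2" -ge "$3" ]; then
        echo "OK: $1 = $2 (minimum $3)"
        return 0
    fi
    echo "TOO LOW: $1 = $2 (minimum $3)"
    return 1
}

# check_es_limits: reads the current values on Linux and checks both.
check_es_limits() {
    # Soft "Max processes" limit (third field), inherited from the calling shell
    _nproc=$(awk '/^Max processes/ {print $3}' /proc/self/limits)
    _maps=$(cat /proc/sys/vm/max_map_count)
    _rc=0
    check_min "max user processes (soft nproc)" "$_nproc" 4096 || _rc=1
    check_min "vm.max_map_count" "$_maps" 262144 || _rc=1
    return "$_rc"
}
```

Usage: run check_es_limits as elsearch; a non-zero exit status means at least one value is still too low.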
Error 1:
Native memory allocation(mmap) failed to map 107286953984 bytes for committing reserved memory.
Solution: edit config/jvm.options
and change the memory settings in it to suit the server's actual memory:
vi config/jvm.options
-Xms256m
-Xmx256m
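The 256m values above suit a very small test machine. Elastic's general guidance is to keep -Xms and -Xmx equal, give the heap no more than half of physical RAM, and stay below the compressed-oops threshold (about 26 GB is considered safe on most systems). The sketch below derives a starting value from /proc/meminfo under exactly those assumptions; es_heap_mb and print_heap_options are hypothetical names, and the result is a starting point rather than a tuned setting.

```shell
#!/bin/sh
# es_heap_mb MEM_TOTAL_KB
# Prints a heap size in MB: half of physical RAM, capped at 26624 MB (26 GB),
# assuming the "50% of RAM, below the compressed-oops threshold" guideline.
es_heap_mb() {
    _half_mb=$(( $1 / 1024 / 2 ))
    if [ "$_half_mb" -gt 26624 ]; then
        _half_mb=26624
    fi
    echo "$_half_mb"
}

# print_heap_options: reads MemTotal (in kB) from /proc/meminfo and prints
# matching -Xms/-Xmx lines with identical values.
print_heap_options() {
    _mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    _heap=$(es_heap_mb "$_mem_kb")
    echo "-Xms${_heap}m"
    echo "-Xmx${_heap}m"
}
```

For example, es_heap_mb 2097152 (2 GB of RAM expressed in kB) prints 1024; paste the two lines from print_heap_options over the existing -Xms/-Xmx lines in config/jvm.options.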
Error 2:
configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
Solution: add the following setting to jvm.options
vi config/jvm.options
### add as the last line
-XX:-AssumeMP
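When a setup script edits jvm.options, running it a second time should not duplicate flags. A minimal idempotent append for a line such as -XX:-AssumeMP might look like this; ensure_line is a hypothetical helper, not part of Elasticsearch.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Hypothetical helper: appends LINE to FILE only if an identical line
# is not already present, so re-running a setup script is harmless.
ensure_line() {
    # -x: whole-line match, -F: fixed string, --: LINE may start with "-"
    if ! grep -qxF -- "$2" "$1" 2>/dev/null; then
        # If the file does not end with a newline, add one first so the
        # new flag does not get glued onto the previous line
        if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
            printf '\n' >> "$1"
        fi
        printf '%s\n' "$2" >> "$1"
    fi
}
```

Usage, from the Elasticsearch directory: ensure_line config/jvm.options -XX:-AssumeMP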
Error 3: ERROR: Elasticsearch did not exit normally - check the logs at xxx.log
Solution: add -E "discovery.type=single-node" to the start command:
./bin/elasticsearch -E "discovery.type=single-node"
Allowing external access to port 9200:
Edit the configuration file (reference: the article "ElasticSearch7: external access setting not working"):
vi config/elasticsearch.yml
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["127.0.0.1", "[::1]"]
Save the file and restart Elasticsearch.
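Instead of editing elasticsearch.yml by hand in vi, the same flat keys can be set from a script. The sketch below is a hypothetical helper, set_yml_key: it replaces an existing uncommented "key:" line in place, or appends one if none exists. It only handles top-level flat keys such as network.host and http.port, not nested YAML, and awk's -v option interprets backslash escapes in the value.

```shell
#!/bin/sh
# set_yml_key FILE KEY VALUE
# Hypothetical helper: sets a flat "key: value" line in a YAML file.
# An existing uncommented line starting with "KEY:" is replaced in place;
# otherwise the line is appended. Commented-out lines are left untouched.
set_yml_key() {
    _file=$1 _key=$2 _value=$3
    _tmp=$(mktemp) || return 1
    awk -v key="$_key" -v val="$_value" '
        index($0, key ":") == 1 { print key ": " val; found = 1; next }
        { print }
        END { if (!found) print key ": " val }
    ' "$_file" > "$_tmp" && cat "$_tmp" > "$_file"
    _rc=$?
    # cat > keeps the original file's owner and permissions
    rm -f "$_tmp"
    return "$_rc"
}
```

Usage, from the Elasticsearch directory:
set_yml_key config/elasticsearch.yml network.host 0.0.0.0
set_yml_key config/elasticsearch.yml http.port 9200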
Open port 9200 in the firewall:
firewall-cmd --zone=public --add-port=9200/tcp --permanent
firewall-cmd --reload
systemctl restart firewalld
3. Install Kibana (default port 5601)
curl -O https://artifacts.elastic.co/downloads/kibana/kibana-7.12.0-linux-x86_64.tar.gz
curl https://artifacts.elastic.co/downloads/kibana/kibana-7.12.0-linux-x86_64.tar.gz.sha512 | shasum -a 512 -c -
tar -xzf kibana-7.12.0-linux-x86_64.tar.gz
cd kibana-7.12.0-linux-x86_64/
Configuration:
vi config/kibana.yml
server.port: 5601
server.host: "0.0.0.0"
server.name: "kibana"
elasticsearch.hosts: ["http://localhost:9200"]
## switch the UI to Simplified Chinese
i18n.locale: "zh-CN"
Show only the active (uncommented, non-empty) settings:
cat config/kibana.yml | grep -v '#' | grep -v '^$'
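The grep -v '#' filter above also hides any setting whose value happens to contain '#', and grep -v '^$' keeps lines that contain only spaces. A slightly stricter variant, offered as a sketch (show_effective_config is a hypothetical name), drops only lines whose first non-blank character is '#', plus blank or whitespace-only lines:

```shell
#!/bin/sh
# show_effective_config FILE
# Hypothetical helper: prints the active settings of a config file.
# Drops lines that are empty, whitespace-only, or whose first non-blank
# character is '#'; keeps settings whose value contains '#'.
show_effective_config() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}
```

Usage, from the Kibana directory: show_effective_config config/kibana.yml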
Open port 5601 in the firewall:
firewall-cmd --zone=public --add-port=5601/tcp --permanent
firewall-cmd --reload
systemctl restart firewalld
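Start Kibana as the elsearch user. First, as root and from the Kibana directory, give that user ownership of the files, since Kibana needs to write to its data/ directory:
chown -R elsearch:elsearch ./*
su elsearch
./bin/kibana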
Open Kibana in a browser:
http://127.0.0.1:5601
4. Install Logstash
curl -O https://artifacts.elastic.co/downloads/logstash/logstash-7.12.1-linux-x86_64.tar.gz
tar -xzf logstash-7.12.1-linux-x86_64.tar.gz
cd logstash-7.12.1-linux-x86_64
Start it:
bin/logstash -f config/logstash-sample.conf
Import the MovieLens test dataset
Download URL: http://files.grouplens.org/datasets/movielens/ml-latest-small.zip (the archive contains movies.csv)
Configure config/logstash-sample.conf:
input {
  file {
    path => "/home/logstash/movies.csv"   # replace with the actual path to movies.csv
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["id", "content", "genre"]
  }
  mutate {
    split => { "genre" => "|" }
    remove_field => ["path", "host", "@timestamp", "message"]
  }
  mutate {
    # split "Toy Story (1995)" into ["Toy Story ", "1995)"]
    split => ["content", "("]
    add_field => {
      "title" => "%{[content][0]}"
      "year" => "%{[content][1]}"
    }
  }
  mutate {
    convert => {
      "year" => "integer"
    }
    strip => ["title"]
    remove_field => ["path", "host", "@timestamp", "message", "content"]
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "movies"
    document_id => "%{id}"
  }
  stdout {}
}
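To see what the three mutate blocks do to a single row, the split of the content column can be imitated in plain sh. This is only an illustration of the intent, not Logstash code: split_title_year is a hypothetical helper that takes the text before the first "(" as the title (surrounding whitespace stripped, like strip) and keeps only the leading digits of the text after it, because that part still ends with ")" (for example "1995)") before the integer conversion. It also exposes a limitation of splitting on the first "(": when a title itself contains a parenthesis, part 1 is not the year.

```shell
#!/bin/sh
# split_title_year CONTENT
# Illustration only: imitates split => ["content", "("] followed by
# title = part 0 (whitespace stripped) and year = part 1.
split_title_year() {
    # Part 0: everything before the first "("
    _title=${1%%"("*}
    # Everything after the first "(" (Logstash's part 1 would stop at the
    # next "(", but only the leading digits are used below anyway)
    _rest=${1#*"("}
    # Keep only the leading digits ("1995)" -> "1995"); this approximates
    # the integer conversion of the year field
    _year=${_rest%%[!0-9]*}
    # Strip leading/trailing whitespace, like strip => ["title"]
    _title=$(printf '%s' "$_title" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    printf 'title=%s\nyear=%s\n' "$_title" "$_year"
}
```

For example, split_title_year 'Toy Story (1995)' prints title=Toy Story and year=1995, while a title such as 'Some Film (Alt Title) (2001)' yields an empty year.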
Run the import:
# -t: test the configuration file for errors, then exit
./bin/logstash -f ./config/logstash-sample.conf -t
# run the import
./bin/logstash -f ./config/logstash-sample.conf