Official download page: https://www.elastic.co/cn/downloads/
1、Elasticsearch Background and Introduction
ELK is an acronym for three components:
Elasticsearch (ES): searches, analyzes, and stores data; it is also a NoSQL store, similar to Redis/HBase/…
Logstash: collects log files as a dynamic data collection pipeline; Filebeat can also collect logs (and is more lightweight than Flume)
Kibana: data visualization
Search in a typical MIS system is very limited:
queries are all point-to-point with no tokenization, so they amount to little more than exact lookups
Forward index: maps doc_id to doc_content
doc_id  doc_content
1       Ruozedata provides big data training
2       Spark is a kind of distributed compute engine
3       There are many big data training offerings
Inverted index: maps word to doc_id
word               doc_id
Ruozedata          1
provides           1
big data training  1,3
Spark              2
a kind of          2
distributed        2
compute engine     2
many               3
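The inverted index above can be sketched in a few lines of shell. This is a toy illustration using a bash 4+ associative array (the function name is made up here); it is not how Elasticsearch/Lucene actually stores its index:

```shell
#!/usr/bin/env bash
# Toy inverted index: map each word to a comma-separated list of doc_ids.
declare -A inv                     # associative array, requires bash 4+
index_doc() {                      # usage: index_doc <doc_id> <word...>
  local id=$1 w; shift
  for w in "$@"; do
    if [[ -z "${inv[$w]}" ]]; then inv[$w]=$id; else inv[$w]+=",$id"; fi
  done
}
index_doc 1 Ruozedata provides "big data training"
index_doc 2 Spark distributed "compute engine"
index_doc 3 "big data training" many
echo "${inv[big data training]}"   # docs containing "big data training" -> 1,3
```

A lookup is then a single map access; Lucene's real postings lists additionally store term frequencies and positions so results can be scored.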
Core concepts:
NRT: Near Realtime; there is only a short delay between indexing a document and it becoming searchable
Cluster: made up of 1..n Nodes
Node: a single Elasticsearch instance
Index: analogous to a Database
Type: analogous to a Table
Document: analogous to a Row
Field: analogous to a Column
2、Elasticsearch Installation
Prerequisites:
JDK 8+
CentOS 7
We use version 6.6.2 here, with JDK 1.8 on CentOS 6.5 (this requires extra configuration; using CentOS 7 directly avoids most of the problems below)
# Extract to your own app directory
[hadoop@hadoop001 soft]$ tar -zxvf elasticsearch-6.6.2.tar.gz -C ../app/
[hadoop@hadoop001 elasticsearch-6.6.2]$ mkdir data logs
[hadoop@hadoop001 config]$ pwd
/home/hadoop/app/elasticsearch-6.6.2/config
[hadoop@hadoop001 config]$ vi elasticsearch.yml
cluster.name: ruozedata-es-cluster
node.name: ruozedata-es-node1
path.data: /home/hadoop/app/elasticsearch-6.6.2/data
path.logs: /home/hadoop/app/elasticsearch-6.6.2/logs
network.host: 0.0.0.0
[hadoop@hadoop001 config]$ cd ../bin/
[hadoop@hadoop001 bin]$ rm *.bat   # the Windows batch scripts are not needed on Linux
3、Problems During Startup
// If you start as root you will also get "can not run elasticsearch as root"; in that case create a dedicated user, e.g. useradd elkuser and chown -R elkuser:elkuser elasticsearch-6.6.2. We use the hadoop user here, so this error does not occur.
ERROR: [4] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
// Raise the open-file limit (nofile); changing Linux system parameters requires root
# Run as root; the asterisk means the setting applies to all users
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 131072" >> /etc/security/limits.conf
[root@hadoop001 ~]# echo "* soft nofile 65536" >>/etc/security/limits.conf
[root@hadoop001 ~]# echo "* hard nofile 131072" >>/etc/security/limits.conf
[2]: max number of threads [1024] for user [hadoop] is too low, increase to at least [4096]
// Raise the process limit (nproc) for the user that runs Elasticsearch (hadoop here; substitute your own user, e.g. elkuser)
echo "elkuser soft nproc 4096" >> /etc/security/limits.conf
echo "elkuser hard nproc 4096" >> /etc/security/limits.conf
[root@hadoop001 ~]# echo "hadoop soft nproc 4096" >>/etc/security/limits.conf
[root@hadoop001 ~]# echo "hadoop hard nproc 4096" >>/etc/security/limits.conf
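After logging in again as the affected user, the new limits can be read back with `ulimit`. This is a generic verification step, not specific to Elasticsearch; the values printed depend entirely on your own session:

```shell
# Print the effective per-session limits; compare them against the
# limits.conf entries added above.
ulimit -n   # open files (nofile)
ulimit -u   # max user processes (nproc)
```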
[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
sysctl -w vm.max_map_count=262144                    # takes effect immediately, but only until reboot
echo "vm.max_map_count=262144" >> /etc/sysctl.conf   # persists across reboots
sysctl -p
[root@hadoop001 ~]# sysctl -w vm.max_map_count=262144
vm.max_map_count = 262144
[root@hadoop001 ~]# echo "vm.max_map_count=262144" >> /etc/sysctl.conf
[root@hadoop001 ~]# sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
vm.nr_hugepages = 4
vm.max_map_count = 262144
[root@hadoop001 ~]#
# This makes the change permanent. The nofile/nproc changes above only take effect for new login sessions (the reboot done below also covers this)
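The live kernel value can be read back at any time; on this box it should show 262144 after the change above (reading /proc is a generic Linux check, independent of Elasticsearch, and the value on your machine may differ):

```shell
# The current vm.max_map_count as seen by the running kernel:
cat /proc/sys/vm/max_map_count
```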
[4]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
Two ways to fix this: 1. upgrade CentOS to 7.x; 2. disable the system call filters
// Edit the Elasticsearch config file
[hadoop@hadoop001 config]$ vi elasticsearch.yml
# Lock the memory on startup:
#
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
[root@hadoop001 tmp]# reboot
[root@hadoop001 ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63715
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 63715
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[root@hadoop001 ~]#
4、Elasticsearch Startup
A look at the startup script shows that passing the -d flag makes it start in the background
[hadoop@hadoop001 bin]$ vi elasticsearch
#!/bin/bash
# CONTROLLING STARTUP:
#
# This script relies on a few environment variables to determine startup
# behavior, those variables are:
#
# ES_PATH_CONF -- Path to config directory
# ES_JAVA_OPTS -- External Java Opts on top of the defaults set
#
# Optionally, exact memory values can be set using the `ES_JAVA_OPTS`. Note that
# the Xms and Xmx lines in the JVM options file must be commented out. Example
# values are "512m", and "10g".
#
# ES_JAVA_OPTS="-Xms8g -Xmx8g" ./bin/elasticsearch
source "`dirname "$0"`"/elasticsearch-env
ES_JVM_OPTIONS="$ES_PATH_CONF"/jvm.options
JVM_OPTIONS=`"$JAVA" -cp "$ES_CLASSPATH" org.elasticsearch.tools.launchers.JvmOptionsParser "$ES_JVM_OPTIONS"`
ES_JAVA_OPTS="${JVM_OPTIONS//\$\{ES_TMPDIR\}/$ES_TMPDIR} $ES_JAVA_OPTS"
cd "$ES_HOME"
# manual parsing to find out, if process should be detached
if ! echo $* | grep -E '(^-d |-d$| -d |--daemonize$|--daemonize )' > /dev/null; then
exec \
"$JAVA" \
$ES_JAVA_OPTS \
-Des.path.home="$ES_HOME" \
-Des.path.conf="$ES_PATH_CONF" \
-Des.distribution.flavor="$ES_DISTRIBUTION_FLAVOR" \
-Des.distribution.type="$ES_DISTRIBUTION_TYPE" \
-cp "$ES_CLASSPATH" \
org.elasticsearch.bootstrap.Elasticsearch \
"$@"
else
exec \
"$JAVA" \
$ES_JAVA_OPTS \
-Des.path.home="$ES_HOME" \
-Des.path.conf="$ES_PATH_CONF" \
-Des.distribution.flavor="$ES_DISTRIBUTION_FLAVOR" \
-Des.distribution.type="$ES_DISTRIBUTION_TYPE" \
-cp "$ES_CLASSPATH" \
org.elasticsearch.bootstrap.Elasticsearch \
"$@" \
<&- &
retval=$?
pid=$!
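The grep in the script above is what detects -d / --daemonize and decides between foreground and detached startup. A minimal standalone sketch of that same check (the function name is made up here):

```shell
# Mirror of the startup script's daemon detection: match -d or --daemonize
# anywhere in the argument list.
is_daemonized() {
  echo "$*" | grep -E '(^-d |-d$| -d |--daemonize$|--daemonize )' > /dev/null
}
is_daemonized -d && echo "would detach (background)"
is_daemonized -E cluster.name=demo || echo "stays in foreground"
```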
Start it from the ES bin directory:
[hadoop@hadoop001 bin]$ ./elasticsearch
You can also start it with settings passed directly on the command line:
bin/elasticsearch -E cluster.name=ruozedata-es-cluster -E node.name=ruozedata-es-node1 -E path.data=ruozedata-es-node1-data
// Command-line settings take precedence over the config file; in production, editing the config file is more common
# When simulating a cluster on a single VM, the second node you start must use a different node.name and path.data
bin/elasticsearch -E cluster.name=ruozedata-es-cluster -E node.name=ruozedata-es-node2 -E path.data=ruozedata-es-node2-data
If a browser UI cannot connect, the cause is that cross-origin (CORS) access is not configured; also edit the config file:
# elasticsearch.yml (note the space before true)
http.cors.enabled: true
http.cors.allow-origin: "*"
There is a precedence question here as well; in production it is still more common to edit elasticsearch.yml directly
Background startup command:
[hadoop@hadoop001 bin]$ ./elasticsearch -d
[hadoop@hadoop001 bin]$ ps -ef |grep elasticsearch
hadoop 3217 1 99 04:34 pts/1 00:00:20 /usr/java/jdk1.8.0_45/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch-7405814871228322880 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=data -XX:ErrorFile=logs/hs_err_pid%p.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:logs/gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=64m -Des.path.home=/home/hadoop/app/elasticsearch-6.6.2 -Des.path.conf=/home/hadoop/app/elasticsearch-6.6.2/config -Des.distribution.flavor=default -Des.distribution.type=tar -cp /home/hadoop/app/elasticsearch-6.6.2/lib/* org.elasticsearch.bootstrap.Elasticsearch -d
hadoop 3236 3217 0 04:34 pts/1 00:00:00 /home/hadoop/app/elasticsearch-6.6.2/modules/x-pack-ml/platform/linux-x86_64/bin/controller
hadoop 3267 2520 0 04:34 pts/1 00:00:00 grep elasticsearch
Now append the following paths to the UI address to see node and cluster status:
/_cat/nodes              node status
/_cluster/health?pretty  cluster health
// As long as the network is reachable, nodes join the same cluster by matching cluster.name
http://hadoop001:9200/_cat/nodes
http://hadoop001:9200/_cluster/health?pretty
Clone another terminal window to test
ES listens on port 9200 by default
[hadoop@hadoop001 bin]$ curl -XGET hadoop001:9200
{
"name" : "ruozedata-es-node1",
"cluster_name" : "ruozedata-es-cluster",
"cluster_uuid" : "QJXuB1DBSau5PPKV4L8dow",
"version" : {
"number" : "6.6.2",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "3bd3e59",
"build_date" : "2019-03-06T15:16:26.864148Z",
"build_snapshot" : false,
"lucene_version" : "7.6.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
[hadoop@hadoop001 bin]$
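As a small parsing sketch, the version number can be pulled out of that response with plain grep. The response body is hardcoded here for illustration; in practice it would come from `curl -s hadoop001:9200` against the running node:

```shell
# Extract the "number" field from the cluster info JSON with grep alone.
resp='{ "name" : "ruozedata-es-node1", "version" : { "number" : "6.6.2" } }'
echo "$resp" | grep -o '"number" *: *"[^"]*"' | grep -o '[0-9][0-9.]*'
```

This prints 6.6.2 for the response above; for anything beyond a quick check, a real JSON tool such as jq is the safer choice.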