JanusGraph + HBase + ES cluster setup: pitfalls and notes

Preface

Documentation for JanusGraph is scarce and scattered, so after recently building a cluster I am writing down the pitfalls I ran into. JanusGraph lets you freely choose its storage and index backends. I previously wrote up a Hadoop setup for a non-root user, and this article continues from that environment.
The servers and node descriptions are the same as in that Hadoop post:
Hadoop setup as a non-root user

Software downloads

The versions used in this setup are Hadoop 2.10.1 + ZooKeeper 3.5.8 + HBase 2.2.6 + Elasticsearch 6.6.0 + JanusGraph 0.5.2.
Official download links are given below; for convenience, everything can also be grabbed in one go from this bundle: JanusGraph distributed installation bundle for Linux servers.
1. Hadoop
Since distributed storage is used, Hadoop must be set up first; download and install it as described in the previous post. Download: Hadoop official site
2. ZooKeeper
ZooKeeper is a distributed, open-source coordination service for distributed applications, an open-source implementation of Google's Chubby, and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services. Download: ZooKeeper official site
This setup runs a three-node ZooKeeper ensemble.
3. HBase
HBase is a distributed, column-oriented open-source database based on Fay Chang's Google paper "Bigtable: A Distributed Storage System for Structured Data". Just as Bigtable builds on the distributed storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a subproject of the Apache Hadoop project. Unlike a typical relational database, it is suited to storing unstructured data and is organized by column rather than by row.
Download: HBase official site
4. Elasticsearch
Elasticsearch is a Lucene-based search server. It provides a distributed, multi-tenant full-text search engine with a RESTful web interface. Written in Java and released as open source under the Apache License, it is a popular enterprise search engine, widely used in cloud environments for real-time, stable, reliable, and fast search, and it is easy to install and use. Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to DB-Engines, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
Download: Elasticsearch official site
5. JanusGraph
For an introduction to JanusGraph, see my earlier tutorial on installing it on Windows 10: JanusGraph setup on Windows 10
Download: JanusGraph official site

Setup steps

As with the Hadoop installation, all commands are run on the master (.64) server unless otherwise noted.

Hadoop installation

Install Hadoop as described in the previous post: Hadoop setup as a non-root user.

ZooKeeper installation

Extract the archive on the master server and rename the directory:

tar -zxvf apache-zookeeper-3.5.8-bin.tar.gz
mv apache-zookeeper-3.5.8-bin zookeeper

Enter the conf directory and set up the configuration:

cd zookeeper/conf
mv zoo_sample.cfg zoo.cfg
vim zoo.cfg

Below is the zoo.cfg content. Note that dataDir is a custom path, and the last three lines list the cluster members. node1, node2, and node3 are already in each server's hosts file (a sample mapping is shown after the config); otherwise IP addresses must be used instead.

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
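
The node1, node2, and node3 names assume a hosts-file mapping along these lines on every server (the masked addresses correspond to the ones used later in the JanusGraph config; substitute your real IPs):

XXX.XXX.XXX.64   node1
XXX.XXX.XXX.178  node2
XXX.XXX.XXX.179  node3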

Create a myid file under the dataDir configured above: its content is 1 on the .64 server, 2 on the .178 server, and 3 on the .179 server (if the server names are confusing, see the Hadoop setup post).

cd ../data
vim myid

This completes the configuration on the .64 server. Now pack it up and distribute it to the .178 and .179 servers (adjust any paths that differ):

tar zcvf zookeeper.master.tar.gz zookeeper
scp zookeeper.master.tar.gz node2:/home/hadoop
scp zookeeper.master.tar.gz node3:/home/hadoop

After the archive is copied, extract it on both servers and edit their myid files; the id simply corresponds to server.1/2/3 in zoo.cfg.
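
A minimal sketch of this step, using the dataDir from the zoo.cfg above; the id written on each node must match its server.N entry:

# On node1 (the .64 master)
mkdir -p /home/hadoop/zookeeper/data
echo 1 > /home/hadoop/zookeeper/data/myid
# On node2 (.178) write 2, and on node3 (.179) write 3, into the same file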
Before starting ZooKeeper, it helps to first add environment variables to ~/.bashrc, for example:
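
A sketch of what the additions might look like (the install path is an assumption; adjust it to wherever zookeeper was extracted):

#======zookeeper===
export ZOOKEEPER_HOME=/home/hadoop/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin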
Then apply the changes:

source ~/.bashrc

The commands to start and stop ZooKeeper and to query its status:

zkServer.sh start     
zkServer.sh stop
zkServer.sh status

Start ZooKeeper on all three servers and check the status: one node is elected leader and the other two become followers.

HBase installation

Extract, rename, and enter the conf directory:

tar zxvf hbase-2.2.6-bin.tar.gz
mv hbase-2.2.6 hbase
cd hbase/conf
vim hbase-env.sh

Add three lines: JAVA_HOME depends on your environment, HBASE_CLASSPATH points to Hadoop's etc configuration directory, and HBASE_MANAGES_ZK must be set to false so HBase does not use its bundled ZooKeeper.

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162
export HBASE_CLASSPATH=/diskC/hadoop/hadoop/etc/hadoop/
export HBASE_MANAGES_ZK=false

Next edit hbase-site.xml. Because this is a distributed cluster, hbase.cluster.distributed is true; hbase.rootdir uses the same HDFS address as in Hadoop's core-site.xml, with /hbase appended; hbase.zookeeper.quorum lists the three nodes.

  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/diskC/hadoop/hbase/tmp</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node1:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1,node2,node3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/diskC/hadoop/zookeeper/data</value>
    <description>Property from ZooKeeper's config zoo.cfg.
      The directory where the snapshot is stored.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>Property from ZooKeeper's config zoo.cfg.
      The port at which the clients will connect.
    </description>
  </property>
</configuration>

Then edit regionservers to list the three nodes:

vim regionservers
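
The file is assumed to simply list the three nodes, one per line, matching the hbase.zookeeper.quorum value above:

node1
node2
node3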

Add environment variables:

#======hbase===
export HBASE_HOME=/home/hadoop/hbase/
export PATH=$PATH:$HBASE_HOME/bin:$HBASE_HOME/conf

As before, pack HBase into an archive and distribute it to the other two servers; pay particular attention to entries in hbase-env.sh and hbase-site.xml that may differ between machines.
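
A distribution sketch mirroring the ZooKeeper step (the archive name and target directory are assumptions):

tar zcvf hbase.master.tar.gz hbase
scp hbase.master.tar.gz node2:/home/hadoop
scp hbase.master.tar.gz node3:/home/hadoop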
HBase is started with the following command, which only needs to be run on the .64 master node:

start-hbase.sh

And stopped with:

stop-hbase.sh
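
After start-hbase.sh, a quick sanity check (a sketch: jps lists the running Java processes, and the HBase shell's status command reports the live region servers):

# The master should show HMaster; every node in regionservers should show HRegionServer
jps
# Optionally open the HBase shell and type "status" at the prompt
hbase shell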

Elasticsearch installation

Extract and rename, only on the .64 server:

tar zxvf elasticsearch-6.6.0.tar.gz
mv elasticsearch-6.6.0 elasticsearch

Optionally add environment variables to ~/.bashrc, then start it:

bin/elasticsearch

I started it in the background with nohup:

nohup bin/elasticsearch >logs/es.log 2>&1&
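
To confirm Elasticsearch came up, a quick check against its REST API (assuming the default port 9200):

curl http://localhost:9200
curl "http://localhost:9200/_cluster/health?pretty"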

JanusGraph installation

Extract and rename, only on the .64 server.
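
A sketch of this step, assuming the standard janusgraph-0.5.2.zip distribution and the directory name used by the commands below:

unzip janusgraph-0.5.2.zip
mv janusgraph-0.5.2 janusGraph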
JanusGraph will run in server/client mode. Enter the conf directory and edit the HBase + ES properties file:

cd janusGraph/conf
vim janusgraph-hbase-es.properties

The key changes: storage.hostname is set to the three nodes' addresses (for the HBase backend this is the ZooKeeper quorum), and index.search.hostname is also set to the three nodes' addresses.

# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# JanusGraph configuration sample: HBase and Elasticsearch
#
# This file connects to HBase using a Zookeeper quorum
# (storage.hostname) consisting solely of localhost.  It also connects
# to Elasticsearch running on localhost over Elasticsearch's native "Transport"
# protocol.  Zookeeper, the HBase services, and Elasticsearch must already
# be running and available before starting JanusGraph with this file.

# The implementation of graph factory that will be used by gremlin server
#
# Default:    org.janusgraph.core.JanusGraphFactory
# Data Type:  String
# Mutability: LOCAL
gremlin.graph=org.janusgraph.core.JanusGraphFactory

# The primary persistence provider used by JanusGraph.  This is required.
# It should be set one of JanusGraph's built-in shorthand names for its
# standard storage backends (shorthands: berkeleyje, cassandrathrift,
# cassandra, astyanax, embeddedcassandra, cql, hbase, inmemory) or to the
# full package and classname of a custom/third-party StoreManager
# implementation.
#
# Default:    (no default value)
# Data Type:  String
# Mutability: LOCAL
storage.backend=hbase

# The hostname or comma-separated list of hostnames of storage backend
# servers.  This is only applicable to some storage backends, such as
# cassandra and hbase.
#
# Default:    127.0.0.1
# Data Type:  class java.lang.String[]
# Mutability: LOCAL
#storage.hostname=127.0.0.1

storage.hostname= XXX.XXX.XXX.64, XXX.XXX.XXX.178, XXX.XXX.XXX.179

# Whether to enable JanusGraph's database-level cache, which is shared
# across all transactions. Enabling this option speeds up traversals by
# holding hot graph elements in memory, but also increases the likelihood
# of reading stale data.  Disabling it forces each transaction to
# independently fetch graph elements from storage before reading/writing
# them.
#
# Default:    false
# Data Type:  Boolean
# Mutability: MASKABLE
cache.db-cache = true

# How long, in milliseconds, database-level cache will keep entries after
# flushing them.  This option is only useful on distributed storage
# backends that are capable of acknowledging writes without necessarily
# making them immediately visible.
#
# Default:    50
# Data Type:  Integer
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend.  After starting the database for the first
# time, this file's copy of this setting is ignored.  Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
cache.db-cache-clean-wait = 20

# Default expiration time, in milliseconds, for entries in the
# database-level cache. Entries are evicted when they reach this age even
# if the cache has room to spare. Set to 0 to disable expiration (cache
# entries live forever or until memory pressure triggers eviction when set
# to 0).
#
# Default:    10000
# Data Type:  Long
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend.  After starting the database for the first
# time, this file's copy of this setting is ignored.  Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
cache.db-cache-time = 180000

# Size of JanusGraph's database level cache.  Values between 0 and 1 are
# interpreted as a percentage of VM heap, while larger values are
# interpreted as an absolute size in bytes.
#
# Default:    0.3
# Data Type:  Double
# Mutability: MASKABLE
cache.db-cache-size = 0.5

# The indexing backend used to extend and optimize JanusGraph's query
# functionality. This setting is optional.  JanusGraph can use multiple
# heterogeneous index backends.  Hence, this option can appear more than
# once, so long as the user-defined name between "index" and "backend" is
# unique among appearances.  Similar to the storage backend, this should be
# set to one of JanusGraph's built-in shorthand names for its standard
# index backends (shorthands: lucene, elasticsearch, es, solr) or to the
# full package and classname of a custom/third-party IndexProvider
# implementation.
#
# Default:    elasticsearch
# Data Type:  String
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend.  After starting the database for the first
# time, this file's copy of this setting is ignored.  Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
index.search.backend=elasticsearch

# The hostname or comma-separated list of hostnames of index backend
# servers.  This is only applicable to some index backends, such as
# elasticsearch and solr.
#
# Default:    127.0.0.1
# Data Type:  class java.lang.String[]
# Mutability: MASKABLE
index.search.hostname= XXX.XXX.XXX.64, XXX.XXX.XXX.178, XXX.XXX.XXX.179

Next edit gremlin-server.yaml under conf/gremlin-server. The main change is the graphs entry, which points graph at the properties file above.

host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
#channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  # graph: conf/janusgraph-hbase-es.properties
  graph: conf/janusgraph-hbase-es.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536

After the configuration changes, start the server:

bin/gremlin-server.sh ./conf/gremlin-server/gremlin-server.yaml

My screenshots here were just the startup logs; some file locations may differ slightly from the commands above.
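
Since WsAndHttpChannelizer is enabled, the server can also be checked over its HTTP endpoint once it is up (a sketch; the query assumes the graph opened successfully):

curl -s -X POST http://localhost:8182 -d '{"gremlin": "g.V().count()"}'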
A quick test:

$  bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
plugin activated: tinkerpop.tinkergraph
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182] - type ':remote console' to return to local mode
gremlin> graph
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g.V()
gremlin> user = "Chris"
==>Chris
gremlin> graph.addVertex("name", user)
No such property: user for class: Script21
Type ':help' or ':h' for help.
Display stack trace? [yN]
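
The final error is not an installation problem: without a session, each line sent through :remote console is evaluated as an independent script, so the user variable defined on the previous line no longer exists when graph.addVertex("name", user) runs. Putting the statements into a single script (or connecting with a session) avoids it; for example, as one request over the HTTP endpoint (a sketch):

curl -s -X POST http://localhost:8182 \
  -d '{"gremlin": "user = \"Chris\"; v = graph.addVertex(\"name\", user); graph.tx().commit(); v"}'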

Verifying the installation

The original post verified the result with screenshots of the running processes on the .64 master server, the .178 server, and the .179 server (which also acts as the standby HBase master during the HBase setup), plus the web UIs reachable at the master server's ip:8088, ip:50050, and ip:16010.

Summary of issues

  • There were quite a few problems along the way. All of the ports above are defaults; make sure none of them is already occupied, otherwise the corresponding service will not start (a quick check is sketched below).
  • JanusGraph has many more configuration options that deserve attention; I will not go through them one by one here. Feel free to discuss in the comments, and read the official JanusGraph documentation.
  • Many small details could not be covered; differences between server environments caused a number of small issues. Questions and exchanges are welcome.
  • Writing this was genuinely tiring; next I will write about applications of JanusGraph.
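
A quick way to check that the default ports used in this post are free before starting each service (a sketch; ss is assumed to be available, and netstat -lnt works as well):

# Any LISTEN line printed here means the port is already taken
ss -lnt | grep -E '2181|2888|3888|8020|8088|9200|16010|8182'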