Installing Apache Atlas (Metadata Governance) on Docker

Environment:

  • OS: Windows 10
  • Docker Desktop: 4.10.1
  • Docker version: 20.10.17, build 100c701

Part 1: Building the Docker Images

Component versions

Component   Version

Hadoop      3.2.1
Hive        3.1.2
HBase       2.3.4
Zookeeper   3.5.9
Kafka       2.6.2
Solr        7.4.0
Atlas       2.1.0
JDK         1.8
Python      2.7
Maven       3.6.3

Step 1
Run the following command on all three nodes to generate the key files:

ssh-keygen

The command asks where to store the key files (default ~/.ssh/); just press Enter at each prompt. id_rsa is the node's private key and id_rsa.pub is its public key.

Step 2
Run the following commands on each of the three nodes:

ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03

You will be asked to answer yes or no; type yes, then enter the host's password.

Step 3
Test passwordless SSH from each node with the following commands:

ssh hadoop01
ssh hadoop02
ssh hadoop03
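To check all three logins in one pass without interactive prompts, a quick loop like the sketch below can help (assuming the hostnames above resolve on every node):

for h in hadoop01 hadoop02 hadoop03; do
  # BatchMode=yes fails instead of prompting, so a password prompt shows up as an error
  ssh -o BatchMode=yes "$h" hostname
done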

Install Node.js
1. Download and extract
wget https://cdn.npm.taobao.org/dist/node/v12.16.2/node-v12.16.2-linux-x64.tar.xz
tar -xf node-v12.16.2-linux-x64.tar.xz
cd node-v12.16.2-linux-x64/bin
./node -v

2. Add environment variables
export NODE_HOME=/path/to/node-v12.16.2-linux-x64  # the directory extracted above
export PATH=$PATH:$NODE_HOME/bin

1. Install JDK

# 1. Extract the JDK into the target directory (create the directory first)
tar -zxvf {file-dir}/jdk-8u341-linux-x64.tar.gz -C /root/environments/ # {file-dir} is the directory holding the installation packages

# 2. Configure environment variables
vim /etc/profile
export JAVA_HOME=/root/environments/jdk1.8.0_341
export PATH=$PATH:$JAVA_HOME/bin

# 3. Reload so the variables take effect
source /etc/profile

# 4. Verify
java -version

2. Install Maven

Maven download: https://dlcdn.apache.org/maven/maven-3/

# 1. Extract Maven into the target directory (create the directory first)
tar -zxvf {file-dir}/apache-maven-3.6.3-bin.tar.gz -C /root/environments/ # {file-dir} is the directory holding the installation packages

# 2. Configure environment variables
export MVN_HOME=/root/environments/apache-maven-3.6.3
export PATH=$PATH:$MVN_HOME/bin

# 3. Reload so the variables take effect
source /etc/profile

# 4. Verify
mvn -version 

# 5. Configure the Maven mirrors
vim $MVN_HOME/conf/settings.xml
<!-- then add the following entries inside <mirrors> -->
	<mirror>
    	<id>alimaven</id>
    	<name>aliyun maven</name>
    	<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    	<mirrorOf>central</mirrorOf>
	</mirror>
<!-- 中央仓库1 -->
    <mirror>
        <id>repo1</id>
        <mirrorOf>central</mirrorOf>
        <name>Human Readable Name for this Mirror.</name>
        <url>https://repo1.maven.org/maven2/</url>
    </mirror>
<!-- 中央仓库2 -->
    <mirror>
        <id>repo2</id>
        <mirrorOf>central</mirrorOf>
        <name>Human Readable Name for this Mirror.</name>
        <url>https://repo2.maven.org/maven2/</url>
    </mirror>

Maven resolves its settings from /root/.m2/ (a hidden directory) first, so create /root/.m2 and copy the settings file into it:

mkdir /root/.m2
cp $MVN_HOME/conf/settings.xml /root/.m2/
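To sanity-check that the mirror configuration is actually picked up, one option (using the standard maven-help-plugin, which Maven downloads on first use) is to print the effective settings:

mvn help:effective-settings | grep -A 3 "<mirror>"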

Installation order: ZooKeeper, Hadoop, HBase, Hive, Kafka, Solr, Atlas

Install ZooKeeper

All component versions are available in the Apache archive, e.g. https://archive.apache.org/dist/hbase/; mirrors inside China are missing many versions and mostly carry only the stable releases.

 
tar -zxvf {file-dir}/apache-zookeeper-3.5.9-bin.tar.gz -C /root/environments/ # {file-dir} is the directory holding the installation packages

cd /root/environments/zookeeper-3.4.6/conf
Make a copy of zoo_sample.cfg:
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/root/environments/zookeeper-3.4.6/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888

Set the environment variables

export ZK_HOME=/root/environments/zookeeper-3.4.6
export PATH=$PATH:$ZK_HOME/bin
source /etc/profile
Create the data directory
mkdir /root/environments/zookeeper-3.4.6/data
cd /root/environments/zookeeper-3.4.6/data
touch myid && echo "1" > myid

Then copy the whole /root/environments/zookeeper-3.4.6 directory to hadoop02 and hadoop03 and set the environment variables there as well:

scp -r /root/environments/zookeeper-3.4.6 hadoop02:/root/environments/
scp -r /root/environments/zookeeper-3.4.6 hadoop03:/root/environments/

Then edit /root/environments/zookeeper-3.4.6/data/myid on hadoop02 and hadoop03 (the value differs per node: 01 ≠ 02 ≠ 03):

hadoop02   2
hadoop03   3
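As a small sketch, the per-node myid values can also be written over SSH from hadoop01 (assuming passwordless SSH and the same install path on every node):

i=1
for h in hadoop01 hadoop02 hadoop03; do
  # node N gets myid N, matching the server.N entries in zoo.cfg
  ssh "$h" "echo $i > /root/environments/zookeeper-3.4.6/data/myid"
  i=$((i+1))
done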
Start ZooKeeper on each of the three machines:
zkServer.sh start

zkServer.sh status # check the status

Install Hadoop

1. Extract

tar -zxvf {file-dir}/hadoop-3.1.1.tar.gz  -C /root/environments/ # {file-dir} is the directory holding the installation packages

2. Add environment variables

vi /etc/profile
# tip: append at the end of the file
export HADOOP_HOME=/root/environments/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# apply the changes
source /etc/profile

# test
hadoop version

3. All files to edit are under /root/environments/hadoop-3.1.1/etc/hadoop

<!--vim core-site.xml -->

<configuration>
    <!-- HDFS entry point; mycluster is only a logical cluster name and can be changed, but it must match dfs.nameservices in hdfs-site.xml -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>

    <!-- hadoop.tmp.dir defaults to /tmp, which would keep NameNode and DataNode data in a volatile directory, so change it here -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop</value>
    </property>

    <!-- static web user; without this the web UI reports errors -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>

    <!-- ZooKeeper quorum addresses; separate multiple hosts with commas -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>
vi hadoop-env.sh

export JAVA_HOME=/root/environments/jdk1.8.0_341
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_ZKFC_USER="root"
export HDFS_JOURNALNODE_USER="root"

hdfs-site.xml
This also designates hadoop01 and hadoop02 as the NameNodes (nn1, nn2).

<configuration>
	<property>
       <name>dfs.replication</name>
       <value>2</value>
   </property>
   <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
   </property>
   <!-- the HDFS nameservice, mycluster; must match core-site.xml -->
   <property>
       <name>dfs.nameservices</name>
       <value>mycluster</value>
   </property>
   <!-- mycluster has two NameNodes: nn1 and nn2 -->
   <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
   </property>
   <!-- RPC addresses -->
   <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop01:8020</value>
   </property>
   <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop02:8020</value>
   </property>
 <!-- HTTP addresses -->
   <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop01:9870</value>
   </property>
   <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop02:9870</value>
   </property>
   <!-- where the NameNode edits metadata is stored on the JournalNodes -->
   <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/mycluster</value>
   </property>
   <!-- where the JournalNode stores its data on local disk -->
   <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/hadoop/ha-hadoop/journaldata</value>
   </property>
	<!-- enable automatic NameNode failover -->
   <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
   </property>
   <!-- failover proxy provider implementation -->
   <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
   </property>
   <!-- fencing methods; multiple methods are separated by newlines, one per line -->
   <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
                sshfence
                shell(/bin/true)
        </value>
   </property>
   <!-- sshfence requires passwordless SSH -->
   <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
   </property>
   <!-- sshfence connection timeout -->
   <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
   </property>
</configuration>

mapred-env.sh

export JAVA_HOME=/root/environments/jdk1.8.0_341

mapred-site.xml

<configuration>
     <!-- run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- MapReduce JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>

    <!-- JobHistory web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>

    <property>
      <name>mapreduce.application.classpath</name>
      <value>
                /root/environments/hadoop-3.1.1/etc/hadoop,
                /root/environments/hadoop-3.1.1/share/hadoop/common/*,
                /root/environments/hadoop-3.1.1/share/hadoop/common/lib/*,
                /root/environments/hadoop-3.1.1/share/hadoop/hdfs/*,
                /root/environments/hadoop-3.1.1/share/hadoop/hdfs/lib/*,
                /root/environments/hadoop-3.1.1/share/hadoop/mapreduce/*,
                /root/environments/hadoop-3.1.1/share/hadoop/mapreduce/lib/*,
                /root/environments/hadoop-3.1.1/share/hadoop/yarn/*,
                /root/environments/hadoop-3.1.1/share/hadoop/yarn/lib/*
      </value>
    </property>
</configuration>

yarn-env.sh

export JAVA_HOME=/root/environments/jdk1.8.0_341

yarn-site.xml
This also designates hadoop01 and hadoop02 as the ResourceManagers (rm1, rm2).

<configuration>
<!-- Site specific YARN configuration properties -->
    <!-- enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- the RM cluster id -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster1</value>
    </property>

    <!-- the RM ids -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- the hostname of each RM -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop01</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop02</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop01:8088</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop02:8088</value>
    </property>

    <!-- ZooKeeper cluster addresses -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>

    <!-- enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- store ResourceManager state in the ZooKeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>

    <!-- Whether virtual memory limits will be enforced for containers.  -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>5</value>
    </property>

</configuration>

workers

hadoop01
hadoop02
hadoop03

Hadoop 3 enforces user checks; to avoid startup failures caused by them, add the user settings below to the following files:

vim /root/environments/hadoop-3.1.1/sbin/start-dfs.sh
vim /root/environments/hadoop-3.1.1/sbin/stop-dfs.sh

Add:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs # deprecated; newer releases recommend HDFS_DATANODE_SECURE_USER instead
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root
vim /root/environments/hadoop-3.1.1/sbin/start-yarn.sh
vim /root/environments/hadoop-3.1.1/sbin/stop-yarn.sh

Add:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn # deprecated variable name in newer releases
YARN_NODEMANAGER_USER=root

Startup order: ZooKeeper -> JournalNode -> format NameNode -> initialize the zkfc namespace -> NameNode -> DataNode -> ResourceManager -> NodeManager

Start the JournalNode on all three machines:

cd /root/environments/hadoop-3.1.1/sbin/
./hadoop-daemon.sh start journalnode  # start the journalnode

Format the NameNode on hadoop01, then sync its data to hadoop02 (applies to 01 and 02 only, not 03):

# run on hadoop01
hadoop namenode -format
# copy the contents of /data/hadoop/dfs/name to the standby NameNode host

# if the standby host does not have this directory, create it first
scp -r /data/hadoop/dfs/name hadoop02:/data/hadoop/dfs/name/

Format zkfc on the two NameNode hosts (hadoop01 and hadoop02 only):

./hdfs zkfc -formatZK

Stop the JournalNodes

# stop the JournalNode on all 3 machines
cd /root/environments/hadoop-3.1.1/sbin/
./hadoop-daemon.sh stop journalnode

Start Hadoop

# run on hadoop01:
start-all.sh
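A quick way to confirm the cluster came up is to check the daemons on each node and the HA state of the NameNodes and ResourceManagers. A minimal sketch (assuming jps is on the PATH of non-interactive SSH sessions):

for h in hadoop01 hadoop02 hadoop03; do
  echo "===== $h ====="
  ssh "$h" jps   # expect NameNode/DataNode/JournalNode/DFSZKFailoverController/ResourceManager/NodeManager as appropriate
done
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2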

Install HBase (before Hive)

tar -xzvf hbase-2.0.2-bin.tar.gz -C /root/environments/
 

hbase-env.sh

export JAVA_HOME=/root/environments/jdk1.8.0_341
export HBASE_CLASSPATH=/root/environments/hadoop-3.1.1/etc/hadoop
export HBASE_MANAGES_ZK=false # use the externally installed ZooKeeper; this must be set, otherwise HBase's embedded ZooKeeper will keep your own from starting

hbase-site.xml

<configuration>
	<!-- mycluster must match dfs.nameservices in hdfs-site.xml -->
	<property>
	        <name>hbase.rootdir</name>
	        <value>hdfs://mycluster/hbase</value>
	</property>
	<property>
	        <name>hbase.master</name>
	        <value>8020</value>
	</property>
	<!-- ZooKeeper cluster -->
	<property>
	        <name>hbase.zookeeper.quorum</name>
	        <value>hadoop01,hadoop02,hadoop03</value>
	</property>
	<property>
	        <name>hbase.zookeeper.property.clientPort</name>
	        <value>2181</value>
	</property>
	<property>
	        <name>hbase.zookeeper.property.dataDir</name>
	        <value>/root/environments/zookeeper-3.4.6/conf</value>
	</property>
	<property>
	        <name>hbase.tmp.dir</name>
	        <value>/var/hbase/tmp</value>
	</property>
	<property>
	        <name>hbase.cluster.distributed</name>
	        <value>true</value>
	</property>
	
	<!-- If HMaster fails to start and the log shows the following error:  The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
	     then enable this property:
	<property>
	        <name>hbase.unsafe.stream.capability.enforce</name>
	        <value>false</value>
	</property>
	-->
</configuration>

regionservers

hadoop01
hadoop02
hadoop03

For HBase HA, edit the backup-masters file and add the hostname of the standby HMaster:

vim backup-masters

hadoop03

Configure the environment variables

export HBASE_HOME=/root/environments/hbase-2.0.2
export PATH=$PATH:$HBASE_HOME/bin
source /etc/profile

Copy to the other nodes

scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
scp -r /root/environments/hbase-2.0.2 hadoop02:/root/environments/
scp -r /root/environments/hbase-2.0.2 hadoop03:/root/environments/

Start HBase on the node that should become the active HMaster; in this setup that is hadoop01 or hadoop02, since hadoop03 is the standby HMaster.

start-hbase.sh

yarn rmadmin -getAllServiceState
Check http://hadoop03:16010/master-status
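An optional sanity check from the shell, by piping a command into the HBase shell:

echo "status" | hbase shell   # should report 1 active master, 1 backup master and the live regionservers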

Install Hive

MySQL installation
(omitted)

tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /app
mv apache-hive-3.1.2-bin apache-hive-3.1.2

All files to edit are under the /root/environments/apache-hive-3.1.0/conf directory.

vi hive-env.sh
export HADOOP_HOME=/root/environments/hadoop-3.1.1
export HIVE_CONF_DIR=/root/environments/apache-hive-3.1.0/conf

hive-site.xml

<configuration>
	<!-- Hive metadata is stored in MySQL -->
	<property>
		<name>javax.jdo.option.ConnectionURL</name>
		<value>jdbc:mysql://mysql57:3307/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
	</property>
	
	<!-- JDBC MySQL driver -->
	<property>
		<name>javax.jdo.option.ConnectionDriverName</name>
		<value>com.mysql.jdbc.Driver</value>
	</property>
	
	<!-- MySQL username and password -->
	<property>
		<name>javax.jdo.option.ConnectionUserName</name>
		<value>root</value>
	</property>
	<property>
		<name>javax.jdo.option.ConnectionPassword</name>
		<value>123456</value>
	</property>
	
	<property>
		<name>hive.metastore.warehouse.dir</name>
		<value>/user/hive/warehouse</value>
	</property>
	
	<property>
		<name>hive.exec.scratchdir</name>
		<value>/user/hive/tmp</value>
	</property>
	
	<!-- log directory -->
	<property>
		<name>hive.querylog.location</name>
		<value>/user/hive/log</value>
	</property>
	
	<!-- metastore settings -->
	<!-- remote mode: clients connect to a single remote metastore service -->
	<property>
	  <name>hive.metastore.local</name>
	  <value>false</value>
	</property>
	<property>
		<name>hive.metastore.uris</name>
		<value>thrift://hadoop01:9083</value>
	</property>
	<!-- port for remote client connections -->
	<property>
		<name>hive.server2.thrift.port</name>
		<value>10000</value>
	</property>
	<property>
		<name>hive.server2.thrift.bind.host</name>
		<value>0.0.0.0</value>
	</property>
	<property>
		<name>hive.server2.webui.host</name>
		<value>0.0.0.0</value>
	</property>
	
	<!-- Hive web UI port -->
	<property>
		<name>hive.server2.webui.port</name>
		<value>10002</value>
	</property>
	
	<property>
		<name>hive.server2.long.polling.timeout</name>
		<value>5000</value>
	</property>
	
	<property>
		<name>hive.server2.enable.doAs</name>
		<value>true</value>
	</property>
	
	<property>
		<name>datanucleus.autoCreateSchema</name>
		<value>false</value>
	</property>
	
	<property>
		<name>datanucleus.fixedDatastore</name>
		<value>true</value>
	</property>
	
	<property>
		<name>hive.execution.engine</name>
		<value>mr</value>
	</property>
</configuration>

Upload the MySQL driver jar into Hive's lib directory:
https://mvnrepository.com/artifact/mysql/mysql-connector-java/8.0.20

Configure the environment variables

export HIVE_HOME=/root/environments/apache-hive-3.1.0
export PATH=$PATH:$HIVE_HOME/bin

Refresh

source /etc/profile

Initialize Hive's metastore database

schematool -dbType mysql -initSchema

Start the Hive metastore (important; I am not sure why it would depend on HBase, I probably misread that)

hive --service metastore 
hive --service metastore & # start in the background

Use ps to check whether the metastore service is up

ps -ef | grep metastore # ps -ef shows all processes in full format: -e all processes, -f full format, -h no header, -l long format, -w wide output

Open the Hive CLI to verify:

hive
create database filetest;
show databases;
use filetest;   -- switch to the filetest database

Distribute the Hive directory (so every machine can use Hive without any further configuration changes):

scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
scp -r /root/environments/apache-hive-3.1.0  hadoop02:/root/environments/
scp -r /root/environments/apache-hive-3.1.0  hadoop03:/root/environments/

Then refresh

source /etc/profile

Install Kafka

 tar -xzvf kafka_2.12-2.0.0.tgz -C /root/environments/
 # the files to edit are under the Kafka config directory (/root/environments/kafka_2.12-2.0.0/config)

Modify the following in server.properties:

broker.id=1
zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181

zookeeper.properties (left unchanged)

dataDir=/home/hadoop/data/zookeeper/zkdata
clientPort=2181

consumer.properties (left unchanged)

zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181

producer.properties (left unchanged)

metadata.broker.list=hadoop01:9092,hadoop02:9092,hadoop03:9092

Configure the environment variables

export KAFKA_HOME=/root/environments/kafka_2.12-2.0.0
export PATH=$PATH:$KAFKA_HOME/bin

Refresh

source /etc/profile

Distribute the Kafka directory to the other machines and change broker.id in kafka_2.12-2.0.0/config/server.properties (the value differs per node: 01 ≠ 02 ≠ 03):

scp /etc/profile hadoop02:/etc/
scp /etc/profile hadoop03:/etc/
scp -r /root/environments/kafka_2.12-2.0.0 hadoop02:/root/environments/
scp -r /root/environments/kafka_2.12-2.0.0 hadoop03:/root/environments/

Then refresh and set the per-node broker.id:

source /etc/profile
vim /root/environments/kafka_2.12-2.0.0/config/server.properties
hadoop02    2
hadoop03    3
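As an alternative to editing by hand, the per-node broker.id can be patched over SSH; a small sketch (assuming the copied file still contains broker.id=1):

ssh hadoop02 "sed -i 's/^broker.id=.*/broker.id=2/' /root/environments/kafka_2.12-2.0.0/config/server.properties"
ssh hadoop03 "sed -i 's/^broker.id=.*/broker.id=3/' /root/environments/kafka_2.12-2.0.0/config/server.properties"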

Kafka cluster start script

for i in hadoop01 hadoop02 hadoop03
do
echo "========== $i ==========" 
ssh $i '/root/environments/kafka_2.12-2.0.0/bin/kafka-server-start.sh -daemon /root/environments/kafka_2.12-2.0.0/config/server.properties'
done

Start Kafka on each of the three machines:

# start Kafka on all 3 machines
# in the background:
kafka-server-start.sh -daemon /root/environments/kafka_2.12-2.0.0/config/server.properties

http://hadoop01:8048

1) List all topics on the current broker

kafka-topics.sh --zookeeper hadoop01:2181 --list

2) Create topics (once the cluster is fully deployed, messages will be replicated)

kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
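To confirm the topics were created with the expected partitions and replicas, the same tool can list and describe them, e.g.:

kafka-topics.sh --zookeeper hadoop01:2181 --list
kafka-topics.sh --zookeeper hadoop01:2181 --describe --topic ATLAS_HOOK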

Kafka basics (reference notes)

kafka-topics.sh --zookeeper hadoop01:2181 --create --replication-factor 3 --partitions 1 --topic first

Option notes:
--topic: the topic name
--replication-factor: the number of replicas
--partitions: the number of partitions

3) Delete a topic

[root@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --delete --topic first

delete.topic.enable=true must be set in server.properties; otherwise the topic is only marked for deletion.

4) Produce messages

[root@hadoop102 kafka]$ bin/kafka-console-producer.sh --broker-list hadoop102:9092 --topic first
>hello world

5) Consume messages

[root@hadoop102 kafka]$ bin/kafka-console-consumer.sh --zookeeper hadoop102:2181 --topic first
[root@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first
[root@hadoop102 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --from-beginning --topic first
--from-beginning: reads all existing data in the topic.

6) Describe a topic

[root@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --describe --topic first

7) Change the number of partitions

[root@hadoop102 kafka]$ bin/kafka-topics.sh --zookeeper hadoop102:2181 --alter --topic first --partitions 6

Install Solr

Extract

 tar -xzvf solr-7.5.0.tgz -C /root/environments/

The files to edit are under /root/environments/solr-7.5.0/bin.

solr.in.sh

ZK_HOST="hadoop01:2181,hadoop02:2181,hadoop03:2181"
SOLR_HOST="hadoop01"
export SOLR_HOME=/root/environments/solr-7.5.0
export PATH=$PATH:$SOLR_HOME/bin
source /etc/profile

Note: setting these as global environment variables causes serious problems!!!

# someone else's (working) log
16:42:39.035 INFO  (main) [   ] o.a.s.c.SolrResourceLoader Using system property solr.solr.home: /opt/xxx/solr-6.5.1/server/solr
16:42:39.099 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Loading solr.xml from SolrHome (not found in ZooKeeper)
16:42:39.100 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading container configuration from /opt/xxx/solr-6.5.1/server/solr/solr.xml
16:42:39.413 INFO  (main) [   ]
# mine (failing)
2022-07-23 10:55:51.469 INFO  (main) [   ] o.a.s.c.SolrResourceLoader Using system property solr.solr.home: /root/environments/solr-7.5.0
2022-07-23 10:55:51.638 INFO  (zkConnectionManagerCallback-2-thread-1) [   ] o.a.s.c.c.ConnectionManager zkClient has connected
2022-07-23 11:29:34.848 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Loading solr.xml from SolrHome (not found in ZooKeeper)
2022-07-23 11:29:34.854 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading container configuration from /root/environments/solr-7.5.0/solr.xml
2022-07-23 11:29:34.859 ERROR (main) [   ] o.a.s.s.SolrDispatchFilter Could not start Solr. Check solr/home property and the logs
2022-07-23 11:29:34.903 ERROR (main) [   ] o.a.s.c.SolrCore null:org.apache.solr.common.SolrException: solr.xml does not exist in /root/environments/solr-7.5.0 cannot start Solr
Distribute /root/environments/solr-7.5.0 to the other machines and change the SOLR_HOST value in /root/environments/solr-7.5.0/bin/solr.in.sh:

scp -r /root/environments/solr-7.5.0 hadoop02:/root/environments/
scp -r /root/environments/solr-7.5.0 hadoop03:/root/environments/

Change SOLR_HOST in /root/environments/solr-7.5.0/bin/solr.in.sh (the value differs per node: 01 ≠ 02 ≠ 03):

vim /root/environments/solr-7.5.0/bin/solr.in.sh
hadoop02    hadoop02
hadoop03    hadoop03
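The same change can be scripted over SSH; a small sketch (assuming the copied file still has SOLR_HOST="hadoop01"):

ssh hadoop02 "sed -i 's/^SOLR_HOST=.*/SOLR_HOST=\"hadoop02\"/' /root/environments/solr-7.5.0/bin/solr.in.sh"
ssh hadoop03 "sed -i 's/^SOLR_HOST=.*/SOLR_HOST=\"hadoop03\"/' /root/environments/solr-7.5.0/bin/solr.in.sh"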

Start Solr on each of the three machines

# Always run from the bin directory (or by full path) and do NOT export SOLR_HOME as an environment variable!! Otherwise solr.solr.home resolves to your environment value "/root/environments/solr-7.5.0/" instead of the correct /root/environments/solr-7.5.0/server/solr
cd /root/environments/solr-7.5.0/bin
./solr start -force
# or
/root/environments/solr-7.5.0/bin/solr start -force
# check the status
cd /root/environments/solr-7.5.0/bin
./solr status
# or
/root/environments/solr-7.5.0/bin/solr status

Output like the following means it worked:

"cloud":{
  "ZooKeeper":"hadoop01:2181,hadoop02:2181,hadoop03:2181",
  "liveNodes":"3",
  "collections":"3"}}

Or open http://localhost:8983/solr/; if a Cloud menu appears, the cluster is up.

3. Install Atlas

Atlas download: https://atlas.apache.org/#/Downloads

# extract the Atlas source package
tar -zxvf {file-dir}/apache-atlas-2.1.0-sources.tar.gz  -C /root/environments/ # {file-dir} is the directory holding the installation packages

Edit the project's top-level pom.xml and adjust the component versions:

# enter the atlas root directory and modify pom.xml
cd /root/environments/apache-atlas-sources-2.1.0/
vim pom.xml

The relevant entries are the component versions defined there; since this installation follows exactly those versions, the pom.xml was left unmodified.

Online sources say the source-code changes below are also required. I applied them and the build ran and worked; so far only the Hive hook has been tested, without any problems, so I cannot say what happens if they are skipped.

vim /root/environments/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java

Line 577, change:
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
to:
String catalogName = null;

vim /root/environments/apache-atlas-sources-2.1.0/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/AtlasHiveHookContext.java

Line 81, change:
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
to:
this.metastoreHandler = null;

Build

cd /root/environments/apache-atlas-sources-2.1.0/

Package (build against external HBase and Solr; the embedded ones bundled with Atlas are not used here):
mvn clean -DskipTests package -Pdist -X

Note: the build may fail with errors, almost always due to network issues; retrying usually resolves them. If a jar still cannot be downloaded after retries, download it manually, place it in the local Maven repository, and package again.

Problem 1: the Node.js download fails.
Manually copy the downloaded file C:\Users\shuch\Downloads\node-12.16.0-linux-x64.tar.gz to hadoop01:/root/.m2/repository/com/github/eirslett/node/12.16.0/
Problem 2: dependencies hosted on GitHub fail to download.
Set a proxy, or edit the hosts file:

# localhost name resolution is handled within DNS itself.
# 127.0.0.1 localhost
# ::1 localhost
20.205.243.166 github.com

# GitHub Start
140.82.114.4 github.com
199.232.69.194 github.global.ssl.fastly.net
199.232.68.133 raw.githubusercontent.com
# GitHub End

Location of the built Atlas package:

cd /root/environments/apache-atlas-sources-2.1.0/distro/target

apache-atlas-2.1.0-bin.tar.gz is the package we need.

Extract

 tar -xzvf apache-atlas-2.1.0-bin.tar.gz

The files to edit are in /root/environments/apache-atlas-2.1.0/conf

cd /root/environments/apache-atlas-2.1.0/conf

atlas-env.sh

#indicates whether or not a local instance of HBase should be started for Atlas
export MANAGE_LOCAL_HBASE=false

# indicates whether or not a local instance of Solr should be started for Atlas
export MANAGE_LOCAL_SOLR=false

# indicates whether or not cassandra is the embedded backend for Atlas
export MANAGE_EMBEDDED_CASSANDRA=false

# indicates whether or not a local instance of Elasticsearch should be started for Atlas
export MANAGE_LOCAL_ELASTICSEARCH=false
export JAVA_HOME=/root/environments/jdk1.8.0_341
export HBASE_CONF_DIR=/root/environments/hbase-2.0.2/conf

atlas-application.properties (the full content is given here; only Hive is integrated, as a test. If other components are needed, install them and configure their Atlas hooks as well.)

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#########  Graph Database Configs  #########

# Graph Database

#Configures the graph database to use.  Defaults to JanusGraph
#atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

# Graph Storage
# Set atlas.graph.storage.backend to the correct value for your desired storage
# backend. Possible values:
#
# hbase
# cassandra
# embeddedcassandra - Should only be set by building Atlas with  -Pdist,embedded-cassandra-solr
# berkeleyje
#
# See the configuration documentation for more information about configuring the various  storage backends.
#
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=apache_atlas_janus

#Hbase
#For standalone mode , specify localhost
#for distributed mode, specify zookeeper quorum here
atlas.graph.storage.hostname=hadoop01:2181,hadoop02:2181,hadoop03:2181
atlas.graph.storage.hbase.regions-per-server=1
atlas.graph.storage.lock.wait-time=10000

#In order to use Cassandra as a backend, comment out the hbase specific properties above, and uncomment the
#the following properties
#atlas.graph.storage.clustername=
#atlas.graph.storage.port=

# Gremlin Query Optimizer
#
# Enables rewriting gremlin queries to maximize performance. This flag is provided as
# a possible way to work around any defects that are found in the optimizer until they
# are resolved.
#atlas.query.gremlinOptimizerEnabled=true

# Delete handler
#
# This allows the default behavior of doing "soft" deletes to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1 - all deletes are "soft" deletes
# org.apache.atlas.repository.store.graph.v1.HardDeleteHandlerV1 - all deletes are "hard" deletes
#
#atlas.DeleteHandlerV1.impl=org.apache.atlas.repository.store.graph.v1.SoftDeleteHandlerV1

# Entity audit repository
#
# This allows the default behavior of logging entity changes to hbase to be changed.
#
# Allowed Values:
# org.apache.atlas.repository.audit.HBaseBasedAuditRepository - log entity changes to hbase
# org.apache.atlas.repository.audit.CassandraBasedAuditRepository - log entity changes to cassandra
# org.apache.atlas.repository.audit.NoopEntityAuditRepository - disable the audit repository
#
atlas.EntityAuditRepository.impl=org.apache.atlas.repository.audit.HBaseBasedAuditRepository

# if Cassandra is used as a backend for audit from the above property, uncomment and set the following
# properties appropriately. If using the embedded cassandra profile, these properties can remain
# commented out.
# atlas.EntityAuditRepository.keyspace=atlas_audit
# atlas.EntityAuditRepository.replicationFactor=1


# Graph Search Index
atlas.graph.index.search.backend=solr

#Solr
#Solr cloud mode properties
atlas.graph.index.search.solr.mode=cloud
atlas.graph.index.search.solr.zookeeper-url=hadoop01:2181,hadoop02:2181,hadoop03:2181
atlas.graph.index.search.solr.zookeeper-connect-timeout=60000
atlas.graph.index.search.solr.zookeeper-session-timeout=60000
atlas.graph.index.search.solr.wait-searcher=true

#Solr http mode properties
#atlas.graph.index.search.solr.mode=http
#atlas.graph.index.search.solr.http-urls=http://localhost:8983/solr

# ElasticSearch support (Tech Preview)
# Comment out above solr configuration, and uncomment the following two lines. Additionally, make sure the
# hostname field is set to a comma delimited set of elasticsearch master nodes, or an ELB that fronts the masters.
#
# Elasticsearch does not provide authentication out of the box, but does provide an option with the X-Pack product
# https://www.elastic.co/products/x-pack/security
#
# Alternatively, the JanusGraph documentation provides some tips on how to secure Elasticsearch without additional
# plugins: https://docs.janusgraph.org/latest/elasticsearch.html
#atlas.graph.index.search.hostname=localhost
#atlas.graph.index.search.elasticsearch.client-only=true

# Solr-specific configuration property
atlas.graph.index.search.max-result-set-size=150

#########  Import Configs  #########
#atlas.import.temp.directory=/temp/import

#########  Notification Configs  #########
# set atlas.notification.embedded=true to use the embedded Kafka
atlas.notification.embedded=false
atlas.kafka.data=${sys:atlas.home}/data/kafka
atlas.kafka.zookeeper.connect=hadoop01:2181,hadoop02:2181,hadoop03:2181
atlas.kafka.bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092
atlas.kafka.zookeeper.session.timeout.ms=400
atlas.kafka.zookeeper.connection.timeout.ms=200
atlas.kafka.zookeeper.sync.time.ms=20
atlas.kafka.auto.commit.interval.ms=1000
atlas.kafka.hook.group.id=atlas

atlas.kafka.enable.auto.commit=true
atlas.kafka.auto.offset.reset=earliest
atlas.kafka.session.timeout.ms=30000
atlas.kafka.offsets.topic.replication.factor=1
atlas.kafka.poll.timeout.ms=1000

atlas.notification.create.topics=true
atlas.notification.replicas=1
atlas.notification.topics=ATLAS_HOOK,ATLAS_ENTITIES
atlas.notification.log.failed.messages=true
atlas.notification.consumer.retry.interval=500
atlas.notification.hook.retry.interval=1000
# Enable for Kerberized Kafka clusters
#atlas.notification.kafka.service.principal=kafka/_HOST@EXAMPLE.COM
#atlas.notification.kafka.keytab.location=/etc/security/keytabs/kafka.service.keytab

## Server port configuration
#atlas.server.http.port=21000
#atlas.server.https.port=21443

#########  Security Properties  #########

# SSL config
atlas.enableTLS=false

#truststore.file=/path/to/truststore.jks
#cert.stores.credential.provider.path=jceks://file/path/to/credentialstore.jceks

#following only required for 2-way SSL
#keystore.file=/path/to/keystore.jks

# Authentication config

atlas.authentication.method.kerberos=false
atlas.authentication.method.file=true

#### ldap.type= LDAP or AD
atlas.authentication.method.ldap.type=none

#### user credentials file
atlas.authentication.method.file.filename=${sys:atlas.home}/conf/users-credentials.properties

### groups from UGI
#atlas.authentication.method.ldap.ugi-groups=true

######## LDAP properties #########
#atlas.authentication.method.ldap.url=ldap://<ldap server url>:389
#atlas.authentication.method.ldap.userDNpattern=uid={0},ou=People,dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchBase=dc=example,dc=com
#atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=Users,dc=example,dc=com)
#atlas.authentication.method.ldap.groupRoleAttribute=cn
#atlas.authentication.method.ldap.base.dn=dc=example,dc=com
#atlas.authentication.method.ldap.bind.dn=cn=Manager,dc=example,dc=com
#atlas.authentication.method.ldap.bind.password=<password>
#atlas.authentication.method.ldap.referral=ignore
#atlas.authentication.method.ldap.user.searchfilter=(uid={0})
#atlas.authentication.method.ldap.default.role=<default role>


######### Active directory properties #######
#atlas.authentication.method.ldap.ad.domain=example.com
#atlas.authentication.method.ldap.ad.url=ldap://<AD server url>:389
#atlas.authentication.method.ldap.ad.base.dn=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.bind.dn=CN=team,CN=Users,DC=example,DC=com
#atlas.authentication.method.ldap.ad.bind.password=<password>
#atlas.authentication.method.ldap.ad.referral=ignore
#atlas.authentication.method.ldap.ad.user.searchfilter=(sAMAccountName={0})
#atlas.authentication.method.ldap.ad.default.role=<default role>

#########  JAAS Configuration ########

#atlas.jaas.KafkaClient.loginModuleName = com.sun.security.auth.module.Krb5LoginModule
#atlas.jaas.KafkaClient.loginModuleControlFlag = required
#atlas.jaas.KafkaClient.option.useKeyTab = true
#atlas.jaas.KafkaClient.option.storeKey = true
#atlas.jaas.KafkaClient.option.serviceName = kafka
#atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
#atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM

#########  Server Properties  #########
atlas.rest.address=http://hadoop01:21000
# If enabled and set to true, this will run setup steps when the server starts
atlas.server.run.setup.on.start=false

#########  Entity Audit Configs  #########
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.zookeeper.session.timeout.ms=1000
atlas.audit.hbase.zookeeper.quorum=hadoop01:2181,hadoop02:2181,hadoop03:2181


#########  High Availability Configuration ########
atlas.server.ha.enabled=false
#### Enabled the configs below as per need if HA is enabled #####
#atlas.server.ids=id1
#atlas.server.address.id1=localhost:21000
#atlas.server.ha.zookeeper.connect=localhost:2181
#atlas.server.ha.zookeeper.retry.sleeptime.ms=1000
#atlas.server.ha.zookeeper.num.retries=3
#atlas.server.ha.zookeeper.session.timeout.ms=20000
## if ACLs need to be set on the created nodes, uncomment these lines and set the values ##
#atlas.server.ha.zookeeper.acl=<scheme>:<id>
#atlas.server.ha.zookeeper.auth=<scheme>:<authinfo>



######### Atlas Authorization #########
atlas.authorizer.impl=simple
atlas.authorizer.simple.authz.policy.file=atlas-simple-authz-policy.json

#########  Type Cache Implementation ########
# A type cache class which implements
# org.apache.atlas.typesystem.types.cache.TypeCache.
# The default implementation is org.apache.atlas.typesystem.types.cache.DefaultTypeCache which is a local in-memory type cache.
#atlas.TypeCache.impl=

#########  Performance Configs  #########
#atlas.graph.storage.lock.retries=10
#atlas.graph.storage.cache.db-cache-time=120000

#########  CSRF Configs  #########
atlas.rest-csrf.enabled=true
atlas.rest-csrf.browser-useragents-regex=^Mozilla.*,^Opera.*,^Chrome.*
atlas.rest-csrf.methods-to-ignore=GET,OPTIONS,HEAD,TRACE
atlas.rest-csrf.custom-header=X-XSRF-HEADER

############ KNOX Configs ################
#atlas.sso.knox.browser.useragent=Mozilla,Chrome,Opera
#atlas.sso.knox.enabled=true
#atlas.sso.knox.providerurl=https://<knox gateway ip>:8443/gateway/knoxsso/api/v1/websso
#atlas.sso.knox.publicKey=

############ Atlas Metric/Stats configs ################
# Format: atlas.metric.query.<key>.<name>
atlas.metric.query.cache.ttlInSecs=900
#atlas.metric.query.general.typeCount=
#atlas.metric.query.general.typeUnusedCount=
#atlas.metric.query.general.entityCount=
#atlas.metric.query.general.tagCount=
#atlas.metric.query.general.entityDeleted=
#
#atlas.metric.query.entity.typeEntities=
#atlas.metric.query.entity.entityTagged=
#
#atlas.metric.query.tags.entityTags=

#########  Compiled Query Cache Configuration  #########

# The size of the compiled query cache.  Older queries will be evicted from the cache
# when we reach the capacity.

#atlas.CompiledQueryCache.capacity=1000

# Allows notifications when items are evicted from the compiled query
# cache because it has become full.  A warning will be issued when
# the specified number of evictions have occurred.  If the eviction
# warning threshold <= 0, no eviction warnings will be issued.

#atlas.CompiledQueryCache.evictionWarningThrottle=0


#########  Full Text Search Configuration  #########

#Set to false to disable full text search.
#atlas.search.fulltext.enable=true

#########  Gremlin Search Configuration  #########

#Set to false to disable gremlin search.
atlas.search.gremlin.enable=false


########## Add http headers ###########

#atlas.headers.Access-Control-Allow-Origin=*
#atlas.headers.Access-Control-Allow-Methods=GET,OPTIONS,HEAD,PUT,POST
#atlas.headers.<headerName>=<headerValue>


#########  UI Configuration ########

atlas.ui.default.version=v1


######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary

Integrate HBase

Register the hook by editing hbase-site.xml

vi /root/environments/hbase-2.0.2/conf/hbase-site.xml

Add the following property:

<property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value>
</property>

Sync to the other nodes

scp /root/environments/hbase-2.0.2/conf/hbase-site.xml hadoop02:/root/environments/hbase-2.0.2/conf/
scp /root/environments/hbase-2.0.2/conf/hbase-site.xml hadoop03:/root/environments/hbase-2.0.2/conf/

Add the dependencies

# compress the atlas-application.properties file into Atlas's hook/hbase/hbase-bridge-shim-2.1.0.jar
zip -u /root/environments/apache-atlas-2.1.0/hook/hbase/hbase-bridge-shim-2.1.0.jar  /root/environments/apache-atlas-2.1.0/conf/atlas-application.properties

# then copy Atlas's hook/hbase/* into the HBase lib directory on every node
cp  -r /root/environments/apache-atlas-2.1.0/hook/hbase/* /root/environments/hbase-2.0.2/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hbase/* hadoop02:/root/environments/hbase-2.0.2/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hbase/* hadoop03:/root/environments/hbase-2.0.2/lib/
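To double-check that the properties file really made it into the shim jar (zip stores it under its full path with the leading slash stripped), something like:

unzip -l /root/environments/apache-atlas-2.1.0/hook/hbase/hbase-bridge-shim-2.1.0.jar | grep atlas-application.properties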

Add the configuration
Append the following to atlas-application.properties:

vi /root/environments/apache-atlas-2.1.0/conf/atlas-application.properties
######### hbase Hook Configs #######
atlas.hook.hbase.synchronous=false 
atlas.hook.hbase.numRetries=3 
atlas.hook.hbase.queueSize=10000 

Then copy atlas-application.properties into hbase/conf/:

# copy atlas-application.properties to the HBase conf directory on every node; run these one line at a time, do not paste the whole block at once, or things will go wrong!!!
cd  /root/environments/apache-atlas-2.1.0/conf/ # !! don't forget to cd into this directory
cp  ./atlas-application.properties /root/environments/hbase-2.0.2/conf/
scp ./atlas-application.properties hadoop02:/root/environments/hbase-2.0.2/conf/
scp ./atlas-application.properties hadoop03:/root/environments/hbase-2.0.2/conf/
# edit the atlas properties file
vi atlas-application.properties

# set the hosts where Atlas stores its graph data
atlas.graph.storage.hostname=hadoop01:2181,hadoop02:2181,hadoop03:2181

# create a symlink
ln -s /root/environments/hbase-2.0.2/conf/ /root/environments/apache-atlas-2.1.0/conf/hbase/
cp /root/environments/hbase-2.0.2/conf/* /root/environments/apache-atlas-2.1.0/conf/hbase/ # not sure why this copy is needed

# add the HBase config directory
vi /root/environments/apache-atlas-2.1.0/conf/atlas-env.sh

export HBASE_CONF_DIR=/root/environments/hbase-2.0.2/conf

Integrate Solr

cp  -r /root/environments/apache-atlas-2.1.0/conf/solr  /root/environments/solr-7.5.0/
cd /root/environments/solr-7.5.0/
mv solr/  atlas-solr
scp -r ./atlas-solr/  hadoop02:/root/environments/solr-7.5.0/
scp -r ./atlas-solr/  hadoop03:/root/environments/solr-7.5.0/


# restart solr
./solr stop -force
./solr start -force

# check the status
./solr status
# or open http://localhost:8983/solr/ ; a Cloud menu means the cluster is up

Create the Atlas indexes in Solr
./solr create -c vertex_index -d /root/environments/solr-7.5.0/atlas-solr/ -shards 3 -replicationFactor 2 -force
./solr create -c edge_index -d /root/environments/solr-7.5.0/atlas-solr/ -shards 3 -replicationFactor 2 -force
./solr create -c fulltext_index -d /root/environments/solr-7.5.0/atlas-solr/ -shards 3 -replicationFactor 2 -force

If any of these fail, delete the collection with "solr delete -c ${collection_name}" and create it again.
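A quick way to confirm the three collections exist is the Solr Collections API, e.g.:

curl "http://hadoop01:8983/solr/admin/collections?action=LIST&wt=json"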

Kafka-related steps

Create the required topics in Kafka:
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics.sh --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK

Integrate Hive

# compress the atlas-application.properties file into Atlas's hook/hive/hive-bridge-shim-2.1.0.jar
zip -u /root/environments/apache-atlas-2.1.0/hook/hive/hive-bridge-shim-2.1.0.jar  /root/environments/apache-atlas-2.1.0/conf/atlas-application.properties

# then copy Atlas's hook/hive/* into the Hive lib directory on every node
cp  -r /root/environments/apache-atlas-2.1.0/hook/hive/* /root/environments/apache-hive-3.1.0/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hive/* hadoop02:/root/environments/apache-hive-3.1.0/lib/
scp -r /root/environments/apache-atlas-2.1.0/hook/hive/* hadoop03:/root/environments/apache-hive-3.1.0/lib/
# then copy atlas-application.properties into the Hive conf directory on every node; run these one line at a time, do not paste the whole block at once, or things will go wrong!!!
cd  /root/environments/apache-atlas-2.1.0/conf/ # !! don't forget to cd into this directory
cp  ./atlas-application.properties /root/environments/apache-hive-3.1.0/conf/
scp ./atlas-application.properties hadoop02:/root/environments/apache-hive-3.1.0/conf/
scp ./atlas-application.properties hadoop03:/root/environments/apache-hive-3.1.0/conf/

Hive configuration

# all 3 machines need this configuration
cd /root/environments/apache-hive-3.1.0/conf/

# add to hive-env.sh:
export JAVA_HOME=/root/environments/jdk1.8.0_341
export HIVE_AUX_JARS_PATH=/root/environments/apache-hive-3.1.0/lib/

# add to hive-site.xml:
<property>
      <name>hive.exec.post.hooks</name>
      <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
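Once the hook jars and configuration are distributed, a quick smoke test is to run one DDL statement and watch for a message on the ATLAS_HOOK topic (topic and broker names as configured above). A minimal sketch, with atlas_smoke being just a throwaway database name:

# trigger the hook once
hive -e "create database if not exists atlas_smoke; create table if not exists atlas_smoke.t1 (id int);"
# the hook publishes asynchronously; one message should appear on ATLAS_HOOK
kafka-console-consumer.sh --bootstrap-server hadoop01:9092 --topic ATLAS_HOOK --from-beginning --max-messages 1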

Start Atlas

cd /root/environments/apache-atlas-2.1.0/bin
./atlas_start.py

Note: the first Atlas startup takes a long time; even after the script reports success, it can be a while before the Atlas web UI responds.
Progress and errors can be checked in the logs under /root/environments/apache-atlas-2.1.0/logs.
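Rather than guessing when the UI is ready, the Atlas REST API can be polled; a small sketch (assuming the default file-based credentials admin/admin have not been changed):

until curl -sf -u admin:admin http://hadoop01:21000/api/atlas/admin/version; do
  echo "waiting for Atlas to come up..."; sleep 30
done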

After startup, import the Hive metadata:

cd /root/environments/apache-atlas-2.1.0/bin
./import-hive.sh

Import the HBase metadata:

/root/environments/apache-atlas-2.1.0/hook-bin/import-hbase.sh

---------------------- Congratulations ------------------ you get an Error!!!! ------------------------------------

org.apache.atlas.AtlasException: Failed to load application properties
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:147)
at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:100)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:123)
Caused by: org.apache.commons.configuration.ConversionException: 
'atlas.graph.index.search.solr.wait-searcher' doesn't map to a List object: true, a java.lang.Boolean

Explanation: HBase ships commons-configuration 1.6 while Atlas uses 1.10; the differing method return types conflict.

  • apache-atlas-2.1.0/hook/hbase/atlas-hbase-plugin-impl/commons-configuration-1.10.jar
  • hbase-2.0.2/lib/commons-configuration-1.6.jar

Fixes:

  • Method 1: adjust the classpath order in the import-hbase.sh script so that the Atlas path comes first. Because the JVM uses whichever class it finds first on the classpath, the version bundled with the Atlas hook wins, without affecting Hive's own version.

    # move ATLASCPPATH to the front in import-hbase.sh
    vi  /root/environments/apache-atlas-2.1.0/hook-bin/import-hbase.sh
    

    Change the CP variable in import-hbase.sh to:

    CP="${ATLASCPPATH}:${HIVE_CP}:${HADOOP_CP}"
    

    Sorry, this failed! That exception went away, but a new one appeared: NoClassDefFoundError: com/fasterxml/jackson/core/exc/InputCoercion, and many more missing dependencies follow.

  • Method 2: replace HBase's bundled 1.6 jar with Atlas's 1.10 jar:

    # remove HBase's commons-configuration-1.6
    # copy Atlas's commons-configuration-1.10 into HBase's lib directory
    cd /root/environments/hbase-2.0.2/lib 
    rm -f commons-configuration-1.6.jar 
    cp /root/environments/apache-atlas-2.1.0/hook/hbase/atlas-hbase-plugin-impl/commons-configuration-1.10.jar /root/environments/hbase-2.0.2/lib
    

    Note: only hadoop01 needs this fix, because the other two nodes neither run this import nor have Atlas installed.

After that, lineage shows up normally at
http://hadoop01:21000

Done!

Part 2: Starting the Docker Images


1. Load the images

# cd into the directory holding the image files and run:
docker load -i mysql-5.7.tar # load the MySQL image
docker load -i hadoop01-1.0.tar # load the hadoop01 node image
docker load -i hadoop02-1.0.tar # load the hadoop02 node image
docker load -i hadoop03-1.0.tar # load the hadoop03 node image

2. Create the containers

# create the network
docker network create -d bridge --subnet 192.168.0.0/24 --gateway 192.168.0.1 network_hadoop 
# create the MySQL container
docker run -dit --name mysql5.7 -p 3306:3306 --hostname mysql57 --net network_hadoop --ip 192.168.0.2  -e MYSQL_ROOT_PASSWORD="123456" mysql:5.7 
# create the node containers
docker run -dit --name hadoop01 --privileged --hostname hadoop01 --net network_hadoop --ip 192.168.0.11 --add-host mysql57:192.168.0.1 --add-host hadoop02:192.168.0.12 --add-host hadoop03:192.168.0.13 -p 8042:8042 -p 8088:8088 -p 9870:9870 -p 9864:9864 -p 10002:10002 -p 16010:16010 -p 16000:16000 -p 8048:8048 -p 8983:8983 -p 21000:21000 -p 9868:9868 -p 10000:10000 -p 2181:2181 -p 9092:9092 hadoop01:1.0 /usr/sbin/init
docker run -dit --name hadoop02 --privileged --hostname hadoop02 --net network_hadoop --ip 192.168.0.12 --add-host mysql57:192.168.0.1 --add-host hadoop01:192.168.0.11 --add-host hadoop03:192.168.0.13 hadoop02:1.0 /usr/sbin/init
docker run -dit --name hadoop03 --privileged --hostname hadoop03 --net network_hadoop --ip 192.168.0.13 --add-host mysql57:192.168.0.1 --add-host hadoop01:192.168.0.11 --add-host hadoop02:192.168.0.12 hadoop03:1.0 /usr/sbin/init
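A quick check that the containers are running and attached to the bridge network with the intended addresses:

docker ps --format "table {{.Names}}\t{{.Status}}"
docker network inspect network_hadoop | grep -E '"Name"|"IPv4Address"'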

3. Quick start

Notes:

  • hadoop01 & hadoop02 & hadoop03 means start on all of them.
  • hadoop01 | hadoop02 | hadoop03 means start on any one or more of them.
  • hadoop01 ⊕ hadoop02 ⊕ hadoop03 means start on exactly one of them.

(1) Start ZooKeeper: hadoop01 & hadoop02 & hadoop03

zkServer.sh start # start zkServer; multiple instances form a cluster automatically, so start it on at least two machines

(2) Start Hadoop: hadoop01 ⊕ hadoop02

start-all.sh # start the Hadoop cluster; only needs to be run on the active master node

(3) Start Hive: hadoop01 | hadoop02 | hadoop03

  1. Initialize the Hive metadata (only needed the first time Hive or MySQL is installed!!)
    schematool -dbType mysql -initSchema # stored in MySQL; running it on one machine is enough!!!!
    
  2. Start the Hive metastore service (for later startups this is all you need)
    hive --service metastore & # start the metastore on one machine in the background; be sure to append "&"
    hiveserver2 & # start hiveserver2, which provides JDBC and a web UI
    

Note: schematool -dbType mysql -initSchema initializes the Hive metadata (only needed on first startup, or if MySQL has been reset)!

(4) Start HBase: hadoop01 ⊕ hadoop02

 start-hbase.sh # whichever node you start it on becomes the active HMaster.
 # In this setup hadoop03 is the standby HMaster, so after startup there are two HMaster processes. If you start from hadoop03, there is only one HMaster.

(5) Start Kafka: hadoop01 & hadoop02 & hadoop03

kafka-server-start.sh -daemon /root/environments/kafka_2.12-2.0.0/config/server.properties  

(6) Start Solr: hadoop01 & hadoop02 & hadoop03

/root/environments/solr-7.5.0/bin/solr start -force

(7) Start Atlas: hadoop01

/root/environments/apache-atlas-2.1.0/bin/atlas_start.py

(8) Bulk-import metadata (optional): hadoop01

# import the Hive metadata
/root/environments/apache-atlas-2.1.0/bin/import-hive.sh
# import the HBase metadata
/root/environments/apache-atlas-2.1.0/hook-bin/import-hbase.sh

4. Ports

Web UI                                      Port            Purpose
Hadoop: NodeManager UI                      8042
Hadoop: YARN UI                             8088            YARN management UI; view Hadoop cluster info
Hadoop: HDFS NameNode UI                    9870
Hadoop: DataNode UI                         9864
hiveserver2: web UI                         10002
HBase                                       16010, 16000    use 16010 for access!
Kafka Eagle (not installed)                 8048
Solr                                        8983
Atlas                                       21000
SecondaryNameNode (not used in this HA
cluster)                                    9868

Server port                                 Port            Purpose
HDFS                                        9000
hiveserver2: server                         10000           JDBC
ZooKeeper                                   2181
Kafka                                       9092

Note:

  1. NN UI: the page may redirect to whichever host has the active ResourceManager. To reach it, either also map that host's port, or manually kill the active RM so it fails over to the host whose port is mapped here (recommended). Check the state with: yarn rmadmin -getAllServiceState. You can also switch with hdfs haadmin -failover -forcefence -forceactive nn2 nn1, but that requires setting dfs.ha.automatic-failover.enabled to false.
  2. The hostname in the port hyperlinks is 'docker'; access it via the docker machine's IP, or add a mapping from the docker machine IP to 'docker' in the Windows hosts file.

Appendix

1. Common commands

(1) Find the ports a process occupies, by process name:

# Method 1: doesn't really work here
netstat -anp | grep hadoop	# list ports for hadoop-related processes
#[root@hadoop01 /] netstat -anp | grep hadoop
#[root@hadoop01 /]# 	              # nothing found at all

# Method 2: find the process ID first, then look up its ports by PID.
ps -ef | grep hadoop	# find the PID: 2419  
# [root@hadoop01 /]ps -ef | grep hadoop
# root  2419  2405 11 18:22 pts/1  00:02:14 /root/hadoop/bin/...
netstat -anp | grep 2419  # ports 16000, 16010
# [root@hadoop01 /]# netstat -anp | grep 2419
# tcp        0      0 192.168.0.11:16000    0.0.0.0:*               LISTEN      2419/java
# tcp        0      0 0.0.0.0:16010           0.0.0.0:*               LISTEN      2419/java

(2) Find the process occupying a given port:

netstat -anp | grep 3690   # -----> shows the owning process, e.g. svnserver