Installing Hadoop, Hive, Spark, and HBase with Docker

0: Network and host planning
docker network create --subnet=172.18.0.0/16 mynetwork
Host plan:
"172.18.0.30 master" 
"172.18.0.31 slave1" 
"172.18.0.32 slave2" 
   
1: Install the base environment


docker pull ubuntu:16.04
docker run -it  ubuntu:16.04 /bin/bash
Inside the container, use apt-get to install the SSH server, MySQL, and OpenJDK 8 (see the sketch below).
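
A minimal installation sketch, assuming the default Ubuntu 16.04 package sources:

apt-get update
# SSH for cluster login, MySQL for the Hive metastore, OpenJDK 8 for everything else
apt-get install -y openssh-server openssh-client mysql-server openjdk-8-jdk
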
Determine JAVA_HOME:
root@master:/# ls -lrt /usr/bin/java
lrwxrwxrwx 1 root root 22 Jun 23 08:28 /usr/bin/java -> /etc/alternatives/java
root@master:/# ls -lrt /etc/alternatives/java
lrwxrwxrwx 1 root root 46 Jun 23 08:28 /etc/alternatives/java -> /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
root@master:/# 


JAVA_HOME is therefore /usr/lib/jvm/java-8-openjdk-amd64.


2: Download the big data packages
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
wget http://archive.apache.org/dist/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
wget http://archive.apache.org/dist/hbase/1.2.4/hbase-1.2.4-bin.tar.gz
wget  http://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
wget http://downloads.lightbend.com/scala/2.12.1/scala-2.12.1.tgz


Extract them into the /opt/tools directory (see the sketch below).
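
A minimal extraction sketch, assuming the tarballs were downloaded into the current directory:

mkdir -p /opt/tools
tar -zxf hadoop-2.7.2.tar.gz -C /opt/tools
tar -zxf apache-hive-2.1.0-bin.tar.gz -C /opt/tools
tar -zxf hbase-1.2.4-bin.tar.gz -C /opt/tools
tar -zxf spark-2.1.0-bin-hadoop2.7.tgz -C /opt/tools
tar -zxf scala-2.12.1.tgz -C /opt/tools
cd /opt/tools
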
Create five symlinks:
ln -s hbase-1.2.4 hbase
ln -s hadoop-2.7.2 hadoop
ln -s apache-hive-2.1.0-bin hive
ln -s spark-2.1.0-bin-hadoop2.7 spark
ln -s scala-2.12.1 scala
  
Edit the environment variables file and append the following at the end:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_PREFIX=/opt/tools/hadoop
export HADOOP_COMMON_HOME=/opt/tools/hadoop
export HADOOP_HDFS_HOME=/opt/tools/hadoop
export HADOOP_MAPRED_HOME=/opt/tools/hadoop
export HADOOP_YARN_HOME=/opt/tools/hadoop
export HADOOP_CONF_DIR=/opt/tools/hadoop/etc/hadoop
export YARN_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export SCALA_HOME=/opt/tools/scala
export PATH=${SCALA_HOME}/bin:$PATH
export SPARK_HOME=/opt/tools/spark
export PATH="$SPARK_HOME/bin:$PATH"


export HIVE_HOME=/opt/tools/hive
export PATH=$PATH:$HIVE_HOME/bin


export HBASE_HOME=/opt/tools/hbase
export PATH=$PATH:$HBASE_HOME/bin


/etc/init.d/ssh start
/etc/init.d/mysql start
echo "172.18.0.30 master" >> /etc/hosts
echo "172.18.0.31 slave1" >> /etc/hosts
echo "172.18.0.32 slave2" >> /etc/hosts  
  
3: Build the base big data image
docker commit 8eb631a1a734 wx-bigdata-base
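
Here 8eb631a1a734 is the ID of the container prepared in step 1; if it is not at hand, it can be listed with:

docker ps -a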


4: Run three containers
docker run -i -t --name master -h master --net mynetwork  --ip 172.18.0.30    wx-bigdata-base   /bin/bash
docker run -i -t --name slave1 -h slave1 --net mynetwork  --ip 172.18.0.31    wx-bigdata-base   /bin/bash
docker run -i -t --name slave2 -h slave2 --net mynetwork  --ip 172.18.0.32    wx-bigdata-base   /bin/bash
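
If a container's shell is exited, the container stops; to get back into one later (using master as the example), something like the following works:

docker start master
docker exec -it master /bin/bash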


Set up passwordless SSH login among the three machines.

Method: run ssh-keygen -t rsa and press Enter through every prompt; this generates /root/.ssh/id_rsa.pub.

Do the same on all three machines.

Run cat /root/.ssh/id_rsa.pub on each of the three machines, merge the three outputs, and write the result to /root/.ssh/authorized_keys on every machine (see the sketch below).
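
A minimal sketch of the whole exchange, assuming it is driven from master and the root password is still accepted for the first connections:

# on each of master, slave1, slave2
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa

# on master: merge the three public keys into one authorized_keys
cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys
ssh slave1 cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh slave2 cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys

# copy the merged file back to the slaves
scp /root/.ssh/authorized_keys slave1:/root/.ssh/
scp /root/.ssh/authorized_keys slave2:/root/.ssh/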

The final authorized_keys file on every machine is:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAyxS5rhm5etpm1eOSdBfaVKmRQPMI2TgY3PsUMWe1qo1NQAdNkpObSVN4Gq4HHso7SXLccd5Crb64fGrqYX9+jBVk3uUSQoKn8eoFtmnBU5Zpq7mRvGkctsubMa/EOh7DsjUWplo//p9+txvB45cvjwr8GSeBVPoTSyzRggleuERVVhRzDSXdg/z892JNoHukhGUrhOhtBnVemIV0wUlEoWFiuLJmJBo6Gj1yV7xJ5LDtWJ41XgkosKlKbEp8bc+w0e6NYN5k/DzaDtwfVc6utGE/7/mFs4gpWGzY0wRqP89QRnmlOYGm32v1I8+oXNqAmxfPKiWQdZ89jgZUS5RB root@master
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbHbwIO5zzzNBJX25rbIdUI0+fqA3YJIhcgqbY2cQxSfa1dK20Uy/JD3ZTlffajEJ20qrs3yDpzfRHP8E+0dyPET3CV7I7onzCy8eBOQSaYBqtWXiEvwzE8iOD4aJJ4ZA3G8dhE8jlSFphO62PoqblEpIfWgFS1WkLEmNMrqgyEUCwiwzxySs6StBQF1vQ4TT2rcG5+qXWOuKjeOjscekstA2DrYNBY8zOEP/kNF4tUPf7mf2uiMJCHg+keXP9b0aCDMvVqakMx4PJW36NYISQiKf6yvSt1RFTGY+SYMG2d4Ysx58iNTrk7ber2qwDBghgtcJhr2VvZbLC9xv2w4WN root@slave1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDg8FVvLhkeT1/xMA/fTbzk9k0cf+5AX514z9Pw8A78ofWDir65eMJBEqLTSX87ynTvtg2BEN4Ht+SlS7ZUrzW3wbUPZw9T045GbiFzSRdwzCAuyXUWAFa+pY3Pi4MJhL1zjwkfX8WzRlUM+a5PSJ+B3i/JnoKMUin0HmjQ1XxIwMeG66b7pxXRAs/9SVY7k+f0zACJzTBN3eD9tKEpujrJmjlOYLg4M17NssGNK9vE5nAkCCv86GCRixyS8FNAxh0a8GsezUjimT1XRWokw9FSZdDuAamVCREZ3j6LuveCx58XzoM8UQ6u4KtObeWOPbJCotxyKR5SdFEgsSjrOJYP root@slave2 


5: Install and run
On master, edit the Hadoop, HBase, Spark, and Hive configuration files (see section 6 below).
Copy the Hadoop, HBase, Spark, and Hive configuration files to the other two hosts with scp (see the sketch below).
Format the HDFS filesystem (hdfs namenode -format).
Start Hadoop, HBase, and Spark on master, and check that the daemons also come up on the other two machines.

Install Hive on master: schematool -dbType mysql -initSchema
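
A minimal sketch of the scp distribution step mentioned above, assuming identical /opt/tools paths on all three nodes:

for host in slave1 slave2; do
  scp -r /opt/tools/hadoop/etc/hadoop/* $host:/opt/tools/hadoop/etc/hadoop/
  scp -r /opt/tools/hbase/conf/*        $host:/opt/tools/hbase/conf/
  scp -r /opt/tools/spark/conf/*        $host:/opt/tools/spark/conf/
  scp -r /opt/tools/hive/conf/*         $host:/opt/tools/hive/conf/
done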

Hadoop start:

/opt/tools/hadoop/sbin# ./start-all.sh  

Stop:

/opt/tools/hadoop/sbin# ./stop-all.sh  

HBase start:

/opt/tools/hbase/bin/start-hbase.sh

Stop:

/opt/tools/hbase/bin/stop-hbase.sh

Spark start:

/opt/tools/spark/sbin/start-all.sh

Stop:

/opt/tools/spark/sbin/stop-all.sh
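
A quick way to check each node is jps; roughly, with the configuration from section 6, the expected daemons are:

jps
# master: NameNode, SecondaryNameNode, ResourceManager, HMaster, HQuorumPeer, Master, Worker
# slave1/slave2: DataNode, NodeManager, HRegionServer, HQuorumPeer, Worker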


6: Configuration files

6.1 Hadoop

Directory: /opt/tools/hadoop/etc/hadoop

core-site.xml:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/data/hadoop/tmp</value>
  </property>

</configuration>

hdfs-site.xml:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/hadoop/data</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/hadoop/name</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>

</configuration>
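
The directories referenced above (/data/hadoop/tmp, name, data) live on the local filesystem of each container; creating them up front on every node is a small hedged convenience step before formatting:

mkdir -p /data/hadoop/tmp /data/hadoop/name /data/hadoop/data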

mapred-site.xml:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

</configuration>

yarn-site.xml:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at


    http://www.apache.org/licenses/LICENSE-2.0


  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>


<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>


</configuration>


slaves file:

slave1

slave2

hadoop-env.sh:

Set JAVA_HOME: export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

6.2 HBase

Directory: /opt/tools/hbase/conf

hbase-env.sh:

Set JAVA_HOME: export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

hbase-site.xml:

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:9000/hbase_db</value>


      </property>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name> 
        <value>master,slave1,slave2</value> 
      </property>


       <property>
          <name>hbase.zookeeper.property.dataDir</name>
          <value>/opt/tools/hbase/zookeeper</value>
       </property>

</configuration>
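
No standalone ZooKeeper is installed in this setup, so HBase is expected to manage its own ZooKeeper quorum on the three hosts listed in hbase.zookeeper.quorum. That is HBase's default behaviour, controlled by this line in hbase-env.sh (shown only as a reminder; it is true by default):

export HBASE_MANAGES_ZK=true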

regionservers file:

slave1

slave2

6.3 Spark

Directory: /opt/tools/spark/conf

spark-env.sh:

Add at the top of the file:

export SCALA_HOME=/opt/tools/scala
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/opt/tools/hadoop/etc/hadoop


slaves file:

master
slave1

slave2
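
After start-all.sh, the standalone master listens on spark://master:7077 by default; a quick hedged smoke test from master (any small job would do):

/opt/tools/spark/bin/spark-shell --master spark://master:7077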

6.4 Hive

Directory: /opt/tools/hive/conf

hive-env.sh:

Add at the top of the file:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/opt/tools/hadoop
export HIVE_HOME=/opt/tools/hive

export HIVE_CONF_DIR=/opt/tools/hive/conf 

hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at


       http://www.apache.org/licenses/LICENSE-2.0


   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--><configuration>
        <property>
            <name>hive.exec.scratchdir</name>
            <value>/opt/tools/hive/tmp</value>
        </property>
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/opt/tools/hive/warehouse</value>
        </property>
        <property>
            <name>hive.querylog.location</name>
            <value>/opt/tools/hive/log</value>
        </property>


        <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>

</configuration>
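
One hedged reminder before running schematool: Hive does not ship the MySQL JDBC driver, so the connector jar has to be placed in Hive's lib directory first; the exact jar version below is only an example:

cp mysql-connector-java-5.1.40-bin.jar /opt/tools/hive/lib/
schematool -dbType mysql -initSchema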

