Setting Up Hadoop with Docker

Pull the image in Docker

docker pull ubuntu 

Set up a mount point

If you want to share files with the host, set up the mount point when you create the container; otherwise you will have to rebuild the whole environment later. A lesson learned the hard way.

docker run --name ContainerName -it -v SharePath:/data ubuntu /bin/bash  

ContainerName is the container's name, SharePath is the host directory to share, /data is where the shared files appear inside the container, and /bin/bash starts the container with an interactive shell.
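
For example, a concrete invocation might look like the following; the container name hadoop-master and host path /home/share are hypothetical placeholders, so substitute your own:

docker run --name hadoop-master -it -v /home/share:/data ubuntu /bin/bash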

Update the package sources

Run the following on the command line:

apt-get update  
apt-get install vim  
vim /etc/apt/sources.list  
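
It may be worth backing up the original list before editing it:

cp /etc/apt/sources.list /etc/apt/sources.list.bak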

Append the following at the end of the file (these entries are for Ubuntu 18.04, codename bionic; adjust the codename if your image is a different release):

deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse  
deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse  
deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse  
deb http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse  
deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse  
deb-src http://mirrors.aliyun.com/ubuntu/ bionic main restricted universe multiverse  
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted universe multiverse  
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted universe multiverse  
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-proposed main restricted universe multiverse  
deb-src http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse  

Then run:

apt-get update  
apt-get upgrade 

Install SSH

Run:

apt-get install openssh-server

At this point the SSH service is not yet running; start it:

service ssh start  

Run ssh localhost; if you cannot log in, edit the sshd_config file:

vim /etc/ssh/sshd_config

Find the following lines in the file:

#Authentication:  
#LoginGraceTime 120  
#PermitRootLogin without-password  
#StrictModes yes  

Change them to:

#Authentication:  
#LoginGraceTime 120  
#PermitRootLogin without-password
PermitRootLogin yes   
#StrictModes yes     
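
Restart the service so the configuration change takes effect:

service ssh restart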

Set up passwordless login

ssh-keygen -t rsa  

Press Enter through all the prompts, then add the public key to the authorized keys:

cd ~/.ssh
cat ./id_rsa.pub >> ./authorized_keys
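
If ssh localhost still asks for a password afterwards, overly permissive key file permissions are a common cause; tightening them usually helps:

chmod 600 ~/.ssh/authorized_keys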

You can now log in to the host without a password:

ssh localhost  

Install the JDK

Option 1:

Install OpenJDK 8. The environment variables may not be picked up automatically, and since OpenJDK is a trimmed-down build of the JDK, it can occasionally cause problems.

apt-get install openjdk-8-jdk  

Configure the Java environment variables

vim /etc/profile  

Append at the end of the file:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64  
export JRE_HOME=$JAVA_HOME/jre  
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH  
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH  
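
Reload the profile so the variables take effect in the current shell:

source /etc/profile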

Verify the installation:

java -version  
openjdk version "1.8.0_232"  
OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09)  
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

Option 2:

Install the Oracle JDK

Place the JDK archive in the shared folder mentioned at the start, extract it inside the container to /usr/lib/jdk, and configure the environment variables as in Option 1, adjusting the paths accordingly. Note that the /usr/lib/jdk directory does not exist by default and must be created by hand; see the sketch below.
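
A minimal sketch of those steps, assuming the archive is named jdk-8u231-linux-x64.tar.gz and sits in the shared /data directory (the file name is an assumption; it matches the jdk1.8.0_231 path used for JAVA_HOME later):

mkdir -p /usr/lib/jdk
tar -zxf /data/jdk-8u231-linux-x64.tar.gz -C /usr/lib/jdk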

Install Hadoop

First install links2, a text-mode browser that makes it easy to open web pages from the command line:

apt-get install links2

Open the Hadoop mirror site, download the version you need, and extract it:

links https://mirrors.cnnic.cn/apache/hadoop/common/
tar -zxf /hadoop-3.2.1.tar.gz -C /usr/local
cd /usr/local
mv ./hadoop-3.2.1/ ./hadoop

Verify the installation:

/usr/local/hadoop/bin/hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.2.1.jar

Configure Hadoop in pseudo-distributed mode

Pseudo-distributed mode requires editing two configuration files, core-site.xml and hdfs-site.xml.

cd /usr/local/hadoop/etc/hadoop  
vim core-site.xml 
Add the following inside the file:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
vim hdfs-site.xml

Add the following inside the file:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

Switch back to the Hadoop home directory:

cd /usr/local/hadoop

Format the NameNode:

./bin/hdfs namenode -format  

Formatting succeeded if you see output like the following:

    2020-01-02 12:30:07,439 INFO namenode.NameNode: SHUTDOWN_MSG:  
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at 08c0610a5e3d/172.17.0.3
    ************************************************************/
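
As a further check, after a successful format the name directory configured in hdfs-site.xml should contain a current/VERSION file:

ls /usr/local/hadoop/tmp/dfs/name/current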

Start the NameNode and DataNode daemons

vim ./sbin/start-dfs.sh  

Add the following (when running as root, Hadoop 3.x refuses to start the HDFS daemons unless these user variables are defined):

    #!/usr/bin/env bash
    HDFS_DATANODE_USER=root
    HADOOP_SECURE_DN_USER=hdfs
    HDFS_NAMENODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root

vim ./sbin/stop-dfs.sh

Add the same variables as in start-dfs.sh:

    #!/usr/bin/env bash
    HDFS_DATANODE_USER=root
    HADOOP_SECURE_DN_USER=hdfs
    HDFS_NAMENODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root

If you later run YARN, add YARN_RESOURCEMANAGER_USER=root and YARN_NODEMANAGER_USER=root to ./sbin/start-yarn.sh and ./sbin/stop-yarn.sh in the same way.

./sbin/start-dfs.sh

If it fails with a JAVA_HOME-not-found error, set JAVA_HOME manually in ./etc/hadoop/hadoop-env.sh (the path below matches the Oracle JDK from Option 2; for OpenJDK use /usr/lib/jvm/java-8-openjdk-amd64):

    # Technically, the only required environment variable is JAVA_HOME.
    # All others are optional.  However, the defaults are probably not
    # preferred.  Many sites configure these options outside of Hadoop,
    # such as in /etc/profile.d
    # The java implementation to use. By default, this environment
    # variable is REQUIRED on ALL platforms except OS X!
    # export JAVA_HOME=
    export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_231 

Startup succeeded if you see:

Starting namenodes on [localhost]  
Starting datanodes  
Starting secondary namenodes [08c0610a5e3d]  
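
You can also confirm the daemons with jps, which ships with the JDK; it should list NameNode, DataNode and SecondaryNameNode:

jps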

Add Hadoop to the environment variables

vim /etc/bash.bashrc

Append:

export HADOOP_HOME=/usr/local/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH

Then reload:

source /etc/bash.bashrc
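
Afterwards the hadoop command resolves from any directory:

hadoop version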

Using Hadoop

Create an input directory in HDFS and upload a file:

./bin/hdfs dfs -mkdir -p /user/hadoop  
./bin/hdfs dfs -mkdir /user/hadoop/input
./bin/hdfs dfs -put /home/BigData/mag_papers_0/mag_papers_0.txt /user/hadoop/input  
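
Check that the file actually landed in HDFS:

./bin/hdfs dfs -ls /user/hadoop/input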

Run the example job:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount /user/hadoop/input output

View the output:

./bin/hdfs dfs -cat output/*
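
To re-run the job you must first delete the output directory, since Hadoop refuses to overwrite an existing one:

./bin/hdfs dfs -rm -r output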