Hadoop Environment Configuration

1. Required Software
  • The required environment includes Java and ssh; sshd must be kept running so that the Hadoop scripts can manage the remote Hadoop daemons

Additional software requirements on Windows

  • Cygwin, to provide shell support beyond the software listed above.
2. Installing the Software
sudo apt-get install ssh
sudo apt-get install rsync
  • Since Hadoop is written in Java, a JDK must also be installed (a hedged install sketch follows below)
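
The later steps assume a JDK unpacked at /usr/local/jdk1.8. A minimal sketch of that manual install (the tarball name and JDK build here are assumptions, not from the original):

# the exact tarball name depends on the JDK 8 build you downloaded
sudo tar -xvf jdk-8u281-linux-x64.tar.gz -C /usr/local
cd /usr/local
sudo mv jdk1.8.0_281 jdk1.8   # match the JAVA_HOME used throughout this guide
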
3. Download and Install

Reference: https://www.jianshu.com/p/cdae5bab030f

  • To obtain a Hadoop distribution, download a recent stable release from one of the Apache mirrors
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/hadoop-3.3.0.tar.gz
tar -xvf hadoop-3.3.0.tar.gz -C /usr/local
cd /usr/local
mv hadoop-3.3.0 hadoop
  • Configure the Hadoop environment variables
vim /etc/profile

Building on the jdk1.8 installed earlier, append the following at the end of the file:

export JAVA_HOME=/usr/local/jdk1.8
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source /etc/profile

Test whether the installation succeeded:

hadoop version
root@iZuf63fv674pbylkkxs48qZ:/usr/local# hadoop version
Hadoop 3.3.0
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r aa96f1871bfd858f9bac59cf2a81ec470da649af
Compiled by brahma on 2020-07-06T18:44Z
Compiled with protoc 3.7.1
From source with checksum 5dc29b802d6ccd77b262ef9d04d19c4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0.jar
root@iZuf63fv674pbylkkxs48qZ:/usr/local# 

4. Modify the Configuration Files
sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following:

<configuration>
        <property>
             <name>hadoop.tmp.dir</name>
             <value>file:/usr/local/hadoop/tmp</value>
             <description>Abase for other temporary directories.</description>
        </property>
        <property>
             <name>fs.defaultFS</name>
             <value>hdfs://localhost:9000</value>
        </property>
</configuration>

Explanation of the configuration above:

<!-- Communication address of the HDFS master (the NameNode) -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
<!-- Storage path for the files Hadoop generates at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
</property>
  • In the same directory, edit hdfs-site.xml and add:

<configuration>
	<property>
		<name>dfs.data.dir</name>
		<value>/usr/local/hadoop/hdfs/data</value>
		<description>Physical storage location of data blocks on the DataNode</description>
	</property>

	<!-- Set the HDFS replication factor -->
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>

	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
</configuration>
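
The directories referenced in these files (hadoop.tmp.dir and dfs.data.dir) may not exist yet; creating them up front avoids startup failures. A small sketch, using the paths from the configs above:

sudo mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/hdfs/data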

  • In hadoop-env.sh, update JAVA_HOME: comment out export JAVA_HOME=${JAVA_HOME}

    and replace it with

    export JAVA_HOME=/usr/local/jdk1.8
    
5. Test and Launch

All of the operations below are performed in the Hadoop installation directory:

/usr/local/hadoop
  • Format the NameNode

    /usr/local/hadoop# ./bin/hdfs namenode -format
    
    
    
  • Start HDFS with ./sbin/start-dfs.sh to launch the NameNode and DataNode daemons

  • If it fails with errors like the following

root@iZuf63fv674pbylkkxs48qZ:/usr/local/hadoop# ./sbin/start-dfs.sh
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [iZuf63fv674pbylkkxs48qZ]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

Solution

In the hadoop/sbin directory,

add at the top of start-dfs.sh and stop-dfs.sh:

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Add at the top of start-yarn.sh and stop-yarn.sh:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Re-run ./start-all.sh from the sbin directory and everything should start (a quick jps check is sketched below).
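
To confirm the daemons actually came up, jps (shipped with the JDK) lists the running Java processes; on this single-node setup you would expect entries such as NameNode, DataNode, and SecondaryNameNode:

jps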

  • While restarting, the following problem came up
root@iZuf63fv674pbylkkxs48qZ:/usr/local/hadoop/sbin# sudo ./start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [localhost]
localhost: root@localhost: Permission denied (publickey,password).
Starting datanodes
localhost: root@localhost: Permission denied (publickey,password).
Starting secondary namenodes [iZuf63fv674pbylkkxs48qZ]
iZuf63fv674pbylkkxs48qZ: root@izuf63fv674pbylkkxs48qz: Permission denied (publickey,password).
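
The Permission denied (publickey,password) lines mean the start scripts cannot ssh into localhost as root without a password. A minimal sketch of fixing that for root (the key paths are the OpenSSH defaults; run as the same root user that starts Hadoop):

ssh-keygen -t rsa                                # accept the defaults, empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost                                    # should now log in without a password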

5.1 Building a Hadoop Cluster on Virtual Machines
  • During installation, configure a static IP for each host. My own tutorial on configuring a static IP is here:

http://note.youdao.com/s/dDpr8UkW

  • Change the Ubuntu download sources
sudo vim /etc/apt/sources.list

Replace http://archive.ubuntu.com/ubuntu/ in that file with http://mirrors.aliyun.com/ubuntu/

  • First install a single Ubuntu system to serve as master; once its static IP, JDK, and Hadoop are set up, clone master with the same configuration to get node1 and node2. The hostnames mentioned here are set with
sudo vim /etc/hostname
  • Edit the hosts file
sudo vim /etc/hosts

Append the following at the end of the file:

192.168.8.6 master
192.168.8.7 node1
192.168.8.8 node2
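
Before going further, it is worth checking from master that the names resolve and the nodes are reachable, for example:

ping -c 1 node1
ping -c 1 node2
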
  • Configure passwordless ssh login

    • Enter cd ~ to return to the home directory
    • Run ssh-keygen, pressing Enter at every prompt, which gives output like the following
    helloful@master:~$ cd ~
    helloful@master:~$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/helloful/.ssh/id_rsa): 
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /home/helloful/.ssh/id_rsa
    Your public key has been saved in /home/helloful/.ssh/id_rsa.pub
    The key fingerprint is:
    SHA256:tGRzFOZansyT58ahQZyWOaIxESIVbPSkzYowY7AQyYM helloful@master
    The key's randomart image is:
    +---[RSA 3072]----+
    |*o.+=.+.  +.     |
    |E+ .oB . = +     |
    |=.... * * %      |
    |.+ . . B & +     |
    |  . . . S O o    |
    |           B .   |
    |          . +    |
    |           .     |
    |                 |
    +----[SHA256]-----+
    
    
    • Enter cd .ssh

    • Enter cat ./id_rsa.pub >> authorized_keys

      helloful@master:~$ cd .ssh
      helloful@master:~/.ssh$ cat ./id_rsa.pub >> authorized_keys
      helloful@master:~/.ssh$ cat authorized_keys
      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDffnfOM4rgcxtm8lkBzPojolSX1zz26r5+hOd0Iy5lgS7atDZgZqQ7JITShpwaENNJ7N8qumsjwnyulBsP5DSRGa0oXzJTafO+Drj47p5V+bI4Nejl+SjXrB6X5RIFD8VmuIrXNMtRx4bQQ4oZQyAF/qSa4wcnsBz8gMPuY3JAnArlsm9MCHfhvTg/zeVTbJjjbyc+8tGXVsa0AVmL5lcrxOcBPc0bP53/agwzPMHuBtlTbvpX2X57XxvKFov8WngSbMZYRWALsW9EvvBZg1oyPVEXo16WK80hWRlZKWiQANJgdWF3sFIiac22ml12NoH7KzmmDEDigd0pqAPaBOlcLvCzWigOJf22hmW8UDTP68kvjR8M4JPDjkwDC5UjO4mzRQUEukeXqGMOxM7drHlyqKpoVE1/zi9rKFSroCnd59a5HIv+0pobMkjwQATh8ZUBEGeEK7yXNBnQTvxFvA8qmJZ62WzGguaty4AWDDQ9HMTkA1twvmlCqBksFSQOpFM= helloful@master
      helloful@master:~/.ssh$ 
      
    • Run the key-generation steps above on every host, then copy node1's and node2's public keys into master's authorized_keys file; likewise, copy master's and node2's keys to node1, and so on for node2 (see the ssh-copy-id sketch below).
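
    Instead of pasting keys around by hand, ssh-copy-id (part of OpenSSH) appends the local public key to a remote user's authorized_keys. A sketch to run on each host, skipping itself (the helloful account comes from the transcript above):

    ssh-copy-id helloful@master
    ssh-copy-id helloful@node1
    ssh-copy-id helloful@node2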

  • Modify the Hadoop configuration files; the files needing changes are

    • cd ~
      cd /usr/local/hadoop/etc/hadoop
      
    • sudo vim core-site.xml
      

      Add the following:

      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      
      <!-- Put site-specific property overrides in this file. -->
      
      <configuration>
         <property>
              <name>hadoop.tmp.dir</name>
              <value>/usr/local/hadoop/tmp</value>
              <description>Abase for other temporary directories.</description>
         </property>
         <property>
              <name>fs.default.name</name>
              <value>hdfs://master:9000</value>
         </property>
      </configuration>
      
      
      
    • sudo vim hadoop-env.sh
      

      Add one line, as follows, with the path of your own JDK installation:

      export JAVA_HOME=/usr/local/jdk1.8
      
    • In mapred-site.xml, change the addresses to your own

      sudo vim mapred-site.xml
      
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      
      <!-- Put site-specific property overrides in this file. -->
      
      <configuration>
         <property>
              <name>mapred.job.tracker</name>
              <value>master:49001</value>
         </property>
         <property>
              <name>mapred.local.dir</name>
              <value>/usr/local/hadoop/var</value>
         </property>
         <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
         </property>
      </configuration>
      
      
      
    • Edit the workers file

      On master, set it to:

      node1
      node2
      

      On node1, set it to:

      master
      node2
      

      On node2, set it to:

      master
      node1
      
    • Edit yarn-site.xml

    <?xml version="1.0"?>
    <configuration>
    <!-- Site specific YARN configuration properties -->
       <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>master</value>
       </property>
       <property>
            <description>The address of the applications manager interface in the RM.</description>
            <name>yarn.resourcemanager.address</name>
            <value>${yarn.resourcemanager.hostname}:8032</value>
       </property>
       <property>
            <description>The address of the scheduler interface.</description>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>${yarn.resourcemanager.hostname}:8030</value>
       </property>
       <property>
            <description>The http address of the RM web application.</description>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>${yarn.resourcemanager.hostname}:8088</value>
       </property>
       <property>
            <description>The https address of the RM web application.</description>
            <name>yarn.resourcemanager.webapp.https.address</name>
            <value>${yarn.resourcemanager.hostname}:8090</value>
       </property>
       <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>${yarn.resourcemanager.hostname}:8031</value>
       </property>
       <property>
            <description>The address of the RM admin interface.</description>
            <name>yarn.resourcemanager.admin.address</name>
            <value>${yarn.resourcemanager.hostname}:8033</value>
       </property>
       <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
       </property>
       <property>
            <name>yarn.scheduler.maximum-allocation-mb</name>
            <value>1024</value>
            <description>Maximum memory, in MB, for a single container request; the default is 8192 MB.</description>
       </property>
       <property>
            <name>yarn.nodemanager.vmem-pmem-ratio</name>
            <value>2.1</value>
       </property>
       <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>1024</value>
       </property>
       <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
       </property>
    </configuration>
    
    
    
  • Start the Hadoop scripts

    Format script:

    cd /usr/local/hadoop/bin
    sudo ./hadoop namenode -format
    
    

    Launch script:

    • Take special care here: the ssh keys above were generated under a normal user account, so do not add sudo when running this
    cd /usr/local/hadoop/sbin
    ./start-all.sh
    
    • Problems encountered while running this

      • If it complains that files cannot be created, run the following on every host
      sudo chmod 777 -R /usr/local/hadoop/
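
      chmod 777 works but leaves the whole tree world-writable; a narrower alternative (my suggestion, not from the original) is to hand the tree to the user that runs Hadoop:

      sudo chown -R $USER:$USER /usr/local/hadoop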
      
    • Monitor the system via the web UI

    • To view DFS:

    On the Windows 10 host, open the master VM's IP on port 9870.
    For example, where 192.168.8.6 is master's IP:
    

192.168.8.6:9870


    • To view YARN:

192.168.8.6:8088





5.2 Installing Hive

  • Install mysql and log in to grant privileges for hive

grant all on *.* to 'root'@'%' identified by '12345678';


  • Download and install Hive from:

  https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/

  • sudo tar -xvf hive.tar.gz -C /usr/local

    (Rename the extracted directory to match the HIVE_HOME set below; the exact name depends on the archive, e.g. sudo mv apache-hive-3.1.2-bin hive.)
    

    Edit the following file:

    sudo vim ~/.bashrc
    

    Add the following:

    export HIVE_HOME=/usr/local/hive
    
    export PATH=$PATH:$HIVE_HOME/bin
    

    After saving and exiting, run source ~/.bashrc so the configuration takes effect immediately

  • Modify the files in the /usr/local/hive/conf folder, as follows

    • Rename hive-default.xml.template to hive-default.xml
    • Create a new file with touch hive-site.xml and paste the following configuration into it:
    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://47.117.137.112:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
        <description>
          JDBC connect string for a JDBC metastore.
          To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
          For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
      </property>
    
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore; com.mysql.jdbc.Driver is deprecated</description>
      </property>
    
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Username to use against metastore database</description>
      </property>
    
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>12345678</value>
        <description>password to use against metastore database</description>
      </property>
    
    </configuration>
    
    

The JDBC driver version the configuration above corresponds to is 5.x.

  • After installing mysql, create the hive database
create database hive;

After the database is created, configure its permissions so that mysql allows hive to connect.

MySQL 8 no longer lets GRANT implicitly create an account with a password; the account must be created first, then given privileges (a consolidated sketch follows the commands below):

use mysql;
update user set host='%' where user='hive';
grant all privileges on *.* to 'hive'@'%';
alter user hive identified with mysql_native_password by '12345678';
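
Since the UPDATE above assumes the hive account already exists, a consolidated sketch of the whole sequence on MySQL 8 might look like this (run from a shell; CREATE USER takes the place of editing the user table directly):

mysql -u root -p <<'SQL'
CREATE USER IF NOT EXISTS 'hive'@'%' IDENTIFIED BY '12345678';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%';
ALTER USER 'hive'@'%' IDENTIFIED WITH mysql_native_password BY '12345678';
FLUSH PRIVILEGES;
SQL
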
5.3 Using Hive

Create the paths Hive needs on HDFS:

hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse

Change the permissions on the paths above so that the group has write access:

hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
  • Start hive
  • Enter hive/bin and run hive; the following error appears
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path

Solution:

cd hive/conf
cp hive-env.sh.template hive-env.sh
sudo vim hive-env.sh
(add export HADOOP_HOME=/usr/local/hadoop)
source hive-env.sh
  • Running hive errors out, complaining that the metastore client cannot be instantiated
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

The fix: run the following statement to initialize the schema:

schematool -dbType mysql -initSchema

Initialization here follows the configuration in hive-site.xml, so we do not need to create the database ourselves, but user privileges still have to be granted:

grant all on *.* to hive@'%' identified by '12345678';

During initialization, another error appears:

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver

which indicates the JDBC driver is missing.

The suggested fix is to copy the driver from hive/jdbc into hive/lib, but that did not work, so I downloaded mysql-connector-java-5.XX.XX.jar myself and put it under hive/lib (a sketch follows).
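
As a sketch, with 5.1.49 standing in for whichever 5.x Connector/J release is actually used (the version in the original is elided):

# the version number is only an example; use the 5.x jar you downloaded
cp mysql-connector-java-5.1.49.jar /usr/local/hive/lib/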

6. Further Hadoop Study

Reference: https://www.zhihu.com/question/333417513
