Hive/HBase/Sqoop Installation Guide

 

HIVE INSTALL

1. Download the installation package: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/
2. Upload it to the target directory on the Linux server and extract it:

mkdir hive 
mv apache-hive-2.3.3-bin.tar.gz hive
tar -zxvf apache-hive-2.3.3-bin.tar.gz
mv apache-hive-2.3.3-bin apache-hive-2.3.3

### Installation directory: /app/hive/apache-hive-2.3.3 


3. Configure environment variables
sudo vi /etc/profile
Add the following:

export HIVE_HOME=/app/hive/apache-hive-2.3.3
export PATH=$PATH:$HIVE_HOME/bin

:wq   # save and exit
source /etc/profile   # reload so the new variables take effect


4. Edit the Hive configuration files:
hive-env.sh (modify existing entries; add any that are missing):

cd /app/hive/apache-hive-2.3.3/conf
cp hive-env.sh.template hive-env.sh
### Add the following to the file -- uncomment the lines and change the paths to your own directories
export HADOOP_HEAPSIZE=1024
export HADOOP_HOME=/app/hadoop/hadoop-2.7.7 # Hadoop installation directory
export HIVE_CONF_DIR=/app/hive/apache-hive-2.3.3/conf
export HIVE_HOME=/app/hive/apache-hive-2.3.3
export HIVE_AUX_JARS_PATH=/app/hive/apache-hive-2.3.3/lib
export JAVA_HOME=/app/lib/jdk

  

Create the HDFS directories:

cd /app/hive/apache-hive-2.3.3
mkdir hive_site_dir
cd hive_site_dir
hdfs dfs -mkdir -p warehouse   # requires Hadoop to be installed and HDFS running
hdfs dfs -mkdir -p tmp
hdfs dfs -mkdir -p log
hdfs dfs -chmod -R 777 warehouse
hdfs dfs -chmod -R 777 tmp
hdfs dfs -chmod -R 777 log
Create a local temp directory:
cd /app/hive/apache-hive-2.3.3
mkdir tmp

  

hive-site.xml (modify the existing template):
cp hive-default.xml.template  hive-site.xml
vi hive-site.xml
>>Configure the metastore database settings: ConnectionURL / ConnectionUserName / ConnectionPassword / ConnectionDriverName

<!--mysql database connection setting -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://10.28.85.149:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>szprd</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>szprd</value>
</property>

  

>>Configure the HDFS and scratch directories

<property>
<name>hive.exec.scratchdir</name>
<!--<value>/tmp/hive</value>-->
<value>/app/hive/apache-hive-2.3.3/hive_site_dir/tmp</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>

<property>
<name>hive.metastore.warehouse.dir</name>
<value>/app/hive/apache-hive-2.3.3/hive_site_dir/warehouse</value>
</property>

<property>
<name>hive.exec.local.scratchdir</name>
<!--<value>${system:java.io.tmpdir}/${system:user.name}</value> -->
<value>/app/hive/apache-hive-2.3.3/tmp/${system:user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>

<property>
<name>hive.downloaded.resources.dir</name>
<!--<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>-->
<value>/app/hive/apache-hive-2.3.3/tmp/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>

<property>
<name>hive.querylog.location</name>
<!--<value>${system:java.io.tmpdir}/${system:user.name}</value>-->
<value>/app/hive/apache-hive-2.3.3/hive_site_dir/log/${system:user.name}</value>
<description>Location of Hive run time structured log file</description>
</property>


<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
proper metastore schema migration. (Default)
False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
</description>
</property>

After editing the configuration file, save and exit with :wq.

5. Download a matching MySQL JDBC driver and copy it into the lib directory under the Hive installation:
https://dev.mysql.com/downloads/connector/j/

6. Initialize the metastore database (run this before starting Hive for the first time; if it fails, double-check the database connection settings):

cd /app/hive/apache-hive-2.3.3/bin
./schematool -initSchema -dbType mysql

  

7. Start Hive
hive     # once the environment variables in /etc/profile are set, this can be run from any directory


8. To start Hive with real-time log output (run from the bin directory of the Hive installation):

./hive -hiveconf hive.root.logger=DEBUG,console
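
As a quick smoke test (a minimal sketch; the table name demo_src is just an illustrative placeholder), a few statements run with hive -e confirm that the MySQL metastore and the HDFS warehouse directory are wired up correctly:

hive -e "CREATE TABLE IF NOT EXISTS demo_src (id INT, name STRING);"   # creates a table in the default database
hive -e "SHOW TABLES;"                                                 # should list demo_src
hive -e "DROP TABLE demo_src;"                                         # clean up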

 

 

 


HBASE INSTALL


1. Download the HBase package: http://hbase.apache.org/downloads.html


2. Extract: tar -zxvf hbase-1.2.6.1-bin.tar.gz


3. Configure environment variables (append at the end of the file):
vi /etc/profile

#HBase Setting
export HBASE_HOME=/app/hbase/hbase-1.2.6.1
export PATH=$PATH:$HBASE_HOME/bin

  

4. Edit hbase-env.sh:

export HBASE_MANAGES_ZK=false   # false = use an external ZooKeeper instead of the bundled one
export HBASE_PID_DIR=/app/hadoop/hadoop-2.7.7/pids   # create this directory first if it does not exist
export JAVA_HOME=/app/lib/jdk   # JDK installation directory

 

Edit hbase-site.xml:
Add the following inside the configuration element:

<property>
<name>hbase.rootdir</name>
<value>hdfs://192.168.1.202:9000/hbase</value>
</property>


<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/vc/dev/MQ/ZK/zookeeper-3.4.12</value>
</property>


<property>
<name>zookeeper.znode.parent</name>
<value>/hbase</value>
</property>


<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>

<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
<description>
Controls whether HBase will check for stream capabilities (hflush/hsync). Disable this if you intend to run on LocalFileSystem, denoted by a rootdir with the 'file://' scheme, but be mindful of the NOTE below.
WARNING: Setting this to false blinds you to potential data loss and inconsistent system state in the event of process and/or node failures. If HBase is complaining of an inability to use hsync or hflush it's most likely not a false positive.
</description>
</property>

  

5. Start ZooKeeper
From the bin directory of the ZooKeeper installation, run: ./zkServer.sh start
Then start the client: ./zkCli.sh
Once connected, run: create /hbase hbase   (creates the parent znode referenced by zookeeper.znode.parent)

6. Start HBase
From the HBase bin directory: ./start-hbase.sh
./hbase shell  # once the shell starts you can run HBase commands
list  # lists the tables; no error means HBase is working
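
As a quick sanity check from the shell (a minimal sketch; the table name demo_table, the column family cf and the values are illustrative placeholders):

create 'demo_table', 'cf'                      # create a table with one column family
put 'demo_table', 'row1', 'cf:msg', 'hello'    # write one cell
scan 'demo_table'                              # should show row1
get 'demo_table', 'row1'                       # read the row back
disable 'demo_table'
drop 'demo_table'                              # clean up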

7. Access the HBase web UI: http://10.28.85.149:16010/master-status   # use the server's own IP; the HBase Master UI port is 16010

 

 


SQOOP INSTALL
1. Download the package: https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/


2. Extract: tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

Rename the directory: mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7_hadoop-2.6.0


3. Configure environment variables in /etc/profile:

#Sqoop Setting
export SQOOP_HOME=/app/sqoop/sqoop-1.4.7_hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin

  

4. Copy the MySQL JDBC driver into the lib directory of the Sqoop installation:

https://dev.mysql.com/downloads/connector/j/

 

5. Edit the configuration file in the conf directory of the Sqoop installation:
vi sqoop-env.sh

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.7.7

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/app/hadoop/hadoop-2.7.7

#set the path to where bin/hbase is available
export HBASE_HOME=/app/hbase/hbase-1.2.6.1

#Set the path to where bin/hive is available
export HIVE_HOME=/app/hive/apache-hive-2.3.3

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/app/zookeeper/zookeeper-3.4.12

  

6. Verify with:

sqoop help  # lists the available Sqoop commands

sqoop version  # shows the Sqoop version
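
With the environment above in place, a first import can be tried. The sketch below pulls one MySQL table into HDFS; the host, database, credentials and table name (demo_db / demo_table) are illustrative placeholders, not values from this setup:

sqoop import \
  --connect jdbc:mysql://<mysql-host>:3306/demo_db \
  --username <user> \
  --password <password> \
  --table demo_table \
  --target-dir /user/sqoop/demo_table \
  -m 1      # single map task; fine for a small test table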

 

 

 PS:

To stop HBase, use stop-hbase.sh; if you hit a pid-related error, see this post: https://blog.csdn.net/xiao_jun_0820/article/details/35222699

Hadoop installation guide: http://note.youdao.com/noteshare?id=0cae2da671de0f7175376abb8e705406

ZooKeeper installation guide: http://note.youdao.com/noteshare?id=33e37b0967da40660920f755ba2c03f0

 

 

 

 

# Hadoop pseudo-distributed mode installation
# Prerequisite: the JDK is already installed

# Download Hadoop 2.7.7
```
cd /home/vc/dev/hadoop

wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz 
```
# Extract

```
 tar -zxvf hadoop-2.7.7.tar.gz 
```

## Configure the Hadoop environment variables by appending the following to /etc/profile

```
# hadoop home setting 

export HADOOP_HOME=/app/hadoop/hadoop-2.7.7
export HADOOP_INSTALL=${HADOOP_HOME}
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"

```

## Edit <hadoop install dir>/etc/hadoop/hadoop-env.sh

```
# The java implementation to use.
export JAVA_HOME=/home/vc/dev/jdk/jdk1.8.0_161
```
### <hadoop install dir>/etc/hadoop/core-site.xml


```
<configuration>

    <!-- Base directory used by Hadoop for its runtime/temporary data files. -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/vc/dev/hadoop/hadoop-2.7.7/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <!-- Default file system: the NameNode's RPC address. -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.1.202:9000</value>
    </property>
</configuration>
                    
```
 
### Configure HDFS: etc/hadoop/hdfs-site.xml

```
<configuration>
        <!-- NameNode metadata storage path -->
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:///home/vc/dev/hadoop/hadoop-2.7.7/hdfs/name</value>
        </property>
        <!-- HDFS replication factor -->
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <!-- DataNode data storage path -->
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:///home/vc/dev/hadoop/hadoop-2.7.7/hdfs/data</value>
        </property>

</configuration>
```
### Set up passwordless SSH for pseudo-distributed mode; passwordless login between Hadoop cluster nodes must work, otherwise all kinds of problems follow

On a single node, test with `ssh localhost`; if it does not log in without a password, run:

```
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 0600 ~/.ssh/authorized_keys

```
### In pseudo-distributed mode /etc/hosts needs no extra entries; in a real cluster you must map each hostname to its IP.


# Starting Hadoop in pseudo-distributed mode
## Format and start HDFS
```
# HDFS must be formatted before the first start; answer Y to any Y/N prompts
bin/hdfs namenode -format
# Start the NameNode and DataNode daemons; this brings up the single-node HDFS cluster
sbin/start-dfs.sh
```
After it starts, the NameNode web UI (http://<namenode-host>:50070 by default in Hadoop 2.x) shows the node information:
![](http://one17356s.bkt.clouddn.com/18-8-24/97813052.jpg)

```
# Create a directory on HDFS with the hadoop command
hadoop fs -mkdir /test
# or, equivalently, with the hdfs command
hdfs dfs -mkdir /user

# Upload a file (illustrative example; any local file will do)
hdfs dfs -put /etc/hosts /test
hdfs dfs -ls /test
```
![](http://one17356s.bkt.clouddn.com/18-8-24/33727958.jpg)

## Stop HDFS

```
./sbin/stop-dfs.sh

```
## Configure YARN
### etc/hadoop/mapred-site.xml

```
<configuration>


 <!-- Tell the MapReduce framework to run on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

```
### etc/hadoop/yarn-site.xml
```
<configuration>

<!-- Site specific YARN configuration properties -->

  <!-- Reducers fetch map output via mapreduce_shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
  <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>
```




![](http://one17356s.bkt.clouddn.com/18-8-24/53993777.jpg)
![](http://one17356s.bkt.clouddn.com/18-8-24/28989509.jpg)


## Start and stop YARN

```
./sbin/start-yarn.sh
./sbin/stop-yarn.sh

```

## Check cluster status

```
# hadoop dfsadmin is deprecated in Hadoop 2.x; hdfs dfsadmin -report does the same thing
./bin/hadoop dfsadmin -report
```
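
A quicker check is to list the running JVM daemons with jps; for a pseudo-distributed setup with both HDFS and YARN started, the expected processes are roughly:

```
jps
# NameNode, DataNode, SecondaryNameNode   (started by start-dfs.sh)
# ResourceManager, NodeManager            (started by start-yarn.sh)
```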
# Testing the pseudo-distributed setup

```
# Create a directory on the server
mkdir ~/input
# Go into it and copy the Hadoop config files in as sample input data
cd ~/input
cp /app/hadoop/hadoop-2.7.7/etc/hadoop/*.xml ./
# Create the target directory on HDFS and upload the files under input into /one
hdfs dfs -mkdir /one
hdfs dfs -put ./* /one
# Check the uploaded files
hdfs dfs -ls /one
# Run the example jar; the output directory /output must not already exist on HDFS, otherwise the job fails
hadoop jar /app/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep /one /output 'dfs[a-z.]+'
# Download the result directory from HDFS into the current directory
hdfs dfs -get /output
# View the results
cat output/*

```
--- 

# ZK (ZooKeeper) installation
# Download and extract ZK (zookeeper-3.4.9.tar.gz)
# Set environment variables
![](http://one17356s.bkt.clouddn.com/17-11-2/30838835.jpg)
# Edit the configuration file (it lives in $ZOOKEEPER_HOME/conf/; rename zoo_sample.cfg to zoo.cfg)
Configuration notes:
- tickTime: the heartbeat interval between ZooKeeper servers, and between clients and servers; a heartbeat is sent every tickTime.
- dataDir: the directory where ZooKeeper stores its data; by default the write-ahead log files are kept here as well.
- clientPort: the port clients use to connect to the ZooKeeper server; ZooKeeper listens on it for client requests.

![](http://one17356s.bkt.clouddn.com/18-7-8/79348236.jpg)
4.1 Standalone mode
- After downloading the ZooKeeper package, extract it to a suitable directory, go into its conf subdirectory, create the configuration file from the template with `cp zoo_sample.cfg zoo.cfg`, and set the parameters below (a minimal zoo.cfg sketch follows the parameter notes).
- tickTime=2000 
- dataDir=/home/vc/dev/MQ/ZK/data
- dataLogDir=/home/vc/dev/MQ/ZK/log
- clientPort=2181 
## What each parameter means

- tickTime: the basic tick time unit used by ZooKeeper, in milliseconds.
- dataDir: the data directory; can be any directory.
- dataLogDir: the log directory, also any directory; if not set, it defaults to the same value as dataDir.
- clientPort: the port that listens for client connections.
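
Putting these parameters together, a minimal standalone zoo.cfg looks like the sketch below (the paths are simply the ones listed above):

```
# $ZOOKEEPER_HOME/conf/zoo.cfg -- minimal standalone configuration
tickTime=2000
dataDir=/home/vc/dev/MQ/ZK/data
dataLogDir=/home/vc/dev/MQ/ZK/log
clientPort=2181
```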


# Start ZK
`/dev/Zk/zookeeper-3.4.9/bin$ ./zkServer.sh start`
![](http://one17356s.bkt.clouddn.com/17-11-2/76638495.jpg)

# Check that it is running
Use: `netstat -antp | grep 2181`
![](http://one17356s.bkt.clouddn.com/17-11-2/15616237.jpg)

# Connect to the ZK service with zkCli.sh

```
 ./zkCli.sh -server localhost:2181   # connect to the local ZK service
 history                             # show the command history
 quit                                # disconnect the client from the ZK server
 
```
![](http://one17356s.bkt.clouddn.com/18-8-27/4122129.jpg)
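
A few more zkCli commands that are handy for poking around (the znode /demo is just an illustrative placeholder):

```
ls /                    # list the znodes under the root
create /demo "hello"    # create a znode with some data
get /demo               # read the znode's data back
delete /demo            # remove it again
```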

# Stop the ZK service
`./zkServer.sh stop`

---



# [HIVE / SQOOP / HBASE installation blog post](https://www.cnblogs.com/DFX339/p/9550213.html)

# HIVE-INSTALL
- Download the package: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/
- Upload to the target directory on the Linux server and extract:  
	
```
mkdir hive    
mv apache-hive-2.3.3-bin.tar.gz  hive
tar -zxvf apache-hive-2.3.3-bin.tar.gz
mv apache-hive-2.3.3-bin apache-hive-2.3.3
### Installation directory: /app/hive/apache-hive-2.3.3
```

- Configure environment variables:

```
sudo  vi /etc/profile
# Add the following two lines:
export HIVE_HOME=/app/hive/apache-hive-2.3.3
export PATH=$PATH:$HIVE_HOME/bin
:wq    # save and exit
```

- Edit the Hive configuration files:
	- hive-env.sh (modify existing entries; add any that are missing):
	
    ```
    cd /app/hive/apache-hive-2.3.3/conf
    cp hive-env.sh.template   hive-env.sh
    # Add the following (uncomment the lines and change the paths to your own directories)
    export HADOOP_HEAPSIZE=1024
    export HADOOP_HOME=/app/hadoop/hadoop-2.7.7   # Hadoop installation directory
    export HIVE_CONF_DIR=/app/hive/apache-hive-2.3.3/conf
    export HIVE_HOME=/app/hive/apache-hive-2.3.3
    export HIVE_AUX_JARS_PATH=/app/hive/apache-hive-2.3.3/lib
    export JAVA_HOME=/app/lib/jdk
    ```

	
- Create the HDFS directories:

    ```
    cd /app/hive/apache-hive-2.3.3
    mkdir hive_site_dir
    cd hive_site_dir
    hdfs dfs -mkdir -p warehouse   # requires Hadoop to be installed and HDFS running
    hdfs dfs -mkdir -p tmp
    hdfs dfs -mkdir -p log
    hdfs dfs -chmod -R 777 warehouse
    hdfs dfs -chmod -R 777 tmp
    hdfs dfs -chmod -R 777 log
    # Create a local temp directory:
    cd  /app/hive/apache-hive-2.3.3
    mkdir  tmp
    ```

	
- hive-site.xml (modify the existing template):	

    ```
    cp hive-default.xml.template   hive-site.xml 
    vi hive-site.xml
    ```

	- Configure the metastore database settings: ConnectionURL / ConnectionUserName / ConnectionPassword / ConnectionDriverName
	
    ```
    <!--mysql database connection setting -->
    <property>
    	<name>javax.jdo.option.ConnectionDriverName</name>
    	<value>com.mysql.jdbc.Driver</value>
    </property>
    
    <property>
    	<name>javax.jdo.option.ConnectionURL</name>
    	<value>jdbc:mysql://10.28.85.149:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
    </property>
    
    <property>
    	<name>javax.jdo.option.ConnectionUserName</name>
    	<value>szprd</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>szprd</value>
    </property>
    ```


- Configure the HDFS and scratch directories

    ```
    <property>
    		<name>hive.exec.scratchdir</name>
    		<!--<value>/tmp/hive</value>-->
    		<value>/app/hive/apache-hive-2.3.3/hive_site_dir/tmp</value>
    		<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
           </property>
    	
           <property>
                 <name>hive.metastore.warehouse.dir</name>
                 <value>/app/hive/apache-hive-2.3.3/hive_site_dir/warehouse</value>
           </property>
    	
    	<property>
    		<name>hive.exec.local.scratchdir</name>
    		<!--<value>${system:java.io.tmpdir}/${system:user.name}</value> -->
    		<value>/app/hive/apache-hive-2.3.3/tmp/${system:user.name}</value>
    		<description>Local scratch space for Hive jobs</description>
    	</property>
      
         <property>
            <name>hive.downloaded.resources.dir</name>
            <!--<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>-->
    	<value>/app/hive/apache-hive-2.3.3/tmp/${hive.session.id}_resources</value>
            <description>Temporary local directory for added resources in the remote file system.</description>
         </property>
      
         <property>
             <name>hive.querylog.location</name>
             <!--<value>${system:java.io.tmpdir}/${system:user.name}</value>-->
    	 <value>/app/hive/apache-hive-2.3.3/hive_site_dir/log/${system:user.name}</value>
             <description>Location of Hive run time structured log file</description>
         </property>
      
      
      <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
        <description>
          Enforce metastore schema version consistency.
          True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
                schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
                proper metastore schema migration. (Default)
          False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
        </description>
      </property>
    ```

  
**After editing hive-site.xml, save and exit with :wq**
	
- Download a matching MySQL JDBC driver and put it into the lib directory under the Hive installation
	https://dev.mysql.com/downloads/connector/j/
	
- Initialize the metastore database (run this before starting Hive for the first time; if it fails, double-check the database connection settings; a verification sketch follows the command block below)
	
    ```
    cd  /app/hive/apache-hive-2.3.3/bin
    ./schematool -initSchema -dbType mysql
    ```
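
To confirm the initialization succeeded, the schema version recorded in MySQL can be queried with the same tool (a small sketch using schematool's standard -info option):

```
cd /app/hive/apache-hive-2.3.3/bin
./schematool -info -dbType mysql    # prints the metastore schema version stored in MySQL
```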

	 
- Start Hive

	`hive   # once the environment variables in /etc/profile are set, this can be run from any directory`

- To start Hive with real-time log output (run from the bin directory of the Hive installation): `./hive -hiveconf hive.root.logger=DEBUG,console`

--- 	 
	 
	 
# HBASE INSTALL
- [Download the HBase package](http://hbase.apache.org/downloads.html)


- Extract: `tar -zxvf hbase-1.2.6.1-bin.tar.gz`

- Configure environment variables (append at the end of the file):

    ```
    vi  /etc/profile
    #HBase Setting
    export HBASE_HOME=/app/hbase/hbase-1.2.6.1
    export PATH=$PATH:$HBASE_HOME/bin
    ```

- Edit `hbase-env.sh`:

    ```
    # defaults to true (use the bundled ZooKeeper); false means use an external ZooKeeper
    export HBASE_MANAGES_ZK=false
    export HBASE_PID_DIR=/app/hadoop/hadoop-2.7.7/pids   # create this directory first if it does not exist
    export JAVA_HOME=/app/lib/jdk   # JDK installation directory
    ```

 
- Edit `hbase-site.xml`:
 Add the following inside the configuration element:

```
<configuration>
<!-- number of data replicas (HDFS replication) -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

<!-- HBase root directory on HDFS -->
<property>
   <name>hbase.rootdir</name>
   <value>hdfs://10.28.85.149:9000/hbase</value>
</property>

<!-- ZooKeeper client port; must match the port the ZK ensemble listens on -->
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
<!-- must match the dataDir value in the ZooKeeper configuration -->
<property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/app/zookeeper/data</value>
</property>

<!-- root znode used by HBase in ZooKeeper -->
<property>
        <name>zookeeper.znode.parent</name>
        <value>/hbase</value>
</property>

<!-- whether HBase runs in distributed (cluster) mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
<!-- set this property to false only if you run on LocalFileSystem (file:// rootdir) -->
 <property>
         <name>hbase.unsafe.stream.capability.enforce</name>
         <value>true</value>
        <description>
                Controls whether HBase will check for stream capabilities (hflush/hsync). Disable this if you intend to run on LocalFileSystem, denoted by a rootdir with the 'file://' scheme, but be mindful of the NOTE below.
                 WARNING: Setting this to false blinds you to potential data loss and inconsistent system state in the event of process and/or node failures. If HBase is complaining of an inability to use hsync or hflush it's most likely not a false positive.
         </description>
</property>
</configuration>
```


- Start ZooKeeper
From the bin directory of the ZooKeeper installation, run `./zkServer.sh start`

Then start the client: `./zkCli.sh`

Once connected, run: `create /hbase hbase`   (creates the parent znode referenced by zookeeper.znode.parent)

- Start HBase

From the HBase bin directory: `./start-hbase.sh`

```
./hbase shell   # once the shell starts you can run HBase commands
list            # lists all tables in HBase; no error means it is working
```
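
Since hbase.rootdir and zookeeper.znode.parent tie HBase to HDFS and ZooKeeper, two quick checks confirm the wiring (a sketch; the exact child znodes vary by HBase version):

```
hdfs dfs -ls /hbase                   # the HBase root directory should now exist on HDFS
./zkCli.sh -server localhost:2181     # then, inside the client, run: ls /hbase  (should list znodes created by HBase)
```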

					
- Access the HBase web UI: http://10.28.85.149:16010/master-status   # use the server's own IP; the HBase Master UI port is 16010
		
--- 



# SQOOP INSTALL

- [Download the package](https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/)


- Extract: `tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz`

    Rename the directory: `mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7_hadoop-2.6.0`

- Configure environment variables:

    ```
    #Sqoop Setting
    export SQOOP_HOME=/app/sqoop/sqoop-1.4.7_hadoop-2.6.0
    export PATH=$PATH:$SQOOP_HOME/bin
    ```


- Copy the MySQL JDBC driver into the lib directory of the Sqoop installation
    Download: https://dev.mysql.com/downloads/connector/j/

- Edit the configuration file in the conf directory of the Sqoop installation:

```
vi sqoop-env.sh

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.7.7

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/app/hadoop/hadoop-2.7.7

#set the path to where bin/hbase is available
export HBASE_HOME=/app/hbase/hbase-1.2.6.1

#Set the path to where bin/hive is available
export HIVE_HOME=/app/hive/apache-hive-2.3.3

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/app/zookeeper/zookeeper-3.4.12
```


- Test the Sqoop installation
	- sqoop help   # lists the available Sqoop commands

	- Test the connection: list all databases reachable with these credentials (a Hive import sketch follows the block below)
	
    ```
    sqoop list-databases \
    --connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
    --username root \
    --password Abcd1234
    ```
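
Once connectivity works, a table can be pulled straight into Hive. A minimal sketch reusing the connection above (the MySQL table demo_table and the target Hive table are illustrative placeholders):

    ```
    sqoop import \
    --connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
    --username root \
    --password Abcd1234 \
    --table demo_table \
    --hive-import \
    --hive-table default.demo_table \
    -m 1
    ```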



--- 

# Oozie installation
# Based on the oozie-4.0.0-cdh5.3.6.tar.gz release
Prerequisites:
- a reachable MySQL database
- a working Hadoop cluster
- the compiled Oozie package already ships a Tomcat environment as `oozie-server`, so no separate Tomcat installation is needed
## Installation
- Download the pre-built tarball: `wget http://archive.cloudera.com/cdh5/cdh/5/oozie-4.0.0-cdh5.3.6.tar.gz`
- Extract it into the target directory: `tar -zxvf oozie-4.0.0-cdh5.3.6.tar.gz`; the directory used here is `/app/oozie`
- Set global environment variables: `sudo vim /etc/profile`
```

#oozie setting
export OOZIE_HOME=/app/oozie/oozie-4.0.0-cdh5.3.6
export PATH=$PATH:$OOZIE_HOME/bin
```

- Set environment variables in `<Oozie install dir>/conf/oozie-env.sh`.
The port of the Oozie web console is also set here:
`OOZIE_HTTP_PORT` sets the port the Oozie web service listens on (default 11000).
```

export OOZIE_CONF=${OOZIE_HOME}/conf
export OOZIE_DATA=${OOZIE_HOME}/data
export OOZIE_LOG=${OOZIE_HOME}/logs
export CATALINA_BASE=${OOZIE_HOME}/oozie-server
export CATALINA_HOME=${OOZIE_HOME}/oozie-server
```

- Create a libext folder under the Oozie root directory and move the third-party jars Oozie depends on into it: `mkdir libext`

    - Add the downloaded ext-2.2 to the libext directory: `cp ext-2.2.zip /app/oozie/oozie-4.0.0-cdh5.3.6/libext/`
    - Add the Hadoop jars to libext; from inside the libext directory run `cp /app/hadoop/hadoop-2.7.7/share/hadoop/*/*.jar ./` and `cp /app/hadoop/hadoop-2.7.7/share/hadoop/*/lib/*.jar ./`
    - Add the JDBC driver for the MySQL database that stores the metadata (`mysql-connector-java-5.1.41.jar`)

- Configure the Oozie proxy user in Hadoop's core-site.xml:
Just replace xxx with the user name that submits Oozie jobs.
    - hadoop.proxyuser.**xxx**.hosts
    
    - hadoop.proxyuser.**xxx**.groups
```
<!-- oozie    -->
<property>
    <name>hadoop.proxyuser.imodule.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.imodule.groups</name>
    <value>*</value>
  </property>
```

- Set up Oozie's shared jar library (sharelib) on HDFS.

Hadoop's default NameNode port is 8020; I changed it to 9000, so adjust the URI below accordingly:

One issue I hit: the NameNode was in safe mode, which has to be left first: `hdfs dfsadmin -safemode leave`

```
 oozie-setup.sh sharelib create -fs hdfs://10.28.85.149:9000 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
```
- Build the Oozie war file

With the Hadoop jars, the MySQL driver and the ext-2.2 zip already in the libext folder, run `oozie-setup.sh prepare-war` to build the war.


- Edit conf/oozie-site.xml under the Oozie installation directory


Set the oozie.service.HadoopAccessorService.hadoop.configurations property to the local Hadoop configuration directory:
```
 <configuration>
<property>
        <name>oozie.services</name>
        <value>
        org.apache.oozie.service.JobsConcurrencyService,
            org.apache.oozie.service.SchedulerService,
            org.apache.oozie.service.InstrumentationService,
            org.apache.oozie.service.MemoryLocksService,
            org.apache.oozie.service.CallableQueueService,
            org.apache.oozie.service.UUIDService,
            org.apache.oozie.service.ELService,
            org.apache.oozie.service.AuthorizationService,
            org.apache.oozie.service.UserGroupInformationService,
            org.apache.oozie.service.HadoopAccessorService,
            org.apache.oozie.service.URIHandlerService,
            org.apache.oozie.service.DagXLogInfoService,
            org.apache.oozie.service.SchemaService,
            org.apache.oozie.service.LiteWorkflowAppService,
            org.apache.oozie.service.JPAService,
            org.apache.oozie.service.StoreService,
            org.apache.oozie.service.CoordinatorStoreService,
            org.apache.oozie.service.SLAStoreService,
            org.apache.oozie.service.DBLiteWorkflowStoreService,
            org.apache.oozie.service.CallbackService,
            org.apache.oozie.service.ActionService,
            org.apache.oozie.service.ShareLibService,
            org.apache.oozie.service.ActionCheckerService,
            org.apache.oozie.service.RecoveryService,
            org.apache.oozie.service.PurgeService,
            org.apache.oozie.service.CoordinatorEngineService,
            org.apache.oozie.service.BundleEngineService,
            org.apache.oozie.service.DagEngineService,
            org.apache.oozie.service.CoordMaterializeTriggerService,
            org.apache.oozie.service.StatusTransitService,
            org.apache.oozie.service.PauseTransitService,
            org.apache.oozie.service.GroupsService,
            org.apache.oozie.service.ProxyUserService,
            org.apache.oozie.service.XLogStreamingService,
            org.apache.oozie.service.JvmPauseMonitorService
        </value>
    </property>
    <!-- Hadoop etc/hadoop configuration directory -->
    <property>
        <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
        <value>*=/app/hadoop/hadoop-2.7.7/etc/hadoop</value>
    </property>
    <property>
        <name>oozie.service.JPAService.create.db.schema</name>
        <value>true</value>
    </property>

    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://10.28.85.148:3306/ooize?createDatabaseIfNotExist=true</value>
    </property>

    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>root</value>
    </property>
    
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>Abcd1234</value>
    </property>

</configuration>
```
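
With oozie.service.JPAService.create.db.schema set to true the tables are created automatically; alternatively (a sketch of the standard helper tool, not a step from the original post) the schema can be created explicitly before the first start:

```
bin/ooziedb.sh create -sqlfile oozie.sql -run   # creates the Oozie tables in the MySQL database configured above
```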

- Run the Oozie service and check that the installation works
`oozied.sh run` or `oozied.sh start` (the former runs in the foreground, the latter in the background)
- Stop the Oozie service: `oozied.sh stop`
- Check the Oozie status from the command line (`oozie admin -oozie http://10.28.85.149:11000/oozie -status`); it should return `System mode: NORMAL`
- Then list the installed sharelibs with `oozie admin -shareliblist -oozie http://localhost:11000/oozie`
- Web UI: `http://10.28.85.149:11000/oozie/` (a job-submission sketch follows below)
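
Once the server reports NORMAL, workflows are submitted with the oozie CLI. A minimal sketch (job.properties and the workflow application it points to are assumed to already exist; they are not part of this install guide):

```
# submit and start a workflow described by a local job.properties file
oozie job -oozie http://10.28.85.149:11000/oozie -config job.properties -run
# check the status of the returned job id
oozie job -oozie http://10.28.85.149:11000/oozie -info <job-id>
```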

**One problem I ran into**

```
Sep 03, 2018 4:36:47 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NullPointerException
        at org.apache.jsp.index_jsp._jspInit(index_jsp.java:25)
        at org.apache.jasper.runtime.HttpJspBase.init(HttpJspBase.java:52)
        at org.apache.jasper.servlet.JspServletWrapper.getServlet(JspServletWrapper.java:164)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:340)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:154)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:594)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:553)
        at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:84)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:620)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:745)
```
The cause is that the webapp's `WEB-INF/lib` directory and Tomcat's lib directory both contain servlet-api.jar and jsp-api.jar.
`/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib` and `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/lib` hold the same jars, which causes the conflict. `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server` is the Tomcat environment of oozie-server, and its lib directory holds Tomcat's runtime jars.

Fix: delete the three files servlet-api-2.5-6.1.14.jar, servlet-api-2.5.jar and jsp-api-2.1.jar from `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib`.

After that it starts cleanly.
![](http://one17356s.bkt.clouddn.com/18-9-3/48205608.jpg)

---		

# Pig installation
# Prerequisites
### Hadoop 2.7.7 installed
### JDK 1.7+
# Installation
```
tar -xzvf pig-0.17.0.tar.gz
# Pig setting
export PIG_HOME=/app/pig/pig-0.17.0
export PATH=$PATH:$PIG_HOME/bin
```
# Test
```
# local mode
pig -x local
# mapreduce mode
pig -x mapreduce
```
![](http://one17356s.bkt.clouddn.com/18-8-28/13040171.jpg)
---

 

Reposted from: https://www.cnblogs.com/DFX339/p/9550213.html

