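Sqoop2 is such a large upgrade over Sqoop1 that the two can fairly be called unrelated pieces of software.
Compared with Sqoop1, Sqoop2 adds a server component. Sqoop1 is the kind of software you unpack, set an environment variable for, and use right away. Sqoop2 is more complicated to install, deploy, and use, and the official documentation has a few pitfalls you have to work through.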
Sqoop2 Version 1.99.7: installation and deployment
Download the software: wget http://apache.fayea.com/sqoop/1.99.7/sqoop-1.99.7-bin-hadoop200.tar.gz
Unpack it: tar -zvxf sqoop-1.99.7-bin-hadoop200.tar.gz
Move it to the install path: mv sqoop-1.99.7-bin-hadoop200 /usr/local/sqoop
Set permissions: chmod -R 755 /usr/local/sqoop
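The unpack, move, and chmod steps above can be wrapped into one function so that a failed run does not leave a half-installed directory behind. This is only a sketch: install_sqoop is a name I made up, and it assumes the archive unpacks to a single top-level directory named after the tarball, as the 1.99.7 archive does in the steps above.

```shell
# Sketch: unpack a downloaded Sqoop2 tarball and move it to an install path.
# install_sqoop is a hypothetical helper name, not something Sqoop ships.
install_sqoop() {
    tarball=$1      # e.g. sqoop-1.99.7-bin-hadoop200.tar.gz
    dest=$2         # e.g. /usr/local/sqoop

    [ -f "$tarball" ] || { echo "tarball not found: $tarball" >&2; return 1; }
    [ ! -e "$dest" ] || { echo "destination already exists: $dest" >&2; return 1; }

    workdir=$(mktemp -d) || return 1
    # Extract into a scratch directory so a failed run leaves $dest untouched.
    tar -zxf "$tarball" -C "$workdir" || { rm -rf "$workdir"; return 1; }

    # Assumption: the archive contains one top-level directory named after the tarball.
    top="$workdir/$(basename "$tarball" .tar.gz)"
    [ -d "$top" ] || { echo "unexpected archive layout" >&2; rm -rf "$workdir"; return 1; }

    mv "$top" "$dest" && chmod -R 755 "$dest"
    status=$?
    rm -rf "$workdir"
    return $status
}
```

With the tarball in the current directory, the call would be: install_sqoop sqoop-1.99.7-bin-hadoop200.tar.gz /usr/local/sqoop (run it with sudo rights if /usr/local is not writable for your user).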
Edit the environment variables: vim /etc/profile
Add the following variables at the bottom:
export SQOOP_HOME=/usr/local/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
export SQOOP_SERVER_EXTRA_LIB=/usr/local/sqoop/server/lib
SQOOP_SERVER_EXTRA_LIB is set because the JAR for connecting to MySQL has to be copied into this directory in the next step.
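After editing /etc/profile it is easy to forget to reload it, or to mistype one of the names. A small check like the one below shows which of the variables are actually visible in the current shell. It is a sketch only, and check_sqoop_env is a name I picked for illustration.

```shell
# Sketch: report which of the variables added to /etc/profile are set.
# check_sqoop_env is a hypothetical helper name.
check_sqoop_env() {
    missing=0
    for name in SQOOP_HOME CATALINA_BASE LOGDIR SQOOP_SERVER_EXTRA_LIB; do
        # eval reads the variable whose name is stored in $name
        # (plain POSIX sh has no indirect expansion).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            missing=1
        else
            echo "$name=$value"
        fi
    done
    # Exit status is 0 only when every variable is set.
    return $missing
}
```

Typical use is to reload the profile and run the check in the same shell: . /etc/profile && check_sqoop_env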
Copy the mysql-connector JAR into that directory:
cp /Users/luhuijun/Downloads/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar $SQOOP_HOME/server/lib
Edit the configuration file: sudo vim /usr/local/sqoop/conf/sqoop.properties
Point it at my Hadoop configuration directory: org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/etc/hadoop
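If you would rather script this change than make it in vim, something like the following can replace the value of one key in a .properties file, or append the key when it is missing. It is a sketch under my own assumptions: set_property is a made-up helper, it only looks at uncommented key=value lines, and it rewrites the first match only.

```shell
# Sketch: set key=value in a Java-style .properties file.
# set_property is a hypothetical helper name.
set_property() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # awk compares the key as a literal string, so the dots in the key
    # are not treated as regex wildcards.
    awk -v k="$key" -v v="$value" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)                 # ignore leading blanks
            if (!done && index(line, k "=") == 1) {  # first matching entry
                print k "=" v
                done = 1
                next
            }
            print
        }
        END { if (!done) print k "=" v }             # key was absent: append it
    ' "$file" > "$tmp" && cat "$tmp" > "$file"       # cat keeps the file permissions
    status=$?
    rm -f "$tmp"
    return $status
}
```

For the setting above the call would be: sudo sh -c '...' or, as a user with write access, set_property /usr/local/sqoop/conf/sqoop.properties org.apache.sqoop.submission.engine.mapreduce.configuration.directory /usr/local/hadoop/etc/hadoop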
Configure catalina.properties. This file does not exist, so you have to create it yourself: vim /usr/local/sqoop/conf/catalina.properties
Its contents are as follows:
common.loader=/usr/local/hadoop/share/hadoop/common/*.jar,/usr/local/hadoop/share/hadoop/common/lib/*.jar,/usr/local/hadoop/share/hadoop/hdfs/*.jar,/usr/local/hadoop/share/hadoop/hdfs/lib/*.jar,/usr/local/hadoop/share/hadoop/mapreduce/*.jar,/usr/local/hadoop/share/hadoop/mapreduce/lib/*.jar,/usr/local/hadoop/share/hadoop/tools/*.jar,/usr/local/hadoop/share/hadoop/tools/lib/*.jar,/usr/local/hadoop/share/hadoop/yarn/*.jar,/usr/local/hadoop/share/hadoop/yarn/lib/*.jar,/usr/local/hadoop/share/hadoop/httpfs/tomcat/lib/*.jar,
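That line is long and easy to break when typing it by hand. Since it follows a regular pattern (each Hadoop component contributes its own jars plus the jars in its lib directory), it can be generated from the Hadoop install path. A sketch, where build_common_loader is a name I made up and the component list is simply the one used in the line above:

```shell
# Sketch: print the common.loader line for a given Hadoop install path.
# build_common_loader is a hypothetical helper name.
build_common_loader() {
    hadoop_home=$1
    share="$hadoop_home/share/hadoop"
    loader=""
    for component in common hdfs mapreduce tools yarn; do
        # Each component adds its own jars and the jars under its lib/ directory.
        # The * stays literal because it is inside double quotes.
        loader="${loader}${share}/${component}/*.jar,${share}/${component}/lib/*.jar,"
    done
    loader="${loader}${share}/httpfs/tomcat/lib/*.jar,"
    printf 'common.loader=%s\n' "$loader"
}
```

Running build_common_loader /usr/local/hadoop > /usr/local/sqoop/conf/catalina.properties reproduces the file shown above; change the argument if your Hadoop lives elsewhere.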
Edit the startup script:
vim /usr/local/sqoop/bin/sqoop.sh
Add the following lines, adjusting the paths to your own system:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home
HADOOP_COMMON_HOME=/usr/local/hadoop/share/hadoop/common
HADOOP_HDFS_HOME=/usr/local/hadoop/share/hadoop/hdfs
HADOOP_MAPRED_HOME=/usr/local/hadoop/share/hadoop/mapreduce
HADOOP_YARN_HOME=/usr/local/hadoop/share/hadoop/yarn
Edit Hadoop's yarn-site.xml:
vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add the following property:
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
Edit Hadoop's container-executor.cfg:
vim /usr/local/hadoop/etc/hadoop/container-executor.cfg
Change the following setting:
allowed.system.users=luhuijun
Set this to the current user name; otherwise sqoop2-shell reports an error when it accesses files on HDFS
(user: luhuijun is not allowed to impersonate luhuijun)
This problem held me up for a long time.
Edit core-site.xml:
vim /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following configuration:
<property>
<name>hadoop.proxyuser.luhuijun.groups</name>
<value>staff</value>
</property>
<property>
<name>hadoop.proxyuser.luhuijun.hosts</name>
<value>localhost</value>
</property>
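This fixes the same problem as the previous change:
when sqoop2-shell accesses HDFS it reports (user: luhuijun is not allowed to impersonate luhuijun). Here staff is the group of the current system user; the groups command shows the group names of the current user.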
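Because the property names contain the user name, the two blocks have to be adapted on every machine. The sketch below prints them for the current user instead of hard-coding luhuijun and staff. proxyuser_properties is a name I invented, and note that id -gn prints only the primary group, while groups lists all of them, so pass the group explicitly if the primary group is not the one you want.

```shell
# Sketch: print the two hadoop.proxyuser properties for a user.
# proxyuser_properties is a hypothetical helper name.
# Arguments (all optional): user, group, host.
proxyuser_properties() {
    user=${1:-$(id -un)}     # default: current user name
    group=${2:-$(id -gn)}    # default: current user's primary group
    host=${3:-localhost}     # default: same host as in the example above
    cat <<EOF
<property>
<name>hadoop.proxyuser.${user}.groups</name>
<value>${group}</value>
</property>
<property>
<name>hadoop.proxyuser.${user}.hosts</name>
<value>${host}</value>
</property>
EOF
}
```

Calling proxyuser_properties luhuijun staff localhost prints exactly the configuration shown above; the output still has to be pasted inside the configuration element of core-site.xml by hand, and Hadoop needs a restart to pick it up.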
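Start the Sqoop server: sqoop.sh server start
Verify that the server started successfully
sqoop.sh client or sqoop2-shell    opens the client
set server --host hadoopMaster --port 12000 --webapp sqoop    sets the server; note that hadoopMaster is the HDFS host name
show connector --all    lists the connector types
show link    lists the links
show job    lists the jobs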
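The server can also be checked from outside the client shell. As far as I know the Sqoop2 server exposes a REST interface under the same host, port, and webapp name used in set server, with a version resource at /version; treat that path as an assumption and check it against the REST API documentation of your release. The helper names below are made up for illustration.

```shell
# Sketch: build the URL of the server's version resource and probe it.
# sqoop_version_url and sqoop_server_up are hypothetical helper names;
# the /version path is an assumption about the Sqoop2 REST interface.
sqoop_version_url() {
    host=${1:-localhost}
    port=${2:-12000}
    webapp=${3:-sqoop}
    printf 'http://%s:%s/%s/version\n' "$host" "$port" "$webapp"
}

# Returns 0 when the server answers the request within 5 seconds,
# non-zero otherwise (connection refused, timeout, HTTP error).
sqoop_server_up() {
    curl --silent --fail --max-time 5 "$(sqoop_version_url "$@")" >/dev/null
}
```

For the setup above: sqoop_server_up hadoopMaster 12000 sqoop && echo "server is up". If it fails, the log directory configured in LOGDIR is the first place to look.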