Sqoop2 1.99: Installation, Usage, and the Pitfalls Along the Way

Part 1: Sqoop installation and configuration

1. Download, extract, and set environment variables (omitted)

A quick rant first: Sqoop2's configuration is very different from Sqoop1's, and most of the material online covers Sqoop1. Even worse, 1.99.4 and 1.99.7 also differ a lot from each other.

2. Configuration

Edit sqoop/server/conf/catalina.properties (required for 1.99.4; 1.99.7 does not use this file) and change common.loader to:

common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/local/src/hadoop-2.6.1/share/hadoop/common/*.jar,/usr/local/src/hadoop-2.6.1/share/hadoop/common/lib/*.jar,/usr/local/src/hadoop-2.6.1/share/hadoop/hdfs/*.jar,/usr/local/src/hadoop-2.6.1/share/hadoop/hdfs/lib/*.jar,/usr/local/src/hadoop-2.6.1/share/hadoop/mapreduce/*.jar,/usr/local/src/hadoop-2.6.1/share/hadoop/mapreduce/lib/*.jar,/usr/local/src/hadoop-2.6.1/share/hadoop/tools/*.jar,/usr/local/src/hadoop-2.6.1/share/hadoop/tools/lib/*.jar,/usr/local/src/hadoop-2.6.1/share/hadoop/yarn/*.jar,/usr/local/src/hadoop-2.6.1/share/hadoop/yarn/lib/*.jar,/usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib/*.jar

Edit sqoop/server/conf/sqoop.properties and set:

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/src/hadoop-2.6.1/etc/hadoop

Sqoop2 supports Hadoop's two authentication mechanisms, simple and kerberos. Here we enable simple authentication:

org.apache.sqoop.security.authentication.type=SIMPLE  
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler  
org.apache.sqoop.security.authentication.anonymous=true

By default, Sqoop writes its logs into whatever directory you start it from. You can replace all references to the LOGDIR and BASEDIR placeholders in sqoop.properties with real absolute paths so that the logs (and the repository) always end up in a fixed location.
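For example, the relevant lines could end up looking like the sketch below. This is only a sketch: the exact property names and defaults vary slightly between 1.99.x releases, and the target directory /usr/local/src/sqoop-1.99.4-bin-hadoop200 is just the install path used in this post, so check your own sqoop.properties before editing.

# @LOGDIR@ replaced with a fixed log directory (assumed path)
org.apache.sqoop.log4j.appender.file.File=/usr/local/src/sqoop-1.99.4-bin-hadoop200/logs/sqoop.log
# @BASEDIR@ replaced so the Derby repository also lands in a fixed location (assumed path)
org.apache.sqoop.repository.jdbc.url=jdbc:derby:/usr/local/src/sqoop-1.99.4-bin-hadoop200/repository/db;create=true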

3. Create a new directory inside the Sqoop installation directory: mkdir hadoop_lib

Copy the Hadoop dependency jars into that directory, and also copy sqoop/server/bin/*.jar and sqoop/server/lib/*.jar into it (a compact loop version of these copies follows the commands below):

cp /usr/local/src/hadoop-2.6.1/share/hadoop/common/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp /usr/local/src/hadoop-2.6.1/share/hadoop/common/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp /usr/local/src/hadoop-2.6.1/share/hadoop/hdfs/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp /usr/local/src/hadoop-2.6.1/share/hadoop/hdfs/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/mapreduce/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/mapreduce/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/tools/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/tools/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/yarn/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/yarn/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/hadoop-2.6.1/share/hadoop/httpfs/tomcat/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/sqoop-1.99.4-bin-hadoop200/server/bin/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib

cp -rf /usr/local/src/sqoop-1.99.4-bin-hadoop200/server/lib/*.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/hadoop_lib
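The same copies can be written as a short loop. This is only a sketch: the HADOOP_HOME and SQOOP_HOME variables are assumptions matching the paths used above, and 2>/dev/null merely silences globs that match nothing (e.g. if server/bin contains no jars).

HADOOP_HOME=/usr/local/src/hadoop-2.6.1
SQOOP_HOME=/usr/local/src/sqoop-1.99.4-bin-hadoop200
# copy the Hadoop jars listed above into hadoop_lib
for d in common common/lib hdfs hdfs/lib mapreduce mapreduce/lib tools tools/lib yarn yarn/lib httpfs/tomcat/lib; do
  cp "$HADOOP_HOME"/share/hadoop/"$d"/*.jar "$SQOOP_HOME"/hadoop_lib/ 2>/dev/null
done
# copy the Sqoop server jars into hadoop_lib
cp "$SQOOP_HOME"/server/bin/*.jar "$SQOOP_HOME"/server/lib/*.jar "$SQOOP_HOME"/hadoop_lib/ 2>/dev/null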

4. Grant permissions (I skipped this on 1.99.7 and it still worked):

sudo chmod 777 -R /usr/local/src/sqoop-1.99.4-bin-hadoop200

5. Configure the Hadoop proxy user

The Sqoop server needs to impersonate users when accessing HDFS and other resources inside and outside the cluster, so Hadoop must be configured, via the so-called proxyuser mechanism, to explicitly allow this impersonation. That means adding the following two properties to etc/hadoop/core-site.xml under the Hadoop directory. For the two value elements, either * or the actual names will work.

<property>
  <name>hadoop.proxyuser.sqoop2.hosts</name>
  <value>*</value>       
</property>
<property>
  <name>hadoop.proxyuser.sqoop2.groups</name>
  <value>*</value>             
</property>

I hit a pitfall here: replace sqoop2 in the property names with the username you actually run the Sqoop server as, for example root, and note that Hadoop must be restarted for the change to take effect.
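For example, if the server runs as root (an assumption; substitute your own username), the properties become:

<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>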

6. Verification

Run sqoop2-tool verify

7. Start the server

sqoop2-server start
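As a quick sanity check that the server is really up, you can hit its REST API (this assumes the default port 12000 and webapp name sqoop):

curl http://localhost:12000/sqoop/version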

You can also run Sqoop in batch mode with a script: sqoop2-shell /path/script.sqoop

Such a script can look like this:

# Specify company server
set server --host sqoop2.company.net

# Executing given job
start job --name 1

Lines starting with # are comments.

Part 2: Downloading the JDBC driver

1. Download sqljdbc from the Microsoft website:

https://www.microsoft.com/en-us/download/confirmation.aspx?id=11774&6B49FDFB-8E5B-4B07-BC31-15695C5A2143=1

Put the jar file into the lib folder of the Sqoop server directory:

cp sqljdbc41.jar /usr/local/src/sqoop-1.99.4-bin-hadoop200/server/lib

2. Download the SQL Server-Hadoop Connector?

Not needed: starting with Sqoop2 this connector is no longer required. The online tutorials that tell you to install it are all about Sqoop1 — another pitfall.

Part 3: Operations

Follow the official documentation: https://sqoop.apache.org/docs/1.99.4/Sqoop5MinutesDemo.html

1. Enter the Sqoop client

sqoop.sh client or sqoop2-shell

set server --host master --port 12000 --webapp sqoop (for --host, use your own server's hostname)

According to the official documentation, the last option, --webapp, specifies the name of the Sqoop Jetty server.

show version --all confirms that both the client and the server are running.

2. Create links

List the available connectors: show connector

Create a link: create link -c 2 (here 1 refers to the first connector, i.e. HDFS, and 2 to the second connector, i.e. the generic JDBC connector; 1.99.7 no longer has these numeric ids, and you specify the connector by name instead)

Enter the following information:

Name: testlink
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://192.168.116.10:3306/test
Username: root
Password: ******
JDBC Connection Properties: 
There are currently 0 values in the map:
entry# protocol=tcp 
There are currently 1 values in the map:
protocol = tcp
entry# 
New link was successfully created with validation status OK and persistent

You can see the link was created successfully.

Next, create a link to Hadoop (a rough sketch of this follows). The URI is the value of the fs.defaultFS property configured in Hadoop's core-site.xml. Afterwards you can see there are now two links.
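A sketch of the HDFS link creation (the connector id, the link name, and the hdfs://master:9000 URI are assumptions for this setup — use your own fs.defaultFS value, and the exact prompts may differ slightly between versions):

create link -c 1
Name: hdfslink
Link configuration
HDFS URI: hdfs://master:9000
New link was successfully created with validation status OK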

3. Create a job (in 1.99.7 jobs reference links by name)

create job -f 1 -t 2

-f specifies the from link and -t the to link; the values are the link ids.

 create job -f 1 -t 2
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: sqoop

From database configuration

Schema name: test
Table name: student
Table SQL statement: 
Table column names: 
Partition column name: 
Null value allowed for the partition column: 
Boundary query: 

ToJob configuration

Output format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom compression format: 
Output directory:  # an HDFS directory path; it must be an existing, empty directory

Throttling resources

Extractors: 2
Loaders: 2
New job was successfully created with validation status OK  and persistent id 1

4. Start the job

start job --jid 1

or start job -n sqoop

Check the job status: status job --jid 1

Delete jobs and links:

delete job --jid 1

delete link --lid 1

 

5. Troubleshooting

If you hit the error:

Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception

enable verbose output so the client shows the underlying error:

set option --name verbose --value true

The actual cause of the failure will then be printed.

If you hit the error GENERIC_HDFS_CONNECTOR_0007:Invalid input/output directory:

First check whether the directory actually exists in HDFS. If it does and the error persists, the proxyuser settings in core-site.xml are probably wrong. That was my case: the property names must contain the username you are actually running as. The official docs use sqoop2, I copied that verbatim, and it took me a long time to find the cause. After fixing it, restart Hadoop for the change to take effect.

If you hit the error MAPREDUCE_0003:Can't get RunningJob instance, the MapReduce JobHistory server is not running.

Check whether mapred-site.xml in your Hadoop configuration already contains:

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>

If it is already set, go to Hadoop's sbin directory and start the service:

  ./mr-jobhistory-daemon.sh start historyserver
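You can confirm the history server came up with jps (the JobHistoryServer process should appear):

jps | grep JobHistoryServer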

If the job runs but then fails with Exception: Job Failed with status:3,

check the YARN web UI at http://192.168.116.10:8088/cluster — and indeed, the job failed there.

Then go to http://192.168.116.10:19888/jobhistory/ to see the detailed error message.

In my case the error was com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.

So the MySQL connection itself was the problem. In my case the account lacked remote access privileges; run the following two statements in MySQL to allow root to connect from any IP with the password:

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456';

FLUSH PRIVILEGES;
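To verify the grant took effect, you can list the matching user entries (a quick check against the standard mysql.user table):

SELECT user, host FROM mysql.user WHERE user = 'root';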

One last word: I recommend using 1.99.7. 1.99.4 has too many inexplicable problems; I've given up on it and reinstalled 1.99.7.
