Sqoop Configuration
1. sqoop2
1.1 Server side
Install sqoop2 on node4 (clear out the old ZooKeeper and Hadoop logs first).
First install a helper command (on all four machines):
Find the package:
yum search dos2unix
Install it:
yum install dos2unix
First, go into the Hadoop config directory and add the following proxy-user properties to core-site.xml:
cd /data/hadoop/hadoop/etc/hadoop
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
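This change must be visible to the whole cluster, and it takes effect after HDFS is restarted. A minimal sketch of distributing the edited file, assuming you edited on node1 and use the node names from this document (adjust the host list to your layout):
for h in node2 node3 node4; do scp core-site.xml $h:/data/hadoop/hadoop/etc/hadoop/; done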
Then, on node4, create a sqoop directory and upload sqoop-1.99.7-bin-hadoop200.tar.gz into it.
Extract sqoop2 and clean up:
tar -xzf sqoop-1.99.7-bin-hadoop200.tar.gz
rm -rf sqoop-1.99.7-bin-hadoop200.tar.gz
mv sqoop-1.99.7-bin-hadoop200/ sqoop-2
rm -rf sqoop-2/docs/
Copy the MySQL driver jar into $SQOOP_HOME/server/lib; the jar must match your MySQL version.
I used mysql-connector-java-8.0.19.
Next, edit $SQOOP_HOME/conf/sqoop.properties and point the Hadoop configuration directory setting at:
/data/hadoop/hadoop/etc/hadoop
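In 1.99.7 the setting in question is the MapReduce configuration directory; find this key in sqoop.properties and set it to the path above:
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/data/hadoop/hadoop/etc/hadoop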
Then edit the sqoop.sh file in the bin directory,
adding the following at the top to point at the Hadoop installation:
export HADOOP_HOME=/data/hadoop/hadoop
Watch out for jar conflicts: my Hadoop version is newer than what sqoop bundles, so I copied Hadoop's guava jar into sqoop to resolve the clash.
hadoop ships: guava-27.0-jre.jar
sqoop ships: guava-11.0.2.jar
Copy hadoop/share/hadoop/common/lib/guava-27.0-jre.jar into /data/sqoop/sqoop-2/server/lib and /data/sqoop/sqoop-2/tools/lib,
then delete the old jar and keep the new one.
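A minimal sketch of the swap, using the paths from this setup:
cp /data/hadoop/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /data/sqoop/sqoop-2/server/lib/
cp /data/hadoop/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /data/sqoop/sqoop-2/tools/lib/
rm /data/sqoop/sqoop-2/server/lib/guava-11.0.2.jar
rm /data/sqoop/sqoop-2/tools/lib/guava-11.0.2.jar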
Then, still on node4, take commons-lang-2.5.jar from
/data/sqoop/sqoop-2/server/lib
and copy it into the Hadoop lib directory:
/data/hadoop/hadoop/share/hadoop/common/lib/
Since commons-lang 2.5 and commons-lang3 3.x use different package structures, there is no conflict to worry about.
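As a one-liner (a sketch; if the other nodes run Hadoop too, distribute the jar to them with scp as well):
cp /data/sqoop/sqoop-2/server/lib/commons-lang-2.5.jar /data/hadoop/hadoop/share/hadoop/common/lib/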
Next start ZooKeeper and the Hadoop cluster, then run the initialization/verification command:
bin/sqoop2-tool verify
If it fails with a script error, the cause is Windows line endings; convert the offending script with dos2unix and retry:
dos2unix demo.sh    # substitute the name of the failing script
Once verification succeeds, start the service:
bin/sqoop2-server start
That completes the single-node setup!
1.2 Tools (exporting and importing metadata)
Export the metadata from the built-in Derby repository:
bin/sqoop2-tool repositorydump -o ~/data.json
Import the metadata back:
bin/sqoop2-tool repositoryload -i ~/data.json
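A typical migration sequence, sketched under the assumption that the server is stopped while the repository tools run:
bin/sqoop2-server stop
bin/sqoop2-tool repositorydump -o ~/data.json
bin/sqoop2-tool repositoryload -i ~/data.json
bin/sqoop2-server start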
1.3 Client
First make sure the Hadoop cluster is up and the single-node sqoop server has been stopped.
Then pick any machine (node3 in this example) and create a sqoop directory:
mkdir sqoop
Distribute the sqoop install from node4 to node3:
scp -r sqoop-2/ node3:`pwd`
Start the server on node4 first,
then launch the client on node3:
bin/sqoop2-shell
Configure the shell to talk to the server (the web UI at node4:12000 cannot be opened in a browser for now):
set option --name verbose --value true
set server --host node4 --port 12000 --webapp sqoop
Check that the connection works:
show version --all
Start the JobHistory server (preferably before opening the shell client):
bin/mapred --daemon start historyserver
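Putting the pieces together, the startup order sketched with the node layout above:
# on node4 (server side)
bin/mapred --daemon start historyserver
bin/sqoop2-server start
# on node3 (client side)
bin/sqoop2-shell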
1.4 Create an HDFS link
create link --connector hdfs-connector
Name: hdfs
URI: hdfs://node1:8020
Conf directory: /data/hadoop/hadoop/etc/hadoop
Press Enter to skip the remaining prompts.
To inspect it:
show link
1.5 Create a MySQL link
create link --connector generic-jdbc-connector
Name: mysql
Driver class: com.mysql.cj.jdbc.Driver
Connection String: jdbc:mysql://192.168.56.1/userscenter?useSSL=false&serverTimezone=GMT%2B8&allowMultiQueries=true&allowPublicKeyRetrieval=true
Database username and password:
Username: root
Password: *************
Identifier enclose: enter a single space here, not a bare Enter (see section 3 for why)
Then finish the remaining prompts.
1.6 Create a job from the mysql link to the hdfs link
create job --from mysql --to hdfs
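The shell then walks through interactive prompts; the exact wording varies by version, but the essentials are (values here are illustrative, matching the link and table names used in this document):
# From side (generic-jdbc): schema/table name, or a SQL query containing $CONDITIONS
# To side (hdfs): output format and an output directory, e.g. /sqoop/a_users
# Name: the job name, here myjob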
Then start the job:
start job --name myjob
Check the job status:
status job --name myjob
Watch the log (@LOGDIR@ stands for the log directory configured in sqoop.properties):
tail -f @LOGDIR@/sqoop.log
Check HDFS, then just wait for the job to finish.
1.7 Create a job from the hdfs link to the mysql link
create job --from hdfs --to mysql
Start it:
start job --name myjob2
Check its status:
status job --name myjob2
2. sqoop1
As with sqoop2: upload the tar package, extract it, rename the directory to sqoop-1, then delete docs/.
Next, copy conf/sqoop-env-template.sh to sqoop-env.sh
and replace its contents with the following:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*
# Set Hadoop-specific environment variables here.
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/data/hadoop/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME/share/hadoop/mapreduce/
#set the path to where bin/hbase is available
export HBASE_HOME=/data/hbase/hbase
#Set the path to where bin/hive is available
export HIVE_HOME=/data/hive/apache-hive
#Set the path for where the zookeeper config dir is
export ZOOCFGDIR=/data/zookeeper/apache-zookeeper/
Try out the basic commands:
bin/sqoop help
bin/sqoop help import
bin/sqoop import --help
2.1 Import
Copy the MySQL driver jar into sqoop/lib (same jar as for sqoop2).
Sqoop's bundled jars clash with hadoop-mapreduce, so manually copy the hadoop-mapreduce jars into sqoop/lib;
otherwise the import fails with a class-related error.
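A sketch of that jar copy, assuming sqoop1 was unpacked to /data/sqoop/sqoop-1 as described at the start of this section:
cp /data/hadoop/hadoop/share/hadoop/mapreduce/*.jar /data/sqoop/sqoop-1/lib/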
2.1.1 mysql -> hdfs
Query the data provided by MySQL:
bin/sqoop import --driver com.mysql.cj.jdbc.Driver --connect jdbc:mysql://192.168.56.1/userscenter?serverTimezone=GMT%2B8 --table a_users --username root --password lzds010329--+ --verbose --target-dir=/sqoop/a_users
Import the data. Notes:
--query and --table are mutually exclusive;
$CONDITIONS is a placeholder in the SQL WHERE clause that sqoop fills in with its own split conditions;
--split-by: the split column should contain no duplicates, usually the primary key.
bin/sqoop import --connect jdbc:mysql://192.168.56.1/userscenter?serverTimezone=GMT%2B8 --username root --password lzds010329--+ --verbose --fetch-size 30 --query 'select * from a_users where $CONDITIONS' --split-by id --target-dir=/sqoop/a_users
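To make $CONDITIONS concrete: sqoop first queries the min and max of the --split-by column, then runs one copy of the query per mapper with $CONDITIONS replaced by a range predicate. Illustrative rewrites (hypothetical ids 1..100, 4 mappers):
# select * from a_users where id >= 1 and id < 26
# select * from a_users where id >= 26 and id < 51
# ...and so on for the remaining splits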
Alternatively, put the options in a file (one option per line; # starts a comment); the file name is arbitrary:
import
--connect
jdbc:mysql://192.168.56.1/userscenter?serverTimezone=GMT%2B8
--username
root
--password
lzds010329--+
--verbose
--fetch-size
30
--query
'select * from a_users where $CONDITIONS'
--split-by
id
--target-dir=/sqoop/a_users
Run sqoop with the options file:
bin/sqoop --options-file /root/sqoop_file.txt
Another complete example, importing the a_king table from the mydata database:
bin/sqoop import --connect jdbc:mysql://192.168.56.1:3306/mydata?useSSL=false \
--username root --password lzds010329--+ --verbose --fetch-size 30 --query 'select * from a_king where $CONDITIONS' --split-by id \
--target-dir=/sqoop/a_king
2.1.2 mysql -> hive
Copy hive-common-3.1.1.jar into sqoop/lib.
--fields-terminated-by sets the column delimiter.
Command:
bin/sqoop import --connect jdbc:mysql://192.168.56.1/test_hive?serverTimezone=GMT%2B8 \
--username root --password lzds010329--+ --verbose --fetch-size 30 --query 'select * from a_dynasty where $CONDITIONS' --split-by id \
--target-dir=/sqoop/a_dynasty \
--fields-terminated-by ',' \
--hive-import \
--hive-table mydata.a_dynasty --hive-overwrite
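A quick spot-check of the result, assuming the hive CLI is available on this machine:
hive -e 'select * from mydata.a_dynasty limit 5;'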
2.1.3 mysql -> hbase
Copy hbase-client-2.2.5.jar into sqoop/lib.
bin/sqoop import --connect jdbc:mysql://192.168.56.1:3306/mydata?useSSL=false \
--username root --password lzds010329--+ --verbose --fetch-size 30 --query 'select * from a_dynasty where $CONDITIONS' --split-by id \
--target-dir=/sqoop/a_dynasty \
--fields-terminated-by ',' \
--hbase-table a_dynasty \
--column-family cf --hbase-row-key id --hbase-create-table
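And to verify on the HBase side (assuming the hbase shell is on this machine):
echo "scan 'a_dynasty'" | hbase shell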
2.1.4 sqoop-import-all-tables
Import every table in a database at once:
bin/sqoop import-all-tables --connect jdbc:mysql://192.168.56.1:3306/test_hive?serverTimezone=GMT%2B8 \
--username root --password lzds010329--+ --verbose --fetch-size 30 \
--warehouse-dir=/sqoop/a_dynasty
2.2 Export
2.2.1 hdfs -> mysql
bin/sqoop export --connect jdbc:mysql://192.168.56.1:3306/test_hive?serverTimezone=GMT%2B8 --username root --password lzds010329--+ --verbose --table a_dynasty \
--export-dir=/sqoop/a_dynasty/a_dynasty
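To confirm the export landed, count the rows on the MySQL side (a sketch, assuming the mysql client can reach the server):
mysql -h 192.168.56.1 -u root -p -e 'select count(*) from test_hive.a_dynasty;'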
2.2.2 Job
Copy the JSON jar into sqoop/lib first,
otherwise the job commands below will fail with an error.
Create a job:
bin/sqoop job --create myjob -- import --connect jdbc:mysql://192.168.56.1:3306/test_hive?serverTimezone=GMT%2B8 --username root --password lzds010329--+ --verbose --fetch-size 30 --query 'select * from a_king where $CONDITIONS' --split-by id --target-dir=/sqoop/a_king
List all jobs:
bin/sqoop job --list
Show a single job:
bin/sqoop job --show myjob
Run a job:
bin/sqoop job --exec myjob
Delete a job:
bin/sqoop job --delete myjob
3 Common problems
An error occurs when starting a job.
Fix: this traces back to creating the generic-jdbc-connector link. The Identifier enclose prompt sets the delimiter placed around SQL identifiers; its default is a double quote, so the generated SQL looks like select * from "table_name", which MySQL rejects. Do not simply press Enter at that prompt (that keeps the default); type a single space to override it, or queries against MySQL tables will fail.