https://blog.csdn.net/gdmzlhj1/article/details/50483171
Installing and using Sqoop 1.4.6 or 1.4.7 with Hadoop 2.7.1
Importing data from MySQL into Hive fails with:
ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
This happens because Sqoop needs a Hive jar on its classpath: copy hive-common-2.3.3.jar (or whichever version your Hive ships) from hive/lib into Sqoop's lib directory.
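A minimal sketch of the fix, assuming $HIVE_HOME and $SQOOP_HOME are set as in the environment variables below; the jar version must match your own Hive installation:
# copy Hive's common jar (which contains org.apache.hadoop.hive.conf.HiveConf) into Sqoop's classpath
[hadoop@node1 ~]$ cp $HIVE_HOME/lib/hive-common-*.jar $SQOOP_HOME/lib/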
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosTicket(Ljavax/security/auth/Subject;)Z
This means hadoop-common-2.7.1.jar is missing from Sqoop's lib; make sure the Hadoop jar version matches your cluster and the Sqoop build matches your Hadoop version.
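A hedged fix along the same lines: put the matching hadoop-common jar (which provides KerberosUtil) on Sqoop's classpath, taking the version from your own cluster:
# 2.7.1 here matches the Hadoop version used throughout this post
[hadoop@node1 ~]$ cp $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.1.jar $SQOOP_HOME/lib/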
Installing Sqoop 1.4.6
Sqoop 2 is not recommended for production use; Sqoop 1 is tested here.
-- Download sqoop-1.4.6
http://mirror.bit.edu.cn/apache/sqoop/1.4.6/
[ ] sqoop-1.4.6.bin__hadoop-0.23.tar.gz 08-May-2015 16:28 16M
[ ] sqoop-1.4.6.bin__hadoop-1.0.0.tar.gz 08-May-2015 16:28 16M
[ ] sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz 08-May-2015 16:28 16M
[ ] sqoop-1.4.6.tar.gz 08-May-2015 16:28 2.1M
Choose the build that matches your Hadoop version; mine is hadoop-2.7.1.
[hadoop@node1 ~]$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
-- Set environment variables
[hadoop@node1 bin]$ sudo vi /etc/profile
export JAVA_HOME=/home/hadoop/jdk1.7.0_67
export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
export ZK_HOME=/home/hadoop/zookeeper-3.4.6
export HIVE_HOME=/home/hadoop/apache-hive-1.2.1-bin
export HBASE_HOME=/home/hadoop/hbase-1.1.2
export SQOOP_HOME=/home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${HIVE_HOME}/bin:${HBASE_HOME}/bin:${SQOOP_HOME}/bin
[hadoop@node1 bin]$ sqoop help
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
15/11/24 13:44:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]
-- To silence these warnings, comment out the two corresponding blocks in sqoop/bin/configure-sqoop (search for "fail"); a sketch of the blocks is shown after the command list below.
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
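The blocks to comment out in $SQOOP_HOME/bin/configure-sqoop look roughly like the following (the exact wording may differ between Sqoop releases); prefixing every line with # removes the corresponding warning:
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi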
-- Set up the configuration files
[hadoop@node1 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@node1 conf]$ vi sqoop-env.sh
[hadoop@node1 conf]$ vi sqoop-site.xml
-- Copy the required jars
[hadoop@node1 ~]$ cp $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.1.jar $SQOOP_HOME/lib
[hadoop@node1 mysql-connector-java-5.1.37]$ cp mysql-connector-java-5.1.37-bin.jar $SQOOP_HOME/lib
Only the mysql-connector-java-5.1.37-bin.jar file from the connector package is needed.
-- Configuration files:
[hadoop@node1 conf]$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.1/
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.1/
export HBASE_HOME=/home/hadoop/hbase-1.1.2
export HIVE_HOME=/home/hadoop/apache-hive-1.2.1-bin
export ZOOCFGDIR=/home/hadoop/zookeeper-3.4.6/conf
[hadoop@node1 conf]$ vi sqoop-site.xml
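The post does not show what was changed in sqoop-site.xml; one property that is often uncommented there (and is relevant to the saved-job examples further down) is sqoop.metastore.client.record.password, which lets sqoop job --exec reuse the stored password instead of prompting for it:
<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
  <description>Allow saved jobs to store the database password in the metastore.</description>
</property>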
sqoop list-databases --connect jdbc:mysql://node1 --username root --password 123456
-- Test:
[hadoop@node1 hadoop]$ sudo service mysqld start
Starting mysqld: [ OK ]
[hadoop@node1 hadoop]$ mysql -uroot -p123456
mysql> use hive;
mysql> create table a(id int,name varchar(50));
mysql> insert into a values(1,'a1');
mysql> insert into a values(2,'a2');
mysql> commit;
mysql> select * from a;
+------+------+
| id   | name |
+------+------+
|    1 | a1   |
|    2 | a2   |
+------+------+
[hadoop@node1 lib]$ sqoop create-hive-table --connect jdbc:mysql://node1/hive --username root --password 123456 --table a --hive-table a --fields-terminated-by ',' --hive-overwrite
[hadoop@node1 conf]$ sqoop list-tables --connect jdbc:mysql://node1/hive --username root --password 123456
a
Import from MySQL into Hive; the output uses the same field format as defined above:
[hadoop@node1 lib]$ sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a --hive-table a --hive-import --fields-terminated-by ',' --hive-overwrite -m 1
mysql> create table b(id int,name varchar(50)); -- create the target table first
Query OK, 0 rows affected (0.13 sec)
-- Export:
mysql> select * from b;
+------+------+
| id   | name |
+------+------+
|    1 | a1   |
|    2 | a2   |
+------+------+
2 rows in set (0.15 sec)
-- Export the HDFS directory for table a to the MySQL table b
[hadoop@node1 lib]$ sqoop export --connect jdbc:mysql://node1/hive --username root --password 123456 --table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ','
-- sqoop eval connects to MySQL and runs SELECT and DML statements directly
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 --query 'select * from a'
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 -e 'select * from a'
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 -e "insert into a values (4,'a4')"
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 --query "insert into a values (5,'a5')"
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 -e "select * from a"
sqoop job --create myjob -- import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_old
sqoop job --list
sqoop job --show myjob
sqoop job --exec myjob
sqoop job --exec myjob -- --username root -P
sqoop job --delete myjob
-- sqoop codegen generates Java code
[hadoop@node1 ~]$ sqoop codegen --connect jdbc:mysql://node1/hive --username root --password 123456 --table a
…
15/11/25 00:25:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/0fc68731200a4f397cac20ef4a4c718f/a.jar
[hadoop@node1 ~]$ ll /tmp/sqoop-hadoop/compile/0fc68731200a4f397cac20ef4a4c718f/
total 28
-rw-rw-r--. 1 hadoop hadoop  8715 Nov 25 00:25 a.class
-rw-rw-r--. 1 hadoop hadoop  3618 Nov 25 00:25 a.jar
-rw-rw-r--. 1 hadoop hadoop 10346 Nov 25 00:25 a.java
-- Incremental import of MySQL data into Hive:
Incremental import arguments:
   --check-column <column>        Source column to check for incremental
                                  change
   --incremental <import-type>    Define an incremental import of type
                                  'append' or 'lastmodified'
   --last-value <value>           Last imported value in the incremental
                                  check column
-- Append mode is not supported for Hive imports; trying it fails with:
Append mode for hive imports is not yet supported. Please remove the parameter --append-mode
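Append mode does work for plain HDFS imports, though; a minimal sketch using the integer key id as the check column (the target directory and last-value below are illustrative, not from the original post):
sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_append --incremental append --check-column id --last-value 3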
-- 1. Create the table in MySQL
drop table a;
create table a(id int,name varchar(50),crt_date timestamp);
insert into a values(1,'a1',sysdate());
insert into a values(2,'a2',sysdate());
insert into a values(3,'a3',sysdate());
select * from a;
mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2015-11-25 12:41:39 |
|    2 | a2   | 2015-11-25 12:41:39 |
|    3 | a3   | 2015-11-25 12:41:39 |
+------+------+---------------------+
-- 2. First import from MySQL into /test/a_1 (do not create a_1 beforehand)
sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_1
-- 3. Insert more rows
mysql> insert into a values(4,'a4',sysdate());
mysql> insert into a values(5,'a5',sysdate());
mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2015-11-25 12:41:39 |
|    2 | a2   | 2015-11-25 12:41:39 |
|    3 | a3   | 2015-11-25 12:41:39 |
|    4 | a4   | 2015-11-25 13:46:42 |
|    5 | a5   | 2015-11-25 13:46:42 |
+------+------+---------------------+
-- 4. Second (incremental) import
sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_2 --incremental lastmodified --check-column crt_date --last-value "2015-11-25 12:41:40"
-- This adds WHERE crt_date >= "2015-11-25 12:41:40"; the --last-value must be slightly later than the crt_date of the first 3 rows, otherwise those 3 rows would be imported again.
[hadoop@node1 ~]$ hadoop fs -cat /test/a_1/*
1,a1,2015-11-25 12:41:39.0
2,a2,2015-11-25 12:41:39.0
3,a3,2015-11-25 12:41:39.0
[hadoop@node1 ~]$ hadoop fs -cat /test/a_2/*
4,a4,2015-11-25 13:46:42.0
5,a5,2015-11-25 13:46:42.0
-- 5. Generate a.jar
sqoop codegen --connect jdbc:mysql://node1/hive --username root --password 123456 --table a
/tmp/sqoop-hadoop/compile/6e3034f9fa9b0b46716ff31aee94c2e4/a.jar
[hadoop@node1 ~]$ ll /tmp/sqoop-hadoop/compile/6e3034f9fa9b0b46716ff31aee94c2e4/
-rw-rw-r--. 1 hadoop hadoop 10321 Nov 25 14:31 a.class
-rw-rw-r--. 1 hadoop hadoop  4201 Nov 25 14:31 a.jar
-rw-rw-r--. 1 hadoop hadoop 12969 Nov 25 14:31 a.java
-- 6. Merge the two directories; do not create a_merge beforehand; --class-name a is the table name here
sqoop merge --new-data /test/a_2 --onto /test/a_1 --target-dir /test/a_merge --jar-file /tmp/sqoop-hadoop/compile/6e3034f9fa9b0b46716ff31aee94c2e4/a.jar --class-name a --merge-key id
[hadoop@node1 ~]$ hadoop fs -ls /test/a_merge
-rw-r--r--  3 hadoop supergroup    0 2015-11-25 15:57 /test/a_merge/_SUCCESS
-rw-r--r--  3 hadoop supergroup  135 2015-11-25 15:57 /test/a_merge/part-r-00000   -- this file is moved away after the Hive load below
[hadoop@node1 6e3034f9fa9b0b46716ff31aee94c2e4]$ hadoop fs -cat /test/a_merge/part*
1,a1,2015-11-25 12:41:39.0
2,a2,2015-11-25 12:41:39.0
3,a3,2015-11-25 12:41:39.0
4,a4,2015-11-25 13:46:42.0
5,a5,2015-11-25 13:46:42.0
-- 7. Load into Hive
hive> create table a(id int,name string,crt_date string) row format delimited fields terminated by ',';
hive> load data inpath '/test/a_merge/part*' into table a;
hive> show create table a;
OK
CREATE TABLE `a`(
  `id` int,
  `name` string,
  `crt_date` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://odscluster/user/hive/warehouse/a'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'totalSize'='135',
  'transient_lastDdlTime'='1448437545')
Time taken: 0.485 seconds, Fetched: 17 row(s)
-- 8. Check the data files; the load moves them from the HDFS directory into the Hive warehouse
[hadoop@node1 ~]$ hadoop fs -ls /test/a_merge
-rw-r--r--  3 hadoop supergroup    0 2015-11-25 15:57 /test/a_merge/_SUCCESS
[hadoop@node1 ~]$ hadoop fs -ls /user/hive/warehouse/a
-rwxr-xr-x 3 hadoop supergroup 135 2015-11-25 15:57 /user/hive/warehouse/a/part-r-00000
hive> select * from a;
OK
1 a1 2015-11-25 12:41:39.0
2 a2 2015-11-25 12:41:39.0
3 a3 2015-11-25 12:41:39.0
4 a4 2015-11-25 13:46:42.0
5 a5 2015-11-25 13:46:42.0