Sqoop issues

https://blog.csdn.net/gdmzlhj1/article/details/50483171

Installing and using Sqoop 1.4.6 or 1.4.7 against Hadoop 2.7.1

Importing data from MySQL into Hive fails with:

ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf

This happens because Sqoop needs a Hive jar on its classpath: copy hive-common-2.3.3.jar from hive/lib into Sqoop's lib directory.
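A minimal sketch of the copy (the exact jar version depends on your Hive installation; adjust the wildcard or name as needed):

cp $HIVE_HOME/lib/hive-common-*.jar $SQOOP_HOME/lib/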

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosTicket(Ljavax/security/auth/Subject;)Z

This means hadoop-common-2.7.1.jar is missing from Sqoop's lib; make sure the Hadoop and Sqoop versions match.
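A quick sanity check that the jar in Sqoop's lib matches the running Hadoop version (verification sketch, assuming the standard layout):

ls $SQOOP_HOME/lib | grep hadoop-common
hadoop version | head -1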

Installing Sqoop 1.4.6

Sqoop 2 is not recommended for production use, so Sqoop 1 is tested here.

--Download sqoop-1.4.6
http://mirror.bit.edu.cn/apache/sqoop/1.4.6/
[ ] sqoop-1.4.6.bin__hadoop-0.23.tar.gz 08-May-2015 16:28 16M

[ ] sqoop-1.4.6.bin__hadoop-1.0.0.tar.gz 08-May-2015 16:28 16M

[ ] sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz 08-May-2015 16:28 16M

[ ] sqoop-1.4.6.tar.gz 08-May-2015 16:28 2.1M

Choose the package matching your Hadoop version; mine is hadoop-2.7.1.
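For example, fetching the matching tarball from the mirror listed above (illustrative; any mirror works):

wget http://mirror.bit.edu.cn/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz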

[hadoop@node1 ~]$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

--Set environment variables
[hadoop@node1 bin]$ sudo vi /etc/profile
export JAVA_HOME=/home/hadoop/jdk1.7.0_67
export HADOOP_HOME=/home/hadoop/hadoop-2.7.1
export ZK_HOME=/home/hadoop/zookeeper-3.4.6
export HIVE_HOME=/home/hadoop/apache-hive-1.2.1-bin
export HBASE_HOME=/home/hadoop/hbase-1.1.2
export SQOOP_HOME=/home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZK_HOME}/bin:${HIVE_HOME}/bin:${HBASE_HOME}/bin:${SQOOP_HOME}/bin
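After editing /etc/profile, reload it and confirm that sqoop is on the PATH (a quick check, not part of the original session):

source /etc/profile
which sqoop    # should resolve to .../sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/sqoop
echo $SQOOP_HOME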

[hadoop@node1 bin]$ sqoop help
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/hadoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
15/11/24 13:44:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]

--To silence these warnings, comment out the corresponding blocks in sqoop/bin/configure-sqoop (search for the "fail" warning text); see the grep sketch after the command list below.

Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information

See 'sqoop help COMMAND' for information on a specific command.
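To locate the blocks to comment out in bin/configure-sqoop, a grep like the following works (sketch only; line numbers vary by version):

grep -n "does not exist" $SQOOP_HOME/bin/configure-sqoop
grep -n "HCAT_HOME\|ACCUMULO_HOME" $SQOOP_HOME/bin/configure-sqoop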

--Set up the configuration files
[hadoop@node1 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@node1 conf]$ vi sqoop-env.sh
[hadoop@node1 conf]$ vi sqoop-site.xml

--Copy the required jars
[hadoop@node1 ~]$ cp $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.1.jar $SQOOP_HOME/lib
[hadoop@node1 mysql-connector-java-5.1.37]$ cp mysql-connector-java-5.1.37-bin.jar $SQOOP_HOME/lib

Only mysql-connector-java-5.1.37-bin.jar (the -bin jar) works here.
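A quick check that the required jars are now in place (verification sketch):

ls $SQOOP_HOME/lib | grep -E "hadoop-common|hive-common|mysql-connector"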


--Configuration files for reference:

[hadoop@node1 conf]$ vi sqoop-env.sh
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.1/
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.1/
export HBASE_HOME=/home/hadoop/hbase-1.1.2
export HIVE_HOME=/home/hadoop/apache-hive-1.2.1-bin
export ZOOCFGDIR=/home/hadoop/zookeeper-3.4.6/conf

[hadoop@node1 conf]$ vi sqoop-site.xml
--Quick connectivity check against MySQL:
sqoop list-databases --connect jdbc:mysql://node1 --username root --password 123456

<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <!-- default value: jdbc:hsqldb:file:/tmp/sqoop-meta/meta.db;shutdown=true -->
  <value>jdbc:mysql://node1/hive?useUnicode=true&amp;characterEncoding=utf-8</value>
  <description>The connect string to use when connecting to a job-management metastore. If unspecified, uses ~/.sqoop/. You can specify a different path here.</description>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.username</name>
  <value>root</value>
  <description>The username to bind to the metastore.</description>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.password</name>
  <value>123456</value>
  <description>The password to bind to the metastore.</description>
</property>

--Test:

[hadoop@node1 hadoop]$ sudo service mysqld start
Starting mysqld:                               [  OK  ]

[hadoop@node1 hadoop]$ mysql -uroot -p123456
mysql> use hive;
mysql> create table a(id int,name varchar(50));
mysql> insert into a values(1,'a1');
mysql> insert into a values(2,'a2');
mysql> commit;
mysql> select * from a;
+------+------+
| id   | name |
+------+------+
|    1 | a1   |
|    2 | a2   |
+------+------+

[hadoop@node1 lib]$ sqoop create-hive-table --connect jdbc:mysql://node1/hive --username root --password 123456 --table a --hive-table a --fields-terminated-by ',' --hive-overwrite
[hadoop@node1 conf]$ sqoop list-tables --connect jdbc:mysql://node1/hive --username root --password 123456
a
Import the MySQL data into Hive; the imported data has the format shown above.
[hadoop@node1 lib]$ sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a --hive-table a --hive-import --fields-terminated-by ',' --hive-overwrite -m 1
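Once the import finishes, the result can be checked from HDFS and from Hive (verification sketch; the warehouse path matches the one shown later in this walkthrough):

hadoop fs -ls /user/hive/warehouse/a
hive -e "select * from a;"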

mysql> create table b(id int,name varchar(50));    -- create the target table first
Query OK, 0 rows affected (0.13 sec)

--Export:
mysql> select * from b;
+------+------+
| id   | name |
+------+------+
|    1 | a1   |
|    2 | a2   |
+------+------+
2 rows in set (0.15 sec)

--Export the contents of directory a to MySQL table b
[hadoop@node1 lib]$ sqoop export --connect jdbc:mysql://node1/hive --username root --password 123456 --table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ','

--sqoop eval runs SELECT and DML statements directly against MySQL
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 --query 'select * from a'
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 -e 'select * from a'
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 -e "insert into a values (4,'a4')"
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 --query "insert into a values (5,'a5')"
[hadoop@node1 lib]$ sqoop eval --connect jdbc:mysql://node1/hive --username root --password 123456 -e "select * from a"

sqoop job --create myjob -- import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_old
sqoop job --list
sqoop job --show myjob
sqoop job --exec myjob
sqoop job --exec myjob -- --username root -P
sqoop job --delete myjob

--sqoop codegen generates Java code for the table
[hadoop@node1 ~]$ sqoop codegen --connect jdbc:mysql://node1/hive --username root --password 123456 --table a

15/11/25 00:25:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/0fc68731200a4f397cac20ef4a4c718f/a.jar

[hadoop@node1 ~]$ ll /tmp/sqoop-hadoop/compile/0fc68731200a4f397cac20ef4a4c718f/
total 28
-rw-rw-r--. 1 hadoop hadoop  8715 Nov 25 00:25 a.class
-rw-rw-r--. 1 hadoop hadoop  3618 Nov 25 00:25 a.jar
-rw-rw-r--. 1 hadoop hadoop 10346 Nov 25 00:25 a.java


--Incremental import of MySQL data into Hive:

Incremental import arguments:   -- incremental import
--check-column <column>        Source column to check for incremental
                               change
--incremental <import-type>    Define an incremental import of type
                               'append' or 'lastmodified'
--last-value <value>           Last imported value in the incremental
                               check column

--append mode is not supported for Hive imports:
Append mode for hive imports is not yet supported. Please remove the parameter --append-mode
The workaround used below is a lastmodified import into a plain HDFS directory, followed by sqoop merge and a Hive load.

--1. Create the table in MySQL
drop table a;
create table a(id int,name varchar(50),crt_date timestamp);
insert into a values(1,'a1',sysdate());
insert into a values(2,'a2',sysdate());
insert into a values(3,'a3',sysdate());
select * from a;
mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2015-11-25 12:41:39 |
|    2 | a2   | 2015-11-25 12:41:39 |
|    3 | a3   | 2015-11-25 12:41:39 |
+------+------+---------------------+

--2. First full import from MySQL into /test/a_1 (do not pre-create a_1)
sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_1
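A quick look at what the first import wrote (verification sketch; it should contain the three rows inserted in step 1):

hadoop fs -cat /test/a_1/*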

--3. Insert more data
mysql> insert into a values(4,'a4',sysdate());
mysql> insert into a values(5,'a5',sysdate());
mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2015-11-25 12:41:39 |
|    2 | a2   | 2015-11-25 12:41:39 |
|    3 | a3   | 2015-11-25 12:41:39 |
|    4 | a4   | 2015-11-25 13:46:42 |
|    5 | a5   | 2015-11-25 13:46:42 |
+------+------+---------------------+

--4. Second (incremental) import
sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 --table a -m 1 --target-dir /test/a_2 --incremental lastmodified --check-column crt_date --last-value "2015-11-25 12:41:40"

--This is equivalent to where crt_date >= "2015-11-25 12:41:40"; the value must be slightly later than the crt_date of row id=3, otherwise the first 3 rows would be imported again.

[hadoop@node1 ~]$ hadoop fs -cat /test/a_1/*
1,a1,2015-11-25 12:41:39.0
2,a2,2015-11-25 12:41:39.0
3,a3,2015-11-25 12:41:39.0

[hadoop@node1 ~]$ hadoop fs -cat /test/a_2/*
4,a4,2015-11-25 13:46:42.0
5,a5,2015-11-25 13:46:42.0

--5. Generate a.jar with codegen
sqoop codegen --connect jdbc:mysql://node1/hive --username root --password 123456 --table a
/tmp/sqoop-hadoop/compile/6e3034f9fa9b0b46716ff31aee94c2e4/a.jar

[hadoop@node1 ~]$ ll /tmp/sqoop-hadoop/compile/6e3034f9fa9b0b46716ff31aee94c2e4/
-rw-rw-r--. 1 hadoop hadoop 10321 Nov 25 14:31 a.class
-rw-rw-r--. 1 hadoop hadoop  4201 Nov 25 14:31 a.jar
-rw-rw-r--. 1 hadoop hadoop 12969 Nov 25 14:31 a.java

--6. Merge the two directories; do not pre-create a_merge; --class-name a (this is the table name)
sqoop merge --new-data /test/a_2 --onto /test/a_1 --target-dir /test/a_merge --jar-file /tmp/sqoop-hadoop/compile/6e3034f9fa9b0b46716ff31aee94c2e4/a.jar --class-name a --merge-key id

[hadoop@node1 ~]$ hadoop fs -ls /test/a_merge
-rw-r--r--   3 hadoop supergroup          0 2015-11-25 15:57 /test/a_merge/_SUCCESS
-rw-r--r--   3 hadoop supergroup        135 2015-11-25 15:57 /test/a_merge/part-r-00000    -- this file is moved away after the Hive load below
[hadoop@node1 6e3034f9fa9b0b46716ff31aee94c2e4]$ hadoop fs -cat /test/a_merge/part*
1,a1,2015-11-25 12:41:39.0
2,a2,2015-11-25 12:41:39.0
3,a3,2015-11-25 12:41:39.0
4,a4,2015-11-25 13:46:42.0
5,a5,2015-11-25 13:46:42.0

--7. Load into Hive
hive> create table a(id int,name string,crt_date string) row format delimited fields terminated by ',';
hive> load data inpath '/test/a_merge/part*' into table a;
hive> show create table a;
OK
CREATE TABLE a(
id int,
name string,
crt_date string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://odscluster/user/hive/warehouse/a'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'totalSize'='135',
'transient_lastDdlTime'='1448437545')
Time taken: 0.485 seconds, Fetched: 17 row(s)

--8. Check the data files; the load moves them from the HDFS staging directory into the Hive warehouse
[hadoop@node1 ~]$ hadoop fs -ls /test/a_merge
-rw-r--r--   3 hadoop supergroup          0 2015-11-25 15:57 /test/a_merge/_SUCCESS

[hadoop@node1 ~]$ hadoop fs -ls /user/hive/warehouse/a
-rwxr-xr-x 3 hadoop supergroup 135 2015-11-25 15:57 /user/hive/warehouse/a/part-r-00000

hive> select * from a;
OK
1 a1 2015-11-25 12:41:39.0
2 a2 2015-11-25 12:41:39.0
3 a3 2015-11-25 12:41:39.0
4 a4 2015-11-25 13:46:42.0
5 a5 2015-11-25 13:46:42.0
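
Putting the steps above together, the following is a minimal sketch of a repeatable incremental refresh. The wrapper script itself, the hard-coded last-value, and the reuse of the codegen jar path are assumptions for illustration; rerunning it as-is would fail because the target directories already exist.

#!/bin/bash
# Sketch only: combines the import / merge / load steps shown above.
LAST_VALUE="2015-11-25 12:41:40"   # last crt_date that has already been imported
JAR=/tmp/sqoop-hadoop/compile/6e3034f9fa9b0b46716ff31aee94c2e4/a.jar   # from step 5

# pull rows whose crt_date is newer than LAST_VALUE into a delta directory
sqoop import --connect jdbc:mysql://node1/hive --username root --password 123456 \
  --table a -m 1 --target-dir /test/a_2 \
  --incremental lastmodified --check-column crt_date --last-value "$LAST_VALUE"

# merge the delta onto the base copy, deduplicating on the id column
sqoop merge --new-data /test/a_2 --onto /test/a_1 --target-dir /test/a_merge \
  --jar-file "$JAR" --class-name a --merge-key id

# load the merged files into the Hive table
hive -e "load data inpath '/test/a_merge/part*' into table a;"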
