Common Sqoop Operations

First, make sure HDFS and HiveServer2 are running normally. The cluster runs on three hosts: debugo01, debugo02, and debugo03.

1. Prepare the MySQL data

Create a test database in MySQL on debugo03 and build a test table, employee_salary.

mysql -uroot -p
mysql> create database test_sqoop;
Query OK, 1 row affected (0.00 sec)
mysql> use test_sqoop;
SET FOREIGN_KEY_CHECKS=0;
DROP TABLE IF EXISTS `employee_salary`;
CREATE TABLE `employee_salary` (
  `name` text,
  `id` int(8) NOT NULL AUTO_INCREMENT,
  `salary` int(8) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=latin1;
INSERT INTO `employee_salary` VALUES ('zhangsan', '1', '5000');
INSERT INTO `employee_salary` VALUES ('lisi', '2', '5500');
commit;
 
CREATE USER 'test'@'%' IDENTIFIED BY 'test';
GRANT ALL PRIVILEGES ON test_sqoop.* TO 'test'@'%';

2. Install Sqoop

yum install sqoop
cp /usr/share/java/mysql-connector-java.jar /usr/lib/sqoop/lib

3. Common commands

(1) sqoop help

[root@hadoop01 ~]# sqoop help
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

(2) List all databases (useful for testing the connection)
This is generally used to test the connection; the result contains only the databases that this MySQL user has privileges on.

sqoop list-databases --connect jdbc:mysql://debugo03 --username test --password test
information_schema
test_sqoop

(3) List all tables

sqoop list-tables --connect jdbc:mysql://debugo03/test_sqoop --username test --password test
employee_salary

(4) Import a MySQL table into HDFS
Here -D mapred.job.queue.name=lesson specifies the YARN queue the job runs in,
and --m 1 sets the number of map tasks to 1.

sqoop import -D mapred.job.queue.name=lesson --connect jdbc:mysql://debugo03/test_sqoop --username test --password test --table employee_salary --m 1 --target-dir /user/sqoop

Create a test directory on HDFS

su - hdfs
hdfs dfs -mkdir /user/sqoop
hdfs dfs -chown sqoop:hadoop /user/sqoop

Run the import

su - sqoop -s /bin/sh
sqoop import -D mapred.job.queue.name=lesson --connect jdbc:mysql://debugo03/test_sqoop --username test --password test --table employee_salary --m 1 --target-dir /user/sqoop/employee_salary

If the following error appears, update the /usr/lib/sqoop/lib/mysql-connector-java.jar file. Issue: https://issues.apache.org/jira/browse/SQOOP-1400

ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@2c176ab7 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.

Check the data on HDFS; it matches the employee_salary table in the MySQL database.
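With --m 1, Sqoop writes a single text file named part-m-00000 under the target directory, one comma-separated line per row by default; it can be inspected with hdfs dfs -cat. The sketch below reproduces the expected file contents locally for illustration (values taken from the employee_salary table above, not from an actual HDFS read):

```shell
# Reproduce the expected part-m-00000 contents locally for illustration;
# Sqoop's default text format is one comma-separated line per row.
printf 'zhangsan,1,5000\nlisi,2,5500\n' > /tmp/part-m-00000
cat /tmp/part-m-00000
```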

(5) Import all data from a MySQL table into Hive
Create the corresponding database and table in Hive

0: jdbc:hive2://debugo02:10000> create database test_sqoop;
No rows affected (0.147 seconds)
0: jdbc:hive2://debugo02:10000> show databases;            
+----------------+--+
| database_name  |
+----------------+--+
| default        |
| sales          |
| test_sqoop     |
+----------------+--+
3 rows selected (0.046 seconds)

Create the table with sqoop

sqoop create-hive-table -D mapred.job.queue.name=work --connect jdbc:mysql://debugo03/test_sqoop --username test --password test --table employee_salary --hive-table test_sqoop.employee_salary
OK
Time taken: 1.515 seconds

Import the data from MySQL into Hive with sqoop
Make sure that every node in the cluster can log in to MySQL.

sqoop import -D mapred.job.queue.name=lesson --connect jdbc:mysql://debugo03/test_sqoop --username test --password test --table employee_salary --hive-import  --hive-table test_sqoop.employee_salary
OK
Time taken: 0.698 seconds

Check the result: success!

(6) Export data from Hive into MySQL
First, delete the data in the MySQL employee_salary table

mysql> use test_sqoop;
Database changed
mysql> show tables;
+----------------------+
| Tables_in_test_sqoop |
+----------------------+
| employee_salary      |
+----------------------+
1 row in set (0.00 sec)
mysql> truncate employee_salary;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from employee_salary;
Empty set (0.00 sec)

Export the data

sqoop export -D mapred.job.queue.name=work --connect jdbc:mysql://debugo03/test_sqoop --username test --password test --table employee_salary --export-dir /user/hive/warehouse/test_sqoop.db/employee_salary/ --input-fields-terminated-by '\0001'
......
15/04/01 19:40:55 INFO mapreduce.ExportJobBase: Exported 2 records

Check the result; the records have been successfully exported to MySQL.
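The --input-fields-terminated-by '\0001' flag is needed because Hive's default text format separates fields with the non-printing Ctrl-A character (octal 001), not commas, so Sqoop must be told how to split the warehouse files it reads. A quick local sketch of such a row (values borrowed from the table above):

```shell
# Hive's default field delimiter is Ctrl-A (octal \001);
# translate it to commas just to make the row visible.
printf 'zhangsan\0011\0015000\n' > /tmp/hive_row.txt
tr '\001' ',' < /tmp/hive_row.txt
```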

(7) Incrementally import data from a MySQL table into Hive
Insert a new row (wangwu) into the MySQL table as the incremental data

mysql> insert into employee_salary values('wangwu',3,6000);
Query OK, 1 row affected (0.00 sec)
mysql> commit;
Query OK, 0 rows affected (0.00 sec)
 
mysql> select * from employee_salary;
+----------+----+--------+
| name     | id | salary |
+----------+----+--------+
| lisi     |  2 |   5500 |
| zhangsan |  1 |   5000 |
| wangwu   |  3 |   6000 |
+----------+----+--------+
3 rows in set (0.00 sec)

Run the incremental import

sqoop import -D mapred.job.queue.name=lesson --connect jdbc:mysql://debugo03/test_sqoop --username test --password test --table employee_salary --hive-import --hive-table test_sqoop.employee_salary --check-column id --incremental append --last-value 2
OK
Time taken: 0.77 seconds
Loading data to table test_sqoop.employee_salary
Table test_sqoop.employee_salary stats: [numFiles=3, numRows=0, totalSize=42, rawDataSize=0]
OK
Time taken: 0.625 seconds
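With --check-column id --incremental append --last-value 2, Sqoop only pulls rows whose id is greater than 2, which is why just the wangwu row is appended. The selection amounts to a WHERE id > 2 filter, sketched locally here (comma-separated layout assumed purely for illustration):

```shell
# Simulate the incremental filter: keep only rows whose
# check-column (field 2, id) exceeds the last-value of 2.
printf 'zhangsan,1,5000\nlisi,2,5500\nwangwu,3,6000\n' > /tmp/emp.csv
awk -F, '$2 > 2' /tmp/emp.csv
```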

Check the result

select * from test_sqoop.employee_salary;

Reposted from: https://my.oschina.net/boltwu/blog/719798
