===================================Installing Sqoop====================================
1. Download and extract the Sqoop tarball.
2. Edit conf/sqoop-env.sh:
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/modules/hadoop-2.5.0-cdh5.3.6
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/modules/hadoop-2.5.0-cdh5.3.6
#Set the path to where bin/hive is available
export HIVE_HOME=/opt/modules/hive-0.13.1-cdh5.3.6
#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/opt/modules/zookeeper-3.4.5-cdh5.3.6/conf
3. Copy the MySQL JDBC driver jar into Sqoop's lib/ directory:
$ cp /home/beifeng/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar ./lib/
===================================Sqoop overview=========================================
sqoop.apache.org
1. RDBMS --> HDFS
2. HDFS --> RDBMS (MySQL)
import: copies RDBMS data into HDFS/Hive
export: copies HDFS/Hive data into an RDBMS
===================================Using Sqoop=========================================
0. List databases (a quick connectivity check):
bin/sqoop list-databases \
--connect jdbc:mysql://master:3306 \
--username root \
--password LEle.516
1. Importing from MySQL to HDFS
* import
* by default, -m (the number of map tasks) is 4
bin/sqoop import \
--connect jdbc:mysql://master/hadoop \
--username root \
--password LEle.516 \
--table my_user \
--target-dir /user/hive/warehouse/hadoop/my_user \
--delete-target-dir \
-m 1 \
--fields-terminated-by "\t"
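The -m default of 4 matters because Sqoop parallelizes the import by slicing the table's split column (by default the primary key) into one contiguous range per map task. A rough Python sketch of that splitting, under the simplifying assumption of an evenly distributed integer key (illustrative only, not Sqoop's actual implementation):

```python
# Sketch: how a numeric split column is divided among -m map tasks.
# Sqoop first runs SELECT MIN(id), MAX(id) FROM my_user, then assigns
# each mapper a contiguous, non-overlapping sub-range of [min, max].

def split_ranges(lo, hi, num_mappers):
    """Divide [lo, hi] into num_mappers contiguous, non-overlapping ranges."""
    size = (hi - lo + 1) / num_mappers
    splits = []
    start = lo
    for i in range(num_mappers):
        # Last mapper absorbs any remainder so the full range is covered.
        end = hi if i == num_mappers - 1 else int(lo + size * (i + 1)) - 1
        splits.append((start, end))
        start = end + 1
    return splits

# With ids 1..8 and the default -m 4, each mapper reads two ids:
print(split_ranges(1, 8, 4))   # → [(1, 2), (3, 4), (5, 6), (7, 8)]
```

With -m 1, as in the commands above, there is a single range and a single output file.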
bin/sqoop import \
--connect jdbc:mysql://hadoop-senior.ibeifeng.com:3306/db_0625 \
--username root \
--password 123456 \
--table my_user \
--target-dir /user/hive/warehouse/db_0625/my_user \
--delete-target-dir \
-m 1 \
--fields-terminated-by "\t" \
--direct
# --direct requires the mysqldump client on the machine running the job; verify it is installed:
$ rpm -qf /usr/bin/mysqldump
MySQL-client-5.6.24-1.el6.x86_64
2. Incremental import
--check-column <column>       Source column to check for incremental change
--incremental <import-type>   Define an incremental import of type 'append' or 'lastmodified'
--last-value <value>          Last imported value in the incremental check column
bin/sqoop import \
--connect jdbc:mysql://hadoop-senior.ibeifeng.com:3306/db_0625 \
--username root \
--password 123456 \
--table my_user \
--target-dir /user/hive/warehouse/db_0625/my_user \
-m 1 \
--fields-terminated-by "\t" \
--direct \
--check-column id \
--incremental append \
--last-value 4
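Conceptually, append mode keeps only rows whose --check-column value is strictly greater than --last-value, and at the end of the job reports the new last-value to use on the next run. A minimal sketch of that bookkeeping (illustrative only, not Sqoop code):

```python
# Sketch: the bookkeeping behind --incremental append. Rows with
# check_column <= last_value were already imported and are skipped.

def incremental_append(rows, check_column, last_value):
    new_rows = [r for r in rows if r[check_column] > last_value]
    # The new last-value is the max check-column value seen this run.
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last

rows = [{"id": 3}, {"id": 4}, {"id": 5}, {"id": 6}]
imported, last = incremental_append(rows, "id", 4)
print(imported)  # → [{'id': 5}, {'id': 6}]
print(last)      # → 6
```

Note that the command above omits --delete-target-dir: an incremental import appends new files to the existing target directory rather than replacing it.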
3. Importing selected rows or columns
bin/sqoop import \
--connect jdbc:mysql://hadoop-senior.ibeifeng.com:3306/db_0625 \
--username root \
--password 123456 \
--table my_user \
--target-dir /user/hive/warehouse/db_0625/my_user \
-m 1 \
--delete-target-dir \
--fields-terminated-by "\t" \
--direct \
--where "id>5"
or
--columns id,account
or (--query replaces --table, and the literal $CONDITIONS token is mandatory)
--query 'select * from db_0625.my_user where $CONDITIONS'
--query 'select * from db_0625.my_user where id>5 and $CONDITIONS'
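The $CONDITIONS placeholder exists so that Sqoop can splice a different range predicate into the free-form query for each map task; even with -m 1 the token is still required. A sketch of that substitution, with hypothetical splits:

```python
# Sketch: per-mapper query generation for a free-form --query import.
# Sqoop replaces $CONDITIONS with a range predicate on the split column
# so each map task reads a disjoint slice of the result set.

QUERY = "select * from db_0625.my_user where id>5 and $CONDITIONS"

def per_mapper_queries(query, split_col, splits):
    return [
        query.replace("$CONDITIONS",
                      "%s >= %d AND %s <= %d" % (split_col, lo, split_col, hi))
        for lo, hi in splits
    ]

for q in per_mapper_queries(QUERY, "id", [(6, 10), (11, 15)]):
    print(q)
```

Because the query is wrapped in single quotes, the shell does not expand $CONDITIONS; with double quotes you would have to escape it as \$CONDITIONS.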
4. Exporting HDFS/Hive data to an RDBMS
Hive analysis results --land in--> RDBMS (MySQL) --> reports (ECharts, Excel, ...)
* Sqoop: suitable for both small and large data volumes
* plain JDBC: small data volumes only
$ bin/sqoop export --help
(1) create the target table in MySQL first
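Step (1) might look like the following; the id/account/passwd layout here is a hypothetical schema inferred from the Hive table in section 5, so adjust the column names and types to your actual data:

```sql
-- Hypothetical schema; columns must match the exported HDFS records.
CREATE TABLE my_user_2 (
  id      INT PRIMARY KEY,
  account VARCHAR(64),
  passwd  VARCHAR(64)
);
```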
(2) run the export:
bin/sqoop export \
--connect jdbc:mysql://master:3306/hadoop \
--username root \
--password LEle.516 \
--table my_user_2 \
--export-dir /user/hive/warehouse/hadoop/my_user \
-m 1 \
--input-fields-terminated-by "\t"
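On the export side, --input-fields-terminated-by tells Sqoop how to split each HDFS line back into column values before generating the INSERT statements. A trivially small sketch of that parsing:

```python
# Sketch: turning one tab-delimited HDFS line back into column values,
# as the export job does before inserting the row into my_user_2.

def parse_export_line(line, delimiter="\t"):
    return line.rstrip("\n").split(delimiter)

print(parse_export_line("1\talice\tsecret\n"))  # → ['1', 'alice', 'secret']
```

If the delimiter passed here does not match the one used when the data was written, the export fails or loads garbage, which is why both commands in these notes use "\t" consistently.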
HDFS --> MySQL
Hive --> MySQL
5. Importing from MySQL into Hive
(1) create the table in Hive first:
create table mysql2hive(
id int,
account string,
passwd string
)
row format delimited fields terminated by "\t" ;
(2) run the import:
bin/sqoop import \
--connect jdbc:mysql://hadoop-senior.ibeifeng.com:3306/db_0625 \
--username root \
--password 123456 \
--table my_user \
-m 1 \
--delete-target-dir \
--hive-database db_mysql2hive \
--hive-import \
--hive-table mysql2hive \
--fields-terminated-by "\t"
Summary:
HDFS/Hive --> MySQL   export
MySQL --> HDFS        import
MySQL --> Hive        import
6. Ways to run Sqoop
* directly on the command line (optionally wrapped in a shell script)
* --options-file
bin/sqoop --options-file <file>   (the common approach)
Create mysql2hive.opt (one option or one value per line; lines starting with # are comments):
import
--connect
jdbc:mysql://hadoop-senior.ibeifeng.com:3306/db_0625
--username
root
--password
123456
--delete-target-dir
--table
my_user
-m
1
--hive-database
db_mysql2hive
--hive-import
--hive-table
mysql2hive
--fields-terminated-by
"\t"
Run:
bin/sqoop --options-file mysql2hive.opt