Sqoop Deployment and Operations

Sqoop Introduction

Sqoop imports and exports data between an RDBMS and Hadoop. Under the hood it runs MapReduce jobs that consist of map tasks only, with no reduce phase.

Deployment

  1. Download the CDH release: wget http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.16.2.tar.gz
  2. Extract: tar -zxvf sqoop-1.4.6-cdh5.16.2.tar.gz -C ~/app/
  3. Configure environment variables
    vi .bashrc
    export SQOOP_HOME=/home/hadoop/app/sqoop-1.4.6-cdh5.16.2
    export PATH=$SQOOP_HOME/bin:$PATH
    source .bashrc
    
  4. Configure sqoop-env.sh
    cp sqoop-env-template.sh sqoop-env.sh
    export HADOOP_COMMON_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2
    export HADOOP_MAPRED_HOME=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2
    export HIVE_HOME=/home/hadoop/app/hive-1.1.0-cdh5.16.2
    
  5. Copy the driver JARs
    # JSON & MySQL JDBC JARs
    cp mysql-connector-java-5.1.27-bin.jar $SQOOP_HOME/lib
    cp java-json.jar $SQOOP_HOME/lib
    # Hive JARs
    cp hive-common-1.1.0-cdh5.16.2.jar $SQOOP_HOME/lib
    cp hive-shims* $SQOOP_HOME/lib
    # Hive configuration file
    cp hive-site.xml $SQOOP_HOME/conf
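
  6. Verify the installation (a quick optional check; warnings about missing HBase/HCatalog/Accumulo homes can be ignored if those components are not used)
    sqoop version
    sqoop help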
    

Basic Commands

  1. List databases and tables
    sqoop list-databases \
    --connect jdbc:mysql://hadoop02:3306 \
    --password abc123 \
    --username root
    
    sqoop list-tables \
    --connect jdbc:mysql://hadoop02:3306/feidata \
    --password abc123 \
    --username root
    

Importing from MySQL to HDFS

  1. Import a whole table into HDFS with sqoop import

    sqoop import \
    --connect jdbc:mysql://hadoop01:3306/feidata \
    --password abc123 \
    --username root \
    --delete-target-dir \
    --table salgrade \
    --split-by 'GRADE'
    [hadoop@hadoop01 ~]$ hdfs dfs -ls salgrade
    19/12/27 14:49:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 5 items
    -rw-r--r--   1 hadoop supergroup          0 2019-12-27 14:49 salgrade/_SUCCESS
    -rw-r--r--   1 hadoop supergroup         11 2019-12-27 14:49 salgrade/part-m-00000
    -rw-r--r--   1 hadoop supergroup         12 2019-12-27 14:49 salgrade/part-m-00001
    -rw-r--r--   1 hadoop supergroup         12 2019-12-27 14:49 salgrade/part-m-00002
    -rw-r--r--   1 hadoop supergroup         24 2019-12-27 14:49 salgrade/part-m-00003
    [hadoop@hadoop01 ~]$ hdfs dfs -cat salgrade/part*
    19/12/27 14:50:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    1,700,1200
    2,1201,1400
    3,1401,2000
    4,2001,3000
    5,3001,9999
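
    With no -m given, the import above ran with the default of four map tasks, which is why there are four part-m files. Sqoop first issues a boundary query on the --split-by column and then hands each mapper one sub-range of it; a rough sketch of the generated SQL (the exact statements vary by version):
    -- boundary query for --split-by 'GRADE'
    SELECT MIN(GRADE), MAX(GRADE) FROM salgrade;
    -- each mapper then imports one range, e.g.
    SELECT * FROM salgrade WHERE GRADE >= 1 AND GRADE < 2;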
    

    Note: if the import fails with java.lang.NoClassDefFoundError: org/json/JSONObject, Sqoop is missing the java-json.jar; download it and place it in $SQOOP_HOME/lib (see step 5 of the deployment section).

  2. Import from MySQL into HDFS with selected columns and a field delimiter, writing NULL string columns as '' and NULL non-string columns as '0'

    sqoop import \
    --connect jdbc:mysql://hadoop01:3306/feidata \
    --password abc123 --username root \
    --table emp  \
    --delete-target-dir \
    --mapreduce-job-name FromMySQL2HDFS \
    --columns "EMPNO,ENAME,JOB,SAL,COMM" \
    --target-dir EMP_COLUMN_QUERY \
    --fields-terminated-by '\t' \
    --null-string '' \
    --null-non-string '0' \
    -m 1
    
    [hadoop@hadoop01 ~]$ hdfs dfs -cat EMP_COLUMN_QUERY/part*
    19/12/27 15:08:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    7369	SMITH	CLERK	800.00	0
    7499	ALLEN	SALESMAN	1600.00	300.00
    7521	WARD	SALESMAN	1250.00	500.00
    7566	JONES	MANAGER	2975.00	0
    7654	MARTIN	SALESMAN	1250.00	1400.00
    7698	BLAKE	MANAGER	2850.00	0
    7782	CLARK	MANAGER	2450.00	0
    7788	SCOTT	ANALYST	3000.00	0
    7839	KING	PRESIDENT	5000.00	0
    7844	TURNER	SALESMAN	1500.00	0.00
    7876	ADAMS	CLERK	1100.00	0
    7900	JAMES	CLERK	950.00	0
    7902	FORD	ANALYST	3000.00	0
    7934	MILLER	CLERK	1300.00	0
    

    Note: Sqoop logs a hint here, "This transfer can be faster! Use the --direct" option; --direct enables a MySQL-specific (mysqldump-based) fast path, sketched below.
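
    A sketch of the same kind of import with the MySQL direct path enabled (assumes mysqldump is installed on every node that runs map tasks; EMP_DIRECT is just an illustrative target directory):
    sqoop import \
    --connect jdbc:mysql://hadoop01:3306/feidata \
    --password abc123 --username root \
    --table emp \
    --delete-target-dir \
    --target-dir EMP_DIRECT \
    --direct \
    -m 1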

  3. Import from MySQL into HDFS using a free-form query

    sqoop import \
    --connect jdbc:mysql://hadoop01:3306/sqoop \
    --password abc123 --username root \
    --delete-target-dir \
    --mapreduce-job-name FromMySQL2HDFS \
    --target-dir JOIN \
    --fields-terminated-by '\t' \
    --null-string '' \
    --null-non-string '0' \
    --query 'select e.empno,e.ename,e.deptno,d.dname from emp e join dept d on e.deptno=d.deptno and $CONDITIONS' \
    -m 1
    [hadoop@hadoop01 ~]$ hdfs dfs -cat JOIN/part*
    19/12/27 15:19:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    7369	SMITH	20	RESEARCH
    7499	ALLEN	30	SALES
    7521	WARD	30	SALES
    7566	JONES	20	RESEARCH
    7654	MARTIN	30	SALES
    7698	BLAKE	30	SALES
    7782	CLARK	10	ACCOUNTING
    7788	SCOTT	20	RESEARCH
    7839	KING	10	ACCOUNTING
    7844	TURNER	30	SALES
    7876	ADAMS	20	RESEARCH
    7900	JAMES	30	SALES
    7902	FORD	20	RESEARCH
    7934	MILLER	10	ACCOUNTING
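
    When a free-form --query import runs with more than one mapper, $CONDITIONS is replaced on each mapper with a range predicate over an explicitly given --split-by column. A hedged sketch of the parallel variant (JOIN_PARALLEL is only an illustrative target directory):
    sqoop import \
    --connect jdbc:mysql://hadoop01:3306/sqoop \
    --password abc123 --username root \
    --delete-target-dir \
    --mapreduce-job-name FromMySQL2HDFS \
    --target-dir JOIN_PARALLEL \
    --fields-terminated-by '\t' \
    --query 'select e.empno,e.ename,e.deptno,d.dname from emp e join dept d on e.deptno=d.deptno where $CONDITIONS' \
    --split-by 'e.empno' \
    -m 2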
    
  4. Incremental append: import only the MySQL rows whose check column is greater than a given value and append them to the existing HDFS files

    sqoop import \
    --connect jdbc:mysql://hadoop01:3306/sqoop \
    --password abc123 --username root \
    --table emp  \
    --mapreduce-job-name FromMySQL2HDFS \
    --target-dir EMP_APPEND \
    --fields-terminated-by '\t' \
    --null-string '' \
    --incremental append \
    --check-column EMPNO \
    --last-value 7788 \
    --null-non-string '0' \
    -m 1
    

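    With --incremental append, only rows whose EMPNO is strictly greater than --last-value 7788 are imported, and the new part files are appended to the existing EMP_APPEND directory (note that --delete-target-dir is deliberately omitted here). At the end of the run Sqoop logs the new last value to pass to the next incremental run. A quick way to inspect the appended data:
    hdfs dfs -ls EMP_APPEND
    hdfs dfs -cat EMP_APPEND/part*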

  5. Execute a statement directly against MySQL (sqoop eval)

    sqoop eval \
    --connect jdbc:mysql://hadoop01:3306/feidata \
    --password mysqladminroot --username root \
    --query "insert into dept values (60,'RD', 'BEIJING')"
    

Exporting from HDFS to MySQL

  • Commands

    sqoop export \
    --connect jdbc:mysql://hadoop01:3306/feidata \
    --password mysqladminroot \
    --username root \
    --table emp_demo \
    --export-dir /user/hadoop/emp \
    -m 1
    
    Export specific columns (-Dsqoop.export.records.per.statement=10 batches 10 rows into each INSERT statement; keep comments out of the command, since text after the trailing backslash breaks the line continuation):
    sqoop export \
    -Dsqoop.export.records.per.statement=10 \
    --connect jdbc:mysql://hadoop01:3306/feidata \
    --password mysqladminroot \
    --username root \
    --table emp_demo \
    --export-dir /user/hadoop/EMP_COLUMN_QUERY \
    --columns "EMPNO,ENAME,JOB,SAL,COMM" \
    --fields-terminated-by '\t' \
    -m 1
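
    For larger exports, the multi-row INSERT size can be paired with a per-transaction statement count; the numbers below are illustrative only, not tuned values:
    sqoop export \
    -Dsqoop.export.records.per.statement=100 \
    -Dsqoop.export.statements.per.transaction=100 \
    --connect jdbc:mysql://hadoop01:3306/feidata \
    --password mysqladminroot \
    --username root \
    --table emp_demo \
    --export-dir /user/hadoop/emp \
    -m 1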
    

Importing from MySQL to Hive

  • Create the table in Hive, then import into a partition

    CREATE TABLE emp_import_partition(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
    )
    partitioned by (pt string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    
    sqoop import \
    --connect jdbc:mysql://hadoop01:3306/feidata \
    --password mysqladminroot \
    --username root \
    --table emp \
    --hive-overwrite \
    --delete-target-dir \
    --hive-import --hive-database fei_hive \
    --hive-table emp_import_partition \
    --hive-partition-key 'pt' \
    --hive-partition-value '2019-12-30' \
    --fields-terminated-by '\t' \
    -m 1
    

    Note: if you hit "ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.", copy hive-exec-*.jar from Hive's lib directory into Sqoop's lib directory to resolve it.
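
    After the import, the partition can be checked from the Hive CLI (a quick sanity check; the database and partition value match the command above):
    use fei_hive;
    show partitions emp_import_partition;
    select * from emp_import_partition where pt = '2019-12-30' limit 5;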

Exporting Hive Data to MySQL

  • Commands
    sqoop export \
    --connect jdbc:mysql://hadoop01:3306/feidata \
    --password mysqladminroot \
    --username root \
    --table dept_demo \
    --export-dir /user/hive/warehouse/fei_hive.db/dept \
    --fields-terminated-by '\t' \
    -m 1
    

Creating a Sqoop Job

  • Commands
    sqoop job --create feidata-sqoop-job -- \
    import --connect jdbc:mysql://hadoop01:3306/sqoop \
    --password abc123 \
    --username root \
    --table emp \
    --delete-target-dir
    
    Note: the password in the job definition above is stored in plain text. It can instead be loaded from a file with --password-file, as in the sketch below.
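
    A sketch of the safer variant: keep the password in a file with restricted permissions and reference it with --password-file, then manage the job with --list and --exec. The job name and file path below are only examples; the file should contain the password with no trailing newline (hence echo -n).
    # store the password on HDFS (a local file:// path also works)
    echo -n "abc123" > mysql.pwd
    hdfs dfs -put mysql.pwd /user/hadoop/mysql.pwd
    hdfs dfs -chmod 400 /user/hadoop/mysql.pwd
    
    sqoop job --create feidata-sqoop-job2 -- \
    import --connect jdbc:mysql://hadoop01:3306/sqoop \
    --username root \
    --password-file /user/hadoop/mysql.pwd \
    --table emp \
    --delete-target-dir
    
    sqoop job --list
    sqoop job --exec feidata-sqoop-job2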

ETL Workflow

Requirement: the emp and dept tables live in MySQL. Extract them into Hive, run the analysis there, then write the results back to MySQL.

  1. Create the emp_etl and dept_etl tables in Hive

    CREATE TABLE emp_etl(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    
    CREATE TABLE dept_etl(
    deptno int,
    dname string,
    loc string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    
  2. Import the MySQL data into Hive

    sqoop import \
    --connect jdbc:mysql://hadoop01:3306/sqoop \
    --password abc123 \
    --username root \
    --table emp \
    --hive-overwrite \
    --delete-target-dir \
    --hive-import --hive-database fei_hive \
    --hive-table emp_etl \
    --fields-terminated-by '\t' \
    -m 1
    
    sqoop import \
    --connect jdbc:mysql://hadoop01:3306/sqoop \
    --password abc123 \
    --username root \
    --table dept \
    --hive-overwrite \
    --delete-target-dir \
    --hive-import --hive-database fei_hive \
    --hive-table dept_etl \
    --fields-terminated-by '\t' \
    -m 1
    
  3. Create the result table in Hive and load it

    CREATE TABLE result_etl(
    empno int,
    ename string,
    deptno int,
    dname string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    
    insert overwrite table result_etl select e.empno, e.ename, e.deptno, d.dname from emp_etl e join dept_etl d on e.deptno=d.deptno;
    
  4. Export the results to MySQL

    Create the result_etl table in MySQL:
    create table result_etl(
    empno int,
    ename varchar(10),
    deptno int,
    dname varchar(20)
    );
    
    Export the Hive results to MySQL:
    sqoop export \
    --connect jdbc:mysql://hadoop01:3306/sqoop \
    --password abc123 \
    --username root \
    --table result_etl \
    --export-dir /user/hive/warehouse/fei_hive.db/result_etl \
    --fields-terminated-by '\t' \
    -m 1
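
    A final check that the rows landed in MySQL, reusing sqoop eval from earlier (a count-only sketch):
    sqoop eval \
    --connect jdbc:mysql://hadoop01:3306/sqoop \
    --password abc123 --username root \
    --query "select count(*) from result_etl"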
    