6.1 Installing and Using Sqoop on CentOS 7



I. Introduction to Sqoop

Purpose: Sqoop is a data exchange tool that transfers data in both directions between MySQL/Oracle and HDFS.
How it works: a sqoop command is translated into a MapReduce job; the MapReduce job connects to the data stores on both sides and performs the actual transfer.

II. How Sqoop Works

(Figure: Sqoop architecture diagram)

III. Installing Sqoop

Prepare the installation package in advance.

Sqoop package link
extraction code: 4tkp

Extract and install

Move the installation package into the /software directory, then extract it into /opt:

tar -zxvf /software/sqoop-1.4.6-cdh5.14.2.tar.gz -C /opt

Go into /opt and rename the sqoop directory:

cd /opt/
mv sqoop-1.4.6-cdh5.14.2/ sqoop

Configure the environment variables:

vi /etc/profile

export SQOOP_HOME=/opt/sqoop
export PATH=$SQOOP_HOME/bin:$PATH

Make the configuration take effect:

source /etc/profile
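
A quick check that the new variables are in effect:

echo $SQOOP_HOME
which sqoop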

Edit the sqoop configuration file:

cd sqoop/conf
mv sqoop-env-template.sh sqoop-env.sh

vi sqoop-env.sh

export HADOOP_COMMON_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export ZOOKEEPER_HOME=/opt/zookeeper
export ZOOCFGDIR=/opt/zookeeper
export HBASE_HOME=/opt/hbase

Copy the two prepared jar packages into the /opt/sqoop/lib directory.
Verify the installation:
sqoop help

If the list of available commands is printed, the installation succeeded.
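
Optionally, verify the installation and the MySQL connection together (a minimal check, assuming MySQL is running locally with the root/root credentials used in the examples below):

sqoop version
sqoop list-databases --connect jdbc:mysql://localhost:3306 --username root --password root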

If running sqoop prints "Warning: ... does not exist! HCatalog jobs will fail.", edit the startup check script in the bin directory:

cd /opt/sqoop/bin
vi configure-sqoop

Comment out the following checks:

## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
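
With these checks commented out, sqoop should start cleanly; a quick way to confirm no warnings remain:

sqoop help 2>&1 | grep -i warning    # expect no output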

IV. Using Sqoop

1. MySQL -> HDFS

Preparation: place the SQL script in a directory you can find again, then create the database and tables in MySQL:

mysql> create database sqoop;

mysql> use sqoop;

mysql> source /tmp/retail_db.sql

mysql> show tables;

Use sqoop to import the customers table into HDFS. Here --connect is the JDBC URL of the source MySQL database, --table the source table, --username/--password the MySQL credentials, --target-dir the destination HDFS path, and -m the number of map tasks:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table customers \
--username root \
--password root \
--target-dir /tmp/customers \
-m 3
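
After the job finishes, inspect the result on HDFS. With -m 3 the output is split across three part files (part-m-00000 to part-m-00002):

hdfs dfs -ls /tmp/customers
hdfs dfs -cat /tmp/customers/part-m-00000 | head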

Filtering rows with --where:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--where "order_id<500" \
--username root \
--password root \
--target-dir /data1/retail_db/orders \
-m 3

Selecting columns with --columns (this example uses the sqoop1 database and its emp table):

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop1 \
--driver com.mysql.jdbc.Driver \
--table emp \
--columns "EMPNO,ENAME,JOB,HIREDATE" \
--where "SAL>2000" \
--username root \
--password root \
--delete-target-dir \
--target-dir /data1/sqoop1/emp \
-m 3

Importing with a free-form query. Sqoop substitutes the mandatory $CONDITIONS token with each mapper's split predicate, so the token must reach sqoop unexpanded; inside a double-quoted query, escape it as \$CONDITIONS:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--query "select * from orders where order_status!='CLOSED' and \$CONDITIONS" \
--username root \
--password root \
--split-by order_id \
--delete-target-dir \
--target-dir /data1/retail_db/orders \
-m 3

Incremental append import: only rows whose order_date is greater than --last-value are imported:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '2014-07-24 00:00:00' \
--target-dir /data1/retail_db/orders \
-m 3
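
At the end of an incremental import, sqoop's log prints the --last-value to supply on the next run. A follow-up run (the timestamp below is illustrative) then picks up only the newer rows:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '2014-07-25 00:00:00' \
--target-dir /data1/retail_db/orders \
-m 3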

2. Creating a job

Create a job. Note that a space is required between the bare -- and import:

sqoop job \
--create mysqlToHdfs \
-- import \
--connect jdbc:mysql://localhost:3306/sqoop \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '0' \
--target-dir /data1/retail_db/orders \
-m 3

List the defined jobs:

sqoop job --list

Run the job:

sqoop job --exec mysqlToHdfs
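
Saved jobs keep their incremental state in the sqoop metastore, so each --exec automatically continues from the last imported value. To inspect a job's stored parameters, including incremental.last.value:

sqoop job --show mysqlToHdfs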

Scheduled execution with cron:

crontab -e

* 2 */1 * * sqoop job --exec mysqlToHdfs

3. Importing data into Hive

First create the target database in Hive:

hive -e "create database if not exists retail_db;"

If the target path already exists the import fails, so delete any existing directory first:

hdfs dfs -rm -r hdfs://hadoop1:9000/user/root/orders1

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--hive-import \
--create-hive-table \
--hive-database retail_db \
--hive-table orders1 \
-m 3
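
A quick sanity check of the imported Hive table (assuming hive is on the PATH):

hive -e "select * from retail_db.orders1 limit 5;"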

To import into a Hive partition, first drop the Hive table if it already exists (in the hive shell):

drop table if exists orders;

Then import. Note again the escaped \$CONDITIONS inside the double-quoted query:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--query "select order_id,order_status from orders where order_date>='2013-11-03' and order_date<'2013-11-04' and \$CONDITIONS" \
--username root \
--password ok \
--delete-target-dir \
--target-dir /data1/retail_db/orders \
--split-by order_id \
--hive-import \
--hive-database retail_db \
--hive-table orders \
--hive-partition-key "order_date" \
--hive-partition-value "2013-11-03" \
-m 3

Note: the partition column cannot be imported as an ordinary column, which is why the query above selects only order_id and order_status rather than order_date itself.
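
To confirm the partition was created (again assuming hive is on the PATH):

hive -e "show partitions retail_db.orders;"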

4. Importing data into HBase

1. Create the table in HBase, with column families data and category:

create 'products','data','category'

2. Import with sqoop:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--username root \
--password ok \
--table products \
--hbase-table products \
--column-family data \
-m 3
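
To spot-check the rows that landed in HBase (assuming the hbase shell is available):

echo "scan 'products', {LIMIT => 5}" | hbase shell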

5. Exporting data from HDFS to MySQL

1. Create the target table in MySQL. The where 1=2 predicate copies only the structure of customers, without any rows:

create table customers_demo as select * from customers where 1=2;

2. Upload the data to HDFS:

hdfs dfs -mkdir /customerinput
hdfs dfs -put customers.csv /customerinput

3. Export the data:

sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table customers_demo \
--export-dir /customerinput \
-m 1
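
To confirm the export, count the rows on the MySQL side (same root credentials as above):

mysql -uroot -proot -e "select count(*) from sqoop.customers_demo;"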

6. Writing a sqoop options file

1. Write the options file job_01.opt. In an options file, each option and each value must sit on its own line:

import
--connect
jdbc:mysql://localhost:3306/sqoop
--driver
com.mysql.jdbc.Driver
--table
customers
--username
root
--password
root
--target-dir
/data/retail_db/customers
--delete-target-dir
-m
3
2. Run the script:

sqoop --options-file job_01.opt
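
A common variation, following the options-file example in the Sqoop user guide: keep only the shared connection settings in the file and pass table-specific options on the command line. The file name connect.opt below is a hypothetical example:

# contents of connect.opt (hypothetical file holding shared connection settings)
import
--connect
jdbc:mysql://localhost:3306/sqoop
--driver
com.mysql.jdbc.Driver
--username
root
--password
root

# per-table options supplied on the command line:
sqoop --options-file connect.opt --table orders --target-dir /data/retail_db/orders -m 1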
