Sqoop import OR export

Data sources:
1: Relational databases -> Sqoop (SQL to Hadoop)
2: Files -> Flume (real-time data collection)

Task scheduling: Oozie

These are important frameworks in the Hadoop ecosystem and need to be monitored (ideally through a unified web UI).

1: Download Sqoop

Official docs: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html

Sqoop depends on Hadoop; under the hood a Sqoop job runs as a MapReduce job.

Using Sqoop, with a MySQL database as the example:

1: List the databases in MySQL:
sqoop list-databases \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123
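
Before importing, it can also help to confirm that the connection and credentials work at the table level; a minimal check, reusing the same connection settings, with Sqoop's list-tables tool:

sqoop list-tables \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123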

2: import (-m,--num-mappers <n>    Use 'n' map tasks to import in parallel)

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--table emp \
-m 1
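
With no --target-dir, the imported files land in the HDFS home directory of the user running the job, in a subdirectory named after the table. A quick way to inspect the result (relative HDFS paths resolve to the home directory):

hdfs dfs -ls emp                 # one part-m-* file per map task
hdfs dfs -cat emp/part-m-00000   # default output is comma-separated text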

2.1: import (--target-dir <dir>    HDFS plain table destination)
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--table emp \
--target-dir /user/beifeng/sqoop/import_emp \
-m 1

2.2: import
--where <where clause>            WHERE clause to use during import
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--where "city='sec-bad'" \
--target-dir  /user/beifeng/sqoop/import_emp_add \
--table emp_add \
--m 1

2.3: import (specify the file format of the imported data)
--as-avrodatafile       Imports data to Avro data files
--as-parquetfile        Imports data to Parquet files
--as-sequencefile       Imports data to SequenceFiles
--as-textfile           Imports data as plain text

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_emp2 \
--table emp \
--as-parquetfile \
--m 1

2.4: import (import only specific columns)
--columns <col,col,col...>     Columns to import from table
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_emp4 \
--table emp \
--m 1 \
--columns name,salary

2.5: In real projects the data to be processed usually needs some initial cleaning and filtering; use a free-form query for this (--query):
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--query 'select name,salary from emp where $CONDITIONS and salary < 50000' \
--target-dir  /user/beifeng/sqoop/import_emp5 \
--m 1
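
When --query is used with more than one map task, Sqoop also needs a --split-by column, and the $CONDITIONS token must stay in the WHERE clause so Sqoop can substitute each mapper's range predicate. A sketch of the parallel variant, assuming emp has a numeric id column suitable for splitting:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--query 'select id,name,salary from emp where $CONDITIONS and salary < 50000' \
--split-by id \
--target-dir /user/beifeng/sqoop/import_emp5_parallel \
-m 2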

2.6: Compression

-z,--compress                      Enable compression
--compression-codec <codec>        Compression codec to use for import
--fields-terminated-by <char>      Sets the field separator character
--delete-target-dir                Imports data in delete mode (the target directory is deleted first if it already exists)

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_emp6 \
--table emp \
--m 1 \
--compress \
--compression-codec org.apache.hadoop.io.compress.DefaultCodec \
--fields-terminated-by '\t' \
--delete-target-dir
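
To confirm the compression took effect, list the target directory; with DefaultCodec the part files should end in .deflate, and hdfs dfs -text can decompress them for a quick look (assuming the codec extension is recognized):

hdfs dfs -ls /user/beifeng/sqoop/import_emp6
hdfs dfs -text /user/beifeng/sqoop/import_emp6/part-m-00000.deflate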

2.7: import (incremental import)

Incremental import needs a column that identifies new rows; typically the table has such a field, e.g. an insert timestamp createtime.
Equivalent query: createtime >= startTime and createtime <= endTime;

sqoop:
Incremental import arguments:
--check-column <column>        Source column to check for incremental change
--incremental <import-type>    Define an incremental import of type 'append' or 'lastmodified'
--last-value <value>        Last imported value in the incremental check column

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_emp7 \
--table emp \
--incremental append \
--check-column salary \
--last-value 4 \
--m 1


Log excerpt:
INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`salary`) FROM `emp`
INFO tool.ImportTool: Incremental import based on column `salary`
INFO tool.ImportTool: Lower bound value: 4
INFO tool.ImportTool: Upper bound value: 50000
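
The next run would pass the reported upper bound (50000) as the new --last-value. To avoid tracking this by hand, the import can be stored as a saved Sqoop job, which records the last value after every execution; a sketch, assuming the default job metastore:

sqoop job --create emp_incr -- import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir /user/beifeng/sqoop/import_emp7 \
--table emp \
--incremental append \
--check-column salary \
--last-value 4 \
-m 1

sqoop job --exec emp_incr      # may prompt for the database password unless a password file is configured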


2.8: import (--direct)

For MySQL, --direct bypasses the JDBC path and streams the data through the native mysqldump tool, which is usually faster.

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_emp7 \
--delete-target-dir \
--table emp \
--direct \
--m 1


3: export
Import the data from the MySQL database into HDFS with compression, then load it into a Hive table (the data must be compressed first);
Create the table in Hive:
drop table if exists default.hive_emp_zlib;
CREATE TABLE default.hive_emp_zlib (
id string,
name string,
deg string,
salary string,
dept string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data inpath '/user/beifeng/sqoop/import_emp6' into table default.hive_emp_zlib;
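
A quick sanity check after the load (Hive decompresses the files transparently when reading, based on the file extension):

select * from default.hive_emp_zlib limit 10;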


##################### Exporting data to an RDBMS #########################
HIVE TABLE

table: the table data can be queried over JDBC via HiveServer2
hdfs file: export ---> MySQL/Oracle/DB2

Create the my_user table in MySQL and prepare the data:
create table my_user(
id varchar(10),
name varchar(10),
password varchar(10)
);

hadoop dfs -mkdir /user/beifeng/sqoop/exp/user
hadoop dfs -put user.txt /user/beifeng/sqoop/exp/user
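
The contents of user.txt are not shown here; a hypothetical example matching the my_user columns could look like the lines below. Since the export command that follows does not override the delimiter, Sqoop's default field separator ',' is assumed:

1,zhangsan,123456
2,lisi,123456
3,wangwu,123456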

1:export --> MYSQL
--export-dir <dir>                  HDFS source path for the export
--fields-terminated-by <char>      Sets the field separator character

sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--export-dir  /user/beifeng/sqoop/exp/user \
--table my_user \
--m 1
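
To verify the export on the MySQL side (assuming the mysql command-line client is available):

mysql -u wql -p -e "select * from sqoop.my_user;"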

##################### MySQL import to Hive #########################

Hive arguments:
--create-hive-table                          Fail if the target Hive table exists
--hive-database <database-name>              Sets the database name to use when importing to Hive
--hive-delims-replacement <arg>              Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with a user-defined string
--hive-drop-import-delims                    Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
--hive-home <dir>                            Override $HIVE_HOME
--hive-import                                Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite                             Overwrite existing data in the Hive table
--hive-partition-key <partition-key>         Sets the partition key to use when importing to Hive
--hive-partition-value <partition-value>     Sets the partition value to use when importing to Hive
--hive-table <table-name>                    Sets the table name to use when importing to Hive
--map-column-hive <arg>                      Override mapping for specific column to Hive types

1: Create the Hive table:

create table default.my_user(
id string,
name string,
password string
)
row format delimited fields terminated by "\t";

2: Run the import:

sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir  /user/beifeng/sqoop/import_my_user \
--delete-target-dir \
--table my_user \
--hive-database default \
--hive-import \
--hive-table my_user \
--fields-terminated-by '\t' \
--m 1
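
After the import finishes, the data should be queryable in Hive; a quick check from the shell, assuming the Hive CLI is on the PATH:

hive -e "select * from default.my_user limit 10;"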

3: Run the export (export the Hive table data to MySQL)
--export-dir points at the Hive table's warehouse directory, and --input-fields-terminated-by <char> must match the table's field delimiter ('\t' here).

sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--export-dir  /user/hive/warehouse/my_user \
--table my_user2 \
--input-fields-terminated-by '\t' \
--m 1

4: Create the my_user2 table in MySQL (the target table must exist before the export in step 3 is run):

create table my_user2(
id varchar(10),
name varchar(10),
password varchar(10)
);

############################ --options-file ###################################

Create a file named sqoop-import-hdfs.txt:

import
--connect
jdbc:mysql://localhost:3306/sqoop
--username
wql
--password
root123
--target-dir
/user/beifeng/sqoop/import_my_user
--delete-target-dir
--table
my_user
--hive-database
default
--hive-import
--hive-table
my_user
--fields-terminated-by
'\t'
--m
1

sqoop --options-file /home/wql/app/sqoop/testdata/sqoop-import-hdfs.txt


Example (from the Sqoop user guide):
$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST
is equivalent to:
$ sqoop --options-file /users/homer/work/import.txt --table TEST
where import.txt contains:
import
--connect
jdbc:mysql://localhost/db
--username
foo

 
