Data sources:
1: Relational databases -- Sqoop (SQL-to-Hadoop)
2: Files -- Flume (real-time data collection)
Task scheduling: Oozie
These are important frameworks in the Hadoop ecosystem and need monitoring (a unified web UI).
1: Download Sqoop
Official docs: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
Sqoop depends on Hadoop; under the hood, a Sqoop job runs as a MapReduce job.
Using Sqoop (MySQL as the example database):
1: List the databases in MySQL:
sqoop list-databases \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123
2: import (-m,--num-mappers <n>  Use 'n' map tasks to import in parallel)
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--table emp \
-m 1
2.1: import (--target-dir <dir>  HDFS plain table destination)
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--table emp \
--target-dir /user/beifeng/sqoop/import_emp \
-m 1
2.2: import (--where)
--where <where clause>  WHERE clause to use during import
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--where "city='sec-bad'" \
--target-dir /user/beifeng/sqoop/import_emp_add \
--table emp_add \
--m 1
2.3: import (choose the output file format)
--as-avrodatafile Imports data to Avro data files
--as-parquetfile Imports data to Parquet files
--as-sequencefile Imports data to SequenceFiles
--as-textfile Imports data as plain text
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir /user/beifeng/sqoop/import_emp2 \
--table emp \
--as-parquetfile \
--m 1
2.4: import (import only specific columns)
--columns <col,col,col...> Columns to import from table
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir /user/beifeng/sqoop/import_emp4 \
--table emp \
--m 1 \
--columns name,salary
2.5: In real projects the raw data usually needs initial cleaning and filtering; use a free-form query (--query). The query must contain the literal token $CONDITIONS and cannot be combined with --table.
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--query 'select name,salary from emp where $CONDITIONS and salary < 50000' \
--target-dir /user/beifeng/sqoop/import_emp5 \
--m 1
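Note the single quotes around the --query string above: at runtime Sqoop substitutes its split-boundary predicates for the literal token $CONDITIONS, so the shell must not expand it. The quoting behavior can be checked without Sqoop at all:

```shell
# Single quotes pass $CONDITIONS through literally -- this is what Sqoop needs.
q1='select name,salary from emp where $CONDITIONS and salary < 50000'

# With double quotes the dollar sign must be escaped, otherwise the shell
# would substitute an (almost certainly empty) variable named CONDITIONS.
q2="select name,salary from emp where \$CONDITIONS and salary < 50000"

echo "$q1"
echo "$q2"
```

Both lines print the identical literal query string.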
2.6: Compression
-z,--compress Enable compression
--compression-codec <codec> Compression codec to use for import
--fields-terminated-by <char> Sets the field separator character
--delete-target-dir Imports data in delete mode
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir /user/beifeng/sqoop/import_emp6 \
--table emp \
--m 1 \
--compress \
--compression-codec org.apache.hadoop.io.compress.DefaultCodec \
--fields-terminated-by '\t' \
--delete-target-dir
2.7: import (--incremental: incremental import)
Incremental import needs a column that identifies new rows; typically the table has an insert-time column such as createtime.
query: createtime >= startTime and createtime <= endTime;
sqoop:
Incremental import arguments:
--check-column <column> Source column to check for incremental change
--incremental <import-type> Define an incremental import of type 'append' or 'lastmodified'
--last-value <value> Last imported value in the incremental check column
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir /user/beifeng/sqoop/import_emp7 \
--table emp \
--incremental append \
--check-column salary \
--last-value 4 \
--m 1
Log excerpt:
INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`salary`) FROM `emp`
INFO tool.ImportTool: Incremental import based on column `salary`
INFO tool.ImportTool: Lower bound value: 4
INFO tool.ImportTool: Upper bound value: 50000
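As the log shows, append mode effectively selects rows where the check column is greater than --last-value and no greater than its current maximum. The same filter can be sketched outside Sqoop with awk over a tab-separated sample file (hypothetical data, not the emp table above):

```shell
# Hypothetical 3-column sample: id, name, salary (tab-separated).
printf '1\tgopal\t3\n2\tmanisha\t4\n3\tkalil\t20000\n4\tpreethi\t50000\n' > emp_sample.tsv

# Append-mode filter: keep rows whose check column (salary, field 3)
# is strictly greater than --last-value (4).
awk -F'\t' -v last=4 '$3 > last' emp_sample.tsv > emp_delta.tsv

cat emp_delta.tsv
```

Only the two rows with salary 20000 and 50000 survive; the row with salary exactly 4 is excluded because the comparison is strict, which is why the next run's --last-value should be set to the new maximum.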
2.8: import (--direct: use the database's native dump tool, e.g. mysqldump for MySQL, for faster transfer)
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir /user/beifeng/sqoop/import_emp7 \
--delete-target-dir \
--table emp \
--direct \
--m 1
3: export
First import the MySQL data into HDFS with compression (the output of 2.6 above), then load it into a Hive table (the data is compressed before being loaded).
Create the table in Hive:
drop table if exists default.hive_emp_zlib;
CREATE TABLE default.hive_emp_zlib (
id string,
name string,
deg string,
salary string,
dept string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data inpath '/user/beifeng/sqoop/import_emp6' into table default.hive_emp_zlib;
##################### Exporting data to an RDBMS #########################
HIVE TABLE
table: query the data over JDBC via HiveServer2
hdfs file: export ---> MySQL/Oracle/DB2
Create the my_user table in MySQL and prepare the data:
create table my_user(
id varchar(10),
name varchar(10),
password varchar(10)
);
hadoop fs -mkdir -p /user/beifeng/sqoop/exp/user
hadoop fs -put user.txt /user/beifeng/sqoop/exp/user
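The contents of user.txt are not shown above. A hypothetical sample matching the my_user table layout, one row per line with fields in column order and comma-separated (Sqoop export's default input field terminator, since the export below sets none):

```shell
# Hypothetical sample rows for my_user: id,name,password per line,
# comma-separated to match Sqoop export's default input field terminator.
cat > user.txt <<'EOF'
1,alice,pw1
2,bob,pw2
3,carol,pw3
EOF

wc -l < user.txt
```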
1: export --> MySQL
--export-dir <dir> HDFS source path for the export
--fields-terminated-by <char> Sets the field separator character
sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--export-dir /user/beifeng/sqoop/exp/user \
--table my_user \
--m 1
##################### MySQL import to Hive #########################
Hive arguments:
--create-hive-table
Fail if the target hive table exists
--hive-database <database-name>
Sets the database name to use when importing to hive
--hive-delims-replacement <arg>
Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string
--hive-drop-import-delims
Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
--hive-home <dir>
Override $HIVE_HOME
--hive-import
Import tables into Hive(Uses Hive's default delimiters if none are set.)
--hive-overwrite
Overwrite existing data in the Hive table
--hive-partition-key <partition-key>
Sets the partition key to use when importing to hive
--hive-partition-value <partition-value>
Sets the partition value to use when importing to hive
--hive-table <table-name>
Sets the table name to use when importing to hive
--map-column-hive <arg>
Override mapping for specific column to hive types.
1: Create the Hive table:
create table default.my_user(
id string,
name string,
password string
)
row format delimited fields terminated by "\t";
2: Run the import:
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--target-dir /user/beifeng/sqoop/import_my_user \
--delete-target-dir \
--table my_user \
--hive-database default \
--hive-import \
--hive-table my_user \
--fields-terminated-by '\t' \
--m 1
3: Run the export (export the Hive table's data to MySQL). The target table my_user2 must already exist in MySQL (created in step 4 below).
--input-fields-terminated-by <char>  Sets the input field separator
sqoop export \
--connect jdbc:mysql://localhost:3306/sqoop \
--username wql \
--password root123 \
--export-dir /user/hive/warehouse/my_user \
--table my_user2 \
--input-fields-terminated-by '\t' \
--m 1
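--input-fields-terminated-by '\t' works here only because the Hive table was created with fields terminated by '\t'. A Hive table created without an explicit delimiter stores fields separated by \001 (Ctrl-A), and the export would need --input-fields-terminated-by '\001' instead. The difference is easy to see in plain shell (hypothetical row data):

```shell
# A row as Hive stores it with its default \001 (Ctrl-A) field separator.
printf '1\001alice\001pw1\n' > hive_default_row.txt

# The same row re-delimited with tabs, as a '\t'-terminated table stores it.
tr '\001' '\t' < hive_default_row.txt > tab_row.txt

cat tab_row.txt
```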
4: Create the my_user2 table in MySQL (a prerequisite for the export in step 3):
create table my_user2(
id varchar(10),
name varchar(10),
password varchar(10)
);
############################ --options-file ###################################
Create a file named sqoop-import-hdfs.txt:
import
--connect
jdbc:mysql://localhost:3306/sqoop
--username
wql
--password
root123
--target-dir
/user/beifeng/sqoop/import_my_user
--delete-target-dir
--table
my_user
--hive-database
default
--hive-import
--hive-table
my_user
--fields-terminated-by
'\t'
--m
1
sqoop --options-file /home/wql/app/sqoop/testdata/sqoop-import-hdfs.txt
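An options file holds one option or value per line; lines starting with '#' are comments and blank lines are ignored. A shortened version of the file above (just the connection options) can be generated and sanity-checked from the shell:

```shell
# Write a minimal options file: one option or value per line,
# '#' starts a comment line.
cat > sqoop-import-hdfs.txt <<'EOF'
# connection settings
import
--connect
jdbc:mysql://localhost:3306/sqoop
--username
wql
--password
root123
EOF

# Count the option flags (lines beginning with --).
grep -c -- '^--' sqoop-import-hdfs.txt
```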
Example (from the Sqoop user guide):
$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST
is equivalent to:
$ sqoop --options-file /users/homer/work/import.txt --table TEST
where import.txt contains the lines import, --connect, jdbc:mysql://localhost/db, --username, foo (one per line).