Importing data from MySQL into HDFS, HBase, and Hive with Sqoop

The first part is a reposted article describing how to import data from MySQL into HDFS, HBase, and Hive. It is followed by an example from one of my own projects that imports MySQL data into HDFS, for reference.

1. Test the MySQL connection

bin/sqoop list-databases --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111'
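
Passing the password on the command line leaves it in the shell history and in the process list. Sqoop can instead prompt for it with -P, or read it from a protected file with --password-file; a minimal sketch, assuming the HDFS path /user/sqoop/mysql.pwd (an example path, not from the original setup):

# Store the password in HDFS with tight permissions, then reference it.
echo -n '111111' > mysql.pwd
hdfs dfs -put mysql.pwd /user/sqoop/mysql.pwd
hdfs dfs -chmod 400 /user/sqoop/mysql.pwd
bin/sqoop list-databases --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password-file /user/sqoop/mysql.pwd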

2. Validate a SQL statement

bin/sqoop eval --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --query "SELECT * FROM TB_REGION WHERE REGION_ID = '00A1719A489D4F49906A8CA9661CCBE8'"
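
eval executes the statement directly on MySQL and prints the rows, so it is a cheap way to verify connectivity and a WHERE clause before launching a full import. For example, to check how many rows a full-table import would move:

bin/sqoop eval --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --query "SELECT COUNT(*) FROM TB_REGION"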

3. Import into HDFS

3.1 Import

bin/sqoop import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --table TB_REGION --target-dir /sqoop/mysql/trade_dev/tb_region -m 5 --columns "code,name,category,farthercode,visible,regionlevel,region_id" --direct
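
Here -m 5 runs five map tasks in parallel, split on the table's primary key by default, and --direct switches to MySQL's mysqldump fast path (mysqldump must be available on every worker node). If the table had no usable primary key, you would pick the split column yourself; a sketch using REGION_ID (since it is a text column, recent Sqoop releases also require the allow_text_splitter property):

# Sketch: same import without --direct, splitting on an explicit column.
bin/sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --table TB_REGION --target-dir /sqoop/mysql/trade_dev/tb_region -m 5 --split-by REGION_ID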

3.2 Verify

hdfs dfs -cat /sqoop/mysql/trade_dev/tb_region/*01
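
Each map task writes its own part file, so the -m 5 import should leave five of them; the *01 glob above reads only the second. To see everything:

hdfs dfs -ls /sqoop/mysql/trade_dev/tb_region
hdfs dfs -cat /sqoop/mysql/trade_dev/tb_region/part-m-*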

4. Import into HBase

4.1 Create the HBase table

hbase shell
create 'mysql_trade_dev', 'region'

4.2 Import the MySQL data into HBase

bin/sqoop import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --table TB_REGION --hbase-table mysql_trade_dev --hbase-row-key REGION_ID --column-family region

4.3 Verify

scan 'mysql_trade_dev'
count 'mysql_trade_dev'
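
To spot-check a single row, fetch it by row key from the HBase shell, for example with the REGION_ID used in step 2:

get 'mysql_trade_dev', '00A1719A489D4F49906A8CA9661CCBE8'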

5. Import into Hive

bin/sqoop import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --table TB_REGION --hive-import --create-hive-table --target-dir /user/hive/warehouse/tb_region --hive-table tb_region
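
Sqoop stages the rows in HDFS first and then creates and loads the Hive table; --create-hive-table makes the job fail if tb_region already exists, which guards against silently appending. A quick check from the Hive side:

hive -e "DESCRIBE tb_region; SELECT * FROM tb_region LIMIT 5;"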

6. Incremental import into Hive

6.1 Initial import into HDFS

bin/sqoop import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username mysql --password 111111 --table TB_DICTIONARY -m 1 --target-dir /sqoop/mysql/trade_dev/tb_dic --incremental append --check-column DIC_ID

Output:

16/09/07 10:27:06 INFO tool.ImportTool: --incremental append
16/09/07 10:27:06 INFO tool.ImportTool: --check-column DIC_ID
16/09/07 10:27:06 INFO tool.ImportTool: --last-value 287
16/09/07 10:27:06 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')

6.2 Create the Hive external table

CREATE EXTERNAL TABLE tb_dic (
 DIC_ID int,
 DOMAIN_ID STRING,
 DIC_TYPE_ID int,
 DESCRIPTION STRING,
 CODE int,
 NAME STRING,
 MNEMONIC STRING,
 ATTRIBUTE STRING,
 MARK_FOR_DEFAULT int,
 MARK_FOR_DELETE int,
 OPT_COUNTER int,
 CREATE_DATE STRING,
 CREATE_BY STRING,
 LAST_MODIFIED_DATE STRING,
 LAST_MODIFIED_BY STRING,
 ATTRIBUTE1 int,
 ATTRIBUTE2 int,
 ATTRIBUTE3 STRING,
 ATTRIBUTE4 STRING,
 ATTRIBUTE5 STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/sqoop/mysql/trade_dev/tb_dic';

The MySQL DDL for the source table:
DROP TABLE IF EXISTS `TB_DICTIONARY`;
CREATE TABLE `TB_DICTIONARY` (
 `DIC_ID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'dictionary ID',
 `DOMAIN_ID` varchar(45) NOT NULL DEFAULT 'domain1' COMMENT 'ID distinguishing the service domain',
 `DIC_TYPE_ID` int(11) NOT NULL COMMENT 'dictionary type ID, foreign key to TB_DICTIONARY_TYPE',
 `DESCRIPTION` varchar(1024) NOT NULL COMMENT 'explanation of the code',
 `CODE` tinyint(2) NOT NULL COMMENT 'code value',
 `NAME` varchar(45) NOT NULL COMMENT 'meaning of the code',
 `MNEMONIC` varchar(45) DEFAULT NULL COMMENT 'mnemonic code',
 `ATTRIBUTE` varchar(45) DEFAULT NULL COMMENT 'attribute of this entry, e.g. the dimension type of a unit of measure',
 `MARK_FOR_DEFAULT` tinyint(2) NOT NULL DEFAULT '0' COMMENT 'default flag (1 = default, 0 = not default)',
 `MARK_FOR_DELETE` tinyint(2) NOT NULL DEFAULT '1' COMMENT 'validity flag (1 = valid, 0 = invalid)',
 `OPT_COUNTER` int(5) DEFAULT NULL COMMENT 'version counter',
 `CREATE_DATE` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation date',
 `CREATE_BY` varchar(45) NOT NULL DEFAULT 'admin' COMMENT 'creator ID',
 `LAST_MODIFIED_DATE` datetime DEFAULT NULL COMMENT 'last modified date',
 `LAST_MODIFIED_BY` varchar(45) DEFAULT NULL COMMENT 'last modifier ID',
 `ATTRIBUTE1` int(11) DEFAULT NULL,
 `ATTRIBUTE2` int(11) DEFAULT NULL,
 `ATTRIBUTE3` varchar(45) DEFAULT NULL,
 `ATTRIBUTE4` varchar(45) DEFAULT NULL,
 `ATTRIBUTE5` date DEFAULT NULL,
 PRIMARY KEY (`DIC_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=290 DEFAULT CHARSET=utf8 COMMENT='dictionary table - 李思宇';

SET FOREIGN_KEY_CHECKS = 1;

6.3 Import the incremental data

bin/sqoop job --create incjob -- import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username mysql --password 111111 --table TB_DICTIONARY -m 1 --target-dir /sqoop/mysql/trade_dev/tb_dic --incremental append --check-column DIC_ID --last-value 287
bin/sqoop job --exec incjob
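
The saved job keeps its incremental state, including the current last-value, in Sqoop's metastore and advances it after each successful --exec, so repeated runs pick up only rows with DIC_ID greater than the stored value. The stored state can be inspected with:

bin/sqoop job --list
bin/sqoop job --show incjob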

6.4 Verify

select * from tb_dic;

Output:

First run:

Time taken: 0.068 seconds, Fetched: 489 row(s)

Second run:

Time taken: 0.068 seconds, Fetched: 490 row(s)

7. Whole-database import test

7.1 Create the HBase table

hbase shell
create 'new_table','data'

7.2 Import the MySQL data into HBase

bin/sqoop import-all-tables --connect jdbc:mysql://192.168.1.187:3306/new_schema --username mysql --password 111111 --hbase-create-table --hbase-table new_table --column-family data --hbase-bulkload

Note

A whole-database import requires every table to have a primary key; otherwise the job fails with an error like:

16/09/08 15:03:50 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-xdata/compile/070fa1eda1e77fc70eaa0c532cfa94b8/nopk.jar
16/09/08 15:03:50 ERROR tool.ImportAllTablesTool: Error during import: No primary key could be found for table nopk. Please specify one with --split-by or perform a sequential import with '-m 1'.

Alternatively, every table can share a column that works as the row key (such as id); in that case add --hbase-row-key id, as in the sketch below.
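
With that flag the whole-database command becomes the following sketch (assuming every table in new_schema really does have an id column):

bin/sqoop import-all-tables --connect jdbc:mysql://192.168.1.187:3306/new_schema --username mysql --password 111111 --hbase-create-table --hbase-table new_table --column-family data --hbase-row-key id --hbase-bulkload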

7.3 Verify

scan 'new_table'
count 'new_table'

The walkthrough above is reposted from the author 水他, link: https://www.jianshu.com/p/5dcb893f2843, source: 简书 (Jianshu). Copyright belongs to the author. For commercial reposting, obtain the author's authorization; for non-commercial reposting, credit the source.

Project practice

The following shows how I migrated data from a MySQL database to HDFS in an actual project. Sensitive information has been changed or masked; fill in the values for your own environment.

I. MySQL table

CREATE TABLE `auth_service_log` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'auto-increment ID',
`userid` int(11) unsigned NOT NULL COMMENT 'user ID',
`appid` varchar(40) NOT NULL COMMENT 'appid',
`codedesc` varchar(50) NOT NULL COMMENT 'agreed error code',
`type` int(4) NOT NULL COMMENT '1: identity verification, 2: education verification',
`code` int(4) NOT NULL COMMENT 'verification result',
`insertTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
`updateTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
`isActive` tinyint(1) NOT NULL DEFAULT '1' COMMENT 'validity flag',
`idNum` varchar(40) DEFAULT NULL COMMENT 'national ID number',
`requestIp` bigint(15) DEFAULT NULL,
`responseIp` bigint(15) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_inserttime` (`insertTime`),
KEY `idx_updatetime` (`updateTime`),
KEY `idx_userid` (`userid`),
KEY `idx_codedesc` (`codedesc`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Note: in the MySQL table, insertTime, updateTime, and isActive are all columns with defaults.

II. Phoenix table and indexes

CREATE TABLE IF NOT EXISTS PERSON.AuthServiceLog (
 UserID INTEGER NOT NULL,
 AppId VARCHAR(40) NOT NULL,
 Type INTEGER NOT NULL,
 InsertTime DATE NOT NULL,
 Codedesc VARCHAR(100) NULL,
 Code INTEGER NULL,
 IdNum VARCHAR(40) NULL,
 RequestIp BIGINT NULL,
 ResponseIp BIGINT NULL,
 IsActive BOOLEAN NULL,
 UpdateTime DATE NULL,
 CONSTRAINT pk_AuthServiceLog PRIMARY KEY (UserId, AppId, Type, InsertTime)
) default_column_family='uas', SALT_BUCKETS=32;
CREATE INDEX IF NOT EXISTS idx_AuthServiceLog_UpdateTime ON PERSON.AuthServiceLog (UpdateTime)
 INCLUDE (UserId, AppId,Type,InsertTime,Codedesc,Code,IdNum,RequestIp,ResponseIp,IsActive)
 SALT_BUCKETS=32;
CREATE INDEX IF NOT EXISTS idx_AuthServiceLog_codedesc ON PERSON.AuthServiceLog (codedesc)
 INCLUDE (UserId, AppId,Type,InsertTime,Code,IdNum,RequestIp,ResponseIp,IsActive,UpdateTime)
 SALT_BUCKETS=32;

III. Sqoop command

./sqoop import  --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'   --connect 'jdbc:mysql://url:port/***' --username=*** --password=*** --query "SELECT UserId
,IF(AppId='',' ' , AppId) as APPID
,Type
,DATE_SUB(InsertTime,interval 8 hour ) InsertTime
,replace(replace(replace(replace(replace(replace(replace(IFNULL(codedesc,''),'\"',''),',',''),'''',''),'\r',''),'\n',''), '\\u00a0',''),'\u200B','') as codedesc
,code
,replace(replace(replace(replace(replace(replace(replace(IFNULL(IdNum,''),'\"',''),',',''),'''',''),'\r',''),'\n',''), '\\u00a0',''),'\u200B','') as IdNum
,IFNULL(RequestIp,'')
,IFNULL(ResponseIp ,'')
,IsActive
,DATE_SUB(updateTime,interval 8 hour ) updateTime
 FROM auth_service_log where  \$CONDITIONS "  --target-dir /AuthServiceLog -m 1

With that, the MySQL data is imported into the target HDFS directory. Two details worth noting: the literal \$CONDITIONS token is mandatory in a --query import (Sqoop substitutes each mapper's split predicate for it at run time, and with -m 1 it simply becomes 1=1), and the nested replace() calls strip quotes, commas, and line breaks out of the free-text columns so they cannot break the comma-delimited output.
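
The comma-separated files that land in /AuthServiceLog can then be loaded into the Phoenix table from section II, for instance with Phoenix's CsvBulkLoadTool; a sketch, where the client jar name and path depend on your Phoenix installation:

# Sketch: bulk-load the Sqoop output into the salted Phoenix table.
hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --schema PERSON --table AUTHSERVICELOG --input /AuthServiceLog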

As an aside, importing data from SQL Server to HDFS works essentially the same way; the final Sqoop command looks like this:

 ./sqoop import  --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'   --connect 'jdbc:sqlserver://url:port;database=***' --username=*** --password=*** --query "SELECT UserId
,Event_type
,DATEADD(HOUR,-8,CreationDate) CreationDate
,Action
,replace(replace(replace(replace(replace(replace(replace(replace(isnull(Description,''),'\"',''),',',''),'\',''),'''',''),'\r',''),'\n',''), '\\u00a0',''),'\u200B','') as Description
,IP
,IPLong
FROM Person_Activity where  \$CONDITIONS "  --target-dir /Person_Activity2 -m 1
