Moving data between a DBMS, HDFS, and Hive with Sqoop

Tu Zuoquan's blog (涂作权的博客)

Category: Sqoop (data transfer tool between Hadoop and RDBMS)

Article link: https://blog.csdn.net/tototuzuoquan/article/details/81479319

1 Importing and exporting data with Sqoop

All tables of the B-line data center are placed in the xxxxx database.

1.1 Importing the area code table

bin/sqoop import --connect "jdbc:mysql://xxx.xxx.xxx.142:3306/db1?useSSL=false" --username root --password 123456 --target-dir /xxxx/xxxx/sys_area --table tb_sys_area -m 1

After a successful import, the data file is located at:

/bplan/data-center/sys_area/part-m-00000

1.2 Importing the industry base data

bin/sqoop import --connect "jdbc:mysql://xxx.xxx.xxx.142:3306/db1?useSSL=false" --username root --password 123456 --target-dir /bplan/data-center/sys_industry --table tb_sys_industry -m 1

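1.3 Important note

With the commands above, the fields in the imported files are separated by ',' by default. To use a custom delimiter, add --fields-terminated-by '\t', for example: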

bin/sqoop import --connect "jdbc:mysql://xxx.xxx.xxx.142:3306/db1?useSSL=false" --username root --password 123456 --target-dir /xxxx/xxxx/sys_industry_1 --table tb_sys_industry -m 1 --fields-terminated-by '\t'

After a successful import, the data file is located at:

/bplan/data-center/sys_industry/part-m-00000
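
One reason to choose a delimiter other than ',' is that a column value may itself contain a comma, which shifts every later field when the file is split on ','. The sketch below illustrates this with plain shell tools only. The sample row is hypothetical and is not taken from the tables in this article.

```shell
# Hypothetical row whose third field contains a comma.
row_comma='1,110000,Beijing, China'
row_tab=$(printf '1\t110000\tBeijing, China')

# Splitting on ',' yields 4 fields instead of the intended 3.
comma_fields=$(printf '%s\n' "$row_comma" | awk -F',' '{print NF}')

# Splitting on a tab yields the intended 3 fields.
tab_fields=$(printf '%s\n' "$row_tab" | awk -F'\t' '{print NF}')

echo "comma-delimited: $comma_fields fields"
echo "tab-delimited:   $tab_fields fields"
```

If the source data can contain commas, a tab (or another character that never appears in the values) is the safer choice for --fields-terminated-by.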

1.4 Loading the area and industry data into Hive

[root@bigdata2 hive-2.3.2]# cd $HIVE_HOME
[root@bigdata2 hive-2.3.2]# bin/hive

Create a new database named data_center:
create database data_center;

Switch to the data_center database:
use data_center;

Create the Hive area table:
CREATE TABLE IF NOT EXISTS tb_sys_area (
id int comment 'primary key id',
code string comment 'area code',
name string comment 'area name',
parent_code int comment 'parent area code',
short_name string comment 'short area name',
level_type smallint comment 'area level',
city_code string comment 'city code',
zip_code string comment 'postal code',
merger_name string comment 'full area name',
pinyin string comment 'area name in pinyin',
pingan_area_name string comment 'Ping An Bank area name'
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

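Load the data file from HDFS into the Hive table. Note that LOAD DATA INPATH moves the file into the table's warehouse directory rather than copying it:

load data inpath '/bplan/data-center/sys_area/part-m-00000' into table data_center.tb_sys_area;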

1.5 Importing directly into Hive with Sqoop

Create tb_sys_industry in Hive:

CREATE TABLE IF NOT EXISTS tb_sys_industry (
id int comment 'primary key id',
category_id string comment 'category id',
parent_category_id string comment 'parent category id',
root_category_id int comment 'root category id',
category_name string comment 'category name',
weixin_category_id smallint comment 'WeChat category id',
merger_name string comment 'full area name',
mybank_category_id string comment 'MYbank category id'
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Note: if the data is imported from MySQL straight into Hive with --hive-import, this step can be skipped.

Run the Sqoop import:

cd $SQOOP_HOME
bin/sqoop import --connect jdbc:mysql://xxx.xxx.xxx.142:3306/db1 --username root --password 123456 --table tb_sys_industry --fields-terminated-by ',' --delete-target-dir --num-mappers 1 --hive-import --hive-database data_center --hive-table tb_sys_industry

Note: the following exception may occur during this step:
ERROR tool.ImportTool: Import failed: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
Fix: Sqoop needs a Hive jar on its classpath. Copy hive-common-2.3.3.jar from hive/lib into Sqoop's lib directory.
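
The copy step can be scripted so that it does not depend on the exact jar version. This is a minimal sketch: the function name copy_hive_common_jar and its two parameters are hypothetical helpers, not part of Sqoop or Hive, and it assumes both installations keep their jars under a lib directory.

```shell
# Copy every hive-common-*.jar from a Hive installation into Sqoop's lib directory.
#   $1 - Hive home directory
#   $2 - Sqoop home directory
# Returns 0 if at least one jar was copied, 1 otherwise.
copy_hive_common_jar() {
  hive_home=$1
  sqoop_home=$2
  copied=1
  for jar in "$hive_home"/lib/hive-common-*.jar; do
    # When the glob matches nothing, the pattern itself is returned; skip it.
    [ -e "$jar" ] || continue
    cp "$jar" "$sqoop_home/lib/" && copied=0
  done
  return "$copied"
}

# Typical call, assuming HIVE_HOME and SQOOP_HOME are set:
# copy_hive_common_jar "$HIVE_HOME" "$SQOOP_HOME"
```

After copying, rerun the Sqoop import command above.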

1.6 Exporting data from Hive to MySQL

Create the target table for the Hive export in MySQL:

CREATE TABLE `tb_sys_industry_1` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `category_id` VARCHAR(50) NOT NULL COMMENT 'category id',
  `parent_category_id` VARCHAR(50) DEFAULT NULL COMMENT 'parent category id',
  `root_category_id` VARCHAR(50) NOT NULL COMMENT 'root category id',
  `category_name` VARCHAR(100) NOT NULL COMMENT 'category name',
  `weixin_category_id` VARCHAR(50) DEFAULT NULL COMMENT 'WeChat category id',
  `merger_name` VARCHAR(100) DEFAULT NULL,
  `mybank_category_id` VARCHAR(50) DEFAULT NULL COMMENT 'MYbank category id',
  PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=157 DEFAULT CHARSET=utf8mb4 COMMENT='industry information';

Then run the export from the Sqoop home directory:

bin/sqoop export --connect jdbc:mysql://xxx.xxx.xxx.142:3306/db1 --username root --password 123456 --table tb_sys_industry_1 --export-dir /user/hive/warehouse/data_center.db/tb_sys_industry/part-m-00000

If the data files use a different delimiter, append the matching option to the command, for example --input-fields-terminated-by '\t'.

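Note:
Hive's default field delimiter is '\001'; Sqoop's default is ','.
--input-fields-terminated-by: the delimiter Sqoop uses to parse Hive or HDFS files when exporting them to external storage.
--fields-terminated-by: the delimiter Sqoop writes between fields when importing from external storage into Hive or HDFS.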
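
To see why the two defaults clash, the following sketch builds a row the way Hive stores it with its default delimiter and then splits it both ways using standard shell tools. The row contents are hypothetical.

```shell
# Ctrl-A (octal 001) is Hive's default field delimiter.
ctrl_a=$(printf '\001')

# Hypothetical three-column row as it would appear in a Hive data file.
hive_row="1${ctrl_a}A01${ctrl_a}Retail"

# Split on ',': no comma is present, so cut passes the whole row through as one field.
second_with_comma=$(printf '%s\n' "$hive_row" | cut -d',' -f2)

# Split on Ctrl-A: the second column is recovered correctly.
second_with_ctrl_a=$(printf '%s\n' "$hive_row" | cut -d"$ctrl_a" -f2)

echo "second field, split on Ctrl-A: $second_with_ctrl_a"
```

For a Hive table stored with the default delimiter, the export command in this section would therefore need --input-fields-terminated-by '\001'. The tables in this article were created with FIELDS TERMINATED BY ',', which is why the export works without it.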

1.7 Backing up Hive data

Enter the Hive shell and select the database:

use nginx_log;

Back up the table the same way you would in MySQL, by copying it into a new table:

create table nginx_log_info_20180724 as select * from nginx_log_info;
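
The date suffix in the statement above is hard-coded. A small helper can generate the same statement for any date. This is a sketch only: backup_table_sql is a hypothetical function name, and the commented-out call assumes the hive command is on PATH.

```shell
# Print the HiveQL that copies a table into a dated backup table.
#   $1 - source table name
#   $2 - date suffix in yyyyMMdd form (defaults to today's date)
backup_table_sql() {
  table=$1
  suffix=${2:-$(date +%Y%m%d)}
  printf 'create table %s_%s as select * from %s;\n' "$table" "$suffix" "$table"
}

# Reproduces the statement used in this section.
backup_table_sql nginx_log_info 20180724

# To run it against Hive with today's date:
# hive -e "use nginx_log; $(backup_table_sql nginx_log_info)"
```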

Back up the Hive table data to the local disk.
Example:

insert overwrite local directory '/home/bigdata_bak/nginx_log/nginx_log_info_20180724' ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE select * from nginx_log_info;

1.8 Loading data from disk into Hive

Create the table:

CREATE TABLE IF NOT EXISTS nginx_log_info_20180724 (
id bigint comment 'primary key id',
product_name string comment 'owning business line',
remote_addr string comment 'remote server IP',
access_time int comment 'access date, format: yyyyMMdd',
access_timestamp double comment 'timestamp',
time_zone string comment 'time zone',
request_type string comment 'request type',
request_url string comment 'request URL',
request_protocol string comment 'request protocol',
status smallint comment 'request status',
body_bytes_sent int comment 'size of the content sent',
request_body string comment 'request body',
http_referer string comment 'HTTP referer',
http_user_agent string comment 'http_user_agent',
os_name string comment 'operating system name',
os string comment 'operating system',
browser_name string comment 'browser name',
browser_version string comment 'browser version',
device_type string comment 'device type',
browser string comment 'browser',
access_tool string comment 'type',
http_x_forwarded_for string comment 'http_x_forwarded_for',
request_time double comment 'request response time'
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE;

Load the backup files from the local directory:

LOAD DATA LOCAL INPATH '/home/bigdata_bak/nginx_log/nginx_log_info_20180724' OVERWRITE INTO TABLE nginx_log_info_20180724;

To empty the table, overwrite it with a query that returns no rows:

insert overwrite table nginx_log_info_20180724 select * from nginx_log_info_20180724 where 1=0;
