sqoop 导数据从 mysql 到 hdfs,load 进 hive

sqoop 从 mysql 导数据到 hive 命令:

sqoop import --connect 'jdbc:mysql://127.0.0.1:8066/int_vst_wx' --username 'user' --password 'password' --table fact_interfaces_visits_wx1  --hive-import --create-hive-table --target-dir /warehouse/tablespace/managed/hive/fact_interfaces_visits_wx/fact_interfaces_visits_wx1 --hive-table fact_interfaces_visits_wx

sqoop 从 mysql 导数据到 hdfs:

sqoop import --connect 'jdbc:mysql://127.0.0.1:8066/int_vst_wx' --username 'user' --password 'password' --table fact_interfaces_visits_wx13 --target-dir /warehouse/tablespace/managed/hive/fact_interfaces_visits_wx/fact_interfaces_visits_wx13 -m 4 --columns "id,user_id,account,user_name,user_type,school_id,school_name,code,url,module,name,detail_info,ip,user_agent,visit_client,visit_env,create_time,duration,rescode,resmsg,req_id,sessionid,recv_time" --direct

操作 hdfs 文件备份:

sudo -u hdfs hdfs dfs -cp /warehouse/tablespace/managed/hive/fact_interfaces_visits_wx/* /staging/jzproduct/hive/mysql/fact_interfaces_visits_wx

hive 建表语句,分隔符注意最好不用空格,可以用 ,

CREATE TABLE `fact_interfaces_visits_wx`(`id` int ,
`user_id` string,
`account` string ,
`user_name` string,
`user_type` int,
`school_id` int,
`school_name` string,
`code` string,
`url` string,
`module` string,
`name` string,
`detail_info` string,
`ip` string,
`user_agent` string,
`visit_client` string,
`visit_env` string,
`create_time` string,
`duration` int,
`rescode` string,
`resmsg` string,
`req_id` string,
`sessionid` string,
`recv_time` string )                               
  ROW FORMAT SERDE                                    
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'   
  WITH SERDEPROPERTIES (                              
'field.delim'=',',                                
'line.delim'=',',                                
'serialization.format'=',')                       
  STORED AS INPUTFORMAT                               
'org.apache.hadoop.mapred.TextInputFormat'        
  OUTPUTFORMAT                                        
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  
  LOCATION                                            
	'hdfs://node1.etonedu.cn:8020/warehouse/tablespace/managed/hive/fact_interfaces_visits_wx'  
  TBLPROPERTIES (                                     
'bucketing_version'='2',                          
'transactional'='true',                           
'transactional_properties'='insert_only',         
'transient_lastDdlTime'='1597738181')  ;      

load hdfs 文件夹到指定表:

hive -e "load data inpath '/warehouse/tablespace/managed/hive/fact_interfaces_visits_wxs' into table default.fact_interfaces_visits_wx ;"

hive 创建内部表和外部表:(区别在于 external)

创建外部表:

CREATE external TABLE IF NOT EXISTS FACT_INTERFACES_VISITS_WX(ID int ,USER_ID string,ACCOUNT string ,USER_NAME string,USER_TYPE int,SCHOOL_ID int,SCHOOL_NAME string,CODE string,URL string,MODULE string,NAME string,DETAIL_INFO string,IP string,USER_AGENT string,VISIT_CLIENT string,VISIT_ENV string,CREATE_TIME DATE,DURATION int,RESCODE string,RESMSG string,REQ_ID string,SESSIONID string,RECV_TIME DATE )ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS ORC;

创建内部表:

CREATE  TABLE IF NOT EXISTS FACT_INTERFACES_VISITS_WX(ID int ,USER_ID string,ACCOUNT string ,USER_NAME string,USER_TYPE int,SCHOOL_ID int,SCHOOL_NAME string,CODE string,URL string,MODULE string,NAME string,DETAIL_INFO string,IP string,USER_AGENT string,VISIT_CLIENT string,VISIT_ENV string,CREATE_TIME DATE,DURATION int,RESCODE string,RESMSG string,REQ_ID string,SESSIONID string,RECV_TIME DATE ) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS ORC;

内部表和外部表的区别:
未被external修饰的是内部表(managed table),被external修饰的为外部表(external table);
区别:
内部表数据由Hive自身管理,外部表数据由HDFS管理;
内部表数据存储的位置是hive.metastore.warehouse.dir(默认:/user/hive/warehouse),外部表数据的存储位置由自己制定(如果没有LOCATION,Hive将在HDFS上的/user/hive/warehouse文件夹下以外部表的表名创建一个文件夹,并将属于这个表的数据存放在这里);
删除内部表会直接删除元数据(metadata)及存储数据;删除外部表仅仅会删除元数据,HDFS上的文件并不会被删除;
对内部表的修改会将修改直接同步给元数据,而对外部表的表结构和分区进行修改,则需要修复(MSCK REPAIR TABLE table_name;)

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值