Sqoop command to import data from MySQL into Hive:
sqoop import --connect 'jdbc:mysql://127.0.0.1:8066/int_vst_wx' --username 'user' --password 'password' --table fact_interfaces_visits_wx1 --hive-import --create-hive-table --target-dir /warehouse/tablespace/managed/hive/fact_interfaces_visits_wx/fact_interfaces_visits_wx1 --hive-table fact_interfaces_visits_wx
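The command above passes the database password in plain text on the command line, where it is visible in `ps` output and shell history. Sqoop also accepts `--password-file`; a minimal sketch of the same import using it (the local path /tmp/mysql.pwd is an assumption for illustration):

```shell
# Keep the password out of `ps` output and shell history by reading it
# from a protected file via Sqoop's --password-file option.
rm -f /tmp/mysql.pwd
printf '%s' 'password' > /tmp/mysql.pwd   # no trailing newline
chmod 400 /tmp/mysql.pwd

# Same import as above, with --password replaced by --password-file.
SQOOP_CMD="sqoop import \
 --connect jdbc:mysql://127.0.0.1:8066/int_vst_wx \
 --username user \
 --password-file file:///tmp/mysql.pwd \
 --table fact_interfaces_visits_wx1 \
 --hive-import --create-hive-table \
 --hive-table fact_interfaces_visits_wx"
echo "$SQOOP_CMD"
```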
Sqoop import from MySQL into HDFS:
sqoop import --connect 'jdbc:mysql://127.0.0.1:8066/int_vst_wx' --username 'user' --password 'password' --table fact_interfaces_visits_wx13 --target-dir /warehouse/tablespace/managed/hive/fact_interfaces_visits_wx/fact_interfaces_visits_wx13 -m 4 --columns "id,user_id,account,user_name,user_type,school_id,school_name,code,url,module,name,detail_info,ip,user_agent,visit_client,visit_env,create_time,duration,rescode,resmsg,req_id,sessionid,recv_time" --direct
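Here `-m 4` runs four parallel mappers, and by default Sqoop partitions the work by splitting the range of the table's primary key evenly among them (`--direct` additionally delegates the extraction to mysqldump for MySQL). A rough local illustration of the ranges the default splitter would generate; the id bounds 1 and 1000 are made-up values:

```shell
# Sqoop's default splitter divides [min(id), max(id)] evenly across the
# -m mappers. Illustration with assumed bounds id=1..1000 and -m 4:
min=1; max=1000; m=4
step=$(( (max - min + 1) / m ))
i=0
while [ "$i" -lt "$m" ]; do
  lo=$(( min + i * step ))
  # The last mapper absorbs any remainder up to max(id).
  if [ "$i" -eq $(( m - 1 )) ]; then hi=$max; else hi=$(( lo + step - 1 )); fi
  echo "mapper $i: WHERE id >= $lo AND id <= $hi"
  i=$(( i + 1 ))
done
```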
Back up files on HDFS:
sudo -u hdfs hdfs dfs -cp /warehouse/tablespace/managed/hive/fact_interfaces_visits_wx/* /staging/jzproduct/hive/mysql/fact_interfaces_visits_wx
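One way to sanity-check the copy is to compare the byte totals that `hdfs dfs -du -s` reports for the source and backup paths. Since that needs a live cluster, the sketch below parses two assumed sample output lines the same way the real output would be parsed (the sizes are made up):

```shell
# `hdfs dfs -du -s <path>` prints: <bytes> <bytes-incl-replication> <path>.
# These two lines stand in for the real command output (sizes assumed):
src_du='1048576  3145728  /warehouse/tablespace/managed/hive/fact_interfaces_visits_wx'
dst_du='1048576  3145728  /staging/jzproduct/hive/mysql/fact_interfaces_visits_wx'

src_bytes=$(printf '%s\n' "$src_du" | awk '{print $1}')
dst_bytes=$(printf '%s\n' "$dst_du" | awk '{print $1}')
if [ "$src_bytes" = "$dst_bytes" ]; then
  echo "backup OK: $src_bytes bytes on both sides"
else
  echo "size mismatch: $src_bytes vs $dst_bytes"
fi
```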
Hive CREATE TABLE statement. Note: it is best not to use a space as the field delimiter; a comma works fine:
CREATE TABLE `fact_interfaces_visits_wx`(`id` int ,
`user_id` string,
`account` string ,
`user_name` string,
`user_type` int,
`school_id` int,
`school_name` string,
`code` string,
`url` string,
`module` string,
`name` string,
`detail_info` string,
`ip` string,
`user_agent` string,
`visit_client` string,
`visit_env` string,
`create_time` string,
`duration` int,
`rescode` string,
`resmsg` string,
`req_id` string,
`sessionid` string,
`recv_time` string )
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'line.delim'='\n',
'serialization.format'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://node1.etonedu.cn:8020/warehouse/tablespace/managed/hive/fact_interfaces_visits_wx'
TBLPROPERTIES (
'bucketing_version'='2',
'transactional'='true',
'transactional_properties'='insert_only',
'transient_lastDdlTime'='1597738181') ;
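The delimiter warning above matters because several columns (user_agent in particular) contain spaces. A quick local check with awk on a made-up row holding three of the table's columns (ip, user_agent, rescode):

```shell
# Space-splitting breaks the user_agent value apart; comma-splitting
# keeps the three fields intact (assuming the data contains no raw commas).
row='10.0.0.1,Mozilla/5.0 (Windows NT 10.0),200'
comma_fields=$(printf '%s\n' "$row" | awk -F',' '{print NF}')
space_fields=$(printf '%s\n' "$row" | awk '{print NF}')
echo "comma-delimited: $comma_fields fields, space-delimited: $space_fields fields"
```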
Load an HDFS directory into the specified table (note that LOAD DATA INPATH moves the files into the table's directory rather than copying them):
hive -e "load data inpath '/warehouse/tablespace/managed/hive/fact_interfaces_visits_wxs' into table default.fact_interfaces_visits_wx ;"
Creating internal and external tables in Hive (the difference is the EXTERNAL keyword):
Create an external table:
CREATE external TABLE IF NOT EXISTS FACT_INTERFACES_VISITS_WX(ID int ,USER_ID string,ACCOUNT string ,USER_NAME string,USER_TYPE int,SCHOOL_ID int,SCHOOL_NAME string,CODE string,URL string,MODULE string,NAME string,DETAIL_INFO string,IP string,USER_AGENT string,VISIT_CLIENT string,VISIT_ENV string,CREATE_TIME DATE,DURATION int,RESCODE string,RESMSG string,REQ_ID string,SESSIONID string,RECV_TIME DATE )ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS ORC;
Create an internal (managed) table:
CREATE TABLE IF NOT EXISTS FACT_INTERFACES_VISITS_WX(ID int ,USER_ID string,ACCOUNT string ,USER_NAME string,USER_TYPE int,SCHOOL_ID int,SCHOOL_NAME string,CODE string,URL string,MODULE string,NAME string,DETAIL_INFO string,IP string,USER_AGENT string,VISIT_CLIENT string,VISIT_ENV string,CREATE_TIME DATE,DURATION int,RESCODE string,RESMSG string,REQ_ID string,SESSIONID string,RECV_TIME DATE ) ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS ORC;
Internal vs. external tables:
A table created without the EXTERNAL keyword is an internal (managed) table; one created with EXTERNAL is an external table.
Differences:
An internal table's data is managed by Hive itself; an external table's data is managed on HDFS, with Hive only tracking the metadata;
Internal table data is stored under hive.metastore.warehouse.dir (default: /user/hive/warehouse); the storage location of an external table is chosen by the user (if no LOCATION is given, Hive creates a directory named after the table under /user/hive/warehouse on HDFS and stores the table's data there);
Dropping an internal table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are left untouched;
Changes to an internal table are synchronized to the metastore directly, whereas after modifying an external table's structure or partitions the table must be repaired (MSCK REPAIR TABLE table_name;).
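The drop semantics above can be seen directly in HiveQL. A hedged sketch, assuming the external-table DDL shown earlier (its data directory stays wherever Hive placed it, since no LOCATION was given):

```sql
-- Dropping the external table removes only its metastore entry;
-- the data files under its location remain on HDFS.
DROP TABLE FACT_INTERFACES_VISITS_WX;

-- Recreating the external table over the same location brings the data
-- back without reloading anything; for a partitioned table, re-register
-- the partitions afterwards:
MSCK REPAIR TABLE FACT_INTERFACES_VISITS_WX;
```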