A Summary of Handy Tricks for ETL Work
- Connecting to Hive directly with beeline
beeline -u jdbc:hive2://node01:10000/cpdl_raw -n root -p 123456
- yarn
List YARN applications: yarn application -list
Kill a YARN application: yarn application -kill <application ID>
- Connecting to Oracle from the shell
Modify, query, delete:
sqlplus -L user/password@IP:1521/db <<EOF
set serveroutput on
set linesize 120
set pagesize 0
set TAB off
set FEEDBACK off
set HEADING off
set TRIMOUT off
set Verify off
DELETE FROM test WHERE name='test';
insert into test values('xiaoming',15);
COMMIT;
QUIT
EOF
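As written, the heredoc above does not tell the calling shell whether the SQL succeeded. A minimal sketch of one way to surface failures, assuming the SQL*Plus directive WHENEVER SQLERROR EXIT FAILURE ROLLBACK is acceptable for the job; the function name run_dml and its connect-string argument are illustrative and not part of the original notes:

```shell
#!/usr/bin/env bash
# Sketch: run DML through sqlplus and report failure through the exit status.
# run_dml is a hypothetical helper name; $1 is a connect string such as
# user/password@IP:1521/db.

run_dml() {
  # -S: silent mode (no banner), -L: attempt the logon only once
  sqlplus -S -L "$1" <<EOF
WHENEVER SQLERROR EXIT FAILURE ROLLBACK
SET FEEDBACK OFF
DELETE FROM test WHERE name='test';
INSERT INTO test VALUES('xiaoming',15);
COMMIT;
QUIT
EOF
}

# Usage (commented out so that sourcing this file has no side effects):
# if ! run_dml 'user/password@IP:1521/db'; then
#   echo "sqlplus reported an error" >&2
#   exit 1
# fi
```

With this directive, the first failing statement makes sqlplus roll back and exit with a nonzero status, so the ETL script can stop instead of continuing on partial data.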
Query, capturing the result in a shell variable:
YS_ABBR=$(sqlplus -S 'user/password'@IP:1521/db << !
set heading off
set feedback off
set pagesize 0
set verify off
set echo off
select * from test;
exit
!
)
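Because sqlplus prints error messages to standard output, the text captured in YS_ABBR may be an error message rather than data. A small sketch of checking the value before using it; has_sqlplus_error and trim are hypothetical helper names, and the check only looks for the ORA- and SP2- error prefixes:

```shell
#!/usr/bin/env bash
# Sketch: validate text captured from sqlplus before using it downstream.
# Both helper names are illustrative.

has_sqlplus_error() {
  # ORA-nnnnn are Oracle database errors; SP2-nnnn are SQL*Plus errors
  printf '%s\n' "$1" | grep -Eq '(ORA|SP2)-[0-9]+'
}

trim() {
  # Strip leading and trailing whitespace from each line of the value
  printf '%s' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Usage (commented out; YS_ABBR is the variable captured above):
# if has_sqlplus_error "$YS_ABBR"; then
#   echo "query failed: $YS_ABBR" >&2
#   exit 1
# fi
# YS_ABBR=$(trim "$YS_ABBR")
```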
- Transferring data between clusters
hadoop distcp -overwrite -m 200 hdfs://node01:8020/tmp/test /tmp/test
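In a scheduled ETL script it usually helps to stop when the copy fails. A minimal sketch wrapping the command above, assuming distcp exits with a nonzero status on failure, as Hadoop command-line tools conventionally do; copy_dir and its default of 20 maps are illustrative choices:

```shell
#!/usr/bin/env bash
# Sketch: run distcp and report failure to the calling script.
# copy_dir is a hypothetical helper name.

copy_dir() {
  # $1 = source URI, $2 = destination path,
  # $3 = maximum number of simultaneous copies (defaults to 20 here)
  if hadoop distcp -overwrite -m "${3:-20}" "$1" "$2"; then
    echo "distcp finished: $1 -> $2"
  else
    echo "distcp failed: $1 -> $2" >&2
    return 1
  fi
}

# Usage (commented out so that sourcing this file has no side effects):
# copy_dir hdfs://node01:8020/tmp/test /tmp/test 200 || exit 1
```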
Skipping the checksum check in distcp
Option 1:
-update -skipcrccheck
Option 2: add a -D parameter
hadoop distcp -Ddfs.checksum.type=CRC32 -update src dst
- mapjoin
mapjoin hint: /*+mapjoin()*/ (the name or alias of the small table goes inside the parentheses)
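In Hive the map-join hint is written as a comment of the form /*+ MAPJOIN(alias) */ placed right after SELECT, where the alias names the small table to load into memory. A minimal sketch of running such a query through beeline, reusing the connection settings from the beeline example at the top. The tables big_table and small_table, their columns, and the function name are hypothetical. The sketch also assumes a Hive version where hive.ignore.mapjoin.hint defaults to true, so the hint only takes effect once that setting is turned off:

```shell
#!/usr/bin/env bash
# Sketch: run a query with a map-join hint through beeline.
# run_mapjoin_query, big_table, small_table, id and name are illustrative names.

run_mapjoin_query() {
  # The first -e turns hint handling on (an assumption about the cluster's
  # defaults); the second -e runs the join with the small table hinted.
  beeline -u jdbc:hive2://node01:10000/cpdl_raw -n root -p 123456 \
    -e "set hive.ignore.mapjoin.hint=false" \
    -e "SELECT /*+ MAPJOIN(s) */ b.id, s.name FROM big_table b JOIN small_table s ON b.id = s.id"
}

# Usage (commented out so that sourcing this file has no side effects):
# run_mapjoin_query
```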