1, Official guide
a, Getting the source code
http://sqoop.apache.org/docs/1.4.7/SqoopDevGuide.html
----> git source: https://git-wip-us.apache.org/repos/asf/sqoop.git
b, How sqoop implements importing data into hbase
http://sqoop.apache.org/docs/1.4.7/SqoopDevGuide.html#_hbase_serialization_extensions
Four Java classes implement this:
wang@wang-pc:~$ git clone https://git-wip-us.apache.org/repos/asf/sqoop.git
Cloning into 'sqoop'...
remote: Counting objects: 49600, done.
remote: Compressing objects: 100% (13040/13040), done.
remote: Total 49600 (delta 24008), reused 48240 (delta 22898)
Receiving objects: 100% (49600/49600), 8.98 MiB | 426.00 KiB/s, done.
Resolving deltas: 100% (24008/24008), done.
Checking connectivity... done.
wang@wang-pc:~$ cd sqoop/src/java/org/apache/sqoop/hbase/
wang@wang-pc:~/sqoop/src/java/org/apache/sqoop/hbase$ ls
HBasePutProcessor.java PutTransformer.java
HBaseUtil.java ToStringPutTransformer.java
## HBasePutProcessor.java
## --> creates an object obj (declared type PutTransformer, concrete class ToStringPutTransformer)
## --> calls obj.getMutationCommand(Map<String, Object> fields) to process the incoming key-value pairs
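
The delegation noted above can be sketched roughly as follows. This is a simplified, self-contained model, not Sqoop's actual code: `SketchTransformer`, `SketchToStringTransformer` and `SketchProcessor` are hypothetical stand-ins for PutTransformer, ToStringPutTransformer and HBasePutProcessor, and plain strings replace the HBase mutation objects so the sketch runs without the Sqoop or HBase jars. It assumes the processor picks the transformer by class name and instantiates it reflectively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.sqoop.hbase.PutTransformer.
// The real class returns HBase mutation objects; plain strings are used here.
abstract class SketchTransformer {
    abstract List<String> getMutationCommand(Map<String, Object> fields);
}

// Hypothetical stand-in for ToStringPutTransformer:
// every non-null field value is rendered with toString().
class SketchToStringTransformer extends SketchTransformer {
    @Override
    List<String> getMutationCommand(Map<String, Object> fields) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (e.getValue() != null) {
                out.add(e.getKey() + "=" + e.getValue().toString());
            }
        }
        return out;
    }
}

// Hypothetical stand-in for HBasePutProcessor:
// loads the transformer by class name, then delegates each record to it.
class SketchProcessor {
    private final SketchTransformer transformer;

    SketchProcessor(String transformerClassName) {
        try {
            Class<? extends SketchTransformer> cls =
                Class.forName(transformerClassName).asSubclass(SketchTransformer.class);
            this.transformer = cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("cannot load " + transformerClassName, ex);
        }
    }

    // One call per input record: the field map goes in, write commands come out.
    List<String> accept(Map<String, Object> fields) {
        return transformer.getMutationCommand(fields);
    }
}
```

Because the processor only depends on the abstract type, swapping in a different subclass changes how records are turned into writes without touching the processor itself.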
2, How sqoop is invoked in a CDH cluster
[root@eadage jars]# which sqoop
/usr/bin/sqoop
[root@eadage jars]# ll /usr/bin/sqoop
lrwxrwxrwx 1 root root 23 11月 15 12:14 /usr/bin/sqoop -> /etc/alternatives/sqoop
[root@eadage jars]# ll /etc/alternatives/sqoop
lrwxrwxrwx 1 root root 60 11月 15 12:14 /etc/alternatives/sqoop -> /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/bin/sqoop
[root@eadage jars]# vi /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/bin/sqoop
#SQOOP_JARS=`ls -f /var/lib/sqoop/*.jar /usr/share/java/*.jar 2>/dev/null`
#if [ -n "${SQOOP_JARS}" ]; then
# export HADOOP_CLASSPATH=$(JARS=(${SQOOP_JARS}); IFS=:; echo "${HADOOP_CLASSPATH}:${JARS[*]}")
#fi
#
#export SQOOP_HOME=$LIB_DIR/sqoop
#exec $LIB_DIR/sqoop/bin/sqoop "$@"
[root@eadage jars]# vi /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/sqoop/bin/sqoop
#source ${bin}/configure-sqoop "${bin}"
#exec ${HADOOP_COMMON_HOME}/bin/hadoop org.apache.sqoop.Sqoop "$@"
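3, Custom implementation class
- Original call chain: HBasePutProcessor --> PutTransformer (abstract class) --> ToStringPutTransformer (implementation class)
So the steps are:
- Step 1: write a custom class that extends the PutTransformer abstract class and defines your own logic
- Step 2: build a jar and put it in sqoop's lib directory
- Run the command: sqoop import -D sqoop.hbase.insert.put.transformer.class=xxx …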
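
Step 1 might look something like the sketch below. To keep it runnable without the Sqoop and HBase jars, it uses hypothetical stand-ins: `PutTransformerStub` models only the row-key-column and column-family settings plus the abstract method of PutTransformer, and `CellWrite` models a single HBase cell write. The trimming rule in `TrimmingPutTransformer` is an invented example of "your own logic", not something taken from Sqoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one HBase cell write (row key, family, qualifier, value).
final class CellWrite {
    final String rowKey;
    final String family;
    final String qualifier;
    final String value;

    CellWrite(String rowKey, String family, String qualifier, String value) {
        this.rowKey = rowKey;
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }
}

// Simplified stand-in for org.apache.sqoop.hbase.PutTransformer.
abstract class PutTransformerStub {
    private String columnFamily;
    private String rowKeyColumn;

    void setColumnFamily(String family) { this.columnFamily = family; }
    String getColumnFamily() { return columnFamily; }
    void setRowKeyColumn(String column) { this.rowKeyColumn = column; }
    String getRowKeyColumn() { return rowKeyColumn; }

    abstract List<CellWrite> getMutationCommand(Map<String, Object> fields);
}

// Example custom logic (an assumption for illustration):
// trim every value and skip nulls and blank strings.
class TrimmingPutTransformer extends PutTransformerStub {
    @Override
    List<CellWrite> getMutationCommand(Map<String, Object> fields) {
        List<CellWrite> writes = new ArrayList<>();
        Object key = fields.get(getRowKeyColumn());
        if (key == null) {
            return writes; // no row key in this record, so nothing to write
        }
        String rowKey = key.toString().trim();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (e.getKey().equals(getRowKeyColumn()) || e.getValue() == null) {
                continue; // the row key itself and null values are not stored as cells
            }
            String value = e.getValue().toString().trim();
            if (!value.isEmpty()) {
                writes.add(new CellWrite(rowKey, getColumnFamily(), e.getKey(), value));
            }
        }
        return writes;
    }
}
```

In a real job the class would extend org.apache.sqoop.hbase.PutTransformer and build HBase client objects instead of `CellWrite`. Its fully qualified name is what goes in place of xxx in the command above.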