Create a directory named sqoop; all of the directories and files below go inside it.
1. Create a lib directory and copy the MySQL JDBC driver jar into it:
mysql-connector-java.jar
2. Create job.properties with the following content:
nameNode=hdfs://ip-150:8020
jobTracker=ip-150:8032
queueName=default
oozieappsRoot=user/hue/oozie-apps
DataRoot=user/hue/oozie/datas
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/${oozieappsRoot}/sqoop/workflow.xml
outputDir=sqoop/output
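With these properties, the local layout that will later be uploaded to HDFS looks like this (a sketch; only the names used in this post):

```
sqoop/
├── job.properties
├── workflow.xml
└── lib/
    └── mysql-connector-java.jar
```

Note that job.properties stays on the local machine and is passed to the oozie CLI at submit time; only workflow.xml and lib/ need to live in HDFS.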
3. Create workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.5" name="sqoop-wf">
    <start to="sqoop-node"/>
    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>hive.metastore.uris</name>
                    <value>thrift://ip-150:9083</value>
                </property>
            </configuration>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:mysql://10.39.21.95:3306/test</arg>
            <arg>--username</arg>
            <arg>xxx</arg>
            <arg>--password</arg>
            <arg>cccc</arg>
            <arg>--query</arg>
            <arg>SELECT * from user where 1=1 and $CONDITIONS</arg>
            <arg>--fields-terminated-by</arg>
            <arg>\t</arg>
            <arg>-m</arg>
            <arg>2</arg>
            <arg>--hive-database</arg>
            <arg>huetest</arg>
            <arg>--hive-table</arg>
            <arg>autouser</arg>
            <arg>--hive-import</arg>
            <arg>--hive-overwrite</arg>
            <arg>--split-by</arg>
            <arg>id</arg>
            <arg>--target-dir</arg>
            <arg>/user/hive/warehouse/huetest.db/autouser</arg>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Note that every Sqoop parameter must be passed as a separate <arg> element; otherwise the job fails with an error along the lines of "unrecognized argument …". If anyone has gotten the <command> style to work, please leave a comment describing your configuration; thanks in advance.
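For reference, the <arg> list above is the element-by-element equivalent of this single sqoop invocation (each <arg> becomes one command-line token; the single quotes below are shell quoting only and must not appear in the XML):

```shell
sqoop import \
  --connect jdbc:mysql://10.39.21.95:3306/test \
  --username xxx --password cccc \
  --query 'SELECT * from user where 1=1 and $CONDITIONS' \
  --fields-terminated-by '\t' \
  -m 2 \
  --hive-database huetest --hive-table autouser \
  --hive-import --hive-overwrite \
  --split-by id \
  --target-dir /user/hive/warehouse/huetest.db/autouser
```

Running this on a gateway node first is a quick way to debug the import logic before wrapping it in Oozie.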
4. Upload the sqoop directory to HDFS.
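The upload can be done with hdfs dfs, using the application root defined in job.properties (a sketch; adjust paths and permissions to your cluster):

```shell
hdfs dfs -mkdir -p /user/hue/oozie-apps/sqoop
hdfs dfs -put -f workflow.xml lib /user/hue/oozie-apps/sqoop/
```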
5. Run the job:
export OOZIE_URL=http://ip-150:11000/oozie/
oozie job -config sqoop/job.properties -run
Then open the Oozie web UI to see the job status and the Hadoop logs:
http://ip-150:11000/oozie
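The status and logs can also be checked from the CLI, where <job-id> is the workflow id printed by the submit command above:

```shell
oozie job -info <job-id>
oozie job -log <job-id>
```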