[Integrating Kettle with CDH 6.1] Hadoop File Output fails when browsing a directory: java.lang.NoClassDefFoundError: com/ctc/wstx/io/SystemId

[Integrating Kettle with CDH 6.1] Assorted errors when reading and writing HDFS from external data sources

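Preface

I recently tried my hand at Kettle. Setup is simple: download the package and unzip it. Configuring the data sources, however, had me stepping into quite a few pitfalls, so I'm recording them here.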

Environment

Here are the versions of the components involved:

kettle: 8.0
CDH: 6.1.0
HADOOP: 3.0.0
MYSQL: 5.5.62

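The error

Before this, I had already downloaded core-site.xml, hdfs-site.xml and the other required files from the CDH HDFS management page and placed them in the corresponding plugin location. I had also copied hadoop-client-3.0.0-cdh6.1.0.jar, hadoop-common-3.0.0-cdh6.1.0.jar and other jars from the Hadoop installation into the lib folder. I had basically followed the usual tutorials found online step by step. Here is the CDH cluster configuration:
[screenshot]
I left the hdfs user's password empty because I simply don't know it (this made no difference later on). Clicking Test:
[screenshot]
Nothing looked seriously wrong. But when I set up writing MySQL data to HDFS and tried to browse the HDFS directory, an error popped up:
[screenshot]
Error details:

Unable to open this step dialog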
java.lang.NoClassDefFoundError: com/ctc/wstx/io/SystemId
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2825)
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2814)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2865)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2839)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2716)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1353)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1325)
	at org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem.resolveFile(HdfsFileSystem.java:116)
	at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:84)
	at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:64)
	at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:790)
	at org.pentaho.di.core.vfs.ConcurrentFileSystemManager.resolveFile(ConcurrentFileSystemManager.java:91)
	at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:712)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:152)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:107)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:103)
	at org.pentaho.big.data.kettle.plugins.hdfs.trans.HadoopFileOutputDialog$29.widgetSelected(HadoopFileOutputDialog.java:1207)
	at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
	at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
	at org.pentaho.big.data.kettle.plugins.hdfs.trans.HadoopFileOutputDialog.open(HadoopFileOutputDialog.java:1316)
	at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:127)
	at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8728)
	at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3214)
	at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:780)
	at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
	at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
	at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1366)
	at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7984)
	at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9245)
	at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:692)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)
Caused by: java.lang.ClassNotFoundException: com.ctc.wstx.io.SystemId
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
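
The "Caused by" line at the bottom of the trace names the class the JVM could not load. One way to find out which jar provides such a class is to scan a folder of jars for the matching .class entry. The following is a minimal Python sketch of that idea, not a step taken in this post; class_entry and jars_containing are hypothetical helper names, and the folder you pass in is whichever lib directory you want to inspect.

```python
import zipfile
from pathlib import Path
from typing import List


def class_entry(name: str) -> str:
    """Convert 'com.ctc.wstx.io.SystemId' or 'com/ctc/wstx/io/SystemId'
    into the path the class has inside a jar."""
    return name.replace(".", "/") + ".class"


def jars_containing(lib_dir: str, class_name: str) -> List[str]:
    """Return the file names of jars directly under lib_dir that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Not a readable zip archive; skip it rather than abort the scan.
            continue
    return hits
```

Calling jars_containing on a Hadoop lib directory with "com.ctc.wstx.io.SystemId" should return the names of any jars there that ship the class. An empty list means the class is not in that folder at all.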

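Problem analysis

Looking at the error, I thought: it just can't find a class called com.ctc.wstx.io.SystemId, so let's go and find it!
So I went to the Hadoop lib directory on the NameNode host and, sure enough, there was a jar named "woodstox-core-5.0.3.jar":
[screenshot]
Naturally, I copied it into Kettle's lib directory and restarted Kettle, then browsed the HDFS directory again. This time a new error appeared:
[screenshot]
Error details:

Unable to open this step dialog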
java.lang.NoSuchMethodError: com.ctc.wstx.io.StreamBootstrapper.getInstance(Ljava/lang/String;Lcom/ctc/wstx/io/SystemId;Ljava/io/InputStream;)Lcom/ctc/wstx/io/StreamBootstrapper;
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2831)
	at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2814)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2865)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2839)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2716)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1353)
	at org.apache.hadoop.conf.Configuration.set(Configuration.java:1325)
	at org.apache.commons.vfs2.provider.hdfs.HdfsFileSystem.resolveFile(HdfsFileSystem.java:116)
	at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:84)
	at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:64)
	at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:790)
	at org.pentaho.di.core.vfs.ConcurrentFileSystemManager.resolveFile(ConcurrentFileSystemManager.java:91)
	at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:712)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:152)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:107)
	at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:103)
	at org.pentaho.big.data.kettle.plugins.hdfs.trans.HadoopFileOutputDialog$29.widgetSelected(HadoopFileOutputDialog.java:1207)
	at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
	at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
	at org.pentaho.big.data.kettle.plugins.hdfs.trans.HadoopFileOutputDialog.open(HadoopFileOutputDialog.java:1316)
	at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:127)
	at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:8728)
	at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:3214)
	at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:780)
	at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
	at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
	at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
	at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1366)
	at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7984)
	at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:9245)
	at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:692)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.pentaho.commons.launcher.Launcher.main(Launcher.java:92)

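This time the complaint is not that the class com.ctc.wstx.io.StreamBootstrapper cannot be found, but that its getInstance method cannot be found. I grabbed a decompiler and took a look inside the jar:
[screenshot]
What the...? Every method that should be there is there! So the problem clearly wasn't as simple as a missing jar or a jar conflict. Calming down and thinking it over (in truth I was panicking; Chinese New Year was close and I wanted to get out and have some fun), it looked like a compatibility problem between the data source version and Kettle.
With that change of approach and some more digging, I finally solved this thorny problem by switching to a different Kettle version. The result:
[screenshot]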
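
As a general point, a NoSuchMethodError is raised when the class the JVM actually loaded does not have the method the caller was compiled against, so it is worth checking whether more than one jar on the classpath supplies the same class. The sketch below is one way to look for that and was not part of the debugging described above; duplicate_classes is a hypothetical helper name, and the default package prefix is simply the one from the stack trace.

```python
import zipfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def duplicate_classes(roots: Iterable[str],
                      package: str = "com/ctc/wstx/") -> Dict[str, List[str]]:
    """Map each class under `package` to the jars that contain it,
    keeping only classes found in more than one jar."""
    owners = defaultdict(list)
    for root in roots:
        # rglob also descends into plugin subfolders below each root.
        for jar in sorted(Path(root).rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    for name in zf.namelist():
                        if name.startswith(package) and name.endswith(".class"):
                            owners[name].append(jar.name)
            except zipfile.BadZipFile:
                continue  # skip files that are not valid zip archives
    return {cls: jars for cls, jars in owners.items() if len(jars) > 1}
```

If the result for the folders Kettle loads from is non-empty, two jars are competing to provide the same class, and which one wins depends on class loading order. An empty result does not rule out a version mismatch, which is consistent with the fix here being a Kettle version that supports CDH 6.1.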

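Solution

Whether you use Kettle 7.x or Kettle 8.0, the "\data-integration\plugins\pentaho-big-data-plugin\hadoop-configurations" folder only shows support for Hadoop 2.x and CDH 5.x. So to support CDH 6.1 you either have to find a plugin yourself or use a newer version of Kettle. Here is the Kettle download location: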

https://sourceforge.net/projects/pentaho/files/Pentaho%208.3/client-tools/

Open the URL, scroll to the bottom, and download pdi-ce-8.3.0.0-371.zip:
[screenshot]
URL for the CDH 6.1 / Hadoop 3.0 plugin:

https://sourceforge.net/projects/pentaho/files/Pentaho%208.3/shims/

Scroll down to pentaho-hadoop-shims-cdh61-package-8.3.2019.05.00-371-dist.zip and click to download it:
[screenshot]
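
To check which Hadoop distributions a given Kettle install supports, as described at the start of this section, you can list the subfolders of hadoop-configurations. A minimal sketch follows, assuming each subfolder of that path corresponds to one supported distribution; available_shims is a hypothetical helper name, and kettle_home stands for your data-integration folder.

```python
from pathlib import Path
from typing import List


def available_shims(kettle_home: str) -> List[str]:
    """List the folder names under plugins/pentaho-big-data-plugin/hadoop-configurations."""
    shim_root = (Path(kettle_home) / "plugins" / "pentaho-big-data-plugin"
                 / "hadoop-configurations")
    if not shim_root.is_dir():
        return []  # folder missing: wrong path or a different install layout
    return sorted(p.name for p in shim_root.iterdir() if p.is_dir())
```

If nothing in the returned list matches your cluster's distribution and version, that install has no bundled support for it, which is the situation this post ran into with CDH 6.1.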

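Afterword

As this example shows, picking the wrong versions brings endless trouble. When choosing components, keep your eyes open and pick ones that are compatible with each other.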
