The steps to reproduce are as follows:
>>> val rdd= sc.textFile("hdfs://123.123.123.123:8020/user/hmh/spark/spark1.input")
>>> rdd.map(_.split(" ")).collect
The following error is reported:
java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$CompleteRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.privateGetPublicMethods(Class.java:2902)
at java.lang.Class.privateGetPublicMethods(Class.java:2911)
at java.lang.Class.getMethods(Class.java:1615)
at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:451)
at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:339)
at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:639)
at java.lang.reflect.Proxy$ProxyClassFactory.apply(Proxy.java:557)
at java.lang.reflect.WeakCache$Factory.get(WeakCache.java:230)
at java.lang.reflect.WeakCache.get(WeakCache.java:127)
at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:419)
at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:719)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:92)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:537)
at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:365)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:262)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:628)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:574)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:147)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
......................
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:570)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:190)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Locating the problem: search the hadoop directory for the protobuf-java*.jar package
>>> find . -name "protobuf-java-*"
hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/protobuf-java-2.5.0.jar
Then look for protobuf.version in the pom.xml of the Spark source package being built
>>> <hadoop.version>1.0.4</hadoop.version>
<protobuf.version>2.4.1</protobuf.version>
<yarn.version>${hadoop.version}</yarn.version>
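The versions on the two sides do not match: Hadoop ships protobuf-java 2.5.0, while Spark's pom.xml specifies 2.4.1.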
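The comparison above can also be scripted. This is only a minimal sketch, assuming the jar follows the usual `protobuf-java-<version>.jar` naming and that pom.xml holds the version in a `<protobuf.version>` property on a single line. The function names are hypothetical and are not part of Hadoop or Spark.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the protobuf version shipped with
# Hadoop against the one Spark's pom.xml is built with.

# Print the version embedded in a jar name such as protobuf-java-2.5.0.jar.
jar_protobuf_version() {
    basename "$1" | sed -n 's/^protobuf-java-\(.*\)\.jar$/\1/p'
}

# Print the value of the first <protobuf.version> property in a pom file.
pom_protobuf_version() {
    sed -n 's/.*<protobuf\.version>\(.*\)<\/protobuf\.version>.*/\1/p' "$1" | head -n 1
}

# Report whether two version strings agree; returns non-zero on mismatch.
compare_protobuf_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: hadoop=$1 spark=$2"
        return 1
    fi
}
```

With the values found in this post, `compare_protobuf_versions 2.5.0 2.4.1` reports a mismatch and returns a non-zero status.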
Solution:
Rebuild Spark:
1. Change protobuf.version in pom.xml to the version Hadoop uses;
2. Clear the local Maven repository, otherwise conflicts will cause the build to fail
>>> cd /root/.m2/repository
>>> rm -rf *
3. Build the Spark distribution
>>> ./make-distribution.sh --tgz -Phadoop-2.5 -Dhadoop.version=2.5.0-cdh5.3.6 -Pyarn -Phive-0.13.1 -Phive-thriftserver
4. Test again
scala> val rdd= sc.textFile("hdfs://123.123.123.123:8020/user/hmh/spark/spark1.input")
scala> rdd.map(_.split(" ")).collect
res0: Array[Array[String]] = Array(Array(aa, 69), Array(cc, 87), Array(bb, 97),
The problem is solved.
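Steps 1 and 2 above can be sketched in shell. This is a rough illustration rather than the exact procedure used in this post: both function names are hypothetical, and the narrower cleanup (removing only the cached protobuf artifacts under `com/google/protobuf`) is an assumption that may not be enough in every case. The post itself clears the whole repository.

```shell
#!/bin/sh
# Hypothetical helpers sketching steps 1 and 2; not part of Spark or Maven.

# Rewrite the <protobuf.version> property in a pom file to the given version.
# A temp file is used instead of `sed -i`, whose syntax differs across systems.
set_protobuf_version() {
    pom=$1
    version=$2
    tmp=$(mktemp) || return 1
    sed "s|<protobuf\.version>[^<]*</protobuf\.version>|<protobuf.version>${version}</protobuf.version>|" "$pom" > "$tmp" \
        && cat "$tmp" > "$pom"
    status=$?
    rm -f "$tmp"
    return $status
}

# Assumption: removing only the cached protobuf artifacts is sufficient.
# Clearing the entire repository, as the post does, is the safer fallback.
purge_cached_protobuf() {
    repo=${1:-$HOME/.m2/repository}
    rm -rf "$repo/com/google/protobuf"
}
```

For the case in this post, that would be `set_protobuf_version pom.xml 2.5.0` followed by `purge_cached_protobuf`, run before make-distribution.sh.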
One error that may occur during the build is: Failed to execute goal org.scala-tools:maven-scala-plugin
Solution: shut down all Hadoop and Spark processes (check with jps), then build again.
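The jps check can be narrowed to the processes that matter. This is a small sketch, assuming jps prints its default `<pid> <MainClass>` lines. The helper name is hypothetical, and the list of daemon names is an assumption that may need extending for a given cluster.

```shell
#!/bin/sh
# Hypothetical helper: filter jps-style output ("<pid> <MainClass>") read
# from stdin and print only the lines that look like Hadoop or Spark daemons.
# The name list is an assumption; extend it to match your deployment.
leftover_daemons() {
    grep -E '^[0-9]+ (NameNode|DataNode|SecondaryNameNode|ResourceManager|NodeManager|Master|Worker|SparkSubmit)$'
}
```

Running `jps | leftover_daemons` prints nothing and returns a non-zero status once no listed daemon is left, at which point the build can be retried.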