1. The Problem
The project is built with IntelliJ IDEA, and a model trained with Spark MLlib is placed under resources. A saved model consists of two parts, data and metadata; when the program loads the metadata, a checksum exception is thrown. The Java code that loads the model is as follows:
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.mllib.tree.model.RandomForestModel;

SparkConf conf = new SparkConf().setMaster("local").setAppName("modelPredict").set("spark.sql.warehouse.dir", System.getProperty("riskArsenalWeb.root") + "/spark-warehouse/");
SparkContext sc = new SparkContext(conf);
// ClassLoader.getResource takes a name relative to the classpath root; a leading "/" makes it return null
String path = GetPredictResultServiceImpl.class.getClassLoader().getResource("model/rfMllibModel").toString();
RandomForestModel rfModel = RandomForestModel.load(sc, path);
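As an aside, ClassLoader.getResource and Class.getResource treat a leading slash differently, which is a common source of NullPointerExceptions in resource-loading code like the above. A JDK-only sketch (the class name is illustrative; it looks up its own .class file as the resource):

```java
// Demonstrates the leading-slash difference between
// ClassLoader.getResource and Class.getResource.
public class ResourceLookupDemo {
    public static void main(String[] args) {
        ClassLoader cl = ResourceLookupDemo.class.getClassLoader();

        // ClassLoader.getResource resolves names relative to the classpath
        // root; a leading "/" never matches, so this prints null.
        System.out.println(cl.getResource("/ResourceLookupDemo.class"));

        // Without the slash, the same resource is found.
        System.out.println(cl.getResource("ResourceLookupDemo.class") != null);

        // Class.getResource strips the leading "/" and then delegates to the
        // class loader, so the absolute form works there.
        System.out.println(ResourceLookupDemo.class.getResource("/ResourceLookupDemo.class") != null);
    }
}
```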
The resulting exception:
org.apache.hadoop.fs.ChecksumException: Checksum file not a length multiple of checksum size in file:/D:/Git/risk-arsenal/risk-arsenal-web/target/risk-arsenal-web/WEB-INF/classes/model/rfMllibModel/metadata/part-00000 at 0 checksumpos: 8 sumLenread: 11
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:233) ~[hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:275) [hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:227) [hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:195) [hadoop-common-2.2.0.jar:na]
at java.io.DataInputStream.read(DataInputStream.java:100) [na:1.7.0_79]
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211) [hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) [hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206) [hadoop-mapreduce-client-core-2.2.0.jar:na]
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45) [hadoop-mapreduce-client-core-2.2.0.jar:na]
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:255) [spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:209) [spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) [spark-core_2.11-2.0.0.jar:2.0.0]
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) [spark-core_2.11-2.0.0.jar:2.0.0]
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) [scala-library-2.11.8.jar:na]
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:389) [scala-library-2.11.8.jar:na]
at scala.collection.Iterator$class.foreach(Iterator.scala:893) [scala-library-2.11.8.jar:na]
at sc
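The message indicates that the .crc sidecar file Spark wrote next to part-00000 no longer agrees with the copy of the model under target/.../WEB-INF/classes. One common culprit in Maven builds (an assumption, to be verified against this project's pom) is resource filtering rewriting binary files as they are copied out of src/main/resources. A sketch of a pom.xml fragment that copies the resources verbatim:

```xml
<!-- pom.xml sketch: copy src/main/resources without filtering, so binary
     model files and their .crc sidecars are not rewritten during the build -->
<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <filtering>false</filtering>
    </resource>
  </resources>
</build>
```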