More code: https://github.com/xubo245/SparkLearning
ADAM Learning, Part 3: ADAMContext class not found
Problem:
hadoop@Master:~/cloud/testByXubo/spark/hs38DH$ ./localNo.sh
Exception in thread "main" java.lang.NoClassDefFoundError: org/bdgenomics/adam/rdd/ADAMContext
at readFileFromHs38DH$.main(readFileFromHs38DH.scala:16)
at readFileFromHs38DH.main(readFileFromHs38DH.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.bdgenomics.adam.rdd.ADAMContext
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 11 more
Unresolved... 2016-03-05
Solved:
Adding the jars to CLASSPATH in /etc/profile did not help;
the fix was passing ADAM's three jars to spark-submit via --jars.
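An alternative to listing the jars on every spark-submit invocation is to bundle the ADAM dependency into the application jar itself (e.g. with the sbt-assembly plugin), so only one fat jar needs to be submitted. A minimal build.sbt sketch; the versions and the assumption that adam-core alone is enough are illustrative, so check the coordinates on Maven Central for your ADAM release:

```scala
// build.sbt sketch -- assumes the sbt-assembly plugin is enabled in project/plugins.sbt.
// Versions below are illustrative, matching the Scala 2.10 / ADAM 0.18.x era of this post.
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // Spark is marked "provided": the cluster supplies it, so it is excluded from the fat jar.
  "org.apache.spark"    %% "spark-core" % "1.5.2" % "provided",
  // adam-core contains org.bdgenomics.adam.rdd.ADAMContext, the class that was missing.
  "org.bdgenomics.adam" %% "adam-core"  % "0.18.2"
)
```

With this, `sbt assembly` produces a single jar and the `--jars` flag can be dropped from the submit scripts below.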
hadoop@Master:~/cloud/testByXubo/spark/hs38DH$ ./localNo.sh
fq0.count:105887
Method 1=> Length:2971 sum:7888989 time:28765ms
Method 2=> Length:2971 sum2:7888989 time:2169ms
Method 3=> Length:2971 sum3:7888989.0 time:1169ms
Local script:
#!/usr/bin/env bash
spark-submit --name hs38DH \
--class readFileFromHs38DH \
--master local \
--jars /home/hadoop/cloud/adam/lib/adam-apis_2.10-0.18.3-SNAPSHOT.jar,/home/hadoop/cloud/adam/lib/adam-cli_2.10-0.18.3-SNAPSHOT.jar,/home/hadoop/cloud/adam/lib/adam-core_2.10-0.18.3-SNAPSHOT.jar \
--executor-memory 512M \
--total-executor-cores 10 hs38DHNo.jar
Cluster script:
hadoop@Master:~/cloud/testByXubo/spark/hs38DH/package$ cat cluster.sh
#!/usr/bin/env bash
spark-submit --name hs38DH \
--class com.adam.code.hs38DH.readFileFromHs38DH \
--master spark://Master:7077 \
--jars /home/hadoop/cloud/adam/lib/adam-apis_2.10-0.18.3-SNAPSHOT.jar,/home/hadoop/cloud/adam/lib/adam-cli_2.10-0.18.3-SNAPSHOT.jar,/home/hadoop/cloud/adam/lib/adam-core_2.10-0.18.3-SNAPSHOT.jar \
--executor-memory 512M \
--total-executor-cores 20 readFileFromHs38DH.jar
Run output:
hadoop@Master:~/cloud/testByXubo/spark/hs38DH/package$ ./cluster.sh
fq0.count:105887
Method 1=> Length:2971 sum:7888989 time:34126ms
Method 2=> Length:2971 sum2:7888989 time:5489ms
Method 3=> Length:2971 sum3:7888989.0 time:1720ms
Both local mode and cluster mode run without problems on the cluster.
Source file:
package com.adam.code.hs38DH
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.bdgenomics.adam.rdd.ADAMContext
import htsjdk.samtools.ValidationStringency
object readFileFromHs38DH {
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("ReadFile")
// .setMaster("local")
val sc = new SparkContext(conf)
val ac = new ADAMContext(sc)
// val file1 = "file/adam/hs38DH/hs38DH.fa"
val file1 = "hdfs://Master/xubo/adam/hs38DH/hs38DH.fa"
//load by SparkContext textFile
val fq0 = sc.textFile(file1)
// fq0.foreach(println)
println("fq0.count:" + fq0.count);
//load Fasta
val fq1 = ac.loadFasta(file1, 10000)
// println("fq1.partitions:" + fq1.partitions);
// println("fq1.partitions length:" + fq1.partitions.length);
// println("fq1.count:" + fq1.count);
// fq1.foreach(println)
//method 1
var startTime = System.currentTimeMillis();
var fq1Sequence = fq1.map(_.getFragmentSequence()).collect
val fq1Length = fq1Sequence.length
// println(fq1Count);
var sum = 0L;
for (i <- 0 until fq1Length) {
val a = fq1Sequence(i).length()
sum = sum + a
// println(sum + ":" + a);
}
var endTime = System.currentTimeMillis();
println("Method 1=> Length:" + fq1Length + " sum:" + sum + " time:" + (endTime - startTime) + "ms");
//method 2
startTime = System.currentTimeMillis();
var fq2Sequence = fq1.map(_.getFragmentLength()).collect
val fq2Length = fq2Sequence.length
// println("fq2Sequence.count:" + fq2Length);
var sum2 = 0L;
// fq2Sequence.foreach(println)
for (i <- 0 until fq2Length) {
sum2 = sum2 + fq2Sequence(i)
// println(sum2 + ":" + fq2Sequence(i));
}
// println(sum2);
endTime = System.currentTimeMillis();
println("Method 2=> Length:" + fq2Length + " sum2:" + sum2 + " time:" + (endTime - startTime) + "ms");
startTime = System.currentTimeMillis();
var fq3Sequence = fq1.map(_.getFragmentLength()).collect
val fq3Length = fq3Sequence.length
var sum3 = fq3Sequence.map(a => a.toDouble).sum;
// for (i <- 0 until fq3Length) {
// sum2 = sum2 + fq3Sequence(i)
// }
endTime = System.currentTimeMillis();
println("Method 3=> Length:" + fq3Length + " sum3:" + sum3 + " time:" + (endTime - startTime) + "ms");
}
}
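The timing gap between method 1 and methods 2/3 comes mostly from what gets collected to the driver: method 1 ships every fragment's full sequence string just to measure its length, while methods 2 and 3 ship only the precomputed lengths. The remaining difference between 2 and 3 is just summation style. A plain-Scala sketch of the two styles, with made-up stand-in lengths:

```scala
object SumStyles {
  def main(args: Array[String]): Unit = {
    // Stand-in for the collected per-fragment lengths (hypothetical values).
    val lengths: Array[Int] = Array(10000, 10000, 7532)

    // Method 2 style: imperative loop with a mutable Long accumulator.
    var sum2 = 0L
    for (i <- 0 until lengths.length) sum2 += lengths(i)

    // Method 3 style: one expression over the collection (yields a Double here,
    // which is why the post's output shows "7888989.0" for sum3).
    val sum3 = lengths.map(_.toDouble).sum

    println(s"sum2=$sum2 sum3=$sum3") // prints sum2=27532 sum3=27532.0
  }
}
```

Both styles give the same total; in the post's runs the dominant cost is the `collect`, not the loop, so preferring `getFragmentLength` over `getFragmentSequence` is what matters.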
"Master" above stands for the actual IP of the master node.