We use Redis with Spark to cache our key-value pairs. This is the code:
    import com.redis.RedisClient

    val r = new RedisClient("192.168.1.101", 6379)
    val perhit = perhitFile.map(x => {
      val arr = x.split(" ")
      val readId = arr(0).toInt
      val refId = arr(1).toInt
      val start = arr(2).toInt
      val end = arr(3).toInt
      val refStr = r.hmget("refStr", refId).get(refId).split(",")(1)
      val readStr = r.hmget("readStr", readId).get(readId)
      val realend = if (end > refStr.length - 1) refStr.length - 1 else end
      val refOneStr = refStr.substring(start, realend)
      (readStr, refOneStr, refId, start, realend, readId)
    })
But when I run it, it fails with this exception:
    Exception in thread "main" org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
        at org.apache.spark.rdd.RDD.map(RDD.scala:270)
        at com.ynu.App$.main(App.scala:511)
        at com.ynu.App.main(App.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.io.NotSerializableException: com.redis.RedisClient
        at java.io.ObjectOutputStream