1. Scenario:
Run the following in spark-shell:
scala> val lines=sc.textFile("/user/dev_yx/dpi/input/rule/keyWord.txt")
scala> lines.count()
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 61 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 66 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
... 68 more
Running with spark-submit:
spark-submit --class com.spark --master local --driver-memory 2g --executor-memory 2g --num-executors 10 --executor-cores 4 --queue test3 test-0.0.1-SNAPSHOT.jar
throws the same exception.
Changing the invocation to:
spark-submit --class com.spark --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 1g --num-executors 10 --executor-cores 4 --queue test3 original-test-0.0.1-SNAPSHOT.jar
runs normally.
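If local mode is still needed, a common workaround is to put the hadoop-lzo jar on Spark's classpath explicitly instead of switching to yarn-cluster mode. A sketch via spark-defaults.conf follows; the jar and native-library paths are assumptions and must be adjusted to the actual installation:

```
# spark-defaults.conf -- hypothetical paths, point these at your hadoop-lzo install
spark.driver.extraClassPath      /usr/lib/hadoop/lib/hadoop-lzo.jar
spark.executor.extraClassPath    /usr/lib/hadoop/lib/hadoop-lzo.jar
spark.executor.extraLibraryPath  /usr/lib/hadoop/lib/native
```

The same jar can also be passed per job with `--jars` on the spark-submit command line.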
2. Cause:
Compression is enabled in Hadoop's core-site.xml and mapred-site.xml, with LZO as the codec. As a result, files written/uploaded to HDFS are automatically compressed with LZO. When such a file is then read with sc.textFile, Spark throws the LzoCodec-not-found exceptions shown above.
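The cluster-side configuration that triggers this typically registers the LZO codec in core-site.xml; a representative sketch (exact values vary by cluster):

```xml
<!-- core-site.xml: codecs registered cluster-wide (representative values) -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

Because `io.compression.codecs` is loaded eagerly by CompressionCodecFactory, every class in the list must be resolvable on the client's classpath, even when the file being read is not LZO-compressed.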
This is because in Spark on YARN mode, the Hadoop YARN setting yarn.nodemanager.local-dirs overrides Spark's spark.local.dir.
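To confirm whether the codec class is visible to the driver before any file is read, a quick probe can be run from spark-shell. A minimal sketch; the class name is taken from the error message above:

```scala
// Probe the classpath for the LZO codec class, so the problem surfaces
// as a readable message instead of a ClassNotFoundException inside sc.textFile.
val codecClass = "com.hadoop.compression.lzo.LzoCodec"
scala.util.Try(Class.forName(codecClass)) match {
  case scala.util.Success(_) =>
    println(s"$codecClass found on the classpath")
  case scala.util.Failure(_) =>
    println(s"$codecClass NOT found -- add the hadoop-lzo jar to the classpath")
}
```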