Configuring Spark to Enable LZO Compression

This assumes you have already set up LZO at the operating-system level and for Hadoop.

Here we go ahead and configure LZO for Spark itself; otherwise, when Spark submits a job that involves reading or writing LZO files, it will throw errors.

[hadoop@hadoop004 conf]$ pwd
/home/hadoop/app/spark-2.3.3-bin-2.6.0-cdh5.7.0/conf

[hadoop@hadoop004 conf]$ vim spark-defaults.conf


spark.driver.extraClassPath /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar
spark.executor.extraClassPath /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar

Note that hadoop-lzo-0.4.21-SNAPSHOT.jar must match, character for character, the jar file actually present in /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/ on your machine.
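A quick way to confirm the exact jar name before editing spark-defaults.conf (your version may differ from the one shown here):

[hadoop@hadoop004 conf]$ ls /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/ | grep lzo
hadoop-lzo-0.4.21-SNAPSHOT.jar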

[hadoop@hadoop004 conf]$ vim spark-env.sh

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/yarn/*:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/yarn/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/*:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/hdfs/*:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/hdfs/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/*:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/lib/*:/home/hadoop/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/tools/lib/*:/home/hadoop/app/spark-2.3.3-bin-2.6.0-cdh5.7.0/jars/*
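Optionally, before opening a new shell, you can verify that the native libraries referenced by LD_LIBRARY_PATH are actually in place; a hadoop-lzo build typically installs them as libgplcompression.* (names may vary with your build):

[hadoop@hadoop004 conf]$ ls /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/lib/native | grep gplcompression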

That is all the configuration. Let's open spark-shell and try it out.

scala> import com.hadoop.compression.lzo.LzopCodec
import com.hadoop.compression.lzo.LzopCodec

scala> val lzoTest = sc.parallelize(1 to 10)
lzoTest: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:25

scala> lzoTest.saveAsTextFile("/input/test_lzo", classOf[LzopCodec])

[hadoop@hadoop004 conf]$ hdfs dfs -ls /input/test_lzo
Found 3 items
-rw-r--r--   1 hadoop supergroup          0 2019-05-15 11:27 /input/test_lzo/_SUCCESS
-rw-r--r--   1 hadoop supergroup         60 2019-05-15 11:27 /input/test_lzo/part-00000.lzo
-rw-r--r--   1 hadoop supergroup         61 2019-05-15 11:27 /input/test_lzo/part-00001.lzo
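As a quick sanity check, you can read the compressed output back in the same spark-shell session; assuming your Hadoop core-site.xml registers the LZO codecs, textFile decompresses the .lzo parts transparently and should print the ten numbers we just wrote:

scala> sc.textFile("/input/test_lzo").collect().foreach(println)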

Next, let's submit a job via spark-submit and see whether it still throws errors.

[hadoop@hadoop004 conf]$ spark-submit --class com.ruozedata.spark.wc.WordCountApp --master yarn hdfs://hadoop004:9000/lib/spark-train-1.0.jar hdfs://hadoop004:9000/wc_input2

If you see output like the following, the LZO configuration succeeded:

19/05/15 11:29:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop004:43988 (size: 21.6 KB, free: 366.3 MB)
19/05/15 11:29:58 INFO spark.SparkContext: Created broadcast 0 from textFile at WordCountApp.scala:10
19/05/15 11:29:58 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
19/05/15 11:29:58 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev f1deea9a313f4017dd5323cb8bbb3732c1aaccc5]
19/05/15 11:29:58 INFO mapred.FileInputFormat: Total input paths to process : 1

And the results come out:

(hello,451)
(world,762)
(hi,180)
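For reference, the WordCountApp submitted above is an ordinary word count. A minimal sketch of what such an app looks like (the package matches the jar above, but the body is an assumed reconstruction, not the author's exact source):

package com.ruozedata.spark.wc

import org.apache.spark.{SparkConf, SparkContext}

// Minimal word-count sketch: read the input path passed on the command line,
// split lines into words, count them, and print the (word,count) pairs.
object WordCountApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCountApp"))
    sc.textFile(args(0))            // e.g. hdfs://hadoop004:9000/wc_input2
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)
    sc.stop()
  }
}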

Well done!!!