Spark 下载版本间的区别

Spark 下载版本间的区别
困惑于Spark官网的Pre-built for Apache Hadoop和Pre-built with user-provided Apache Hadoop的区别。
为何下载个Spark还搞那么多的版本???本文以2.1.1版作为测试。

Pre-built for Apache Hadoop
在一台安装了Java的机器上执行以下命令

# 0. 进入测试目录
cd /tmp

# 1. 下载Spark 并解压
wget https://archive.apache.org/dist/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz
tar -xf spark-2.1.1-bin-hadoop2.7.tgz

# 2. 运行Spark
cd /tmp/spark-2.1.1-bin-hadoop2.7 &&\
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
./examples/jars/spark-examples_2.11-2.1.1.jar \
100
# 程序是可以正常运行的且可以看到
# 类似于Pi is roughly 3.1408123140812316 的字样

Pre-built with user-provided Apache Hadoop
在一台安装了Java的机器上执行以下命令

# 0. 进入测试目录
cd /tmp

# 1. 下载Spark 并解压
wget https://archive.apache.org/dist/spark/spark-2.1.1/spark-2.1.1-bin-without-hadoop.tgz
tar -xf spark-2.1.1-bin-without-hadoop.tgz

# 2. 运行Spark
cd /tmp/spark-2.1.1-bin-without-hadoop &&\
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
./examples/jars/spark-examples_2.11-2.1.1.jar \
100
# 直接报错,java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream

# 3. 与之前的spark-2.1.1-bin-hadoop2.7.tgz 对比以下jar 包
ls spark-2.1.1-bin-hadoop2.7/jars/ > h.txt
ls spark-2.1.1-bin-without-hadoop/jars/ > w.txt
diff -y -W 50 h.txt w.txt  # -y 并排对比,-W 列宽
# 看到右边的w.txt 内容中少了很多Hadoop 的包

# 4. 下载Hadoop 并解压
cd /tmp && wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
tar -xf hadoop-2.7.2.tar.gz

# 5. 确认Hadoop 的版本
/tmp/hadoop-2.7.2/bin/hadoop version

# 6. 关联Spark-Without-Hadoop 和Hadoop
cat > /tmp/spark-2.1.1-bin-without-hadoop/conf/spark-env.sh << 'EOF'
#!/usr/bin/env bash
export SPARK_DIST_CLASSPATH=$(/tmp/hadoop-2.7.2/bin/hadoop classpath)
EOF

# 7. 再次运行Spark
cd /tmp/spark-2.1.1-bin-without-hadoop &&\
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
./examples/jars/spark-examples_2.11-2.1.1.jar \
100 2>&1 | grep 'Pi is'
# 成功显示Pi is roughly 3.1424395142439514

总结
由上实验可见,[Pre-built with user-provided Apache Hadoop](#Pre-built with user-provided Apache Hadoop)版需要自己修改配置文件去适配Hadoop,实际就是执行时在CLASSPATH中加入Hadoop的Jar包,而[Pre-built for Apache Hadoop](#Pre-built for Apache Hadoop)则是做到了开箱即用,将提前对应的Hadoop Jar包捆绑在其中,同时因为Hadoop 2.6和Hadoop 2.7的HDFS等接口不一样,所以Pre-built for Apache Hadoop分了两个版本。

  • 3
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值