Spark (3): Installing Spark on a Single Machine

1 Download

Download page: http://spark.apache.org/downloads.html
The version downloaded here is built "without Hadoop" (no pre-bundled Hadoop libraries); after a simple configuration step it works with any Hadoop version.

2 Single-Machine Installation

2.1 Install the Java JDK

2.2 Install Hadoop

Spark can use HDFS and YARN, so Hadoop must be installed first.
However, if your Spark jobs never touch HDFS, Hadoop does not need to be running; it only needs to be installed.
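To illustrate the point, here is a minimal sketch (not from the original post; names and values are illustrative) that runs Spark in local mode with no HDFS involved, so the Hadoop daemons need not be started. Save it as a .py file and run it with spark-submit, as shown in section 2.4:

from pyspark import SparkContext

# Local mode: nothing touches HDFS, so Hadoop does not have to be running.
sc = SparkContext("local[*]", "LocalOnlyDemo")
data = sc.parallelize(["spark", "runs", "fine", "without", "hdfs"])
print(data.count())  # prints 5
sc.stop()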

2.3 Install Spark

2.3.1 Extract the archive

$ sudo tar -xzvf /home/hadoop/Desktop/spark-2.4.2-bin-without-hadoop.tgz -C /usr/local
$ cd /usr/local
$ sudo mv ./spark-2.4.2-bin-without-hadoop/ ./spark
$ sudo chown -R hadoop:hadoop spark/
$ gedit /home/hadoop/.bashrc

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

$ source /home/hadoop/.bashrc

2.3.2 Edit spark-env.sh

$ cd /usr/local/spark/conf
$ cp ./spark-env.sh.template spark-env.sh
$ gedit spark-env.sh

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)

With this setting in place, Spark can store data to, and read data from, the Hadoop distributed file system (HDFS).
Without it, Spark can only read and write local data, not HDFS data.
Once configured, Spark can be used directly; unlike Hadoop, there is no startup command to run.
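As a quick check that the classpath setting works, the sketch below reads a file from HDFS (save it as a script and run it with spark-submit). The HDFS address and file path are assumptions; adjust them to the fs.defaultFS in your core-site.xml and to a file you have actually uploaded:

from pyspark import SparkContext

sc = SparkContext("local[*]", "HdfsReadCheck")
# hdfs://localhost:9000 and the input path are assumptions; with
# SPARK_DIST_CLASSPATH set, Spark can resolve the hdfs:// scheme.
lines = sc.textFile("hdfs://localhost:9000/user/hadoop/input.txt")
print(lines.count())
sc.stop()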

2.4 Run the examples

The directory /usr/local/spark/examples/src/main contains Spark's example programs, with Scala, Java, Python, and R versions.
Run the SparkPi example, which computes an approximation of pi.
(1) Scala version, run with run-example
$ run-example SparkPi
Pi is roughly 3.144195720978605
Log output (abridged):
spark.SparkContext: Running Spark version 2.4.2
spark.SparkContext: Submitted application: Spark Pi
Successfully started service 'sparkDriver' on port 44037.
Created local directory at /tmp/blockmgr-ac1cea86-5092-4e7b-92e8-aedb55cd2164
MemoryStore started with capacity 413.9 MB
Started ServerConnector@37eb7628{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
Successfully started service 'SparkUI' on port 4040.
Bound SparkUI to 0.0.0.0, and started at http://192.168.1.112:4040
Server created on 192.168.1.112:36417
Stopped Spark@37eb7628{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
Stopped Spark web UI at http://192.168.1.112:4040
spark.SparkContext: Successfully stopped SparkContext
Deleting directory /tmp/spark-a1ecab95-ffc0-4bdc-986a-e361082d6bcc
Deleting directory /tmp/spark-57ecb328-079c-409f-aa81-bd0fea255b8b

(2) Python version, which must be run with spark-submit
$ spark-submit /usr/local/spark/examples/src/main/python/pi.py
Pi is roughly 3.131640
Log output (abridged):
Running Spark version 2.4.2
spark.SparkContext: Submitted application: PythonPi
Successfully started service 'sparkDriver' on port 39419.
Created local directory at /tmp/blockmgr-7304b67f-e55f-40e0-a376-ee23c8af6656
MemoryStore started with capacity 413.9 MB
Started ServerConnector@6fc8f2a{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
Successfully started service 'SparkUI' on port 4040.
Bound SparkUI to 0.0.0.0, and started at http://192.168.1.112:4040
Starting executor ID driver on host localhost
Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/hadoop/spark-warehouse').
Warehouse path is 'file:/home/hadoop/spark-warehouse'.
Pi is roughly 3.131640
Stopped Spark@6fc8f2a{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
Stopped Spark web UI at http://192.168.1.112:4040
Successfully stopped SparkContext
Deleting directory /tmp/spark-24407c80-129f-4c38-a9a7-c3fd7ae29267
Deleting directory /tmp/spark-6da4353c-4072-41ba-b1b5-01747d9faffb
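For reference, the bundled pi.py estimates pi by Monte Carlo sampling: throw random points into the unit square and count how many land inside the quarter circle, so pi ≈ 4 × inside / total. A condensed sketch of the same idea (simplified, not the exact bundled source; runnable with spark-submit):

from random import random
from pyspark import SparkContext

sc = SparkContext("local[*]", "PiSketch")
n = 100000  # number of random samples

def inside(_):
    # Sample a point in the unit square; count it if it falls
    # inside the quarter circle of radius 1.
    x, y = random(), random()
    return 1 if x * x + y * y <= 1 else 0

count = sc.parallelize(range(n)).map(inside).reduce(lambda a, b: a + b)
print("Pi is roughly %f" % (4.0 * count / n))
sc.stop()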

3 Interactive Programming with Spark

Spark's interactive environment supports Scala and Python.

3.1 Scala interactive environment

$ spark-shell
If spark-shell fails with the error jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V,
it is probably caused by the version being too new; switching back to Spark 2.1.1 resolved the problem here.
Spark context Web UI available at http://192.168.1.112:4040
Spark context available as 'sc' (master = local[*], app id = local-1556961357179).
Spark session available as 'spark'.
scala>

3.2 Python interactive environment

$ pyspark
Using Python version 3.5.2 (default, Nov 12 2018 13:43:14)
SparkSession available as 'spark'.
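At the pyspark prompt, sc and spark are already created, so you can experiment directly. A small illustrative session (the values are just an example):

>>> rdd = sc.parallelize(range(1, 101))
>>> rdd.filter(lambda x: x % 2 == 0).count()
50
>>> spark.range(5).count()
5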


4 Switching the Python version used by pyspark

Append the following to spark-env.sh:

export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
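
After restarting pyspark, a quick way to confirm that the switch took effect is to check the interpreter version from inside the shell:

>>> import sys
>>> print(sys.version)  # should now report a Python 3.x interpreter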