Installing Scala, sbt, and Spark

1. Install Scala

Scala download link

sudo tar -zxvf scala-2.11.6.tgz -C /opt
sudo mv /opt/scala-2.11.6 /opt/scala

sudo vim /etc/profile

#scala environment
export SCALA_HOME=/opt/scala
export PATH=${SCALA_HOME}/bin:$PATH

source /etc/profile    # run without sudo; source is a shell builtin

2. Install sbt

sudo tar -zxf sbt-1.2.8.tgz -C /opt
sudo mv /opt/sbt-1.2.8 /opt/sbt    # skip if the archive already extracted to /opt/sbt

sudo vim /etc/profile

#sbt environment
export SBT_HOME=/opt/sbt
export PATH=${SBT_HOME}/bin:$PATH

source /etc/profile

sbt sbtVersion

3. Install Spark

Spark download link

sudo tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt
sudo mv /opt/spark-3.0.0-bin-hadoop3.2 /opt/spark

sudo vim /etc/profile

#Spark environment
export SPARK_HOME=/opt/spark
export PATH=${SPARK_HOME}/bin:$PATH
# settings for running PySpark from a directly-started Jupyter notebook
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
export PYSPARK_PYTHON=/home/hadoop/envs/py3/bin/python3
export PYSPARK_DRIVER_PYTHON=/home/hadoop/envs/py3/bin/python3
# settings for having pyspark launch Jupyter instead (uncomment to use)
#export PYSPARK_DRIVER_PYTHON=ipython
#export PYSPARK_DRIVER_PYTHON_OPTS="notebook"


source /etc/profile

pyspark
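
The pyspark command above drops you into an interactive PySpark shell with a SparkContext already created as sc. A quick smoke test you can paste there (nothing here is specific to this setup):

# `sc` is predefined by the pyspark shell; this checks a simple driver/executor round trip
print(sc.version)
print(sc.parallelize(range(10)).sum())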

Alternatively, you can start jupyter notebook directly and run the following:

from pyspark import SparkContext, SparkConf

conf = SparkConf()
conf.setAppName("My app")

# If a SparkContext already exists (e.g. the notebook was launched via pyspark),
# stop it before creating one with our own configuration.
if "sc" in globals():
    sc.stop()
sc = SparkContext(conf=conf)

# Word count over a text file stored in HDFS
lines = sc.textFile("hdfs://localhost:9000/hive/zxx.db/t")
words = lines.flatMap(lambda line: line.split(" "))
keyvalue = words.map(lambda word: (word, 1))
result = keyvalue.reduceByKey(lambda x, y: x + y)
print(result.collect())
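
collect() pulls every (word, count) pair back to the driver, which is fine for a small file; to keep the output on HDFS instead, the RDD can be written out directly. A short continuation of the example above (the output path is hypothetical):

# Continues from the `result` RDD above.
# saveAsTextFile writes one part-* file per partition; the target directory must not already exist.
result.saveAsTextFile("hdfs://localhost:9000/user/hadoop/wordcount_output")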

Connecting Spark to Hive

In spark-shell, test whether this Spark build supports connecting to Hive; if the import below fails, install a Spark build compiled with Hive support.

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

Configure /opt/spark/conf/spark-env.sh:

export JAVA_HOME=/opt/java
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)

export CLASSPATH=$CLASSPATH:/opt/hive/lib
export SCALA_HOME=/opt/scala
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export HIVE_CONF_DIR=/opt/hive/conf
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/spark/jars/mysql-connector-java-5.1.47-bin.jar

Then copy Hive's configuration file into Spark's conf directory:

cp /opt/hive/conf/hive-site.xml /opt/spark/conf
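
Before going back to Jupyter, you can sanity-check the Hive wiring from the pyspark shell; a minimal check, relying only on the spark SparkSession that the Spark 3.x shell predefines:

# Run inside the pyspark shell, where `spark` already exists.
# If hive-site.xml is being picked up, this lists the databases from the Hive
# metastore rather than only the built-in `default` database.
spark.sql("show databases").show()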

Start Jupyter and run:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf()
conf.setAppName("My app")
sc = SparkContext(conf=conf)

# HiveContext reads hive-site.xml from Spark's conf directory and queries the Hive metastore
sqlContext = HiveContext(sc)
my_dataframe = sqlContext.sql("select * from t")
my_dataframe.show()
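
HiveContext still works in Spark 3.0 but has been deprecated since Spark 2.0; a rough equivalent using the SparkSession API, with the same table t as above:

from pyspark.sql import SparkSession

# enableHiveSupport makes the session use the Hive metastore configured in hive-site.xml
spark = (SparkSession.builder
         .appName("My app")
         .enableHiveSupport()
         .getOrCreate())

my_dataframe = spark.sql("select * from t")
my_dataframe.show()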

 
