Setting up a Jupyter Notebook + PySpark environment

Main idea and steps:

1. Install the Anaconda environment as usual

2. conda/pip install findspark

    # This step is important. findspark's purpose: it provides findspark.init() to make pyspark importable as a regular library (a quick check follows below).
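A quick way to confirm the installation works, as a minimal sketch; it assumes SPARK_HOME points at your Spark installation (e.g. /usr/local/spark), otherwise pass the path to findspark.init() explicitly:

import findspark

# findspark.init() locates Spark via SPARK_HOME (or an explicit path)
# and adds pyspark to sys.path, so the import below succeeds.
findspark.init()                      # or: findspark.init('/usr/local/spark')

import pyspark
print(pyspark.__version__)            # printing a version means pyspark is importable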

3. Install and configure Jupyter Notebook as usual, and start it in the background (a config sketch follows below)
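A minimal configuration sketch for running the classic notebook server headless on a remote machine; it assumes the config file was generated with jupyter notebook --generate-config, and the IP/port values below are placeholders:

# ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.ip = '0.0.0.0'          # listen on all interfaces so a remote browser can connect
c.NotebookApp.port = 8888             # default port; change it if the port is already taken
c.NotebookApp.open_browser = False    # headless server: open the browser from your own machine

With that in place, a command such as nohup jupyter notebook & keeps the server running in the background.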

4. Open the Jupyter Notebook page in a browser and write code

Word count example:

import findspark
findspark.init()
from pyspark import SparkConf, SparkContext

# Point the application at the standalone cluster's master and cap its resources.
conf = SparkConf().setAppName('testApp') \
                  .setMaster('spark://mdw-1:7077') \
                  .set('spark.executor.memory', '2g') \
                  .set('spark.executor.cores', '2') \
                  .set('spark.cores.max', '56')
sc = SparkContext(conf=conf)

# Word count on a local file (the path must exist on the worker nodes as well).
textFile = sc.textFile('file:///usr/local/spark/README.md')
wordCount = textFile.flatMap(lambda line: line.split()) \
                    .map(lambda word: (word, 1)) \
                    .reduceByKey(lambda a, b: a + b)
for x in sorted(wordCount.collect()):
    print(x)

print('\n' * 2, '*' * 20, '\n' * 2)

# The same word count, reading the input from HDFS instead.
textFile = sc.textFile('hdfs://mdw-1:9000/user/bda/README.md')
wordCount = textFile.flatMap(lambda line: line.split()) \
                    .map(lambda word: (word, 1)) \
                    .reduceByKey(lambda a, b: a + b)
for x in sorted(wordCount.collect()):
    print(x)

sc.stop()
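If no standalone cluster is available, the same notebook code also runs against a local master; a minimal sketch, assuming Spark is installed on the notebook machine:

import findspark
findspark.init()
from pyspark import SparkConf, SparkContext

# 'local[*]' runs Spark inside the notebook process, using all local cores.
conf = SparkConf().setAppName('testAppLocal').setMaster('local[*]')
sc = SparkContext(conf=conf)
print(sc.parallelize(['to', 'be', 'or', 'not', 'to', 'be']).countByValue())
sc.stop()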

5. Alternatively, pyspark can also be integrated into the IPython environment

Findspark can add a startup file to the current IPython profile so that the environment variables will be properly set and pyspark will be imported upon IPython startup. This file is created when edit_profile is set to True.

ipython --profile=myprofile                               # start IPython under the chosen profile
findspark.init('/path/to/spark_home', edit_profile=True)  # run once inside that session to write the startup file

 

 

References:

How to configure an Apache Spark standalone cluster and integrate with Jupyter: Step-by-Step

配置Ipython Nodebook 运行 Python Spark 程序 (Configuring IPython Notebook to run Python Spark programs)

jupyter notebook + pyspark 环境搭建 (Jupyter Notebook + PySpark environment setup)

minrk/findspark

 

Reposted from: https://my.oschina.net/goopand/blog/2963135
