Running PySpark on Linux: connecting PyCharm to a remote Linux Python interpreter to run PySpark

PySpark in PyCharm on a remote server

1. Make sure Python and Spark are installed correctly on the remote host

2. Install and configure on the remote host

vi /etc/profile

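Add one line. The py4j version in the path must match the archive that actually ships in $SPARK_HOME/python/lib, for example:

export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip

or, if the archive in that directory is py4j-0.8.2.1-src.zip:

export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip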

source /etc/profile
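Because the py4j file name differs between Spark releases, it can help to read it from disk instead of typing it by hand. The following is a minimal sketch, not part of the original setup: the helper name pythonpath_export_line is hypothetical, and it assumes the archive follows the py4j-<version>-src.zip naming seen above.

```python
import glob
import os


def pythonpath_export_line(spark_home):
    """Build the export line for /etc/profile from the py4j archive found on disk."""
    lib_dir = os.path.join(spark_home, "python", "lib")
    # Assumption: the archive is named py4j-<version>-src.zip, as in the lines above.
    archives = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    if not archives:
        raise FileNotFoundError("no py4j-*-src.zip under " + lib_dir)
    # If several archives are present, take the last one in sorted order.
    py4j_name = os.path.basename(archives[-1])
    return ("export PYTHONPATH=$SPARK_HOME/python/:"
            "$SPARK_HOME/python/lib/" + py4j_name)
```

Calling pythonpath_export_line(os.environ["SPARK_HOME"]) on the remote host returns the line to paste into /etc/profile.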

# Install pip and py4j

Download pip-7.1.2.tar

tar -xvf pip-7.1.2.tar

cd pip-7.1.2

python setup.py install

pip install py4j
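After installing, a quick check run with the remote interpreter shows whether the modules can be located at all. This is a small sketch using only the standard library; the function name missing_modules is made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that the current interpreter cannot locate."""
    # find_spec returns None for a top-level module that is not on sys.path.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(["py4j", "pyspark"]) should return an empty list once PYTHONPATH and py4j are set up as described above.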

# Avoid the tty check when connecting over ssh

cd /etc

chmod 640 sudoers

vi /etc/sudoers

Comment out the requiretty line so that it reads:

#Defaults requiretty
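To confirm the edit took effect, the sudoers text can be scanned for an active requiretty setting. The sketch below is a simplified check, not a full sudoers parser: it only looks at plain "Defaults" lines, and the function name requiretty_enabled is hypothetical.

```python
def requiretty_enabled(sudoers_text):
    """Return True if an uncommented 'Defaults ... requiretty' line is present."""
    for raw_line in sudoers_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and lines that have been commented out.
        if not line or line.startswith("#"):
            continue
        tokens = line.replace(",", " ").split()
        # '!requiretty' (the negated form) is a different token and is not matched.
        if tokens[0] == "Defaults" and "requiretty" in tokens[1:]:
            return True
    return False
```

Reading /etc/sudoers requires root, so pass the file contents in as a string, for example from a root shell.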

3. Local PyCharm settings

File > Settings > Project Interpreter

Project Interpreter > Add Remote (prerequisite: Python is installed successfully on the remote host)

Note: the Python path here is the Python interpreter path. If Python is installed in a different location, change the path accordingly.

Run > Edit Configurations (prerequisite: the local directory is shared successfully with the virtual machine)

I configured the path mapping under Tools instead:

Tools > Deployment > Configuration

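The deployment mapping pairs a local project root with a remote root, and each local file is addressed on the server by the same relative path. As a rough illustration of that idea (not PyCharm's actual implementation), the sketch below translates a local path to its remote counterpart. The function name and the local Windows path are hypothetical; the remote path is the one that appears in the run output in the test section.

```python
import posixpath


def to_remote_path(local_path, local_root, remote_root):
    """Translate a local project path to its remote counterpart under a root mapping."""
    # Normalise Windows-style separators so the same code works for either local OS.
    local_path = local_path.replace("\\", "/")
    local_root = local_root.replace("\\", "/").rstrip("/")
    if local_path != local_root and not local_path.startswith(local_root + "/"):
        raise ValueError("path is outside the mapped local root")
    relative = local_path[len(local_root):].lstrip("/")
    return posixpath.join(remote_root, relative) if relative else remote_root


# Hypothetical local root mapped to the remote project directory.
print(to_remote_path(
    "D:\\work\\pysparkProgram\\Mainprogram.py",
    "D:\\work\\pysparkProgram",
    "/home/hadoop/TestFile/pysparkProgram",
))
```

If a script runs locally but the remote interpreter reports that the file is missing, a wrong root in this mapping is a likely cause.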

4. Test

import os
import sys

# Point Python at the Spark installation on the remote host.
os.environ['SPARK_HOME'] = '/root/spark-1.4.0-bin-hadoop2.6'
sys.path.append("/root/spark-1.4.0-bin-hadoop2.6/python")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("Successfully imported Spark Modules")
except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)

Result:

ssh://hadoop@192.168.1.131:22/usr/bin/python -u /home/hadoop/TestFile/pysparkProgram/Mainprogram.py

Successfully imported Spark Modules

Process finished with exit code 0

Or:

import sys

sys.path.append("/root/programs/spark-1.4.0-bin-hadoop2.6/python")

try:
    import numpy as np
    import scipy.sparse as sps
    from pyspark.mllib.linalg import Vectors

    # Dense vectors: a NumPy array and a plain Python list.
    dv1 = np.array([1.0, 0.0, 3.0])
    dv2 = [1.0, 0.0, 3.0]
    # Sparse vectors: an MLlib SparseVector and a single-column SciPy csc_matrix.
    sv1 = Vectors.sparse(3, [0, 2], [1.0, 3.0])
    sv2 = sps.csc_matrix((np.array([1.0, 3.0]), np.array([0, 2]), np.array([0, 2])), shape=(3, 1))
    print(sv2)
except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)

Result:

ssh://hadoop@192.168.1.131:22/usr/bin/python -u /home/hadoop/TestFile/pysparkProgram/Mainprogram.py

(0, 0) 1.0

(2, 0) 3.0

Process finished with exit code 0
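The printed coordinates follow from the three arrays passed to csc_matrix: the values, their row indices, and the column pointers. The sketch below decodes that compressed-column layout in plain Python, without SciPy, to show why the output lists rows 0 and 2 of column 0. The function name csc_to_coords is made up for illustration.

```python
def csc_to_coords(data, indices, indptr):
    """Expand CSC arrays into ((row, col), value) pairs, column by column."""
    entries = []
    for col in range(len(indptr) - 1):
        # indptr[col]:indptr[col + 1] is the slice of data/indices owned by this column.
        for k in range(indptr[col], indptr[col + 1]):
            entries.append(((indices[k], col), data[k]))
    return entries


# Same arrays as sv2 above: values, row indices, column pointers.
print(csc_to_coords([1.0, 3.0], [0, 2], [0, 2]))
# prints [((0, 0), 1.0), ((2, 0), 3.0)]
```

With a single column, the pointer array [0, 2] says that both stored values belong to column 0, and the row indices [0, 2] place them in rows 0 and 2.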

References:

https://edumine.wordpress.com/2015/08/14/pyspark-in-pycharm/

http://renien.github.io/blog/accessing-pyspark-pycharm/

http://www.tuicool.com/articles/MJnYJb

See also:

http://blog.csdn.net/u011196209/article/details/9934721
