This article walks through connecting PyCharm to a remote Linux machine and running PySpark there, step by step.
PySpark in PyCharm on a remote server
1. Make sure Python and Spark are installed correctly on the remote server
2. Installation and setup on the remote server
vi /etc/profile  # add the following line, then reload the profile:
export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip
source /etc/profile
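To confirm the change took effect in a new shell, a quick check like the following helps (a minimal sketch; it only prints the variable set above):

import os

# PYTHONPATH should contain the Spark python/ directory and the py4j zip
print(os.environ.get('PYTHONPATH'))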
# Install pip and py4j
# Download pip-7.1.2.tar, then:
tar -xvf pip-7.1.2.tar
cd pip-7.1.2
python setup.py install
pip install py4j
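After installing, it is worth checking that py4j is importable from the same interpreter (a minimal sketch; py4j.__file__ simply shows where the package landed):

try:
    import py4j
    print("py4j installed at", py4j.__file__)
except ImportError as e:
    print("py4j is not installed:", e)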
# Avoid the tty check when sudo runs over ssh
cd /etc
chmod 640 sudoers
vi /etc/sudoers  # comment out the line: Defaults requiretty
chmod 440 sudoers  # restore the expected permissions, or sudo will refuse to run
3. Local PyCharm settings
File > Settings > Project Interpreter:
Project Interpreter > Add Remote (prerequisite: Python is installed successfully on the remote server):
Note that the Python path here is the remote interpreter path; if Python is installed elsewhere, adjust the path accordingly.
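If you are ever unsure which interpreter a run actually uses, printing sys.executable from a remote run makes it obvious (a minimal sketch):

import sys

# With the remote interpreter configured correctly, this prints the remote
# path (e.g. /usr/bin/python), not a path on your local machine
print(sys.executable)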
Run > Edit Configurations (prerequisite: the local project directory is shared with the virtual machine):
I set up the path mappings under Tools:
Tools > Deployment > Configuration
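To verify that the deployment mapping uploads the script where you expect, printing the script's own path from a remote run is a simple test (a minimal sketch; the expected directory should match the mapped remote path, like the /home/hadoop/TestFile/pysparkProgram/ path seen in the run output below):

import os

# Should print the mapped remote directory, not a local one
print(os.path.abspath(__file__))
print(os.getcwd())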
4. Test
import os
import sys

os.environ['SPARK_HOME'] = '/root/spark-1.4.0-bin-hadoop2.6'
sys.path.append("/root/spark-1.4.0-bin-hadoop2.6/python")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("Successfully imported Spark Modules")
except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)
Result:
ssh://hadoop@192.168.1.131:22/usr/bin/python -u /home/hadoop/TestFile/pysparkProgram/Mainprogram.py
Successfully imported Spark Modules
Process finished with exit code 0
Or:
import sys

sys.path.append("/root/programs/spark-1.4.0-bin-hadoop2.6/python")

try:
    import numpy as np
    import scipy.sparse as sps
    from pyspark.mllib.linalg import Vectors

    dv1 = np.array([1.0, 0.0, 3.0])  # dense vector as a NumPy array
    dv2 = [1.0, 0.0, 3.0]            # dense vector as a Python list
    sv1 = Vectors.sparse(3, [0, 2], [1.0, 3.0])  # sparse vector: size 3, nonzeros at indices 0 and 2
    # Same data as a SciPy CSC matrix with a single column
    sv2 = sps.csc_matrix((np.array([1.0, 3.0]), np.array([0, 2]), np.array([0, 2])), shape=(3, 1))
    print(sv2)
except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)
Result:
ssh://hadoop@192.168.1.131:22/usr/bin/python -u /home/hadoop/TestFile/pysparkProgram/Mainprogram.py
  (0, 0)    1.0
  (2, 0)    3.0
Process finished with exit code 0
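With the imports verified, a short end-to-end smoke test confirms that Spark itself runs from PyCharm. This is a minimal sketch assuming the same Spark 1.4.0 install path as above; the app name "PyCharmRemoteTest" and the local[2] master are arbitrary choices for the test:

import os
import sys

# Same Spark install paths as in the first test script
os.environ['SPARK_HOME'] = '/root/spark-1.4.0-bin-hadoop2.6'
sys.path.append("/root/spark-1.4.0-bin-hadoop2.6/python")

from pyspark import SparkConf, SparkContext

# Run Spark locally on the remote machine with two worker threads
conf = SparkConf().setAppName("PyCharmRemoteTest").setMaster("local[2]")
sc = SparkContext(conf=conf)

# A trivial job: square the numbers 1..4 and collect the results
print(sc.parallelize([1, 2, 3, 4]).map(lambda x: x * x).collect())
sc.stop()

If this prints [1, 4, 9, 16], the whole chain — remote interpreter, deployment mapping, and Spark — is working.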