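Summary
Deep learning is a branch of machine learning, and Spark is a handy tool for getting started with machine learning.
Prerequisites
Ubuntu 14 (ships with Python 2.7)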
py4j-0.10.4.tar.gz
spark-2.1.0-bin-hadoop2.7.tgz
jdk-8u121-linux-x64.tar.gz
Extracting the archives is skipped here.
Set the environment variables
nano ~/.bashrc
export JAVA_HOME=~/jdk/jdk1.8.0_121
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
export SPARK_HOME=~/spark/spark-2.1.0-bin-hadoop2.7
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PATH=${SPARK_HOME}/bin:$PATH
# Optional: uncomment to make PySpark use Python 3 instead of the system Python 2.7
# export PYSPARK_PYTHON=python3
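The new variables only take effect in a new shell, or after reloading the file with `source ~/.bashrc`. As a quick sanity check, the following minimal sketch reports which of the two home variables are unset or do not point to an existing directory. The function name `missing_spark_env` is hypothetical and only used for illustration.

```python
import os


def missing_spark_env(env=None):
    """Return the names of variables that are unset or not an existing directory."""
    # Default to the real environment; a plain dict can be passed in for testing.
    env = os.environ if env is None else env
    problems = []
    for name in ("JAVA_HOME", "SPARK_HOME"):
        # expanduser handles a literal "~" in case the shell did not expand it.
        path = os.path.expanduser(env.get(name, ""))
        if not path or not os.path.isdir(path):
            problems.append(name)
    return problems
```

Calling `missing_spark_env()` in a Python shell should return an empty list once both directories exist and the variables have been loaded.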
Install py4j
tar xf py4j-0.10.4.tar.gz
cd py4j-0.10.4/
sudo python setup.py install
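To confirm that the installation worked before starting Spark, one option is to test whether the modules can be imported. This is a minimal sketch; the helper name `can_import` is hypothetical, and it only checks importability, not version compatibility.

```python
import importlib


def can_import(module_name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
```

With the setup above, both `can_import("py4j")` and `can_import("pyspark")` are expected to return True. If `pyspark` fails, the PYTHONPATH entry is the first thing to check.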
Run the following code in the Python shell
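from pyspark import SparkConf, SparkContext

# Run Spark locally in a single thread under the application name "My App"
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)
# Printing the context confirms that it was created
print(sc)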
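Printing the context only shows that it was created. To see Spark actually run a job, a small computation can be sketched as below. The names `sum_of_squares` and `main` and the application name "Smoke Test" are assumptions made for this example, not part of the original setup.

```python
def sum_of_squares(sc, n):
    """Sum the squares of 0..n-1 using an RDD on the given SparkContext."""
    return sc.parallelize(range(n)).map(lambda x: x * x).sum()


def main():
    # Imported here so that the function above can be loaded without PySpark.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("Smoke Test")
    # getOrCreate reuses a context that already exists in this Python process.
    sc = SparkContext.getOrCreate(conf)
    print(sum_of_squares(sc, 10))
```

Calling `main()` should print 285, the sum of the squares of 0 through 9. If a context named `sc` already exists in the shell, `sum_of_squares(sc, 10)` can be called on it directly.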