After setting up the Spark environment on the Hadoop cluster and trying out the Spark shell, it is worth working through the official quick start again.
Running a standalone program (SimpleApp.py):
First, write the program (here using the Python API):
from pyspark import SparkContext

logFile = "README.md"  # note: this path is resolved against HDFS
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()  # cache the RDD, since it is scanned twice below
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print("lines with a: %i, lines with b: %i" % (numAs, numBs))
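The two filter/count pairs simply count how many lines contain the letter 'a' and how many contain 'b'. A plain-Python sketch of the same logic (no Spark required, with a made-up three-line sample standing in for README.md) is handy for sanity-checking what the job should print:

```python
# Plain-Python equivalent of the two filter/count jobs above.
# The sample lines are invented for illustration; on a real file
# the counts will of course differ.
sample = [
    "Apache Spark is a fast engine",
    "built on hadoop",
    "# Quick start",
]
num_as = sum(1 for line in sample if 'a' in line)  # lines containing 'a'
num_bs = sum(1 for line in sample if 'b' in line)  # lines containing 'b'
print("lines with a: %i, lines with b: %i" % (num_as, num_bs))
```

Note that the checks are case-sensitive, just like the lambda filters in the Spark version.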
Then go into the Spark installation directory and run:
hadoop@Mhadoop:/usr/local/spark/spark-1.3.1-bin-hadoop2.4$ vi /home/hadoop/Public/SimpleApp.py
hadoop@Mhadoop:/usr/local/spark/spark-1.3.1-bin-hadoop2.4$ ./bin/spark-submit --master local \
    /home/hadoop/Public/SimpleApp.py
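Here `--master local` runs the job in a single local process, which is convenient for this walkthrough. To submit the same script to a standalone cluster instead, you would point `--master` at the cluster's master URL; a hedged sketch (the host and port below are placeholders, adjust to your cluster):

```shell
# Sketch only: spark://HOST:7077 is the default standalone master URL form;
# replace Mhadoop:7077 with your actual master host and port.
./bin/spark-submit --master spark://Mhadoop:7077 \
    /home/hadoop/Public/SimpleApp.py
```

The program itself is unchanged; only the master URL decides where the tasks execute.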