1. Running a distributed computing program in Python for the first time.
(1) In a Linux terminal, run the following command:
cat inputFile.txt | python mrMeanMapper.py
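(2) On Windows, enter the following command in a DOS window:
python mrMeanMapper.py < inputFile.txt
The result is shown below, but I ran into some problems (no solution found so far):
The current directory has to be switched to the Python installation directory first; otherwise the command fails with: 'python' is not recognized as an internal or external command, operable program or batch file.
The files to be executed and read (the .py and .txt files) also had to be placed in this installation directory.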
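The "'python' is not recognized" message generally means that the command prompt found no python.exe in the current directory or in any directory listed in the PATH environment variable. The following is a minimal sketch for checking this from inside Python itself, run from the installation directory where the interpreter does start. The helper name dir_on_path is hypothetical and not part of any library.

```python
import os
import sys


def dir_on_path(directory, path_value):
    """Return True if directory is one of the entries of a PATH-style string."""
    target = os.path.normcase(os.path.normpath(directory))
    entries = [p for p in path_value.split(os.pathsep) if p]
    return any(os.path.normcase(os.path.normpath(p)) == target for p in entries)


# Directory that contains the interpreter currently running this script.
python_dir = os.path.dirname(sys.executable)
print("Interpreter directory:", python_dir)
print("On PATH:", dir_on_path(python_dir, os.environ.get("PATH", "")))
```

If the check prints False, adding that directory to PATH should let the python command start from any directory, so the .py and .txt files would no longer need to sit next to the interpreter.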
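(3) Running map and reduce together:
Linux: cat inputFile.txt | python mrMeanMapper.py | python mrMeanReducer.py
Windows: python mrMeanMapper.py < inputFile.txt | python mrMeanReducer.py
At run time, mapperOut is the following two-dimensional list:
Clearly the elements of the second list cannot be converted to float (that row appears to be the mapper's "report: still alive" message, which print(sys.stderr, ...) writes to standard output in Python 3), so the original code below fails: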
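The same pipe can also be driven from Python, which avoids the shell differences between Linux and Windows. This is a minimal sketch using the standard subprocess module. run_pipeline is a hypothetical helper, and the two inline stages only stand in for the real scripts so that the example runs without any extra files.

```python
import subprocess
import sys


def run_pipeline(commands, input_text):
    """Feed input_text to the first command and pipe each stdout into the next stdin."""
    data = input_text
    for cmd in commands:
        # check=True raises CalledProcessError if a stage exits with an error.
        result = subprocess.run(cmd, input=data, capture_output=True,
                                text=True, check=True)
        data = result.stdout  # the captured stderr of the stage is ignored here
    return data


# Stand-in stages: upper-case the text, then count the words.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
count = [sys.executable, "-c",
         "import sys; print(len(sys.stdin.read().split()))"]
print(run_pipeline([upper, count], "a b c\n").strip())

# For the scripts in this post the stages would presumably be:
#   [sys.executable, "mrMeanMapper.py"], [sys.executable, "mrMeanReducer.py"]
# with the contents of inputFile.txt passed as input_text.
```

Using sys.executable instead of the bare name python means the child processes start with the same interpreter that runs the script, whatever the PATH setting is.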
#for instance in mapperOut:
#    nj = float(instance[0])
#    cumN += nj
#    cumVal += nj*float(instance[1])
#    cumSumSq += nj*float(instance[2])
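The extra row can be reproduced without Hadoop or the input file. The sketch below captures standard output and standard error separately and compares the call used in the listing with the Python 3 equivalent of the book's Python 2 statement print >> sys.stderr. The function names are made up for the illustration.

```python
import io
import sys
from contextlib import redirect_stderr, redirect_stdout


def capture(func):
    """Run func and return (stdout_text, stderr_text)."""
    out, err = io.StringIO(), io.StringIO()
    with redirect_stdout(out), redirect_stderr(err):
        func()
    return out.getvalue(), err.getvalue()


def positional_print():
    # sys.stderr is passed as an ordinary value, so its repr and the
    # message are both written to standard output.
    print(sys.stderr, "report: still alive")


def stderr_print():
    # The file keyword is what actually redirects the message.
    print("report: still alive", file=sys.stderr)


wrong_out, wrong_err = capture(positional_print)
right_out, right_err = capture(stderr_print)
print("positional call wrote to stdout:", bool(wrong_out))
print("file= call wrote to stdout:", bool(right_out))
```

In a pipe, anything a mapper writes to standard output becomes input for the reducer, which would explain a second row that has no tab-separated numbers in it.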
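I changed the code as follows so that only the first row is used. strip() removes leading and trailing whitespace from each field; float() already ignores such whitespace, so the call is a precaution rather than a requirement: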
instance = mapperOut[0]
nj = float(instance[0].strip())
cumN += nj
cumVal += nj*float(instance[1].strip())
cumSumSq += nj*float(instance[2].strip())
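Reading only mapperOut[0] works when a single mapper feeds the reducer, but it would ignore the output of any further mappers. As an alternative sketch (not the book's code), the reducer could keep the loop and skip every line that is not three tab-separated numbers. The names parse_row and reduce_rows are hypothetical.

```python
def parse_row(line):
    """Return (n, mean, mean_sq) for a well-formed mapper line, else None."""
    fields = line.strip().split("\t")
    if len(fields) != 3:
        return None
    try:
        n, mean, mean_sq = (float(field) for field in fields)
    except ValueError:
        return None
    return n, mean, mean_sq


def reduce_rows(lines):
    """Combine per-mapper (count, mean, mean of squares) rows into global values."""
    cum_n = cum_val = cum_sum_sq = 0.0
    for line in lines:
        row = parse_row(line)
        if row is None:
            continue  # e.g. a status message that ended up on stdout
        n, mean, mean_sq = row
        cum_n += n
        cum_val += n * mean        # n * mean restores the mapper's sum
        cum_sum_sq += n * mean_sq  # n * mean_sq restores its sum of squares
    if cum_n == 0:
        raise ValueError("no valid mapper rows")
    return cum_n, cum_val / cum_n, cum_sum_sq / cum_n


sample = ["2\t1.5\t2.5", "<stderr> report: still alive", "3\t4.0\t16.0 "]
print(reduce_rows(sample))
```

Silently skipping rows can hide real data problems, so counting or logging the skipped lines to standard error may be preferable in practice.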
The source code from the book, including my modification, is as follows:
#mrMeanMapper.py
import sys
import numpy as np

#read the input line by line
def read_input(file):
    for line in file:
        #rstrip() removes trailing characters (whitespace by default) from the string
        yield line.rstrip()

inputs = read_input(sys.stdin) #generator over the input lines
inputs = [float(line) for line in inputs] #overwrite with floats
numInputs = len(inputs)
inputs = np.array(inputs) #a plain array is enough here (the book uses np.mat)
sqInput = np.power(inputs, 2)

#output size, mean, mean(square values)
print("%d\t%f\t%f" % (numInputs, np.mean(inputs), np.mean(sqInput)))
#Python 3 form of the book's: print >> sys.stderr, "report: still alive"
#print(sys.stderr, "...") would write to stdout and reach the reducer as an extra line
print("report: still alive", file=sys.stderr)

#mrMeanReducer.py
import sys

def read_input(file):
    for line in file:
        yield line.rstrip()

lines = read_input(sys.stdin) #generator over the input lines
#split input lines into separate items and store in list of lists
mapperOut = [line.split('\t') for line in lines]

#accumulate total number of samples, overall sum and overall sum sq
cumVal = 0.0
cumSumSq = 0.0
cumN = 0.0
#original loop from the book:
#for instance in mapperOut:
#    nj = float(instance[0])
#    cumN += nj
#    cumVal += nj*float(instance[1])
#    cumSumSq += nj*float(instance[2])
#When the mapper's status message ends up on stdout, mapperOut is a
#two-dimensional list holding two lists, and the second one cannot be
#converted to floats, so only the first row is used here.
#strip() removes leading and trailing whitespace before the conversion.
instance = mapperOut[0]
nj = float(instance[0].strip())
cumN += nj
cumVal += nj*float(instance[1].strip())
cumSumSq += nj*float(instance[2].strip())

#calculate means (divide by the sample count cumN, not cumN+1)
mean = cumVal/cumN
meanSq = cumSumSq/cumN

#output size, mean, mean(square values)
print("%d\t%f\t%f" % (cumN, mean, meanSq))
print("report: still alive", file=sys.stderr)
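What the two scripts compute together can be checked in a single process. The sketch below, in pure Python with hypothetical function names and made-up sample data, splits a list into two chunks, summarizes each chunk the way the mapper does, merges the summaries the way the reducer does, and compares the result with the mean of the whole list. The population variance then follows from the two reduced values.

```python
def map_chunk(values):
    """Mapper step: (count, mean, mean of squares) for one chunk."""
    n = len(values)
    return n, sum(values) / n, sum(v * v for v in values) / n


def reduce_stats(partials):
    """Reducer step: weight every chunk's means by its sample count."""
    total = sum(n for n, _, _ in partials)
    mean = sum(n * m for n, m, _ in partials) / total
    mean_sq = sum(n * s for n, _, s in partials) / total
    return total, mean, mean_sq


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
chunks = [data[:3], data[3:]]  # two chunks of unequal size

total, mean, mean_sq = reduce_stats([map_chunk(c) for c in chunks])
variance = mean_sq - mean ** 2  # population variance: E[x^2] - (E[x])^2

direct_mean = sum(data) / len(data)
print(total, mean, mean_sq, variance)
print("matches direct mean:", abs(mean - direct_mean) < 1e-9)
```

Weighting by the chunk sizes is what makes the merged mean equal to the global one; a plain average of the chunk means would be off whenever the chunks differ in size.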