The formula for the harmonic mean [1]:
$$H=\frac{n}{\frac{1}{x_1}+\frac{1}{x_2}+\frac{1}{x_3}+\cdots+\frac{1}{x_n}}$$
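As a quick worked instance of the formula, take three made-up values $x_1=1$, $x_2=2$, $x_3=4$ (chosen only for illustration):

$$H=\frac{3}{\frac{1}{1}+\frac{1}{2}+\frac{1}{4}}=\frac{3}{1.75}=\frac{12}{7}\approx 1.71$$

The arithmetic mean of the same values is $7/3\approx 2.33$, so the harmonic mean sits closer to the smaller values.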
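The definition is simple, but what is it actually useful for? The blog posts I found online do not say. [2] studies DASH video streaming, that is, how to request a suitable bitrate given a predicted bandwidth. Predicting bandwidth from historical throughput samples is where the harmonic mean comes in. The paper says this method filters out outliers effectively: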
First, the harmonic mean is more appropriate when we want to compute the average of rates which is the case with throughput estimation. Second, it is also more robust to larger outliers.
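Both points in the quote can be checked with a small sketch. The numbers below are made up for illustration (they are not from [2]), and `statistics.harmonic_mean` from the Python 3 standard library stands in for the hand-written formula.

```python
from statistics import harmonic_mean, mean

# 1) Averaging rates: two equally sized chunks downloaded at different rates.
#    The chunk size and the rates are hypothetical values.
chunk_bits = 8000000.0
rates = [1000000.0, 3000000.0]                      # bit/s
total_time = sum(chunk_bits / r for r in rates)     # seconds spent downloading
effective = len(rates) * chunk_bits / total_time    # overall throughput
print(effective, harmonic_mean(rates), mean(rates))

# 2) Robustness to a large outlier: the last sample is an unusually high burst.
samples = [1000.0, 1100.0, 900.0, 1000.0, 10000.0]  # kbit/s, hypothetical
arithmetic = mean(samples)
harmonic = harmonic_mean(samples)
print("arithmetic mean: %.1f" % arithmetic)
print("harmonic mean:   %.1f" % harmonic)
```

In the first part the overall throughput matches the harmonic mean of the two rates, not their arithmetic mean. In the second part the single burst pulls the arithmetic mean up to 2800, while the harmonic mean stays close to the typical samples at roughly 1200.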
Python implementation:
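class HarmnicMean(object):
    """Harmonic mean over a sliding window of the most recent positive samples."""

    def __init__(self, window):
        self.w = window   # window size, in samples
        self.c = 0        # number of samples currently in the window
        self.his = []     # scaled reciprocals (1000 / sample) of those samples

    def newSample(self, s):
        sample = float(s)
        # Non-positive samples have no usable reciprocal, so they are skipped.
        if sample > 0:
            self.his.append(1000 / sample)
            self.c += 1
        # Keep only the most recent w reciprocals.
        if self.c > self.w:
            self.his = self.his[self.c - self.w:]
            self.c = self.w
        # No valid sample yet: return the raw value instead of dividing by zero.
        if self.c == 0:
            return sample
        return self.c * 1000 / sum(self.his)


h = HarmnicMean(20)
fileName = "data_in.txt"
# Each input line is whitespace-separated: column 0 is x, column 2 is the sample.
with open(fileName) as txtData, open("data_out.txt", "w") as f_h:
    for line in txtData:
        lineArr = line.strip().split()
        if len(lineArr) < 3:
            continue  # skip blank or malformed lines
        x = lineArr[0]
        y = float(lineArr[2])
        mean = h.newSample(y)
        f_h.write(x + "\t" + str(mean) + "\n")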
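The same sliding-window idea can also be sketched with `collections.deque`, which drops the oldest entry by itself once `maxlen` is reached. This is an alternative sketch, not the code used above: the class name `SlidingHarmonicMean`, the method `add`, and the sample values are all hypothetical, and it reads from an in-memory list instead of `data_in.txt`.

```python
from collections import deque


class SlidingHarmonicMean:
    """Harmonic mean of the last `window` positive samples (illustrative sketch)."""

    def __init__(self, window):
        # deque(maxlen=...) discards the oldest reciprocal automatically.
        self.reciprocals = deque(maxlen=window)

    def add(self, sample):
        sample = float(sample)
        if sample > 0:  # skip samples that have no usable reciprocal
            self.reciprocals.append(1.0 / sample)
        if not self.reciprocals:
            return sample  # no valid history yet
        return len(self.reciprocals) / sum(self.reciprocals)


est = SlidingHarmonicMean(window=3)
# Hypothetical throughput samples; 50.0 is a one-off spike.
estimates = [est.add(s) for s in [2.0, 2.0, 50.0, 2.0, 2.0, 2.0]]
print(estimates)
```

While the spike is inside the three-sample window the estimate rises only to about 2.94, and it returns to 2.0 as soon as the spike leaves the window.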
Let's see how well it works:
[1] Harmonic mean
[2] Improving Fairness, Efficiency, and Stability in HTTP-based Adaptive Video Streaming with FESTIVE