I have some Python code that I want to convert to Scala Spark. My algorithm is an extension of LDA (Latent Dirichlet Allocation). It has a sampling step that is very time-consuming when the data is large. I know Spark MLlib ships an LDA implementation, but I want to implement my algorithm myself and add some improvements. My code is:

def init_est(self):
    # Gibbs-sampling counters (d: document, l: sentiment label, z: topic, w: word).
    # M = number of documents, V = vocabulary size, K = number of topics.
    self.nd = [0 for x in range(self.dset.M)]                                                              # words in document d
    self.ndl = [[0 for y in range(self.sentinum)] for x in range(self.dset.M)]                             # words in doc d with sentiment l
    self.ndlz = [[[0 for z in range(self.K)] for y in range(self.sentinum)] for x in range(self.dset.M)]   # words in doc d with sentiment l, topic z
    self.nlzw = [[[0 for z in range(self.dset.V)] for y in range(self.K)] for x in range(self.sentinum)]   # times word w gets sentiment l, topic z
    self.nlz = [[0 for y in range(self.K)] for x in range(self.sentinum)]                                  # words with sentiment l, topic z
    self.p = [[0.0 for y in range(self.K)] for x in range(self.sentinum)]                                  # sampling probability buffer
    self.Z = [[] for x in range(self.dset.M)]                                                              # topic assignment per word
    self.L = [[] for x in range(self.dset.M)]                                                              # sentiment assignment per word
    # Model parameters and hyper-parameters.
    self.pi_dl = [[0.0 for y in range(self.sentinum)] for x in range(self.dset.M)]
    self.theta_dlz = [[[0.0 for z in range(self.K)] for y in range(self.sentinum)] for x in range(self.dset.M)]
    self.phi_lzw = [[[0.0 for z in range(self.dset.V)] for y in range(self.K)] for x in range(self.sentinum)]
    self.alpha_lz = [[0.0 for y in range(self.K)] for x in range(self.sentinum)]
    self.alphaSum_l = [0.0 for x in range(self.sentinum)]
    self.beta_lzw = [[[0.0 for z in range(self.dset.V)] for y in range(self.K)] for x in range(self.sentinum)]
    self.betaSum_lz = [[0.0 for y in range(self.K)] for x in range(self.sentinum)]
    self.lambda_lw = [[0.0 for y in range(self.dset.V)] for x in range(self.sentinum)]  # truncated in the original; dimensions inferred from the _lw naming
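To show what I have in mind for the Scala side, here is a minimal sketch of the same counter initialization (not a full Spark port). The class name JstCounters and the parameters M, V, K, S are my own placeholders for self.dset.M, self.dset.V, self.K and self.sentinum; the nested list comprehensions map naturally onto Array.ofDim, which allocates zero-initialized dense arrays:

```scala
// Sketch: dense counters for a sentiment-topic Gibbs sampler in plain Scala.
// M = #documents, V = vocabulary size, K = #topics, S = #sentiment labels
// (names are placeholders, not from the original code).
class JstCounters(val M: Int, val V: Int, val K: Int, val S: Int) {
  val nd: Array[Int]                   = Array.ofDim[Int](M)       // words per document
  val ndl: Array[Array[Int]]           = Array.ofDim[Int](M, S)    // per-doc sentiment counts
  val ndlz: Array[Array[Array[Int]]]   = Array.ofDim[Int](M, S, K)
  val nlzw: Array[Array[Array[Int]]]   = Array.ofDim[Int](S, K, V) // global word counts
  val nlz: Array[Array[Int]]           = Array.ofDim[Int](S, K)
  val p: Array[Array[Double]]          = Array.ofDim[Double](S, K) // sampling buffer
  val Z: Array[Array[Int]]             = Array.fill(M)(Array.empty[Int]) // topic per word, filled during init
  val L: Array[Array[Int]]             = Array.fill(M)(Array.empty[Int]) // sentiment per word
  val piDl: Array[Array[Double]]            = Array.ofDim[Double](M, S)
  val thetaDlz: Array[Array[Array[Double]]] = Array.ofDim[Double](M, S, K)
  val phiLzw: Array[Array[Array[Double]]]   = Array.ofDim[Double](S, K, V)
  val alphaLz: Array[Array[Double]]         = Array.ofDim[Double](S, K)
  val alphaSumL: Array[Double]              = Array.ofDim[Double](S)
  val betaLzw: Array[Array[Array[Double]]]  = Array.ofDim[Double](S, K, V)
  val betaSumLz: Array[Array[Double]]       = Array.ofDim[Double](S, K)
}
```

For the Spark part, my understanding is that the per-document state (nd, ndl, ndlz, Z, L) can live inside document partitions, while the global counters (nlzw, nlz) would have to be broadcast at the start of each iteration and re-aggregated afterwards, in the style of approximate distributed LDA (AD-LDA); whether that approximation is acceptable for my extension is exactly the kind of thing I would like advice on.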