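The Pipeline in Spark ML was probably modeled on Python libraries such as SciPy (the Spark ML guide itself credits scikit-learn as its main inspiration).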
1. The core of a Pipeline is:
import org.apache.spark.ml.Pipeline

val pipeline = new Pipeline()
  .setStages(Array(tokenizer, hashingTF, lr))
// Fit the pipeline to training documents.
val model = pipeline.fit(training)
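pipeline.fit(training) returns a PipelineModel. Stages that are Transformers (such as Tokenizer and HashingTF) are kept as they are, while Estimators (such as LogisticRegression) are fitted on the output of the preceding stages and replaced by their fitted model. The sketch below is a stdlib-only toy model of that behaviour, not Spark code: a "DataFrame" is just a list of Map rows, and every name (ToyPipeline, ToyTokenizer, ToyLengthEstimator, ...) is hypothetical.

```scala
// Toy model of Pipeline.fit (hypothetical names, NOT Spark's classes).
// A "DataFrame" here is just a list of rows; a row maps column names to values.
type Row = Map[String, Any]
type Frame = Seq[Row]

sealed trait Stage
// A Transformer turns one frame into another (the role Tokenizer/HashingTF play).
trait Transformer extends Stage { def transform(df: Frame): Frame }
// An Estimator learns from a frame and returns a fitted Transformer (the role of LogisticRegression).
trait Estimator extends Stage { def fit(df: Frame): Transformer }

// The fitted pipeline: applies every fitted stage in order.
final case class ToyPipelineModel(stages: Seq[Transformer]) extends Transformer {
  def transform(df: Frame): Frame = stages.foldLeft(df)((acc, s) => s.transform(acc))
}

final case class ToyPipeline(stages: Seq[Stage]) {
  // Walk the stages in order: Transformers are kept as-is, Estimators are fitted on the
  // data produced so far and replaced by their model; the data flows on to the next stage.
  def fit(df: Frame): ToyPipelineModel = {
    var current = df
    val fitted = stages.map {
      case t: Transformer =>
        current = t.transform(current)
        t
      case e: Estimator =>
        val m = e.fit(current)
        current = m.transform(current)
        m
    }
    ToyPipelineModel(fitted)
  }
}

// Hypothetical Transformer: lower-cases "text" and splits it into a "words" column.
object ToyTokenizer extends Transformer {
  def transform(df: Frame): Frame =
    df.map(r => r + ("words" -> r("text").toString.toLowerCase.split("\\s+").toList))
}

// Hypothetical Estimator: learns the mean word count, then flags rows that are longer.
object ToyLengthEstimator extends Estimator {
  def fit(df: Frame): Transformer = {
    val counts = df.map(_("words").asInstanceOf[List[String]].size)
    val mean = counts.sum.toDouble / counts.size
    new Transformer {
      def transform(d: Frame): Frame =
        d.map(r => r + ("long" -> (r("words").asInstanceOf[List[String]].size > mean)))
    }
  }
}

val training: Frame = Seq(Map("text" -> "a b c d"), Map("text" -> "b d"))
val model = ToyPipeline(Seq(ToyTokenizer, ToyLengthEstimator)).fit(training)
val predictions = model.transform(Seq(Map("text" -> "Spark is really fun"), Map("text" -> "hi")))
```

In the real API the same thing happens: the returned PipelineModel holds the original Tokenizer and HashingTF instances plus a fitted LogisticRegressionModel in place of lr.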
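2. The stages run in the order given to setStages, and the whole pipeline is wired together through DataFrame columns: each stage is configured with an input column and an output column (setInputCol / setOutputCol), so a later stage reads the column an earlier stage wrote.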
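To make the column wiring concrete, here is a stdlib-only Scala toy; it is not Spark's API, and ColStage, validate and run are hypothetical names. Each stage reads one input column and appends one output column, and a validation pass checks, before any data is touched, that every input column comes from the source data or from an earlier stage. Spark ML performs a comparable check through each stage's transformSchema.

```scala
// Hypothetical toy model, NOT Spark's API: a row is a Map from column name to value.
type Row = Map[String, Any]

// A stage reads one input column and appends one output column,
// mirroring the setInputCol / setOutputCol pattern of Spark ML feature stages.
final case class ColStage(name: String, inputCol: String, outputCol: String, f: Any => Any)

// Check the column wiring before touching any data: every input column must already
// exist, either in the source schema or as the output of an earlier stage.
def validate(sourceCols: Set[String], stages: Seq[ColStage]): Either[String, Set[String]] =
  stages.foldLeft[Either[String, Set[String]]](Right(sourceCols)) { (acc, s) =>
    acc.flatMap { cols =>
      if (cols.contains(s.inputCol)) Right(cols + s.outputCol)
      else Left(s"${s.name}: missing input column '${s.inputCol}'")
    }
  }

// Run the stages in order; each one sees the columns produced by the stages before it.
def run(rows: Seq[Row], stages: Seq[ColStage]): Seq[Row] =
  stages.foldLeft(rows) { (acc, s) =>
    acc.map(r => r + (s.outputCol -> s.f(r(s.inputCol))))
  }

val tokenizer = ColStage("tokenizer", "text", "words",
  v => v.toString.toLowerCase.split("\\s+").toList)
val counter = ColStage("counter", "words", "numWords",
  v => v.asInstanceOf[List[String]].size)

val rows = run(Seq(Map("text" -> "Spark ML pipeline")), Seq(tokenizer, counter))
```

Swapping the two stages makes validate return a Left, because counter would look for a words column that does not exist yet; this is why the order passed to setStages matters.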
(1) Building the tokenizer stage
val training = sqlContext.createDataFrame(Seq( (