ML: LogisticRegression

ML linear models: logistic regression

1. Data input:

tips.csv (the first column is the class label; the remaining columns are features):

1,1,1
1,1.1,0.9
1,1,1.2
2,10,11
2,9,10
2,10,12
3,50,52
3,49,50
3,48,49

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline

# Reuses the session predefined in the pyspark shell, or creates one
spark = SparkSession.builder.getOrCreate()
dataPath = 'tips.csv'  # assumption: set to the actual location of tips.csv

# Spark 2.x+ reads CSV natively ('com.databricks.spark.csv' was the Spark 1.x package);
# without a header row the columns are named _c0, _c1, _c2, ...
data = spark.read.load(dataPath, format='csv', inferSchema='true')
data = data.withColumnRenamed('_c0', 'label')

# Every column except the label is a feature
all_feats = [c for c in data.columns if c != 'label']
assemblerAllFeatures = VectorAssembler(inputCols=all_feats, outputCol='features')
pipeline = Pipeline(stages=[assemblerAllFeatures])
pipelineModel = pipeline.fit(data)
output = pipelineModel.transform(data)
df = output.select('label', 'features')
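For readers without a Spark cluster at hand, here is a plain-Python sketch (no Spark involved) of what the step above produces: each CSV row split into a label taken from column 0 and a feature list built from the remaining columns. The helper name `to_label_features` is hypothetical, and the rows are inlined from tips.csv so the sketch runs without the file.

```python
import csv
import io

# The sample rows of tips.csv, inlined so the sketch runs without the file
SAMPLE = """1,1,1
1,1.1,0.9
1,1,1.2
2,10,11
2,9,10
2,10,12
3,50,52
3,49,50
3,48,49
"""


def to_label_features(text):
    # Column 0 becomes the label and every remaining column goes into the feature list,
    # mirroring withColumnRenamed('_c0', 'label') followed by VectorAssembler(_c1, _c2, ...)
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        rows.append((float(record[0]), [float(v) for v in record[1:]]))
    return rows


rows = to_label_features(SAMPLE)
print(rows[:3])
```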


2. Train the model

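def logisticRegression(df, arguments):
    """
    Fit a LogisticRegression model on a DataFrame with 'label' and 'features' columns.
    Since Spark 2.1 the default family='auto' fits a multinomial model when the label
    has more than two classes (as in tips.csv); Spark 2.0 supports binary labels only.
    """
    from pyspark.ml.classification import LogisticRegression
    # Defaults, overridden by any value supplied in `arguments`
    maxIter = 100
    regParam = 0.0
    elasticNetParam = 0.0
    if arguments.maxIter is not None:
        maxIter = int(arguments.maxIter)  # maxIter is an integer parameter
    if arguments.regParam is not None:
        regParam = float(arguments.regParam)
    if arguments.elasticNetParam is not None:
        elasticNetParam = float(arguments.elasticNetParam)
    lr = LogisticRegression(maxIter=maxIter,
                            regParam=regParam,
                            elasticNetParam=elasticNetParam)
    lrModel = lr.fit(df)
    return lrModel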
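`regParam` and `elasticNetParam` control Spark's elastic-net penalty, regParam * (elasticNetParam * ||w||_1 + (1 - elasticNetParam) / 2 * ||w||_2^2), which is added to the mean log loss; `maxIter` caps the optimizer's iterations. The plain-Python sketch below illustrates that objective for the binary case and minimizes it with simple batch gradient descent for the pure L2 case (elasticNetParam = 0). It is an illustration only, not Spark's implementation: Spark uses quasi-Newton solvers (L-BFGS, or OWL-QN when an L1 term is present) and standardizes features by default, so its coefficients will differ. All function names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # log(1 + exp(z)) without overflow for large |z|
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def elastic_net_penalty(w, reg_param, elastic_net_param):
    # reg_param * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), alpha = elastic_net_param
    l1 = sum(abs(wj) for wj in w)
    l2 = sum(wj * wj for wj in w)
    return reg_param * (elastic_net_param * l1 + (1.0 - elastic_net_param) / 2.0 * l2)


def objective(w, b, X, y, reg_param, elastic_net_param):
    # Mean binary log loss plus the penalty; the intercept b is not penalized
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b
        total += softplus(z) - yi * z  # equals -log P(yi | xi)
    return total / len(X) + elastic_net_penalty(w, reg_param, elastic_net_param)


def fit_l2(X, y, max_iter=100, reg_param=0.0, step=0.01):
    # Batch gradient descent for elastic_net_param = 0 (pure L2 penalty)
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(max_iter):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        # The L2 term reg_param / 2 * ||w||^2 contributes reg_param * w to the gradient
        w = [wj - step * (gj + reg_param * wj) for wj, gj in zip(w, gw)]
        b -= step * gb
    return w, b


# Usage: the first two classes of tips.csv, relabelled 0/1 for the binary case
X = [[1.0, 1.0], [1.1, 0.9], [1.0, 1.2], [10.0, 11.0], [9.0, 10.0], [10.0, 12.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_l2(X, y, max_iter=200, reg_param=0.01)
print(objective(w, b, X, y, 0.01, 0.0))
```

With elasticNetParam > 0 the L1 term is not differentiable at zero, which is why a plain gradient step is not enough there and Spark switches to OWL-QN.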


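# Train on the assembled DataFrame, then persist the model
model = logisticRegression(df, arguments)
modelPath = arguments.modelPath
model.write().overwrite().save(modelPath)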
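The code reads `arguments.maxIter`, `arguments.regParam`, `arguments.elasticNetParam` and `arguments.modelPath`, which suggests an argparse namespace, although the post does not show how it is built. A minimal sketch, assuming argparse and hypothetical flag names chosen to match those attributes:

```python
import argparse


def build_parser():
    # Hypothetical CLI: flag names chosen so argparse stores them under the
    # attribute names the training code reads (maxIter, regParam, ...)
    parser = argparse.ArgumentParser(description="Train and save a LogisticRegression model")
    parser.add_argument("--maxIter", default=None, help="maximum optimizer iterations")
    parser.add_argument("--regParam", default=None, help="regularization strength")
    parser.add_argument("--elasticNetParam", default=None, help="L1/L2 mixing, in [0, 1]")
    parser.add_argument("--modelPath", required=True, help="where to save the model")
    return parser


# Example: override two hyperparameters and leave elasticNetParam unset (None)
arguments = build_parser().parse_args(
    ["--maxIter", "50", "--regParam", "0.01", "--modelPath", "/tmp/lr_model"])
print(arguments)
```

Values arrive as strings, which is why `logisticRegression` converts them with `int()` and `float()`; any option left unset stays `None`, so the function falls back to its defaults.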

3. Prepare the input data for prediction

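from pyspark.sql import Row
from pyspark.ml.linalg import Vectors

# dataSet: one comma-separated feature string with as many values as in training (two here)
# sc: the active SparkContext (predefined in the pyspark shell)
df = sc.parallelize([Row(features=Vectors.dense([float(x) for x in dataSet.split(',')]))]).toDF()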
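The model only accepts vectors with as many features as it was trained on (two for tips.csv), so it can help to validate `dataSet` before building the DataFrame. A hypothetical plain-Python helper, as a sketch:

```python
def parse_features(data_set, expected_dim):
    # Hypothetical helper: split a string such as '10,11' into floats and make sure
    # the count matches the number of feature columns used in training
    values = [float(x) for x in data_set.split(',')]
    if len(values) != expected_dim:
        raise ValueError("expected %d features, got %d" % (expected_dim, len(values)))
    return values


print(parse_features('10,11', 2))
```

In Spark 2.1+ the loaded model should expose a `numFeatures` property that could supply `expected_dim` instead of a hard-coded value.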

Prediction:

from pyspark.ml.classification import LogisticRegressionModel
model = LogisticRegressionModel.load(modelPath)

# Score the single-row DataFrame built above (not the training data) and save the predicted label
result = model.transform(df).head()
str_value = str(result.prediction)
with open("/tmp/foo.txt", "w") as fo:
    fo.write(str_value)
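With labels 1, 2 and 3 in tips.csv, Spark 2.1+ (family='auto') fits a multinomial model, and `prediction` is the class whose raw score (coefficient row · features + intercept) is largest, while `probability` is the softmax of those scores. For a binomial model the prediction is instead 1.0 when P(label = 1) exceeds `threshold` (0.5 by default). A plain-Python sketch of the multinomial rule, using made-up coefficients rather than anything read from a Spark model; `predict_multinomial` is a hypothetical name:

```python
import math


def predict_multinomial(coef_matrix, intercepts, x):
    # Raw score per class k: dot(coef_matrix[k], x) + intercepts[k]
    raw = [sum(c * xi for c, xi in zip(row, x)) + b
           for row, b in zip(coef_matrix, intercepts)]
    # Softmax, shifted by the max score so exp() cannot overflow
    top = max(raw)
    exps = [math.exp(r - top) for r in raw]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Prediction is the index of the largest raw score, returned as a float like Spark's
    prediction = float(max(range(len(raw)), key=lambda k: raw[k]))
    return prediction, probs


# Made-up 3-class coefficients and a single feature vector
coef = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
prediction, probs = predict_multinomial(coef, [0.0, 0.0, 0.0], [0.5, 3.0])
print(prediction, probs)
```

Because the prediction is a double, `str(result.prediction)` writes a value such as `2.0` rather than `2` to the output file.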

