Algorithm Application: Using Logistic Regression

惜于情

Category column: Intermediate Spark Technology

Article link: https://blog.csdn.net/qq_45721573/article/details/117573980

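Experiment Name

Algorithm Application: Using Logistic Regression

Objective

Learn how to use Pipeline and logistic regression.

Principles

(1) Pipeline: a Pipeline chains multiple Transformers and Estimators together to specify an ML workflow.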
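To make the chaining idea concrete, here is a minimal pure-Python sketch. All class names in it (LowercaseSplit, VocabularyCounter, VocabularyModel, ToyPipeline, ToyPipelineModel) are hypothetical and are not part of pyspark.ml; they only imitate the convention that a Transformer exposes transform(), an Estimator exposes fit() and returns a fitted Transformer, and a pipeline runs its stages in order.

```python
# Toy illustration only: these classes are hypothetical, NOT the pyspark.ml API.

class LowercaseSplit:
    """A Transformer: converts rows directly, with nothing to learn."""
    def transform(self, rows):
        return [text.lower().split() for text in rows]


class VocabularyCounter:
    """An Estimator: fit() learns a vocabulary and returns a Transformer."""
    def fit(self, rows):
        vocab = sorted({word for words in rows for word in words})
        return VocabularyModel(vocab)


class VocabularyModel:
    """The fitted Transformer produced by VocabularyCounter.fit()."""
    def __init__(self, vocab):
        self.vocab = vocab

    def transform(self, rows):
        return [[words.count(w) for w in self.vocab] for words in rows]


class ToyPipeline:
    """Chains stages; fitting it yields a model made only of Transformers."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)    # Estimator -> fitted Transformer
            rows = stage.transform(rows)   # output feeds the next stage
            fitted.append(stage)
        return ToyPipelineModel(fitted)


class ToyPipelineModel:
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


model = ToyPipeline([LowercaseSplit(), VocabularyCounter()]).fit(["a b Spark", "b d"])
features = model.transform(["spark spark b"])
print(model.stages[1].vocab, features)
```

The point of the design is that the fitted model replays exactly the same sequence of steps on new data, so training and prediction cannot drift apart.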

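(2) Logistic regression: applies a function g(z), typically the sigmoid g(z) = 1/(1 + e^(-z)), to the output of a linear model. This maps a continuous value to a probability between 0 and 1, which is then thresholded to give a discrete class label such as 0 or 1.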
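As a rough sketch of this mapping, assuming g is the standard sigmoid function, the snippet below computes a linear score, squashes it into a probability, and thresholds it into a label. The weights, bias, and the 0.5 threshold are made-up values for illustration, not parameters learned from any data in this article.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); the result lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features, threshold=0.5):
    """Linear score -> probability -> discrete label."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    label = 1.0 if probability > threshold else 0.0
    return probability, label


# Hypothetical weights and input, chosen only to show the mechanics.
prob, label = predict(weights=[2.0, -1.0], bias=0.5, features=[1.0, 3.0])
print(prob, label)
```

Here the linear score is 2.0 - 3.0 + 0.5 = -0.5, so the probability falls below 0.5 and the predicted label is 0.0.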

Environment

VMware Workstation
Ubuntu 16.04
PyCharm
PySpark

Procedure

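from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer

# Create (or reuse) the SparkSession that the rest of the script refers to as `spark`.
spark = SparkSession.builder.appName("LogisticRegressionPipeline").getOrCreate()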

# Prepare training documents from a list of (id, text, label) tuples.
training = spark.createDataFrame([
    (0, "a b c d e spark", 1.0),
    (1, "b d", 0.0),
    (2, "spark f g h", 1.0),
    (3, "hadoop mapreduce", 0.0)
], ["id", "text", "label"])

# Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.001)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

# Fit the pipeline to training documents.
model = pipeline.fit(training)

# Prepare test documents, which are unlabeled (id, text) tuples.
test = spark.createDataFrame([
    (4, "spark i j k"),
    (5, "l m n"),
    (6, "spark hadoop spark"),
    (7, "apache hadoop")
], ["id", "text"])

# Make predictions on test documents and print columns of interest.
prediction = model.transform(test)
selected = prediction.select("id", "text", "probability", "prediction")
for row in selected.collect():
    # Use `pred` so the loop does not overwrite the `prediction` DataFrame above.
    rid, text, prob, pred = row
    print("(%d, %s) --> prob=%s, prediction=%f" % (rid, text, str(prob), pred))
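The HashingTF stage in the pipeline above turns each list of words into a fixed-length term-frequency vector using the hashing trick. The following pure-Python sketch illustrates the idea only: the function name hashing_tf is hypothetical, and zlib.crc32 with 16 buckets is a stand-in chosen for readability (Spark's HashingTF uses MurmurHash3 and a much larger default number of features), so the indices it produces will not match Spark's.

```python
import zlib


def hashing_tf(words, num_features=16):
    """Map tokens to a fixed-length count vector by hashing each token to an index."""
    vector = [0] * num_features
    for word in words:
        # CRC32 is only a stand-in hash here; it is not what Spark uses.
        index = zlib.crc32(word.encode("utf-8")) % num_features
        vector[index] += 1
    return vector


vec = hashing_tf("spark hadoop spark".split())
print(vec)
```

No vocabulary has to be stored, which is why the stage needs no fitting; the trade-off is that two different words can collide in the same bucket when the vector is small.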
