alink: batch training and model saving, then streaming consumption and text classification

ak3k

Categories: Machine Learning, alink. Tags: flink

Article link: https://blog.csdn.net/asdf1368822590/article/details/118370000

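Background:

Chat messages arriving in Kafka need to be classified in real time as either violating or normal. Each message is labeled and then pushed to a downstream system.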

Versions:

alink 1.4.0, flink 1.12.1

alink documentation: https://www.yuque.com/pinshu/alink_doc

Source code mirror: https://gitee.com/mirrors/Alink

Training data: train.txt

The file is '|'-delimited with a header row. Label 1 means violation, 0 means normal.

label|review
1|我是折扣后台 请一起折扣玩这个游戏 BUFF 果盘 小七 66手游 请折后再找我返现金 微信YYDM63
1|晚上好请折扣充值玩家主动找我返现金，免费98券和激活码buff 果盘 66手游 小七玩家，微信yydm63
1|免费98券和6激活码 BUFF小七自助45折后返现 果盘66游自助三浙后返现 不管哪个区 微信YYDM63
1|我是折扣后台 请BUFF 果盘 66手游 小七玩家自助三浙后再找我返现金 首3续35领98券 返现微信YYDM63
0|扎啤配生拌牛肉那才叫爽ee5
0|头部有糕的话也可以的
0|坐标世界可以发
0|来吧
0|我还有6次
1|老区玩家好 请自助打折后还可以再找我返现金 BUFF 果盘 66手游 小七玩家请主动找我返现 微信YYDM63
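
To make the file layout concrete, here is a minimal plain-Java sketch of how one `label|review` row breaks down into the two columns declared later in the schema. The class `TrainLineParser` and its methods are hypothetical helpers written for illustration; they are not part of Alink, and the sample rows in `main` are made up. Unlike a generic CSV reader, this sketch splits on the first `|` only, which is an assumption made here so that a `|` inside the chat text would survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Alink: parses "label|review" rows. */
class TrainLineParser {

    /** One parsed training row. */
    static final class Sample {
        final int label;      // 1 = violation, 0 = normal
        final String review;  // raw chat text

        Sample(int label, String review) {
            this.label = label;
            this.review = review;
        }
    }

    /** Splits on the first '|' only, so any '|' inside the chat text is kept. */
    static Sample parseLine(String line) {
        int sep = line.indexOf('|');
        if (sep < 0) {
            throw new IllegalArgumentException("missing '|' delimiter: " + line);
        }
        int label = Integer.parseInt(line.substring(0, sep).trim());
        return new Sample(label, line.substring(sep + 1));
    }

    /** Parses all rows, skipping the header line and blank lines. */
    static List<Sample> parse(List<String> lines) {
        List<Sample> samples = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // index 0 is the "label|review" header
            if (!lines.get(i).trim().isEmpty()) {
                samples.add(parseLine(lines.get(i)));
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // Made-up rows in the same layout as train.txt.
        List<String> lines = Arrays.asList("label|review", "0|hello there", "1|add me for cashback");
        for (Sample s : parse(lines)) {
            System.out.println(s.label + " -> " + s.review);
        }
    }
}
```

A quick pass like this over train.txt before training is one way to catch rows with a missing delimiter or a non-numeric label.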

Model training code:

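        // Path where the trained model is saved; the prediction job loads the same path.
        String modelPath = "D:\\navie";
        String trainPath = "train.txt";
        // Read the '|'-delimited training file and skip the "label|review" header row.
        CsvSourceBatchOp trainSource = new CsvSourceBatchOp()
                .setFilePath(trainPath)
                .setFieldDelimiter("|")
                .setSchemaStr("label int , review string")
                .setIgnoreFirstLine(true);

        Pipeline pipeline = new Pipeline(
                // Fill missing review values with the string "null" and write the result to featureText.
                new Imputer()
                        .setSelectedCols("review")
                        .setOutputCols("featureText")
                        .setStrategy("value")
                        .setFillValue("null"),
                // Word segmentation on featureText.
                new Segment()
                        .setSelectedCol("featureText"),
                // Remove stop words from the segmented text.
                new StopWordsRemover()
                        .setSelectedCol("featureText"),
                // Turn the tokens into a term-frequency feature vector.
                new DocCountVectorizer()
                        .setFeatureType("TF")
                        .setSelectedCol("featureText")
                        .setOutputCol("featureVector"),
                // Binary classifier: predicts the label and writes it to the pred column.
                new LogisticRegression()
                        .setVectorCol("featureVector")
                        .setLabelCol("label")
                        .setPredictionCol("pred")
        );

        PipelineModel model = pipeline.fit(trainSource);

        // Save the whole pipeline (preprocessing plus classifier); true overwrites an existing model.
        model.save(modelPath,true);

        // Run the batch job so that the model is actually written.
        BatchOperator.execute();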
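
The `DocCountVectorizer` stage with feature type "TF" is what turns tokens into numbers for the classifier. As a rough illustration of the idea, the plain-Java sketch below computes term frequency as a token's count divided by the total number of tokens in the document. This is an assumption about the general TF definition, not a copy of Alink's implementation, which may differ in details such as vocabulary filtering and vector layout. The class `TermFrequencySketch` and the sample tokens are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration, not Alink code: term frequency of one tokenized document. */
class TermFrequencySketch {

    /** Returns token -> (occurrences of token) / (total tokens in the document). */
    static Map<String, Double> termFrequency(String[] tokens) {
        Map<String, Double> tf = new LinkedHashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1.0, Double::sum); // raw count per token
        }
        for (Map.Entry<String, Double> entry : tf.entrySet()) {
            entry.setValue(entry.getValue() / tokens.length); // normalize by document length
        }
        return tf;
    }

    public static void main(String[] args) {
        // Assumption: the text has already been segmented and stop words removed.
        String[] tokens = {"discount", "cashback", "discount", "wechat"};
        System.out.println(termFrequency(tokens));
    }
}
```

In the spam-like rows of train.txt, a few tokens repeat heavily across violating messages, which is the kind of signal a frequency-based vector can hand to logistic regression.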

Model usage code:

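        // Must match the path used when the model was saved.
        String modelPath = "D:\\navie";
        // Consume the "sentiment" topic, starting from the latest offsets.
        KafkaSourceStreamOp kafkaSourceStreamOp = new KafkaSourceStreamOp()
                .setBootstrapServers("127.0.0.1:9092")
                .setStartupMode("latest")
                .setGroupId("test")
                .setTopic("sentiment");

        // Extract fields from the JSON in the "message" column.
        // chat_content is renamed to review, the column name the trained pipeline expects.
        StreamOperator<?> data = kafkaSourceStreamOp
                .link(
                        new JsonValueStreamOp()
                                .setSelectedCol("message")
                                .setOutputCols(new String[]{"review","user_id", "role_name", "role_id"})
                                .setJsonPath(new String[]{"chat_content", "user_id", "role_name", "role_id"})
                );

        // Load the pipeline saved by the batch training job.
        PipelineModel pipelineModel = PipelineModel.load(modelPath);

        // Apply the model to the stream and print the original fields plus the predicted label.
        pipelineModel.transform(data)
                .select(new String[]{"review", "user_id", "role_name", "role_id","pred"})
                .print();
        // Start the streaming job.
        StreamOperator.execute();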

Test data:

// Test data written to the Kafka topic: sentiment
{"chat_content":"we are superman","user_id":1,"role_name":"1","role_id":"1"}
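
To show how the JSON keys in a message like the one above line up with the output columns of the prediction job, here is a toy plain-Java sketch. The class `FlatJsonFieldSketch` and its `field` method are hypothetical and only handle flat JSON objects with string or number values. It does not unescape strings, and it is not how `JsonValueStreamOp` works internally. Real code should use a proper JSON library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy extractor for flat JSON objects; for illustration only. */
class FlatJsonFieldSketch {

    /** Returns the raw value of a top-level key as text, or null if the key is absent. */
    static String field(String json, String key) {
        // Matches "key": "string value"  or  "key": bareValue (number, true, false, null).
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*(?:\"((?:[^\"\\\\]|\\\\.)*)\"|([^,}\\s]+))");
        Matcher m = p.matcher(json);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        String message =
                "{\"chat_content\":\"we are superman\",\"user_id\":1,\"role_name\":\"1\",\"role_id\":\"1\"}";
        // JSON keys and the column names they are mapped to in the prediction job.
        String[] jsonKeys = {"chat_content", "user_id", "role_name", "role_id"};
        String[] columns = {"review", "user_id", "role_name", "role_id"};
        for (int i = 0; i < jsonKeys.length; i++) {
            System.out.println(columns[i] + " = " + field(message, jsonKeys[i]));
        }
    }
}
```

The point of the mapping is the first pair: the chat text arrives as `chat_content` but has to reach the model under the name `review`, because that is the column the pipeline was trained on.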

Printed test results:
