Calling BERT (Python) from a Java project to compute semantic similarity

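In a Spring Boot project I needed to compute the semantic similarity of two sentences. This could be written directly in Java on top of word2vec (W2V), but I chose to compute it with BERT instead (it is slow, though...).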


1. BERT.py: edit the inputs and run it in PyCharm to try it out and check the results

from transformers import BertTokenizer, BertModel
import torch

def calculate_similarity(sentence1, sentence2):
    # Load the pretrained model; swap in another checkpoint if needed ('bert-base-chinese' suits Chinese text)
    tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
    model = BertModel.from_pretrained('bert-base-chinese')

    # Tokenize and pad both sentences, then compute their token embeddings
    encoded_inputs = tokenizer([sentence1, sentence2], padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**encoded_inputs)
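    # Mean-pool over real tokens only: the attention mask excludes the padding positions
    mask = encoded_inputs['attention_mask'].unsqueeze(-1).float()
    sentence_embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

    # Cosine similarity between the two sentence vectors
    similarity = torch.cosine_similarity(sentence_embeddings[0], sentence_embeddings[1], dim=0)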
    return similarity.item()

sentence1 = input()
sentence2 = input()
print(calculate_similarity(sentence1, sentence2))

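The BERT checkpoint can be swapped; bert-base-chinese was chosen here because it suits Chinese sentences. Copy the BERT.py above into your Java project, next to the code that needs to call BERT for similarity.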
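To make the pooling and similarity steps concrete, here is a minimal sketch in plain Python (no torch). It shows mean pooling followed by cosine similarity, and why padding rows should be left out of the average when two sentences are padded to the same length. The toy 2-dimensional vectors and the helper names `mean_pool` and `cosine` are illustrative assumptions, not part of BERT.py.

```python
import math

def mean_pool(token_vectors, mask):
    """Average the token vectors whose mask entry is 1 (real tokens, not padding)."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "token embeddings" (made up); the last row of sentence_b stands for padding.
sentence_a = [[1.0, 0.0], [0.0, 1.0]]
sentence_b = [[1.0, 0.0], [0.0, 1.0], [5.0, -5.0]]

pooled_a = mean_pool(sentence_a, [1, 1])
masked_b = mean_pool(sentence_b, [1, 1, 0])    # padding row ignored
unmasked_b = mean_pool(sentence_b, [1, 1, 1])  # padding row averaged in

print("with mask:   ", cosine(pooled_a, masked_b))
print("without mask:", cosine(pooled_a, unmasked_b))
```

With the mask, the two toy sentences pool to the same vector; without it, the padding row drags the second vector away and the score drops.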


2. SemanticSimilarityCalculator: a Java class that calls BERT.py

package com.example.demo.service.Impl;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class SemanticSimilarityCalculator {
    public double calculateSimilarity(String sentence1, String sentence2) throws IOException, InterruptedException {
        // Start the Python process; replace the placeholder with the directory that sits next to your calling class
        ProcessBuilder pb = new ProcessBuilder("python", "./path-next-to-your-calling-class/BERT.py");
        pb.environment().put("PYTHONIOENCODING", "utf-8"); // make Python read and write UTF-8 on every platform
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // forward warnings so a full stderr pipe cannot stall Python
        Process process = pb.start();

        System.out.println("Sending sentences to Python process:");
        System.out.println("Sentence 1: " + sentence1);
        System.out.println("Sentence 2: " + sentence2);

        // Write both sentences to Python's stdin; closing the writer flushes it and signals end of input
        try (BufferedWriter bw = new BufferedWriter(
                new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8))) {
            bw.write(sentence1 + "\n");
            bw.write(sentence2 + "\n");
        } catch (IOException e) {
            process.destroy();
            throw new IOException("Failed to send data to Python process", e);
        }

        // Read the result (the similarity score) before waiting for the process to exit
        String output;
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            output = br.readLine();
        }
        System.out.println("Python Output: " + output); // debug statement

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Python process exited with non-zero status: " + exitCode);
        }
        if (output == null) {
            throw new IOException("Python process did not produce any output");
        }

        // Parse the score as printed (keeps a minus sign or exponent) and scale it by 100
        double similarity = Double.parseDouble(output.trim());
        similarity *= 100;

        return similarity;
    }
}
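The contract between the two sides is simple: two lines go into the child's stdin, one line with a number comes back on stdout. To check that contract without starting the Java application, the same exchange can be driven from Python. This is only a sketch: `CHILD_CODE` is a stand-in one-liner that returns the combined length of the two lines, not BERT.py, and `ask_child` is a hypothetical helper name.

```python
import subprocess
import sys

# Stand-in child (assumption): reads two lines and prints one number,
# which mirrors the input/output contract of BERT.py without loading a model.
CHILD_CODE = "a = input(); b = input(); print(len(a) + len(b))"

def ask_child(sentence1, sentence2):
    """Send two lines to the child's stdin and return its first stdout line."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=f"{sentence1}\n{sentence2}\n",
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits with a non-zero status
    )
    return result.stdout.splitlines()[0]

answer = ask_child("ab", "cde")
print(answer)
```

To exercise the real script, the `"-c", CHILD_CODE` arguments would be replaced by the path to BERT.py; non-ASCII input may then also need an explicit `encoding="utf-8"` argument.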

3. Call SemanticSimilarityCalculator wherever your own code needs a semantic similarity score

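SemanticSimilarityCalculator similarityCalculator = new SemanticSimilarityCalculator();
// "Nice weather today, the other side only scored three goals"
String sentence1 = "今天天气不错,对面只进了三个球";
// "A bit sunny today, I couldn't keep a clean sheet"
String sentence2 = "今天有点晒,我没能零封对手";
// Throws IOException / InterruptedException, so call it from a method that handles or declares them
double sentenceSimilarity = similarityCalculator.calculateSimilarity(sentence1, sentence2);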

Run this inside your own project and it will call BERT to compute the semantic similarity of the two sentences.


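Summary: BERT used this way doesn't work very well, and... it is really slow~ (every call starts a new Python process and loads the model again).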
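On the speed complaint: one possible way around it is to keep a single Python process alive, load the model once, and answer many sentence pairs over stdin/stdout. The sketch below shows only that loop. `serve` is a hypothetical function, and `dummy_score` is a stand-in scorer so the example runs without transformers; in practice the scorer would wrap a BERT model loaded once at startup, and the Java side would have to keep the process and its streams open between calls.

```python
import io
import sys

def serve(score_fn, in_stream=sys.stdin, out_stream=sys.stdout):
    """Read sentence pairs (two lines each) until EOF and write one score per pair."""
    while True:
        first = in_stream.readline()
        second = in_stream.readline()
        if not first or not second:  # EOF: the caller closed the pipe
            break
        score = score_fn(first.rstrip("\n"), second.rstrip("\n"))
        out_stream.write(f"{score}\n")
        out_stream.flush()  # flush so the caller sees each answer right away

def dummy_score(a, b):
    """Stand-in scorer (assumption): character-set overlap, NOT a BERT score."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 1.0

# Demonstrate the protocol with in-memory streams instead of real pipes.
fake_in = io.StringIO("abc\nabd\nxy\nxy\n")
fake_out = io.StringIO()
serve(dummy_score, fake_in, fake_out)
print(fake_out.getvalue())
```

Each pair gets its answer as soon as it is read, so the caller can send the next pair without paying the startup cost again.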

I'm a beginner, so corrections and improvements are welcome.
