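I recently needed the NLP services of ByteDance's Volcengine for a project, so here are some brief notes.
ByteDance's product page for the service: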
https://www.volcengine.com/product/text-correction
This is the text correction API. It can be applied to proofreading, editing assistance, homework grading, and similar tasks.
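The text correction API belongs to ByteDance's Text Analysis service, a broader NLP category that contains the following five sub-services:
- Text summarization
Automatically extracts the key information from news text and generates a summary of a specified length. Two modes are currently available: extractive and generative. Extractive summarization supports a flexible length setting and assembles the summary from key information extracted from the original text. Generative summarization uses AI to generate a fluent, natural summary end to end.
- Sentiment analysis
For Chinese text containing subjective descriptions, automatically determines the sentiment category (positive, neutral, negative) and returns the corresponding confidence.
- English text correction
Automatically identifies errors in a sentence and suggests corrections.
- Chinese text correction
Automatically identifies errors in a sentence and suggests corrections.
- Keyword extraction
Analyzes the article content in depth, then extracts and infers the words or entities that express its central topic, together with their weights. For entities it can also output the corresponding categories. It can be used for article recommendation, classification, search, and similar scenarios.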
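First, apply to enable the relevant services, create an application in the console, and connect it to the services listed above.
The basic logic is the same as for machine translation: build the request JSON, sign the request, send it, and parse the response. Because both are Volcengine services, the signing algorithm is shared, so you can refer to and reuse the code from my earlier post, "ByteDance Volcengine Machine Translation: Calling It from C# in Detail".
Below I mainly note what differs.
First, the Service value is different and has to be changed: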
const string Service = "nlp_gateway";
Then change the User-Agent:
request.UserAgent = "volc-sdk-python/v1.0.17";
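This makes the C#/.NET client identify itself as the Python SDK.
https://github.com/volcengine/volc-sdk-python/tree/main/volcengine/example/nlp (ByteDance's Python SDK)
Next, the URL query string has to change depending on the service being requested: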
switch (NlpInterface)
{
    case VolcanoEngineNLPInterface.KeyphraseExtractionExtract:
        // Keyword extraction
        UrlQuery = "Action=KeyphraseExtractionExtract&Version=2020-09-01";
        break;
    case VolcanoEngineNLPInterface.TextSummarization:
        // Text summarization (note the different Version)
        UrlQuery = "Action=TextSummarization&Version=2020-12-01";
        break;
    case VolcanoEngineNLPInterface.SentimentAnalysis:
        // Sentiment analysis
        UrlQuery = "Action=SentimentAnalysis&Version=2020-09-01";
        break;
    case VolcanoEngineNLPInterface.TextCorrectionZh:
        // Chinese text correction
        UrlQuery = "Action=TextCorrectionZhCorrect&Version=2020-09-01";
        break;
    case VolcanoEngineNLPInterface.TextCorrectionEn:
        // English text correction
        UrlQuery = "Action=TextCorrectionEnCorrect&Version=2020-09-01";
        break;
    default:
        break;
}
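The same mapping can also be kept in a lookup table, which makes an unknown value fail loudly instead of leaving UrlQuery unset. The following is a minimal sketch, not the code from my project: the enum members are taken from the switch statement above, while `NlpRequestBuilder`, `GetUrlQuery`, `BuildUnsignedRequest`, the use of `HttpRequestMessage` instead of `HttpWebRequest`, the POST method, and the `endpoint` parameter are all assumptions made for illustration. The request it builds is unsigned; signing still has to be applied as in the machine translation post.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;

// Enum members mirror the cases of the switch statement in the post.
public enum VolcanoEngineNLPInterface
{
    KeyphraseExtractionExtract,
    TextSummarization,
    SentimentAnalysis,
    TextCorrectionZh,
    TextCorrectionEn
}

// Hypothetical helper, not part of any SDK.
public static class NlpRequestBuilder
{
    // Action/Version pairs copied from the switch statement.
    private static readonly Dictionary<VolcanoEngineNLPInterface, string> Queries =
        new Dictionary<VolcanoEngineNLPInterface, string>
        {
            [VolcanoEngineNLPInterface.KeyphraseExtractionExtract] = "Action=KeyphraseExtractionExtract&Version=2020-09-01",
            [VolcanoEngineNLPInterface.TextSummarization] = "Action=TextSummarization&Version=2020-12-01",
            [VolcanoEngineNLPInterface.SentimentAnalysis] = "Action=SentimentAnalysis&Version=2020-09-01",
            [VolcanoEngineNLPInterface.TextCorrectionZh] = "Action=TextCorrectionZhCorrect&Version=2020-09-01",
            [VolcanoEngineNLPInterface.TextCorrectionEn] = "Action=TextCorrectionEnCorrect&Version=2020-09-01"
        };

    public static string GetUrlQuery(VolcanoEngineNLPInterface nlpInterface)
    {
        if (!Queries.TryGetValue(nlpInterface, out var query))
            throw new ArgumentOutOfRangeException(nameof(nlpInterface), nlpInterface, "Unknown NLP interface.");
        return query;
    }

    // Builds an UNSIGNED request. The endpoint is supplied by the caller
    // (assumption: take it from the Volcengine documentation or SDK), and
    // POST with a JSON body is assumed here.
    public static HttpRequestMessage BuildUnsignedRequest(
        Uri endpoint, VolcanoEngineNLPInterface nlpInterface, string jsonBody)
    {
        var builder = new UriBuilder(endpoint) { Query = GetUrlQuery(nlpInterface) };
        var request = new HttpRequestMessage(HttpMethod.Post, builder.Uri)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };
        // Same User-Agent string as used earlier in the post.
        request.Headers.UserAgent.ParseAdd("volc-sdk-python/v1.0.17");
        return request;
    }
}
```

Keeping the Action and Version strings in one table also makes it easier to spot that text summarization uses a different Version from the other four interfaces.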
Finally, here are the request and response JSON for each service.
- Text summarization
Request (length must not exceed 10000)
{"text": "你好"}
Response
"result": "你好"
- Sentiment analysis
Request
{"text": "我很生气"}
Response
{
"negative_score": "0.9238909",
"positive_score": "0.0761091",
"result": "情感偏负向"
}
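Note that in the sample above the two scores come back as JSON strings rather than numbers, so they need to be parsed explicitly. Below is a minimal sketch using System.Text.Json; the `SentimentResult` class and its member names are hypothetical, and the field names are taken only from the sample response shown above.

```csharp
using System;
using System.Globalization;
using System.Text.Json;

// Hypothetical container for the sentiment analysis response.
public sealed class SentimentResult
{
    public double NegativeScore { get; }
    public double PositiveScore { get; }
    public string Label { get; }

    private SentimentResult(double negative, double positive, string label)
    {
        NegativeScore = negative;
        PositiveScore = positive;
        Label = label;
    }

    public static SentimentResult Parse(string json)
    {
        using (var doc = JsonDocument.Parse(json))
        {
            var root = doc.RootElement;
            return new SentimentResult(
                ReadScore(root.GetProperty("negative_score")),
                ReadScore(root.GetProperty("positive_score")),
                root.GetProperty("result").GetString() ?? string.Empty);
        }
    }

    // Accepts both "0.92" (string, as in the sample) and 0.92 (number).
    // InvariantCulture keeps "." as the decimal separator on any machine.
    private static double ReadScore(JsonElement element)
    {
        if (element.ValueKind == JsonValueKind.String)
        {
            return double.Parse(element.GetString() ?? string.Empty,
                NumberStyles.Float, CultureInfo.InvariantCulture);
        }
        return element.GetDouble();
    }
}
```

Comparing `NegativeScore` with `PositiveScore` is usually more convenient than matching on the Chinese label in `result`.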
- English text correction
Request
{"content": "This is a incomplte text test"}
Response
{
"result": [
{ // the next word starts with a vowel, so "a" becomes "an"
"correct": "an",
"index_of_doc": 8,
"original": "a"
},
{ // spelling error
"correct": "incomplete",
"index_of_doc": 10,
"original": "incomplte"
},
{ // end of sentence, period added
"correct": "test.",
"index_of_doc": 25,
"original": "test"
}
]
}
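The corrections can be applied back to the input text using `index_of_doc`. The sketch below assumes that `original` holds the text as it appears in the input, that `correct` holds the suggested replacement, and that `index_of_doc` is a zero-based character offset into the request's `content` (this matches the sample request, where offset 8 is "a" and offset 10 is "incomplte"). `Correction` and `CorrectionApplier` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.Json;

// Hypothetical model for one entry of the "result" array.
public sealed class Correction
{
    public int Index { get; }
    public string Original { get; }
    public string Correct { get; }

    public Correction(int index, string original, string correct)
    {
        Index = index;
        Original = original;
        Correct = correct;
    }
}

public static class CorrectionApplier
{
    public static List<Correction> Parse(string json)
    {
        // Skip // comments so that annotated samples can be parsed too.
        var options = new JsonDocumentOptions { CommentHandling = JsonCommentHandling.Skip };
        var list = new List<Correction>();
        using (var doc = JsonDocument.Parse(json, options))
        {
            foreach (var item in doc.RootElement.GetProperty("result").EnumerateArray())
            {
                list.Add(new Correction(
                    item.GetProperty("index_of_doc").GetInt32(),
                    item.GetProperty("original").GetString() ?? string.Empty,
                    item.GetProperty("correct").GetString() ?? string.Empty));
            }
        }
        return list;
    }

    public static string Apply(string text, IEnumerable<Correction> corrections)
    {
        var sb = new StringBuilder(text);
        // Work from the end of the text backwards so that earlier
        // offsets stay valid after each replacement.
        foreach (var c in corrections.OrderByDescending(x => x.Index))
        {
            if (c.Index < 0 || c.Index + c.Original.Length > text.Length) continue;
            // Skip entries whose "original" does not match the input at that offset.
            if (string.CompareOrdinal(text, c.Index, c.Original, 0, c.Original.Length) != 0) continue;
            sb.Remove(c.Index, c.Original.Length).Insert(c.Index, c.Correct);
        }
        return sb.ToString();
    }
}
```

With the sample request and response, `Apply` turns "This is a incomplte text test" into "This is an incomplete text test.". Overlapping corrections are not handled in this sketch.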
- Chinese text correction
Request
{"content": "易洋千玺条纹茄克周冬雨粉色露肩群活力亮相"}
Response
{
"char_result": [
{ // wrong character
"confidence": 0.9362648129463196,
"correct": "裙",
"index_of_doc": 15,
"original": "群"
}
],
"word_result": [
{ // wrong character
"confidence": 0.9362648129463196,
"correct": "裙",
"index_of_doc": 15,
"original": "群"
}
]
}
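Unlike the English response, the Chinese one carries a `confidence` per suggestion and, in the sample above, reports the same fix under both `char_result` and `word_result`. A minimal sketch of deserializing it, dropping low-confidence suggestions, and removing duplicates could look like this; the class names, `Confident`, and the threshold are all hypothetical, and the field names come only from the sample response.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical models matching the field names in the sample response.
public sealed class ZhCorrectionItem
{
    [JsonPropertyName("confidence")] public double Confidence { get; set; }
    [JsonPropertyName("correct")] public string Correct { get; set; } = "";
    [JsonPropertyName("index_of_doc")] public int IndexOfDoc { get; set; }
    [JsonPropertyName("original")] public string Original { get; set; } = "";
}

public sealed class ZhCorrectionResponse
{
    [JsonPropertyName("char_result")]
    public List<ZhCorrectionItem> CharResult { get; set; } = new List<ZhCorrectionItem>();

    [JsonPropertyName("word_result")]
    public List<ZhCorrectionItem> WordResult { get; set; } = new List<ZhCorrectionItem>();
}

public static class ZhCorrection
{
    // Note: the default serializer rejects // comments, so pass plain JSON.
    public static ZhCorrectionResponse Parse(string json)
    {
        return JsonSerializer.Deserialize<ZhCorrectionResponse>(json) ?? new ZhCorrectionResponse();
    }

    // Merges both lists, keeps suggestions at or above minConfidence,
    // and collapses entries that describe the same fix at the same offset.
    public static List<ZhCorrectionItem> Confident(ZhCorrectionResponse response, double minConfidence)
    {
        var chars = response.CharResult ?? new List<ZhCorrectionItem>();
        var words = response.WordResult ?? new List<ZhCorrectionItem>();
        return chars.Concat(words)
            .Where(i => i.Confidence >= minConfidence)
            .GroupBy(i => (i.IndexOfDoc, i.Original, i.Correct))
            .Select(g => g.First())
            .OrderBy(i => i.IndexOfDoc)
            .ToList();
    }
}
```

For the sample response, a threshold of 0.9 yields a single suggestion (群 to 裙 at offset 15), while a threshold of 0.95 yields none.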
- Keyword extraction
Request
{
"request_type": "text",
"title": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"content": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
}
Response
{
"keywords": [
{
"category": [
"xxxxxxxxxx"
],
"fullname": "xxxxxxxxxxx]",
"text": "xxxxxxxxx",
"weight": 0.9636314012897041
},
{
"category": [
"xx机构"
],
"fullname": "xxxxxxxxxxxxxx",
"text": "xxxxxxxxxxxx",
"weight": 0.5788701698344204
},
{
"category": [
"地点",
"行政区划",
"省级",
"省"
],
"fullname": "xxxxxxxxxxxxxx",
"text": "xxxxxxxxxxxx",
"weight": 0.4784924304565255
},
{
"category": [
"其他"
],
"fullname": "xxxxxxxxxxxx",
"text": "xxxxxxxxxxxxxx",
"weight": 0.2643414968768523
},
{
"category": [
"专业术语"
],
"fullname": "xxxxxxx",
"text": "xxxxxxxx",
"weight": 0.26167648750959244
},
{
"category": [
"其他"
],
"fullname": "记者会",
"text": "记者会",
"weight": 0.24613832617469417
},
{
"category": [
"专业术语"
],
"fullname": "xxxxxxxx",
"text": "xxxxxxxx",
"weight": 0.23161126464332313
},
{
"category": [
"其他"
],
"fullname": "xxxxxxxxxx",
"text": "xxxxxxxxxxx",
"weight": 0.22947351948561256
},
{
"category": [
"专业术语"
],
"fullname": "xxxxxxxxx",
"text": "xxxxxxx",
"weight": 0.22021905708357076
},
{
"category": [
"其他"
],
"fullname": "xxxxxxxxxx",
"text": "xxxxxxxxxxxxx",
"weight": 0.21305596164903787
},
{
"category": [
"文化"
],
"fullname": "xxxxxx",
"text": "xxxxxxxxxx",
"weight": 0.19445949853380673
},
{
"category": [
"民俗节日"
],
"fullname": "9月14日",
"text": "9月14日",
"weight": 0.1911517188201008
},
{
"category": [
"其他"
],
"fullname": "9月15日",
"text": "9月15日",
"weight": 0.18349248571869864
}
]
}
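The keyword list is long and includes many low-weight entries in the catch-all category "其他" (other), so in practice it helps to filter it. The following is a minimal sketch under the assumption that the response has the shape shown above; `Keyword`, `KeyphraseResponse`, `KeywordFilter`, and the default excluded category are hypothetical choices, not part of the API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical models matching the field names in the sample response.
public sealed class Keyword
{
    [JsonPropertyName("category")] public List<string> Category { get; set; } = new List<string>();
    [JsonPropertyName("fullname")] public string FullName { get; set; } = "";
    [JsonPropertyName("text")] public string Text { get; set; } = "";
    [JsonPropertyName("weight")] public double Weight { get; set; }
}

public sealed class KeyphraseResponse
{
    [JsonPropertyName("keywords")] public List<Keyword> Keywords { get; set; } = new List<Keyword>();
}

public static class KeywordFilter
{
    // Keeps keywords with weight >= minWeight that are not in the
    // excluded category, sorted from highest to lowest weight.
    public static List<Keyword> Select(string json, double minWeight, string excludedCategory = "其他")
    {
        var response = JsonSerializer.Deserialize<KeyphraseResponse>(json) ?? new KeyphraseResponse();
        var keywords = response.Keywords ?? new List<Keyword>();
        return keywords
            .Where(k => k.Weight >= minWeight)
            .Where(k => k.Category == null || !k.Category.Contains(excludedCategory))
            .OrderByDescending(k => k.Weight)
            .ToList();
    }
}
```

A call such as `KeywordFilter.Select(json, 0.2)` would keep only the more heavily weighted, categorized keywords; the right threshold depends on the articles being processed.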