HuggingFace Study Notes -- Using metrics and pipeline

Contents

1--Using metrics

2--Using pipeline

2-1--Sentiment classification (positive/negative)

2-2--Question answering

2-3--Fill-mask


1--Using metrics

        "metrics" refers to evaluation metrics; through the API you can quickly use the built-in evaluation metrics.

Code:

from datasets import list_metrics, load_metric

if __name__ == "__main__":
    # List all available evaluation metrics
    metrics_list = list_metrics()
    print(len(metrics_list))
    print(metrics_list)
    
    # Load one evaluation metric
    # (see https://zhuanlan.zhihu.com/p/522017847 for background on glue and mrpc;
    # note that newer versions of datasets deprecate these metric APIs in favor
    # of the separate evaluate library)
    metric = load_metric('glue', 'mrpc')
    print(metric.inputs_description)
    
    # Compute the metric on some predictions
    predictions = [0, 1, 0]
    references = [0, 1, 1]

    final_score = metric.compute(predictions=predictions, references=references)
    print(final_score)
    print("All done!")

        Use list_metrics to list all available evaluation metrics, and load_metric to select a suitable one;
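The 'glue'/'mrpc' metric reports accuracy and F1. As a sanity check on what metric.compute() returns for the example above, here is a minimal sketch computing both by hand in plain Python:

```python
# Hand-computed accuracy and F1 for predictions = [0, 1, 0] vs.
# references = [0, 1, 1], matching the example above.
predictions = [0, 1, 0]
references = [0, 1, 1]

# Accuracy: fraction of positions where prediction equals reference
correct = sum(p == r for p, r in zip(predictions, references))
accuracy = correct / len(references)

# F1 for the positive class (label 1)
tp = sum(p == 1 and r == 1 for p, r in zip(predictions, references))
fp = sum(p == 1 and r == 0 for p, r in zip(predictions, references))
fn = sum(p == 0 and r == 1 for p, r in zip(predictions, references))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4), round(f1, 4))  # 0.6667 0.6667
```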

2--Using pipeline

        With pipeline you can quickly use a pretrained model, either applying it directly to the corresponding task or using it as the pretrained backbone of a downstream task.

        A pipeline can generally be decomposed into three parts: tokenizer, model, and post-process;

from transformers.pipelines import SUPPORTED_TASKS
from transformers import pipeline, AutoModelForSequenceClassification, AutoTokenizer

if __name__ == "__main__":
    # List the task types supported by pipeline
    for k, v in SUPPORTED_TASKS.items():
        print(k, v)
        
    # Create a pipeline directly from a task type (the default models are English)
    pipe = pipeline("text-classification")
    print(pipe(["very good!", "very bad!"]))
    
    # Create a pipeline from a task type and a specific model
    pipe = pipeline("text-classification", model="uer/roberta-base-finetuned-dianping-chinese")
    print(pipe("我觉得不太行!"))
    
    # Load the model and tokenizer first, then create the pipeline
    # (device=0 places the pipeline on the first GPU)
    model = AutoModelForSequenceClassification.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")
    tokenizer = AutoTokenizer.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")
    pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0)
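For a text-classification pipeline, the post-process step typically applies a softmax to the model's raw logits and picks the highest-probability label. A minimal sketch of just that step in plain Python, using made-up logits and label names (not the actual output of any model above):

```python
import math

def postprocess(logits, id2label):
    # Numerically stable softmax over the raw logits, then pick the
    # highest-probability label, mirroring what the text-classification
    # pipeline does after the model's forward pass.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical logits for a binary sentiment model
print(postprocess([-1.2, 2.3], {0: "negative", 1: "positive"}))
```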

2-1--Sentiment classification (positive/negative)

Code:

from transformers import pipeline

if __name__ == "__main__":
    # classify text as positive or negative
    classifier = pipeline("sentiment-analysis")
    result = classifier("I hate you")[0]
    print(result)
    
    result = classifier("I love you")[0]
    print(result)

Output:

{'label': 'NEGATIVE', 'score': 0.9991129040718079}
{'label': 'POSITIVE', 'score': 0.9998656511306763}

2-2--Question answering

Code:

from transformers import pipeline

if __name__ == "__main__":
    # question answering (reading comprehension)
    question_answerer = pipeline("question-answering")

    context = r"""
        Extractive Question Answering is the task of extracting an answer from a text given a question. \
        An example of a question answering dataset is the SQuAD dataset, which is entirely based on that task. \
        If you would like to fine-tune a model on a SQuAD task, \
        you may leverage the examples/pytorch/question-answering/run_squad.py script.
    """

    result = question_answerer(question="What is extractive question answering?", context=context)
    print(result)

    result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
    print(result)

Output:

{'score': 0.6034508347511292, 'start': 42, 'end': 103, 'answer': 'the task of extracting an answer from a text given a question'}
{'score': 0.4721057713031769, 'start': 165, 'end': 178, 'answer': 'SQuAD dataset'}
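In the question-answering output, 'start' and 'end' are character offsets into the context string, so the answer can always be recovered by slicing. A toy illustration with a made-up context and result dict (not the actual pipeline output above):

```python
# 'start'/'end' index into the context; slicing with them yields 'answer'.
context = "An example dataset is the SQuAD dataset."
result = {"start": 26, "end": 39, "answer": "SQuAD dataset"}

extracted = context[result["start"]:result["end"]]
print(extracted)  # SQuAD dataset
```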

2-3--Fill-mask

Code:

from transformers import pipeline

if __name__ == "__main__":
    # fill-mask (cloze test)
    unmasker = pipeline("fill-mask")
    sentence = 'HuggingFace is creating a <mask> that the community uses to solve NLP tasks.'
    result = unmasker(sentence)
    print(result)

Output:

[
    {'score': 0.17927497625350952, 'token': 3944, 'token_str': ' tool', 'sequence': 'HuggingFace is creating a tool that the community uses to solve NLP tasks.'},             
    {'score': 0.11349403858184814, 'token': 7208, 'token_str': ' framework', 'sequence': 'HuggingFace is creating a framework that the community uses to solve NLP tasks.'}, 
    {'score': 0.05243556201457977, 'token': 5560, 'token_str': ' library', 'sequence': 'HuggingFace is creating a library that the community uses to solve NLP tasks.'}, 
    {'score': 0.03493537753820419, 'token': 8503, 'token_str': ' database', 'sequence': 'HuggingFace is creating a database that the community uses to solve NLP tasks.'}, 
    {'score': 0.02860264666378498, 'token': 17715, 'token_str': ' prototype', 'sequence': 'HuggingFace is creating a prototype that the community uses to solve NLP tasks.'}
]
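The fill-mask pipeline returns its candidates sorted by score, so taking the best completion is just picking the highest-scoring entry. A small sketch using the scores and token strings from the output above (abbreviated to three candidates):

```python
# Candidate completions copied from the fill-mask output above; the leading
# space in each token_str comes from the model's tokenizer.
results = [
    {"score": 0.1793, "token_str": " tool"},
    {"score": 0.1135, "token_str": " framework"},
    {"score": 0.0524, "token_str": " library"},
]

best = max(results, key=lambda r: r["score"])
print(best["token_str"].strip())  # tool
```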
