Trying Out Sentiment Classification with an RNN in MindSpore


huodagu



Article link: https://blog.csdn.net/huodagu/article/details/128325669


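This post records the process of implementing an RNN for sentiment classification with MindSpore, covering data preprocessing, model construction, training, and evaluation. Using the IMDB movie-review dataset together with GloVe word vectors, it reaches a final test accuracy of 0.64.

(Summary generated automatically by CSDN.)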

As a complete beginner, this is my first attempt at running through the MindSpore tutorial "RNN for Sentiment Classification".

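In this tutorial, sentiment classification means taking a passage or a single sentence, having the machine recognize what kind of sentiment it expresses, and outputting the corresponding class label. The dataset is the IMDB movie-review dataset, which provides both inputs (reviews) and outputs (labels), so this is supervised learning. Because the input is human natural language, it has to be converted into a numeric form the machine can process, so encoding with pretrained word vectors is part of the pipeline. The tutorial uses GloVe word vectors (I don't yet understand what advantages they bring; I'll look into that later).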
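To make the word-vector step more concrete, here is a minimal stdlib-only sketch, assuming GloVe's plain-text format (each line is a word followed by its space-separated vector components). It builds a word-to-index vocabulary and an embedding table from a few such lines, appends `<unk>` and `<pad>` tokens, and maps a tokenized review to a fixed-length index sequence. The MindSpore tutorial does something along these lines, but the helper names `load_glove_lines` and `encode`, the toy vectors, and the choice of zero vectors for both special tokens are illustrative assumptions, not the tutorial's code.

```python
from typing import Dict, List, Tuple


def load_glove_lines(lines: List[str]) -> Tuple[Dict[str, int], List[List[float]]]:
    """Parse GloVe-style text lines into a word->index vocab and an embedding table."""
    vocab: Dict[str, int] = {}
    embeddings: List[List[float]] = []
    for line in lines:
        parts = line.rstrip().split(' ')
        word, vector = parts[0], [float(x) for x in parts[1:]]
        vocab[word] = len(vocab)
        embeddings.append(vector)
    dim = len(embeddings[0])
    # Hypothetical special tokens: <unk> for out-of-vocabulary words, <pad> for padding.
    # Both get zero vectors here for simplicity (an assumption).
    for special in ('<unk>', '<pad>'):
        vocab[special] = len(vocab)
        embeddings.append([0.0] * dim)
    return vocab, embeddings


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to indices, truncating or right-padding to max_len."""
    unk, pad = vocab['<unk>'], vocab['<pad>']
    ids = [vocab.get(tok, unk) for tok in tokens[:max_len]]
    return ids + [pad] * (max_len - len(ids))


# Toy 3-dimensional "GloVe" lines (made-up values)
glove_lines = [
    "good 0.1 0.2 0.3",
    "bad -0.1 -0.2 -0.3",
    "movie 0.0 0.5 0.0",
]
vocab, embeddings = load_glove_lines(glove_lines)
ids = encode(["good", "movie", "zzz"], vocab, max_len=5)
print(ids)                 # [0, 2, 3, 4, 4]
print(embeddings[ids[0]])  # [0.1, 0.2, 0.3]
```

The unknown word `zzz` falls back to `<unk>` (index 3), and the sequence is padded with `<pad>` (index 4) so that every review in a batch has the same length, which is what an RNN fed in batches expects.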

import os
import shutil
import tempfile
from pathlib import Path
from typing import IO

import requests
from tqdm import tqdm

# Save downloads under `home_path/.mindspore_examples`
cache_dir = Path.home() / '.mindspore_examples'

def http_get(url: str, temp_file: IO):
    """Download data with the requests library, showing progress with tqdm."""
    req = requests.get(url, stream=True)
    req.raise_for_status()  # fail early instead of saving an HTTP error page
    content_length = req.headers.get('Content-Length')
    total = int(content_length) if content_length is not None else None
    progress = tqdm(unit='B', total=total)
    for chunk in req.iter_content(chunk_size=1024):
        if chunk:
            progress.update(len(chunk))
            temp_file.write(chunk)
    progress.close()

def download(file_name: str, url: str):
    """Download the data and save it under the given file name (skipped if already cached)."""
    if not os.path.exists(cache_dir):
        os.makedirs(cache_dir)
    cache_path = os.path.join(cache_dir, file_name)
    cache_exist = os.path.exists(cache_path)
    if not cache_exist:
        # Stream into a temporary file first, then copy it into the cache
        with tempfile.NamedTemporaryFile() as temp_file:
            http_get(url, temp_file)
            temp_file.flush()
            temp_file.seek(0)
            with open(cache_path, 'wb') as cache_file:
                shutil.copyfileobj(temp_file, cache_file)
    return cache_path

Now download the movie-review dataset; it is saved to the cache directory `~/.mindspore_examples`:


imdb_path = download('aclImdb_v1.tar.gz', 'https://mindspore-website.obs.myhuaweicloud.com/notebook/datasets/aclImdb_v1.tar.gz')
imdb_path
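The downloaded archive stores each review as a separate `.txt` file under `aclImdb/{train,test}/{pos,neg}/`. The following stdlib-only sketch builds a tiny in-memory archive with that layout (the file names and review texts are made up) and shows one way to pick out a split by matching member paths with a regular expression, take the label from the directory name, and clean the text by stripping punctuation and lowercasing. It approximates the kind of processing a loader class performs on this archive; it is not the tutorial's exact code, and the `{"pos": 1, "neg": 0}` mapping and the helper names are assumptions.

```python
import io
import re
import string
import tarfile


def make_fake_archive() -> bytes:
    """Build a small .tar.gz in memory that mimics the aclImdb directory layout."""
    samples = {
        "aclImdb/train/pos/0_9.txt": b"A great, moving film!\n",
        "aclImdb/train/neg/1_2.txt": b"Boring... and TOO long.\n",
        "aclImdb/test/pos/2_8.txt": b"Loved it.\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in samples.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


LABEL_MAP = {"pos": 1, "neg": 0}  # assumed label mapping
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def load_split(archive: bytes, mode: str):
    """Return (tokens, label) pairs for one split ('train' or 'test')."""
    pattern = re.compile(r"aclImdb/{}/(pos|neg)/.*\.txt$".format(mode))
    docs = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            match = pattern.match(member.name)
            if match is None:
                continue  # other split, or not a review file
            text = tar.extractfile(member).read().decode("utf-8").strip()
            # Clean: drop punctuation, lowercase, split on whitespace
            tokens = text.translate(PUNCT_TABLE).lower().split()
            docs.append((tokens, LABEL_MAP[match.group(1)]))
    return docs


archive = make_fake_archive()
train = load_split(archive, "train")
print(train)
# [(['a', 'great', 'moving', 'film'], 1), (['boring', 'and', 'too', 'long'], 0)]
```

Reading members straight from the compressed archive avoids extracting tens of thousands of small files to disk; the trade-off is that the whole archive is scanned once per split.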

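The dataset contains two parts, train and test. Next we load and clean it, stripping out whatever does not need to be fed to the model (such as punctuation):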


import re
import six
import string
import tarfile

class IMDBData:
    """IMDB dataset loader.

    Loads the IMDB dataset and processes it into a Python iterable.

    """
    label_map = {
        "pos":