Xiaohei's fastNLP Growth Diary 1: Building a DataSet

爱喝喜茶爱吃烤冷面的小黑黑

Column: fastNLP框架之小黑尝试 | Tags: deep learning, machine learning, pytorch

Article link: https://blog.csdn.net/qq_37418807/article/details/122209386

Building a DataSet

Building from a dictionary

from fastNLP import DataSet
# Build a DataSet from a dict: each key is a field name, each value is that field's column
data = {'raw_words':["This is the first instance .", "Second instance .", "Third instance ."],
        'words': [['this', 'is', 'the', 'first', 'instance', '.'], ['Second', 'instance', '.'], ['Third', 'instance', '.']],
        'seq_len': [6, 3, 3]}
dataset = DataSet(data)
print(dataset)
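
A dict of columns only describes a table when every column has the same number of entries, and here `seq_len` is also expected to match the token count of each `words` row. The sketch below is plain Python and does not call fastNLP; `check_columns` is a hypothetical helper introduced for illustration, showing one way to sanity-check the dict before passing it to `DataSet`.

```python
# Plain-Python sanity check, independent of fastNLP.
# `check_columns` is a hypothetical helper, not a fastNLP function.
def check_columns(data):
    """Return the shared row count, or raise if the columns disagree."""
    lengths = {name: len(values) for name, values in data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    return next(iter(lengths.values()), 0)


data = {
    'raw_words': ["This is the first instance .", "Second instance .", "Third instance ."],
    'words': [['this', 'is', 'the', 'first', 'instance', '.'],
              ['Second', 'instance', '.'],
              ['Third', 'instance', '.']],
    'seq_len': [6, 3, 3],
}

n_rows = check_columns(data)
# seq_len should equal the number of tokens in the matching `words` row
seq_len_ok = all(len(w) == n for w, n in zip(data['words'], data['seq_len']))
print(n_rows, seq_len_ok)
```

Running the check first gives a readable error message that names the mismatched columns.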

Adding data to a DataSet with append

from fastNLP import Instance
instance = Instance(raw_words="This is the fourth instance .",
                    words=['this', 'is', 'the', 'fourth', 'instance', '.'],
                    seq_len=6)
dataset.append(instance)
print(dataset)

Building a DataSet from Instances

from fastNLP import DataSet
from fastNLP import Instance
dataset = DataSet([
    Instance(raw_words="This is the first instance .",
        words=['this', 'is', 'the', 'first', 'instance', '.'],
        seq_len=6),
    Instance(raw_words="Second instance .",
        words=['Second', 'instance', '.'],
        seq_len=3)
    ])
print(dataset)

+---------------------------+---------------------------+---------+
| raw_words                 | words                     | seq_len |
+---------------------------+---------------------------+---------+
| This is the first ins...  | ['this', 'is', 'the',...  | 6       |
| Second instance .         | ['Second', 'instance'...  | 3       |
+---------------------------+---------------------------+---------+
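
The dictionary form and the Instance form describe the same table, once by column and once by row. The following plain-Python sketch shows that correspondence without calling fastNLP; `columns_to_rows` and `rows_to_columns` are hypothetical helpers introduced here, and each row dict plays the role that an `Instance` plays above.

```python
# Plain-Python illustration of column-wise vs row-wise layouts.
# Both helpers are hypothetical and are not part of fastNLP.
def columns_to_rows(columns):
    """Turn {'field': [v1, v2, ...]} into [{'field': v1}, {'field': v2}, ...]."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def rows_to_columns(rows):
    """Inverse of columns_to_rows; assumes every row has the same keys."""
    names = list(rows[0]) if rows else []
    return {name: [row[name] for row in rows] for name in names}


columns = {
    'raw_words': ["This is the first instance .", "Second instance ."],
    'words': [['this', 'is', 'the', 'first', 'instance', '.'],
              ['Second', 'instance', '.']],
    'seq_len': [6, 3],
}

rows = columns_to_rows(columns)
print(rows[1])                            # the second row, as one dict
print(rows_to_columns(rows) == columns)   # the round trip keeps the data
```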

Deleting from a DataSet

from fastNLP import DataSet
dataset = DataSet({'a': range(-5, 5), 'c': [0] * 10})
# Leave dataset unchanged; return a new DataSet without the instances that match the condition
dropped_dataset = dataset.drop(lambda ins: ins['a'] < 0, inplace=False)
print('Drop instances where a < 0:', dropped_dataset)
# delete_instance removes the instance at index 1 from dataset itself
print('Delete the 2nd element:', dataset.delete_instance(1))
# To delete a field: dataset.delete_field('a')
# Check whether a field exists
print("Does field 'a' exist?", dataset.has_field('a'))
print("Rename field 'c' to 'b':", dataset.rename_field('c', 'b'))
print('Length of dataset:', len(dataset))

Drop instances where a < 0: +---+---+
| a | c |
+---+---+
| 0 | 0 |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
+---+---+
Delete the 2nd element: +----+---+
| a  | c |
+----+---+
| -5 | 0 |
| -3 | 0 |
| -2 | 0 |
| -1 | 0 |
| 0  | 0 |
| 1  | 0 |
| 2  | 0 |
| 3  | 0 |
| 4  | 0 |
+----+---+
Does field 'a' exist? True
Rename field 'c' to 'b': +----+---+
| a  | b |
+----+---+
| -5 | 0 |
| -3 | 0 |
| -2 | 0 |
| -1 | 0 |
| 0  | 0 |
| 1  | 0 |
| 2  | 0 |
| 3  | 0 |
| 4  | 0 |
+----+---+
Length of dataset: 9
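
The final length is 9 rather than 10 or 5 because the two deletions behave differently: `drop` with `inplace=False` returns a new DataSet and leaves the original alone, while `delete_instance` changes the original. The sketch below mimics that difference with a plain Python list of dicts. It is an analogy under that assumption, not fastNLP code.

```python
# Plain-Python analogy for the two kinds of deletion; no fastNLP involved.
rows = [{'a': a, 'c': 0} for a in range(-5, 5)]

# Like drop(..., inplace=False): build a new list, `rows` stays untouched
kept = [row for row in rows if not row['a'] < 0]
length_after_drop = len(rows)

# Like delete_instance(1): remove the row at index 1 from `rows` itself
del rows[1]

print(len(kept), length_after_drop, len(rows))
print([row['a'] for row in rows])
```

The second print shows that the value -4 is gone from the original list, which matches the tables printed above.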

Simple data preprocessing

from fastNLP import DataSet
data = {'raw_words': ["This is the first instance .", "Second instance .", "Third instance ."]}
dataset = DataSet(data)
# Split each sentence into words and store the result in a new field
dataset.apply(lambda ins: ins['raw_words'].split(), new_field_name='words')
# Or use DataSet.apply_field(), which passes only the named field to the function
dataset.apply_field(lambda sent: sent.split(), field_name='raw_words', new_field_name='new_words')
# A named function can also be used to create a new field
def get_words(instance):
    sentence = instance['raw_words']
    words = sentence.split()
    return words
dataset.apply(get_words, new_field_name='func_words')
print(dataset)
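
The difference between the two calls above is what the function receives: `apply` is given the whole instance, while `apply_field` is given only the value of one field. The plain-Python sketch below illustrates that with a list of dicts; `apply_rows` and `apply_field_rows` are hypothetical stand-ins written for this example and are not how fastNLP implements these methods.

```python
# Plain-Python stand-ins for row-wise preprocessing; not fastNLP internals.
def apply_rows(rows, func, new_field_name):
    """Call func on each whole row and store the result in a new field."""
    results = [func(row) for row in rows]
    for row, value in zip(rows, results):
        row[new_field_name] = value


def apply_field_rows(rows, func, field_name, new_field_name):
    """Call func on one field of each row and store the result in a new field."""
    results = [func(row[field_name]) for row in rows]
    for row, value in zip(rows, results):
        row[new_field_name] = value


rows = [{'raw_words': s} for s in
        ["This is the first instance .", "Second instance .", "Third instance ."]]

apply_rows(rows, lambda row: row['raw_words'].split(), 'words')
apply_field_rows(rows, lambda sent: sent.split(), 'raw_words', 'new_words')
# A follow-up step can derive seq_len from the tokenized field
apply_rows(rows, lambda row: len(row['words']), 'seq_len')

print([row['seq_len'] for row in rows])
print(all(row['words'] == row['new_words'] for row in rows))
```

Both routes yield the same token lists, so the choice mostly depends on whether the function needs more than one field.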
