The datasets DatasetBuilder class

Instance attributes and methods

1. The as_dataset function
dataset_builder.as_dataset(
		split: Optional[Split] = None, 
		run_post_process=True, 
		ignore_verifications=False, 
		in_memory=False
    )

Returns a Dataset object built by this builder. Call download_and_prepare first so that the data is available locally.

>>> from datasets import load_dataset_builder
>>> builder = load_dataset_builder('rotten_tomatoes')
>>> builder.download_and_prepare()
>>> ds = builder.as_dataset(split='train')
>>> ds
Dataset({
    features: ['text', 'label'],
    num_rows: 8530
})
2. The download_and_prepare function
dataset_builder.download_and_prepare(
        download_config: Optional[DownloadConfig] = None,
        download_mode: Optional[DownloadMode] = None,
        ignore_verifications: bool = False,
        try_from_hf_gcs: bool = True,
        dl_manager: Optional[DownloadManager] = None,
        base_path: Optional[str] = None,
        use_auth_token: Optional[Union[bool, str]] = None,
        **download_and_prepare_kwargs,
    )

Downloads the source data and prepares it in the local cache.

>>> from datasets import load_dataset_builder
>>> builder = load_dataset_builder('rotten_tomatoes')
>>> builder.download_and_prepare()
# The builder steps above, followed by as_dataset, are equivalent to:
>>> from datasets import load_dataset
>>> ds = load_dataset('rotten_tomatoes')
3. The get_exported_dataset_info function
dataset_builder.get_exported_dataset_info()

Returns a DatasetInfo object describing the dataset (description, citation, features, splits, sizes, and so on).

>>> from datasets import load_dataset_builder
>>> ds_builder = load_dataset_builder('rotten_tomatoes')
>>> ds_builder.get_exported_dataset_info()
DatasetInfo(description="Movie Review Dataset.
This is a dataset of containing 5,331 positive and 5,331 negative processed
sentences from Rotten Tomatoes movie reviews. This data was first used in Bo
Pang and Lillian Lee, ``Seeing stars: Exploiting class relationships for
sentiment categorization with respect to rating scales.'', Proceedings of the
ACL, 2005.
", citation='@InProceedings{Pang+Lee:05a,
  author =       {Bo Pang and Lillian Lee},
  title =        {Seeing stars: Exploiting class relationships for sentiment
                  categorization with respect to rating scales},
  booktitle =    {Proceedings of the ACL},
  year =         2005
}
', homepage='http://www.cs.cornell.edu/people/pabo/movie-review-data/', license='', features={'text': Value(dtype='string', id=None), 'label': ClassLabel(num_classes=2, names=['neg', 'pos'], id=None)}, post_processed=None, supervised_keys=SupervisedKeysData(input='', output=''), task_templates=[TextClassification(task='text-classification', text_column='text', label_column='label')], builder_name='rotten_tomatoes_movie_review', config_name='default', version=1.0.0, splits={'train': SplitInfo(name='train', num_bytes=1074810, num_examples=8530, dataset_name='rotten_tomatoes_movie_review'), 'validation': SplitInfo(name='validation', num_bytes=134679, num_examples=1066, dataset_name='rotten_tomatoes_movie_review'), 'test': SplitInfo(name='test', num_bytes=135972, num_examples=1066, dataset_name='rotten_tomatoes_movie_review')}, download_checksums={'https://storage.googleapis.com/seldon-datasets/sentence_polarity_v1/rt-polaritydata.tar.gz': {'num_bytes': 487770, 'checksum': 'a05befe52aafda71d458d188a1c54506a998b1308613ba76bbda2e5029409ce9'}}, download_size=487770, post_processing_size=None, dataset_size=1345461, size_in_bytes=1833231)
4. The get_all_exported_dataset_infos function
dataset_builder.get_all_exported_dataset_infos()

Returns a dict mapping each configuration name to its DatasetInfo object.

>>> from datasets import load_dataset_builder
>>> ds_builder = load_dataset_builder('rotten_tomatoes')
>>> ds_builder.get_all_exported_dataset_infos()
{'default': DatasetInfo(description="Movie Review Dataset.
This is a dataset of containing 5,331 positive and 5,331 negative processed
sentences from Rotten Tomatoes movie reviews. This data was first used in Bo
Pang and Lillian Lee, ``Seeing stars: Exploiting class relationships for
sentiment categorization with respect to rating scales.'', Proceedings of the
ACL, 2005.
", citation='@InProceedings{Pang+Lee:05a,
  author =       {Bo Pang and Lillian Lee},
  title =        {Seeing stars: Exploiting class relationships for sentiment
                  categorization with respect to rating scales},
  booktitle =    {Proceedings of the ACL},
  year =         2005
}
', homepage='http://www.cs.cornell.edu/people/pabo/movie-review-data/', license='', features={'text': Value(dtype='string', id=None), 'label': ClassLabel(num_classes=2, names=['neg', 'pos'], id=None)}, post_processed=None, supervised_keys=SupervisedKeysData(input='', output=''), task_templates=[TextClassification(task='text-classification', text_column='text', label_column='label')], builder_name='rotten_tomatoes_movie_review', config_name='default', version=1.0.0, splits={'train': SplitInfo(name='train', num_bytes=1074810, num_examples=8530, dataset_name='rotten_tomatoes_movie_review'), 'validation': SplitInfo(name='validation', num_bytes=134679, num_examples=1066, dataset_name='rotten_tomatoes_movie_review'), 'test': SplitInfo(name='test', num_bytes=135972, num_examples=1066, dataset_name='rotten_tomatoes_movie_review')}, download_checksums={'https://storage.googleapis.com/seldon-datasets/sentence_polarity_v1/rt-polaritydata.tar.gz': {'num_bytes': 487770, 'checksum': 'a05befe52aafda71d458d188a1c54506a998b1308613ba76bbda2e5029409ce9'}}, download_size=487770, post_processing_size=None, dataset_size=1345461, size_in_bytes=1833231)}