When using datasets as follows:
from datasets import load_dataset
dataset = load_dataset("kigner/ruozhiba-llama3", split = "train")
datasets version: 2.20.0
The following error is raised:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/username/miniconda3/lib/python3.10/site-packages/datasets/load.py", line 2594, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/username/miniconda3/lib/python3.10/site-packages/datasets/load.py", line 2266, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/home/username/miniconda3/lib/python3.10/site-packages/datasets/load.py", line 1827, in dataset_module_factory
    ).get_module()
  File "/home/username/miniconda3/lib/python3.10/site-packages/datasets/load.py", line 1034, in get_module
    patterns = get_data_patterns(base_path)
  File "/home/username/miniconda3/lib/python3.10/site-packages/datasets/data_files.py", line 501, in get_data_patterns
    return _get_data_files_patterns(resolver)
  File "/home/username/miniconda3/lib/python3.10/site-packages/datasets/data_files.py", line 295, in _get_data_files_patterns
    data_files = pattern_resolver(pattern)
  File "/home/username/miniconda3/lib/python3.10/site-packages/datasets/data_files.py", line 388, in resolve_pattern
    for filepath, info in fs.glob(pattern, detail=True, **glob_kwargs).items()
  File "/home/username/miniconda3/lib/python3.10/site-packages/fsspec/spec.py", line 606, in glob
    pattern = glob_translate(path + ("/" if ends_with_sep else ""))
  File "/home/username/miniconda3/lib/python3.10/site-packages/fsspec/utils.py", line 734, in glob_translate
    raise ValueError(
ValueError: Invalid pattern: '**' can only be an entire path component
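To make the rule in that error message concrete, here is a minimal sketch that checks whether every '**' in a glob pattern fills a whole path component. It only illustrates the wording of the message; it is not fsspec's actual implementation, and the function name and sample patterns are made up for this note.

```python
def double_star_is_whole_component(pattern):
    """Return True if every '**' in the pattern is an entire path component.

    Hypothetical helper: mirrors the wording of the ValueError above,
    not the real check inside fsspec.
    """
    # Split on '/' and look only at components that contain '**'.
    return all(part == "**" for part in pattern.split("/") if "**" in part)


# Sample patterns (invented for illustration):
print(double_star_is_whole_component("data/**/train.json"))  # '**' stands alone
print(double_star_is_whole_component("data/train**"))        # '**' glued to other text
```

A pattern of the second kind, where '**' is attached to other characters inside one component, is what the message complains about.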
Changing the dataset argument to a full path still did not work.
A search suggests the fsspec library is the cause:
https://github.com/huggingface/datasets/issues/6737
Install an older version of the fsspec library:
pip install fsspec==2023.9.2
Other dependencies may break as a result, but running the loading code above with datasets again no longer raised the error.
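Before and after downgrading, it can help to confirm which versions are actually installed. The sketch below reads them with the standard library's importlib.metadata and compares fsspec against the 2023.9.2 pin used in this note. Treating anything newer than that pin as suspect is an assumption drawn from this note, not a documented boundary, and the helper names are hypothetical.

```python
import re
from importlib import metadata

PINNED_FSSPEC = "2023.9.2"  # the version this note downgraded to


def parse_version(text):
    """Turn '2023.9.2' into (2023, 9, 2); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_newer_than_pin(version, pin=PINNED_FSSPEC):
    """True if version is strictly newer than the pinned fsspec version."""
    return parse_version(version) > parse_version(pin)


if __name__ == "__main__":
    for name in ("datasets", "fsspec"):
        print(name, installed_version(name))
    current = installed_version("fsspec")
    if current is not None and is_newer_than_pin(current):
        print("fsspec is newer than", PINNED_FSSPEC, "(the pin used in this note)")
```

If fsspec reports a version newer than the pin and the ValueError above appears, the downgrade described in this note is one thing to try.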
2024-07-15 (Mon)