On Python 3.9, running `from datasets import load_dataset` fails with the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hanhq/anaconda3/lib/python3.9/site-packages/datasets/__init__.py", line 18, in <module>
    from .arrow_dataset import Dataset
  File "/home/hanhq/anaconda3/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 67, in <module>
    from .arrow_reader import ArrowReader
  File "/home/hanhq/anaconda3/lib/python3.9/site-packages/datasets/arrow_reader.py", line 29, in <module>
    import pyarrow.parquet as pq
  File "/home/hanhq/anaconda3/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 20, in <module>
    from .core import *
  File "/home/hanhq/anaconda3/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 52, in <module>
    from pyarrow.fs import (LocalFileSystem, FileSystem, FileType,
  File "/home/hanhq/anaconda3/lib/python3.9/site-packages/pyarrow/fs.py", line 49, in <module>
    from pyarrow._gcsfs import GcsFileSystem # noqa
  File "pyarrow/_gcsfs.pyx", line 1, in init pyarrow._gcsfs
ValueError: pyarrow.lib.NativeFile size changed, may indicate binary incompatibility. Expected 104 from C header, got 96 from PyObject
Solution:
For me, installing pyarrow 12.0.1 fixed it:
pip install pyarrow==12.0.1
pip install cchardet
Another solution:
pip install pyarrow==11.0.0
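After reinstalling, a quick check like the one below can confirm which pyarrow version is actually active and whether `datasets` now imports without the binary-incompatibility error. This is a minimal sketch, not part of the original fix: the set `KNOWN_GOOD_PYARROW` only holds the two versions reported in this note (12.0.1 and 11.0.0), it is not an official compatibility list, and the helper names `installed_version` and `try_import` are hypothetical.

```python
"""Sketch: report the installed pyarrow version and test importing `datasets`.

Assumption: KNOWN_GOOD_PYARROW holds only the versions reported to work in
this note; it is not an official compatibility list.
"""
import importlib
from importlib.metadata import PackageNotFoundError, version

# Versions reported to work in this note (assumption, not an official list).
KNOWN_GOOD_PYARROW = {"12.0.1", "11.0.0"}


def installed_version(dist_name):
    """Return the installed distribution's version string, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def try_import(module_name):
    """Import a module; return (True, "") on success or (False, error text)."""
    try:
        importlib.import_module(module_name)
        return True, ""
    except Exception as exc:  # the traceback above raises ValueError; catch broadly
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    pa_version = installed_version("pyarrow")
    print("pyarrow version:", pa_version)
    if pa_version is not None and pa_version not in KNOWN_GOOD_PYARROW:
        print("Not one of the versions reported to work here:",
              sorted(KNOWN_GOOD_PYARROW))
    ok, err = try_import("datasets")
    print("import datasets:", "OK" if ok else err)
```

Save it as a script and run it with the same Python interpreter that raised the error, so that it inspects the same site-packages directory.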