Problem
I ran into this problem while running SimCSE.
The following line of code raised an error:
datasets = load_dataset(extension, data_files=data_files, cache_dir="./data/")
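The traceback below says the URL could not be reached. I tried it in a browser and it did fail there too, so it is most likely blocked by the firewall.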
File "C:\developmentTool\Anaconda\envs\SimCSE\lib\site-packages\datasets\utils\file_utils.py", line 617, in get_from_cache
raise ConnectionError("Couldn't reach {}".format(url))
ConnectionError: Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/1.2.1/datasets/text/text.py
With a proxy enabled, it still failed, this time with an SSL error:
requests.exceptions.SSLError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /datasets.huggingface.co/datasets/datasets/text/text.py (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))
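Solution
Load the dataset loading script from a local file instead of fetching it over the network.
- Download the file in a browser: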
https://raw.githubusercontent.com/huggingface/datasets/1.2.1/datasets/text/text.py
- Replace the extension argument with the path to the downloaded file (saved here as data/text.py):
datasets = load_dataset("data/text.py", data_files=data_files, cache_dir="./data/")
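To avoid hard-coding the path, the choice between the local script and the builder name can be made at runtime. The following is only a minimal sketch: the helper name resolve_loader and its script_dir parameter are hypothetical and not part of SimCSE or the datasets library. It assumes the downloaded script is saved as <script_dir>/<extension>.py, as in the step above.

```python
from pathlib import Path


def resolve_loader(extension: str, script_dir: str = "data") -> str:
    """Pick what to pass as the first argument of load_dataset.

    Hypothetical helper: returns the path of a locally saved loading
    script (e.g. data/text.py) if it exists, otherwise returns the
    builder name unchanged so the library falls back to downloading it.
    """
    local_script = Path(script_dir) / f"{extension}.py"
    if local_script.is_file():
        # Forward slashes keep the path identical on Windows and Linux.
        return local_script.as_posix()
    return extension


# Intended use in the SimCSE training script (requires the datasets library):
#   from datasets import load_dataset
#   datasets = load_dataset(resolve_loader(extension), data_files=data_files, cache_dir="./data/")
```

With data/text.py in place, resolve_loader("text") returns "data/text.py", so the call is equivalent to the modified line above. Without the file, it returns "text" and the original network behaviour is unchanged.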