Code from the book for downloading the data:
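import os
import tarfile
import urllib.request

# Note the trailing slash: DOWNLOAD_ROOT + "datasets/..." must give ".../master/datasets/..."
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    # Create datasets/housing if it does not exist yet
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    # Download the archive to datasets/housing/housing.tgz
    urllib.request.urlretrieve(housing_url, tgz_path)
    # Extract housing.csv into the same folder, then close the archive
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()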
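Running fetch_housing_data() raised HTTPError: HTTP Error 404: Not Found, meaning the download URL could not be found. One cause worth checking: if DOWNLOAD_ROOT has no trailing slash, HOUSING_URL becomes .../masterdatasets/housing/housing.tgz, which does not exist.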
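The effect of a missing slash can be checked without any network access. The sketch below compares plain string concatenation with posixpath.join, which inserts a "/" only when the left part does not already end with one. The names ROOT and REL are illustrative, not from the book. posixpath is used rather than os.path because on Windows os.path.join would insert a backslash, which does not belong in a URL.

```python
import posixpath

ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master"  # no trailing slash
REL = "datasets/housing/housing.tgz"

# Plain concatenation inserts no separator, so "master" and "datasets" run together
concatenated = ROOT + REL
# posixpath.join adds "/" only if the left part does not already end with one
joined = posixpath.join(ROOT, REL)
joined_slash = posixpath.join(ROOT + "/", REL)

print(concatenated)
# https://raw.githubusercontent.com/ageron/handson-ml2/masterdatasets/housing/housing.tgz
print(joined)
# https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.tgz
```

Either fix works: add the trailing slash to the root string, or build the URL with a join that adds the separator for you.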
The book gives another address for the data, the GitHub page: handson-ml2/housing.tgz at master · ageron/handson-ml2 · GitHub
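However, after substituting that address into the code, it still could not be opened. I'm a beginner myself and not yet comfortable downloading through code, so I switched to a different approach.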
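On the page handson-ml2/housing.tgz at master · ageron/handson-ml2 · GitHub, download housing.tgz to your computer, then extract it in a terminal with: $ tar zxvf <file name> -C <target directory>. The directory given to -C must already exist.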
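If the terminal is inconvenient, the same extraction can be done from Python with the standard tarfile module. The sketch below is a hypothetical helper, extract_archive, not code from the book; it mirrors tar zxvf <file> -C <dir>, except that it creates the target directory if it is missing.

```python
import os
import tarfile

def extract_archive(tgz_path, target_dir):
    """Rough Python equivalent of `tar zxvf <tgz_path> -C <target_dir>`."""
    # Unlike tar -C, create the target directory if it does not exist yet
    os.makedirs(target_dir, exist_ok=True)
    # "r:gz" opens a gzip-compressed tar archive (the "z" in zxvf)
    with tarfile.open(tgz_path, "r:gz") as archive:
        names = archive.getnames()  # member names, similar to what "v" prints
        archive.extractall(path=target_dir)  # the "x" and "-C" parts
    return names
```

For example, assuming the archive was saved to a Downloads folder (a hypothetical location), extract_archive(os.path.expanduser("~/Downloads/housing.tgz"), os.path.join("datasets", "housing")) would place the extracted files where the book's code expects them.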
Once it is extracted, open the datasets/housing folder in Jupyter Notebook, click Upload, and upload the housing.csv file you just extracted. That's it.
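As an alternative to the Upload button, the extracted file can be copied into place from a notebook cell. The sketch below is a hypothetical helper, place_housing_csv, not something from the book. It assumes the notebook's working directory is the folder that contains datasets/, so that the default path matches HOUSING_PATH in the book's code, and it reads back the CSV header as a quick sanity check.

```python
import csv
import os
import shutil

def place_housing_csv(csv_path, housing_path=os.path.join("datasets", "housing")):
    """Copy an extracted housing.csv into housing_path (hypothetical helper)."""
    os.makedirs(housing_path, exist_ok=True)
    dest = os.path.join(housing_path, "housing.csv")
    shutil.copy(csv_path, dest)
    # Read back the first row to confirm the copy is a readable CSV file
    with open(dest, newline="") as f:
        header = next(csv.reader(f))
    return dest, header
```

For example, place_housing_csv(os.path.expanduser("~/Downloads/housing.csv")) would copy a file from a hypothetical Downloads location and return the destination path together with the column names.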