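In the Kaggle Store Sales TS Forecasting - A Comprehensive Guide competition, the train dataset has to be merged (pd.merge) with holidays and the other datasets. While doing the merge I ran into the following problem.
It appeared after I changed one line of code written by a far more experienced Kaggler. The original code: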
import pandas as pd
import numpy as np
holidays=pd.read_csv("store-sales-time-series-forecasting/holidays_events.csv")
train=pd.read_csv("store-sales-time-series-forecasting/train.csv")
train["date"]=pd.to_datetime(train["date"])
holidays["date"]=pd.to_datetime(holidays["date"])
print(holidays.date.dtype)
holidays["type"] = np.where(holidays["type"] == "Bridge", "Holiday", holidays["type"])  # the original line that I modified
d=pd.merge(train,holidays,on="date",how="left")
print(holidays.date.dtype)
My modified version:
import pandas as pd
import numpy as np
holidays=pd.read_csv("store-sales-time-series-forecasting/holidays_events.csv")
train=pd.read_csv("store-sales-time-series-forecasting/train.csv")
train["date"]=pd.to_datetime(train["date"])
holidays["date"]=pd.to_datetime(holidays["date"])
print(holidays.date.dtype)
holidays.loc[holidays["type"]=="Bridge"]="Holiday"  # my modified line
d=pd.merge(train,holidays,on="date",how="left")
print(holidays.date.dtype)
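I don't understand why, but after the assignment with .loc, the dtype of the "date" column used as the join key changes to object, and the merge then raises an error.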
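A minimal sketch of what seems to be going on, using a few made-up rows in place of holidays_events.csv and train.csv (the frames and values below are illustrative assumptions, not the competition data). With only a row mask, .loc writes "Holiday" into every column of the selected rows, including "date", so pandas has to upcast "date" to object. Passing the column label as well limits the write to "type":

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for holidays_events.csv and train.csv (illustration only)
holidays = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-03"]),
    "type": ["Holiday", "Bridge", "Event"],
})
train = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-04"]),
    "sales": [0.0, 12.5, 7.0],
})

# Row mask only: "Holiday" is written into EVERY column of the Bridge rows,
# including "date", so the datetime64 column cannot hold it and gets upcast.
broken = holidays.copy()
try:
    broken.loc[broken["type"] == "Bridge"] = "Holiday"
    broken_date_dtype = str(broken["date"].dtype)
except (TypeError, ValueError):
    # newer pandas releases may refuse this upcast instead of doing it silently
    broken_date_dtype = None
print("row mask only:", broken_date_dtype)

# Row mask + column label: only "type" is written, "date" keeps datetime64
fixed = holidays.copy()
fixed.loc[fixed["type"] == "Bridge", "type"] = "Holiday"
print("row mask + column:", fixed["date"].dtype)

# Same effect as the original np.where line: only the "type" column is replaced
alt = holidays.copy()
alt["type"] = np.where(alt["type"] == "Bridge", "Holiday", alt["type"])

# The left merge on "date" works because both key columns are datetime64
merged = pd.merge(train, fixed, on="date", how="left")
print(merged)
```

The np.where form in the original code never runs into this because it only ever assigns to holidays["type"]; for the .loc version, adding "type" as the column indexer is the smallest change that restores the same behavior.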