Hard is the road, hard is the road; so many forks, and where am I now?

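This was my first hands-on Python assignment. I spent four hours of an evening taking a huge detour around a task that should have been a single step, and in the end there was still a Warning I could not get rid of...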

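The task was a major assignment for a university course. It asked us to use Forward Selection to predict material properties, and pointed to a GitHub tutorial, chris-santiago/steps ("A SciKit-Learn style feature selector using best subsets and stepwise regression"), which gives this example: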

import pandas as pd
from steps.forward import ForwardSelector

selector = ForwardSelector(normalize=True, metric='aic')
selector.fit(X, y)
X.loc[:, selector.best_support_]
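For readers who have not met Forward Selection before, here is a rough sketch of the idea in plain NumPy. It is not the steps library's implementation: the function names `aic` and `forward_select`, the AIC formula for least squares (n·ln(RSS/n) + 2k), and the synthetic data are all my own assumptions for illustration, and it skips the normalization that `normalize=True` asks for.

```python
import numpy as np

def aic(X, y, cols):
    """AIC of a least-squares fit on the given columns plus an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]

def forward_select(X, y):
    """Greedily add the feature that lowers AIC the most; stop when none does."""
    remaining = list(range(X.shape[1]))
    selected = []
    best_score = aic(X, y, selected)          # intercept-only model
    while remaining:
        scores = {j: aic(X, y, selected + [j]) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_score:      # no candidate improves the model
            break
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    mask = np.zeros(X.shape[1], dtype=bool)   # boolean mask, one entry per column
    mask[selected] = True
    return mask

# Synthetic data (hypothetical): only columns 0 and 2 drive the target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 2] + rng.normal(scale=0.1, size=200)

mask_demo = forward_select(X_demo, y_demo)
print(mask_demo)
```

The result is a boolean mask over the columns, which is the kind of object the last line of the tutorial snippet uses to pick the selected features.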

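1. The correct answer

The code is only three lines. In the model, X and y should be the training set train_values and the target values train_labels, respectively. The correct code, however, should be rewritten as: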

from steps.forward import ForwardSelector

selector = ForwardSelector(normalize=True, metric='aic')
selector.fit(train_values, train_labels)
train_values[:, selector.best_support_]

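As you can see, apart from the variable names, the only difference is that .loc has been removed: train_values is a NumPy array, which has no .loc accessor.

2. What not to do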
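A small sketch of why dropping .loc is enough, assuming selector.best_support_ is a boolean mask with one entry per column (the post never prints it, so the `mask` below is a hypothetical stand-in, as is the toy array):

```python
import numpy as np

# Toy stand-in for train_values: 2 samples, 3 features.
train_values_demo = np.array([[1.0, 10.0, 100.0],
                              [2.0, 20.0, 200.0]])

# Hypothetical stand-in for selector.best_support_.
mask = np.array([True, False, True])

# A NumPy array takes a boolean mask directly in square brackets.
selected = train_values_demo[:, mask]       # keeps columns 0 and 2
print(selected.shape)                       # (2, 2)

# .loc is a pandas accessor; a plain ndarray does not have it.
print(hasattr(train_values_demo, "loc"))    # False
```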

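In fact, ForwardSelector itself accepts either a NumPy array or a DataFrame as input. But I had learned how to use DataFrame only the day before, and "astutely" spotted df in the dataset-getting section near the start of the course notebook: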

# Pandas Dataframe
all_labels = df['density_of_solid'].tolist()
df = df.drop(['density_of_solid'], axis=1)

df.head(n=10) # With this line you can see the first ten entries of our database

So I took it for granted that the library's input had to be a DataFrame. Naturally, running the code produced this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-32-dbad8d7d0bd2> in <module>
      2 selector = ForwardSelector(normalize=True, metric='aic')
      3 selector.fit(train_values, train_labels)
----> 4 train_values.loc[:, selector.best_support_]

AttributeError: 'numpy.ndarray' object has no attribute 'loc'
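In hindsight, one way to sidestep this error is to select columns with a helper that checks the container type first. The sketch below is my own; `select_columns` is a hypothetical name, not part of steps or pandas, and the mask is again assumed to be boolean.

```python
import numpy as np
import pandas as pd

def select_columns(X, mask):
    """Return the columns of X where mask is True, for a DataFrame or an ndarray."""
    mask = np.asarray(mask, dtype=bool)
    if isinstance(X, pd.DataFrame):
        return X.loc[:, mask]          # pandas: label-based accessor
    return np.asarray(X)[:, mask]      # NumPy: plain boolean indexing

arr = np.arange(6.0).reshape(2, 3)
frame = pd.DataFrame(arr, columns=["a", "b", "c"])
demo_mask = [True, False, True]

picked_arr = select_columns(arr, demo_mask)
picked_frame = select_columns(frame, demo_mask)
print(picked_arr.shape)               # (2, 2)
print(list(picked_frame.columns))     # ['a', 'c']
```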

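I asked GPT and learned that code like this usually works with NumPy arrays, so I went back to the top and read the notebook line by line. Fortunately, the course code was annotated with comments: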

def plot(y_train = np.empty(0), y_test = np.empty(0), predictions_train = np.empty(0), predictions_test = np.empty(0)):
    
    # The reshape functions in the next two lines, turns each of the
    # vertical NumPy array [[x]
    #                       [y]
    #                       [z]]
    # into python lists [ x, y, z]
    
    # This step is required to create plots with plotly like we did in the previous tutorial

    y_train = y_train.reshape(1,-1).tolist()[0]
    y_test = y_test.reshape(1,-1).tolist()[0]    
    predictions_train = predictions_train.reshape(1,-1).tolist()[0]
    predictions_test = predictions_test.reshape(1,-1).tolist()[0]
    k = np.arange(-50,21000).reshape(1,-1).tolist()[0]
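To see what that reshape idiom does, here is a tiny standalone check on a made-up column vector. The `ravel()` version is my own suggestion for an equivalent spelling, not something the course code uses.

```python
import numpy as np

# A "vertical" array of shape (3, 1), like the labels in the course code.
column = np.array([[1.0], [2.0], [3.0]])

# The course's idiom: reshape to one row, convert to nested lists, take row 0.
as_list = column.reshape(1, -1).tolist()[0]

# An equivalent, shorter spelling: flatten, then convert.
same_list = column.ravel().tolist()

print(as_list)               # [1.0, 2.0, 3.0]
print(as_list == same_list)  # True
```

The fact that the function calls .reshape on its arguments is what gives away that they are NumPy arrays rather than DataFrames.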

So I had GPT convert the NumPy arrays back to DataFrames:

# Convert the NumPy arrays back to DataFrames
train_values = pd.DataFrame(train_values)
train_labels = pd.DataFrame(train_labels)
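A quick sketch of what this conversion produces, using made-up arrays. The point is that a DataFrame built from a bare array has integer column labels (the original column names are gone), and that .loc then works with a boolean mask; the mask here is a hypothetical stand-in for selector.best_support_.

```python
import numpy as np
import pandas as pd

values = np.array([[1.0, 2.0], [3.0, 4.0]])   # toy stand-in for train_values
labels = np.array([5.0, 6.0])                 # toy stand-in for train_labels

values_frame = pd.DataFrame(values)
labels_frame = pd.DataFrame(labels)

print(list(values_frame.columns))   # [0, 1]  (default integer labels)
print(labels_frame.shape)           # (2, 1)  (a 1-D array becomes one column)

mask = np.array([False, True])      # hypothetical boolean mask
print(values_frame.loc[:, mask].shape)  # (2, 1)
```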

Finally, to reuse the plotting code, I converted everything back to NumPy arrays:

#We will rewrite the arrays with the patches we made on the dataset by turning the dataframe back into a list of lists

all_values = [list(df.iloc[x]) for x in range(len(all_values))]

# SETS

# List of lists are turned into Numpy arrays to facilitate calculations in steps to follow.
all_values = np.array(all_values, dtype = float) 
print("Shape of Values:", all_values.shape)
all_labels = np.array(all_labels, dtype = float)
print("Shape of Labels:", all_labels.shape)
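For comparison, pandas can do the DataFrame-to-array step in a single call. The sketch below uses a made-up two-column frame (`df_demo` is hypothetical) and checks that `DataFrame.to_numpy` gives the same result as the list-of-lists route used in the course code.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

# The course's route: one Python list per row, then np.array.
via_lists = np.array([list(df_demo.iloc[i]) for i in range(len(df_demo))], dtype=float)

# A direct route: let pandas build the array.
direct = df_demo.to_numpy(dtype=float)

print(direct.shape)                       # (2, 2)
print(np.array_equal(via_lists, direct))  # True
```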

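All in all, I took a very, very long detour. I was only one step away, but at the fork in the road I chose the other direction. At root, my fundamentals were not solid enough. Looking back, if I had not just learned DataFrame, I would not even have made this mistake.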
