Python random forest regression: error with the scikit-learn RandomForestRegressor

Latest recommended article published 2024-05-28 17:44:06

weixin_39590739

Tags: Python random forest regression

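An error occurred while making predictions with scikit-learn's random forest regressor. The code tried to take the second element of each item in the prediction result, but a random forest regressor's predict returns one float per sample, not a structure with several elements. Indexing a float is what raised the error. The fix is to understand what the regressor returns and handle that output correctly.

Summary generated by CSDN using AI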

I am trying to load training and test data from a csv, run the random forest regressor in scikit/sklearn, and then predict the output from the test file.

The TrainLoanData.csv file contains 5 columns: the first column is the output and the next 4 columns are the features. The TestLoanData.csv file contains 4 columns, the features only.

When I run the code, I get this error:

predicted_probs = ["%f" % x[1] for x in predicted_probs]

IndexError: invalid index to scalar variable.

What does this mean?

Here is my code:

import numpy, scipy, sklearn, csv_io  # csv_io from https://raw.github.com/benhamner/BioResponse/master/Benchmarks/csv_io.py
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor

def main():
    # read in the training file
    train = csv_io.read_data("TrainLoanData.csv")
    # set the training responses (first column)
    target = [x[0] for x in train]
    # set the training features (remaining 4 columns)
    train = [x[1:] for x in train]
    # read in the test file
    realtest = csv_io.read_data("TestLoanData.csv")

    # random forest code
    rf = RandomForestRegressor(n_estimators=10, min_samples_split=2, n_jobs=-1)
    # fit the training data
    print('fitting the model')
    rf.fit(train, target)

    # run model against test data
    predicted_probs = rf.predict(realtest)
    print(predicted_probs)
    # the next line is the one that raises the IndexError
    predicted_probs = ["%f" % x[1] for x in predicted_probs]
    csv_io.write_delimited_file("random_forest_solution.csv", predicted_probs)

main()

Solution

The return value from a RandomForestRegressor is an array of floats:

In [3]: rf = RandomForestRegressor(n_estimators=10, min_samples_split=2, n_jobs=-1)

In [4]: rf.fit([[1,2,3],[4,5,6]],[-1,1])
Out[4]:
RandomForestRegressor(bootstrap=True, compute_importances=False,
                      criterion='mse', max_depth=None, max_features='auto',
                      min_density=0.1, min_samples_leaf=1, min_samples_split=2,
                      n_estimators=10, n_jobs=-1, oob_score=False,
                      random_state=,
                      verbose=0)

In [5]: rf.predict([1,2,3])
Out[5]: array([-0.6])

In [6]: rf.predict([[1,2,3],[4,5,6]])
Out[6]: array([-0.6, 0.4])

So you're trying to index a float like (-0.6)[1], which is not possible.
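A minimal sketch of the failure and one possible fix, assuming a recent scikit-learn and NumPy. The toy rows below are made up and only stand in for the CSV files from the question; the names predictions, failed and formatted are illustrative, not from the original code.

```python
from sklearn.ensemble import RandomForestRegressor

# Made-up stand-ins for TrainLoanData.csv / TestLoanData.csv (4 features per row).
train = [[1.0, 2.0, 3.0, 4.0], [4.0, 5.0, 6.0, 7.0], [7.0, 8.0, 9.0, 10.0]]
target = [-1.0, 0.0, 1.0]
realtest = [[1.0, 2.0, 3.0, 4.0], [7.0, 8.0, 9.0, 10.0]]

rf = RandomForestRegressor(n_estimators=10, min_samples_split=2, random_state=0)
rf.fit(train, target)

# predict returns a 1-D array: one float per test row.
predictions = rf.predict(realtest)

# Reproduce the problem: each x is a scalar, so x[1] cannot work.
# NumPy scalars raise IndexError; a plain Python float would raise TypeError.
try:
    ["%f" % x[1] for x in predictions]
    failed = False
except (IndexError, TypeError):
    failed = True

# Possible fix: format each prediction directly, without indexing.
formatted = ["%f" % x for x in predictions]
print(failed, formatted)
```

With this change, formatted holds one string per test row, which is the shape the original list comprehension was trying to produce.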

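As a side note, the model does not return probabilities. A regressor's predict gives one continuous value per sample, so the name predicted_probs is misleading here.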
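For contrast, the x[1] pattern is what one would typically write against a classifier's predict_proba output, which has one column per class. Whether the question's code was adapted from such an example is only a guess. The sketch below uses made-up data and binary labels to show the difference; it assumes a recent scikit-learn.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Made-up data: 3 features per row, binary class labels.
X = [[1, 2, 3], [4, 5, 6], [1, 2, 4], [4, 5, 7]]
y = [0, 1, 0, 1]

# A classifier exposes predict_proba: shape (n_samples, n_classes).
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
proba = clf.predict_proba([[1, 2, 3], [4, 5, 6]])

# Here indexing each row with [1] is valid: it selects the probability of class 1.
positive = ["%f" % row[1] for row in proba]

# A regressor has no predict_proba; its predict output is already 1-D.
reg = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
has_proba = hasattr(reg, "predict_proba")
print(proba.shape, has_proba)
```

If the target in TrainLoanData.csv is really a class label rather than a continuous value, switching to RandomForestClassifier may be what was intended; the question does not say which it is.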
