R: Unclear behavior of the tuneRF function (randomForest package)

I am unsure about the meaning of the stepFactor parameter of the tuneRF function, which tunes the mtry parameter that is later passed to randomForest.

The documentation of tuneRF says that stepFactor is a magnitude by which the chosen mtry gets deflated or inflated.

Since mtry is the number of variables chosen at random, it obviously has to be an integer. However, I have seen many examples on the net using stepFactor=1.5.

At first I thought that R uses by default next mtry equal to floor(mtry_current-stepFactor), but it turned out that I was wrong.

Moreover, I do not understand the messages "Searching left ..." and "Searching right ..." that R prints while tuneRF is working.

I thought they reported whether mtry was being inflated or deflated, but that assumption did not turn out to be correct either.

To sum up this long and not too graceful description of my doubts, my questions are:

Why is stepFactor NOT an integer?

How are subsequent mtry values chosen?

What does searching left/right actually mean?

Any help would be very much appreciated!! :)

Solution

Below is a summary of how tuneRF works:

1.a. Set mtry to the default value: sqrt(p) for classification and p/3 for regression, rounded down to an integer (where p = total number of variables)

1.b. Compute the out-of-bag (OOB) error (say error_default) for a Random Forest with mtry set to the default value found above

2.a. Look to the left: set mtry = default value/stepFactor. For instance, if stepFactor=1.5 and your default starting value is 8, mtry would be set to 8/1.5=5.33, rounded up to an integer, which gives 6

2.b. Compute the OOB error, say error_left

3.a. Look to the right: set mtry = default value*stepFactor. To continue with my example, mtry would be set to 8*1.5=12

3.b. Compute the OOB error, say error_right

4.i. If (error_default < error_right) AND (error_default < error_left), the best mtry is the default value

4.ii. If the previous condition is not met, but the improvement of error_right/error_left over error_default is smaller than the improve parameter (which the documentation describes as a relative improvement), the best mtry is still the default value

4.iii. Otherwise, suppose without loss of generality that error_right < error_left and that the relative gain (error_default-error_right)/error_default exceeds improve: set mtry to mtry_right (12). From now on, always go to the right. The mirror case, going left, works the same way

5. If 4.iii. holds, iterate: set mtry to mtry_right*stepFactor (in my example, 12*1.5=18), compute the OOB error and compare it with the error obtained at the previous step (in my example, for mtry=12). If the new error is smaller and the gain in error reduction is large enough (i.e., > improve), select the new mtry and repeat these steps; otherwise stop and return the current mtry as the best mtry
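The iteration in one direction can be sketched in base R as follows. This is only an illustration of the steps described above, not the package's source: `search_direction` and `toy_error` are hypothetical names, `toy_error` stands in for "fit a forest with this mtry and read its OOB error", and rounding down when moving right is an assumption (the summary only states the rounding for the left side).

```r
# Hypothetical helper: walk mtry in one direction until the relative
# improvement in error is no longer larger than `improve`.
search_direction <- function(mtry_start, err_start, oob_error, p,
                             direction = c("left", "right"),
                             stepFactor = 1.5, improve = 0.05) {
  direction <- match.arg(direction)
  best <- mtry_start
  err_best <- err_start
  cur <- mtry_start
  repeat {
    nxt <- if (direction == "left") {
      max(1, ceiling(cur / stepFactor))  # deflate and round up, never below 1
    } else {
      min(p, floor(cur * stepFactor))    # inflate and round down (assumed), never above p
    }
    if (nxt == cur) break                # rounding or the bounds prevent any move
    err_nxt <- oob_error(nxt)
    gain <- 1 - err_nxt / err_best       # relative improvement over the best so far
    if (gain <= improve) break           # not enough gain: stop here
    best <- nxt
    err_best <- err_nxt
    cur <- nxt
  }
  list(mtry = best, error = err_best)
}

# Toy stand-in for the OOB error, with its minimum at mtry = 18 (made-up numbers).
toy_error <- function(m) (m - 18)^2 / 1000 + 0.10

res_left  <- search_direction(8, toy_error(8), toy_error, p = 30, direction = "left")
res_right <- search_direction(8, toy_error(8), toy_error, p = 30, direction = "right")
res_left$mtry   # 8: moving to 6 makes the toy error worse
res_right$mtry  # 18: reached via 8 -> 12 -> 18, then 27 is worse
```

With the toy error curve, the walk to the right accepts 12 and 18 and stops at 27, which mirrors the 8, 12, 18 sequence of the example.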

The smaller the stepFactor you set (e.g., 1.1, 1.2), the more values of mtry you try (fine search); the bigger the stepFactor (e.g., 2, 2.5), the fewer values you try (rough search). Also, with low values of improve, the search will continue longer.
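To get a feel for how stepFactor changes the grid, the sketch below lists every mtry value reachable from a starting point by repeated division (left) or multiplication (right). `mtry_candidates` is a hypothetical helper written for this illustration. It assumes rounding up on the left, rounding down on the right, and clamping to the range 1..p. It lists reachable values only; an actual search may stop earlier once the improvement is too small.

```r
# Hypothetical helper: all mtry values reachable from `start` with a given stepFactor.
mtry_candidates <- function(start, p, stepFactor) {
  left <- c()
  cur <- start
  repeat {
    nxt <- max(1, ceiling(cur / stepFactor))  # deflate, round up
    if (nxt == cur) break                     # no further move possible
    left <- c(nxt, left)
    cur <- nxt
  }
  right <- c()
  cur <- start
  repeat {
    nxt <- min(p, floor(cur * stepFactor))    # inflate, round down (assumed)
    if (nxt == cur) break
    right <- c(right, nxt)
    cur <- nxt
  }
  c(left, start, right)
}

fine  <- mtry_candidates(8, p = 30, stepFactor = 1.5)
rough <- mtry_candidates(8, p = 30, stepFactor = 2.5)
fine   # 2 3 4 6 8 12 18 27 30
rough  # 1 2 4 8 20 30
```

Under these rounding assumptions a factor very close to 1 can stall when mtry is small: mtry_candidates(8, 30, 1.1) returns only 8, because both 8/1.1 rounded up and 8*1.1 rounded down give 8 again.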
