A short Python example
Scikit-learn is a great way to get started with random forests. The scikit-learn API is remarkably consistent across all of its algorithms, so testing and switching between different models is very easy. Often I start with something simple and then move on to a random forest.
The best feature of the random forest implementation in scikit-learn is the n_jobs parameter. It automatically parallelizes the forest across however many cores you tell it to use. There is a great talk by scikit-learn contributor Olivier Grisel in which he discusses training a random forest on a 20-node EC2 cluster.
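To see the effect of n_jobs for yourself, a rough benchmark sketch like the following works; the synthetic dataset (make_classification) and its sizes are my own choices, not from the post, and absolute times depend entirely on your machine:

```python
from time import perf_counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic classification data, large enough for parallelism to matter
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for n_jobs in (1, -1):  # -1 means "use every available core"
    clf = RandomForestClassifier(n_estimators=200, n_jobs=n_jobs, random_state=0)
    start = perf_counter()
    clf.fit(X, y)
    print(f"n_jobs={n_jobs}: fit took {perf_counter() - start:.2f}s")
```

On a multi-core machine the n_jobs=-1 fit should be noticeably faster, since each tree in the forest can be grown independently.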
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Randomly assign roughly 75% of the rows to the training set
df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.75
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.head()

train, test = df[df['is_train']], df[~df['is_train']]
features = df.columns[:4]

# n_jobs=2 grows the trees on two cores in parallel
clf = RandomForestClassifier(n_jobs=2)
# Encode the species labels as integer codes for fitting
y, _ = pd.factorize(train['species'])
clf.fit(train[features], y)

# Map predicted integer codes back to species names
preds = iris.target_names[clf.predict(test[features])]
pd.crosstab(test['species'], preds, rownames=['actual'], colnames=['preds'])
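The crosstab above is a per-class confusion matrix. If you only need a single accuracy number, a minimal self-contained sketch looks like this; train_test_split and accuracy_score are standard scikit-learn utilities not used in the original post, and the random_state values are my own:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
# A deterministic 75/25 split instead of the random-uniform mask above
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_jobs=2, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```

Because iris labels are already integers here, no pd.factorize step is needed.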
Reprinted from:
http://www.cnblogs.com/maybe2030/p/4585705.html
http://www.oschina.net/translate/random-forests-in-python?cmp
2. Regression
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Toy regression target: a log curve with Gaussian noise
x_train = np.random.uniform(1, 100, 1000).reshape(-1, 1)
y_train = np.log(x_train).ravel() + np.random.normal(0, .3, 1000)
x_test = np.random.uniform(1, 100, 1000).reshape(-1, 1)
y_test = np.log(x_test).ravel() + np.random.normal(0, .3, 1000)

def random_forest():
    clf = RandomForestRegressor(n_estimators=100, max_features=0.8,
                                oob_score=True, n_jobs=-1,
                                random_state=50, min_samples_leaf=1)
    clf.fit(x_train, y_train)  # y must be 1-D to avoid a DataConversionWarning
    pred = clf.predict(x_test)

    # Plot the noisy ground truth and the predictions in two panels
    plt.figure()
    ax = plt.subplot(211)
    ax.plot(x_test, y_test, 'b.')
    ax.legend(['real'])
    bx = plt.subplot(212)
    bx.plot(x_test, pred, 'r.')
    bx.legend(['pred'])
    plt.show()

if __name__ == '__main__':
    random_forest()
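Since the regressor above is constructed with oob_score=True, the fitted model also exposes an out-of-bag R² estimate, which gives you a validation score without holding out a test set. A minimal sketch, using the same kind of noisy log data (the seed is my own choice for reproducibility):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(50)  # assumed seed, for a reproducible sketch
x_train = rng.uniform(1, 100, 1000).reshape(-1, 1)
y_train = np.log(x_train).ravel() + rng.normal(0, .3, 1000)

clf = RandomForestRegressor(n_estimators=100, oob_score=True,
                            n_jobs=-1, random_state=50)
clf.fit(x_train, y_train)

# oob_score_ is the R^2 measured on out-of-bag samples: rows each
# tree never saw during its bootstrap draw, so they act as a free
# internal validation set.
print(f"OOB R^2: {clf.oob_score_:.3f}")
```

For this noisy log curve the OOB R² lands well above zero but below 1, reflecting the irreducible noise added to the target.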