Serializing a sklearn model to JSON (sklearn-json)

wendaocp

Article link: https://blog.csdn.net/wendaocp/article/details/105112863

sklearn-json

Introduction
Installing sklearn-json
Usage
- Serializing a model to JSON
- Deserialization
References

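Introduction

Requirement: export a trained sklearn model to JSON so that it can be passed between programs written in different languages.
Solution: use sklearn-json.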

Installing sklearn-json

pip install sklearn-json

Note: requires scikit-learn >= 0.21.3.

Usage

Serializing a model to JSON

The example below uses a decision tree classifier.

from sklearn import tree
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
import sklearn_json as skljson

# data
wine = load_wine()

# train/test split
Xtrain, Xtest, Ytrain, Ytest = train_test_split(wine.data, wine.target, test_size=0.3)

# train a decision tree
clf = tree.DecisionTreeClassifier(criterion='gini', max_depth=5, random_state=0)
clf = clf.fit(Xtrain, Ytrain) # after fit, clf is the model

# save model to json
skljson.to_json(clf, "tree_model")  # the key step: write the fitted model to a JSON file

The decision tree classifier is now saved in JSON format.
JSON is human-readable; opening the "tree_model" file shows the following:

{
  "meta": "decision-tree",
  "feature_importances_": [
    0.0,
    0.0,
    0.0,
    0.0,
    0.0,
    0.0,
    0.4245297894999867,
    0.0,
    0.0,
    0.39702026297677956,
    0.02368943909521627,
    0.046465929707028834,
    0.10829457872098862
  ],
  "max_features_": 13,
  "n_classes_": 3,
  "n_features_": 13,
  "n_outputs_": 1,
  "tree_": {
    "max_depth": 3,
    "node_count": 11,
    "nodes": [
      [
        1,
        4,
        9,
        3.819999933242798,
        0.6619406867845994,
        124,
        124.0
      ],
      [
        2,
        3,
        11,
        3.694999933242798,
        0.08869659275283936,
        43,
        43.0
      ],
      [
        -1,
        -1,
        -2,
        -2.0,
        0.0,
        41,
        41.0
      ],
      [
        -1,
        -1,
        -2,
        -2.0,
        0.0,
        2,
        2.0
      ],
      [
        5,
        8,
        6,
        1.5800000429153442,
        0.5639384240207286,
        81,
        81.0
      ],
      [
        6,
        7,
        10,
        0.9699999988079071,
        0.054012345679012363,
        36,
        36.0
      ],
      [
        -1,
        -1,
        -2,
        -2.0,
        0.0,
        35,
        35.0
      ],
      [
        -1,
        -1,
        -2,
        -2.0,
        0.0,
        1,
        1.0
      ],
      [
        9,
        10,
        12,
        670.0,
        0.19753086419753085,
        45,
        45.0
      ],
      [
        -1,
        -1,
        -2,
        -2.0,
        0.0,
        5,
        5.0
      ],
      [
        -1,
        -1,
        -2,
        -2.0,
        0.0,
        40,
        40.0
      ]
    ],
    "values": [
      [
        [
          42.0,
          47.0,
          35.0
        ]
      ],
      [
        [
          2.0,
          41.0,
          0.0
        ]
      ],
      [
        [
          0.0,
          41.0,
          0.0
        ]
      ],
      [
        [
          2.0,
          0.0,
          0.0
        ]
      ],
      [
        [
          40.0,
          6.0,
          35.0
        ]
      ],
      [
        [
          0.0,
          1.0,
          35.0
        ]
      ],
      [
        [
          0.0,
          0.0,
          35.0
        ]
      ],
      [
        [
          0.0,
          1.0,
          0.0
        ]
      ],
      [
        [
          40.0,
          5.0,
          0.0
        ]
      ],
      [
        [
          0.0,
          5.0,
          0.0
        ]
      ],
      [
        [
          40.0,
          0.0,
          0.0
        ]
      ]
    ],
    "nodes_dtype": [
      "<i8",
      "<i8",
      "<i8",
      "<f8",
      "<f8",
      "<i8",
      "<f8"
    ]
  },
  "classes_": [
    0,
    1,
    2
  ],
  "params": {
    "ccp_alpha": 0.0,
    "class_weight": null,
    "criterion": "gini",
    "max_depth": 5,
    "max_features": null,
    "max_leaf_nodes": null,
    "min_impurity_decrease": 0.0,
    "min_impurity_split": null,
    "min_samples_leaf": 1,
    "min_samples_split": 2,
    "min_weight_fraction_leaf": 0.0,
    "presort": "deprecated",
    "random_state": 0,
    "splitter": "best"
  }
}

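This JSON file contains all the data of the trained decision tree classifier: the fitted attributes, the tree structure, and the hyperparameters.
To understand the individual attributes in more depth, especially the central tree_.nodes, compare them against a graphviz visualization of the tree.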
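
As a rough aid for reading tree_.nodes, the sketch below attaches names to the seven columns of a node row. The column names are an assumption: they follow scikit-learn's internal node record, which lines up with the seven entries of nodes_dtype in the file above. NODE_FIELDS and describe_node are made-up helper names for illustration, not part of sklearn-json. The rows are the first three nodes copied from the JSON above.

```python
# Column names assumed from scikit-learn's internal node record; the JSON file
# itself only stores the bare numbers plus "nodes_dtype".
NODE_FIELDS = (
    "left_child", "right_child", "feature", "threshold",
    "impurity", "n_node_samples", "weighted_n_node_samples",
)
TREE_LEAF = -1       # child index stored for leaf nodes
TREE_UNDEFINED = -2  # feature/threshold stored for leaf nodes


def describe_node(row):
    """Turn one row of tree_.nodes into a dict with named fields."""
    node = dict(zip(NODE_FIELDS, row))
    node["is_leaf"] = node["left_child"] == TREE_LEAF
    return node


# First three rows of "nodes" from the JSON shown above.
rows = [
    [1, 4, 9, 3.819999933242798, 0.6619406867845994, 124, 124.0],
    [2, 3, 11, 3.694999933242798, 0.08869659275283936, 43, 43.0],
    [-1, -1, -2, -2.0, 0.0, 41, 41.0],
]
nodes = [describe_node(row) for row in rows]

for i, node in enumerate(nodes):
    if node["is_leaf"]:
        print(f"node {i}: leaf with {node['n_node_samples']} training samples")
    else:
        print(
            f"node {i}: feature[{node['feature']}] <= {node['threshold']:.3f} "
            f"-> node {node['left_child']}, else node {node['right_child']}"
        )
```

Read this way, the root splits the 124 training samples on feature 9, and node 2 is a leaf holding 41 samples. The matching row of tree_.values ([0.0, 41.0, 0.0]) says all of them belong to class 1.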

Deserialization

Deserialization in Python looks like this:

# continues from the code above

model = skljson.from_json("tree_model")  # the key step: rebuild the model from the JSON file

print(model.score(Xtrain, Ytrain))  # accuracy on the training set
print(model.score(Xtest, Ytest))  # accuracy on the test set

print(model.predict(Xtest))  # predictions for the test set

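To consume the JSON from another language such as Java, parse it with that language's JSON library; many approaches are documented online.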
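
Because the file is plain JSON, prediction can in principle be reproduced without scikit-learn by walking tree_.nodes, and the same loop should port to Java or any other language. The following is a minimal sketch in Python using only the standard library; it is not part of sklearn-json. It rests on three assumptions:

- the node columns are ordered as in scikit-learn's internal record (left child, right child, feature, threshold, ...);
- samples with a value <= threshold go to the left child;
- the predicted class is the largest entry in the leaf's row of tree_.values.

The name predict_one is hypothetical. The tree is embedded as a string, copied from the file above with only the keys needed here, so that the example runs on its own. In practice it would be read from "tree_model" with json.load.

```python
import json

# Tree copied from the "tree_model" file above (only the keys needed here).
MODEL_JSON = """
{
  "classes_": [0, 1, 2],
  "tree_": {
    "nodes": [
      [1, 4, 9, 3.819999933242798, 0.6619406867845994, 124, 124.0],
      [2, 3, 11, 3.694999933242798, 0.08869659275283936, 43, 43.0],
      [-1, -1, -2, -2.0, 0.0, 41, 41.0],
      [-1, -1, -2, -2.0, 0.0, 2, 2.0],
      [5, 8, 6, 1.5800000429153442, 0.5639384240207286, 81, 81.0],
      [6, 7, 10, 0.9699999988079071, 0.054012345679012363, 36, 36.0],
      [-1, -1, -2, -2.0, 0.0, 35, 35.0],
      [-1, -1, -2, -2.0, 0.0, 1, 1.0],
      [9, 10, 12, 670.0, 0.19753086419753085, 45, 45.0],
      [-1, -1, -2, -2.0, 0.0, 5, 5.0],
      [-1, -1, -2, -2.0, 0.0, 40, 40.0]
    ],
    "values": [
      [[42.0, 47.0, 35.0]], [[2.0, 41.0, 0.0]], [[0.0, 41.0, 0.0]],
      [[2.0, 0.0, 0.0]], [[40.0, 6.0, 35.0]], [[0.0, 1.0, 35.0]],
      [[0.0, 0.0, 35.0]], [[0.0, 1.0, 0.0]], [[40.0, 5.0, 0.0]],
      [[0.0, 5.0, 0.0]], [[40.0, 0.0, 0.0]]
    ]
  }
}
"""

# Assumed column positions within one row of "nodes".
LEFT, RIGHT, FEATURE, THRESHOLD = 0, 1, 2, 3
TREE_LEAF = -1  # child index stored for leaf nodes


def predict_one(model, x):
    """Return the predicted class label for one sample (a list of 13 floats)."""
    nodes = model["tree_"]["nodes"]
    node_id = 0  # the root is the first row
    while nodes[node_id][LEFT] != TREE_LEAF:
        node = nodes[node_id]
        # Values <= threshold follow the left child, all others the right child.
        if x[node[FEATURE]] <= node[THRESHOLD]:
            node_id = node[LEFT]
        else:
            node_id = node[RIGHT]
    counts = model["tree_"]["values"][node_id][0]  # per-class counts at the leaf
    best = max(range(len(counts)), key=lambda k: counts[k])
    return model["classes_"][best]


model = json.loads(MODEL_JSON)

sample = [0.0] * 13   # made-up sample, not taken from the wine data
sample[9] = 5.0       # > 3.82, so the root sends it right (node 4)
sample[6] = 2.0       # > 1.58, so node 4 sends it right (node 8)
sample[12] = 1000.0   # > 670.0, so node 8 sends it right (node 10)
print(predict_one(model, sample))  # leaf 10 holds only class-0 samples
```

One caveat: scikit-learn converts inputs to 32-bit floats before comparing them with thresholds, so a value extremely close to a threshold may be routed differently by this sketch than by model.predict.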