PythonNote018---Persistence

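1. Saving variables

1.1 pickle

  The pickle library can save several chosen variables to a single .pickle file. It is a reasonable choice when there are only a few variables to save.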

import pickle

# obj0, obj1, obj2 are created here...
obj0, obj1, obj2 = [1, 2], [2, 3], [3, 4]
# Saving the objects:
# Optional: pass protocol=-1 to dump() to use the highest protocol, which usually gives a smaller file
with open('test.pickle', 'wb') as f:  # Python 3: open(..., 'wb')
    pickle.dump([obj0, obj1, obj2], f)
# No f.close() is needed: the with block closes the file on exit
# Getting back the objects:
with open('test.pickle', 'rb') as f:  # Python 3: open(..., 'rb')
    x0, x1, x2 = pickle.load(f)
print(x0)
# [1, 2]
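
The remark about protocol=-1 can be checked directly. The following is a minimal sketch, assuming a list of 1000 integers as sample data (the data is not from the original note). It compares the oldest text-based protocol with the highest one; -1 is shorthand for pickle.HIGHEST_PROTOCOL.

```python
import pickle

# Sample data chosen only for illustration.
data = list(range(1000))

text_bytes = pickle.dumps(data, protocol=0)     # oldest, ASCII-based protocol
binary_bytes = pickle.dumps(data, protocol=-1)  # -1 selects pickle.HIGHEST_PROTOCOL

# Print both sizes so they can be compared.
print(len(text_bytes), len(binary_bytes))

# Either encoding restores the same object.
restored = pickle.loads(binary_bytes)
```

The default protocol in Python 3 is already a binary one, so the gain from protocol=-1 over the default is usually much smaller than the gap shown here.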

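1.2 cPickle

  cPickle is faster; otherwise it behaves essentially the same as pickle. In Python 3, cPickle became _pickle, and the standard pickle module uses it automatically when it is available, so importing _pickle directly is rarely necessary. Details: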

Docstring:   Optimized C implementation for the Python pickle module.
import _pickle as cpickle

# obj0, obj1, obj2 are created here...
obj0, obj1, obj2 = [1, 2], [2, 3], [3, 4]
# Saving the objects:
# Optional: pass protocol=-1 to dump() to use the highest protocol, which usually gives a smaller file
with open('test.pickle', 'wb') as f:  # Python 3: open(..., 'wb')
    cpickle.dump([obj0, obj1, obj2], f)
# Drop the values loaded in 1.1; skip this line if that snippet has not been run
del x0, x1, x2
# Getting back the objects:
with open('test.pickle', 'rb') as f:  # Python 3: open(..., 'rb')
    x0, x1, x2 = cpickle.load(f)
print(x0)
# [1, 2]
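
To see whether the plain pickle module is already backed by the C implementation, the two Pickler classes can be compared. This is a small sketch that relies on a CPython implementation detail (the _pickle module), so the import is guarded and the result may be False on other interpreters.

```python
import pickle

try:
    import _pickle  # C accelerator; a CPython implementation detail
except ImportError:
    _pickle = None

# True when the pickle module has picked up the C implementation.
uses_c = _pickle is not None and pickle.Pickler is _pickle.Pickler
print("pickle uses the C accelerator:", uses_c)

# The ordinary pickle API is used the same way in either case.
payload = pickle.dumps([[1, 2], [2, 3], [3, 4]])
x0, x1, x2 = pickle.loads(payload)
print(x0)
```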

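1.3 shelve

  shelve does not seem to support built-in functions and some other objects, and it is not very smart about skipping them, so using pickle directly may be the better choice. Reference: http://www.php.cn/python-tutorials-410803.html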

import shelve

T='Hiya'
val=[1,2,3]

filename='test'
my_shelf = shelve.open(filename,'n') # 'n' for new

for key in dir():
    try:
        my_shelf[key] = globals()[key]
    except Exception:
        #
        # __builtins__, my_shelf, and imported modules can not be shelved.
        #
        print('ERROR shelving: {0}'.format(key))
my_shelf.close()
del val,T
my_shelf = shelve.open(filename)
for key in my_shelf:
    globals()[key]=my_shelf[key]
my_shelf.close()

print(T)
# Hiya
print(val)
# [1, 2, 3]
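
Shelving every global produces errors for modules and for the shelf itself. A simpler pattern is to store only the variables of interest under explicit keys. The sketch below assumes a temporary directory for the shelf file (not part of the original note) and uses shelve.open as a context manager so the shelf is closed automatically.

```python
import os
import shelve
import tempfile

T = 'Hiya'
val = [1, 2, 3]

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test')

    # Store only the variables we care about, under explicit keys.
    with shelve.open(path, 'n') as shelf:  # 'n' always creates a new, empty shelf
        shelf['T'] = T
        shelf['val'] = val

    # Reopen read-only and copy the values out before the shelf closes.
    with shelve.open(path, 'r') as shelf:
        restored = dict(shelf)

print(restored['T'])
print(restored['val'])
```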

1.4 dill

  Saving the session works when run from PyCharm, but it raises an error in Jupyter. I have not found the cause yet.

dump_session(filename='/tmp/session.pkl', main=None, byref=False)
    pickle the current state of __main__ to a file
import dill
# Save the current session to a file
filename = 'globalsave.pkl'
dill.dump_session(filename)
# Restore the session from the file
dill.load_session(filename)
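
When dump_session fails in a given environment, one possible fallback is to save only the variables that the standard pickle module can handle. The sketch below is that fallback, not how dill works. The helper names save_picklable and load_saved are hypothetical, and the small namespace dictionary stands in for globals().

```python
import os
import pickle
import tempfile


def save_picklable(namespace, path):
    """Pickle the entries of `namespace` that pickle can serialize.

    Returns the list of names that were skipped.
    """
    saved, skipped = {}, []
    for name, value in list(namespace.items()):
        if name.startswith('__'):
            continue  # ignore dunder entries such as __builtins__
        try:
            pickle.dumps(value)
        except Exception:
            skipped.append(name)  # e.g. modules and lambdas
        else:
            saved[name] = value
    with open(path, 'wb') as f:
        pickle.dump(saved, f, protocol=pickle.HIGHEST_PROTOCOL)
    return skipped


def load_saved(path):
    """Return the dictionary written by save_picklable."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Hypothetical namespace standing in for globals().
namespace = {'T': 'Hiya', 'val': [1, 2, 3], 'mod': os, 'fn': lambda x: x}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'globalsave.pkl')
    skipped = save_picklable(namespace, path)
    restored = load_saved(path)

print(sorted(skipped))
print(restored)
```

In an interactive session the same helpers could be called as save_picklable(globals(), filename) and globals().update(load_saved(filename)). Unlike dill, this drops anything pickle cannot serialize.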

2. Saving model files

2.1 .model files

2.1.1 Training the model

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics  # sklearn.cross_validation was removed in scikit-learn 0.20 and is not used here
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# -- Load the iris dataset
iris_dataset = load_iris()
# -- Prepare the data and split it into training and test sets
rawdata = pd.DataFrame(iris_dataset['data'], columns=['x0', 'x1', 'x2', 'x3'])
rawlabel = pd.DataFrame(iris_dataset['target'], columns=['label'])
dt_model = DecisionTreeClassifier()

train_X, test_X, train_y, test_y = train_test_split(rawdata,
                                                    rawlabel, test_size=0.3, random_state=0)
dt_model.fit(X=train_X, y=train_y)


print(metrics.classification_report(train_y,
                                    dt_model.predict(X=train_X)))

print(metrics.classification_report(test_y,
                                    dt_model.predict(X=test_X)))


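2.1.2 Saving and loading the model file

In newer versions of scikit-learn, import joblib directly. sklearn.externals.joblib was deprecated in scikit-learn 0.21 and removed in 0.23.

import joblib  # older scikit-learn: from sklearn.externals import joblib

# Save the model (the ./Code directory must already exist)
joblib.dump(dt_model, './Code/dt_model.model')
# Load the model
dt_model_load = joblib.load('./Code/dt_model.model')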

print(metrics.classification_report(test_y,
                                    dt_model_load.predict(X=test_X)))

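2.2 pickle files

  pickle also works for saving models. I have not checked whether it has any performance drawbacks.

import pickle

# Save the model; the with block closes the file, so no f.close() is needed
with open('dt_model.pickle', 'wb') as f:
    pickle.dump(dt_model, f)
# Load the model
with open('dt_model.pickle', 'rb') as f:  # Python 3: open(..., 'rb')
    x = pickle.load(f)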

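2.3 PMML files

  Models trained with sklearn can also be saved as PMML files, which apparently can be called directly from Java. I will come back to this when I need it and leave it out for now.

                                    2018-09-29, Zidong Startup Park (紫东创业园), Nanjing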
