python | nupic: a powerful Python library for time-series processing!

Article link: https://blog.csdn.net/csdn_xmj/article/details/139358822

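Source: the WeChat official account "python". Shared for academic purposes only; it will be taken down on request in case of infringement.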

Hello everyone! Today I'd like to share a powerful Python library: nupic.

GitHub: https://github.com/numenta/nupic-legacy

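With the rapid progress of AI and machine learning, neural networks and deep learning have become central to many applications. For some tasks, however, such as learning from real-time data streams and detecting anomalies in them, conventional neural-network approaches may be a poor fit. NuPIC (Numenta Platform for Intelligent Computing) is a machine-intelligence platform based on HTM (Hierarchical Temporal Memory) theory, which models how the brain's neocortex works; it is particularly good at processing time-series data and detecting anomalies. This article covers NuPIC's installation, main features, and basic and advanced usage to help you get started with the library.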

1 Installation

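NuPIC has to be installed before use; it is available from PyPI via pip. Note that NuPIC requires Python 2.7 and is no longer under active development (the repository is now nupic-legacy and in maintenance mode), so install it into a Python 2.7 environment.

Install it with: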

pip install nupic

After installation, verify it by importing the library:

import nupic
print("NuPIC installed successfully!")

2 Features

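Time-series processing: well suited to time-series data, supporting both prediction and anomaly detection.
Based on HTM theory: models how the neocortex works, learning continuously and adapting as the data changes.
Real-time processing: learns online from streaming data, one record at a time, which suits real-time anomaly detection.
Multi-platform: runs on several operating systems (pre-built packages targeted 64-bit Linux and macOS).
Flexible APIs: from the high-level OPF (Online Prediction Framework) used in this article down to the Network API and individual algorithms such as the spatial pooler and temporal memory, for customized development.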
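To make the HTM idea a little more concrete: the spatial pooler turns each encoded input into a sparse set of active columns. With global inhibition, it keeps only the columns with the highest overlap with the input; the configuration in section 4.1 keeps 40 of 2048 columns, about 2%. The plain-Python sketch below shows just that selection step on random overlap values. It is an illustration under simplifying assumptions (no boosting, stimulus threshold or learning), not NuPIC's implementation, and `k_winners` is a hypothetical helper name.

```python
import random


def k_winners(overlaps, k):
    """Return the sorted indices of the k columns with the highest overlap.

    Sketch of global inhibition only; ties are broken by lower column index.
    """
    ranked = sorted(range(len(overlaps)), key=lambda i: (-overlaps[i], i))
    return sorted(ranked[:k])


if __name__ == "__main__":
    rng = random.Random(1956)
    column_count, active_per_area = 2048, 40  # values from the section 4.1 config
    overlaps = [rng.randint(0, 30) for _ in range(column_count)]
    active = k_winners(overlaps, active_per_area)
    print("%d active columns, sparsity %.4f"
          % (len(active), len(active) / float(column_count)))
```

Whatever the input, the output always has exactly k active columns, which is what keeps HTM representations sparse.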

3 Basic Usage

3.1 Building a Time-Series Prediction Model

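NuPIC's OPF (Online Prediction Framework) makes it straightforward to build a time-series prediction model: you create a model from a parameter dictionary (modelConfig, shown in full in section 4.1), then feed it records one at a time, and it learns from every record it sees.

Here is a simple example: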

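from __future__ import print_function  # NuPIC runs on Python 2.7
import csv
import datetime

from nupic.frameworks.opf.model_factory import ModelFactory
from nupic.data.datasethelpers import findDataset

# modelConfig is the parameter dict defined in section 4.1;
# multi-step predictions require a multi-step inference type
modelConfig["modelParams"]["inferenceType"] = "TemporalMultiStep"
model = ModelFactory.create(modelConfig)
model.enableInference({"predictedField": "value"})  # field to predict

# Load the dataset (the "hotgym" gym energy-usage data bundled with NuPIC)
datasetPath = findDataset("extra/hotgym/rec-center-hourly.csv")

# Train the model: it learns online, one record at a time
with open(datasetPath, "r") as f:
    reader = csv.reader(f)
    for _ in range(3):  # skip the three header rows
        next(reader)
    for row in reader:
        timestamp = datetime.datetime.strptime(row[0], "%m/%d/%y %H:%M")
        model.run({"timestamp": timestamp, "value": float(row[1])})

print("Time-series prediction model built!")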
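NuPIC's bundled CSV files begin with three header rows (field names, field types, and special flags such as T for the timestamp field), and each data row has to become a dict whose keys match the encoders' `fieldname` values, with the timestamp parsed into a `datetime`. The minimal sketch below does that as a standalone generator, assuming the hotgym file's date format; `parse_records` is a hypothetical helper, and the two sample rows only illustrate the format (their values are made up).

```python
import csv
import datetime
import io

DATE_FORMAT = "%m/%d/%y %H:%M"  # assumed timestamp format of rec-center-hourly.csv


def parse_records(lines, header_rows=3):
    """Yield NuPIC-style input dicts from CSV lines, skipping the header rows."""
    reader = csv.reader(lines)
    for _ in range(header_rows):
        next(reader)
    for row in reader:
        if not row:  # ignore blank lines
            continue
        yield {
            "timestamp": datetime.datetime.strptime(row[0], DATE_FORMAT),
            "value": float(row[1]),
        }


SAMPLE = u"""timestamp,kw_energy_consumption
datetime,float
T,
7/2/10 0:00,21.2
7/2/10 1:00,16.4
"""

if __name__ == "__main__":
    for record in parse_records(io.StringIO(SAMPLE)):
        print(record)
```

Each yielded dict can be passed straight to `model.run()`.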

3.2 Making Predictions

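Once trained, the model can be used for prediction. Every call to model.run() returns a result whose inferences["multiStepBestPredictions"] is a dict keyed by the number of steps ahead, so [1] is the best prediction for the next record.

The following example shows how to read the predictions: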

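from __future__ import print_function
import csv
import datetime
from nupic.data.datasethelpers import findDataset

# Reuses `model` from section 3.1; disableLearning() is optional (inference only)
model.disableLearning()
datasetPath = findDataset("extra/hotgym/rec-center-hourly.csv")

# Make predictions
with open(datasetPath, "r") as f:
    reader = csv.reader(f)
    for _ in range(3):  # skip the three header rows
        next(reader)
    for row in reader:
        timestamp = datetime.datetime.strptime(row[0], "%m/%d/%y %H:%M")
        result = model.run({"timestamp": timestamp, "value": float(row[1])})
        # Best prediction for the next record (1 step ahead)
        print("Prediction:", result.inferences["multiStepBestPredictions"][1])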
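Each `multiStepBestPredictions[1]` value is a guess for the next record, so judging accuracy means comparing the prediction made at step t with the actual value at step t+1. The plain-Python sketch below shows that alignment and computes the mean absolute error; `one_step_mae` is a hypothetical helper, the sample numbers are illustrative, and skipping `None` entries is an assumption for records where no prediction is available yet.

```python
def one_step_mae(actuals, predictions):
    """Mean absolute error of one-step-ahead predictions.

    predictions[t] is the forecast made after seeing actuals[t], i.e. a guess
    for actuals[t + 1]. The last prediction has no target yet and is ignored,
    and None entries (no prediction available) are skipped.
    """
    errors = [
        abs(actual - predicted)
        for predicted, actual in zip(predictions[:-1], actuals[1:])
        if predicted is not None
    ]
    return sum(errors) / float(len(errors)) if errors else None


if __name__ == "__main__":
    actuals = [10.0, 12.0, 11.0, 13.0]
    predictions = [11.0, 12.0, 12.5, 14.0]
    print(one_step_mae(actuals, predictions))
```

Comparing each prediction with the value on the same row would reward a model that simply echoes its input, which is why the one-step shift matters.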

3.3 Anomaly Detection

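NuPIC can also score every record for anomalies. The raw anomaly score is the fraction of currently active columns that the model did not predict: 0 means the input was fully expected, 1 means it was completely unexpected. Expect high scores at the start, before the model has learned the patterns in the data.

Here is an example: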

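from __future__ import print_function
import csv
import datetime

from nupic.frameworks.opf.model_factory import ModelFactory
from nupic.data.datasethelpers import findDataset

# modelConfig is the dict from section 4.1; anomaly scores need "TemporalAnomaly"
modelConfig["modelParams"]["inferenceType"] = "TemporalAnomaly"
model = ModelFactory.create(modelConfig)
model.enableInference({"predictedField": "value"})

# Load the dataset
datasetPath = findDataset("extra/hotgym/rec-center-hourly.csv")

# Train the model and detect anomalies in the same pass
with open(datasetPath, "r") as f:
    reader = csv.reader(f)
    for _ in range(3):  # skip the three header rows
        next(reader)
    for row in reader:
        timestamp = datetime.datetime.strptime(row[0], "%m/%d/%y %H:%M")
        result = model.run({"timestamp": timestamp, "value": float(row[1])})
        anomalyScore = result.inferences["anomalyScore"]
        if anomalyScore > 0.8:  # raw score: 0 = fully predicted, 1 = unexpected
            print("Anomaly at", timestamp, "score:", anomalyScore)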
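The raw anomaly score compares the columns active at time t (A_t) with the columns the model predicted at t-1 (P_{t-1}): score = (|A_t| - |A_t ∩ P_{t-1}|) / |A_t|. NuPIC computes this internally (in its `nupic.algorithms.anomaly` module), so you would not normally write it yourself; the sketch below merely restates the formula in plain Python on small made-up column sets, and `raw_anomaly_score` is a hypothetical name.

```python
def raw_anomaly_score(active_columns, prev_predicted_columns):
    """Fraction of currently active columns that were not predicted one step earlier."""
    active = set(active_columns)
    if not active:
        return 0.0  # nothing active: treat as not anomalous
    predicted = set(prev_predicted_columns)
    return (len(active) - len(active & predicted)) / float(len(active))


if __name__ == "__main__":
    print(raw_anomaly_score([1, 2, 3, 4], [1, 2, 3, 4]))  # fully predicted
    print(raw_anomaly_score([1, 2, 3, 4], [1, 2, 9]))     # half unexpected
    print(raw_anomaly_score([1, 2, 3, 4], []))            # nothing predicted
```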

4 Advanced Usage

4.1 Custom Model Configuration

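NuPIC lets you customize the model configuration (encoders, spatial pooler, temporal memory, classifier and anomaly settings) to suit different data and tasks.

Here is an example: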

from __future__ import print_function  # NuPIC runs on Python 2.7
import csv
import datetime

from nupic.frameworks.opf.model_factory import ModelFactory
from nupic.data.datasethelpers import findDataset

# Custom model configuration. Parameter names follow NuPIC 1.0, where
# tmEnable/tmParams replaced tpEnable/tpParams and boostStrength replaced maxBoost
modelConfig = {
    "aggregationInfo": {"seconds": 0, "fields": [], "months": 0, "days": 0, "years": 0, "hours": 0, "microseconds": 0, "weeks": 0, "minutes": 0, "milliseconds": 0},
    "model": "HTMPrediction",
    "modelParams": {
        # TemporalAnomaly produces both multi-step predictions and anomaly scores
        "inferenceType": "TemporalAnomaly",
        "sensorParams": {
            "encoders": {
                # Date features: (width, radius) for day of week and time of day
                "timestamp_dayOfWeek": {"fieldname": "timestamp", "type": "DateEncoder", "dayOfWeek": (21, 1)},
                "timestamp_timeOfDay": {"fieldname": "timestamp", "type": "DateEncoder", "timeOfDay": (21, 1)},
                "timestamp_weekend": {"fieldname": "timestamp", "type": "DateEncoder", "weekend": 21},
                # The scalar field to learn and predict
                "value": {"fieldname": "value", "type": "RandomDistributedScalarEncoder", "resolution": 0.88}
            }
        },
        "spEnable": True,
        "spParams": {"spVerbosity": 0, "globalInhibition": 1, "columnCount": 2048, "inputWidth": 0, "numActiveColumnsPerInhArea": 40, "seed": 1956, "potentialPct": 0.8, "synPermInactiveDec": 0.005, "synPermActiveInc": 0.04, "synPermConnected": 0.1, "minPctOverlapDutyCycle": 0.001, "dutyCyclePeriod": 1000, "boostStrength": 0.0},
        "tmEnable": True,
        "tmParams": {"verbosity": 0, "columnCount": 2048, "cellsPerColumn": 32, "inputWidth": 2048, "seed": 1960, "temporalImp": "cpp", "newSynapseCount": 20, "maxSynapsesPerSegment": 32, "maxSegmentsPerCell": 128, "initialPerm": 0.21, "permanenceInc": 0.1, "permanenceDec": 0.1, "globalDecay": 0.0, "maxAge": 0, "minThreshold": 9, "activationThreshold": 12, "outputType": "normal", "pamLength": 1},
        "clEnable": True,
        "clParams": {"regionName": "SDRClassifierRegion", "verbosity": 0, "alpha": 0.0001, "steps": "1"},
        "anomalyParams": {"anomalyCacheRecords": None, "autoDetectThreshold": None, "autoDetectWaitRecords": 5030},
        "trainSPNetOnlyIfRequested": False
    }
}

# Create the model and choose the field to predict
model = ModelFactory.create(modelConfig)
model.enableInference({"predictedField": "value"})

# Load the dataset
datasetPath = findDataset("extra/hotgym/rec-center-hourly.csv")

# Train the model and print predictions as it learns
with open(datasetPath, "r") as f:
    reader = csv.reader(f)
    for _ in range(3):  # skip the three header rows
        next(reader)
    for row in reader:
        timestamp = datetime.datetime.strptime(row[0], "%m/%d/%y %H:%M")
        result = model.run({"timestamp": timestamp, "value": float(row[1])})
        print("Prediction:", result.inferences["multiStepBestPredictions"][1])
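The value encoder's `resolution` (0.88 in this configuration) should match the scale of your data: values closer together than the resolution get similar encodings. A common rule of thumb is to split the expected value range into roughly 130 buckets; that figure, like the 0.001 floor, is an assumption here (it mirrors the heuristic in NuPIC's anomaly-parameter templates as far as we know, so check it against your version). The sketch below applies it to a deep copy of a config dict so the original stays untouched; `with_value_resolution` is a hypothetical helper.

```python
import copy


def with_value_resolution(model_config, min_val, max_val,
                          num_buckets=130.0, min_resolution=0.001):
    """Return a deep copy of model_config whose "value" encoder resolution is
    (max_val - min_val) / num_buckets, but never below min_resolution.

    num_buckets=130 and min_resolution=0.001 are assumed defaults.
    """
    resolution = max(min_resolution, (max_val - min_val) / float(num_buckets))
    config = copy.deepcopy(model_config)  # the caller's dict stays unchanged
    encoders = config["modelParams"]["sensorParams"]["encoders"]
    encoders["value"]["resolution"] = resolution
    return config


if __name__ == "__main__":
    base = {"modelParams": {"sensorParams": {"encoders": {
        "value": {"fieldname": "value",
                  "type": "RandomDistributedScalarEncoder",
                  "resolution": 0.88}}}}}
    tuned = with_value_resolution(base, 0.0, 65.0)
    print(tuned["modelParams"]["sensorParams"]["encoders"]["value"]["resolution"])
```

You would call it before `ModelFactory.create`, with `min_val` and `max_val` taken from your own data.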

4.2 Real-Time Stream Processing

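NuPIC learns online, one record at a time, so it can process live data streams: every new reading updates the model and gets an anomaly score right away, which suits real-time anomaly detection.

Here is an example that simulates such a stream: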

from __future__ import print_function  # NuPIC runs on Python 2.7
import datetime
import random
import time

from nupic.frameworks.opf.model_factory import ModelFactory

# Model configuration: reuse the modelConfig dict from section 4.1,
# making sure anomaly scores are produced
modelConfig["modelParams"]["inferenceType"] = "TemporalAnomaly"

# Create the model
model = ModelFactory.create(modelConfig)
model.enableInference({"predictedField": "value"})

# Simulate a real-time data stream: one Gaussian reading (mean 10, std 1) per second
def stream_data():
    while True:
        value = random.gauss(10, 1)
        timestamp = datetime.datetime.now()  # DateEncoder needs a datetime, not a string
        yield {"timestamp": timestamp, "value": value}
        time.sleep(1)

# Process the stream: each record updates the model and gets an anomaly score
for data in stream_data():
    result = model.run(data)
    anomaly_score = result.inferences["anomalyScore"]
    # str.format instead of f-strings, which Python 2.7 does not support
    print("Time: {}, value: {:.3f}, anomaly score: {}".format(
        data["timestamp"].strftime("%Y-%m-%d %H:%M:%S"), data["value"], anomaly_score))
    if anomaly_score > 0.8:
        print("Anomaly detected!")
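A fixed threshold on the raw score (0.8 above) tends to fire often, especially before the model has learned anything. NuPIC's answer is the anomaly likelihood (`AnomalyLikelihood` in `nupic.algorithms.anomaly_likelihood`), which models the recent distribution of anomaly scores and reports how unusual the latest ones are. The sketch below is a deliberately simplified take on that idea, not NuPIC's implementation: it fits a normal distribution to a history of short moving averages of the raw score and returns the normal CDF of the latest average. The class name and all window sizes are assumptions.

```python
import math
from collections import deque


class SimpleAnomalyLikelihood(object):
    """Simplified take on the anomaly-likelihood idea (not NuPIC's code).

    Keeps a short moving average of raw anomaly scores, fits a normal
    distribution to the history of those averages, and returns the normal
    CDF of the latest average: close to 1 when recent scores are unusually
    high compared with what the model usually produces.
    """

    def __init__(self, short_window=10, history_size=500, learning_period=50):
        self.recent = deque(maxlen=short_window)     # latest raw scores
        self.averages = deque(maxlen=history_size)   # past moving averages
        self.learning_period = learning_period

    def update(self, raw_score):
        self.recent.append(raw_score)
        average = sum(self.recent) / float(len(self.recent))
        self.averages.append(average)
        if len(self.averages) < self.learning_period:
            return 0.5  # not enough history yet to say what "normal" is
        mean = sum(self.averages) / float(len(self.averages))
        variance = sum((a - mean) ** 2 for a in self.averages) / float(len(self.averages))
        std = max(math.sqrt(variance), 1e-3)  # floor so a flat history cannot divide by zero
        z = (average - mean) / std
        return 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z


if __name__ == "__main__":
    detector = SimpleAnomalyLikelihood()
    for _ in range(200):
        detector.update(0.1)          # a stretch of well-predicted input
    print(detector.update(1.0))       # a sudden, unexpected record
```

In the streaming loop you could call `update(anomaly_score)` for every record and alert only when the result exceeds a high threshold such as 0.99; that threshold, like the window sizes, is an assumption to tune on your own data.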

5 Summary

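NuPIC is a distinctive tool for time-series processing and anomaly detection, well suited to real-time data streams. It offers HTM-based time-series prediction (including multi-step prediction), anomaly detection and customizable model configurations. This article walked through installation, the main features, and basic and advanced usage; I hope it helps you get started with NuPIC and put it to work in your own projects.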

THE END !
