Predicting User Website Visit Behavior: A Simple NuPIC Demo

Jiede1

Categories: Machine Learning, Python

Article link: https://blog.csdn.net/jiede1/article/details/80773838

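This post uses NuPIC to predict the category of page a user will visit. It covers the data, how to run the script, a walkthrough of the code, and an analysis of the results.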

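Two earlier posts covered NuPIC's technical details, "Cortical Learning Algorithm: An In-Depth Study of NuPIC (Part 1)" and "Cortical Learning Algorithm: An In-Depth Study of NuPIC (Part 2)", but neither included a concrete example. This post uses NuPIC to predict the category of page a user will visit next, based on existing records of the page categories users have visited.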

1. Data description

The data and the code are both on GitHub and can be downloaded from there.
Each element of the list below is a page category:

PAGE_CATEGORIES = [
  "frontpage", "news", "tech", "local", "opinion", "on-air", "misc", "weather",
  "msn-news", "health", "living", "business", "msn-sports", "sports", "summary",
  "bbs", "travel"
]

Below is the data the script reads (msnbc990928.zip):

% Different categories found in input file:

frontpage news tech local opinion on-air misc weather msn-news health living business msn-sports sports summary bbs travel


% Sequences:

1 1 
2 
3 2 2 4 2 2 2 3 3 
5 
1 
6 
1 1 
6 
6 7 7 7 6 6 8 8 8 8 
6 9 4 4 4 10 3 10 5 10 4 4 4 
1 1 1 11 1 1 1 
12 12

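In the sequence data, each line records one user's clicks, and each number is the 1-based position of a category in the list above. In the first line, "1 1", the user clicked frontpage twice in a row; in the third line, "3 2 2 4 ...", the user clicked tech, then news twice, then local, and so on. The task is to predict a user's next click from the clicks observed so far.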
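As an illustration of this format, here is a minimal sketch that turns the raw lines into lists of category names. It is not taken from the author's script: the helper name `parse_sessions` is hypothetical, and it assumes the numbers are 1-based indices into `PAGE_CATEGORIES` in the order given by the file's header line.

```python
PAGE_CATEGORIES = [
    "frontpage", "news", "tech", "local", "opinion", "on-air", "misc", "weather",
    "msn-news", "health", "living", "business", "msn-sports", "sports", "summary",
    "bbs", "travel",
]


def parse_sessions(lines):
    """Yield one list of category names per user session.

    Hypothetical helper: assumes '%' comment lines, a header line of
    category names, blank lines, then one line of 1-based indices per user.
    """
    for line in lines:
        tokens = line.split()
        # Skip blank lines and any line that is not purely numeric
        # ('%' comments and the category-name header).
        if not tokens or not all(t.isdigit() for t in tokens):
            continue
        # Convert 1-based indices to category names.
        yield [PAGE_CATEGORIES[int(t) - 1] for t in tokens]


sample = [
    "% Sequences:",
    "",
    "1 1 ",
    "2 ",
    "3 2 2 4 2 2 2 3 3 ",
]
sessions = list(parse_sessions(sample))
print(sessions[0])  # ['frontpage', 'frontpage']
```

Each resulting list is one user's session, which is the unit the model is trained on.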

2. Running the algorithm

Download the source code from GitHub and run:

python webdata.py

The script performs every step on its own and prints the results to the console.

3. Code walkthrough

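The script has three parts: (1) configure the parameters of each component of the network; (2) read each user's data in turn and train the network; (3) use the trained network to make predictions.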
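To make the three-part flow concrete, here is a toy sketch that follows the same configure, train, predict shape. It does not use NuPIC at all: the class `FirstOrderPredictor` is a hypothetical stand-in that simply counts which category followed which, so it only illustrates the control flow, not how the HTM network learns.

```python
from collections import Counter, defaultdict


class FirstOrderPredictor(object):
    """Hypothetical stand-in for the encoder => TM => classifier pipeline."""

    def __init__(self):
        # transitions[current][following] = number of times `following`
        # was clicked immediately after `current`.
        self.transitions = defaultdict(Counter)

    def learn(self, session):
        # Count every consecutive pair of clicks within one user's session.
        for current, following in zip(session, session[1:]):
            self.transitions[current][following] += 1

    def predict(self, current):
        # Return the most frequent follower, or None if `current` is unseen.
        counts = self.transitions.get(current)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


# Part 1: configure the model (nothing to tune in this toy version).
model = FirstOrderPredictor()

# Part 2: train on one user's session at a time.
sessions = [
    ["tech", "news", "news", "local", "news", "news", "news", "tech", "tech"],
    ["on-air", "misc", "misc", "misc", "on-air", "on-air",
     "weather", "weather", "weather", "weather"],
]
for session in sessions:
    model.learn(session)

# Part 3: predict the next click given the current page.
print(model.predict("news"))
```

Counting is done per session so that the last click of one user is never treated as the predecessor of the next user's first click.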
Each part is described in turn below:
(1) Configure the parameters of each component of the network

# List of page categories used in the dataset
PAGE_CATEGORIES = [
  "frontpage", "news", "tech", "local", "opinion", "on-air", "misc", "weather",
  "msn-news", "health", "living", "business", "msn-sports", "sports", "summary",
  "bbs", "travel"
]

# Encoder configuration; this demo uses SDRCategoryEncoder
# Configure the sensor/input region using the "SDRCategoryEncoder" to encode
# the page category into SDRs suitable for processing directly by the TM
SENSOR_PARAMS = {
  "verbosity": 0,
  "encoders": {
    "page": {
      "fieldname": "page",
      "name": "page",
      "type": "SDRCategoryEncoder",
      # The output of this encoder will be passed directly to the TM region,
      # therefore the number of bits should match TM's "inputWidth" parameter
      "n": 1024,
      # Use ~2% sparsity
      "w": 21
    },
  },
}

# Temporal memory (TM) configuration; this is the component that lets the network learn
# Configure the temporal memory to learn a sequence of page SDRs and make
# predictions on the next page of the sequence.
TM_PARAMS = {
  "seed": 1960,
  # Use "nupic.bindings.algorithms.TemporalMemoryCPP" algorithm
  "temporalImp": "tm_cpp",
  # Should match the encoder output
  "inputWidth": 1024,
  "columnCount": 1024,
  # Use 1 cell per column for first order prediction.
  # Use more cells per column for variable order predictions.
  "cellsPerColumn": 1,
}

# Classifier configuration; this lets the network output the predicted page category
# Configure the output region with a classifier used to decode TM SDRs back
# into pages
CL_PARAMS = {
  "implementation": "cpp",
  "regionName": "SDRClassifierRegion",
  # alpha parameter controls how fast the classifier learns/forgets. Higher
  # values make it adapt faster and forget older patterns faster.
  "alpha": 0.001,
  "steps": 1,
}

# Combine all the parameters above into a complete model
# Data flow: page => [encoder] => [TM] => [classifier] => prediction
# Create a simple HTM network that will receive the current page as input, pass
# the encoded pag