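Two earlier posts covered the technical details of NuPIC: "Cortical Learning Algorithms: NuPIC In Depth (1)" and "Cortical Learning Algorithms: NuPIC In Depth (2)". Neither included a concrete example of using NuPIC. This post applies NuPIC to existing data on the categories of pages users visited, and predicts the category of the page a user will visit next.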
1. Data
The data and code are both available for download on GitHub.
Each element of the list below is a page category:
PAGE_CATEGORIES = [
    "frontpage", "news", "tech", "local", "opinion", "on-air", "misc", "weather",
    "msn-news", "health", "living", "business", "msn-sports", "sports", "summary",
    "bbs", "travel"
]
The data the algorithm reads (msnbc990928.zip) looks like this:
% Different categories found in input file:
frontpage news tech local opinion on-air misc weather msn-news health living business msn-sports sports summary bbs travel
% Sequences:
1 1
2
3 2 2 4 2 2 2 3 3
5
1
6
1 1
6
6 7 7 7 6 6 8 8 8 8
6 9 4 4 4 10 3 10 5 10 4 4 4
1 1 1 11 1 1 1
12 12
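In the sequence data, each line records one user's clicks, and each number is the 1-based index of a category in the list above (1 is frontpage, 2 is news, and so on). In the first line, for example, the user viewed frontpage twice in a row; in the third line, the user viewed tech, then news twice, then local, and so on. The algorithm's job is to predict a user's next click from the click behavior observed so far.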
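To make the numbering concrete, here is a minimal sketch of decoding the sequence lines into category names. The function name `parse_sessions` and the rule for skipping header lines are assumptions made for illustration; they are not taken from webdata.py.

```python
# Category names in the order they are numbered in the data file (1-based).
PAGE_CATEGORIES = [
    "frontpage", "news", "tech", "local", "opinion", "on-air", "misc", "weather",
    "msn-news", "health", "living", "business", "msn-sports", "sports", "summary",
    "bbs", "travel"
]


def parse_sessions(lines):
    """Yield one list of category names per user session.

    Hypothetical helper: lines that are empty or contain anything other
    than integers (the '%' headers, the category-name line) are skipped.
    """
    for line in lines:
        tokens = line.split()
        if not tokens or not all(t.isdigit() for t in tokens):
            continue
        # IDs in the file are 1-based; Python lists are 0-based.
        yield [PAGE_CATEGORIES[int(t) - 1] for t in tokens]


sample = [
    "% Sequences:",
    "1 1",
    "2",
    "3 2 2 4 2 2 2 3 3",
]
sessions = list(parse_sessions(sample))
for session in sessions:
    print(" -> ".join(session))
```

Decoding up front like this keeps the rest of a pipeline working with readable category names rather than raw integers.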
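2. Running the algorithm
Download the source code from GitHub and run:
python webdata.py
The script carries out every step and prints the results to the console.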
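3. Code walkthrough
The code has three parts: 1. configure the parameters of each network component; 2. read each user's data in turn and train the algorithm; 3. use the algorithm to predict.
Each step is described in turn below.
(1) Configure the parameters of each network component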
# Page categories
# List of page categories used in the dataset
PAGE_CATEGORIES = [
    "frontpage", "news", "tech", "local", "opinion", "on-air", "misc", "weather",
    "msn-news", "health", "living", "business", "msn-sports", "sports", "summary",
    "bbs", "travel"
]
# Configure the encoder; SDRCategoryEncoder is used here
# Configure the sensor/input region using the "SDRCategoryEncoder" to encode
# the page category into SDRs suitable for processing directly by the TM
SENSOR_PARAMS = {
    "verbosity": 0,
    "encoders": {
        "page": {
            "fieldname": "page",
            "name": "page",
            "type": "SDRCategoryEncoder",
            # The output of this encoder will be passed directly to the TM region,
            # therefore the number of bits should match TM's "inputWidth" parameter
            "n": 1024,
            # Use ~2% sparsity
            "w": 21
        },
    },
}
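The encoder settings above turn each category into a 1024-bit vector with 21 active bits. The sketch below shows what that means using a toy stand-in that picks 21 random bit positions per category. `toy_category_sdrs` is a hypothetical helper for illustration only; it is not NuPIC's SDRCategoryEncoder and does not reproduce how that encoder chooses bits.

```python
import random

N_BITS = 1024   # total bits, same value as "n" in SENSOR_PARAMS
W_ACTIVE = 21   # active bits, same value as "w" in SENSOR_PARAMS


def toy_category_sdrs(categories, n=N_BITS, w=W_ACTIVE, seed=42):
    """Map each category to a fixed random set of w active bit indices.

    Hypothetical stand-in for illustration; not NuPIC's SDRCategoryEncoder.
    """
    rng = random.Random(seed)
    return {c: frozenset(rng.sample(range(n), w)) for c in categories}


sdrs = toy_category_sdrs(["frontpage", "news", "tech"])
sparsity = float(W_ACTIVE) / N_BITS
overlap = len(sdrs["frontpage"] & sdrs["news"])
print("sparsity: %.4f" % sparsity)
print("bits shared by frontpage and news: %d" % overlap)
```

With so few active bits out of 1024, two unrelated categories tend to share very few positions, which is what lets the downstream region tell them apart.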
# Configure the temporal memory (TM) component, which gives the algorithm its ability to learn
# Configure the temporal memory to learn a sequence of page SDRs and make
# predictions on the next page of the sequence.
TM_PARAMS = {
    "seed": 1960,
    # Use "nupic.bindings.algorithms.TemporalMemoryCPP" algorithm
    "temporalImp": "tm_cpp",
    # Should match the encoder output
    "inputWidth": 1024,
    "columnCount": 1024,
    # Use 1 cell per column for first order prediction.
    # Use more cells per column for variable order predictions.
    "cellsPerColumn": 1,
}
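"First order prediction" in the comment above means the prediction depends only on the current page, not on the pages before it. The sketch below illustrates that idea with a plain transition-count table. It is a conventional baseline swapped in for explanation, not the temporal memory algorithm, and the names `train_first_order` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_first_order(sessions):
    """Count, for every page, which page followed it."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_page, next_page in zip(session, session[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent successor of page, or None if unseen."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]


# Three sessions from the sample data, already decoded to category names.
sessions = [
    ["frontpage", "frontpage"],
    ["tech", "news", "news", "local", "news", "news", "news", "tech", "tech"],
    ["on-air", "misc", "misc", "misc", "on-air", "on-air",
     "weather", "weather", "weather", "weather"],
]
model = train_first_order(sessions)
print(predict_next(model, "news"))
print(predict_next(model, "travel"))
```

A first-order model cannot tell "news after tech" apart from "news after local"; that is the limitation that using more cells per column is meant to lift.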
# Configure the classifier component, which lets the algorithm output the predicted page category
# Configure the output region with a classifier used to decode TM SDRs back
# into pages
CL_PARAMS = {
    "implementation": "cpp",
    "regionName": "SDRClassifierRegion",
    # alpha parameter controls how fast the classifier learns/forgets. Higher
    # values make it adapt faster and forget older patterns faster.
    "alpha": 0.001,
    "steps": 1,
}
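The effect of a learning rate such as alpha can be seen in a generic running estimate that moves a fraction alpha toward each new observation. This is only an illustration of how a learning rate trades adaptation speed against memory; it is not the SDR classifier's actual update rule, and `running_estimate` is a hypothetical helper.

```python
def running_estimate(observations, alpha, start=0.0):
    """Move the estimate a fraction alpha toward each new observation."""
    estimate = start
    for value in observations:
        estimate += alpha * (value - estimate)
    return estimate


# The observed value jumps from 0 to 1 and stays there for 100 steps.
observations = [1.0] * 100
slow = running_estimate(observations, alpha=0.001)
fast = running_estimate(observations, alpha=0.1)
print("alpha=0.001 -> %.3f" % slow)
print("alpha=0.1   -> %.3f" % fast)
```

The larger alpha gets much closer to the new value within the same number of steps, at the cost of forgetting the old value sooner.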
# Combine all the parameters into a complete model
# The order is: page => [encoder] => [TM] => [classifier] => prediction
# Create a simple HTM network that will receive the current page as input, pass
# the encoded pag