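First version (unsuccessful): applying a Markov chain prediction model, with personal book borrowing as the example
Reading the personal book-borrowing data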
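## Book categories: A Marxism, Leninism, Mao Zedong Thought, Deng Xiaoping Theory; B Philosophy, religion; C Social sciences (general); D Politics, law; E Military science; F Economics; G Culture, science, education, sports;
## H Language, linguistics; I Literature; J Art; K History, geography; N Natural sciences (general); O Mathematical sciences and chemistry; P Astronomy, earth sciences; Q Biological sciences; R Medicine, health; S Agricultural sciences;
## T Industrial technology; U Transportation; V Aviation, aerospace; X Environmental science, safety science; Z General works.
import pandas as pd
# Show all rows of the DataFrame
pd.set_option('display.max_rows', None)
# Show all columns
pd.set_option('display.max_columns', None)
# Book categories (note: 'P' appears in the category list above but is missing from this list)
Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# Read the data
Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
print(Person_data)
Output: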
LOAN_DATE ITEM_CALLNO TIMESTAMP
0 2013/1/1 H 1356969600
1 2013/1/1 H 1356969600
2 2013/1/1 H 1356969600
3 2013/1/1 H 1356969600
4 2013/1/1 H 1356969600
5 2013/2/20 D 1361289600
6 2013/4/19 F 1366300800
7 2013/4/19 F 1366300800
8 2013/4/19 F 1366300800
9 2013/4/24 D 1366732800
10 2013/4/24 E 1366732800
11 2013/4/24 E 1366732800
12 2013/4/24 D 1366732800
13 2013/4/24 D 1366732800
14 2013/4/24 D 1366732800
15 2013/4/24 F 1366732800
16 2013/4/27 F 1366992000
17 2013/5/8 H 1367942400
18 2013/5/15 B 1368547200
19 2013/5/15 B 1368547200
20 2013/5/17 B 1368720000
21 2013/6/6 C 1370448000
22 2013/6/8 D 1370620800
23 2013/6/8 D 1370620800
24 2013/10/9 I 1381248000
25 2013/10/29 F 1382976000
26 2013/10/29 I 1382976000
27 2013/11/7 F 1383753600
28 2013/11/7 D 1383753600
29 2013/11/14 I 1384358400
30 2013/11/14 F 1384358400
31 2013/11/14 F 1384358400
32 2013/11/25 D 1385308800
33 2013/12/2 F 1385913600
34 2013/12/2 F 1385913600
35 2013/12/4 H 1386086400
36 2013/12/6 F 1386259200
37 2013/12/11 F 1386691200
38 2013/12/11 F 1386691200
39 2013/12/18 F 1387296000
40 2013/12/18 D 1387296000
41 2014/1/1 F 1388505600
42 2014/1/2 D 1388592000
43 2014/2/17 D 1392566400
44 2014/2/17 D 1392566400
45 2014/3/13 K 1394640000
46 2014/3/13 K 1394640000
47 2014/4/15 D 1397491200
48 2014/4/29 I 1398700800
49 2014/4/29 I 1398700800
50 2014/4/29 I 1398700800
51 2014/5/5 D 1399219200
52 2014/9/23 H 1411401600
53 2014/10/10 D 1412870400
54 2014/10/10 D 1412870400
55 2014/10/10 D 1412870400
56 2014/10/10 D 1412870400
57 2014/10/10 D 1412870400
58 2014/10/10 D 1412870400
59 2014/10/10 D 1412870400
60 2014/10/10 D 1412870400
61 2014/10/10 D 1412870400
62 2014/10/10 D 1412870400
63 2014/12/2 F 1417449600
64 2014/12/2 K 1417449600
65 2014/12/2 D 1417449600
66 2014/12/4 D 1417622400
67 2014/12/4 D 1417622400
68 2015/1/14 D 1421164800
69 2015/2/28 D 1425052800
70 2015/2/28 D 1425052800
71 2015/3/5 D 1425484800
72 2015/3/5 D 1425484800
73 2015/3/5 D 1425484800
74 2015/3/5 D 1425484800
75 2015/3/11 D 1426003200
76 2015/3/11 D 1426003200
77 2015/3/11 D 1426003200
78 2015/3/12 D 1426089600
79 2015/3/12 D 1426089600
80 2015/3/12 D 1426089600
81 2015/3/12 D 1426089600
82 2015/3/12 D 1426089600
83 2015/3/19 J 1426694400
84 2015/3/19 D 1426694400
Converting the two columns into a dictionary
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']
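t_n = len(time_number)  # number of distinct borrowing dates (sessions)
N = len(ITEM_CALLNO)  # sample size (number of borrowed books)
Book_data_dic = {}
i, j = 0, 0
group = []  # categories borrowed on the current date
# Each iteration either consumes one record (i) or moves on to the next date (j),
# so t_n + N - 1 iterations cover every record.
for I in range(t_n + N - 1):
    if LOAN_DATE[i] == time_number[j]:
        group.append(ITEM_CALLNO[i])
        i += 1
    else:
        j += 1
        group = []
    Book_data_dic[j + 1] = group
print(Book_data_dic)
Output: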
{1: ['H', 'H', 'H', 'H', 'H'], 2: ['D'], 3: ['F', 'F', 'F'], 4: ['D', 'E', 'E', 'D', 'D', 'D', 'F'], 5: ['F'], 6: ['H'], 7: ['B', 'B'], 8: ['B'], 9: ['C'], 10: ['D', 'D'], 11: ['I'], 12: ['F', 'I'], 13: ['F', 'D'], 14: ['I', 'F', 'F'], 15: ['D'], 16: ['F', 'F'], 17: ['H'], 18: ['F'], 19: ['F', 'F'], 20: ['F', 'D'], 21: ['F'], 22: ['D'], 23: ['D', 'D'], 24: ['K', 'K'], 25: ['D'], 26: ['I', 'I', 'I'], 27: ['D'], 28: ['H'], 29: ['D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D'], 30: ['F', 'K', 'D'], 31: ['D', 'D'], 32: ['D'], 33: ['D', 'D'], 34: ['D', 'D', 'D', 'D'], 35: ['D', 'D', 'D'], 36: ['D', 'D', 'D', 'D', 'D'], 37: ['J', 'D']}
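The same grouping can be sketched more compactly with `itertools.groupby`. This is only an illustration under the assumption that the records are already sorted by date, as they are in the lists above; the function name `group_by_date` and the short sample lists are made up for the example.

```python
from itertools import groupby

def group_by_date(dates, categories):
    """Group categories into sessions: one session per run of equal dates."""
    sessions = {}
    pairs = zip(dates, categories)
    # groupby only merges consecutive equal keys, hence the sorted-by-date assumption
    for k, (_, run) in enumerate(groupby(pairs, key=lambda p: p[0]), start=1):
        sessions[k] = [category for _, category in run]
    return sessions

# Hypothetical miniature input in the same shape as LOAN_DATE / ITEM_CALLNO
dates = ['2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19']
cats = ['H', 'H', 'D', 'F', 'F']
print(group_by_date(dates, cats))
```

The session numbers start at 1, matching the keys of `Book_data_dic`.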
Frequency matrix and transition probability matrix for step size 1
import pandas as pd
import numpy as np
# Maximum display width in characters; the default of 80 is too narrow for wide screens, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns
pd.set_option('display.max_columns', None)
# Book categories (note: 'P' is missing from this list)
Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']
N = len(ITEM_CALLNO)  # sample size
t_n = len(time_number)  # number of borrowing sessions
B_N = len(Book_category)
# Convert the lists into a dictionary
def Book_list_to_dic():
    Book_data_dic = {}
    i, j = 0, 0
    group = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            group.append(ITEM_CALLNO[i])
            i += 1
        else:
            j += 1
            group = []
        Book_data_dic[j + 1] = group
    return Book_data_dic
Book_data_dic = Book_list_to_dic()
n_list = [1]  # step sizes n to evaluate, e.g. [1, 2, 3, 4, 5]
for n in n_list:
    print(f'Step size {n}:')
    f_array = np.zeros((B_N, B_N))  # frequency matrix
    for I in range(t_n - n):
        for a in Book_data_dic[I + 1]:
            i = 0  # row index: position of category a in Book_category
            for a1 in Book_category:
                if a1 == a:
                    # Pair a with every category borrowed n sessions later
                    for b in Book_data_dic[I + 1 + n]:
                        j = 0  # column index: position of category b
                        for b1 in Book_category:
                            if b1 == b:
                                f_array[i][j] += 1
                            else:
                                j += 1
                else:
                    i += 1
    print(f'Frequency matrix f for step size {n}:\n', f_array)
    n_sum_f = f_array.sum()  # total frequency
    print('Total frequency:', n_sum_f)
    # The raw matrix is hard to read, so also show it as a DataFrame
    df_f = pd.DataFrame(f_array)
    print(df_f)
    P = np.zeros((B_N, B_N))  # transition probability matrix
    for i in range(B_N):
        f_sum_i = sum(f_array[i, :])  # sum of row i of the frequency matrix
        if f_sum_i == 0:
            P[i][i] = 1  # a row that is entirely zero is replaced by a self-transition
        else:
            for j in range(B_N):
                P[i][j] = f_array[i][j] / f_sum_i
    print(f'Transition probability matrix P_{n} for step size {n}:\n', P, '\n')
    # The raw matrix is hard to read, so also show it as a DataFrame
    df_P = pd.DataFrame(P)
    print(df_P)
Output:
Step size 1:
Frequency matrix f for step size 1:
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 2. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 2. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 58. 0. 22. 0. 1. 6. 5. 14. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 2. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 20. 6. 12. 0. 3. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 2. 0. 15. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 5. 0. 2. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 4. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0.]]
Total frequency: 185.0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 2.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 58.0 0.0 22.0 0.0 1.0 6.0 5.0 14.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 20.0 6.0 12.0 0.0 3.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
7 0.0 2.0 0.0 15.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 5.0 0.0 2.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
10 0.0 0.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
11 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
12 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
13 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
14 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
16 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
17 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
18 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
19 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
20 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Transition probability matrix P_1 for step size 1:
[[1. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0.66666667 0.33333333 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 1. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0.54716981 0. 0.20754717
0. 0.00943396 0.05660377 0.04716981 0.13207547 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0. 0. 1.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0.47619048 0.14285714 0.28571429
0. 0.07142857 0.02380952 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0. 0. 0.
1. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0.11111111 0. 0.83333333 0. 0.05555556
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0.625 0. 0.25
0. 0. 0.125 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 1. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 1. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 1.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
1. 0. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 1. 0. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 1. 0. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 1. 0. 0.
0. 0. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 1. 0.
0. 0. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 1.
0. 0. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
1. 0. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 1. 0. ]
[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 1. ]]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0 1.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.666667 0.333333 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.000000 0.000000 1.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.000000 0.000000 0.547170 0.000000 0.207547 0.0 0.009434 0.056604 0.04717 0.132075 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.000000 0.000000 0.000000 0.000000 1.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 0.0 0.000000 0.000000 0.476190 0.142857 0.285714 0.0 0.071429 0.023810 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
6 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 1.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
7 0.0 0.111111 0.000000 0.833333 0.000000 0.055556 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
8 0.0 0.000000 0.000000 0.625000 0.000000 0.250000 0.0 0.000000 0.125000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
9 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 1.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
10 0.0 0.000000 0.000000 1.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
11 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
12 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
13 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
14 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
15 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
16 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
17 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
18 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
19 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
20 0.0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
The frequency matrix tallied by hand is as follows:
It agrees with the output above, which indicates that the program is correct.
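As an independent cross-check of the nested loops, the counting rule can be written as a short pure-Python sketch: every category in one session is paired with every category in the following session. The helper names `transition_counts` and `row_normalize` and the three-session sample are illustrative assumptions, not part of the original script.

```python
from itertools import product

def transition_counts(sessions, states):
    """Count (a, b) pairs between each session and the one that follows it."""
    index = {s: k for k, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for current, following in zip(sessions, sessions[1:]):
        for a, b in product(current, following):
            counts[index[a]][index[b]] += 1
    return counts

def row_normalize(counts):
    """Turn counts into row-stochastic probabilities."""
    probs = []
    for k, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Same convention as the script: an all-zero row becomes a self-transition
            probs.append([1.0 if j == k else 0.0 for j in range(len(row))])
        else:
            probs.append([c / total for c in row])
    return probs

# Hypothetical miniature history: three sessions, three states
sessions = [['H', 'H'], ['D'], ['F', 'D']]
states = ['D', 'F', 'H']
counts = transition_counts(sessions, states)
probs = row_normalize(counts)
print(counts)
print(probs)
```

Passing `list(Book_data_dic.values())` and `Book_category` to `transition_counts` should reproduce the frequency matrix printed above, provided the dictionary keys were inserted in session order.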
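Solution steps
The calculation uses ID 8748847336 as the example. The database records 37 borrowing sessions (distinct borrowing dates) for this person; the first 34 sessions serve as the training set and the last 3 as the test set.
1. Grouping the historical data
The grouping follows the conventional library classification:
Book categories:
A Marxism, Leninism, Mao Zedong Thought, Deng Xiaoping Theory;
B Philosophy, religion;
C Social sciences (general);
D Politics, law;
E Military science;
F Economics;
G Culture, science, education, sports;
H Language, linguistics;
I Literature;
J Art;
K History, geography;
N Natural sciences (general);
O Mathematical sciences and chemistry;
P Astronomy, earth sciences;
Q Biological sciences;
R Medicine, health;
S Agricultural sciences;
T Industrial technology;
U Transportation;
V Aviation, aerospace;
X Environmental science, safety science;
Z General works.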
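1. Grouping the historical data: the improvement in version 2.0
The book categories span a very wide range of disciplines, and a student in one field rarely borrows books from a distant field; a humanities student, for example, seldom borrows engineering textbooks. The state space should therefore not be the full list of book categories but only the categories this person has actually borrowed.
For example:
# Record of the categories of the borrowed books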
ITEM_CALLNO = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']
# State categories: keep only the categories that appear in the user's borrowing data; categories that never appear are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate and sort the states (without modifying the input list)
    news_items = sorted(set(ITEM_CALLNO))
    print('Borrowing state categories:', news_items)
    return news_items
Book_Category(ITEM_CALLNO)
Output:
Borrowing state categories: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'J', 'K']
2. Testing the Markov property
The first 34 sessions are used as the training set and the last 3 sessions as the test set.
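The split itself can be sketched as a small helper that keeps the sessions in order and cuts off the last few. The name `split_sessions` and the toy dictionary are assumptions for illustration; the script below achieves the same effect by popping the last keys from the dictionary.

```python
def split_sessions(sessions, n_test):
    """Split a {session_number: categories} dict into training and test parts."""
    keys = sorted(sessions)
    cut = len(keys) - n_test  # index of the first test session
    train = {k: sessions[k] for k in keys[:cut]}
    test = {k: sessions[k] for k in keys[cut:]}
    return train, test

# Hypothetical miniature history with five sessions, the last two held out
toy = {1: ['H'], 2: ['D'], 3: ['F', 'F'], 4: ['D'], 5: ['J', 'D']}
train, test = split_sessions(toy, 2)
print(train)
print(test)
```

With 37 sessions and `n_test = 3`, this would leave sessions 1 to 34 for training and 35 to 37 for testing.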
import pandas as pd
import numpy as np
# Maximum display width in characters; the default of 80 is too narrow for wide screens, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns
pd.set_option('display.max_columns', None)
# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']
# Convert the lists into a dictionary
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    group = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            group.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            group = []
        Book_data_dic[j + 1] = group
    # Drop the last test_n sessions, which form the test set
    for i in range(test_n):
        Book_data_dic.pop(t_n - i)
    return Book_data_dic
# Borrowing data of the training set
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n - test_n
    for i in range(n):
        for item in Book_data_dic[i + 1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO
# State categories: keep only the categories that appear in the user's borrowing data; categories that never appear are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate and sort the states (without modifying the input list)
    news_items = sorted(set(ITEM_CALLNO))
    print('Borrowing state categories:', news_items)
    return news_items
N = len(ITEM_CALLNO_data)  # sample size
t_n = len(time_number)  # number of borrowing sessions
test_n = 3  # number of test sessions
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)
f_array = np.zeros((B_N, B_N))  # frequency matrix
for I in range(t_n - test_n - 1):
    for a in Book_data_dic[I + 1]:
        i = 0  # row index: position of category a in Book_category
        for a1 in Book_category:
            if a1 == a:
                for b in Book_data_dic[I + 2]:
                    j = 0  # column index: position of category b
                    for b1 in Book_category:
                        if b1 == b:
                            f_array[i][j] += 1
                        else:
                            j += 1
            else:
                i += 1
print('Frequency matrix f for step size 1:\n', f_array)
n_sum_f = f_array.sum()  # total frequency
print('Total frequency:', n_sum_f)
# The raw matrix is hard to read, so also show it as a DataFrame
df_f = pd.DataFrame(f_array)
print(df_f)
P = np.zeros((B_N, B_N))  # transition probability matrix
P_j_list = []  # list of marginal probabilities
for i in range(B_N):
    f_sum_i = sum(f_array[i, :])  # sum of row i of the frequency matrix
    # Note: this marginal probability is built from the row sum of f; the usual
    # definition of p_.j in the test statistic uses the column sum instead.
    P_j_list.append(f_sum_i / n_sum_f)
    if f_sum_i == 0:
        P[i][i] = 1  # a row that is entirely zero is replaced by a self-transition
    else:
        for j in range(B_N):
            P[i][j] = f_array[i][j] / f_sum_i
print('Transition probability matrix P_1 for step size 1:\n', P, '\n')
# The raw matrix is hard to read, so also show it as a DataFrame
df_P = pd.DataFrame(P)
print(df_P)
P_j = np.array(P_j_list)  # marginal probability vector
print(P_j)
# Markov property test
x = 0
for i in range(B_N):
    for j in range(B_N):
        # Terms with zero frequency or zero marginal probability contribute nothing
        if f_array[i][j] != 0 and P_j[j] != 0:
            x += f_array[i][j] * abs(np.log(P[i][j] / P_j[j]))
X_2 = 2 * x
print('Chi-square statistic X^2:', X_2)
Output:
Borrowing state categories: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
Frequency matrix f for step size 1:
[[ 2. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 2. 0. 0. 0. 0. 0.]
[ 0. 0. 26. 0. 22. 1. 6. 14.]
[ 0. 0. 0. 0. 2. 0. 0. 0.]
[ 0. 0. 20. 6. 12. 3. 1. 0.]
[ 2. 0. 15. 0. 1. 0. 0. 0.]
[ 0. 0. 5. 0. 2. 0. 1. 0.]
[ 0. 0. 4. 0. 0. 0. 0. 0.]]
Total frequency: 148.0
0 1 2 3 4 5 6 7
0 2.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 26.0 0.0 22.0 1.0 6.0 14.0
3 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0
4 0.0 0.0 20.0 6.0 12.0 3.0 1.0 0.0
5 2.0 0.0 15.0 0.0 1.0 0.0 0.0 0.0
6 0.0 0.0 5.0 0.0 2.0 0.0 1.0 0.0
7 0.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0
Transition probability matrix P_1 for step size 1:
[[0.66666667 0.33333333 0. 0. 0. 0.
0. 0. ]
[0. 0. 1. 0. 0. 0.
0. 0. ]
[0. 0. 0.37681159 0. 0.31884058 0.01449275
0.08695652 0.20289855]
[0. 0. 0. 0. 1. 0.
0. 0. ]
[0. 0. 0.47619048 0.14285714 0.28571429 0.07142857
0.02380952 0. ]
[0.11111111 0. 0.83333333 0. 0.05555556 0.
0. 0. ]
[0. 0. 0.625 0. 0.25 0.
0.125 0. ]
[0. 0. 1. 0. 0. 0.
0. 0. ]]
0 1 2 3 4 5 6 7
0 0.666667 0.333333 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
1 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2 0.000000 0.000000 0.376812 0.000000 0.318841 0.014493 0.086957 0.202899
3 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000
4 0.000000 0.000000 0.476190 0.142857 0.285714 0.071429 0.023810 0.000000
5 0.111111 0.000000 0.833333 0.000000 0.055556 0.000000 0.000000 0.000000
6 0.000000 0.000000 0.625000 0.000000 0.250000 0.000000 0.125000 0.000000
7 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[0.02027027 0.01351351 0.46621622 0.01351351 0.28378378 0.12162162
0.05405405 0.02702703]
Chi-square statistic X^2: 183.92468307288394
(Statistical test of the Markov property)
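For reference, a common textbook form of this test statistic is written out below. The symbols are labels chosen here for illustration, since the post does not spell the formula out: f_ij is the transition frequency from state i to state j, p_ij the transition probability, p_·j the marginal probability of state j, and m the number of states.

```latex
\chi^2 = 2 \sum_{i=1}^{m} \sum_{j=1}^{m} f_{ij} \left| \ln \frac{p_{ij}}{p_{\cdot j}} \right|,
\qquad
p_{ij} = \frac{f_{ij}}{\sum_{k=1}^{m} f_{ik}},
\qquad
p_{\cdot j} = \frac{\sum_{i=1}^{m} f_{ij}}{\sum_{i=1}^{m} \sum_{k=1}^{m} f_{ik}}
```

For a sufficiently long sequence the statistic is compared against a chi-square distribution with (m-1)^2 degrees of freedom, and the sequence is treated as having the Markov property when the statistic exceeds the critical value at the chosen significance level.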
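Take the significance level α = 0.05 and degrees of freedom (m-1)² = (9-1)² = 64. The chi-square table gives the critical value χ²_0.05(64) = 83.675. (The training set above actually contains m = 8 states, which would give (8-1)² = 49 degrees of freedom and a smaller critical value, so the conclusion does not change.)
Since the statistic computed here, X² = 183.92, exceeds χ²_0.05(64), the state sequence of book borrowing satisfies the Markov property and can be treated as a Markov chain.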
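When no chi-square table is at hand, the critical value can be approximated with the standard library alone. The sketch below uses the Wilson-Hilferty approximation, which is a substitute for the table lookup and not what the post itself does; the function name `chi2_critical_approx` is made up for the example. The loop also tries 49 = (8-1)² degrees of freedom, the value implied by the 8 states in the training output.

```python
from statistics import NormalDist

def chi2_critical_approx(df, alpha=0.05):
    """Approximate upper-alpha chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1 - alpha)  # standard normal quantile
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3

X_2 = 183.92468307288394  # statistic reported in the output above
for df in (64, 49):
    crit = chi2_critical_approx(df)
    print(f'df={df}: approximate critical value {crit:.2f}, X^2 exceeds it: {X_2 > crit}')
```

The approximation is close to the tabulated 83.675 for 64 degrees of freedom, so it is adequate for a comparison as clear-cut as this one; for borderline cases an exact quantile function would be the safer choice.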
Improving the DataFrame output of the program above
import pandas as pd
import numpy as np
# Maximum display width in characters; the default of 80 is too narrow for wide screens, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns
pd.set_option('display.max_columns', None)
# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']
# Convert the lists into a dictionary
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    group = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            group.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            group = []
        Book_data_dic[j + 1] = group
    # Drop the last test_n sessions, which form the test set
    for i in range(test_n):
        Book_data_dic.pop(t_n - i)
    return Book_data_dic
# Borrowing data of the training set
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n - test_n
    for i in range(n):
        for item in Book_data_dic[i + 1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO
# State categories: keep only the categories that appear in the user's borrowing data; categories that never appear are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate and sort the states (without modifying the input list)
    news_items = sorted(set(ITEM_CALLNO))
    print('Borrowing state categories:', news_items)
    return news_items
# Build a DataFrame whose first row and first column carry the category labels
def Mat_To_Frame(M, Book_category, B_N):
    M = M.tolist()  # convert the matrix to a list of rows
    for i in range(B_N):
        M[i].insert(0, Book_category[i])  # row label
    Book_category.insert(0, ' ')  # temporary blank corner cell
    M.insert(0, Book_category)  # header row
    df = pd.DataFrame(M)
    if Book_category[0] == ' ':
        Book_category.remove(' ')  # restore the category list
    return df
N = len(ITEM_CALLNO_data)  # sample size
t_n = len(time_number)  # number of borrowing sessions
test_n = 3  # number of test sessions
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)
f_array = np.zeros((B_N, B_N))  # frequency matrix
for I in range(t_n - test_n - 1):
    for a in Book_data_dic[I + 1]:
        i = 0  # row index: position of category a in Book_category
        for a1 in Book_category:
            if a1 == a:
                for b in Book_data_dic[I + 2]:
                    j = 0  # column index: position of category b
                    for b1 in Book_category:
                        if b1 == b:
                            f_array[i][j] += 1
                        else:
                            j += 1
            else:
                i += 1
print('Frequency matrix f for step size 1:\n', f_array)
n_sum_f = f_array.sum()  # total frequency
print('Total frequency:', n_sum_f)
# The raw matrix is hard to read, so also show it as a labeled DataFrame
df_f = Mat_To_Frame(f_array, Book_category, B_N)
print(df_f)
P = np.zeros((B_N, B_N))  # transition probability matrix
P_j_list = []  # list of marginal probabilities
for i in range(B_N):
    f_sum_i = sum(f_array[i, :])  # sum of row i of the frequency matrix
    # Note: this marginal probability is built from the row sum of f; the usual
    # definition of p_.j in the test statistic uses the column sum instead.
    P_j_list.append(f_sum_i / n_sum_f)
    if f_sum_i == 0:
        P[i][i] = 1  # a row that is entirely zero is replaced by a self-transition
    else:
        for j in range(B_N):
            P[i][j] = f_array[i][j] / f_sum_i
print('Transition probability matrix P_1 for step size 1:\n', P, '\n')
# The raw matrix is hard to read, so also show it as a labeled DataFrame
df_P = Mat_To_Frame(P, Book_category, B_N)
print(df_P)
P_j = np.array(P_j_list)  # marginal probability vector
print(P_j)
# Markov property test
x = 0
for i in range(B_N):
    for j in range(B_N):
        # Terms with zero frequency or zero marginal probability contribute nothing
        if f_array[i][j] != 0 and P_j[j] != 0:
            x += f_array[i][j] * abs(np.log(P[i][j] / P_j[j]))
X_2 = 2 * x
print('Chi-square statistic X^2:', X_2)
Output:
Borrowing state categories: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
Frequency matrix f for step size 1:
[[ 2. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 2. 0. 0. 0. 0. 0.]
[ 0. 0. 26. 0. 22. 1. 6. 14.]
[ 0. 0. 0. 0. 2. 0. 0. 0.]
[ 0. 0. 20. 6. 12. 3. 1. 0.]
[ 2. 0. 15. 0. 1. 0. 0. 0.]
[ 0. 0. 5. 0. 2. 0. 1. 0.]
[ 0. 0. 4. 0. 0. 0. 0. 0.]]
Total frequency: 148.0
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 2 1 0 0 0 0 0 0
2 C 0 0 2 0 0 0 0 0
3 D 0 0 26 0 22 1 6 14
4 E 0 0 0 0 2 0 0 0
5 F 0 0 20 6 12 3 1 0
6 H 2 0 15 0 1 0 0 0
7 I 0 0 5 0 2 0 1 0
8 K 0 0 4 0 0 0 0 0
Transition probability matrix P_1 for step size 1:
[[0.66666667 0.33333333 0. 0. 0. 0.
0. 0. ]
[0. 0. 1. 0. 0. 0.
0. 0. ]
[0. 0. 0.37681159 0. 0.31884058 0.01449275
0.08695652 0.20289855]
[0. 0. 0. 0. 1. 0.
0. 0. ]
[0. 0. 0.47619048 0.14285714 0.28571429 0.07142857
0.02380952 0. ]
[0.11111111 0. 0.83333333 0. 0.05555556 0.
0. 0. ]
[0. 0. 0.625 0. 0.25 0.
0.125 0. ]
[0. 0. 1. 0. 0. 0.
0. 0. ]]
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0.666667 0.333333 0 0 0 0 0 0
2 C 0 0 1 0 0 0 0 0
3 D 0 0 0.376812 0 0.318841 0.0144928 0.0869565 0.202899
4 E 0 0 0 0 1 0 0 0
5 F 0 0 0.47619 0.142857 0.285714 0.0714286 0.0238095 0
6 H 0.111111 0 0.833333 0 0.0555556 0 0 0
7 I 0 0 0.625 0 0.25 0 0.125 0
8 K 0 0 1 0 0 0 0 0
[0.02027027 0.01351351 0.46621622 0.01351351 0.28378378 0.12162162
0.05405405 0.02702703]
Chi-square statistic X^2: 183.92468307288394
Markov chain prediction based on the absolute distribution
Predicting the 35th borrowing session
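As background, prediction from the absolute distribution propagates a starting distribution through the transition matrix, p(k) = p(0) P^k, and picks the state with the largest probability. The sketch below shows only that general idea on a made-up two-state matrix; the names `step` and `predict` and all numbers are illustrative assumptions and are not taken from the script that follows.

```python
def step(dist, P):
    """One transition: multiply the row vector dist by the matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def predict(dist0, P, k):
    """Absolute distribution after k steps, starting from dist0."""
    dist = list(dist0)
    for _ in range(k):
        dist = step(dist, P)
    return dist

# Hypothetical two-state example (not the matrix estimated above)
states = ['D', 'F']
P = [[0.6, 0.4],
     [0.5, 0.5]]
dist0 = [1.0, 0.0]  # assume the last observed session was entirely category D

dist1 = predict(dist0, P, 1)
dist2 = predict(dist0, P, 2)
best = states[dist1.index(max(dist1))]
print(dist1, dist2, best)
```

One possible choice for the starting distribution, again an assumption, is the share of each category in the most recent training session.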
import pandas as pd
import numpy as np
# Maximum display width in characters; the default of 80 is too narrow for wide screens, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns
pd.set_option('display.max_columns', None)
# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']
# Convert the flat loan lists into a dict: session number -> categories borrowed in that session
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    books = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            books.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            books = []
        Book_data_dic[j + 1] = books
    # Hold out the last test_n sessions as the test set
    for i in range(test_n):
        Book_data_dic.pop(t_n - i)
    return Book_data_dic

# Borrowing records of the training set
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n - test_n
    for i in range(n):
        for item in Book_data_dic[i + 1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO

# State space: only categories that appear in the user's records; unseen categories are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate the states
    ITEM_CALLNO.sort()  # sort in place
    news_items = []
    for item in ITEM_CALLNO:
        if item not in news_items:
            news_items.append(item)
    print('借阅状态分类:', news_items)
    return news_items

# Wrap a matrix in a DataFrame whose first row and first column hold the category labels
def Mat_To_Frame(M, Book_category, B_N):
    M = M.tolist()  # matrix -> nested list
    for i in range(B_N):
        M[i].insert(0, Book_category[i])
    Book_category.insert(0, ' ')
    M.insert(0, Book_category)
    df = pd.DataFrame(M)
    if Book_category[0] == ' ':
        Book_category.remove(' ')
    return df

N = len(ITEM_CALLNO_data)  # number of loan records
t_n = len(time_number)  # number of borrowing sessions
test_n = 3  # number of held-out test sessions
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)
f_array = np.zeros((B_N, B_N))  # frequency matrix
# Count every pair (a, b) with a in session I+1 and b in session I+2
for I in range(t_n - test_n - 1):
    for a in Book_data_dic[I + 1]:
        i = 0
        for a1 in Book_category:
            if a1 == a:
                for b in Book_data_dic[I + 2]:
                    j = 0
                    for b1 in Book_category:
                        if b1 == b:
                            f_array[i][j] += 1
                        else:
                            j += 1
            else:
                i += 1
# print(f'步长为1的频数矩阵f:\n', f_array)
n_sum_f = f_array.sum()  # total frequency
print('总频数为:', n_sum_f)
# The raw matrix is hard to read, so display it as a DataFrame
df_f = Mat_To_Frame(f_array, Book_category, B_N)
print(f'步长为1的频数矩阵f(数据框展示形式):\n', df_f)
P = np.zeros((B_N, B_N))  # transition probability matrix
P_j_list = []  # marginal probabilities
for i in range(B_N):
    f_sum_i = sum(f_array[i, :])  # row sum of the frequency matrix
    P_j_list.append(f_sum_i / n_sum_f)  # marginal probability of state i
    if f_sum_i == 0:
        P[i][i] = 1  # an all-zero row is replaced by a self-loop
    else:
        for j in range(B_N):
            P[i][j] = f_array[i][j] / f_sum_i
# print(f'步长为1的转移概率矩阵P_1:\n', P, '\n')
# The raw matrix is hard to read, so display it as a DataFrame
df_P = Mat_To_Frame(P, Book_category, B_N)
print(f'步长为1的转移概率矩阵P_1(数据框展示形式):\n', df_P, '\n')
# Markov chain prediction based on the absolute distribution
# P_0 holds the number of books per category in the last training session (4 D books in session 34)
P_0 = np.array([0, 0, 4, 0, 0, 0, 0, 0])
P_next = P_0 @ P
P_next_dic = {}
for i in range(B_N):
    key = Book_category[i]
    value = P_next.tolist()[i]
    P_next_dic[key] = value
print(f"第{t_n-test_n+1}的状态:\n", P_next_dic)
Output:
借阅状态分类: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
总频数为: 148.0
步长为1的频数矩阵f(数据框展示形式):
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 2 1 0 0 0 0 0 0
2 C 0 0 2 0 0 0 0 0
3 D 0 0 26 0 22 1 6 14
4 E 0 0 0 0 2 0 0 0
5 F 0 0 20 6 12 3 1 0
6 H 2 0 15 0 1 0 0 0
7 I 0 0 5 0 2 0 1 0
8 K 0 0 4 0 0 0 0 0
步长为1的转移概率矩阵P_1(数据框展示形式):
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0.666667 0.333333 0 0 0 0 0 0
2 C 0 0 1 0 0 0 0 0
3 D 0 0 0.376812 0 0.318841 0.0144928 0.0869565 0.202899
4 E 0 0 0 0 1 0 0 0
5 F 0 0 0.47619 0.142857 0.285714 0.0714286 0.0238095 0
6 H 0.111111 0 0.833333 0 0.0555556 0 0 0
7 I 0 0 0.625 0 0.25 0 0.125 0
8 K 0 0 1 0 0 0 0 0
第35的状态:
{'B': 0.0, 'C': 0.0, 'D': 1.5072463768115942, 'E': 0.0, 'F': 1.2753623188405796, 'H': 0.057971014492753624, 'I': 0.34782608695652173, 'K': 0.8115942028985508}
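Using the first 34 sessions to predict the 35th, category D receives the largest score. Because P_0 holds book counts (4 D-category books in session 34) rather than probabilities, the scores sum to 4 and should be read as relative weights.
In fact, session 35 consisted of 3 D-category books.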
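The scores above can be turned into a proper probability distribution by normalizing P_0 before multiplying. The following is a minimal sketch under that assumption, not the original script: `states`, `F`, and `predict_next` are illustrative names, and `F` is copied from the lag-1 frequency matrix printed above.

```python
import numpy as np

states = ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
# Lag-1 frequency matrix copied from the output above (34 training sessions)
F = np.array([
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 26, 0, 22, 1, 6, 14],
    [0, 0, 0, 0, 2, 0, 0, 0],
    [0, 0, 20, 6, 12, 3, 1, 0],
    [2, 0, 15, 0, 1, 0, 0, 0],
    [0, 0, 5, 0, 2, 0, 1, 0],
    [0, 0, 4, 0, 0, 0, 0, 0],
], dtype=float)
# Row-normalize; every row has at least one count here, so no division by zero
P = F / F.sum(axis=1, keepdims=True)

def predict_next(counts, P):
    """Normalize the category counts of the last session and take one step."""
    p0 = np.asarray(counts, dtype=float)
    p0 = p0 / p0.sum()
    return p0 @ P

# Session 34 contained 4 D-category books
p_next = predict_next([0, 0, 4, 0, 0, 0, 0, 0], P)
best = states[int(np.argmax(p_next))]
print(dict(zip(states, p_next.round(4).tolist())))
print('most likely category:', best)
```

Multiplying `p_next` by 4 recovers the scores printed by the original script, so the ranking of categories is unchanged; only the scale differs.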
Predicting the 36th borrowing session
Rerun the script from the 35th-session prediction with two changes: set test_n to 2 (was 3), and set the D entry of P_0 to 3 (was 4), because the last training session is now session 35, in which 3 D-category books were borrowed. Everything else, including the data lists and the helper functions, is unchanged.
Output:
借阅状态分类: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
总频数为: 160.0
步长为1的频数矩阵f(数据框展示形式):
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 2 1 0 0 0 0 0 0
2 C 0 0 2 0 0 0 0 0
3 D 0 0 38 0 22 1 6 14
4 E 0 0 0 0 2 0 0 0
5 F 0 0 20 6 12 3 1 0
6 H 2 0 15 0 1 0 0 0
7 I 0 0 5 0 2 0 1 0
8 K 0 0 4 0 0 0 0 0
步长为1的转移概率矩阵P_1(数据框展示形式):
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0.666667 0.333333 0 0 0 0 0 0
2 C 0 0 1 0 0 0 0 0
3 D 0 0 0.469136 0 0.271605 0.0123457 0.0740741 0.17284
4 E 0 0 0 0 1 0 0 0
5 F 0 0 0.47619 0.142857 0.285714 0.0714286 0.0238095 0
6 H 0.111111 0 0.833333 0 0.0555556 0 0 0
7 I 0 0 0.625 0 0.25 0 0.125 0
8 K 0 0 1 0 0 0 0 0
第36的状态:
{'B': 0.0, 'C': 0.0, 'D': 1.4074074074074074, 'E': 0.0, 'F': 0.8148148148148148, 'H': 0.037037037037037035, 'I': 0.2222222222222222, 'K': 0.5185185185185185}
Using the first 35 sessions to predict the 36th, category D again receives the largest score.
In fact, session 36 consisted of 5 D-category books.
Predicting the 37th borrowing session
Rerun the same script with two changes: set test_n to 1, and set the D entry of P_0 to 5, because the last training session is now session 36, in which 5 D-category books were borrowed. Everything else, including the data lists and the helper functions, is unchanged.
Output:
借阅状态分类: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
总频数为: 175.0
步长为1的频数矩阵f(数据框展示形式):
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 2 1 0 0 0 0 0 0
2 C 0 0 2 0 0 0 0 0
3 D 0 0 53 0 22 1 6 14
4 E 0 0 0 0 2 0 0 0
5 F 0 0 20 6 12 3 1 0
6 H 2 0 15 0 1 0 0 0
7 I 0 0 5 0 2 0 1 0
8 K 0 0 4 0 0 0 0 0
步长为1的转移概率矩阵P_1(数据框展示形式):
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0.666667 0.333333 0 0 0 0 0 0
2 C 0 0 1 0 0 0 0 0
3 D 0 0 0.552083 0 0.229167 0.0104167 0.0625 0.145833
4 E 0 0 0 0 1 0 0 0
5 F 0 0 0.47619 0.142857 0.285714 0.0714286 0.0238095 0
6 H 0.111111 0 0.833333 0 0.0555556 0 0 0
7 I 0 0 0.625 0 0.25 0 0.125 0
8 K 0 0 1 0 0 0 0 0
第37的状态:
{'B': 0.0, 'C': 0.0, 'D': 2.760416666666667, 'E': 0.0, 'F': 1.1458333333333333, 'H': 0.05208333333333333, 'I': 0.3125, 'K': 0.7291666666666667}
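Using the first 36 sessions to predict the 37th, category D receives the largest score.
In fact, session 37 consisted of 1 D-category book and 1 J-category book.
The prediction is therefore only partly right. The main reason is that category J never appears in the preceding 36 sessions: it shows up for the first time in session 37, so the state space built from the training data does not contain it and the model cannot assign it any weight. A sudden new category of this kind reduces the accuracy of the prediction.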
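One common remedy, which is not part of the original analysis, is additive (Laplace) smoothing over a state set that also includes categories not yet borrowed. The sketch below is only an illustration under that assumption: `smoothed_transition` and `alpha` are hypothetical names, and the matrix is a toy three-state excerpt (the D and F counts come from the 36-session frequency matrix above, with an all-zero row and column added for J).

```python
import numpy as np

def smoothed_transition(F, alpha=1.0):
    """Add alpha to every count, then row-normalize (additive smoothing)."""
    F = np.asarray(F, dtype=float) + alpha
    return F / F.sum(axis=1, keepdims=True)

states = ['D', 'F', 'J']
# Toy excerpt: D->D 53, D->F 22, F->D 20, F->F 12; J has never been observed
F = np.array([[53, 22, 0],
              [20, 12, 0],
              [0, 0, 0]])
P = smoothed_transition(F, alpha=0.5)
print(np.round(P, 4))
```

With smoothing, the transition from D to J gets a small nonzero probability instead of being impossible, and the never-observed J row becomes uniform. The choice of `alpha` is a modelling decision that the source does not address.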
3. Computing the transition probability matrix for each step length
import pandas as pd
import numpy as np
# Maximum display width in characters; the default of 80 is too narrow for wide screens, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns
pd.set_option('display.max_columns', None)
# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']
# Convert the flat loan lists into a dict: session number -> categories borrowed in that session
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    books = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            books.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            books = []
        Book_data_dic[j + 1] = books
    # Hold out the last test_n sessions as the test set
    for i in range(test_n):
        Book_data_dic.pop(t_n - i)
    return Book_data_dic

# Borrowing records of the training set
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n - test_n
    for i in range(n):
        for item in Book_data_dic[i + 1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO

# State space: only categories that appear in the user's records; unseen categories are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate the states
    ITEM_CALLNO.sort()  # sort in place
    news_items = []
    for item in ITEM_CALLNO:
        if item not in news_items:
            news_items.append(item)
    print('借阅状态分类:', news_items)
    return news_items

# Wrap a matrix in a DataFrame whose first row and first column hold the category labels
def Mat_To_Frame(M, Book_category, B_N):
    M = M.tolist()  # matrix -> nested list
    for i in range(B_N):
        M[i].insert(0, Book_category[i])
    Book_category.insert(0, ' ')
    M.insert(0, Book_category)
    df = pd.DataFrame(M)
    if Book_category[0] == ' ':
        Book_category.remove(' ')
    return df

N = len(ITEM_CALLNO_data)  # number of loan records
t_n = len(time_number)  # number of borrowing sessions
test_n = 3  # number of held-out test sessions
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)
n_list = [1, 2, 3, 4, 5]  # step lengths n to evaluate
for n in n_list:
    print(f'步长为{n}:')
    f_array = np.zeros((B_N, B_N))  # frequency matrix
    # Count every pair (a, b) with a in session I+1 and b in session I+1+n
    for I in range(t_n - test_n - n):
        for a in Book_data_dic[I + 1]:
            i = 0
            for a1 in Book_category:
                if a1 == a:
                    for b in Book_data_dic[I + 1 + n]:
                        j = 0
                        for b1 in Book_category:
                            if b1 == b:
                                f_array[i][j] += 1
                            else:
                                j += 1
                else:
                    i += 1
    # print(f'步长为{n}的频数矩阵f:\n', f_array)
    n_sum_f = f_array.sum()  # total frequency
    print('总频数为:', n_sum_f)
    # The raw matrix is hard to read, so display it as a DataFrame
    df_f = Mat_To_Frame(f_array, Book_category, B_N)
    print(f'步长为{n}的频数矩阵f的列表形式:\n', df_f)
    P = np.zeros((B_N, B_N))  # transition probability matrix
    P_j_list = []  # marginal probabilities
    for i in range(B_N):
        f_sum_i = sum(f_array[i, :])  # row sum of the frequency matrix
        P_j_list.append(f_sum_i / n_sum_f)  # marginal probability of state i
        if f_sum_i == 0:
            P[i][i] = 1  # an all-zero row is replaced by a self-loop
        else:
            for j in range(B_N):
                P[i][j] = f_array[i][j] / f_sum_i
    # print(f'步长为{n}的转移概率矩阵P_{n}:\n', P, '\n')
    # The raw matrix is hard to read, so display it as a DataFrame
    df_P = Mat_To_Frame(P, Book_category, B_N)
    print(f'步长为{n}的转移概率矩阵P_{n}的列表形式:\n', df_P, '\n')
Output:
借阅状态分类: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
步长为1:
总频数为: 148.0
步长为1的频数矩阵f的列表形式:
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 2 1 0 0 0 0 0 0
2 C 0 0 2 0 0 0 0 0
3 D 0 0 26 0 22 1 6 14
4 E 0 0 0 0 2 0 0 0
5 F 0 0 20 6 12 3 1 0
6 H 2 0 15 0 1 0 0 0
7 I 0 0 5 0 2 0 1 0
8 K 0 0 4 0 0 0 0 0
步长为1的转移概率矩阵P_1的列表形式:
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0.666667 0.333333 0 0 0 0 0 0
2 C 0 0 1 0 0 0 0 0
3 D 0 0 0.376812 0 0.318841 0.0144928 0.0869565 0.202899
4 E 0 0 0 0 1 0 0 0
5 F 0 0 0.47619 0.142857 0.285714 0.0714286 0.0238095 0
6 H 0.111111 0 0.833333 0 0.0555556 0 0 0
7 I 0 0 0.625 0 0.25 0 0.125 0
8 K 0 0 1 0 0 0 0 0
步长为2:
总频数为: 131.0
步长为2的频数矩阵f的列表形式:
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0 2 2 0 0 0 0 0
2 C 0 0 0 0 0 0 1 0
3 D 0 0 48 2 3 5 2 2
4 E 0 0 0 0 0 2 0 0
5 F 2 0 6 0 14 1 1 0
6 H 1 0 1 0 18 0 0 1
7 I 0 0 1 0 5 3 1 0
8 K 0 0 1 0 0 0 6 0
步长为2的转移概率矩阵P_2的列表形式:
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0 0.5 0.5 0 0 0 0 0
2 C 0 0 0 0 0 0 1 0
3 D 0 0 0.774194 0.0322581 0.0483871 0.0806452 0.0322581 0.0322581
4 E 0 0 0 0 0 1 0 0
5 F 0.0833333 0 0.25 0 0.583333 0.0416667 0.0416667 0
6 H 0.047619 0 0.047619 0 0.857143 0 0 0.047619
7 I 0 0 0.1 0 0.5 0.3 0.1 0
8 K 0 0 0.142857 0 0 0 0.857143 0
步长为3:
总频数为: 163.0
步长为3的频数矩阵f的列表形式:
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0 0 4 0 0 0 1 0
2 C 0 0 0 0 1 0 1 0
3 D 8 0 26 0 7 1 6 1
4 E 4 0 0 0 0 0 0 0
5 F 3 0 7 0 7 5 0 2
6 H 0 1 23 10 6 0 0 0
7 I 0 0 31 0 2 1 1 0
8 K 0 0 4 0 0 0 0 0
步长为3的转移概率矩阵P_3的列表形式:
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0 0 0.8 0 0 0 0.2 0
2 C 0 0 0 0 0.5 0 0.5 0
3 D 0.163265 0 0.530612 0 0.142857 0.0204082 0.122449 0.0204082
4 E 1 0 0 0 0 0 0 0
5 F 0.125 0 0.291667 0 0.291667 0.208333 0 0.0833333
6 H 0 0.025 0.575 0.25 0.15 0 0 0
7 I 0 0 0.885714 0 0.0571429 0.0285714 0.0285714 0
8 K 0 0 1 0 0 0 0 0
步长为4:
总频数为: 122.0
步长为4的频数矩阵f的列表形式:
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0 0 0 0 1 0 3 0
2 C 0 0 1 0 1 0 0 0
3 D 4 0 38 0 6 2 5 2
4 E 2 0 0 0 0 0 0 0
5 F 7 1 12 0 6 1 0 2
6 H 0 0 3 0 6 0 0 0
7 I 0 0 4 0 6 0 0 3
8 K 0 0 4 0 0 2 0 0
步长为4的转移概率矩阵P_4的列表形式:
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0 0 0 0 0.25 0 0.75 0
2 C 0 0 0.5 0 0.5 0 0 0
3 D 0.0701754 0 0.666667 0 0.105263 0.0350877 0.0877193 0.0350877
4 E 1 0 0 0 0 0 0 0
5 F 0.241379 0.0344828 0.413793 0 0.206897 0.0344828 0 0.0689655
6 H 0 0 0.333333 0 0.666667 0 0 0
7 I 0 0 0.307692 0 0.461538 0 0 0.230769
8 K 0 0 0.666667 0 0 0.333333 0 0
步长为5:
总频数为: 134.0
步长为5的频数矩阵f的列表形式:
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0 0 1 0 3 0 2 0
2 C 0 0 0 0 2 0 1 0
3 D 2 4 47 0 3 2 0 1
4 E 0 2 0 0 0 0 0 0
5 F 3 1 5 0 7 1 3 4
6 H 0 0 3 0 0 5 1 0
7 I 0 0 6 0 4 1 0 0
8 K 0 0 20 0 0 0 0 0
步长为5的转移概率矩阵P_5的列表形式:
0 1 2 3 4 5 6 7 8
0 B C D E F H I K
1 B 0 0 0.166667 0 0.5 0 0.333333 0
2 C 0 0 0 0 0.666667 0 0.333333 0
3 D 0.0338983 0.0677966 0.79661 0 0.0508475 0.0338983 0 0.0169492
4 E 0 1 0 0 0 0 0 0
5 F 0.125 0.0416667 0.208333 0 0.291667 0.0416667 0.125 0.166667
6 H 0 0 0.333333 0 0 0.555556 0.111111 0
7 I 0 0 0.545455 0 0.363636 0.0909091 0 0
8 K 0 0 1 0 0 0 0 0
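The nested loops in the script count every pair (a, b) where a is borrowed in session t and b in session t+n. The same counting can be written more compactly. In this sketch, `sessions` re-encodes the 34 training sessions from the data lists above as one string per session, and `lag_frequency` and `row_normalize` are illustrative names rather than functions from the original script.

```python
import numpy as np

# The 34 training sessions, one string per session (each letter is one borrowed book)
sessions = ["HHHHH", "D", "FFF", "DEEDDDF", "F", "H", "BB", "B", "C", "DD",
            "I", "FI", "FD", "IFF", "D", "FF", "H", "F", "FF", "FD",
            "F", "D", "DD", "KK", "D", "III", "D", "H", "DDDDDDDDDD", "FKD",
            "DD", "D", "DD", "DDDD"]
states = sorted(set("".join(sessions)))

def lag_frequency(sessions, states, n):
    """Count pairs (a, b) with a in session t and b in session t + n."""
    index = {s: k for k, s in enumerate(states)}
    F = np.zeros((len(states), len(states)))
    for earlier, later in zip(sessions, sessions[n:]):
        for a in earlier:
            for b in later:
                F[index[a], index[b]] += 1
    return F

def row_normalize(F):
    """Turn counts into transition probabilities; empty rows become self-loops."""
    P = np.zeros_like(F)
    for i, row in enumerate(F):
        if row.sum() == 0:
            P[i, i] = 1.0
        else:
            P[i] = row / row.sum()
    return P

for n in range(1, 6):
    F_n = lag_frequency(sessions, states, n)
    P_n = row_normalize(F_n)
    print(f"lag {n}: total frequency = {F_n.sum()}")
```

Using a dictionary lookup for the state index avoids the manual `i` and `j` counters, which makes the step length `n` the only thing that varies between runs.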
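4. Computing the autocorrelation coefficients (weights) for each order
5. Computing the stationary distribution
The calculation uses the first 34 sessions as the training set, so the test-set size test_n is 3.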
import pandas as pd
import numpy as np
# Maximum display width in characters; the default of 80 is too narrow for wide screens, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns
pd.set_option('display.max_columns', None)
# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']
# Convert the flat loan lists into a dict: session number -> categories borrowed in that session
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    books = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            books.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            books = []
        Book_data_dic[j + 1] = books
    # Hold out the last test_n sessions as the test set
    for i in range(test_n):
        Book_data_dic.pop(t_n - i)
    return Book_data_dic

# Borrowing records of the training set
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n - test_n
    for i in range(n):
        for item in Book_data_dic[i + 1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO

# State space: only categories that appear in the user's records; unseen categories are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate the states
    ITEM_CALLNO.sort()  # sort in place
    news_items = []
    for item in ITEM_CALLNO:
        if item not in news_items:
            news_items.append(item)
    print('借阅状态分类:', news_items)
    return news_items

N = len(ITEM_CALLNO_data)  # number of loan records
t_n = len(time_number)  # number of borrowing sessions
test_n = 3  # number of held-out test sessions
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)
f_array = np.zeros((B_N, B_N))  # frequency matrix
# Count every pair (a, b) with a in session I+1 and b in session I+2
for I in range(t_n - test_n - 1):
    for a in Book_data_dic[I + 1]:
        i = 0
        for a1 in Book_category:
            if a1 == a:
                for b in Book_data_dic[I + 2]:
                    j = 0
                    for b1 in Book_category:
                        if b1 == b:
                            f_array[i][j] += 1
                        else:
                            j += 1
            else:
                i += 1
print(f'步长为1的频数矩阵f:\n', f_array)
n_sum_f = f_array.sum()  # total frequency
print('总频数为:', n_sum_f)
# The raw matrix is hard to read, so display it as a DataFrame
df_f = pd.DataFrame(f_array)
print(df_f)
P = np.zeros((B_N, B_N))  # transition probability matrix
P_j_list = []  # marginal probabilities
for i in range(B_N):
    f_sum_i = sum(f_array[i, :])  # row sum of the frequency matrix
    P_j_list.append(f_sum_i / n_sum_f)  # marginal probability of state i
    if f_sum_i == 0:
        P[i][i] = 1  # an all-zero row is replaced by a self-loop
    else:
        for j in range(B_N):
            P[i][j] = f_array[i][j] / f_sum_i
print(f'步长为1的转移概率矩阵P_1:\n', P, '\n')
# The raw matrix is hard to read, so display it as a DataFrame
df_P = pd.DataFrame(P)
print(df_P)
# Solve for the stationary distribution: pi * P = pi with the entries of pi summing to 1
from sympy import Matrix, solve, symbols
xB, xC, xD, xE, xF, xH, xI, xK = symbols('xB, xC, xD, xE, xF, xH, xI, xK')
norm_eq = xB + xC + xD + xE + xF + xH + xI + xK - 1  # normalization constraint
S = Matrix([[xB, xC, xD, xE, xF, xH, xI, xK]])
X = solve([S*P - S, norm_eq], [xB, xC, xD, xE, xF, xH, xI, xK])
print('平稳分布π:\n', X)
Output:
借阅状态分类: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
步长为1的频数矩阵f:
[[ 2. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 2. 0. 0. 0. 0. 0.]
[ 0. 0. 26. 0. 22. 1. 6. 14.]
[ 0. 0. 0. 0. 2. 0. 0. 0.]
[ 0. 0. 20. 6. 12. 3. 1. 0.]
[ 2. 0. 15. 0. 1. 0. 0. 0.]
[ 0. 0. 5. 0. 2. 0. 1. 0.]
[ 0. 0. 4. 0. 0. 0. 0. 0.]]
总频数为: 148.0
0 1 2 3 4 5 6 7
0 2.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 26.0 0.0 22.0 1.0 6.0 14.0
3 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0
4 0.0 0.0 20.0 6.0 12.0 3.0 1.0 0.0
5 2.0 0.0 15.0 0.0 1.0 0.0 0.0 0.0
6 0.0 0.0 5.0 0.0 2.0 0.0 1.0 0.0
7 0.0 0.0 4.0 0.0 0.0 0.0 0.0 0.0
步长为1的转移概率矩阵P_1:
[[0.66666667 0.33333333 0. 0. 0. 0.
0. 0. ]
[0. 0. 1. 0. 0. 0.
0. 0. ]
[0. 0. 0.37681159 0. 0.31884058 0.01449275
0.08695652 0.20289855]
[0. 0. 0. 0. 1. 0.
0. 0. ]
[0. 0. 0.47619048 0.14285714 0.28571429 0.07142857
0.02380952 0. ]
[0.11111111 0. 0.83333333 0. 0.05555556 0.
0. 0. ]
[0. 0. 0.625 0. 0.25 0.
0.125 0. ]
[0. 0. 1. 0. 0. 0.
0. 0. ]]
0 1 2 3 4 5 6 7
0 0.666667 0.333333 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
1 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2 0.000000 0.000000 0.376812 0.000000 0.318841 0.014493 0.086957 0.202899
3 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000
4 0.000000 0.000000 0.476190 0.142857 0.285714 0.071429 0.023810 0.000000
5 0.111111 0.000000 0.833333 0.000000 0.055556 0.000000 0.000000 0.000000
6 0.000000 0.000000 0.625000 0.000000 0.250000 0.000000 0.125000 0.000000
7 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
平稳分布π:
{xK=: 0.0963640202016921, xI: 0.0551393912117535, xH: 0.0277274488287983, xF: 0.291820263401484, xE: 0.0416886090573549, xD: 0.474936956708340, xC: 0.00308082764764426, xB: 0.00924248294293277}
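The stationary distribution π satisfies πP = π together with the constraint that its entries sum to 1. As a cross-check on the symbolic solution, the same linear system can be solved numerically. This is a sketch, not the original method: it assumes the lag-1 frequency matrix printed above, and `A`, `b`, and `pi` are illustrative names.

```python
import numpy as np

states = ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
# Lag-1 frequency matrix copied from the output above (34 training sessions)
F = np.array([
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 26, 0, 22, 1, 6, 14],
    [0, 0, 0, 0, 2, 0, 0, 0],
    [0, 0, 20, 6, 12, 3, 1, 0],
    [2, 0, 15, 0, 1, 0, 0, 0],
    [0, 0, 5, 0, 2, 0, 1, 0],
    [0, 0, 4, 0, 0, 0, 0, 0],
], dtype=float)
P = F / F.sum(axis=1, keepdims=True)

n = P.shape[0]
# pi (P - I) = 0 and sum(pi) = 1, stacked as A x = b with x = pi transposed
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

for s, p in zip(states, pi):
    print(f"{s}: {p:.6f}")
```

The least-squares solve handles the fact that the stacked system has one more equation than unknowns; because the system is consistent, the result agrees with the symbolic solution up to floating-point error.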