Application of the Markov Chain Prediction Model: Personal Book Borrowing as an Example (Improved Version 2.0)

Version 1 (unsuccessful): Application of the Markov Chain Prediction Model, with Personal Book Borrowing as an Example

Reading the personal book-borrowing data


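## Book categories: A Marxism, Leninism, Mao Zedong Thought, Deng Xiaoping Theory; B Philosophy, religion; C Social sciences (general); D Politics, law; E Military science; F Economics; G Culture, science, education, sports;
## H Languages, writing; I Literature; J Art; K History, geography; N Natural sciences (general); O Mathematics, physics and chemistry; P Astronomy, earth sciences; Q Biology; R Medicine, health; S Agricultural science;
## T Industrial technology; U Transportation; V Aviation, aerospace; X Environmental science, safety science; Z General works.
import pandas as pd
# Show all rows of a DataFrame
pd.set_option('display.max_rows', None)
# Show all columns
pd.set_option('display.max_columns', None)
# Book categories (note: 'P', astronomy and earth sciences, is not in this list)
Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# Read the data: keep all rows, drop the first column
Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[:, 1:]

print(Person_data)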

Output:

     LOAN_DATE ITEM_CALLNO   TIMESTAMP
0     2013/1/1           H  1356969600
1     2013/1/1           H  1356969600
2     2013/1/1           H  1356969600
3     2013/1/1           H  1356969600
4     2013/1/1           H  1356969600
5    2013/2/20           D  1361289600
6    2013/4/19           F  1366300800
7    2013/4/19           F  1366300800
8    2013/4/19           F  1366300800
9    2013/4/24           D  1366732800
10   2013/4/24           E  1366732800
11   2013/4/24           E  1366732800
12   2013/4/24           D  1366732800
13   2013/4/24           D  1366732800
14   2013/4/24           D  1366732800
15   2013/4/24           F  1366732800
16   2013/4/27           F  1366992000
17    2013/5/8           H  1367942400
18   2013/5/15           B  1368547200
19   2013/5/15           B  1368547200
20   2013/5/17           B  1368720000
21    2013/6/6           C  1370448000
22    2013/6/8           D  1370620800
23    2013/6/8           D  1370620800
24   2013/10/9           I  1381248000
25  2013/10/29           F  1382976000
26  2013/10/29           I  1382976000
27   2013/11/7           F  1383753600
28   2013/11/7           D  1383753600
29  2013/11/14           I  1384358400
30  2013/11/14           F  1384358400
31  2013/11/14           F  1384358400
32  2013/11/25           D  1385308800
33   2013/12/2           F  1385913600
34   2013/12/2           F  1385913600
35   2013/12/4           H  1386086400
36   2013/12/6           F  1386259200
37  2013/12/11           F  1386691200
38  2013/12/11           F  1386691200
39  2013/12/18           F  1387296000
40  2013/12/18           D  1387296000
41    2014/1/1           F  1388505600
42    2014/1/2           D  1388592000
43   2014/2/17           D  1392566400
44   2014/2/17           D  1392566400
45   2014/3/13           K  1394640000
46   2014/3/13           K  1394640000
47   2014/4/15           D  1397491200
48   2014/4/29           I  1398700800
49   2014/4/29           I  1398700800
50   2014/4/29           I  1398700800
51    2014/5/5           D  1399219200
52   2014/9/23           H  1411401600
53  2014/10/10           D  1412870400
54  2014/10/10           D  1412870400
55  2014/10/10           D  1412870400
56  2014/10/10           D  1412870400
57  2014/10/10           D  1412870400
58  2014/10/10           D  1412870400
59  2014/10/10           D  1412870400
60  2014/10/10           D  1412870400
61  2014/10/10           D  1412870400
62  2014/10/10           D  1412870400
63   2014/12/2           F  1417449600
64   2014/12/2           K  1417449600
65   2014/12/2           D  1417449600
66   2014/12/4           D  1417622400
67   2014/12/4           D  1417622400
68   2015/1/14           D  1421164800
69   2015/2/28           D  1425052800
70   2015/2/28           D  1425052800
71    2015/3/5           D  1425484800
72    2015/3/5           D  1425484800
73    2015/3/5           D  1425484800
74    2015/3/5           D  1425484800
75   2015/3/11           D  1426003200
76   2015/3/11           D  1426003200
77   2015/3/11           D  1426003200
78   2015/3/12           D  1426089600
79   2015/3/12           D  1426089600
80   2015/3/12           D  1426089600
81   2015/3/12           D  1426089600
82   2015/3/12           D  1426089600
83   2015/3/19           J  1426694400
84   2015/3/19           D  1426694400
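The lists `LOAN_DATE`, `time_number` and `ITEM_CALLNO` used below were typed in by hand. The following is a minimal sketch of deriving them from the DataFrame instead. It assumes three things:

- the columns are named `LOAN_DATE` and `ITEM_CALLNO`, as in the output above;
- the rows are already in date order;
- the dates are read as text.

If `read_excel` parses the dates as datetimes, `astype(str)` will produce a different string format. The four-row `Person_data` here is a stand-in for the real file.

```python
import pandas as pd

# Stand-in for the real Person_data read from Excel (same column names as above)
Person_data = pd.DataFrame({
    'LOAN_DATE': ['2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19'],
    'ITEM_CALLNO': ['H', 'H', 'D', 'F'],
    'TIMESTAMP': [1356969600, 1356969600, 1361289600, 1366300800],
})

# One entry per borrowed book, in file order
LOAN_DATE = Person_data['LOAN_DATE'].astype(str).tolist()
ITEM_CALLNO = Person_data['ITEM_CALLNO'].astype(str).tolist()
# Distinct dates in order of first appearance: one per borrowing occasion
time_number = list(dict.fromkeys(LOAN_DATE))

print(time_number)  # ['2013/1/1', '2013/2/20', '2013/4/19']
```

`dict.fromkeys` keeps insertion order, so `time_number` stays chronological as long as the rows are.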

Grouping the two columns into a dictionary (one entry per borrowing date)

LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

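t_n = len(time_number)    # number of borrowing occasions (distinct dates)
N = len(ITEM_CALLNO)      # number of records (books borrowed)
# key = occasion number (1..t_n), value = classes borrowed on that date
Book_data_dic = {}
i, j = 0, 0
group = []   # renamed from `list`, which shadowed the built-in
for _ in range(t_n + N - 1):   # N appends + (t_n - 1) date switches
    if LOAN_DATE[i] == time_number[j]:
        group.append(ITEM_CALLNO[i])
        i += 1
    else:
        j += 1
        group = []
    Book_data_dic[j + 1] = group
print(Book_data_dic)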

Output:

{1: ['H', 'H', 'H', 'H', 'H'], 2: ['D'], 3: ['F', 'F', 'F'], 4: ['D', 'E', 'E', 'D', 'D', 'D', 'F'], 5: ['F'], 6: ['H'], 7: ['B', 'B'], 8: ['B'], 9: ['C'], 10: ['D', 'D'], 11: ['I'], 12: ['F', 'I'], 13: ['F', 'D'], 14: ['I', 'F', 'F'], 15: ['D'], 16: ['F', 'F'], 17: ['H'], 18: ['F'], 19: ['F', 'F'], 20: ['F', 'D'], 21: ['F'], 22: ['D'], 23: ['D', 'D'], 24: ['K', 'K'], 25: ['D'], 26: ['I', 'I', 'I'], 27: ['D'], 28: ['H'], 29: ['D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D'], 30: ['F', 'K', 'D'], 31: ['D', 'D'], 32: ['D'], 33: ['D', 'D'], 34: ['D', 'D', 'D', 'D'], 35: ['D', 'D', 'D'], 36: ['D', 'D', 'D', 'D', 'D'], 37: ['J', 'D']}
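The records are sorted by date, so the same occasion dictionary can also be built with `itertools.groupby`. That function groups consecutive records that share a date, which removes the need for the manual two-index loop. The sketch below uses only the first few records. Like the loop above, it assumes that records with the same date are adjacent.

```python
from itertools import groupby

LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19']
ITEM_CALLNO = ['H', 'H', 'D', 'F', 'F']

# Pair each record's date with its class, then group consecutive records sharing a date
records = zip(LOAN_DATE, ITEM_CALLNO)
Book_data_dic = {
    k: [item for _, item in group]
    for k, (_, group) in enumerate(groupby(records, key=lambda r: r[0]), start=1)
}
print(Book_data_dic)  # {1: ['H', 'H'], 2: ['D'], 3: ['F', 'F']}
```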

Count (frequency) matrix and transition probability matrix for step size 1

import pandas as pd
import numpy as np
# Maximum number of characters per output line; the default 80 is too narrow for a wide screen, 200 is common
pd.set_option('display.width', 500)
# Show all columns
pd.set_option('display.max_columns', None)

# Book categories (note: 'P', astronomy and earth sciences, is not in this list)
Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

N = len(ITEM_CALLNO)      # number of records (books borrowed)
t_n = len(time_number)    # number of borrowing occasions (distinct dates)
B_N = len(Book_category)  # number of states
# Convert the lists into a dict: occasion number -> classes borrowed on that date
def Book_list_to_dic():
    Book_data_dic = {}
    i, j = 0, 0
    group = []
    for _ in range(t_n + N - 1):   # N appends + (t_n - 1) date switches
        if LOAN_DATE[i] == time_number[j]:
            group.append(ITEM_CALLNO[i])
            i += 1
        else:
            j += 1
            group = []
        Book_data_dic[j + 1] = group
    return Book_data_dic

Book_data_dic = Book_list_to_dic()
n_list = [1]    # step sizes to evaluate, e.g. [1, 2, 3, 4, 5]
for n in n_list:
    print(f'Step size {n}:')
    f_array = np.zeros((B_N, B_N))    # count (frequency) matrix
    # Count every pair (class on occasion I) -> (class on occasion I + n)
    for I in range(1, t_n - n + 1):
        for a in Book_data_dic[I]:
            i = Book_category.index(a)
            for b in Book_data_dic[I + n]:
                j = Book_category.index(b)
                f_array[i][j] += 1
    print(f'Count matrix f for step size {n}:\n', f_array)
    n_sum_f = f_array.sum()    # total count
    print('Total count:', n_sum_f)
    # The raw matrix is hard to read, so also print it as a DataFrame
    df_f = pd.DataFrame(f_array)
    print(df_f)

    P = np.zeros((B_N, B_N))  # transition probability matrix
    for i in range(B_N):
        f_sum_i = f_array[i, :].sum()  # row sum of the count matrix
        if f_sum_i == 0:
            P[i][i] = 1     # state never left: set P[i][i] = 1 so the row still sums to 1
        else:
            for j in range(B_N):
                P[i][j] = f_array[i][j] / f_sum_i
    print(f'Transition probability matrix P_{n} for step size {n}:\n', P, '\n')
    # The raw matrix is hard to read, so also print it as a DataFrame
    df_P = pd.DataFrame(P)
    print(df_P)

Output:

Step size 1:
Count matrix f for step size 1:
 [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  2.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  2.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0. 58.  0. 22.  0.  1.  6.  5. 14.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  2.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0. 20.  6. 12.  0.  3.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  2.  0. 15.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  5.  0.  2.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  4.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]]
Total count: 185.0
      0    1    2     3    4     5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20
0   0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1   0.0  2.0  1.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
2   0.0  0.0  0.0   2.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
3   0.0  0.0  0.0  58.0  0.0  22.0  0.0  1.0  6.0  5.0  14.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
4   0.0  0.0  0.0   0.0  0.0   2.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5   0.0  0.0  0.0  20.0  6.0  12.0  0.0  3.0  1.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
6   0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
7   0.0  2.0  0.0  15.0  0.0   1.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
8   0.0  0.0  0.0   5.0  0.0   2.0  0.0  0.0  1.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
9   0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
10  0.0  0.0  0.0   4.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
11  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
12  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
13  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
14  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
15  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
16  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
17  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
18  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
19  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
20  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
Transition probability matrix P_1 for step size 1:
 [[1.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.66666667 0.33333333 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         1.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.54716981 0.         0.20754717
  0.         0.00943396 0.05660377 0.04716981 0.13207547 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         1.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.47619048 0.14285714 0.28571429
  0.         0.07142857 0.02380952 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  1.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.11111111 0.         0.83333333 0.         0.05555556
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.625      0.         0.25
  0.         0.         0.125      0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         1.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         1.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         1.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  1.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         1.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         1.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         1.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         1.         0.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         1.
  0.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  1.         0.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         1.         0.        ]
 [0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         1.        ]] 

      0         1         2         3         4         5    6         7         8        9        10   11   12   13   14   15   16   17   18   19   20
0   1.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1   0.0  0.666667  0.333333  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
2   0.0  0.000000  0.000000  1.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
3   0.0  0.000000  0.000000  0.547170  0.000000  0.207547  0.0  0.009434  0.056604  0.04717  0.132075  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
4   0.0  0.000000  0.000000  0.000000  0.000000  1.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5   0.0  0.000000  0.000000  0.476190  0.142857  0.285714  0.0  0.071429  0.023810  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
6   0.0  0.000000  0.000000  0.000000  0.000000  0.000000  1.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
7   0.0  0.111111  0.000000  0.833333  0.000000  0.055556  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
8   0.0  0.000000  0.000000  0.625000  0.000000  0.250000  0.0  0.000000  0.125000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
9   0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  1.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
10  0.0  0.000000  0.000000  1.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
11  0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
12  0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
13  0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
14  0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
15  0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0
16  0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0
17  0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
18  0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0
19  0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0
20  0.0  0.000000  0.000000  0.000000  0.000000  0.000000  0.0  0.000000  0.000000  0.00000  0.000000  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0

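The frequency matrix counted by hand is shown below:
[Figure: manually counted frequency matrix]
It agrees with the program's output, which confirms that the program above is correct.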
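The counting and row-normalization steps above can be packaged as two reusable functions that take the step size `n` as a parameter. This is a sketch with a few assumptions:

- `groups` is an occasion dictionary like the one built above, with keys 1..t_n.
- `states` is the ordered state list.
- `count_matrix` and `transition_matrix` are hypothetical helper names.

As in the original code, a row with no outgoing transitions gets a 1 on the diagonal.

```python
import numpy as np

def count_matrix(groups, states, n=1):
    """Count pairs (class on occasion k) -> (class on occasion k + n)."""
    f = np.zeros((len(states), len(states)))
    t_n = len(groups)
    for k in range(1, t_n - n + 1):
        for a in groups[k]:
            for b in groups[k + n]:
                f[states.index(a), states.index(b)] += 1
    return f

def transition_matrix(f):
    """Row-normalize counts; a row with no counts becomes absorbing (1 on the diagonal)."""
    P = np.zeros_like(f)
    for i, row_sum in enumerate(f.sum(axis=1)):
        if row_sum == 0:
            P[i, i] = 1.0
        else:
            P[i] = f[i] / row_sum
    return P

# Tiny example: three occasions, states ordered D, F, H
groups = {1: ['H', 'H'], 2: ['D'], 3: ['F', 'D']}
states = ['D', 'F', 'H']
f = count_matrix(groups, states)       # step size 1
P = transition_matrix(f)
f2 = count_matrix(groups, states, n=2) # step size 2: occasion 1 -> occasion 3
print(f)
print(P)
```

Every row of `P` sums to 1, including the absorbing row for `F`, which is never followed by another occasion in this example.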

Solution steps

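Take the reader with ID 8748847336 as an example. In the database this reader borrowed books on 37 occasions (distinct dates). The first 34 occasions form the training set and the last 3 the test set.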
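Holding out the last 3 occasions amounts to splitting the occasion dictionary by key. Below is a minimal sketch. `split_occasions` is a hypothetical helper, and the five-occasion dictionary stands in for the full 37-occasion one.

```python
def split_occasions(groups, test_n):
    """Split {occasion: classes} into the first len - test_n occasions and the last test_n."""
    t_n = len(groups)
    train = {k: v for k, v in groups.items() if k <= t_n - test_n}
    test = {k: v for k, v in groups.items() if k > t_n - test_n}
    return train, test

groups = {1: ['H'], 2: ['D'], 3: ['F', 'F'], 4: ['D'], 5: ['J', 'D']}
train, test = split_occasions(groups, test_n=3)
print(sorted(train), sorted(test))  # [1, 2] [3, 4, 5]
```

With 37 occasions and `test_n=3`, the training part is occasions 1 to 34.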

1. Grouping the historical data

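Following the conventional top-level classes of the Chinese Library Classification:
Book categories:
A Marxism, Leninism, Mao Zedong Thought, Deng Xiaoping Theory;
B Philosophy, religion;
C Social sciences (general);
D Politics, law;
E Military science;
F Economics;
G Culture, science, education, sports;
H Languages, writing;
I Literature;
J Art;
K History, geography;
N Natural sciences (general);
O Mathematics, physics and chemistry;
P Astronomy, earth sciences;
Q Biology;
R Medicine, health;
S Agricultural science;
T Industrial technology;
U Transportation;
V Aviation, aerospace;
X Environmental science, safety science;
Z General works.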

1. Grouping the historical data: the improvement in version 2.0

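Book categories span a very wide range of subjects, and students rarely borrow books from subjects far from their own. Humanities students, for instance, seldom borrow engineering textbooks. The state space should therefore not be the full list of book categories. It should contain only the categories this reader has actually borrowed.
For example: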

# Record of borrowed book classes
ITEM_CALLNO = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

# State space: only classes that appear in the reader's borrowing data; classes never borrowed are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate and sort the states without modifying the caller's list
    news_items = sorted(set(ITEM_CALLNO))
    print('Borrowing states:', news_items)
    return news_items
Book_Category(ITEM_CALLNO)

Output:

Borrowing states: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'J', 'K']
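The state list depends on which records it is built from. Over all 37 occasions the reader borrowed 9 classes. J, however, appears only on the final occasion, so a state space built from the first 34 (training) occasions has 8 states. The sketch below compares the two.

It drops the books from the last three occasions: 3 + 5 + 2 = 10 records, according to the occasion dictionary printed earlier.

```python
# Full record of borrowed classes, copied from the data above (85 records)
ITEM_CALLNO = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

test_records = 3 + 5 + 2   # books borrowed on occasions 35, 36 and 37
all_states = sorted(set(ITEM_CALLNO))
train_states = sorted(set(ITEM_CALLNO[:-test_records]))
print(all_states)    # ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'J', 'K']
print(train_states)  # ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
```

The number of states m enters the degrees of freedom of the Markov-property test, so it should be counted on the same data the count matrix is built from.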

2. Testing the Markov property

The first 34 occasions form the training set and the last 3 the test set.

import pandas as pd
import numpy as np
# Maximum number of characters per output line; the default 80 is too narrow for a wide screen, 200 is common
pd.set_option('display.width', 500)
# Show all columns
pd.set_option('display.max_columns', None)

# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

# Convert the lists into a dict (occasion number -> classes borrowed that day), then drop the test occasions
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    group = []   # classes borrowed on the current date (renamed from `list`, which shadowed the built-in)
    for _ in range(t_n + N - 1):   # N appends + (t_n - 1) date switches
        if LOAN_DATE[i] == time_number[j]:
            group.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            group = []
        Book_data_dic[j + 1] = group
    for k in range(test_n):   # remove the last test_n occasions (the test set)
        Book_data_dic.pop(t_n - k)
    return Book_data_dic

# Borrowed classes of the training occasions, flattened into one list
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    for k in range(t_n - test_n):
        ITEM_CALLNO.extend(Book_data_dic[k + 1])
    return ITEM_CALLNO

# State space: only classes that appear in the borrowing data; unseen classes are excluded
def Book_Category(ITEM_CALLNO):
    news_items = sorted(set(ITEM_CALLNO))   # deduplicate and sort without modifying the caller's list
    print('Borrowing states:', news_items)
    return news_items

N = len(ITEM_CALLNO_data)    # number of records (books borrowed)
t_n = len(time_number)       # number of borrowing occasions
test_n = 3                   # number of occasions held out as the test set
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)

f_array = np.zeros((B_N, B_N))    # count (frequency) matrix
# Count every pair (class on occasion I) -> (class on occasion I + 1) within the training occasions
for I in range(1, t_n - test_n):
    for a in Book_data_dic[I]:
        i = Book_category.index(a)
        for b in Book_data_dic[I + 1]:
            j = Book_category.index(b)
            f_array[i][j] += 1
print('Count matrix f for step size 1:\n', f_array)
n_sum_f = f_array.sum()    # total count
print('Total count:', n_sum_f)
# The raw matrix is hard to read, so also print it as a DataFrame
df_f = pd.DataFrame(f_array)
print(df_f)

P = np.zeros((B_N, B_N))  # transition probability matrix
P_j_list = []    # marginal probabilities
for i in range(B_N):
    f_sum_i = f_array[i, :].sum()  # row sum of the count matrix
    # Row total / grand total (textbook statements of this test use column totals here)
    P_j_list.append(f_sum_i / n_sum_f)
    if f_sum_i == 0:
        P[i][i] = 1     # state never left: set P[i][i] = 1 so the row still sums to 1
    else:
        for j in range(B_N):
            P[i][j] = f_array[i][j] / f_sum_i
print('Transition probability matrix P_1 for step size 1:\n', P, '\n')
# The raw matrix is hard to read, so also print it as a DataFrame
df_P = pd.DataFrame(P)
print(df_P)
P_j = np.array(P_j_list)    # marginal probability vector
print(P_j)
# "Markov property" test: X^2 = 2 * sum_ij f_ij * |ln(P_ij / P_j[j])|, skipping zero terms
x = 0
for i in range(B_N):
    for j in range(B_N):
        if f_array[i][j] != 0 and P_j[j] != 0:
            x += f_array[i][j] * abs(np.log(P[i][j] / P_j[j]))
X_2 = 2 * x
print('Chi-square statistic X^2:', X_2)

Output:

Borrowing states: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
Count matrix f for step size 1:
 [[ 2.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  2.  0.  0.  0.  0.  0.]
 [ 0.  0. 26.  0. 22.  1.  6. 14.]
 [ 0.  0.  0.  0.  2.  0.  0.  0.]
 [ 0.  0. 20.  6. 12.  3.  1.  0.]
 [ 2.  0. 15.  0.  1.  0.  0.  0.]
 [ 0.  0.  5.  0.  2.  0.  1.  0.]
 [ 0.  0.  4.  0.  0.  0.  0.  0.]]
Total count: 148.0
     0    1     2    3     4    5    6     7
0  2.0  1.0   0.0  0.0   0.0  0.0  0.0   0.0
1  0.0  0.0   2.0  0.0   0.0  0.0  0.0   0.0
2  0.0  0.0  26.0  0.0  22.0  1.0  6.0  14.0
3  0.0  0.0   0.0  0.0   2.0  0.0  0.0   0.0
4  0.0  0.0  20.0  6.0  12.0  3.0  1.0   0.0
5  2.0  0.0  15.0  0.0   1.0  0.0  0.0   0.0
6  0.0  0.0   5.0  0.0   2.0  0.0  1.0   0.0
7  0.0  0.0   4.0  0.0   0.0  0.0  0.0   0.0
Transition probability matrix P_1 for step size 1:
 [[0.66666667 0.33333333 0.         0.         0.         0.
  0.         0.        ]
 [0.         0.         1.         0.         0.         0.
  0.         0.        ]
 [0.         0.         0.37681159 0.         0.31884058 0.01449275
  0.08695652 0.20289855]
 [0.         0.         0.         0.         1.         0.
  0.         0.        ]
 [0.         0.         0.47619048 0.14285714 0.28571429 0.07142857
  0.02380952 0.        ]
 [0.11111111 0.         0.83333333 0.         0.05555556 0.
  0.         0.        ]
 [0.         0.         0.625      0.         0.25       0.
  0.125      0.        ]
 [0.         0.         1.         0.         0.         0.
  0.         0.        ]] 

          0         1         2         3         4         5         6         7
0  0.666667  0.333333  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
1  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2  0.000000  0.000000  0.376812  0.000000  0.318841  0.014493  0.086957  0.202899
3  0.000000  0.000000  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000
4  0.000000  0.000000  0.476190  0.142857  0.285714  0.071429  0.023810  0.000000
5  0.111111  0.000000  0.833333  0.000000  0.055556  0.000000  0.000000  0.000000
6  0.000000  0.000000  0.625000  0.000000  0.250000  0.000000  0.125000  0.000000
7  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000  0.000000  0.000000
[0.02027027 0.01351351 0.46621622 0.01351351 0.28378378 0.12162162
 0.05405405 0.02702703]
卡方分布X^2为: 183.92468307288394

Statistical test of the Markov property
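At significance level α = 0.05, the statistic is approximately chi-square distributed with $(m-1)^2$ degrees of freedom, where $m$ is the number of states. Here $m = 8$ (B, C, D, E, F, H, I, K), so the degrees of freedom are $(8-1)^2 = 49$, and the chi-square table gives the critical value $\chi^2_{0.05}(49) \approx 66.339$.
Since the computed $\chi^2 \approx 183.92$ exceeds $\chi^2_{0.05}(49)$, the state sequence of the borrowing records has the Markov property and can be treated as a Markov chain.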
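The statistic computed above is $\chi^2 = 2\sum_i\sum_j f_{ij}\,\lvert\ln(p_{ij}/p_{\cdot j})\rvert$. In the usual textbook form, the marginal $p_{\cdot j}$ is the column sum $\sum_i f_{ij}$ divided by the total count, whereas the code above uses row sums (`P_j_list`). The sketch below is a vectorized NumPy version written for this comparison (`markov_chi2` is a hypothetical helper name, and the critical value 66.339 is hard-coded from a chi-square table). It computes both variants from the frequency matrix printed above; the `row` variant should reproduce the value above. With this matrix the column-based value is smaller than 183.92 but still far above 66.339, so the conclusion does not change.

```python
import numpy as np

STATES = ["B", "C", "D", "E", "F", "H", "I", "K"]
# Step-1 frequency matrix from the output above (rows: from-state, columns: to-state).
F = np.array([
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 26, 0, 22, 1, 6, 14],
    [0, 0, 0, 0, 2, 0, 0, 0],
    [0, 0, 20, 6, 12, 3, 1, 0],
    [2, 0, 15, 0, 1, 0, 0, 0],
    [0, 0, 5, 0, 2, 0, 1, 0],
    [0, 0, 4, 0, 0, 0, 0, 0],
], dtype=float)


def markov_chi2(f, marginal="column"):
    """X^2 = 2 * sum_ij f_ij * |ln(p_ij / p_j)| over the cells with f_ij > 0.

    marginal="column": p_j = (column sum j) / total, the textbook p_.j.
    marginal="row":    p_j = (row sum j) / total, as computed by the code above.
    """
    total = f.sum()
    row_sums = f.sum(axis=1, keepdims=True)
    # Row-normalised transition probabilities; all-zero rows stay 0 (they have no f_ij > 0).
    p = np.divide(f, row_sums, out=np.zeros_like(f), where=row_sums > 0)
    sums = f.sum(axis=0) if marginal == "column" else f.sum(axis=1)
    p_j = np.broadcast_to(sums / total, f.shape)   # p_j[i, j] = marginal of state j
    mask = (f > 0) & (p_j > 0)                     # skip cells where the log is undefined
    return 2.0 * np.sum(f[mask] * np.abs(np.log(p[mask] / p_j[mask])))


m = len(STATES)
dof = (m - 1) ** 2      # (8 - 1)^2 = 49 degrees of freedom
CRITICAL = 66.339       # chi-square table value for alpha = 0.05, df = 49 (hard-coded)

x2_row = markov_chi2(F, marginal="row")
x2_col = markov_chi2(F, marginal="column")
print(f"df = {dof}, critical value = {CRITICAL}")
print(f"X^2 with row marginals (code above): {x2_row:.4f}")
print(f"X^2 with column marginals (textbook): {x2_col:.4f}")
print("Markov property accepted:", x2_col > CRITICAL)
```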

Improving the DataFrame output of the program above

import pandas as pd
import numpy as np
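# Maximum width (in characters) of printed output; the default of 80 is too narrow for a wide screen, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns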
pd.set_option('display.max_columns', None)

# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','P','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
# The data is hard-coded below instead of being read from the Excel file.
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

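# Group the records by borrowing: {borrowing number (1-based): [categories borrowed that day]}; the last test_n borrowings are dropped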
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    list = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            list.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            list = []
        Book_data_dic[j + 1] = list
    for i in range(test_n):
        Book_data_dic.pop(t_n-i)
    return Book_data_dic

# Flatten the borrowing records of the training set into one list
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n-test_n
    for i in range(n):
        for item in Book_data_dic[i+1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO

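# State space: only the categories that actually appear in the user's training records; categories never borrowed are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate the states
    ITEM_CALLNO.sort()  # sort in place (alphabetical order)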
    news_items = []
    for item in ITEM_CALLNO:
        if item not in news_items:
            news_items.append(item)
    print('借阅状态分类:',news_items)
    return news_items

def Mat_To_Frame(M, Book_category, B_N):
    M = M.tolist()  # convert the array to a nested list so labels can be inserted
    for i in range(B_N):
        M[i].insert(0, Book_category[i])
    Book_category.insert(0, ' ')
    M.insert(0, Book_category)
    df = pd.DataFrame(M)
    if Book_category[0] == ' ':
        Book_category.remove(' ')
    return df


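N = len(ITEM_CALLNO_data)    # number of borrowed books (samples)
t_n = len(time_number)    # number of borrowings (distinct loan dates)
test_n = 3   # number of final borrowings held out as the test set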
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)

f_array = np.zeros((B_N, B_N))    # frequency matrix of one-step transitions
for I in range(t_n-test_n-1):
    for a in Book_data_dic[I+1]:
        i = 0
        for a1 in Book_category:
            if a1 == a:
                for b in Book_data_dic[I+2]:
                    j = 0
                    for b1 in Book_category:
                        if b1 == b:
                            f_array[i][j] += 1
                        else:
                            j += 1
            else:
                i += 1
print(f'步长为1的频数矩阵f:\n',f_array)
n_sum_f = f_array.sum()    # total number of transitions
print('总频数为:',n_sum_f)
# The raw matrix is hard to read, so display it as a labeled table
df_f = Mat_To_Frame(f_array, Book_category, B_N)
print(df_f)

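P = np.zeros((B_N, B_N))  # transition probability matrix
P_j_list = []    # marginal probabilities
for i in range(B_N):
    f_sum_i = sum(f_array[i, :])  # sum of row i of the frequency matrix
    P_j_list.append(f_sum_i/n_sum_f)    # marginal of state i from its row sum (the textbook p_.j uses column sums)
    if f_sum_i == 0:
        P[i][i] = 1     # a state with no observed transitions (all-zero row) gets P[i][i] = 1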
    else:
        for j in range(B_N):
            P[i][j] = f_array[i][j]/f_sum_i
print(f'步长为1的转移概率矩阵P_1:\n', P, '\n')
# The raw matrix is hard to read, so display it as a labeled table
df_P = Mat_To_Frame(P, Book_category, B_N)
print(df_P)
P_j = np.array(P_j_list)    # marginal probability vector
print(P_j)
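# Markov-property test: X^2 = 2 * sum_ij f_ij * |ln(P_ij / P_j[j])|
x = 0
for i in range(B_N):
    for j in range(B_N):
        # Skip empty cells and zero marginals, where the logarithm is undefined
        if f_array[i][j] > 0 and P_j[j] > 0:
            x += f_array[i][j] * abs(np.log(P[i][j] / P_j[j]))
X_2 = 2 * x
print('卡方分布X^2为:', X_2)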

Result:

借阅状态分类: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
步长为1的频数矩阵f:
 [[ 2.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  2.  0.  0.  0.  0.  0.]
 [ 0.  0. 26.  0. 22.  1.  6. 14.]
 [ 0.  0.  0.  0.  2.  0.  0.  0.]
 [ 0.  0. 20.  6. 12.  3.  1.  0.]
 [ 2.  0. 15.  0.  1.  0.  0.  0.]
 [ 0.  0.  5.  0.  2.  0.  1.  0.]
 [ 0.  0.  4.  0.  0.  0.  0.  0.]]
总频数为: 148.0
   0  1  2   3  4   5  6  7   8
0     B  C   D  E   F  H  I   K
1  B  2  1   0  0   0  0  0   0
2  C  0  0   2  0   0  0  0   0
3  D  0  0  26  0  22  1  6  14
4  E  0  0   0  0   2  0  0   0
5  F  0  0  20  6  12  3  1   0
6  H  2  0  15  0   1  0  0   0
7  I  0  0   5  0   2  0  1   0
8  K  0  0   4  0   0  0  0   0
步长为1的转移概率矩阵P_1:
 [[0.66666667 0.33333333 0.         0.         0.         0.
  0.         0.        ]
 [0.         0.         1.         0.         0.         0.
  0.         0.        ]
 [0.         0.         0.37681159 0.         0.31884058 0.01449275
  0.08695652 0.20289855]
 [0.         0.         0.         0.         1.         0.
  0.         0.        ]
 [0.         0.         0.47619048 0.14285714 0.28571429 0.07142857
  0.02380952 0.        ]
 [0.11111111 0.         0.83333333 0.         0.05555556 0.
  0.         0.        ]
 [0.         0.         0.625      0.         0.25       0.
  0.125      0.        ]
 [0.         0.         1.         0.         0.         0.
  0.         0.        ]] 

   0         1         2         3         4          5          6          7         8
0            B         C         D         E          F          H          I         K
1  B  0.666667  0.333333         0         0          0          0          0         0
2  C         0         0         1         0          0          0          0         0
3  D         0         0  0.376812         0   0.318841  0.0144928  0.0869565  0.202899
4  E         0         0         0         0          1          0          0         0
5  F         0         0   0.47619  0.142857   0.285714  0.0714286  0.0238095         0
6  H  0.111111         0  0.833333         0  0.0555556          0          0         0
7  I         0         0     0.625         0       0.25          0      0.125         0
8  K         0         0         1         0          0          0          0         0
[0.02027027 0.01351351 0.46621622 0.01351351 0.28378378 0.12162162
 0.05405405 0.02702703]
卡方分布X^2为: 183.92468307288394
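`Mat_To_Frame` builds the header row by temporarily inserting `' '` into `Book_category` and removing it again, and the counting loop searches the category list for every pair of books. pandas can attach the labels directly through the `index` and `columns` arguments of `pd.DataFrame`, and a category-to-position dict replaces the innermost search loops. A minimal sketch under those assumptions; `transition_counts` and `to_transition_matrix` are hypothetical helper names, the toy borrowings are made up for illustration, and the `step` argument counts transitions between borrowing t and t + step.

```python
import numpy as np
import pandas as pd


def transition_counts(periods, states, step=1):
    """Count a -> b for every book a in borrowing t and book b in borrowing t + step.

    periods: list of lists, one list of category letters per borrowing
    (the same layout as the values of Book_data_dic above).
    Categories not in `states` are ignored.
    """
    index = {s: k for k, s in enumerate(states)}   # category -> row/column position
    f = np.zeros((len(states), len(states)))
    for t in range(len(periods) - step):
        for a in periods[t]:
            for b in periods[t + step]:
                if a in index and b in index:
                    f[index[a], index[b]] += 1
    return pd.DataFrame(f, index=states, columns=states)


def to_transition_matrix(f_df):
    """Row-normalise a labeled frequency table; an all-zero row i gets P[i][i] = 1,
    following the rule used in the code above."""
    f = f_df.to_numpy()
    p = np.zeros_like(f)
    for i, row_sum in enumerate(f.sum(axis=1)):
        if row_sum == 0:
            p[i, i] = 1.0
        else:
            p[i] = f[i] / row_sum
    return pd.DataFrame(p, index=f_df.index, columns=f_df.columns)


# Hypothetical toy data: three borrowings.
toy_periods = [["H", "H"], ["D"], ["D", "F"]]
toy_states = sorted({c for period in toy_periods for c in period})
f_toy = transition_counts(toy_periods, toy_states)
P_toy = to_transition_matrix(f_toy)
print(f_toy)
print(P_toy.round(4))
```

Because the labels live in the index rather than in the data, `P_toy.loc["D", "F"]` reads a single transition probability directly, and `Book_category` is never modified.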

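Markov chain prediction based on the absolute distribution

If $P^{(k)}$ is the state vector of borrowing $k$, the prediction for borrowing $k+1$ is $P^{(k+1)} = P^{(k)} P_1$, where $P_1$ is the one-step transition probability matrix. In the code below, $P^{(k)}$ holds the number of books of each category in the last training borrowing, so the entries of the result are expected numbers of books and sum to that borrowing's total.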

Predicting the 35th borrowing

import pandas as pd
import numpy as np
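# Maximum width (in characters) of printed output; the default of 80 is too narrow for a wide screen, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns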
pd.set_option('display.max_columns', None)

# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','P','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
# The data is hard-coded below instead of being read from the Excel file.
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

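# Group the records by borrowing: {borrowing number (1-based): [categories borrowed that day]}; the last test_n borrowings are dropped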
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    list = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            list.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            list = []
        Book_data_dic[j + 1] = list
    for i in range(test_n):
        Book_data_dic.pop(t_n-i)
    return Book_data_dic

# Flatten the borrowing records of the training set into one list
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n-test_n
    for i in range(n):
        for item in Book_data_dic[i+1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO

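# State space: only the categories that actually appear in the user's training records; categories never borrowed are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate the states
    ITEM_CALLNO.sort()  # sort in place (alphabetical order)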
    news_items = []
    for item in ITEM_CALLNO:
        if item not in news_items:
            news_items.append(item)
    print('借阅状态分类:',news_items)
    return news_items

def Mat_To_Frame(M, Book_category, B_N):
    M = M.tolist()  # convert the array to a nested list so labels can be inserted
    for i in range(B_N):
        M[i].insert(0, Book_category[i])
    Book_category.insert(0, ' ')
    M.insert(0, Book_category)
    df = pd.DataFrame(M)
    if Book_category[0] == ' ':
        Book_category.remove(' ')
    return df


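N = len(ITEM_CALLNO_data)    # number of borrowed books (samples)
t_n = len(time_number)    # number of borrowings (distinct loan dates)
test_n = 3   # number of final borrowings held out as the test set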
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)

f_array = np.zeros((B_N, B_N))    # frequency matrix of one-step transitions
for I in range(t_n-test_n-1):
    for a in Book_data_dic[I+1]:
        i = 0
        for a1 in Book_category:
            if a1 == a:
                for b in Book_data_dic[I+2]:
                    j = 0
                    for b1 in Book_category:
                        if b1 == b:
                            f_array[i][j] += 1
                        else:
                            j += 1
            else:
                i += 1
# print(f'步长为1的频数矩阵f:\n',f_array)
n_sum_f = f_array.sum()    # total number of transitions
print('总频数为:',n_sum_f)
# The raw matrix is hard to read, so display it as a labeled table
df_f = Mat_To_Frame(f_array, Book_category, B_N)
print(f'步长为1的频数矩阵f(数据框展示形式):\n',df_f)

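P = np.zeros((B_N, B_N))  # transition probability matrix
P_j_list = []    # marginal probabilities (not used in the prediction)
for i in range(B_N):
    f_sum_i = sum(f_array[i, :])  # sum of row i of the frequency matrix
    P_j_list.append(f_sum_i/n_sum_f)    # marginal of state i from its row sum
    if f_sum_i == 0:
        P[i][i] = 1     # a state with no observed transitions (all-zero row) gets P[i][i] = 1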
    else:
        for j in range(B_N):
            P[i][j] = f_array[i][j]/f_sum_i
# print(f'步长为1的转移概率矩阵P_1:\n', P, '\n')
# The raw matrix is hard to read, so display it as a labeled table
df_P = Mat_To_Frame(P, Book_category, B_N)
print(f'步长为1的转移概率矩阵P_1(数据框展示形式):\n', df_P, '\n')

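# Markov chain prediction based on the absolute distribution.
# P_0 is the state of the last training borrowing (the 34th: four class-D books).
# np.mat was removed in NumPy 2.0, so use a plain 1-D array and the @ operator.
P_0 = np.array([0, 0, 4, 0, 0, 0, 0, 0])
P_next = P_0 @ P    # expected number of books per category in the next borrowing
P_next_dic = dict(zip(Book_category, P_next.tolist()))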
print(f"第{t_n-test_n+1}的状态:\n",P_next_dic)

Result:

借阅状态分类: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
总频数为: 148.0
步长为1的频数矩阵f(数据框展示形式):
    0  1  2   3  4   5  6  7   8
0     B  C   D  E   F  H  I   K
1  B  2  1   0  0   0  0  0   0
2  C  0  0   2  0   0  0  0   0
3  D  0  0  26  0  22  1  6  14
4  E  0  0   0  0   2  0  0   0
5  F  0  0  20  6  12  3  1   0
6  H  2  0  15  0   1  0  0   0
7  I  0  0   5  0   2  0  1   0
8  K  0  0   4  0   0  0  0   0
步长为1的转移概率矩阵P_1(数据框展示形式):
    0         1         2         3         4          5          6          7         8
0            B         C         D         E          F          H          I         K
1  B  0.666667  0.333333         0         0          0          0          0         0
2  C         0         0         1         0          0          0          0         0
3  D         0         0  0.376812         0   0.318841  0.0144928  0.0869565  0.202899
4  E         0         0         0         0          1          0          0         0
5  F         0         0   0.47619  0.142857   0.285714  0.0714286  0.0238095         0
6  H  0.111111         0  0.833333         0  0.0555556          0          0         0
7  I         0         0     0.625         0       0.25          0      0.125         0
8  K         0         0         1         0          0          0          0         0

第35的状态:
 {'B': 0.0, 'C': 0.0, 'D': 1.5072463768115942, 'E': 0.0, 'F': 1.2753623188405796, 'H': 0.057971014492753624, 'I': 0.34782608695652173, 'K': 0.8115942028985508}

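Using the first 34 borrowings to predict the 35th, class D has the largest expected value (about 1.51 of the 4 books).
In reality, the 35th borrowing consisted of 3 class-D books.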
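The prediction above hard-codes `P_0` and reads the result as expected counts. The following is a minimal sketch of the same calculation, assuming the state vector is built directly from the categories of the last training borrowing and the result is additionally normalised to probabilities (normalising does not change which category ranks first). `predict_next` is a hypothetical helper name; the matrix is the step-1 frequency matrix printed above for the first 34 borrowings.

```python
import numpy as np

STATES = ["B", "C", "D", "E", "F", "H", "I", "K"]
# Step-1 frequency matrix for the first 34 borrowings (copied from the output above).
F = np.array([
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 26, 0, 22, 1, 6, 14],
    [0, 0, 0, 0, 2, 0, 0, 0],
    [0, 0, 20, 6, 12, 3, 1, 0],
    [2, 0, 15, 0, 1, 0, 0, 0],
    [0, 0, 5, 0, 2, 0, 1, 0],
    [0, 0, 4, 0, 0, 0, 0, 0],
], dtype=float)
# Every row of this matrix has at least one transition, so plain row normalisation is safe.
P = F / F.sum(axis=1, keepdims=True)


def predict_next(last_period, P, states):
    """Absolute-distribution prediction for the borrowing after `last_period`.

    last_period: category letters borrowed in the last known borrowing.
    Returns (expected_counts, probabilities), both dicts keyed by category.
    """
    index = {s: k for k, s in enumerate(states)}
    x0 = np.zeros(len(states))
    for c in last_period:
        if c in index:            # categories outside the state space are ignored
            x0[index[c]] += 1
    if x0.sum() == 0:
        raise ValueError("last borrowing contains no known category")
    counts = x0 @ P                  # expected number of books per category
    probs = counts / counts.sum()    # same ranking, rescaled to sum to 1
    return dict(zip(states, counts.tolist())), dict(zip(states, probs.tolist()))


# State of the 34th borrowing: four class-D books.
expected, probs = predict_next(["D", "D", "D", "D"], P, STATES)
best = max(probs, key=probs.get)
print(best, round(expected[best], 4), round(probs[best], 4))
```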

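Predicting the 36th borrowing
Change test_n from 3 to 2, and set P_0 to the state of the 35th borrowing (3 class-D books):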

import pandas as pd
import numpy as np
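# Maximum width (in characters) of printed output; the default of 80 is too narrow for a wide screen, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns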
pd.set_option('display.max_columns', None)

# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','P','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
# The data is hard-coded below instead of being read from the Excel file.
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

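# Group the records by borrowing: {borrowing number (1-based): [categories borrowed that day]}; the last test_n borrowings are dropped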
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    list = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            list.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            list = []
        Book_data_dic[j + 1] = list
    for i in range(test_n):
        Book_data_dic.pop(t_n-i)
    return Book_data_dic

# Flatten the borrowing records of the training set into one list
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n-test_n
    for i in range(n):
        for item in Book_data_dic[i+1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO

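# State space: only the categories that actually appear in the user's training records; categories never borrowed are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate the states
    ITEM_CALLNO.sort()  # sort in place (alphabetical order)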
    news_items = []
    for item in ITEM_CALLNO:
        if item not in news_items:
            news_items.append(item)
    print('借阅状态分类:',news_items)
    return news_items

def Mat_To_Frame(M, Book_category, B_N):
    M = M.tolist()  # convert the array to a nested list so labels can be inserted
    for i in range(B_N):
        M[i].insert(0, Book_category[i])
    Book_category.insert(0, ' ')
    M.insert(0, Book_category)
    df = pd.DataFrame(M)
    if Book_category[0] == ' ':
        Book_category.remove(' ')
    return df


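N = len(ITEM_CALLNO_data)    # number of borrowed books (samples)
t_n = len(time_number)    # number of borrowings (distinct loan dates)
test_n = 2   # number of final borrowings held out as the test set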
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)

f_array = np.zeros((B_N, B_N))    # frequency matrix of one-step transitions
for I in range(t_n-test_n-1):
    for a in Book_data_dic[I+1]:
        i = 0
        for a1 in Book_category:
            if a1 == a:
                for b in Book_data_dic[I+2]:
                    j = 0
                    for b1 in Book_category:
                        if b1 == b:
                            f_array[i][j] += 1
                        else:
                            j += 1
            else:
                i += 1
# print(f'步长为1的频数矩阵f:\n',f_array)
n_sum_f = f_array.sum()    # total number of transitions
print('总频数为:',n_sum_f)
# The raw matrix is hard to read, so display it as a labeled table
df_f = Mat_To_Frame(f_array, Book_category, B_N)
print(f'步长为1的频数矩阵f(数据框展示形式):\n',df_f)

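P = np.zeros((B_N, B_N))  # transition probability matrix
P_j_list = []    # marginal probabilities (not used in the prediction)
for i in range(B_N):
    f_sum_i = sum(f_array[i, :])  # sum of row i of the frequency matrix
    P_j_list.append(f_sum_i/n_sum_f)    # marginal of state i from its row sum
    if f_sum_i == 0:
        P[i][i] = 1     # a state with no observed transitions (all-zero row) gets P[i][i] = 1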
    else:
        for j in range(B_N):
            P[i][j] = f_array[i][j]/f_sum_i
# print(f'步长为1的转移概率矩阵P_1:\n', P, '\n')
# The raw matrix is hard to read, so display it as a labeled table
df_P = Mat_To_Frame(P, Book_category, B_N)
print(f'步长为1的转移概率矩阵P_1(数据框展示形式):\n', df_P, '\n')

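# Markov chain prediction based on the absolute distribution.
# P_0 is the state of the last training borrowing (the 35th: three class-D books).
# np.mat was removed in NumPy 2.0, so use a plain 1-D array and the @ operator.
P_0 = np.array([0, 0, 3, 0, 0, 0, 0, 0])
P_next = P_0 @ P    # expected number of books per category in the next borrowing
P_next_dic = dict(zip(Book_category, P_next.tolist()))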
print(f"第{t_n-test_n+1}的状态:\n",P_next_dic)

Result:

借阅状态分类: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
总频数为: 160.0
步长为1的频数矩阵f(数据框展示形式):
    0  1  2   3  4   5  6  7   8
0     B  C   D  E   F  H  I   K
1  B  2  1   0  0   0  0  0   0
2  C  0  0   2  0   0  0  0   0
3  D  0  0  38  0  22  1  6  14
4  E  0  0   0  0   2  0  0   0
5  F  0  0  20  6  12  3  1   0
6  H  2  0  15  0   1  0  0   0
7  I  0  0   5  0   2  0  1   0
8  K  0  0   4  0   0  0  0   0
步长为1的转移概率矩阵P_1(数据框展示形式):
    0         1         2         3         4          5          6          7        8
0            B         C         D         E          F          H          I        K
1  B  0.666667  0.333333         0         0          0          0          0        0
2  C         0         0         1         0          0          0          0        0
3  D         0         0  0.469136         0   0.271605  0.0123457  0.0740741  0.17284
4  E         0         0         0         0          1          0          0        0
5  F         0         0   0.47619  0.142857   0.285714  0.0714286  0.0238095        0
6  H  0.111111         0  0.833333         0  0.0555556          0          0        0
7  I         0         0     0.625         0       0.25          0      0.125        0
8  K         0         0         1         0          0          0          0        0

第36的状态:
 {'B': 0.0, 'C': 0.0, 'D': 1.4074074074074074, 'E': 0.0, 'F': 0.8148148148148148, 'H': 0.037037037037037035, 'I': 0.2222222222222222, 'K': 0.5185185185185185}

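Using the first 35 borrowings to predict the 36th, class D again has the largest expected value (about 1.41 of the 3 books).
In reality, the 36th borrowing consisted of 5 class-D books.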

Predicting the 37th borrowing
Change test_n to 1, and set P_0 to the state of the 36th borrowing (5 class-D books):

import pandas as pd
import numpy as np
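# Maximum width (in characters) of printed output; the default of 80 is too narrow for a wide screen, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns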
pd.set_option('display.max_columns', None)

# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','P','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
# The data is hard-coded below instead of being read from the Excel file.
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

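# Group the records by borrowing: {borrowing number (1-based): [categories borrowed that day]}; the last test_n borrowings are dropped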
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    list = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            list.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            list = []
        Book_data_dic[j + 1] = list
    for i in range(test_n):
        Book_data_dic.pop(t_n-i)
    return Book_data_dic

# Flatten the borrowing records of the training set into one list
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n-test_n
    for i in range(n):
        for item in Book_data_dic[i+1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO

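# State space: only the categories that actually appear in the user's training records; categories never borrowed are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate the states
    ITEM_CALLNO.sort()  # sort in place (alphabetical order)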
    news_items = []
    for item in ITEM_CALLNO:
        if item not in news_items:
            news_items.append(item)
    print('借阅状态分类:',news_items)
    return news_items

def Mat_To_Frame(M, Book_category, B_N):
    M = M.tolist()  # convert the array to a nested list so labels can be inserted
    for i in range(B_N):
        M[i].insert(0, Book_category[i])
    Book_category.insert(0, ' ')
    M.insert(0, Book_category)
    df = pd.DataFrame(M)
    if Book_category[0] == ' ':
        Book_category.remove(' ')
    return df


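N = len(ITEM_CALLNO_data)    # number of borrowed books (samples)
t_n = len(time_number)    # number of borrowings (distinct loan dates)
test_n = 1   # number of final borrowings held out as the test set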
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)

f_array = np.zeros((B_N, B_N))    # frequency matrix of one-step transitions
for I in range(t_n-test_n-1):
    for a in Book_data_dic[I+1]:
        i = 0
        for a1 in Book_category:
            if a1 == a:
                for b in Book_data_dic[I+2]:
                    j = 0
                    for b1 in Book_category:
                        if b1 == b:
                            f_array[i][j] += 1
                        else:
                            j += 1
            else:
                i += 1
# print(f'步长为1的频数矩阵f:\n',f_array)
n_sum_f = f_array.sum()    # total number of transitions
print('总频数为:',n_sum_f)
# The raw matrix is hard to read, so display it as a labeled table
df_f = Mat_To_Frame(f_array, Book_category, B_N)
print(f'步长为1的频数矩阵f(数据框展示形式):\n',df_f)

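P = np.zeros((B_N, B_N))  # transition probability matrix
P_j_list = []    # marginal probabilities (not used in the prediction)
for i in range(B_N):
    f_sum_i = sum(f_array[i, :])  # sum of row i of the frequency matrix
    P_j_list.append(f_sum_i/n_sum_f)    # marginal of state i from its row sum
    if f_sum_i == 0:
        P[i][i] = 1     # a state with no observed transitions (all-zero row) gets P[i][i] = 1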
    else:
        for j in range(B_N):
            P[i][j] = f_array[i][j]/f_sum_i
# print(f'步长为1的转移概率矩阵P_1:\n', P, '\n')
# The raw matrix is hard to read, so display it as a labeled table
df_P = Mat_To_Frame(P, Book_category, B_N)
print(f'步长为1的转移概率矩阵P_1(数据框展示形式):\n', df_P, '\n')

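# Markov chain prediction based on the absolute distribution.
# P_0 is the state of the last training borrowing (the 36th: five class-D books).
# np.mat was removed in NumPy 2.0, so use a plain 1-D array and the @ operator.
P_0 = np.array([0, 0, 5, 0, 0, 0, 0, 0])
P_next = P_0 @ P    # expected number of books per category in the next borrowing
P_next_dic = dict(zip(Book_category, P_next.tolist()))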
print(f"第{t_n-test_n+1}的状态:\n",P_next_dic)

Result:

借阅状态分类: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
总频数为: 175.0
步长为1的频数矩阵f(数据框展示形式):
    0  1  2   3  4   5  6  7   8
0     B  C   D  E   F  H  I   K
1  B  2  1   0  0   0  0  0   0
2  C  0  0   2  0   0  0  0   0
3  D  0  0  53  0  22  1  6  14
4  E  0  0   0  0   2  0  0   0
5  F  0  0  20  6  12  3  1   0
6  H  2  0  15  0   1  0  0   0
7  I  0  0   5  0   2  0  1   0
8  K  0  0   4  0   0  0  0   0
步长为1的转移概率矩阵P_1(数据框展示形式):
    0         1         2         3         4          5          6          7         8
0            B         C         D         E          F          H          I         K
1  B  0.666667  0.333333         0         0          0          0          0         0
2  C         0         0         1         0          0          0          0         0
3  D         0         0  0.552083         0   0.229167  0.0104167     0.0625  0.145833
4  E         0         0         0         0          1          0          0         0
5  F         0         0   0.47619  0.142857   0.285714  0.0714286  0.0238095         0
6  H  0.111111         0  0.833333         0  0.0555556          0          0         0
7  I         0         0     0.625         0       0.25          0      0.125         0
8  K         0         0         1         0          0          0          0         0

第37的状态:
{'B': 0.0, 'C': 0.0, 'D': 2.760416666666667, 'E': 0.0, 'F': 1.1458333333333333, 'H': 0.05208333333333333, 'I': 0.3125, 'K': 0.7291666666666667}

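Using the first 36 borrowings to predict the 37th, class D has the largest expected value (about 2.76 of the 5 books).
In reality, the 37th borrowing consisted of 1 class-D book and 1 class-J book.
The J book was missed because no class-J book appears anywhere in the first 36 borrowings: J is borrowed for the first time in the 37th. The state space contains only the categories seen in the training data, so J is not a state of the chain and receives probability 0. Such a first-time ("sudden") borrowing cannot be predicted by this model and lowers its accuracy.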
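The three runs above differ only in `test_n` and `P_0`. The sketch below performs the same rolling check in one loop and also reports categories of the predicted borrowing that never occur in the training data, which is exactly the J case just discussed. It assumes the records are regrouped by loan date into `PERIODS` (one list per borrowing, transcribed from `LOAN_DATE` and `ITEM_CALLNO_data` above); `fit_transition_matrix` and `backtest` are hypothetical helper names that follow the counting and all-zero-row rules of the code above.

```python
import numpy as np

# Borrowings 1-37 regrouped by loan date (transcribed from LOAN_DATE / ITEM_CALLNO_data).
PERIODS = [
    ["H"] * 5, ["D"], ["F"] * 3, ["D", "E", "E", "D", "D", "D", "F"], ["F"],  # 1-5
    ["H"], ["B", "B"], ["B"], ["C"], ["D", "D"],                            # 6-10
    ["I"], ["F", "I"], ["F", "D"], ["I", "F", "F"], ["D"],                  # 11-15
    ["F", "F"], ["H"], ["F"], ["F", "F"], ["F", "D"],                       # 16-20
    ["F"], ["D"], ["D", "D"], ["K", "K"], ["D"],                            # 21-25
    ["I", "I", "I"], ["D"], ["H"], ["D"] * 10, ["F", "K", "D"],             # 26-30
    ["D", "D"], ["D"], ["D", "D"], ["D"] * 4, ["D"] * 3,                    # 31-35
    ["D"] * 5, ["J", "D"],                                                  # 36-37
]


def fit_transition_matrix(train):
    """States = categories seen in the training borrowings (sorted), as in the code above."""
    states = sorted({c for period in train for c in period})
    index = {s: k for k, s in enumerate(states)}
    f = np.zeros((len(states), len(states)))
    for prev, nxt in zip(train[:-1], train[1:]):
        for a in prev:
            for b in nxt:
                f[index[a], index[b]] += 1
    P = np.zeros_like(f)
    for i, row_sum in enumerate(f.sum(axis=1)):
        if row_sum == 0:
            P[i, i] = 1.0          # no observed transitions: stay in the same state
        else:
            P[i] = f[i] / row_sum
    return states, P


def backtest(periods, test_n):
    """Train on all but the last test_n borrowings and predict the next one."""
    train = periods[: len(periods) - test_n]
    states, P = fit_transition_matrix(train)
    x0 = np.array([train[-1].count(s) for s in states], dtype=float)
    expected = dict(zip(states, (x0 @ P).tolist()))
    actual = periods[len(train)]                 # the borrowing being predicted
    unseen = sorted(set(actual) - set(states))   # categories the model cannot predict
    return expected, actual, unseen


results = {}
for test_n in (3, 2, 1):
    expected, actual, unseen = backtest(PERIODS, test_n)
    number = len(PERIODS) - test_n + 1
    best = max(expected, key=expected.get)
    results[number] = (best, expected[best], actual, unseen)
    print(f"borrowing {number}: predicted {best} ({expected[best]:.4f} books), "
          f"actual {actual}, never seen in training: {unseen}")
```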

3. Computing the n-step transition probability matrices (step lengths 1 to 5)

import pandas as pd
import numpy as np
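# Maximum width (in characters) of printed output; the default of 80 is too narrow for a wide screen, 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns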
pd.set_option('display.max_columns', None)

# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','P','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
# The data is hard-coded below instead of being read from the Excel file.
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

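# Group the records by borrowing: {borrowing number (1-based): [categories borrowed that day]}; the last test_n borrowings are dropped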
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    list = []
    for I in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            list.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            list = []
        Book_data_dic[j + 1] = list
    for i in range(test_n):
        Book_data_dic.pop(t_n-i)
    return Book_data_dic

# Flatten the borrowing records of the training set into one list
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n-test_n
    for i in range(n):
        for item in Book_data_dic[i+1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO

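# State space: only the categories that actually appear in the user's training records; categories never borrowed are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate the states
    ITEM_CALLNO.sort()  # sort in place (alphabetical order)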
    news_items = []
    for item in ITEM_CALLNO:
        if item not in news_items:
            news_items.append(item)
    print('借阅状态分类:',news_items)
    return news_items

def Mat_To_Frame(M, Book_category, B_N):
    M = M.tolist()  # 矩阵转换为列表
    for i in range(B_N):
        M[i].insert(0, Book_category[i])
    Book_category.insert(0, ' ')
    M.insert(0, Book_category)
    df = pd.DataFrame(M)
    if Book_category[0] == ' ':
        Book_category.remove(' ')
    return df


N = len(ITEM_CALLNO_data)    # number of borrowing records
t_n = len(time_number)       # number of distinct borrowing dates
test_n = 3                   # number of final dates held out as the test set
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)
state_index = {c: k for k, c in enumerate(Book_category)}  # category -> row/column index

n_list = [1, 2, 3, 4, 5]    # step lengths n to evaluate
for n in n_list:
    print(f'Step {n}:')
    f_array = np.zeros((B_N, B_N))    # frequency (count) matrix
    # Pair every book borrowed on date I+1 with every book borrowed on date I+1+n
    for I in range(t_n - test_n - n):
        for a in Book_data_dic[I + 1]:
            for b in Book_data_dic[I + 1 + n]:
                f_array[state_index[a], state_index[b]] += 1
    # print(f'Frequency matrix f for step {n}:\n', f_array)
    n_sum_f = f_array.sum()    # total count
    print('Total count:', n_sum_f)
    # The raw matrix is hard to read, so print it as a labelled table instead
    df_f = Mat_To_Frame(f_array, Book_category, B_N)
    print(f'Frequency matrix f for step {n} (table form):\n', df_f)

    P = np.zeros((B_N, B_N))  # transition probability matrix
    P_j_list = []    # marginal probability of each state
    for i in range(B_N):
        f_sum_i = f_array[i, :].sum()  # row sum of the frequency matrix
        P_j_list.append(f_sum_i / n_sum_f)    # marginal probability of state i
        if f_sum_i == 0:
            P[i][i] = 1     # an all-zero row is replaced by a self-loop: P[i][i] = 1
        else:
            for j in range(B_N):
                P[i][j] = f_array[i][j] / f_sum_i
    # print(f'Transition probability matrix P_{n} for step {n}:\n', P, '\n')
    # The raw matrix is hard to read, so print it as a labelled table instead
    df_P = Mat_To_Frame(P, Book_category, B_N)
    print(f'Transition probability matrix P_{n} for step {n} (table form):\n', df_P, '\n')

Results:

Borrowing states: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
Step 1:
Total count: 148.0
Frequency matrix f for step 1 (table form):
    0  1  2   3  4   5  6  7   8
0     B  C   D  E   F  H  I   K
1  B  2  1   0  0   0  0  0   0
2  C  0  0   2  0   0  0  0   0
3  D  0  0  26  0  22  1  6  14
4  E  0  0   0  0   2  0  0   0
5  F  0  0  20  6  12  3  1   0
6  H  2  0  15  0   1  0  0   0
7  I  0  0   5  0   2  0  1   0
8  K  0  0   4  0   0  0  0   0
Transition probability matrix P_1 for step 1 (table form):
    0         1         2         3         4          5          6          7         8
0            B         C         D         E          F          H          I         K
1  B  0.666667  0.333333         0         0          0          0          0         0
2  C         0         0         1         0          0          0          0         0
3  D         0         0  0.376812         0   0.318841  0.0144928  0.0869565  0.202899
4  E         0         0         0         0          1          0          0         0
5  F         0         0   0.47619  0.142857   0.285714  0.0714286  0.0238095         0
6  H  0.111111         0  0.833333         0  0.0555556          0          0         0
7  I         0         0     0.625         0       0.25          0      0.125         0
8  K         0         0         1         0          0          0          0         0 

Step 2:
Total count: 131.0
Frequency matrix f for step 2 (table form):
    0  1  2   3  4   5  6  7  8
0     B  C   D  E   F  H  I  K
1  B  0  2   2  0   0  0  0  0
2  C  0  0   0  0   0  0  1  0
3  D  0  0  48  2   3  5  2  2
4  E  0  0   0  0   0  2  0  0
5  F  2  0   6  0  14  1  1  0
6  H  1  0   1  0  18  0  0  1
7  I  0  0   1  0   5  3  1  0
8  K  0  0   1  0   0  0  6  0
Transition probability matrix P_2 for step 2 (table form):
    0          1    2         3          4          5          6          7          8
0             B    C         D          E          F          H          I          K
1  B          0  0.5       0.5          0          0          0          0          0
2  C          0    0         0          0          0          0          1          0
3  D          0    0  0.774194  0.0322581  0.0483871  0.0806452  0.0322581  0.0322581
4  E          0    0         0          0          0          1          0          0
5  F  0.0833333    0      0.25          0   0.583333  0.0416667  0.0416667          0
6  H   0.047619    0  0.047619          0   0.857143          0          0   0.047619
7  I          0    0       0.1          0        0.5        0.3        0.1          0
8  K          0    0  0.142857          0          0          0   0.857143          0 

Step 3:
Total count: 163.0
Frequency matrix f for step 3 (table form):
    0  1  2   3   4  5  6  7  8
0     B  C   D   E  F  H  I  K
1  B  0  0   4   0  0  0  1  0
2  C  0  0   0   0  1  0  1  0
3  D  8  0  26   0  7  1  6  1
4  E  4  0   0   0  0  0  0  0
5  F  3  0   7   0  7  5  0  2
6  H  0  1  23  10  6  0  0  0
7  I  0  0  31   0  2  1  1  0
8  K  0  0   4   0  0  0  0  0
Transition probability matrix P_3 for step 3 (table form):
    0         1      2         3     4          5          6          7          8
0            B      C         D     E          F          H          I          K
1  B         0      0       0.8     0          0          0        0.2          0
2  C         0      0         0     0        0.5          0        0.5          0
3  D  0.163265      0  0.530612     0   0.142857  0.0204082   0.122449  0.0204082
4  E         1      0         0     0          0          0          0          0
5  F     0.125      0  0.291667     0   0.291667   0.208333          0  0.0833333
6  H         0  0.025     0.575  0.25       0.15          0          0          0
7  I         0      0  0.885714     0  0.0571429  0.0285714  0.0285714          0
8  K         0      0         1     0          0          0          0          0 

Step 4:
Total count: 122.0
Frequency matrix f for step 4 (table form):
    0  1  2   3  4  5  6  7  8
0     B  C   D  E  F  H  I  K
1  B  0  0   0  0  1  0  3  0
2  C  0  0   1  0  1  0  0  0
3  D  4  0  38  0  6  2  5  2
4  E  2  0   0  0  0  0  0  0
5  F  7  1  12  0  6  1  0  2
6  H  0  0   3  0  6  0  0  0
7  I  0  0   4  0  6  0  0  3
8  K  0  0   4  0  0  2  0  0
Transition probability matrix P_4 for step 4 (table form):
    0          1          2         3  4         5          6          7          8
0             B          C         D  E         F          H          I          K
1  B          0          0         0  0      0.25          0       0.75          0
2  C          0          0       0.5  0       0.5          0          0          0
3  D  0.0701754          0  0.666667  0  0.105263  0.0350877  0.0877193  0.0350877
4  E          1          0         0  0         0          0          0          0
5  F   0.241379  0.0344828  0.413793  0  0.206897  0.0344828          0  0.0689655
6  H          0          0  0.333333  0  0.666667          0          0          0
7  I          0          0  0.307692  0  0.461538          0          0   0.230769
8  K          0          0  0.666667  0         0   0.333333          0          0 

Step 5:
Total count: 134.0
Frequency matrix f for step 5 (table form):
    0  1  2   3  4  5  6  7  8
0     B  C   D  E  F  H  I  K
1  B  0  0   1  0  3  0  2  0
2  C  0  0   0  0  2  0  1  0
3  D  2  4  47  0  3  2  0  1
4  E  0  2   0  0  0  0  0  0
5  F  3  1   5  0  7  1  3  4
6  H  0  0   3  0  0  5  1  0
7  I  0  0   6  0  4  1  0  0
8  K  0  0  20  0  0  0  0  0
Transition probability matrix P_5 for step 5 (table form):
    0          1          2         3  4          5          6         7          8
0             B          C         D  E          F          H         I          K
1  B          0          0  0.166667  0        0.5          0  0.333333          0
2  C          0          0         0  0   0.666667          0  0.333333          0
3  D  0.0338983  0.0677966   0.79661  0  0.0508475  0.0338983         0  0.0169492
4  E          0          1         0  0          0          0         0          0
5  F      0.125  0.0416667  0.208333  0   0.291667  0.0416667     0.125   0.166667
6  H          0          0  0.333333  0          0   0.555556  0.111111          0
7  I          0          0  0.545455  0   0.363636  0.0909091         0          0
8  K          0          0         1  0          0          0         0          0 
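The date grouping and the nested counting loops in the script above can be written more compactly. The sketch below is a minimal, hypothetical rewrite (the helper names `group_by_date`, `n_step_counts` and `to_transition` are illustrative, not from the original) that keeps the script's conventions: every book borrowed on date t is paired with every book borrowed on date t + n, and an all-zero row is replaced by a self-loop with probability 1. It assumes the records arrive as parallel lists of dates and call-number classes, already in chronological order.

```python
import numpy as np
import pandas as pd


def group_by_date(loan_dates, call_classes):
    """Return one list of call-number classes per borrowing date, in order of first appearance."""
    df = pd.DataFrame({"LOAN_DATE": loan_dates, "ITEM_CALLNO": call_classes})
    # sort=False keeps the chronological order of the records instead of sorting the date strings
    return df.groupby("LOAN_DATE", sort=False)["ITEM_CALLNO"].apply(list).tolist()


def n_step_counts(days, states, n):
    """Count every pair (a on day t, b on day t + n), the same cross-product rule as the loops above."""
    index = {s: k for k, s in enumerate(states)}
    f = np.zeros((len(states), len(states)))
    for t in range(len(days) - n):
        for a in days[t]:
            for b in days[t + n]:
                f[index[a], index[b]] += 1
    return f


def to_transition(f):
    """Row-normalise a count matrix; an all-zero row becomes a self-loop with probability 1."""
    P = np.zeros_like(f, dtype=float)
    for i, row_sum in enumerate(f.sum(axis=1)):
        if row_sum == 0:
            P[i, i] = 1.0
        else:
            P[i] = f[i] / row_sum
    return P


# Tiny hypothetical example: three dates, two states
days = group_by_date(["d1", "d1", "d2", "d3"], ["D", "F", "D", "F"])
states = sorted({c for day in days for c in day})
P1 = to_transition(n_step_counts(days, states, 1))
print(days)
print(P1)
```

To mirror the script, pass only the training dates (for example `days[:-test_n]`) and derive `states` from them. Unlike `Book_list_to_dic`, the dates here are indexed from 0 in a plain list.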

4. Computing the correlation coefficient (weight) of each order
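The post does not spell out this step. In weighted Markov chain prediction, a common choice is to derive the weight of step length k from the lag-k autocorrelation coefficient of a numeric series x_1, ..., x_n: r_k = Σ_{t=1}^{n-k} (x_t - x̄)(x_{t+k} - x̄) / Σ_{t=1}^{n} (x_t - x̄)², with normalised weights w_k = |r_k| / Σ_j |r_j|. The sketch below assumes that formula and a hypothetical input series (for instance the number of books borrowed on each date); neither is confirmed by the original text, and the function name `autocorrelation_weights` is illustrative.

```python
import numpy as np


def autocorrelation_weights(x, max_order):
    """Lag-k autocorrelation r_k for k = 1..max_order and weights w_k = |r_k| / sum_j |r_j|.

    Assumes a non-constant series (otherwise the denominator is zero) and max_order < len(x).
    """
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev ** 2)
    # dev[:-k] holds t = 1..n-k and dev[k:] holds t = k+1..n
    r = np.array([np.sum(dev[:-k] * dev[k:]) / denom for k in range(1, max_order + 1)])
    w = np.abs(r) / np.abs(r).sum()
    return r, w


r, w = autocorrelation_weights([1, 2, 3, 4, 5], max_order=2)
print(r, w)
```

With the five step lengths used above, the call would be `autocorrelation_weights(series, max_order=5)`, and w_k would weight the prediction made from P_k. Taking absolute values means a strongly negative correlation still earns a large weight; whether that suits this data is a modelling choice.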

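5. Computing the stationary distribution

The calculation below uses the first 34 of the 37 borrowing dates as the training data, so the test-set size test_n is 3.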
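Written as equations, the stationary distribution is the row vector π over the eight training states that the step-1 transition matrix P_1 leaves unchanged; in the script below its components π_B, ..., π_K are the sympy symbols xB, ..., xK.

```latex
\pi P_1 = \pi, \qquad
\sum_{s \in \{B, C, D, E, F, H, I, K\}} \pi_s = 1, \qquad
\pi_s \ge 0, \qquad
\pi = (\pi_B, \pi_C, \pi_D, \pi_E, \pi_F, \pi_H, \pi_I, \pi_K).
```

Because every row of P_1 sums to 1, the eight equations in πP_1 = π are linearly dependent, so the normalisation Σπ_s = 1 is what pins down a single solution. For the P_1 printed above, every state can reach D and D can reach every state, so the chain is irreducible and that solution is unique.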

import pandas as pd
import numpy as np
# Maximum display width in characters; the default of 80 is too narrow for a wide screen (200 is a common choice)
pd.set_option('display.width', 500)
# Show all columns
pd.set_option('display.max_columns', None)

# # Book categories
# Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO_data = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

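# Group the flat borrowing records by date: key = date index (1-based), value = list of call-number classes
def Book_list_to_dic(t_n, N, test_n):
    Book_data_dic = {}
    i, j = 0, 0
    day_items = []
    # Each step either consumes one record (i) or advances to the next date (j): N + t_n - 1 steps in total
    for _ in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            day_items.append(ITEM_CALLNO_data[i])
            i += 1
        else:
            j += 1
            day_items = []
        Book_data_dic[j + 1] = day_items
    # Drop the last test_n dates; they form the test set
    for i in range(test_n):
        Book_data_dic.pop(t_n - i)
    return Book_data_dic

# Borrowing records of the training set, flattened into one list
def ITEM_CALLNO_Train(Book_data_dic, t_n, test_n):
    ITEM_CALLNO = []
    n = t_n - test_n
    for i in range(n):
        for item in Book_data_dic[i + 1]:
            ITEM_CALLNO.append(item)
    return ITEM_CALLNO

# States: only the categories that actually occur in the user's training records; unseen categories are excluded
def Book_Category(ITEM_CALLNO):
    # Deduplicate and sort the states
    news_items = sorted(set(ITEM_CALLNO))
    print('Borrowing states:', news_items)
    return news_items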

N = len(ITEM_CALLNO_data)    # number of borrowing records
t_n = len(time_number)       # number of distinct borrowing dates
test_n = 3                   # number of final dates held out as the test set
Book_data_dic = Book_list_to_dic(t_n, N, test_n)
ITEM_CALLNO = ITEM_CALLNO_Train(Book_data_dic, t_n, test_n)
Book_category = Book_Category(ITEM_CALLNO)
B_N = len(Book_category)

state_index = {c: k for k, c in enumerate(Book_category)}  # category -> row/column index
f_array = np.zeros((B_N, B_N))    # frequency (count) matrix for step 1
# Pair every book borrowed on date I+1 with every book borrowed on date I+2
for I in range(t_n - test_n - 1):
    for a in Book_data_dic[I + 1]:
        for b in Book_data_dic[I + 2]:
            f_array[state_index[a], state_index[b]] += 1
print('Frequency matrix f for step 1:\n', f_array)
n_sum_f = f_array.sum()    # total count
print('Total count:', n_sum_f)
# The raw matrix is hard to read, so also print it as a DataFrame
df_f = pd.DataFrame(f_array)
print(df_f)

P = np.zeros((B_N, B_N))  # transition probability matrix
P_j_list = []    # marginal probability of each state
for i in range(B_N):
    f_sum_i = f_array[i, :].sum()  # row sum of the frequency matrix
    P_j_list.append(f_sum_i / n_sum_f)    # marginal probability of state i
    if f_sum_i == 0:
        P[i][i] = 1     # an all-zero row is replaced by a self-loop: P[i][i] = 1
    else:
        for j in range(B_N):
            P[i][j] = f_array[i][j] / f_sum_i
print('Transition probability matrix P_1 for step 1:\n', P, '\n')
# The raw matrix is hard to read, so also print it as a DataFrame
df_P = pd.DataFrame(P)
print(df_P)

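# Solve for the stationary distribution: pi * P = pi with sum(pi) = 1
from sympy import symbols, Matrix, solve
# One symbol per state, in the order of Book_category (B, C, D, E, F, H, I, K)
xB, xC, xD, xE, xF, xH, xI, xK = symbols('xB, xC, xD, xE, xF, xH, xI, xK')
norm_eq = xB + xC + xD + xE + xF + xH + xI + xK - 1   # probabilities sum to 1
S = Matrix([[xB, xC, xD, xE, xF, xH, xI, xK]])        # row vector pi
X = solve([S * Matrix(P) - S, norm_eq], [xB, xC, xD, xE, xF, xH, xI, xK])
print('Stationary distribution π:\n', X)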

Results:

Borrowing states: ['B', 'C', 'D', 'E', 'F', 'H', 'I', 'K']
Frequency matrix f for step 1:
 [[ 2.  1.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  2.  0.  0.  0.  0.  0.]
 [ 0.  0. 26.  0. 22.  1.  6. 14.]
 [ 0.  0.  0.  0.  2.  0.  0.  0.]
 [ 0.  0. 20.  6. 12.  3.  1.  0.]
 [ 2.  0. 15.  0.  1.  0.  0.  0.]
 [ 0.  0.  5.  0.  2.  0.  1.  0.]
 [ 0.  0.  4.  0.  0.  0.  0.  0.]]
Total count: 148.0
     0    1     2    3     4    5    6     7
0  2.0  1.0   0.0  0.0   0.0  0.0  0.0   0.0
1  0.0  0.0   2.0  0.0   0.0  0.0  0.0   0.0
2  0.0  0.0  26.0  0.0  22.0  1.0  6.0  14.0
3  0.0  0.0   0.0  0.0   2.0  0.0  0.0   0.0
4  0.0  0.0  20.0  6.0  12.0  3.0  1.0   0.0
5  2.0  0.0  15.0  0.0   1.0  0.0  0.0   0.0
6  0.0  0.0   5.0  0.0   2.0  0.0  1.0   0.0
7  0.0  0.0   4.0  0.0   0.0  0.0  0.0   0.0
Transition probability matrix P_1 for step 1:
 [[0.66666667 0.33333333 0.         0.         0.         0.
  0.         0.        ]
 [0.         0.         1.         0.         0.         0.
  0.         0.        ]
 [0.         0.         0.37681159 0.         0.31884058 0.01449275
  0.08695652 0.20289855]
 [0.         0.         0.         0.         1.         0.
  0.         0.        ]
 [0.         0.         0.47619048 0.14285714 0.28571429 0.07142857
  0.02380952 0.        ]
 [0.11111111 0.         0.83333333 0.         0.05555556 0.
  0.         0.        ]
 [0.         0.         0.625      0.         0.25       0.
  0.125      0.        ]
 [0.         0.         1.         0.         0.         0.
  0.         0.        ]] 

          0         1         2         3         4         5         6         7
0  0.666667  0.333333  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
1  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2  0.000000  0.000000  0.376812  0.000000  0.318841  0.014493  0.086957  0.202899
3  0.000000  0.000000  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000
4  0.000000  0.000000  0.476190  0.142857  0.285714  0.071429  0.023810  0.000000
5  0.111111  0.000000  0.833333  0.000000  0.055556  0.000000  0.000000  0.000000
6  0.000000  0.000000  0.625000  0.000000  0.250000  0.000000  0.125000  0.000000
7  0.000000  0.000000  1.000000  0.000000  0.000000  0.000000  0.000000  0.000000
Stationary distribution π:
 {xK: 0.0963640202016921, xI: 0.0551393912117535, xH: 0.0277274488287983, xF: 0.291820263401484, xE: 0.0416886090573549, xD: 0.474936956708340, xC: 0.00308082764764426, xB: 0.00924248294293277}
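As a cross-check on the sympy result, the same system can be solved with numpy alone. The sketch below rebuilds P_1 from the step-1 frequency matrix printed above and solves the stacked linear system (P_1^T - I)π^T = 0 together with the row Σπ_s = 1 by least squares; the helper name `stationary_distribution` is hypothetical. Because the system is consistent and has a unique solution, least squares returns that solution up to rounding.

```python
import numpy as np

# Step-1 frequency matrix from the output above; rows/columns are the states B, C, D, E, F, H, I, K
states = ["B", "C", "D", "E", "F", "H", "I", "K"]
f1 = np.array([
    [2, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
    [0, 0, 26, 0, 22, 1, 6, 14],
    [0, 0, 0, 0, 2, 0, 0, 0],
    [0, 0, 20, 6, 12, 3, 1, 0],
    [2, 0, 15, 0, 1, 0, 0, 0],
    [0, 0, 5, 0, 2, 0, 1, 0],
    [0, 0, 4, 0, 0, 0, 0, 0],
], dtype=float)

# Row-normalise; every row of this particular matrix has a non-zero sum
P1 = f1 / f1.sum(axis=1, keepdims=True)


def stationary_distribution(P):
    """Solve pi P = pi together with sum(pi) = 1 as one least-squares problem."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])  # (P^T - I) pi^T = 0 plus the normalisation row
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi


pi = stationary_distribution(P1)
for state, p in zip(states, pi):
    print(f"x{state}: {p:.6f}")
```

The printed values agree with the sympy output to about six decimal places (for example xD ≈ 0.474937 and xK ≈ 0.096364). The least-squares formulation avoids a symbolic solver and copes with the one redundant equation in πP_1 = π without having to drop it by hand.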