Applying a Markov Chain Prediction Model: A Case Study of Personal Library Borrowing

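This post applies a Markov chain to one person's library borrowing records for predictive analysis. It first reads and transforms the borrowing data, then builds the frequency matrix and the transition probability matrix for step size 1. A test of the Markov property follows, and the result shows that the sequence does not satisfy it, so the sequence cannot be used directly as a Markov chain. The post nevertheless goes on to compute the transition probability matrices of different orders and the correlation coefficients, but because the sequence is not stationary, no stationary distribution can be obtained.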
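The summary above does not say which statistic the Markov property test uses. As a minimal sketch, one common choice is the likelihood-ratio (G) test of independence applied to the step-1 frequency matrix: if the next category were independent of the current one, the statistic would stay below the chi-square critical value. The function name `markov_g_statistic` and the toy counts are assumptions made for illustration, not the author's code or data.

```python
import math

def markov_g_statistic(freq):
    """Likelihood-ratio (G) statistic for independence of rows and columns.

    freq[i][j] is the number of observed transitions from state i to state j.
    Returns the statistic and its degrees of freedom.
    """
    row_sums = [sum(row) for row in freq]
    col_sums = [sum(col) for col in zip(*freq)]
    total = sum(row_sums)
    g = 0.0
    for i, row in enumerate(freq):
        for j, f_ij in enumerate(row):
            if f_ij > 0:  # empty cells contribute nothing
                expected = row_sums[i] * col_sums[j] / total
                g += 2.0 * f_ij * math.log(f_ij / expected)
    # Only states that actually occur count toward the degrees of freedom
    rows_used = sum(1 for s in row_sums if s > 0)
    cols_used = sum(1 for s in col_sums if s > 0)
    dof = (rows_used - 1) * (cols_used - 1)
    return g, dof

toy_freq = [[8, 2], [2, 8]]  # hypothetical counts, not the borrowing data
g, dof = markov_g_statistic(toy_freq)
print(f"G = {g:.3f}, degrees of freedom = {dof}")
```

With one degree of freedom the 5% chi-square critical value is 3.841, so a statistic above that would reject independence for the toy table. With many sparse categories, as in the borrowing data, the chi-square approximation is rough and the result should be read with caution.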

Reading the personal borrowing data


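## Book categories: A Marxism, Leninism, Mao Zedong Thought, Deng Xiaoping Theory; B Philosophy, Religion; C Social Sciences (General); D Politics, Law; E Military Science; F Economics; G Culture, Science, Education, Sports;
## H Language, Linguistics; I Literature; J Art; K History, Geography; N Natural Sciences (General); O Mathematical Sciences and Chemistry; P Astronomy, Earth Sciences; Q Biological Sciences; R Medicine, Health; S Agricultural Sciences;
## T Industrial Technology; U Transportation; V Aviation, Aerospace; X Environmental Science, Safety Science; Z General Works.
import pandas as pd
# Show all DataFrame rows
pd.set_option('display.max_rows', None)
# Show all columns
pd.set_option('display.max_columns', None)
# Book categories (note: 'P' is in the legend above but not in this list, which gives 21 states)
Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# Read the data, keeping all rows and dropping the first column
Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]

print(Person_data)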

Output:

     LOAN_DATE ITEM_CALLNO   TIMESTAMP
0     2013/1/1           H  1356969600
1     2013/1/1           H  1356969600
2     2013/1/1           H  1356969600
3     2013/1/1           H  1356969600
4     2013/1/1           H  1356969600
5    2013/2/20           D  1361289600
6    2013/4/19           F  1366300800
7    2013/4/19           F  1366300800
8    2013/4/19           F  1366300800
9    2013/4/24           D  1366732800
10   2013/4/24           E  1366732800
11   2013/4/24           E  1366732800
12   2013/4/24           D  1366732800
13   2013/4/24           D  1366732800
14   2013/4/24           D  1366732800
15   2013/4/24           F  1366732800
16   2013/4/27           F  1366992000
17    2013/5/8           H  1367942400
18   2013/5/15           B  1368547200
19   2013/5/15           B  1368547200
20   2013/5/17           B  1368720000
21    2013/6/6           C  1370448000
22    2013/6/8           D  1370620800
23    2013/6/8           D  1370620800
24   2013/10/9           I  1381248000
25  2013/10/29           F  1382976000
26  2013/10/29           I  1382976000
27   2013/11/7           F  1383753600
28   2013/11/7           D  1383753600
29  2013/11/14           I  1384358400
30  2013/11/14           F  1384358400
31  2013/11/14           F  1384358400
32  2013/11/25           D  1385308800
33   2013/12/2           F  1385913600
34   2013/12/2           F  1385913600
35   2013/12/4           H  1386086400
36   2013/12/6           F  1386259200
37  2013/12/11           F  1386691200
38  2013/12/11           F  1386691200
39  2013/12/18           F  1387296000
40  2013/12/18           D  1387296000
41    2014/1/1           F  1388505600
42    2014/1/2           D  1388592000
43   2014/2/17           D  1392566400
44   2014/2/17           D  1392566400
45   2014/3/13           K  1394640000
46   2014/3/13           K  1394640000
47   2014/4/15           D  1397491200
48   2014/4/29           I  1398700800
49   2014/4/29           I  1398700800
50   2014/4/29           I  1398700800
51    2014/5/5           D  1399219200
52   2014/9/23           H  1411401600
53  2014/10/10           D  1412870400
54  2014/10/10           D  1412870400
55  2014/10/10           D  1412870400
56  2014/10/10           D  1412870400
57  2014/10/10           D  1412870400
58  2014/10/10           D  1412870400
59  2014/10/10           D  1412870400
60  2014/10/10           D  1412870400
61  2014/10/10           D  1412870400
62  2014/10/10           D  1412870400
63   2014/12/2           F  1417449600
64   2014/12/2           K  1417449600
65   2014/12/2           D  1417449600
66   2014/12/4           D  1417622400
67   2014/12/4           D  1417622400
68   2015/1/14           D  1421164800
69   2015/2/28           D  1425052800
70   2015/2/28           D  1425052800
71    2015/3/5           D  1425484800
72    2015/3/5           D  1425484800
73    2015/3/5           D  1425484800
74    2015/3/5           D  1425484800
75   2015/3/11           D  1426003200
76   2015/3/11           D  1426003200
77   2015/3/11           D  1426003200
78   2015/3/12           D  1426089600
79   2015/3/12           D  1426089600
80   2015/3/12           D  1426089600
81   2015/3/12           D  1426089600
82   2015/3/12           D  1426089600
83   2015/3/19           J  1426694400
84   2015/3/19           D  1426694400

Converting the two columns into a dictionary

LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

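t_n = len(time_number)    # number of distinct loan dates
N = len(ITEM_CALLNO)    # sample size
Book_data_dic = {}
i, j = 0, 0
group = []    # categories borrowed on the current date (avoids shadowing the built-in list)
# Each pass either consumes one record (i) or moves on to the next date (j),
# so t_n + N - 1 passes cover all N records and all t_n dates.
for _ in range(t_n + N - 1):
    if LOAN_DATE[i] == time_number[j]:
        group.append(ITEM_CALLNO[i])
        i += 1
    else:
        j += 1
        group = []
    Book_data_dic[j + 1] = group    # keys run from 1 to t_n
print(Book_data_dic)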

Output:

{1: ['H', 'H', 'H', 'H', 'H'], 2: ['D'], 3: ['F', 'F', 'F'], 4: ['D', 'E', 'E', 'D', 'D', 'D', 'F'], 5: ['F'], 6: ['H'], 7: ['B', 'B'], 8: ['B'], 9: ['C'], 10: ['D', 'D'], 11: ['I'], 12: ['F', 'I'], 13: ['F', 'D'], 14: ['I', 'F', 'F'], 15: ['D'], 16: ['F', 'F'], 17: ['H'], 18: ['F'], 19: ['F', 'F'], 20: ['F', 'D'], 21: ['F'], 22: ['D'], 23: ['D', 'D'], 24: ['K', 'K'], 25: ['D'], 26: ['I', 'I', 'I'], 27: ['D'], 28: ['H'], 29: ['D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D'], 30: ['F', 'K', 'D'], 31: ['D', 'D'], 32: ['D'], 33: ['D', 'D'], 34: ['D', 'D', 'D', 'D'], 35: ['D', 'D', 'D'], 36: ['D', 'D', 'D', 'D', 'D'], 37: ['J', 'D']}
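The same grouping can be sketched without manual index bookkeeping. The version below is an alternative, not the code used above: it relies on `itertools.groupby`, which only merges consecutive equal keys, so it assumes the records are already sorted by loan date, as they are in the table. The function name `group_by_date` and the five-record sample are illustrative.

```python
from itertools import groupby

def group_by_date(dates, categories):
    """Map visit number (1, 2, ...) to the list of categories borrowed on that date.

    Assumes the records are sorted by date, because groupby only merges
    consecutive records that share a key.
    """
    pairs = zip(dates, categories)
    groups = {}
    for visit, (_, records) in enumerate(groupby(pairs, key=lambda p: p[0]), start=1):
        groups[visit] = [category for _, category in records]
    return groups

# First records of the borrowing table, shortened for illustration
sample_dates = ['2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19']
sample_categories = ['H', 'H', 'D', 'F', 'F']

book_groups = group_by_date(sample_dates, sample_categories)
print(book_groups)
```

Called with the full `LOAN_DATE` and `ITEM_CALLNO` lists, this should produce the same 37 groups, and it no longer needs the separate `time_number` list.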

Frequency matrix and transition probability matrix for step size 1

import pandas as pd
import numpy as np
from sympy import *
# Maximum number of characters per printed row; the default of 80 is too narrow for a wide screen, and 200 is a common choice
pd.set_option('display.width', 500)
# Show all columns
pd.set_option('display.max_columns', None)

# Book categories
Book_category = ['A','B','C','D','E','F','G','H','I','J','K','N','O','Q','R','S','T','U','V','X','Z']
# # Read the data
# Person_data = pd.read_excel(r'Person_8748847336.xlsx').iloc[0: , 1:]
LOAN_DATE = ['2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/1/1', '2013/2/20', '2013/4/19', '2013/4/19', '2013/4/19', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/6/8', '2013/10/9', '2013/10/29', '2013/10/29', '2013/11/7', '2013/11/7', '2013/11/14', '2013/11/14', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/11', '2013/12/18', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/2/17', '2014/3/13', '2014/3/13', '2014/4/15', '2014/4/29', '2014/4/29', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/10/10', '2014/12/2', '2014/12/2', '2014/12/2', '2014/12/4', '2014/12/4', '2015/1/14', '2015/2/28', '2015/2/28', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/5', '2015/3/11', '2015/3/11', '2015/3/11', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/12', '2015/3/19', '2015/3/19']
time_number = ['2013/1/1', '2013/2/20', '2013/4/19', '2013/4/24', '2013/4/27', '2013/5/8', '2013/5/15', '2013/5/17', '2013/6/6', '2013/6/8', '2013/10/9', '2013/10/29', '2013/11/7', '2013/11/14', '2013/11/25', '2013/12/2', '2013/12/4', '2013/12/6', '2013/12/11', '2013/12/18', '2014/1/1', '2014/1/2', '2014/2/17', '2014/3/13', '2014/4/15', '2014/4/29', '2014/5/5', '2014/9/23', '2014/10/10', '2014/12/2', '2014/12/4', '2015/1/14', '2015/2/28', '2015/3/5', '2015/3/11', '2015/3/12', '2015/3/19']
ITEM_CALLNO = ['H', 'H', 'H', 'H', 'H', 'D', 'F', 'F', 'F', 'D', 'E', 'E', 'D', 'D', 'D', 'F', 'F', 'H', 'B', 'B', 'B', 'C', 'D', 'D', 'I', 'F', 'I', 'F', 'D', 'I', 'F', 'F', 'D', 'F', 'F', 'H', 'F', 'F', 'F', 'F', 'D', 'F', 'D', 'D', 'D', 'K', 'K', 'D', 'I', 'I', 'I', 'D', 'H', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'F', 'K', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'D', 'J', 'D']

N = len(ITEM_CALLNO)    # sample size
t_n = len(time_number)    # number of distinct loan dates (visits)
B_N = len(Book_category)    # number of states
# Convert the lists into a dictionary keyed by visit number
def Book_list_to_dic():
    Book_data_dic = {}
    i, j = 0, 0
    group = []
    for _ in range(t_n + N - 1):
        if LOAN_DATE[i] == time_number[j]:
            group.append(ITEM_CALLNO[i])
            i += 1
        else:
            j += 1
            group = []
        Book_data_dic[j + 1] = group
    return Book_data_dic

Book_data_dic = Book_list_to_dic()
n_list = [1]    # step sizes to compute, e.g. [1, 2, 3, 4, 5]
for n in n_list:
    print(f'Step size {n}:')
    f_array = np.zeros((B_N, B_N))    # frequency matrix
    for k in range(1, t_n - n + 1):
        # Count every (a, b) pair between visit k and visit k + n
        # (the original indexed visit k + 1 here, which is only right for n = 1)
        for a in Book_data_dic[k]:
            i = Book_category.index(a)
            for b in Book_data_dic[k + n]:
                j = Book_category.index(b)
                f_array[i][j] += 1
    print(f'Frequency matrix f for step size {n}:\n', f_array)
    n_sum_f = f_array.sum()    # total frequency
    print('Total frequency:', n_sum_f)
    # The raw array is hard to read, so show it as a DataFrame as well
    df_f = pd.DataFrame(f_array)
    print(df_f)

    P = np.zeros((B_N, B_N))    # transition probability matrix
    for i in range(B_N):
        f_sum_i = f_array[i, :].sum()    # row sum of the frequency matrix
        if f_sum_i == 0:
            P[i][i] = 1    # all-zero row: set the diagonal entry to 1
        else:
            P[i, :] = f_array[i, :] / f_sum_i
    print(f'Transition probability matrix P_{n} for step size {n}:\n', P, '\n')
    # The raw array is hard to read, so show it as a DataFrame as well
    df_P = pd.DataFrame(P)
    print(df_P)

Output:

Step size 1:
Frequency matrix f for step size 1:
 [[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  2.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  2.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0. 58.  0. 22.  0.  1.  6.  5. 14.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  2.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0. 20.  6. 12.  0.  3.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  2.  0. 15.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  5.  0.  2.  0.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  4.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
   0.  0.  0.]]
Total frequency: 185.0
      0    1    2     3    4     5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20
0   0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1   0.0  2.0  1.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
2   0.0  0.0  0.0   2.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
3   0.0  0.0  0.0  58.0  0.0  22.0  0.0  1.0  6.0  5.0  14.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
4   0.0  0.0  0.0   0.0  0.0   2.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5   0.0  0.0  0.0  20.0  6.0  12.0  0.0  3.0  1.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
6   0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
7   0.0  2.0  0.0  15.0  0.0   1.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
8   0.0  0.0  0.0   5.0  0.0   2.0  0.0  0.0  1.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
9   0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
10  0.0  0.0  0.0   4.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
11  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
12  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
13  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
14  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
15  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
16  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
17  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
18  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
19  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
20  0.0  0.0  0.0   0.0  0.0   0.0  0.0  0.0  0.0  0.0   0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
Transition probability matrix P_1 for step size 1:
 [[1.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.66666667 0.33333333 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.        ]
 [0.         0.         0.         1.         0.         0.
  0.         0.         0.         0.         
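The counting and row normalisation behind these matrices can also be sketched with the standard library only, which makes the rule for each step explicit: every category borrowed on visit k is paired with every category borrowed on visit k + step, and each row of counts is then divided by its total. The names `transition_counts` and `transition_probabilities` are illustrative, and the example uses only the first five visits from the dictionary shown earlier plus category 'I' as a state with no outgoing transitions.

```python
from collections import Counter
from itertools import product

def transition_counts(groups, step=1):
    """Count (from, to) category pairs between visit k and visit k + step."""
    counts = Counter()
    for k in range(1, len(groups) - step + 1):
        counts.update(product(groups[k], groups[k + step]))
    return counts

def transition_probabilities(counts, states):
    """Row-normalise the counts; a state with no outgoing counts maps to itself."""
    P = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        if row_total == 0:
            # Same correction as in the post: put 1 on the diagonal
            P[a] = {b: 1.0 if a == b else 0.0 for b in states}
        else:
            P[a] = {b: counts[(a, b)] / row_total for b in states}
    return P

# First five visits of the borrowing record
groups = {
    1: ['H', 'H', 'H', 'H', 'H'],
    2: ['D'],
    3: ['F', 'F', 'F'],
    4: ['D', 'E', 'E', 'D', 'D', 'D', 'F'],
    5: ['F'],
}
states = ['D', 'E', 'F', 'H', 'I']

counts = transition_counts(groups, step=1)
P = transition_probabilities(counts, states)
for a in states:
    print(a, [round(P[a][b], 3) for b in states])
```

Because every pair across two visits is counted, a visit with many books weighs heavily in the estimate: the ten 'D' loans on 2014/10/10 followed by three loans on 2014/12/2 contribute 30 of the 185 pairs in the full data.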