Assume there's a website with 8 pages. We are interested in finding the most frequently visited page sequence of size 3 (e.g. 1->5->2). We are given a log file that has several rows for a particular time period. Each row has the following info: time, user ID, page visited.
Suggest an optimal algorithm to find the most frequently visited page sequence of size 3.
Let's number every sequence by writing its page numbers side by side, like 1->3->8 = 138. Since each page number is a single digit (1 to 8), every sequence gets a unique number.
Now let's say the visits, ordered by time, are 1->3->6->3->5->4....
So the sequences of size 3 are 136, 363, 635, 354, ........
Counting which of these sequences occurs most often gives the answer.
You can use a hash map to store the frequency of each page sequence with this idea. That gives O(1) average-time lookup and update per sequence.
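Here is a rough sketch of the idea in Python, not a definitive implementation. It assumes the log has already been parsed into (time, user_id, page) tuples and that a sequence is counted per user, so one user's visits never mix with another's; the question does not say this explicitly. The function name `most_frequent_triple` is made up for illustration.

```python
from collections import Counter, defaultdict

def most_frequent_triple(rows):
    """rows: iterable of (time, user_id, page) tuples.
    Returns (encoded_sequence, count), or None if no sequence of size 3 exists."""
    # Assumption: sequences are tracked per user, so group pages by user ID.
    visits = defaultdict(list)
    for _time, user_id, page in sorted(rows, key=lambda r: r[0]):  # order by time
        visits[user_id].append(page)

    counts = Counter()  # hash map: encoded sequence -> frequency
    for pages in visits.values():
        # Slide a window of size 3 over this user's time-ordered pages.
        for a, b, c in zip(pages, pages[1:], pages[2:]):
            counts[a * 100 + b * 10 + c] += 1  # 1->3->6 is stored as 136

    if not counts:
        return None
    return counts.most_common(1)[0]
```

If the log is already in time order the sort can be skipped, and the whole thing is one pass over the rows. With 8 pages there are at most 8*8*8 = 512 distinct sequences, so a plain array of counters indexed by the encoded number would work too.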