近期有一个关于排行榜的需求,有以下几点需求:
一、排行榜只显示前500名
二、同一用户发布多个文章,取点赞量最高的那个文章
三、点赞数相同,先达到这个点赞数的前在前
建表语句及表设计:
CREATE TABLE `t_article_thumbup_cnt` (
`user_id` bigint(20) NOT NULL,
`article_id` varchar(128) NOT NULL,
`thumbup_cnt` bigint(20) NOT NULL,
`topic_id` varchar(128) NOT NULL,
`update_time` datetime DEFAULT NULL,
`status` int(1) NOT NULL,
PRIMARY KEY (`article_id`,`topic_id`),
KEY `index_user_id` (`user_id`) USING BTREE,
KEY `index_thumbup_cnt` (`thumbup_cnt`) USING BTREE,
KEY `index_status` (`status`) USING BTREE,
KEY `index_update_time` (`update_time`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
排行榜原SQL
SELECT user_id as userId, article_id as articleId, thumbup_cnt as thumbupCnt, topic_id as topicId, update_time as updateTime
FROM t_article_thumbup_cnt
WHERE topic_id=#{topicId} AND article_id IN
(
SELECT article_id FROM t_article_thumbup_cnt,
(
SELECT user_id as max_user_id, MAX(thumbup_cnt) as max_thumbup_cnt
FROM t_article_thumbup_cnt
WHERE (status = 1 OR status = 5) AND topic_id=#{topicId} GROUP BY user_id
)tmp
WHERE thumbup_cnt=tmp.max_thumbup_cnt AND user_id=tmp.max_user_id AND topic_id=#{topicId} AND (status = 1 OR status = 5) GROUP BY user_id
)
ORDER BY thumbup_cnt DESC, update_time ASC LIMIT 0, 500
排行榜优化后SQL
SELECT user_id as userId, article_id as articleId, thumbup_cnt as thumbupCnt, topic_id as topicId, update_time as updateTime
FROM t_article_thumbup_cnt a INNER JOIN
(
SELECT article_id as id FROM t_article_thumbup_cnt,
(
SELECT user_id as max_user_id, MAX(thumbup_cnt) as max_thumbup_cnt
FROM t_article_thumbup_cnt
WHERE (status = 1 OR status = 5) AND topic_id=#{topicId} GROUP BYuser_id
)tmp
WHERE thumbup_cnt=tmp.max_thumbup_cnt AND user_id=tmp.max_user_id AND topic_id=#{topicId} AND (status = 1 OR status = 5) GROUP BY user_id
) b ON a.article_id=b.id
WHERE topic_id=#{topicId} ORDER BY thumbup_cnt DESC, update_time ASC LIMIT 0, 500
性能分析:
表有一千条左右的数据,基于这一千条数据进行分析。
原SQL未加索引,执行时间50秒;
原SQL添加索引,执行时间7秒;
添加索引优化后SQL(IN优化为INNER JOIN ON),执行时间0.027秒。
结果分析:
一、从性能分析可以很明确地看出来对参与检索的字段加索引性能有8倍左右的提升。
二、IN较为容易理解,但是运行效果不理想,特别对于IN中有几十条几百条的情况。对于对执行效率要求不高或者IN中只有几条的情况可以使用IN,若IN中条数很多,建议改为内连的方式,执行效果会有量级的提升。
综上:不考虑硬件的情况下,参与检索的字段,请务必添加索引;请忘掉IN这个关键字吧,请使用OR或者JOIN替代。
另外,做排行榜的同学可以研究一下这个SQL,我就不深入分析了。