mysql数据库中每次取10条_MySQL 中随机选择10条记录

mysql手册中存在rand()命令,能获取到随机行, 并使用limit 10 只采取其中几行。

SELECT id FROM user ORDER BY RAND() LIMIT 10;

数据量小于1000行的时候,上面的 sql 执行的快。但是当数据大于10000行, 排序的开销就变得很重。上面的操作中,我们在排序完就把几乎所有的行都丢掉了。

只要我们有一个数字主键,我们可以有更好的方式去实现这个功能,不需要对所有数据进行排序。

在上面的例子中, 我们假设 id 从1开始, 并且在1和 id 的最大值之间是连续的。

通过应用程序解决问题

可以在应用程序中计算随机id, 简化整个计算。

SELECT MAX(id) FROM user;

## 在应用程序中生成区间内的随机数:random-id

SELECT name FROM user WHERE id =

由于MAX(id) == COUNT(id),我们只是生成1和 max (id) 之间的随机数, 并将其传递到数据库中检索随机行。

第一个select语句是NO-OP,并一直在被优化。第二个是针对常量的 eq 速度也很快。

通过数据库解决问题

# 生成一个随机ID

> SELECT RAND() * MAX(id) FROM user;

+------------------+

| RAND() * MAX(id) |

+------------------+

| 689.37582507297 |

+------------------+

# 返回值是double,但是我们需要的是 int

> SELECT CEIL(RAND() * MAX(id)) FROM user;

+-------------------------+

| CEIL(RAND() * MAX(id)) |

+-------------------------+

| 1000000 |

+-------------------------+

# 返回值是 int,分析性能

> EXPLAIN

SELECT CEIL(RAND() * MAX(id)) FROM random;

+----+-------------+-------+-------+------+-------------+

| id | select_type | table | type | rows | Extra |

+----+-------------+-------+-------+------+-------------+

| 1 | SIMPLE | random| index |1000000| Using index |

+----+-------------+-------+-------+------+-------------+

## 全表扫描?由于使用 MAX()函数了,导致优化丢失。

> EXPLAIN

SELECT CEIL(RAND() * (SELECT MAX(id) FROM random));

+----+-------------+-------+------+------+------------------------------+

| id | select_type | table | type | rows | Extra |

+----+-------------+-------+------+------+------------------------------+

| 1 | PRIMARY | NULL | NULL | NULL | No tables used |

| 2 | SUBQUERY | NULL | NULL | NULL | Select tables optimized away |

+----+-------------+-------+------+------+------------------------------+

## 子查询可以将性能损失挽回

通过上面的 sql 已经能够生成随机 id, 但如何获得行?

> EXPLAIN

SELECT name

FROM user

WHERE id = (SELECT CEIL(RAND() *

(SELECT MAX(id)

FROM user));

+----+-------------+--------+------+---------------+------+---------+------+---------+------------------------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+--------+------+---------------+------+---------+------+---------+------------------------------+

| 1 | PRIMARY | user | ALL | NULL | NULL | NULL | NULL | 1000000 | Using where |

| 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |

+----+-------------+--------+------+---------------+------+---------+------+---------+------------------------------+

> show warnings;

+-------+------+------------------------------------------+

| Level | Code | Message |

+-------+------+------------------------------------------+

| Note | 1249 | Select 2 was reduced during optimization |

+-------+------+------------------------------------------+

上面的方法是最明显的, 但也是最错误的做法。原因是:where子查询中的select为外部select每一行都会执行。具体解释参考:sql语句嵌套查询性能低

要找一种方法,保证random-id只生成一次:

SELECT name

FROM user JOIN

(SELECT CEIL(RAND() *

(SELECT MAX(id)

FROM user)) AS id

) AS r2

USING (id);

+----+-------------+------------+--------+------+------------------------------+

| id | select_type | table | type | rows | Extra |

+----+-------------+------------+--------+------+------------------------------+

| 1 | PRIMARY | | system | 1 | |

| 1 | PRIMARY | user | const | 1 | |

| 2 | DERIVED | NULL | NULL | NULL | No tables used |

| 3 | SUBQUERY | NULL | NULL | NULL | Select tables optimized away |

+----+-------------+------------+--------+------+------------------------------+

内部select生成一个常量临时表, join 只在单行上执行。没有使用排序,没有通过应用程序,查询的大多数部分都被优化了。

非连续数据

删除一些行,构造ID非连续的记录。

SELECT name

FROM random AS r1 JOIN

(SELECT (RAND() *

(SELECT MAX(id)

FROM random)) AS id)

AS r2

WHERE r1.id >= r2.id

ORDER BY r1.id ASC

LIMIT 1;

+----+-------------+------------+--------+------+------------------------------+

| id | select_type | table | type | rows | Extra |

+----+-------------+------------+--------+------+------------------------------+

| 1 | PRIMARY | | system | 1 | |

| 1 | PRIMARY | r1 | range | 689 | Using where |

| 2 | DERIVED | NULL | NULL | NULL | No tables used |

| 3 | SUBQUERY | NULL | NULL | NULL | Select tables optimized away |

+----+-------------+------------+--------+------+------------------------------+

join现在获取所有大于或等于我们随机值的ID,如果不能直接匹配则选择邻居。 但是一旦找到一行,就停止执行(LIMIT 1)。根据索引(ORDER BY id ASC)读取行。 当使用 >= 而不是a = 时,我们可以摆脱CEIL并以更少的工作获得相同的结果。

平等分配

当我们的ID分布不再相等时,我们选择的行也不是真正随机的。

> select * from holes;

+----+----------------------------------+----------+

| id | name | accesses |

+----+----------------------------------+----------+

| 1 | d12b2551c6cb7d7a64e40221569a8571 | 107 |

| 2 | f82ad6f29c9a680d7873d1bef822e3e9 | 50 |

| 4 | 9da1ed7dbbdcc6ec90d6cb139521f14a | 132 |

| 8 | 677a196206d93cdf18c3744905b94f73 | 230 |

| 16 | b7556d8ed40587a33dc5c449ae0345aa | 481 |

+----+----------------------------------+----------+

RAND方法会生成9到15之类的ID,这些ID都会导致id 16被选为下一个更高的数字。

这个问题没有真正的解决方案,但是由于你的数据大多是不变的,你可以添加一个映射表,将行号映射到id:

> create table holes_map ( row_id int not NULL primary key, random_id int not null);

> SET @id = 0;

> INSERT INTO holes_map SELECT @id := @id + 1, id FROM holes;

> select * from holes_map;

+--------+-----------+

| row_id | random_id |

+--------+-----------+

| 1 | 1 |

| 2 | 2 |

| 3 | 4 |

| 4 | 8 |

| 5 | 16 |

+--------+-----------+

row_id现在再次是连续,我们可以再次运行随机查询

SELECT name FROM holes

JOIN (SELECT r1.random_id

FROM holes_map AS r1

JOIN (SELECT (RAND() *

(SELECT MAX(row_id)

FROM holes_map)) AS row_id)

AS r2

WHERE r1.row_id >= r2.row_id

ORDER BY r1.row_id ASC

LIMIT 1) as rows ON (id = random_id);

1000次提取后,我们再次看到平均分布:

> select * from holes;

+----+----------------------------------+----------+

| id | name | accesses |

+----+----------------------------------+----------+

| 1 | d12b2551c6cb7d7a64e40221569a8571 | 222 |

| 2 | f82ad6f29c9a680d7873d1bef822e3e9 | 187 |

| 4 | 9da1ed7dbbdcc6ec90d6cb139521f14a | 195 |

| 8 | 677a196206d93cdf18c3744905b94f73 | 207 |

| 16 | b7556d8ed40587a33dc5c449ae0345aa | 189 |

+----+----------------------------------+----------+

维护连续的表

DROP TABLE IF EXISTS r2;

CREATE TABLE r2 (

id SERIAL,

name VARCHAR(32) NOT NULL UNIQUE

);

DROP TABLE IF EXISTS r2_equi_dist;

CREATE TABLE r2_equi_dist (

id SERIAL,

r2_id bigint unsigned NOT NULL UNIQUE

);

当我们在r2中更改某些内容时,我们希望r2_equi_dist也会更新。

DELIMITER $$

DROP TRIGGER IF EXISTS tai_r2$$

CREATE TRIGGER tai_r2

AFTER INSERT ON r2 FOR EACH ROW

BEGIN

DECLARE m BIGINT UNSIGNED DEFAULT 1;

SELECT MAX(id) + 1 FROM r2_equi_dist INTO m;

SELECT IFNULL(m, 1) INTO m;

INSERT INTO r2_equi_dist (id, r2_id) VALUES (m, NEW.id);

END$$

DELIMITER ;

DELETE FROM r2;

INSERT INTO r2 VALUES ( NULL, MD5(RAND()) );

INSERT INTO r2 VALUES ( NULL, MD5(RAND()) );

INSERT INTO r2 VALUES ( NULL, MD5(RAND()) );

INSERT INTO r2 VALUES ( NULL, MD5(RAND()) );

SELECT * FROM r2;

+----+----------------------------------+

| id | name |

+----+----------------------------------+

| 1 | 8b4cf277a3343cdefbe19aa4dabc40e1 |

| 2 | a09a3959d68187ce48f4fe7e388926a9 |

| 3 | 4e1897cd6d326f8079108292376fa7d5 |

| 4 | 29a5e3ed838db497aa330878920ec01b |

+----+----------------------------------+

SELECT * FROM r2_equi_dist;

+----+-------+

| id | r2_id |

+----+-------+

| 1 | 1 |

| 2 | 2 |

| 3 | 3 |

| 4 | 4 |

+----+-------+

INSERT非常简单,DELETE操作我们必须更新equi-dist-id以保持id的连续设置:

DELIMITER $$

DROP TRIGGER IF EXISTS tad_r2$$

CREATE TRIGGER tad_r2

AFTER DELETE ON r2 FOR EACH ROW

BEGIN

DELETE FROM r2_equi_dist WHERE r2_id = OLD.id;

UPDATE r2_equi_dist SET id = id - 1 WHERE r2_id > OLD.id;

END$$

DELIMITER ;

DELETE FROM r2 WHERE id = 2;

SELECT * FROM r2;

+----+----------------------------------+

| id | name |

+----+----------------------------------+

| 1 | 8b4cf277a3343cdefbe19aa4dabc40e1 |

| 3 | 4e1897cd6d326f8079108292376fa7d5 |

| 4 | 29a5e3ed838db497aa330878920ec01b |

+----+----------------------------------+

SELECT * FROM r2_equi_dist;

+----+-------+

| id | r2_id |

+----+-------+

| 1 | 1 |

| 2 | 3 |

| 3 | 4 |

+----+-------+

update操作需要维护外键约束:

DELIMITER $$

DROP TRIGGER IF EXISTS tau_r2$$

CREATE TRIGGER tau_r2

AFTER UPDATE ON r2 FOR EACH ROW

BEGIN

UPDATE r2_equi_dist SET r2_id = NEW.id WHERE r2_id = OLD.id;

END$$

DELIMITER ;

UPDATE r2 SET id = 25 WHERE id = 4;

SELECT * FROM r2;

+----+----------------------------------+

| id | name |

+----+----------------------------------+

| 1 | 8b4cf277a3343cdefbe19aa4dabc40e1 |

| 3 | 4e1897cd6d326f8079108292376fa7d5 |

| 25 | 29a5e3ed838db497aa330878920ec01b |

+----+----------------------------------+

SELECT * FROM r2_equi_dist;

+----+-------+

| id | r2_id |

+----+-------+

| 1 | 1 |

| 2 | 3 |

| 3 | 25 |

+----+-------+

一次多行

如果要返回多行,您可以:

多次执行查询

编写执行查询的存储过程并将结果存储在临时表中

存储过程

存储过程为你了程序语言结构:

循环

控制结构

程序

...

对于此任务,我们只需要一个循环:

ELIMITER $$

DROP PROCEDURE IF EXISTS get_rands$$

CREATE PROCEDURE get_rands(IN cnt INT)

BEGIN

DROP TEMPORARY TABLE IF EXISTS rands;

CREATE TEMPORARY TABLE rands ( rand_id INT );

loop_me: LOOP

IF cnt < 1 THEN

LEAVE loop_me;

END IF;

INSERT INTO rands

SELECT r1.id

FROM random AS r1 JOIN

(SELECT (RAND() *

(SELECT MAX(id)

FROM random)) AS id)

AS r2

WHERE r1.id >= r2.id

ORDER BY r1.id ASC

LIMIT 1;

SET cnt = cnt - 1;

END LOOP loop_me;

END$$

DELIMITER ;

CALL get_rands(4);

SELECT * FROM rands;

+---------+

| rand_id |

+---------+

| 133716 |

| 702643 |

| 112066 |

| 452400 |

+---------+

性能

我们有3个不同的查询来解决我们的问题:

Q1. ORDER BY RAND()

Q2. RAND() * MAX(ID)

Q3. RAND() * MAX(ID) + ORDER BY ID

Q1预计成本为N * log2(N),Q2和Q3几乎恒定。

我们用N行(一千到一百万)填充表格并执行每次查询1000次。

100 1.000 10.000 100.000 1.000.000

Q1 0:00.718s 0:02.092s 0:18.684s 2:59.081s 58:20.000s

Q2 0:00.519s 0:00.607s 0:00.614s 0:00.628s 0:00.637s

Q3 0:00.570s 0:00.607s 0:00.614s 0:00.628s 0:00.637s

正如您所看到的那样,简单的ORDER BY RAND()已经落后于表中仅100 行的优化查询。

参考

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值