mysql nosql引擎_MySQL和NoSQL：帮助我选择合适的一个

最新推荐文章于 2024-01-28 14:31:07 发布

weixin_39622891

最新推荐文章于 2024-01-28 14:31:07 发布

阅读量103

点赞数

文章标签： mysql nosql引擎

本文链接：https://blog.csdn.net/weixin_39622891/article/details/113593057

版权

小编典典

您应该阅读以下内容，并学习一些有关精心设计的innodb表的优点以及如何最好地使用聚集索引的信息-只有innodb才有！

然后根据以下简化示例设计系统：

模式示例(简化)

重要的功能是表使用innodb引擎，并且线程表的主键不再是单个auto_incrementing键，而是基于forum_id和thread_id组合的复合

集群键。例如

threads - primary key (forum_id, thread_id)

forum_id thread_id

======== =========

1 1

1 2

1 3

1 ...

1 2058300

2 1

2 2

2 3

2 ...

2 2352141

...

每个论坛行都包含一个称为next_thread_id(无符号int)的计数器，该计数器由触发器维护，并在每次将线程添加到给定论坛时递增。这也意味着如果对thread_id使用单个auto_increment主键，则每个论坛可以存储40亿个线程，而不是总共40亿个线程。

forum_id title next_thread_id

======== ===== ==============

1 forum 1 2058300

2 forum 2 2352141

3 forum 3 2482805

4 forum 4 3740957

...

64 forum 64 3243097

65 forum 65 15000000 -- ooh a big one

66 forum 66 5038900

67 forum 67 4449764

...

247 forum 247 0 -- still loading data for half the forums !

248 forum 248 0

249 forum 249 0

250 forum 250 0

使用组合键的缺点是您不能再按以下单个键值选择线程：

select * from threads where thread_id = y;

你所要做的：

select * from threads where forum_id = x and thread_id = y;

但是，您的应用程序代码应知道用户正在浏览哪个论坛，因此实施起来并非难事-将当前查看的forum_id存储在会话变量或隐藏的表单字段等中。

这是简化的架构：

drop table if exists forums;

create table forums

(

forum_id smallint unsigned not null auto_increment primary key,

title varchar(255) unique not null,

next_thread_id int unsigned not null default 0 -- count of threads in each forum

)engine=innodb;

drop table if exists threads;

create table threads

(

forum_id smallint unsigned not null,

thread_id int unsigned not null default 0,

reply_count int unsigned not null default 0,

hash char(32) not null,

created_date datetime not null,

primary key (forum_id, thread_id, reply_count) -- composite clustered index

)engine=innodb;

delimiter #

create trigger threads_before_ins_trig before insert on threads

for each row

begin

declare v_id int unsigned default 0;

select next_thread_id + 1 into v_id from forums where forum_id = new.forum_id;

set new.thread_id = v_id;

update forums set next_thread_id = v_id where forum_id = new.forum_id;

end#

delimiter ;

您可能已经注意到，我已经将reply_count包含在主键中，这有点奇怪，因为(forum_id，thread_id)复合物本身是唯一的。这只是一个索引优化，在执行使用reply_count的查询时可以节省一些I

/ O。请参阅上面的2个链接，以获取有关此信息的更多信息。

查询示例

我仍在将数据加载到示例表中，到目前为止，我已加载了大约。5亿行(是系统的一半)。加载过程完成后，我应该期望大约：

250 forums * 5 million threads = 1250 000 000 (1.2 billion rows)

我故意使某些论坛包含超过500万个线程，例如，论坛65具有1500万个线程：

forum_id title next_thread_id

======== ===== ==============

65 forum 65 15000000 -- ooh a big one

查询运行时

select sum(next_thread_id) from forums;

sum(next_thread_id)

===================

539,155,433 (500 million threads so far and still growing...)

在innodb下，对next_thread_ids求和以得出总线程数要比通常快得多：

select count(*) from threads;

论坛65有多少个线程：

select next_thread_id from forums where forum_id = 65

next_thread_id

==============

15,000,000 (15 million)

再次，这比平常更快：

select count(*) from threads where forum_id = 65

好的，现在我们知道到目前为止，我们大约有5亿个线程，而论坛65上有1500万个线程-让我们看看模式是如何执行的:)

select forum_id, thread_id from threads where forum_id = 65 and reply_count > 64 order by thread_id desc limit 32;

runtime = 0.022 secs

select forum_id, thread_id from threads where forum_id = 65 and reply_count > 1 order by thread_id desc limit 10000, 100;

runtime = 0.027 secs

在我看来，性能非常好-因此，这是一个具有500+百万行(并且正在不断增长)的单表，其查询在0.02秒内(当处于负载状态下)覆盖了1500万行！

进一步优化

其中包括：

按范围划分

分片

投入金钱和硬件

等等…

希望这个答案对您有帮助:)

2020-05-17