mysql 复制db_以最简单的形式了解mysql db复制

最新推荐文章于 2021-02-08 18:52:48 发布

weixin_26729283

最新推荐文章于 2021-02-08 18:52:48 发布

阅读量389

点赞数

文章标签： mysql java 数据库 python 机器学习

原文链接：https://medium.com/@omify/understanding-mysql-db-replication-in-its-simplest-form-37df899ca166

版权

mysql 复制db

This is one of my favourite topic and as you know everyone love Database (as far as I know) as well as everyone hate databases (Sadly, when it comes to single point of failure). We at Goibibo Tech use MySQL to its core and we do have Master Slave database cluster in our system. Oh wait..

这是我最喜欢的主题之一，众所周知，每个人都喜欢数据库(据我所知)，每个人都讨厌数据库(可悲的是，当涉及到单点故障时)。 Goibibo Tech的核心使用MySQL，我们的系统中确实有Master Slave数据库集群。等一下..

Note — There is an assumption made in the blog that readers are aware of MySQL and AWS Infrastructure in its basic form.

注–博客中假设读者了解基本形式MySQL和AWS基础设施。

Before jumping into DB Replication lets understand what is Cluster, HA and HA Cluster in simple terms —

在进入数据库复制之前，我们先简单地了解一下集群，HA和HA集群-

Cluster is a provision where you can replicate the same instance type (be it AWS EC2 machines, Elastic Cache or RDS) to support Horizontal Scaling

群集是一项条款，您可以在其中复制相同的实例类型(无论是AWS EC2计算机，Elastic Cache还是RDS)以支持水平扩展

HA or High Availability is the provision where you have Master-Slave combination to segregate the READ & WRITE traffic to support high availability of the database system.

HA或高可用性是您具有主从组合的功能，可将读写流量分开以支持数据库系统的高可用性。

HA Cluster as the name suggest itself, its a provision to have Master-Slave combination with flexibility to horizontally scale for high availability.

顾名思义， HA群集本身就是一项主从服务器组合，可灵活地水平扩展以实现高可用性。

Below is an example of Redis DB, Redis Cluster, Redis HA Cluster -

以下是Redis DB，Redis群集，Redis HA群集的示例-

Image for post — High Availability Redis DB System

Thus we also have Master-Slave HA Cluster to support the numerous transactional and non-transactional queries on our MySQL DB.

因此，我们也具有主从HA群集，以支持MySQL数据库上的大量事务和非事务查询。

But is this cluster sufficient enough to handle the load coming from Application Layer? Of course NOT thus we should have a DB Load Balancer or in AWS term let’s say an ELB (Elastic Load Balancer)

但是，此集群足以应付来自应用程序层的负载吗？当然不是，因此我们不应该拥有一个数据库负载平衡器，或者用AWS术语来说，就是ELB(弹性负载平衡器)

GitHub uses “Ferno” as DB Throttling Layer before their Master-Slave Cluster.

GitHub在其主从集群之前将“ Ferno”用作数据库限制层。

Good, so our set up is done and we look good to handle the DB connections, horizontally scale the DB instances on real time basis and distribute the centralised load of course. But hold on isn’t this blog is about Master Slave Replication and how does it really works? Oh yeah about that..

很好，因此我们的设置已完成，并且看起来很不错，可以处理数据库连接，实时地水平扩展数据库实例，当然还可以分散集中的负载。但是请保留，这不是有关主从复制的博客，它实际上如何工作？哦，是的。

数据库复制如何工作？ (How does DB Replication Work?)

Replication relies on three threads per master/slave connection:

复制依赖于每个主/从连接的三个线程：

One is created on the master and Two are created on the slaves.

在主机上创建一个，在从机上创建两个。

The Slave I/O Thread - When you issue START SLAVE on a slave server, the slave creates this thread which connects to the master and requests a copy of the master’s binary log.
从属I / O线程-当您在从属服务器上发布START SLAVE时，从属将创建此线程，该线程连接到主服务器并请求主服务器的二进制日志的副本。
The Binlog Dump Thread - When the slave connects to the master, the master uses this thread to send the slave the contents of its binary log.
Binlog转储线程-当从属服务器连接到主服务器时，主服务器使用此线程向从属服务器发送其二进制日志的内容。
The Slave SQL Thread - The slaves creates this SQL (or applier) thread to read the contents of the retrieved binary log and apply its contents.
从属SQL线程-从属创建此SQL(或applier )线程以读取检索到的二进制日志的内容并应用其内容。

那么什么是数据库复制“滞后”？ (What is DB Replication “Lag” then?)

Replication lag is caused when either the I/O Thread or SQL Thread cannot cope with the demands placed upon it.

当I / O线程或SQL线程无法满足对它的要求时，就会导致复制滞后。

If the I/O Thread is suffering, this means that the network connection between the master and its slaves is slow. You might want to consider enabling the slave_compressed_protocol to compress network traffic or speaking to your network administrator.

如果I / O线程受到影响，则意味着主服务器及其从属服务器之间的网络连接很慢。您可能要考虑启用slave_compressed_protocol来压缩网络流量或与网络管理员交谈。

If it’s the SQL thread then your problem is probably due to poorly-optimized queries that are taking the slave too long to apply. There may be long-running transactions or too much I/O activity. Having no primary key on the slave tables when using the ROW or MIXED replication format is also a common cause of lag on this thread: check that both your master and slave versions of tables have a primary key.

如果是SQL线程，那么您的问题可能是由于查询优化不佳而导致从属服务器花费了太长时间而无法应用。可能有长期运行的事务或过多的I / O活动 。使用ROW或MIXED复制格式时，从属表上没有主键也是导致此线程滞后的常见原因：检查表的主版本和从属版本都具有主键。

Let’s see the Replication flow again NOW you will be able to understand the microscopic view :) (I hope so)

现在让我们再次查看复制流程，您将能够理解微观视图：)(我希望如此)

Curious about bin-log? Don’t worry MySQL provides you under the hood system level queries which you can fire for eg. MySql >SHOW VARIABLES

对bin-log感到好奇吗？不用担心，MySQL为您提供了系统级的查询，例如，您可以对其进行触发。 MySql> SHOW变量

如何解决数据库复制滞后问题？ (How to Troubleshoot DB Replication Lag Issues?)

There are various ways you can approach this issue whenever there is a lag and if you have the admin access you can follow the steps mentioned here -

每当出现滞后时，都有多种方法可以解决此问题，如果您拥有管理员权限，则可以按照此处提到的步骤进行操作-

Simply run this SQL statement in your master as well as slave node that is suspected experiencing a replication lag.

只需在怀疑存在复制滞后的主节点和从节点中运行此SQL语句即可。

master> SHOW MASTER STATUS;
+---------------+-----------+
| File          | Position  |...
+---------------+-----------+
| binlog.000002 | 121752008 |...
+---------------+-----------+slave> SHOW SLAVE STATUS;
********************* 1. row *********************
         Slave_IO_State: Waiting for master to send event
            Master_Host: 127.0.0.1
            Master_User: replication
            Master_Port: 22808
          Connect_Retry: 60
        Master_Log_File: binlog.000002
    Read_Master_Log_Pos: 121409852
         Relay_Log_File: relaylog.000002
          Relay_Log_Pos: 119819329
  Relay_Master_Log_File: binlog.000002
       Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
...
    Exec_Master_Log_Pos: 120003004
        Relay_Log_Space: 121226377

The initial fields that are common to trace for problems are,

跟踪问题的常见初始字段是：

Slave_IO_State — It tells you what the thread is doing. This field will provide you good insights if the replication health is running normally, facing network problems such as reconnecting to a master, or taking too much time to commit data which can indicate disk problems when syncing data to disk.
Slave_IO_State —它告诉您线程在做什么。如果复制运行状况是否正常运行，是否存在网络问题(例如重新连接到主数据库)或花费太多时间来提交数据(这可能表明在将数据同步到磁盘时出现磁盘问题)，此字段将为您提供深刻的见解。
Master_Log_File — Master’s binlog file name where the I/O thread is currently fetch.
Master_Log_File —当前正在获取I / O线程的Master的binlog文件名。
Read_Master_Log_Pos — binlog file position from the master where the replication I/O thread has already read.
Read_Master_Log_Pos —来自复制I / O线程已读取的主服务器上的binlog文件位置。
Relay_Log_File — the relay log filename for which the SQL thread is currently executing the events
Relay_Log_File — SQL线程当前正在为其执行事件的中继日志文件名
Relay_Log_Pos — binlog position from the file specified in Relay_Log_File for which SQL thread has already executed.
Relay_Log_Pos —在Relay_Log_File中指定的文件中的binlog位置，已对其执行SQL线程。
Relay_Master_Log_File — The master’s binlog file that the SQL thread has already executed and is a congruent to Read_Master_Log_Pos value.
Relay_Master_Log_File — SQL线程已执行且与Read_Master_Log_Pos值完全相同的主设备的binlog文件。
Seconds_Behind_Master — this field shows an approximation for difference between the current timestamp on the slave against the timestamp on the master for the event currently being processed on the slave. However, this field might not be able to tell you the exact lag if the network is slow because the difference in seconds are taken between the slave SQL thread and the slave I/O thread. So there can be cases that it can be caught up with slow-reading slave I/O thread.
Seconds_Behind_Master-该字段显示从站上当前时间戳与当前在从站上正在处理的事件的主站时间戳之间的差异的近似值。但是，如果网络速度较慢，此字段可能无法告诉您确切的滞后时间，因为从SQL线程和从I / O线程之间的时间差为秒。因此，在某些情况下，它可能会被读取速度较慢的从属I / O线程所追赶。
Slave_SQL_Running_State — state of the SQL thread and the value is identical to the state value displayed in SHOW PROCESSLIST.
Slave_SQL_Running_State — SQL线程的状态，该值与SHOW PROCESSLIST中显示的状态值相同。

In some cases, SHOW SLAVE STATUS is not enough to tell us the culprit. It’s possible that the replicated statements are affected by internal processes running in the MySQL database slave. Running the statements

在某些情况下，SHOW SLAVE STATUS不足以告诉我们罪魁祸首。复制的语句可能会受到MySQL数据库从站中运行的内部进程的影响。运行语句

SHOW [FULL] PROCESSLIST and SHOW ENGINE INNODB STATUS also provides informative data that gives you insights about the source of the problem.

SHOW [FULL] PROCESSLIST和SHOW ENGINE INNODB STATUS还提供了有用的数据，使您可以深入了解问题的根源。

A simple POC will have much more value than theoretical knowledge so please check the same in Staging DB (Master Slave Replica) Get Dirty!!

一个简单的POC将比理论知识具有更多的价值，因此请在登台DB(主从副本)中获取同样的信息！

Hope this article will be able to clarify some of the common doubts and myths about Master Slave Replication and of course the famous “Lag”. Cheers.

希望本文能够阐明一些有关主从复制以及当然著名的“滞后”的常见疑问和神话。干杯。