mysql集群的最佳实践 Best Practices For MySQL Cluster (文档 ID 1926680.1)

本文来自oracle官方support


 

In this Document

Purpose
 Scope
 Details
 Architecture and Hardware
 General Performance Considerations
 Configuration of mysqld Nodes
 Binary Logging
 Indexes
 BLOB/TEXT Columns
 Disk Data
 Access Methods
 Monitoring
 Backups
 References

APPLIES TO:

MySQL Cluster - Version 7.1 and later
Information in this document applies to any platform.

PURPOSE

Provide an overview of the best practices for MySQL Cluster.

SCOPE

Detailed general guidance for DBAs already familiar with MySQL Cluster seeking guidance for considerations how to use MySQL Cluster the best way.

DETAILS

MySQL Cluster is a real-time transactional mostly in-memory storage engine. These three characteristics are the main components shaping how MySQL Cluster is best used.

In general the many of the best practice advices that apply to other database systems such as InnoDB also apply to MySQL Cluster, but the real-time design of MySQL Cluster will in general magnifies the advices:

  • keep transactions small - in Cluster all the transaction related objects are stored in memory, so large transactions directly reflects the amount of memory the data nodes require
  • do not leave transactions uncommitted
  • use indexes to reduce the cost of queries:
    • Primary Lookups are the cheapest - this is a strong point of MySQL Cluster.
    • Unique key lookups are in general cheaper than non-unique indexes.
    • Non-unique indexes lookups and scans are better than table scans.
  • the application should be prepared to handle temporary errors such as lock wait timeout

 

Additionally it applies to NDBCluster that all tables must have a Primary Key; if none are added explicitly a hidden Primary Key is added. The hidden Primary Key cannot be used in connection with BLOB/TEXT columns and will not be the same on the master and slave when replication is set up. As MySQL Cluster only supports ROW based replication, an explicit Primary Key or not-NULL unique index is required for optimal performance. So it is important always to explicitly define a Primary Key.

It is also important to keep in mind that the MySQL Cluster data nodes allocate all memory up front. This is to ensure that delays do not occur if more memory is needed as that would violate the real-time promise. As transactions, operations, tables, etc. also requires memory, allocations must be specified those capacity settings up front as well. This means that capacity planning is important to avoid query failures.

The recommendations in MySQL Performance Tuning Best Practices also holds true for MySQL Cluster.

In the following sections various aspects of MySQL Cluster will be considered in more details.

 

Architecture and Hardware

The following are important considerations for setting up a MySQL Cluster:

  • Make sure nodes using for arbitration are hosted on different physical hardware than the data nodes. Otherwise failure handling may not work as expected. For example if the arbitrator is on the same physical server as half the data nodes, and the server is shut down, it will cause a total cluster outage as the remaining data nodes cannot contact the arbitrator. In this connection it is important to note that only one node (management node in general) is arbitrator at a time and the node being the arbitrator will not change while handling a failure of one or more nodes.
  • Data nodes may be hosted on the same physical server, but you should have the following in mind:
    • Make sure the server has enough memory and CPU cores to avoid swapping and contention.
    • More so than for a single data node per host, consider using ThreadConfig to lock the data node threads to CPU cores to avoid contention between the data nodes. This requires the use of ndbmtd.
  • If CPU locking is configured be aware of:
    • Non-ndbmtd processes also uses CPU, so these should be locked to other CPU cores or excluded from the ones used by ndbtmd.
    • If you have hyperthreading enabled, make sure the two threads locked to the same physical CPU can share the instruction cache in the L2 cache. This can for example be achieved by assigning two LDM threads. Alternatively if you cannot use all CPUs, ensure that the other virtual core is not used.
  • It is preferred to use physical hosts rather than virtual hardware. The reason for this is that it is easier to predict the performance and avoid interference between virtual machines on the same physical host. This is true for all database systems, but in particularly for MySQL Cluster as it aims to promise real-time performance.
  • While MySQL Cluster is primarily an in-memory database, unless you are using diskless mode, a lot of disk activity will still happen. This means you should ensure that you disk I/O bandwidth is high enough to be able to sustain the I/O generated by the data nodes. You should consider spreading the I/O activity across multiple disks to avoid saturation the disks. Activities causing I/O includes:
    • Local Checkpoints (LCP) to make the data persistent
    • the REDO log to store data changes made between LCPs
    • backups
    • Disk data
  • As MySQL Cluster is a distributed system the network can easily become a bottleneck. Make sure you have sufficient bandwidth and consider having direct links between data nodes. A 10Gbit network can quickly become too slow for a high performance cluster. For the same reason it is recommended that the data nodes have their own dedicated network.
  • MySQL Cluster does not encrypt data send between nodes or require login to connect (other than if you connect through mysqld). So make sure that the cluster is appropriately isolated and behind a firewall.

 

General Performance Considerations

The following general performance considerations should be considered when using MySQL Cluster:

  • MySQL Cluster works best with single table Primary Key lookups.
  • While MySQL Cluster 7.2 and later has much improved support for JOINs, due to the distributed nature of the data nodes JOINs are still relatively more expensive than for non-distributed systems.
  • Range scans have relatively poor performance compared to InnoDB and MyISAM.
  • NULLs are not supported for unique hash index lookups (see also below).
  • In MySQL Cluster 7.2 and later use ANALYZE TABLE to calculate index statistics. Note these are not persistent.
  • The Records in range statistics are not as reliable as for e.g. InnoDB. Use index hints to change the execution plan if a non-optimal index has been chosen.
  • Both LCPs and backups support compression. This can be used to reduce I/O at the cost of increased CPU usage.
  • Consider denormalising the schema to reduce the number of JOINs.
  • mysqld nodes and the NDB API are partition aware, so by choosing a column for partitioning that allows partition pruning, you can reduce the number of data nodes used for a query and thus the amount of messaged required.
  • If you have tables that does not need to retain their data when the cluster is restarted, create the table with the ndb_table_no_logging option enabled to avoid that LCPs will include it. For example:
    mysql> SET SESSION ndb_table_no_logging = ON;
    Query OK, 0 rows affected (0.58 sec)
    
    mysql> CREATE TABLE t_nologgin (
        ->    id int unsigned NOT NULL auto_increment PRIMARY KEY,
        ->    val varchar(10) NOT NULL DEFAULT ''
        -> ) ENGINE=NDBCluster;
    Query OK, 0 rows affected (1.07 sec)
    
    mysql> SET SESSION ndb_table_no_logging = OFF;
    Query OK, 0 rows affected (0.00 sec)
  • Use batching: for example instead of five INSERT statements each inserting one row, use one INSERTstatement inserting all five rows. Though bear in mind that relatively small transactions are preferred.
  • Keep transactions small. As a rule of thumb try not to change more that 1000-10000 rows per transaction.
  • Consider enabling transaction_allow_batching for transactions supporting it. For example:
    mysql> SET SESSION ndb_table_no_logging = OFF;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> SET SESSION autocommit = OFF;
    Query OK, 0 rows affected (0.01 sec)
    
    mysql> SET SESSION transaction_allow_batching = ON;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> START TRANSACTION;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> INSERT INTO t1 (val) VALUES ('a');
    Query OK, 1 row affected, 1 warning (0.01 sec)
    
    mysql> INSERT INTO t1 (val) VALUES ('b');
    Query OK, 1 row affected, 1 warning (0.01 sec)
    
    mysql> INSERT INTO t1 (val) VALUES ('c');
    Query OK, 1 row affected, 1 warning (0.00 sec)
    
    mysql> INSERT INTO t1 (val) VALUES ('d');
    Query OK, 1 row affected, 1 warning (0.00 sec)
    
    mysql> UPDATE t1 SET val = 'a1' WHERE id = 1;
    Query OK, 0 rows affected (0.04 sec)
    Rows matched: 0  Changed: 0  Warnings: 0
    
    mysql> COMMIT;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> SET SESSION autocommit = ON;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> SET SESSION transaction_allow_batching = OFF;
    Query OK, 0 rows affected (0.00 sec)
    Note: Be sure to read the documentation for transaction_allow_batching before using it.
  • Do not enable the Query Cache. It is very expensive to invalidate it across all the mysqld nodes. A write on either server will have to invalidate the caches on all other mysqld nodes.
  • If you need to generate large reports (OLAP) consider creating an InnoDB slave of the Cluster.

 

Configuration of mysqld Nodes

If you are using mysqld nodes to execute queries, you can consider the following options for a general performance improvement:

  • ndb_autoincrement_prefetch_sz - can be used to improve auto-increment performance. The higher, the less auto-increment overhead but also the more gaps and interleaving of auto-increment values.
  • ndb-cluster-connection-pool - this specified the number of connections the mysqld node makes to the data nodes. The more connections, the better the mysqld node can support concurrent queries on NDBCluster tables. As a rule of thumb, set this to the number of CPU cores available for the mysqld node.

    Note: each connection requires an mysqld/API node id. So you need to make sure the cluster configuration file has enough mysqld/API nodes configured for the  ndb-cluster-connection-poolsetting chosen.

While it can seem like increasing the option ndb-batch-size from its default value will improve performance, it is recommended to leave it at its default value, as large values can easily cause temporary overloads on the data nodes causing queries to fail.

 

Binary Logging

Because updates can happen in multiple API nodes concurrently, mysqld nodes that have the binary log enabled, will not only have to write the binary log for a single node. This increases the load on the binlogging mysqld nodes compared to non-Cluster MySQL instances.

For this reason, it is recommended to dedicate mysqld nodes to handle the binary logging and not use those to execute queries. It is also recommended to ensure that the binary logs are not written to the same disk system as is used for other uses.

That each binlogging mysqld node logs for all API nodes also means that restarting the mysqld node will cause gaps in the binary log if any other node makes any data changes while mysqld is offline. For that reason it is recommended to have two binlogging mysqld nodes, so there can be at least one online at all times. See also: Using Two Replication Channels for MySQL Cluster Replication.

 

Indexes

When adding indexes to NDBCluster tables in addition to consider which indexes can aid your queries, you should also take into consideration that MySQL Cluster supports two types of indexes:

  • Hash indexes - these are required for unique indexes (including the Primary Key) and are stored in IndexMemory. Hash indexes are used for equality comparisons (e.g. WHERE val = 1) as an index lookup using a hash is cheap compared to finding a row using an ordered index.

    Note that NDBCluster implements unique indexes by storing the index values as the Primary Key in a separate table. This has the following consequences:
    • NULL values for unique indexes can't be stored in the unique index tables, so index access with NULL results in a full ordered index or table scan.
    • Using a unique key for an index lookup involves using the index table as well. This makes a Primary Key lookup cheaper than the corresponding unique key lookup.
  • Ordered indexes - this is for non-unique indexes and is optional for unique indexes (added by default). Ordered indexes are particular useful for range conditions and inequality comparisons.

That the indexes are stored in their own tables can also be seen from the following example:

mysql> CREATE TABLE t1 (
    ->    id int unsigned NOT NULL,
    ->    val int NOT NULL,
    ->    PRIMARY KEY (id),
    ->    UNIQUE KEY (val)
    -> ) ENGINE=NDBCluster;
Query OK, 0 rows affected (0.21 sec)

shell$ ndb_show_tables
id    type                 state    logging database     schema   name       
...
12    UserTable            Online   Yes     db1          def      t1
13    OrderedIndex         Online   No      sys          def      PRIMARY
14    OrderedIndex         Online   No      sys          def      val
15    UniqueHashIndex      Online   Yes     sys          def      val$unique

Note how the ndb_show_tables output has two table entries for the val index: one for the ordered index (which isn't actually a table), and one for the unique hash index (which is using a table). You can also see that the Primary Key does not have an additional table for the unique hash index. That's not needed as the unique has tables for secondary unique indexes are used to store the index values as the Primary Key.

That the secondary unique indexes creates an additional table also means that when you configureMaxNoOfTables, you should also take the number of unique indexes into account.

The support for two index types is worth having in mind when adding unique indexes. If you only need the unique index for equality comparisons (including the constraint checks), you can save memory (remember all indexes are stored in-memory for NDBCluster tables) by only creating the hash index. You can do that with the USING HASH clause when creating the index, for example:

CREATE TABLE t1 (
   id int unsigned NOT NULL,
   val int NOT NULL,
   PRIMARY KEY (id),
   UNIQUE KEY (val) USING HASH
) ENGINE=NDBCluster;

This will also help reduce the value needed for MaxNoOfOrderedIndexes.

 

BLOB/TEXT Columns

MySQL Cluster implements BLOB/TEXT columns by storing the first 256 bytes (plus an 8 byte pointer) in the main table and the rest of the field in a complementary table with one complementary table per BLOB/TEXT column. These tables can for example be seen from the ndb_desc output:

mysql> CREATE TABLE t2 (
    ->    id int unsigned NOT NULL,
    ->    val TEXT,
    ->    PRIMARY KEY (id)
    -> ) ENGINE=NDBCluster;
Query OK, 0 rows affected (0.16 sec)

mysql> INSERT INTO t2 VALUES (1, REPEAT('a', 4096));
Query OK, 1 row affected (0.06 sec)

shell$ ndb_desc --extra-partition-info --blob-info --database=db1 t2
-- t2 --
...
-- Attributes --
id Unsigned PRIMARY KEY DISTRIBUTION KEY AT=FIXED ST=MEMORY
val Text(256,2000,0;latin1_swedish_ci) NULL AT=MEDIUM_VAR ST=MEMORY BV=2 BT=NDB$BLOB_28_1
-- Indexes --
PRIMARY KEY(id) - UniqueHashIndex
PRIMARY(id) - OrderedIndex
-- Per partition info for t2 --
Partition       Row count       Commit count    Frag fixed memory       Frag varsized memory    Extent_space    Free extent_space
1               1               2               32768                   32768                   0               0

-- Per partition info for NDB$BLOB_28_1 --
Partition       Row count       Commit count    Frag fixed memory       Frag varsized memory    Extent_space    Free extent_space
1               2               2               32768                   32768                   0               0

There are a few points to note about the above output:

  • For the val column note "BT=NDB$BLOB_28_1". This is the name of the complementary BLOB table.
  • From the "Per partition info for t2" it can be seen there is 1 row, but the NDB$BLOB_28_1has two rows. This is because each row in the complementary BLOB table will hold at most 2000 bytes, i.e. in this case:
    • The first 256 bytes are stored in the main table.
    • Bytes 257 to 2256 are stored in the first row of the complementary BLOB table.
    • Bytes 2257 to 4096 are stored in the second row of the complementary BLOB table.

The consequences of this are that the use of BLOB/TEXT columns both increases the number of tables needed and the number of rows to read. The additional tables involved in queries using BLOB/TEXT columns also causes MySQL Cluster to use additional locking.

For these reasons it is important to limit the BLOB/TEXT usage as much as possible. For example do not use SELECT * on a table with one or more BLOB/TEXT columns if you do not actually need the BLOB/TEXT data in the result.

Note: It is in general recommended to avoid the use of SELECT *, but it is particularly the case for tables containing BLOB/TEXT columns.

 

Disk Data

While MySQL Cluster originally was an entirely in-memory database, it has since MySQL Cluster 5.1.6 been possible to store non-indexed data on disk. This reduces the memory requirements, but if you use disk data have the following in mind:

  • Only non-indexes columns can be stored on disk, so if most or all of the columns in a table are indexed, the benefit of disk data may be small or not existing at all.
  • Accessing disk data is slower and more complex than in-memory data, so there will be a performance impact. Configure Disk Data Configuration Parameters to reduce the impact of accessing the disks.
  • Data stored on disk is with the exception of BLOB/TEXT columns always fixed width. This will increase the amount of storage required, so make sure columns are not defined wider than necessary.
  • Each column uses 8 bytes in memory for a pointer to the disk data (BLOB/TEXT uses 256 bytes plus the pointer). This means that for some data types, storing the data on disk requires more memory than storing the data in-memory.

The recommendation is to limit the use of disk data and make sure you test well before implementing disk data in a production cluster.

 

Access Methods

MySQL Cluster supports a range of methods to access the data nodes. The methods have varying overhead, so for high performance clusters, you should take that into consideration. The access methods in order of increasing overhead are:

  • The C NDB API
  • No SQL APIs such as Cluster/J
  • mysqld nodes

 

Monitoring

It is strongly recommended to monitor both the mysqld/API nodes and the data nodes.

MySQL Enterprise Monitor 3.0 is one possibility and supports monitoring all of the node types. See How to Monitor a MySQL Cluster Data Node With MySQL Enterprise Monitor (MEM)? for more information.

In addition to monitoring metrics such as disk I/O, query response time, etc. that a monitoring solution can provide, it is important to review the logs at a regular basis.

 

Backups

As with all other database systems backups are recommended:

  • Make sure to copy the backups out of the servers where the data nodes are stored.
  • To the extend possible make backups when the load is low.
  • Consider whether the CPU overhead of compressing the backup is worth the savings in disk I/O.
  • Combine with the binary logs to allow for Point-in-Time recovery.
  • Create separate backup of the schema using mysqldump.

See also: What is the Recommended Process for a Backup of MySQL Cluster

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值