MySQL 5.7-8.4.1 Optimizing Data Size

MySQL 5.7Design your tables to minimize their space on the disk.

在设计表时尽量减少它们在磁盘上的空间。

This can result in huge improvements by reducing the amount of data written to and read from disk.

这可以通过减少写入和从磁盘读取的数据量而产生巨大的改进。

Smaller tables normally require less main memory while their contents are being actively processed during query execution.

较小的表通常需要较少的主内存,而它们的内容在查询执行期间被积极处理。

Any space reduction for table data also results in smaller indexes that can be processed faster.

任何表数据空间的减少也会导致更小的索引,处理速度更快。

MySQL supports many different storage engines (table types) and row formats.

MySQL支持许多不同的存储引擎(表类型)和行格式。

For each table, you can decide which storage and indexing method to use. Choosing the proper table format for your application can give you a big performance gain.

对于每个表,您可以决定使用哪种存储和索引方法。为应用程序选择合适的表格式可以极大地提高性能。

See Chapter 14, The InnoDB Storage Engine, and Chapter 15, Alternative Storage Engines.

You can get better performance for a table and minimize storage space by using the techniques listed here:

通过使用下面列出的技术,您可以获得更好的表性能,并最小化存储空间:

Table Columns

  • Use the most efficient (smallest) data types possible. MySQL has many specialized types that save disk space and memory. For example, use the smaller integer types if possible to get smaller tables. MEDIUMINT is often a better choice than INT because a MEDIUMINT column uses 25% less space.

  • 尽可能使用最有效(最小)的数据类型。MySQL有许多专门的类型,可以节省磁盘空间和内存。例如,如果可能,可以使用较小的整数类型来获得较小的表。MEDIUMINT通常是比INT更好的选择,因为MEDIUMINT列使用的空间少25%。

  • Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.

  • 如果可能,将列声明为NOT NULL。通过更好地使用索引和消除测试每个值是否为NULL的开销,它使SQL操作更快。您还可以节省一些存储空间,即每列节省一位。如果您的表中确实需要NULL值,那么就使用它们。只需避免默认设置,即允许在每个列中都有NULL值。

Row Format

InnoDB tables are created using the DYNAMIC row format by default.

InnoDB表默认使用DYNAMIC行格式创建。

To use a row format other than DYNAMIC, configure innodb_default_row_format, or specify the ROW_FORMAT option explicitly in a CREATE TABLE or ALTER TABLE statement.

要使用非DYNAMIC的行格式,需要配置innodb_default_row_format,或者在CREATE TABLE或ALTER TABLE语句中显式指定ROW_FORMAT选项。

The compact family of row formats, which includes COMPACTDYNAMIC, and COMPRESSED, decreases row storage space at the cost of increasing CPU use for some operations.

紧凑行格式家族(包括compact、DYNAMIC和COMPRESSED)减少了行存储空间,但某些操作增加了CPU使用。

If your workload is a typical one that is limited by cache hit rates and disk speed it is likely to be faster. If it is a rare case that is limited by CPU speed, it might be slower.

如果您的工作负载是受缓存命中率和磁盘速度限制的典型工作负载,那么它可能会更快。如果是受CPU速度限制的罕见情况,则可能会更慢。

The compact family of row formats also optimizes CHAR column storage when using a variable-length character set such as utf8mb3 or utf8mb4. With ROW_FORMAT=REDUNDANTCHAR(N) occupies N × the maximum byte length of the character set.

当使用utf8mb3或utf8mb4等变长字符集时,紧凑的行格式家族还可以优化CHAR列存储。使用ROW_FORMAT=REDUNDANT, CHAR(N)占用字符集的最大字节长度的N ×。

Many languages can be written primarily using single-byte utf8 characters, so a fixed storage length often wastes space. With the compact family of rows formats, InnoDB allocates a variable amount of storage in the range of N to N × the maximum byte length of the character set for these columns by stripping trailing spaces.

许多语言可以主要使用单字节utf8字符编写,因此固定的存储长度通常会浪费空间。在紧凑的行格式家族中,InnoDB通过剥离尾随空格,为这些列分配了N到N ×字符集的最大字节长度的可变容量。

The minimum storage length is N bytes to facilitate in-place updates in typical cases.

最小存储长度为N字节,以便在典型情况下就地更新。

For more information, see Section 14.11, “InnoDB Row Formats”.

To minimize space even further by storing table data in compressed form, specify ROW_FORMAT=COMPRESSED when creating InnoDB tables, or run the myisampack command on an existing MyISAM table.

为了通过以压缩形式存储表数据来进一步减少空间,在创建InnoDB表时指定ROW_FORMAT= compressed,或者在现有的MyISAM表上运行myisampack命令

(InnoDB compressed tables are readable and writable, while MyISAM compressed tables are read-only.)

(InnoDB压缩表是可读可写的,MyISAM压缩表是只读的。)

For MyISAM tables, if you do not have any variable-length columns (VARCHARTEXT, or BLOB columns), a fixed-size row format is used.

对于MyISAM表,如果没有任何变长列(VARCHAR、TEXT或BLOB列),则使用固定大小的行格式。

This is faster but may waste some space.

这样更快,但可能会浪费一些空间。

See Section 15.2.3, “MyISAM Table Storage Formats”.

You can hint that you want to have fixed length rows even if you have VARCHAR columns with the CREATE TABLE option ROW_FORMAT=FIXED.

您可以使用CREATE TABLE选项ROW_FORMAT= fixed提示您想要固定长度的行,即使您有VARCHAR列。

Indexes

The primary index of a table should be as short as possible.

表的主索引应该尽可能短。

This makes identification of each row easy and efficient.

这使得识别每一行变得简单而高效。

For InnoDB tables, the primary key columns are duplicated in each secondary index entry, so a short primary key saves considerable space if you have many secondary indexes.

对于InnoDB表,主键列在每个辅助索引项中都是重复的,所以如果有很多辅助索引,短主键可以节省相当大的空间。

Create only the indexes that you need to improve query performance.

只创建需要提高查询性能的索引。

Indexes are good for retrieval, but slow down insert and update operations.

索引有利于检索,但会降低插入和更新操作的速度。

If you access a table mostly by searching on a combination of columns, create a single composite index on them rather than a separate index for each column.

如果访问表主要是通过对列的组合进行搜索,那么在这些列上创建单个复合索引,而不是为每个列创建单独的索引。

The first part of the index should be the column most used. If you always use many columns when selecting from the table, the first column in the index should be the one with the most duplicates, to obtain better compression of the index.

索引的第一部分应该是最常用的列。如果在从表中进行选择时总是使用许多列,那么索引中的第一列应该是重复次数最多的一列,以获得更好的索引压缩。

If it is very likely that a long string column has a unique prefix on the first number of characters, it is better to index only this prefix, using MySQL's support for creating an index on the leftmost part of the column (see Section 13.1.14, “CREATE INDEX Statement”).

如果很有可能一个长字符串列在第一个字符上有一个唯一的前缀,那么最好只索引这个前缀,使用MySQL支持在列的最左边创建索引(参见13.1.14节,“CREATE index Statement”)。

Shorter indexes are faster, not only because they require less disk space, but because they also give you more hits in the index cache, and thus fewer disk seeks.

更短的索引更快,不仅因为它们需要更少的磁盘空间,还因为它们在索引缓存中提供了更多的命中,从而减少了磁盘搜索。

See Section 5.1.1, “Configuring the Server”.

Joins

In some circumstances, it can be beneficial to split into two a table that is scanned very often.

在某些情况下,将一个经常被扫描的表分成两个可能是有益的。

This is especially true if it is a dynamic-format table and it is possible to use a smaller static format table that can be used to find the relevant rows when scanning the table.

如果它是一个动态格式的表,并且可以使用更小的静态格式表来查找扫描表时的相关行,这一点尤其正确。

Declare columns with identical information in different tables with identical data types, to speed up joins based on the corresponding columns.

在具有相同数据类型的不同表中声明具有相同信息的列,以基于相应列加速连接。

Keep column names simple, so that you can use the same name across different tables and simplify join queries.

保持列名简单,以便可以在不同的表中使用相同的名称,并简化连接查询。

For example, in a table named customer, use a column name of name instead of customer_name. To make your names portable to other SQL servers, consider keeping them shorter than 18 characters.

例如,在名为customer的表中,使用name的列名而不是customer_name。要使您的名字可移植到其他SQL服务器,请考虑将名称保持在18个字符以下。

Normalization

标准化

Normally, try to keep all data nonredundant (observing what is referred to in database theory as third normal form).

通常情况下,尽量使所有数据都是非冗余的(观察数据库理论中所谓的第三种范式)。

Instead of repeating lengthy values such as names and addresses, assign them unique IDs, repeat these IDs as needed across multiple smaller tables, and join the tables in queries by referencing the IDs in the join clause.

不要重复冗长的值,如名称和地址,而是为它们分配惟一的id,根据需要在多个较小的表中重复这些id,并在查询中通过引用join子句中的id来连接表。

If speed is more important than disk space and the maintenance costs of keeping multiple copies of data, for example in a business intelligence scenario where you analyze all the data from large tables, you can relax the normalization rules, duplicating information or creating summary tables to gain more speed.

如果速度比磁盘空间和保存多个数据副本的维护成本更重要,例如在分析大型表中的所有数据的业务智能场景中,您可以放松规范化规则,复制信息或创建汇总表,以获得更高的速度。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值