high performance mysql sakila_HPM Note4, Schema Optimization and Indexing

Choosing Optimal Data Types

Smaller is usually better

Simple is good

Avoid NULL if possible

Indexing Basics

Indexes are implemented in the storage engine layer, not the server layer

B-Tree indexes is the default index type of MySQL

B-Trees store the indexed columns in order, they're usefull for searching for ranges of data

B-Tree indexes work well for lookups by the full key value, a key range, or a key prefix

B-Tree limitations:

1. They are not usefull if the lookup does not start from the leftmost side of the indexed columns

2. You can't skip columns in the index

3. The storage engine can't optimize accesses with any columns to the right of the first range condition

Hash indexes

Hash index is built on a hash table and is useful only for exact lookups that use every column in the index

For each row, the storage engine computes a hash code of the indexed columns, which is a small value that will probably differ from the hash codes computed for other rows with different key values

It stores the hash codes in the index and stores a pointer to each row in a hash table

Hash indexes are the default index type for Memory tables in MySQL

Hash index limitations:

1. Index contains only hash code and row pointers, MySQL can't use the values in the index to avoid reading the rows

2. MySQL can't use has indexes for sorting bc they don't store rows in sorted order

3. Hash indexes don't support partial key matching bc they compute the hash from the entire indexed value

4. Hash indexes support only equality comparisons that use the =, IN(), and <=> operators, they don't support range queries

5. Accessing data in a hash index is very quick, unless there are many collisions(When there are multiple values with the same hash, the storage engine must follow each row pointer in the linked list and compare their values to the lookup value to find the right rows)

6. Some index maintenance operations can be slow if there are many hash collisions

Hash index are useful for lookup tables in a classic "star" schema data-warehousing

The InnoDB storage engine has a special feature called adaptive hash indexes

When InndoDB notices that some index values are being accessed very frequently, it builds a hash index for them in memory on top of B-Tree index

This process is completely automatic, and you can't control or configure it

You can build your own hash index if storage engine does not support it

Spatial(R-Tree) indexes

MyISAM supports spatial indexes, which you can use with geospatial types such as GEOMETRY

Full-text indexes

FULLTEXT is a special type of index for MyISAM tables

Full-text indexes are for MATCH AGAINST operations, not ordinary WHERE clause operations

Indexing Strategies for High performance

Isolate the Column

If you don't isolate the indexed columns in a query, MySQL generally can't use indexes on columns unless the columns are isolated in the query

// these queries can't use index

SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5;

SELECT ... WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(date_col) <= 10;

Prefix Inndexes and Index Selectivity

For long character columns such as BLOB or TEXT, we can index the prefix and make index selectivity close to full column's selectivity

Clustered Indexes

From wikipedia:

Index architectures can be classified as clustered or non-clustered.

Unclustered

An Unclustered index is structured in a way that doesn't correspond to the order of the actual data records. It resembles the words index at the back of a book. For this reason, it will perform worse than clustered indexes on ranged searches where the result set is large, since each result could cost an additional I/O-operation to get the actual data record. One can have any number of these kinds of indexes, since all that is required is the space to store the index itself -- one does not copy the actual data to create a new index.

Clustered

Clustering alters the data block into a certain distinct order to match the index, hence it is also an operation on the data storage blocks as well as on the index. An address book ordered by first name resembles a clustered index in its structure and purpose. The exact operation of database systems vary, but because storing data is very redundant the row data can only be stored in one order. Therefore, only one clustered index can be created on a given database table. Clustered indexes can greatly increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index, or when a range of items are selected.

Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files, others put two completely different data blocks within the same physical file(s).Clusturing is a great example of Indexing and is far more sophisticated than an unclustered index.

Now only solidDB and InnoDB support clustered indexes

InnoDB clusters the data by the primary key

If you don't define a primary key, InnoDB will try to use a unique nonnullable index instead

If there's no such index, InnoDB will define a hidden primary key for you and the cluster on that

Clustered indexes advantages:

1. You can keep related data close together

2. Data access is fast

3. Queries that use covering indexes can use the primary key values contained at the leaf node

Clustered indexes disadvantages:

1. Clustering gives the largest improvement for I/O-bound workfloads. If the data fits in memory the roder in which it's accessed doesn't really matter, so clustering doesn't give much benefit

2. Insert speeds depend heavily on insertion order

3. Updating the clustered index columns is expensive, because it forces InnoDB to move each updated row to a new location

4. Tables built upon clustered indexes are subject to page splits when new rows are inserted, or when a row's primary key is updated such that the row must be moved

5. Clustered tables can be slower for full table scans

6. Secondary(nonclustered) indexes can be larger than you might expect, because their leaf nodes contain the primary key columns of the referenced rows

7. Secondary index accesses require two index lookups instead of one. To find a row from a secondary index, the storage engine first finds the leaf node in the secondary index and then uses the primary key values stores there to navigate the primary key and find the row

An index that contains or covers all the data needed to satisfy a query is called a covering index

An Indexing Case Study

You should put range column to the end of index, because MySQL uses only the leftmost prefix, up to and including the first condition that specifies a range of values

Avoid multiple range conditions

The query can be slow if the user interface is paginated and someone requests a page that's not near the beginning

Denormalizing, precomputing, and caching are likely to be the only strategies that work for queries like this

An even better strategy is to limit the number of pages you let the user view

This is unlikely to impact the user's experience, because no one really cares about the 10,000th page of search results

Index and Table Maintenance

Finding and Repairing Table Corruption

CHECK TABLE

REPAIR TABLE

ALTER TABLE innodb_tbl ENGINE=INNODB;

myisamchk

backup and recovery

Updating Index Statistics

ANALYZE TABLE

SHOW INDEX FROM

INFORMATION_SCHEMA.STATICTICS

Reducing Index and Data Fragmentation

OPTIMIZE TABLE

dump and reload the data

Normalization and Denormalization

Database Normalization

Advantages of Normalized Schema:

1, Normalized upodates are usually faster than denormalized updates

2, When the data is well normalized, there's little or no duplicated data, so there's less data to change

3, Normalized tables are usually smaller, so they fit better in memory and perform better

4, The lack of redundant data means there's less need for DISTINCT or GROUP BY queries when retrieving lists of values

Drawbacks of Normalized Schema:

Any non-trivial query on a well-normalized schema will probably require at least one join, and perhaps several

This is not only expensive, but it can make some indexing strategies impossible

Advantages of Normalized Schema:

1, Everyhing is in the same table, which avoids joins. This can be much faster than a join, because it avoids random I/O

2, A single table can allow more efficient indexing strategies

In the real world, you often need to mix the approaches, possibly using a partially normalized schema, cache tables, and other techniques

The most common way to denormalized data is to duplicate or cache selected columns from one table in anotehr table, you can use triggers to udpate the cached values which makes the implementation easier

You'll often need extra idnexes, redundant fields, or even cache and summary tables to speed up read queries

Speeding Up ALTER TABLE

MySQL's ALTER TABLE performance can become a problem with very large tables, MySQL performs most alterations by making an empty table with the desired new structure, inserting all the data from the old table into the new one, and deleting the old table

ALTER TABLE sakila.film MODIFY COLUMN rental_duration TINYINT(3) NOT NULL DEFAULT 5; // slow

ALTER TABLE sakila.film ALTER COLUMN rental_duration SET DEFAULT 5; // fast. This statement modifies the .frm file and leaves the table alone

Modifying Only the .frm file to convince MySQL not to rebuild the table

CREATE TABLE sakila.film_new LIKE sakila.film

ALTER TABLE sakila.film_new MODIFY COLUMN rating ENUM('G', 'PG', 'PG-13', 'R', 'NC-17', 'PG-14') DEFAULT 'G'

FLUSH TABLES WITH READ LOCK;

// swap the .frm files of table film and film_new

mv film.frm film_tmp.frm

mv film_new.frm film.frm

mv film_tmp.frm film_new.frm

UNLOCK TABLES

DROP TABLE sakila.film_new

Builing MyISAM Indexes Quickly

The usual trick for loading MyISAM tables efficiently is to disable keys, load the data, and reenable the keys:

ALTER TABLE test.load_data DISABLE KEYS

-- load the data

ALTER TABLE test.load_data ENABLE KEYS

This only works for nonunique indexes.

For unique indexes, the faster way:

1, create a table of the desied structure, without any indexes

2, load the data into the table to build the .MYD file

3, create another empty table with the desired structure, including indexes, this will create .frm and .MYI files

4, FLush the tables with a read lock

5, Rename the second tables's .frm and .MYI files, so MySQL uses them for the first table

6, Release the read lock

7, Use REPAIR TABLE to build the table's indexes

Notes on Storage Engines

MyISAM

Table locks

MyISAM tables have table-level locks, be careful this doesn't become a bottleneck

No automated data recovery

If the MySQL server crashes or power goes down, you should check and possibly repair your MyISAM tables before using them. If you have large tables, this could take hours

No transactions

MyISAM table don't support transactions

Only indexes are cached in memory

MyISAM caches only the index inside the MySQL process, in the key buffer. The operating system caches the table's data

Compact storage

Rows are stored jam-packed one after anotehr, so you get a small disk footprint and fast full table scans for on-disk data

Memory

Table locks

Memory tables have table locks but isn't usually a problem because queries on Memory tables are normally fast

No dynamic rows

Memory tables don't support dynamic(variable-length) rows, so they don't support BLOB and TEXT fields at all. Even a VARCHAR(5000) turns into a CHAR(5000) -- a huge memory waste if most values are small

Hash indexes are the default index type

No index statistics

You may get bad execution palns for some complex queries

Content is lost on restart

Memory tables don't persist any data to disk, so the data is lost when the server restarts, even though the table's definitions remain

InnoDB

Transactional

InnoDB supports transactions adn four transaction isolation levels

Foreign keys

As of MySQL 5.0, InnoDB is the only storage engine that supports foreign keys

Row-level locks

Locks are set at the row level, with no escalation and nonblocking selects--standard selects don't set any locks at all, which gives very good concurrency

Multiversioning

InnoDb uses multiversion concurrecncy control, so by default your selects may read stale data

Clustering by primary key

All InnoDB tables are clustered by the primary key, which you can use to your advantage in schema design

All indexes contain the primary key columns

Indexes refer to the rows by the primary key, so if you don't keep your primary key short, the indexes will grow very large

Optimized caching

InnoDB caches both data and memory in the buffer pool, it also automatically builds hash indexes to speed up row retrieval

Unpacked indexes

Indexes are not packed with prefix compression, so they can be much larger than for MyISAM tables

Slow data load

As of MySQL 5.0, InnoDb does not specially optimize data load operations. It builds indexes a row at a time, instead of building them by sorting. This may result in significantly slower data loads

Blocking AUTO_INCREMENT

In versions earlier than MySQL 5.1, InnoDB uses a table-level lock to generate each new AUTO_INCREMENT value

No cached COUNT(*) value

Unlick MyISAM or Memory tables, InnoDB tables don't store the number of rows in the table, which means COUNT(*) queries without a WHERE clause can't be optimized away and require full table or index scans

分享到:

18e900b8666ce6f233d25ec02f95ee59.png

72dd548719f0ace4d5f9bca64e1d7715.png

2009-07-16 18:04

浏览 1262

评论

We had several goals in mind for this book. Many of them were derived from think- ing about that mythical perfect MySQL book that none of us had read but that we kept looking for on bookstore shelves. Others came from a lot of experience helping other users put MySQL to work in their environments. We wanted a book that wasn’t just a SQL primer. We wanted a book with a title that didn’t start or end in some arbitrary time frame (“...in Thirty Days,” “Seven Days To a Better...”) and didn’t talk down to the reader. Most of all, we wanted a book that would help you take your skills to the next level and build fast, reliable systems with MySQL—one that would answer questions like “How can I set up a cluster of MySQL servers capable of handling millions upon millions of queries and ensure that things keep running even if a couple of the servers die?” We decided to write a book that focused not just on the needs of the MySQL appli- cation developer but also on the rigorous demands of the MySQL administrator, who needs to keep the system up and running no matter what the programmers or users may throw at the server. Having said that, we assume that you are already rela- tively experienced with MySQL and, ideally, have read an introductory book on it. We also assume some experience with general system administration, networking, and Unix-like operating systems. This revised and expanded second edition includes deeper coverage of all the topics in the first edition and many new topics as well. This is partly a response to the changes that have taken place since the book was first published: MySQL is a much larger and more complex piece of software now. Just as importantly, its popularity has exploded. The MySQL community has grown much larger, and big corporations are now adopting MySQL for their mission-critical applications. Since the first edi- tion, MySQL has become recognized as ready for the enterprise.* People are also * We think this phrase is mostly marketing fluff, but it seems to convey a sense of importance to a lot of people. using it more and more in applications that are exposed to the Internet, where down- time and other problems cannot be concealed or tolerated. As a result, this second edition has a slightly different focus than the first edition. We emphasize reliability and correctness just as much as performance, in part because we have used MySQL ourselves for applications where significant amounts of money are riding on the database server. We also have deep experience in web applications, where MySQL has become very popular. The second edition speaks to the expanded world of MySQL, which didn’t exist in the same way when the first edition was written.
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值