My table structure is something like below:
CREATE TABLE test (
id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
field_1 VARCHAR(60) NOT NULL,
field_2 INT(10) UNSIGNED NULL,
field_3 INT(10) UNSIGNED NULL,
field_4 INT(10) UNSIGNED NULL,
field_5 CHAR(2) NULL,
field_6 INT(10) UNSIGNED NOT NULL,
rank TINYINT(2) NOT NULL DEFAULT '0',
status TINYINT(3) NOT NULL DEFAULT '0',
PRIMARY KEY (id),
INDEX (status)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci ENGINE = MyISAM;
On above table the fields rank and status will have integer value between 0-9 and 0-4 respectively.
Currently the table is filled with around 950K data and I am trying to optimize my queries as much as possible.
Basically I need to select fields with some where clause with a descending order on field rank.
For example, below are few sql queries:
SELECT field_1, field_2, field_3 FROM test WHERE field_1 = 'data1' && status IN ('0', '1', '2') ORDER BY rank DESC LIMIT 0, 20;
SELECT field_1, field_2, field_3 FROM test WHERE field_2 = '5' && status IN ('1', '2') ORDER BY rank DESC LIMIT 0, 20;
SELECT field_1, field_2, field_3 FROM test WHERE field_5 = 'US' && status IN ('0', '2') ORDER BY rank DESC LIMIT 0, 20;
On above query ORDER BY rank DESC is very important. So I am quite confuse whether I should add index on single column or multi-column.
Could anyone suggest me the best solution.
解决方案
Your key problem is that over 950k rows your status column has up to 4 distinct values. On a BTREE index, this will be a real pain to process.
Some more effective indexes to use for the 3 queries mentioned are probably the following
INDEX forQuery1 ( field_1 , status , rank ) USING BTREE,
INDEX forQuery2 ( field_2 , status , rank ) USING BTREE,
INDEX forQuery3 ( field_5 , status , rank ) USING BTREE,
You'll find that the second query especially should benefit however you're still going to have the issue where the variance of the data is very low for the size of dataset and most likely MySQL will fall back to a table scan though your EXPLAIN will probably show the LIMIT to lessen the effect of that. The indexes mentioned should be suitable for determining which rows to return though.
For further information on how MySQL uses indexes take a look through 13.1.13. CREATE INDEX Syntax in particular the section on B-Tree Index Characteristics and the following extracts
If the table has a multiple-column index, any leftmost prefix of the
index can be used by the optimizer to find rows. For example, if you
have a three-column index on (col1, col2, col3), you have indexed
search capabilities on (col1), (col1, col2), and (col1, col2, col3).
MySQL cannot use an index if the columns do not form a leftmost prefix
of the index. Suppose that you have the SELECT statements shown here:
and
Sometimes MySQL does not use an index, even if one is available. One
circumstance under which this occurs is when the optimizer estimates
that using the index would require MySQL to access a very large
percentage of the rows in the table. (In this case, a table scan is
likely to be much faster because it requires fewer seeks.) However, if
such a query uses LIMIT to retrieve only some of the rows, MySQL uses
an index anyway, because it can much more quickly find the few rows to
return in the result.
As an additional note, you don't need to quote numeric data types and so field_2 = 5 && status IN ( 1 , 2 ) is valid (and in fact I have had some strange issue in the past due to quoting integer datatypes instead of specifying them as numerics)