Enum Fields vs. Varchar vs. Int + Joined Table: What Is Faster?

from MySQL Performance Blog by Alexey Kovyrin

Very often in customers' applications we see huge tables with varchar/char fields that hold a small set of possible values. These are "state", "gender", "status", "weapon_type", and so on. We frequently suggest changing such fields to the ENUM column type, but is it really necessary (from a performance standpoint)? In this post I'd like to present a small benchmark that shows MySQL performance with 3 different approaches: ENUM, VARCHAR and TINYINT (+ joined table) columns.

In practice you can also often use a 4th variant, which is not directly comparable: storing an integer value and doing the value mapping at the application level.

So, first of all, a few words about the data set we used for this benchmark. We have 4 tables:
1) Table with ENUM:

SQL:

CREATE TABLE cities_enum (
  id int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  state enum('Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri') NOT NULL,
  city varchar(255) NOT NULL,
  PRIMARY KEY (id),
  KEY state (state)
) ENGINE=MyISAM;

2) Table with VARCHAR:

SQL:

CREATE TABLE cities_varchar (
  id int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  state varchar(50) NOT NULL,
  city varchar(255) NOT NULL,
  PRIMARY KEY (id),
  KEY state (state)
) ENGINE=MyISAM;

3) Table with INT:

SQL:

CREATE TABLE cities_join (
  id int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  state_id tinyint(3) UNSIGNED NOT NULL,
  city varchar(255) NOT NULL,
  PRIMARY KEY (id),
  KEY state_id (state_id)
) ENGINE=MyISAM;

4) Dictionary table for cities_join:

SQL:

CREATE TABLE IF NOT EXISTS `states` (
  `id` tinyint(3) NOT NULL AUTO_INCREMENT,
  `name` char(40) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM;

All cities_* tables have 1.5M records each, and the records are distributed among 29 different states (this just happens to be the data we had available for tests).

Two important notes about these tables before we get to the results. First, they are rather small and fit in memory in all cases (and the dictionary table does too). Second, the rows are relatively short, so changing state from VARCHAR to ENUM or TINYINT affects row size significantly. In many real cases the size difference will be much smaller.
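To see how much the column type changes the row size on your own data, you can compare the table statistics. This is only a sketch and was not part of the benchmark; it assumes the three cities_* tables live in the current schema.

```sql
-- Compare on-disk footprint of the three variants.
-- Assumes the tables were created in the currently selected schema.
SELECT TABLE_NAME,
       ENGINE,
       TABLE_ROWS,
       AVG_ROW_LENGTH,   -- average bytes per row
       DATA_LENGTH,      -- bytes used by row data
       INDEX_LENGTH      -- bytes used by indexes
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('cities_enum', 'cities_varchar', 'cities_join');
```

AVG_ROW_LENGTH gives a quick feel for whether the difference between the variants is likely to matter for a given table.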

Each test was run 1000 times, and the reported time is the average over those 1000 runs.

So, our first benchmark is simple: we need to get the names of 5 cities located in Minnesota and, to make things slower, we take those records starting from record #10000, making MySQL discard the first 10000 records.

1) Results for ENUM:

SQL:

SELECT SQL_NO_CACHE city FROM cities_enum WHERE state = 'Minnesota' LIMIT 10000, 5;
Result time (mean): 0.082196

2) Results for VARCHAR:

SQL:

SELECT SQL_NO_CACHE city FROM cities_varchar WHERE state = 'Minnesota' LIMIT 10000, 5;
Result time (mean): 0.085637

3) Results for INT + join:

SQL:

SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) WHERE s.name = 'Minnesota' LIMIT 10000, 5;
Result time (mean): 0.083277

So, as you can see, all three approaches are close, with ENUM being the fastest and VARCHAR a few percent slower.

This may look counterintuitive because the table is significantly smaller with ENUM or TINYINT, but in fact it is quite expected. This is a MyISAM table accessed via an index, which means that to retrieve each row MySQL has to perform an OS system call to read the row, and at that point it makes little difference whether 20 or 30 bytes are read. For a full table scan the difference would often be larger.

It is also interesting to note the performance of InnoDB tables in this case: for VARCHAR it takes about 0.022 per query, which makes it about 4 times faster than MyISAM. This is a great example of a case where InnoDB is much faster than MyISAM for a read load.
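The post does not show how the InnoDB variant was built. One plausible way to reproduce the comparison, sketched here with a hypothetical table name cities_varchar_innodb, is to copy the MyISAM table into an InnoDB one and rerun the same query:

```sql
-- cities_varchar_innodb is a hypothetical name, not from the original post.
CREATE TABLE cities_varchar_innodb LIKE cities_varchar;  -- same columns and indexes
ALTER TABLE cities_varchar_innodb ENGINE=InnoDB;         -- switch the empty copy to InnoDB
INSERT INTO cities_varchar_innodb SELECT * FROM cities_varchar;

-- Same query as in the VARCHAR test, now against the InnoDB copy.
SELECT SQL_NO_CACHE city
FROM cities_varchar_innodb
WHERE state = 'Minnesota'
LIMIT 10000, 5;
```

Working on a copy keeps the original MyISAM table intact, so both engines can be measured side by side.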

The other surprise could be the almost zero cost of the join, which we have always claimed to be quite expensive. Indeed, there is no join cost in this case because there is really no join:

SQL:

mysql> EXPLAIN SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) WHERE s.name = 'Minnesota' LIMIT 10000, 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: s
         type: const
possible_keys: PRIMARY,name
          key: name
      key_len: 40
          ref: const
         rows: 1
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: ref
possible_keys: state
          key: state
      key_len: 1
          ref: const
         rows: 225690
        Extra:
2 rows in set (0.10 sec)

Because we refer to the state by name, which is unique, that row is pre-read, and the query is executed basically on a single table, looking up the state by ID.
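In other words, the optimizer does roughly what the following two statements do by hand. This is an illustrative sketch only; the user variable @minnesota_id is a name introduced here, not something from the benchmark.

```sql
-- Step 1: resolve the unique name to its id (a single-row "const" lookup).
SELECT id INTO @minnesota_id FROM states WHERE name = 'Minnesota';

-- Step 2: query cities_join alone, filtering on the indexed state_id.
SELECT SQL_NO_CACHE city
FROM cities_join
WHERE state_id = @minnesota_id
LIMIT 10000, 5;
```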

The next test was a result of my curiosity: I tried ordering the results by state.

1) Results for ENUM:

SQL:

SELECT SQL_NO_CACHE city FROM cities_enum ORDER BY state LIMIT 10000, 5;
Result time (mean): 0.077549

2) Results for VARCHAR:

SQL:

SELECT SQL_NO_CACHE city FROM cities_varchar ORDER BY state LIMIT 10000, 5;
Result time (mean): 0.0854793

3) Results for INT + join:

SQL:

SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) ORDER BY s.name LIMIT 10000, 5;
Result time (mean): 26.0854793

As you can see, ENUM and VARCHAR show close performance, while the join performance degrades dramatically.

Here is why:

SQL:

mysql> EXPLAIN SELECT SQL_NO_CACHE c.city FROM cities_join c JOIN states s ON (s.id = c.state_id) ORDER BY s.name LIMIT 10000, 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: ALL
possible_keys: state
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1439943
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: s
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 1
          ref: test.c.state_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

Because we're sorting by name, we have to perform the join for each row to retrieve it. This also means the sort can't be done by index and an extra sort pass (filesort) is required, which in turn makes MySQL store the join result in a temporary table to do the sort. All together this makes things quite miserable. Note that this might not be the best execution plan to pick in this case, but that is another story.
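As a hedged illustration of what a different plan could look like, one experiment is to force the small dictionary table to be read first with STRAIGHT_JOIN, so that the sort only has to cover the states table instead of the joined result. This was not measured in the benchmark, and whether the optimizer actually avoids the temporary table here is an assumption to verify with EXPLAIN on your own server.

```sql
-- STRAIGHT_JOIN makes MySQL read the left table (states) first.
-- With ORDER BY on a column of that first table, the sort can be
-- resolved on states alone; cities_join is then reached through
-- its state_id index for each state in name order.
EXPLAIN
SELECT SQL_NO_CACHE c.city
FROM states s
STRAIGHT_JOIN cities_join c ON (s.id = c.state_id)
ORDER BY s.name
LIMIT 10000, 5;
```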

To avoid part of this problem we can, of course, assign state IDs in alphabetical order and sort by state_id, though the join cost could still be significant.
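A minimal sketch of that workaround, assuming the ids in states were assigned so that ascending id matches alphabetical order of name (the benchmark data is not stated to be arranged this way):

```sql
-- Sort on the indexed state_id column; no join is needed for ordering.
SELECT SQL_NO_CACHE city
FROM cities_join
ORDER BY state_id
LIMIT 10000, 5;

-- If the state name is also needed, one option is to join only
-- the five rows that survive the LIMIT.
SELECT c.city, s.name
FROM (
    SELECT city, state_id
    FROM cities_join
    ORDER BY state_id
    LIMIT 10000, 5
) c
JOIN states s ON (s.id = c.state_id)
ORDER BY c.state_id;
```

The second form keeps the join but applies it after the LIMIT, which is one way to keep the join cost small when names must be returned.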

And the last test: selecting the city and state name in arbitrary order, skipping the first 10000 rows to make query times longer.

1) Results for ENUM:

SQL:

SELECT SQL_NO_CACHE city, state FROM cities_enum LIMIT 10000, 5;
Result time (mean): 0.003125

2) Results for VARCHAR:

SQL:

SELECT SQL_NO_CACHE city, state FROM cities_varchar LIMIT 10000, 5;
Result time (mean): 0.003283

3) Results for INT + join:

SQL:

SELECT SQL_NO_CACHE c.city, s.name FROM cities_join c JOIN states s ON (s.id = c.state_id) LIMIT 10000, 5;
Result time (mean): 0.004170

As you can see, the ENUM and VARCHAR results are almost the same, but the join query is about 30% slower.
Also note the times themselves: traversing about the same number of rows, a full table scan performs about 25 times better than accessing rows via an index (in the case when the data fits in memory!).
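For the application-level mapping variant mentioned at the start, the database side might look like the sketch below. It was not benchmarked in this post; the idea of loading the dictionary once and caching it in the application is an assumption about how such a setup would typically work.

```sql
-- Run once (for example at application startup) and cache the
-- id -> name pairs in application memory.
SELECT id, name FROM states;

-- Per-request query: no join, the application translates state_id
-- to a name using its cached dictionary.
SELECT SQL_NO_CACHE city, state_id
FROM cities_join
LIMIT 10000, 5;
```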

So, if you have an application that needs a table field with a small set of possible values, I'd still suggest using ENUM, but now we can see that the performance hit of the alternatives may not be as large as you expect. Though again, a lot depends on your data and queries.
