I've got a table structure that can be summarized as follows:
pagegroup
* pagegroupid
* name
has 3600 rows
page
* pageid
* pagegroupid
* data
references pagegroup;
has 10000 rows;
can have anything between 1-700 rows per pagegroup;
the data column is of type mediumtext and the column contains 100k - 200kbytes data per row
userdata
* userdataid
* pageid
* column1
* column2
* column9
references page;
has about 300,000 rows;
can have about 1-50 rows per page
The above structure is pretty straight forwad, the problem is that that a join from userdata to page group is terribly, terribly slow even though I have indexed all columns that should be indexed. The time needed to run a query for such a join (userdata inner_join page inner_join pagegroup) exceeds 3 minutes. This is terribly slow considering the fact that I am not selecting the data column at all. Example of the query that takes too long:
SELECT userdata.column1, pagegroup.name
FROM userdata
INNER JOIN page USING( pageid )
INNER JOIN pagegroup USING( pagegroupid )
Please help by explaining why does it take so long and what can i do to make it faster.
Edit #1
Explain returns following gibberish:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE userdata ALL pageid 372420
1 SIMPLE page eq_ref PRIMARY,pagegroupid PRIMARY 4 topsecret.userdata.pageid 1
1 SIMPLE pagegroup eq_ref PRIMARY PRIMARY 4 topsecret.page.pagegroupid 1
Edit #2
SELECT
u.field2, p.pageid
FROM
userdata u
INNER JOIN page p ON u.pageid = p.pageid;
/*
0.07 sec execution, 6.05 sec fecth
*/
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE u ALL pageid 372420
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 topsecret.u.pageid 1 Using index
SELECT
p.pageid, g.pagegroupid
FROM
page p
INNER JOIN pagegroup g ON p.pagegroupid = g.pagegroupid;
/*
9.37 sec execution, 60.0 sec fetch
*/
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE g index PRIMARY PRIMARY 4 3646 Using index
1 SIMPLE p ref pagegroupid pagegroupid 5 topsecret.g.pagegroupid 3 Using where
Moral of the story
Keep medium/long text columns in a separate table if you run into performance problems such as this one.
解决方案
What's the data type and purpose of columnX in the userdata table? It should be noted that any text data type (i.e excluding char, varchar) forces any temporary tables to be created on disk. Now since you're doing a straight join without conditions, grouping or ordering, it probably won't need any temporary tables, except for aggregating the final result.
I think it would also be very helpful if you show us how your indexes are created. One thing to remember is that while InnoDB concatenates the primary key of the table to each index, MyISAM does not. This means that if you index column name and search for it with LIKE, but still want to get the id of the page group; Then the query would still need to visit the table to get the id instead of being able to retrieve it from the index.
What this means, in your case, if I understand your comment to apphacker correctly, is to get the name of each users pagegroups. The query optimizer would want to use the index for the join, but for each result it would also need to visit the table to retrieve the page group name. If your datatype on name is not bigger than a moderate varchar, i.e. no text, you could also create an index (id, name) which would enable the query to fetch the name directly from the index.
As a final try, you point out that the whole query would probably be faster if the mediumtext was not in the page table.
This column is excluded from the query you are running I presume?
You could also try to separate the page data from the page "configuration", i.e. which group it belongs to. You'd then probably have something like:
Pages
pageId
pageGroupId
PageData
pageId
data
This would hopefully enable you to join quicker since no column in Pages take up much space. Then, when you needed to display a certain page, you join with the PageData table on the pageId-column to fetch the data needed to display a particular page.