Join Decomposition
Many high-performance web sites use join decomposition. You can decompose a join
by running multiple single-table queries instead of a multitable join, and then per-
forming the join in the application. For example, instead of this single query:
mysql> SELECT * FROM tag
-> JOIN tag_post ON tag_post.tag_id=tag.id
-> JOIN post ON tag_post.post_id=post.id
-> WHERE tag.tag='mysql';
You might run these queries:
mysql> SELECT * FROM tag WHERE tag='mysql';
mysql> SELECT * FROM tag_post WHERE tag_id=1234;
mysql> SELECT * FROM post WHERE post.id in (123,456,567,9098,8904);
This looks wasteful at first glance, because you’ve increased the number of queries
without getting anything in return. However, such restructuring can actually give sig-
nificant performance advantages:
• Caching can be more efficient. Many applications cache “objects” that map
directly to tables. In this example, if the object with the tag mysql is already
cached, the application can skip the first query. If you find posts with an id of
123, 567, or 9098 in the cache, you can remove them from the IN( ) list. The
query cache might also benefit from this strategy. If only one of the tables
changes frequently, decomposing a join can reduce the number of cache
invalidations.
• For MyISAM tables, performing one query per table uses table locks more effi-
ciently: the queries will lock the tables individually and relatively briefly, instead
of locking them all for a longer time.
• Doing joins in the application makes it easier to scale the database by placing
tables on different servers.
• The queries themselves can be more efficient. In this example, using an IN( ) list
instead of a join lets MySQL sort row IDs and retrieve rows more optimally than
might be possible with a join. We explain this in more detail later.
• You can reduce redundant row accesses. Doing a join in the application means
you retrieve each row only once, whereas a join in the query is essentially a
denormalization that might repeatedly access the same data. For the same rea-
son, such restructuring might also reduce the total network traffic and memory
usage.
• To some extent, you can view this technique as manually implementing a hash
join instead of the nested loops algorithm MySQL uses to execute a join. A hash
join may be more efficient.