8 Things I learned from "High Performance MySQL"

Today I was reading through "High Performance MySQL", written by Baron Schwartz, Peter Zaitsev, Vadim Tkachenko, Jeremy Zawodny, Arjen Lentz, and Derek J. Balling, and found a few interesting things that I thought I'd share.

1. One Complex Query or Several Simple Ones?
"One important query design question is whether it's preferable to break up a complex query into several simpler queries. The traditional approach to database design emphasizes doing as much work as possible with as few queries as possible. This approach was historically better because of the cost of network communication and the overhead of the query parsing and optimization stages.

However, this advice doesn't apply as much to MySQL, because it was designed to handle connecting and disconnecting very efficiently and to respond to small and simple queries very quickly. Modern networks are also significantly faster than they used to be, reducing network latency. MySQL can run more than 50,000 simple queries per second on commodity server hardware and over 2,000 queries per second from a single correspondent on a Gigabit network, so running multiple queries isn't necessarily such a bad thing."
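
To make the trade-off concrete, here is a rough sketch of the same lookup written as one join and as two simple queries. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the `users` and `posts` tables and both function names are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 'first'), (11, 2, 'second'), (12, 1, 'third');
""")

def titles_one_query(conn, name):
    """One complex query: let the database do the join."""
    sql = """SELECT p.title FROM posts p
             JOIN users u ON u.id = p.user_id
             WHERE u.name = ? ORDER BY p.id"""
    return [row[0] for row in conn.execute(sql, (name,))]

def titles_two_queries(conn, name):
    """Several simple queries: look up the ids first, then fetch the posts."""
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,))]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))  # e.g. "?,?" for two ids
    sql = f"SELECT title FROM posts WHERE user_id IN ({placeholders}) ORDER BY id"
    return [row[0] for row in conn.execute(sql, ids)]
```

Both functions return the same titles. Which one is faster depends on your data and server, so treat this as a shape to measure rather than a recommendation.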

2. Establishing a standard for how you write queries will help give you a performance boost
"Before even parsing a query, MySQL checks for it in the query cache, if the cache is enabled. This operation is a case sensitive hash lookup. If the query differs from a similar query in the cache by even a single byte, it won't match, and the query processing will go to the next stage."

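This means that "SELECT firstname, lastname FROM people" and "SELECT lastname, firstname FROM people" are treated as different queries, even though they fetch essentially the same data. If everyone writes a given query the same way, byte for byte, it is more likely to be served from the cache.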
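
Here is a toy model of that byte-exact lookup. It is not MySQL's actual implementation; the `ToyQueryCache` class and the choice of SHA-1 as the hash are my own assumptions. It only shows why tiny differences in how a query is written cause a miss.

```python
import hashlib

class ToyQueryCache:
    """Toy model of a byte-exact query cache (not MySQL's real code)."""

    def __init__(self):
        self._results = {}

    @staticmethod
    def _key(sql):
        # Hash the raw bytes: no trimming, no case folding, no normalization.
        return hashlib.sha1(sql.encode("utf-8")).hexdigest()

    def get(self, sql):
        return self._results.get(self._key(sql))

    def put(self, sql, rows):
        self._results[self._key(sql)] = rows

cache = ToyQueryCache()
cache.put("SELECT firstname, lastname FROM people", [("Ada", "Lovelace")])

hit = cache.get("SELECT firstname, lastname FROM people")          # identical bytes
miss_case = cache.get("select firstname, lastname from people")    # different case
miss_space = cache.get("SELECT firstname,  lastname FROM people")  # extra space
miss_order = cache.get("SELECT lastname, firstname FROM people")   # column order
```

Only the first lookup finds the cached rows; the other three return `None`.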

3. DATETIME columns require twice as much space as TIMESTAMP
"a DATETIME and a TIMESTAMP column can store the same kind of data: date and time, to a precision of one second. However, TIMESTAMP uses only half as much storage space, is time zone–aware, and has special autoupdating capabilities"

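A back-of-the-envelope sketch of why the sizes differ, assuming a TIMESTAMP is kept as seconds since the Unix epoch in 4 bytes and a DATETIME as a YYYYMMDDHHMMSS integer in 8 bytes. The byte order used below is my own assumption, and newer MySQL releases may store these types differently, so check the manual for your version.

```python
import struct
from datetime import datetime, timezone

moment = datetime(2009, 2, 13, 23, 31, 30, tzinfo=timezone.utc)

# TIMESTAMP-style value: seconds since the Unix epoch fit in 4 bytes.
epoch_seconds = int(moment.timestamp())
timestamp_bytes = struct.pack("<I", epoch_seconds)

# DATETIME-style value: the digits YYYYMMDDHHMMSS read as one integer.
# It is larger than 2**32, so it needs an 8-byte integer.
packed_datetime = int(moment.strftime("%Y%m%d%H%M%S"))
datetime_bytes = struct.pack("<Q", packed_datetime)

# The price of 4 bytes: a signed 32-bit second counter runs out in 2038.
latest_timestamp = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)
```

The smaller footprint comes with a much narrower range, which is worth keeping in mind before switching every DATETIME column to TIMESTAMP.
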
4. An ENUM column can store up to 65,535 distinct string values

5. Indexes can affect row level locking
"Indexes play a very important role for InnoDB, because they let queries lock fewer rows. This is an important consideration, because in MySQL 5.0 InnoDB never unlocks a row until the transaction commits.

If your queries never touch rows they don't need, they'll lock fewer rows, and that's better for performance for two reasons. First, even though InnoDB's row locks are very efficient and use very little memory, there's still some overhead involved in row locking. Secondly, locking more rows than needed increases lock contention and reduces concurrency."

6. COUNT(*) is better than COUNT(table.id)
"One of the most common mistakes we see is specifying column names inside the parentheses when you want to count rows. When you want to know the number of rows in the result, you should always use COUNT(*). This communicates your intention clearly and avoids poor performance"

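A small sketch of how the two forms can differ. COUNT(*) counts rows, while COUNT(column) counts only the rows where that column is not NULL. I'm using Python's built-in sqlite3 as a stand-in for MySQL here, and the `people` table with its `nickname` column is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, nickname TEXT);
    INSERT INTO people VALUES (1, 'ace'), (2, NULL), (3, 'kit');
""")

# Counts every row in the table.
count_rows = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]

# Counts only rows whose nickname is not NULL.
count_nicknames = conn.execute("SELECT COUNT(nickname) FROM people").fetchone()[0]
```

With one NULL nickname in three rows, the two counts disagree, which is exactly the kind of surprise COUNT(*) avoids when all you want is the number of rows.
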
7. SQL_CALC_FOUND_ROWS is an evil shortcut
"Another common technique for paginated displays is to add the SQL_CALC_FOUND_ROWS hint to a query with a LIMIT, so you'll know how many rows would have been returned without the LIMIT. It may seem that there's some kind of "magic" happening here, whereby the server predicts how many rows it would have found. But unfortunately, the server doesn't really do that; it can't count rows it doesn't actually find. This option just tells the server to generate and throw away the rest of the result set, instead of stopping when it reaches the desired number of rows. That's very expensive.

A better design is to convert the pager to a "next" link. Assuming there are 20 results per page, the query should then use a LIMIT of 21 rows and display only 20. If the 21st row exists in the results, there's a next page, and you can render the "next" link.

Another possibility is to fetch and cache many more rows than you need—say, 1,000—and then retrieve them from the cache for successive pages. This strategy lets your application know how large the full result set is. If it's fewer than 1,000 rows, the application knows how many page links to render; if it's more, the application can just display "more than 1,000 results found." Both strategies are much more efficient than repeatedly generating an entire result and discarding most of it.

Even when you can't use these tactics, using a separate COUNT(*) query to find the number of rows can be much faster than SQL_CALC_FOUND_ROWS, if it can use a covering index."
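
The "next" link idea can be sketched like this: ask for one row more than the page size and use the extra row only as a signal. Again this uses Python's built-in sqlite3 as a stand-in for MySQL, and the `articles` table and `fetch_page` function are made-up names.

```python
import sqlite3

PAGE_SIZE = 20

# In-memory SQLite database standing in for MySQL (assumption), with 45 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [(f"article {n}",) for n in range(1, 46)],
)

def fetch_page(conn, page):
    """Return (rows, has_next) for a 1-based page number."""
    offset = (page - 1) * PAGE_SIZE
    rows = conn.execute(
        "SELECT id, title FROM articles ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE + 1, offset),  # fetch 21 rows, display at most 20
    ).fetchall()
    has_next = len(rows) > PAGE_SIZE  # a 21st row means another page exists
    return rows[:PAGE_SIZE], has_next
```

With 45 articles, pages 1 and 2 report a next page and page 3 does not, and the server never has to generate the whole result set just to count it.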

8. Using "MATCH (columns) AGAINST (value)" lets you search in either Boolean or natural-language mode and ranks your results, a significant improvement over "LIKE '%value%'"
"A natural-language search query determines each document's relevance to the query. Relevance is based on the number of matched words and the frequency with which they occur in the document. Words that are less common in the entire index make a match more relevant. In contrast, extremely common words aren't worth searching for at all. A natural-language full-text search excludes words that exist in more than 50% of the rows in the table, even if they're not in the stopword list.

In Boolean searches, the query itself specifies the relative relevance of each word in a match.
When constructing a Boolean search query, you can use prefixes to modify the relative ranking of each keyword in the search string. The most commonly used modifiers are shown below...
dinosaur     Rows containing "dinosaur" rank higher.
~dinosaur    Rows containing "dinosaur" rank lower.
+dinosaur    Rows must contain "dinosaur".
-dinosaur    Rows must not contain "dinosaur".
dino*        Rows containing words that begin with "dino" rank higher."
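
To get a feel for the modifiers, here is a toy classifier for the five forms listed above. It is not MySQL's parser and it does no ranking. `parse_boolean_query` and `passes_filters` are hypothetical helpers that only sort the terms into buckets and apply the two hard rules (must contain, must not contain).

```python
def parse_boolean_query(query):
    """Sort each term into a bucket based on its modifier (toy model)."""
    parsed = {"required": [], "excluded": [], "rank_lower": [],
              "prefix": [], "rank_higher": []}
    for token in query.split():
        if token.startswith("+"):
            parsed["required"].append(token[1:])
        elif token.startswith("-"):
            parsed["excluded"].append(token[1:])
        elif token.startswith("~"):
            parsed["rank_lower"].append(token[1:])
        elif token.endswith("*"):
            parsed["prefix"].append(token[:-1])
        else:
            parsed["rank_higher"].append(token)
    return parsed

def passes_filters(text, parsed):
    """Apply only the hard rules: every + term present, no - term present."""
    words = set(text.lower().split())
    has_required = all(word in words for word in parsed["required"])
    has_excluded = any(word in words for word in parsed["excluded"])
    return has_required and not has_excluded

parsed = parse_boolean_query("+dinosaur -bird ~fossil dino* museum")
```

Here "dinosaur" ends up as required and "bird" as excluded, so a row mentioning both is filtered out no matter how well the other terms match.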