那么性能分析的功能已经得知后,我们现在就需要使用索引来对数据库的查询进行有化!!!!!至于如何优化每个人都有自己的方法,但是有些业界准则是可以提前先考虑的。
先说如何建立索引,在postgrelSQL里面有两种索引,一种是hash, 一种是btree,
以下两条命令是在postgre里面建立index:
create index index_name on table using index_type(column[,column[,...]]);
cluster index_name on table;
比如建立以下索引:
create index assessment_point_index on assessment using btree(total_points);
cluster assessment_point_index on assessment;
create index assessment_id_index on assessment using hash(assessment_id);
记住cluster索引会对数据库的数据进行重新排序,所以如果建立cluster索引会导致原来的table被排序后的table覆盖,所以原来建立在table上索引失效,这个特别注意不能写成:
create index assessment_id_index on assessment using hash(assessment_id);create index assessment_point_index on assessment using btree(total_points);
cluster assessment_point_index on assessment;
这样的hash索引就失效。
最后说明一个例子如:
比如优化:
select student_id, assessment_id
from student_assessment
where score between 50 and 80;
这句query:
1. analysis
school=> explain analyze select student_id, assessment_id
from student_assessment
where score between 50 and 80;
NOTICE: QUERY PLAN:
Seq Scan on student_assessment (cost=0.00..6691.98 rows=78990 width=8)
(actual time=114.35..790.73 rows=77833 loops=1)
Total runtime: 847.33 msec
2. 对score建立index:
create index student_assessment_index on student_assessment using btree(score);
3.再次analysis:
school=> explain analyze select student_id, assessment_id
from student_assessment
where score between 50 and 80;
NOTICE: QUERY PLAN:
Seq Scan on student_assessment (cost=0.00..6692.02 rows=78991 width=8)
(actual time=233.38..907.74 rows=77835 loops=1)
Total runtime: 963.98 msec
发现效率降低!!!!!!索引建立索引并不能都加快query
对于这种简单的btree的索引,如果满足50~80的tuple占所有的tuple很少时,就是low selective时,才会比较试用,但是在我们的表中这种50~80的tuple占了所有索引的31%,所以,建立这种btree的作用不是很大,反而降低效率,对于这种问题,我们可以建立cluster index,让table的tuple根据score排序,命令如下
cluster student_assessment_index on student_assessment;
再次analysis:
school=> explain analyze select *
from student_assessment
where score between 50 and 80;
OTICE: QUERY PLAN:
Index Scan using student_assessment_score_index on student_assessment
(cost=0.00..3533.07 rows=1250 width=12)
(actual time=0.42..496.26 rows=77837 loops=1)
Total runtime: 554.51 msec
效率提高了!!!!!
最后做一个总结吧对,如何针对特定的query,建立何种索引,下面是一些业界良心的准则,我不翻译了,翻译了还会误导人,大家可以一条,一条看:
Index Choice and Optimization Heuristics
Choosing the best index structure for a table depends upon the types of queries that run against that table. The index structures we have studied each have different properties. These properties allow the indexes to improve query performance in some but not all circumstances. Consider hashing which maps the key of a record to the address of the disk page where that record is stored. This type of index is useful for queries where equality selection takes place. Hashing, however, may not improve performance on range queries as the data file may not be ordered based on the selection condition. In the case of range queries, a b-tree is a much better index choice. The following heuristics will help you choose what columns to index and what indexes to use for those columns.
1. When deciding whether or not to index a table, you must consider the size of the table and the size of the index. Indexes may not improve performance on tables that are small, e.g. that are stored on only a few disk blocks.
2. An index will most likely hurt performance for insert, update, and delete queries. Each update operation on an indexed table may require an update to the value stored in the index. Carefully consider the tradeoff between the benefit of indexing tables for select queries and the disadvantages of indexing for insert, delete, and update queries.
3. An index has greatest effectiveness when column values are highly selective.
4. Columns that are involved in equality selection conditions may be hashed. Hashing allows the address of each record's disk page to be calculated directly. Thus, an average O(1) operation is performed in finding the appropriate disk page.
5. Group by queries group all records for each unique value. A hash, or a clustered b-tree may improve the performance of such queries as the address of each unique value may be calculated using a hash function.
6. The sorting that is maintained by a b-tree allows quick access to data in order, potentially speeding up order by queries.
7. Those columns that are used as selection conditions for range queries should use an index that allows fast ordered access to the data, e.g. a b-tree.
8. Clustering a table orders the table based on the index value. This may be useful in speeding up range queries on data that is not highly selective.
9. The implementation of each DBMS varies. Both the analysis tools and optimizations strategies that work on one DMBS may not work on another. The indexing strategies we have examined will work on PostgreSQL. Some may not work on MySQL as MySQL only supports b-tree indexes; it does not hash indexes. Oracle, on the other hand, implements more indexes (such as bitmaps) than PostgreSQL does. Some optimization strategies used for Oracle may not work on PostgreSQL or MySQL.