Database Design
Rule | Description | Value | Source | Problem Description |
1 | High Frequency queries having a high number of table joins. | >4 | Sys.dm_exec_sql_text Sys.dm_exec_cached_plans | High frequency queries with lots of joins can be too normalized for high OLTP scalability. |
2 | Frequently updated tables having # indexes. | >3 | Sys.indexes sys.dm_db_index_operational_stats | Excessive index maintenance for OLTP. |
3 | Big IOs Table Scans Range Scans | >1 | Perfmon object SQL Server Access Methods Sys.dm_exec_query_stats | A missing index flushes the cache. |
4 | Unused Indexes. | Index not in Sys.dm_db_index_usage_stats. If an index is NEVER used, it will not appear in the DMV sys.dm_db_index_usage_stats | Avoid Index maintenance for unused indexes. |
CPU
Rule | Description | Value | Source | Problem Description |
1 | Signal Waits | >25% | Sys.dm_os_wait_stats | Time in runnable queue is pure CPU wait. |
2 | Plan reuse | <90% | Perfmon object SQL Server Statistics | OLTP identical transactions should ideally have >95% plan reuse. |
3 | Parallelism: Cxpacket waits | >5% | Sys.dm_os_wait_stats | Parallelism reduces OLTP throughput. CXPACKET indicates that multiple CPUs are working in parallel, dividing up the query in smaller pieces. Ordinarily a well tuned OLTP application would not parallelize unless an index is missing, there is an incomplete WHERE clause, or the query is not a true OLTP transaction. |
Memory
Rule | Description | Value | Source | Problem Description |
1 | Page life expectancy | <300 sec | Perfmon object SQL Server Buffer Manager SQL Server Buffer Nodes | Page life expectancy is the average number of seconds a data page stays in cache. Low values could indicate a cache flush that is caused by a big read. Pure OLTP workloads do NOT issue big reads, thus possible missing index. |
2 | Page life expectancy | Drops by 50% | Perfmon object SQL Server Buffer Manager | Page life expectancy is the average number of seconds a data page stays in cache. Low values could indicate a cache flush that is caused by a big read. Pure OLTP workloads do NOT issue big reads, thus possible missing index. |
3 | Memory Grants Pending | >1 | Perfmon object SQL Server Memory Manager | Current number of processes waiting for a workspace memory grant. |
4 | SQL cache hit ratio | <90% | SQL cache hit ratio falls under 90% for sustained periods of time greater than 60 sec. | It is likely that large scans have to be performed, which in turn flushes out the buffer cache. |
IO
Rule | Description | Value | Source | Problem Description |
1 | Average Disk sec/read | >20 ms | Perfmon object Physical Disk | Reads should take 4-8 ms without any IO pressure. |
2 | Average Disk sec/write | >20 ms | Perfmon object Physical Disk | Writes (sequential) can be as fast as 1 ms for transaction log. |
3 | Big IOs Table Scans Range Scans | >1 | Perfmon object SQL Server Access Methods | A missing index flushes the cache. |
4 | If Top 2 values for wait stats are any of the following: ASYNCH_IO_COMPLETION IO_COMPLETION LOGMGR WRITELOG PAGEIOLATCH_x | Top 2 | Sys.dm_os_wait_stats | If top 2 wait_stats values include IO, there is an IO bottleneck. |
5 | Low bytes per sec. |
| Perfmon object Physical Disk |
|
Blocking
Rule | Description | Value | Source | Problem Description |
1 | Block percentage | >2% | Sys.dm_db_index_operational_stats | Frequency of blocks. |
2 | Block process report | 30 sec | Sp_configure profiler | Report of statements. |
3 | Average Row Lock Waits | >100ms | Sys.dm_db_index_operational_stats | Duration of blocks. |
4 | If Top 2 values for wait stats are any of the following: LCK_M_BU LCK_M_IS LCK_M_IU LCK_M_IX LCK_M_RIn_NL LCK_M_RIn_S LCK_M_RIn_U LCK_M_RIn_X LCK_M_RS_S LCK_M_RS_U LCK_M_RX_S LCK_M_RX_U LCK_M_RX_X LCK_M_S LCK_M_SCH_M LCK_M_SCH_S LCK_M_SIU LCK_M_SIX LCK_M_U LCK_M_UIX LCK_M_X | Top 2 | Sys.dm_os_wait_stats | If top 2 wait_stats values include locking, there is a blocking bottleneck. |
5 | High number of deadlocks | >5 per hour | Trace flag 1204 to display in the errorlog and or the profiler deadlock graph. | If the deadlock occurs with the same participant SQL commands or operations multiple times, it is likely that there is a locking problem. |
Network
Rule | Description | Value | Source | Problem Description |
1 | High network latency coupled with an application that has many round trips to the database. | Output queue length >2 | Perfmon object: Network Interface | Indicates that the latency between the application server and the database is high. Could be caused by significant network infrastructure between the application and the instance of SQL Server. |
2 | Network bandwidth is used up. | Packets Outbound Discarded Packets Outbound Errors Packets Received Discarded Packets Received Errors | Perfmon object: Network Interface | Dropped packets are detected. |
In summary, given the high volume of identical small transactions that characterize OLTP, transactions per second and resource usage can be improved as follows:
- Database designs usually keep the number of indexes to a functional minimum as every insert, update, and delete incurs index maintenance.
- CPU can be reduced with plan reuse and join reduction.
- IO performance can be reduced with good indexing, join reduction, and high page life expectancy.
- Memory is optimal when there are no sudden drops in Page Life Expectancy.
- Sorts can be limited with index usage. That is, a certain sort order is supported by an index that is sorted the same way, either ascending or descending.
- Blocking can be reduced with index design and short transactions.