【好文转载】Disk Layout

最新推荐文章于 2024-09-30 09:09:52 发布

cmupa22628

最新推荐文章于 2024-09-30 09:09:52 发布

阅读量166

点赞数

文章标签：数据库

The design and physical layout of the table spaces within a database will have a very big impact on the overall performance of the database and its applications. The type, design, and physical location of the table spaces will ultimately determine the efficiency of the I/O performed against that table space.

[@more@]

Data Placement Considerations

It is important to understand the following before beginning the physical design of a database.

Workload considerations

The type of workload that will be executed against the database must be taken into account when designing the physical layout of the database and its table spaces. To understand the performance issues associated with a database, it is helpful to have an understanding of the different database profiles and their unique workload characteristics.

Online Transaction Processing (OLTP)
Reporting/Decision Support Systems (DSSs)

OLTP databases are among the most mission-critical and widely deployed of any of the database types. Literally, millions of transactions encompassing billions of dollars are processed on OLTP systems around the world on a daily basis. The primary defining characteristic of OLTP systems is that the transactions are processed in real time, or online, and often require immediate response back to the user.

From a workload perspective, OLTP databases typically:

Process a large number of concurrent user sessions
Process a large number of transactions using simple SQL statements
Process a single database row at a time
Are expected to complete transactions in seconds, not minutes or hours

Reporting/DSS systems differ from the typical transaction-oriented systems in that they most often consist of data extracted from multiple-source systems for the purpose of supporting the end user:

Data analysis applications using predefined queries
Application-generated queries
Ad-hoc user queries
Reporting requirements

DSS systems typically deal with substantially larger volumes of data than do OLTP systems, due to their role in supplying users with large amounts of historical data. Whereas a database of a couple hundred GB would be considered large for an OLTP system, a large DSS system is normally many TB. The increased storage requirements of DSS systems can also be attributed to the fact that they often contain multiple, aggregated views of the same data.

Whereas OLTP queries tend to be centered around one specific business function, DSS queries are often substantially more complex. The need to process large amounts data results in many CPU-intensive database sort and join operations. The complexity and variability of these types of queries must be given special consideration when designing a DSS system for performance.

Available disk space

As a rule of thumb, the following formula can be used to determine the minimum amount of disk space required to support a database:

The space is required for the following:

Data and indexes
Database logs
Backup images
System paging space
Data export and staging space
Fudge factor

Although it is important to ensure that there is enough disk space to hold the database, the amount of disk space available does not have a significant impact on the performance of the database.

Number of physical disks available for the database

As a rule of thumb, there should be a minimum of six to ten disks per CPU for the database. This does not include the disks required for paging space, backup images, etc.

An OLTP database will typically scan an index, then retrieve a small number of rows from a table. Because a typical OLTP database will have a number of applications connected, in order to make the index scans as efficient as possible and to attempt to avoid any contention when reading the indexes and data, it is important to have the database spread across enough physical disks.

A reporting/DSS system will typically scan large amounts of data and retrieve a number of rows from one or more tables. Although there are normally few users accessing the system, the number of tables that are normally scanned within a single SQL statement will normally result in a heavy I/O load on the system.

The best way to ensure that the database is able to retrieve the data as quickly as possible is to spread the database across a large number of physical disks to take advantage of the I/O parallelism.

NOTE

There are also practical limits to the number of disks that a disk adapter can effectively handle. Use multiple disk adapters when using a large number of disks.

Rate of growth

For tables that will become very large, it is important to consider the maximum sizes of table spaces within DB2. The maximum size of a table space is determined by the page size used for the table space. The maximum table space sizes are shown in Table 1.5.

Table 1.5. Maximum Table Space Sizes

Page Size

Table Space Size

4 KB

64 GB

8 KB

128 GB

16 KB

256 GB

32 KB

512 GB

However, each page in DB2 can hold only 255 rows of data, regardless of the page size. Therefore, tables with short row lengths should be placed in a table space with 4-KB or 8-KB pages to avoid having pages with a lot of empty space.

Log placement

The factors that affect logging performance, such as log file size, the number of logs, and the type of underlying disks used for logs, will be discussed in a later chapter; however, before any log files ever exist, the database administrator must decide on the following factors:

On which disks are the logs to be placed?

In a highly active database, the log files will have a great deal of I/O. It is recommended that the log files reside on their own physical disk(s), separate from the rest of the database. These disks should be dedicated to DB2 logging to avoid the possibility of any other processes accessing or writing to these disks and causing contention.

Availability

Whether the database is configured to use circular or archive logging, the logs must be available and accessible at all times. In previous versions of DB2, this typically required operating system or disk mirroring or striping. However, in DB2 Version 7.2, the concept of dual logging (also known as mirrored logging) was built into DB2.

In DB2 Version 7.2, the registry variable DB2_NEWLOGPATH2 had to be set to Y to enable dual logging. In DB2 Version 8, there is a new database configuration parameter, mirror log path (MIRRORLOGPATH), which is used to specify the path for the mirrored logs.

Table-to-table space assignments

When deciding which table space that a table should be created in, consider the following:

Place tables that have referential constraints within the same table space.
Place tables that are related through the use of triggers in the same table space.
Place tables that share structured data types in the same table space.
Place tables that are used by the same applications in the same table space.
Place tables in the same table space for backup and recovery reasons.
Tables do NOT need to be put into different table spaces.

With DB2 Version 8, there is no need to place tables into separate table spaces simply to use the LOAD utility. The LOAD utility will now work online and no longer locks the entire table space.

Because an OLTP-type system normally scans an index, then retrieves individual rows, there is little chance of I/O contention. In a DSS system, DB2 will typically scan the indexes and build a RID list that is then used to retrieve the data. Because these are separate operations, there is little chance of I/O contention. Therefore, there is no need to separate a table's data and indexes into different table spaces, unless they will be assigned to use different buffer pools. However, in order to place the data and indexes on different physical disks, they must be placed in separate table spaces.

Data Placement Summary

Designing the physical layout of the database on disk is one of the most important decisions a database administrator will make, and it will have a big impact on the throughput of the database system.

There is a common misconception that, in order to get the best performance from a database, its indexes and data need to be put into different table spaces, and the table spaces need to be created on separate disks. Although separating data and indexes into separate table spaces does allow the table and index to be assigned to different buffer pools, there is no need to separate them onto different physical disks, especially if the server does not have a lot of physical disks. In fact, separating the data and indexes onto different disks can limit performance if the server does not have enough physical disks to parallelize the I/O adequately.

There are three general guidelines to consider when determining how to place the table spaces on the available physical disks:

Spread the table spaces across as many separate physical disks as possible
If there are only a small number of disks available (i.e., less than 6–10 disks per CPU), it is better to spread all of the table spaces across all of the available physical disks, rather than to attempt to carve up the disks for different table spaces.
If there are enough physical disks (i.e., at least 6–10 disks per CPU), place heavily accessed tables on different physical disks.

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/7437037/viewspace-914363/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/7437037/viewspace-914363/