Oracle Performance Tuning 11g2 (5)

最新推荐文章于 2021-04-08 07:50:42 发布

csnkza7163

最新推荐文章于 2021-04-08 07:50:42 发布

阅读量130

点赞数

5 Automatic Performance Statistics

This chapter discusses the gathering of performance statistics. This chapter contains the following topics:

这章讨论收集性能统计数据。包含以下主题：

Overview of Data Gathering 数据收集概要
Overview of the Automatic Workload Repository AWR简介
Managing the Automatic Workload Repository 管理AWR

5.1 Overview of Data Gathering

To effectively diagnose performance problems, statistics must be available. Oracle Database generates many types of cumulative statistics for the system, sessions, and individual SQL statements. Oracle Database also tracks cumulative statistics on segments and services. When analyzing a performance problem in any of these scopes, you typically look at the change in statistics (delta value) over the period you are interested in. Specifically, you look at the difference between the cumulative value of a statistic at the start of the period and the cumulative value at the end.

Cumulative values for statistics are generally available through dynamic performance views, such as the V$SESSTAT and V$SYSSTAT views. Note that the cumulative values in dynamic views are reset when the database instance is shutdown. The Automatic Workload Repository (AWR) automatically persists the cumulative and delta values for most of the statistics at all levels except the session level. This process is repeated on a regular time period and the result is called an AWR snapshot. The delta values captured by the snapshot represent the changes for each statistic over the time period.

A metric is another type of statistic collected by Oracle Database. A metric is defined as the rate of change in some cumulative statistic. That rate can be measured against a variety of units, including time, transactions, or database calls. For example, the number database calls per second is a metric. Metric values are exposed in some V$ views, where the values are the average over a fairly small time interval, typically 60 seconds. A history of recent metric values is available through V$ views, and some data is also persisted by AWR snapshots.

A third type of statistical data collected by Oracle is sampled data. The active session history (ASH) sampler performs the sampling. ASH samples the current state of all active sessions. The database collects this data into memory, where you can access it with a V$ view. AWR snapshot processing also writes it to persistent storage.

A powerful tool for diagnosing performance problems is the use of statistical baselines. A statistical baseline is collection of statistic rates usually taken over time period where the system is performing well at peak load. Comparing statistics captured during a period of bad performance to a baseline helps discover specific statistics that have increased significantly and could be the cause of the problem.

AWR supports the capture of baseline data by enabling you to specify and preserve a pair or range of AWR snapshots as a baseline. Carefully consider the time period you choose as a baseline; the baseline should be a good representation of the peak load on the system. In the future, you can compare these baselines with snapshots captured during periods of poor performance.

Oracle Enterprise Manager is the recommended tool for viewing both real time data in the dynamic performance views and historical data from the AWR history tables. Enterprise Manager can also be used to capture operating system and network statistical data that can be correlated with AWR data.

为了有效的诊断数据库问题，统计数据是必须要有的。Oracle产生了许多类型的累积性统计数据，包含系统的，session的，和独立SQL语句的。数据库同时跟踪段和service的累积统计数据。当你在任何一个范围内（比如网络，SQL级别等等）分析某个性能问题时，关注你感兴趣的那段时间里更改过的统计数据（delta值，应该是高中数学上的那个变量名，或者是增量值）。特别是应该关注两个阶段之间的累积数据差值。

统计数据累积值通常通过 v$sesstat, v$sysstat 视图获得。注意这些值在数据库实例重启后将重置。AWR将除了session级别以外的累积或者增量统计数据持久化（即写到硬盘上了，一般在sysaux表空间里）。这个收集过程，间隙性（定期性）的去收集，最终的统计结果我们称之为AWR快照（收集一次就产生一个快照）。这个被快照捕捉到的增量统计数据代表了自上一次以来的一个变化。

度量是oracle收集的另一类统计数据。（度量翻译着看很不舒服，我将直接用英文来表示，只需要知道matric就是度量的意思就行） matric被定义为：增量统计变化的速率。这个速率是一系列被衡量的单元值，包含时间，事务，数据库调用。例如，每秒钟数据库调用的次数就是一个matric。matric的值存在于v$的动态视图中，视图中的值是一些非常小的时间间隙内的平均值，比如在60秒内的平均值。最近的历史matric值存在在v$视图中，有些数据在AWR快照中。

第三种oracle收集的统计数据类型是取样数据。ASH采样器执行这个采样工作。ASH采样当前的所有活跃session的状态。然后将这个采样数据放到v$视图中。AWR快照将这些信息写到硬盘上。

一个最强大的诊断性能问题工具是使用统计基线。统计基线是系统正常运行时，且运行在高峰期的一段时间内，系统收集的统计速率值的集合。性能差的时候获取的统计数据与这个基线值相比较，就有助于帮助我们发现特定的一些明显增加的统计值，这些可能就是导致问题发生的原因。

AWR支持指定和保留一对或一段时间内AWR的快照作为基线(至少需要两个快照才能确定一个基线范围，所以是一对。但是也可以由多个连续的快照组成一个基线)。在选择这些快照作为基线一部分时要考虑清楚；这个基线是在系统稳定运行在高峰期的表现。未来你需要将性能差的时候的快照与做的这个基线进行比较的。

建议使用OEM来查看实时的或历史的AWR报表。OEM也可以被用来捕捉操作系统和网络方面的统计数据，然后把他们作为AWR的一部分。

This section covers the following topics: 这节包括了下面一些主题：

Database Statistics 数据库统计数据
Operating System Statistics 操作系统统计数据
Interpreting Statistics 解析统计数据

5.1.1 Database Statistics

Database statistics provide information on the type of load on the database and the internal and external resources used by the database. This section describes some of the more important statistics.

数据库的统计数据提供了各种类型负荷下数据库的信息，以及数据库内部和外部的资源数据。本章描述那么更加重要的一些统计数据。

5.1.1.1 Wait Events 等待事件

Wait events are statistics that are incremented by a server process or thread to indicate that it had to wait for an event to complete before being able to continue processing. Wait event data reveals various symptoms of problems that might be impacting performance, such as latch contention, buffer contention, and I/O contention.

To enable easier high-level analysis of the wait events, events are grouped into classes. The classes include: Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System I/O, and User I/O.

The wait classes are based on a common solution that usually applies to fixing a problem with the wait event. For example, exclusive TX locks are generally an application level issue and HW locks are generally a configuration issue.

等待事件是指一个server process显示出它必须去等待某一事件出现才能继续它的工作的一种逐步增加的统计数据。等待事件提示出可能影响性能的一些问题征兆，例如latch竞争，buffer竞争，io竞争。

为了更容易的分析这些等待事件，我们将其归成类。这些类包括了：Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System I/O, and User I/O. 这些在v$system_wait_class中wait_class列中可以看到一些。

事件类基于通用的解决方案，即可以被用作解决等待事件问题。例如，排他的TX锁通常是一个应用级别的问题（Application），HW锁通常是配置的问题（Configuration）。

下面的列表中包含通用的的一些类中的等待事件例子：

The following list includes common examples of the waits in some of the classes:

Application: locks waits caused by row level locking or explicit lock commands 行级锁导致的锁等待或者显示的锁命令
Commit: waits for redo log write confirmation after a commit 在commit后等待redo写入完成
Idle: wait events that signify the session is inactive, such as SQL*Net message from client 会话非活跃等待，比如等等客户端的消息（在v$session中，如果客户端没有操作的话，查看wait,wait_class,wait_class#等等就是上面那个等待语句）
Network: waits for data to be sent over the network 等待数据通过网络发送出去
User I/O: wait for blocks to be read off a disk 等待从磁盘中读取数据块

Wait event statistics for an instance include statistics for both background and foreground processes. Because you would typically focus your effort in tuning foreground activities, overall instance activity is broken down into foreground and background statistics in the relevant V$ views to facilitate tuning.

The V$SYSTEM_EVENT view shows wait event statistics for the foreground activities of an instance and the wait event statistics for the instance. The V$SYSTEM_WAIT_CLASS view shows these foreground and wait event instance statistics after aggregating to wait classes. V$SESSION_EVENT and V$SESSION_WAIT_CLASS show wait event and wait class statistics at the session level.

实例级的等待事件统计包括后台和前台进程的统计。因为你通常主要集中在调优前台进程活动上，总的实例活动被分解成前台和后台统计数据，存储在v$ 视图中以协助调优。

v$system_event 显示前台活跃等待事件统计，和实例等待事件统计。

v$system_wait_class 显示在聚集到类的前台和实例等待事件统计（通过这个视图可以了解一类的总的统计，理解成是v$system_event按wait_class进行group by一下）。

v$session_event, v$session_wait_class 显示session级别的统计。

5.1.1.2 Time Model Statistics

When tuning an Oracle database, each component has its own set of statistics. To look at the system as a whole, it is necessary to have a common scale for comparisons. For this reason, most Oracle Database advisories and reports describe statistics in terms of time. In addition, the V$SESS_TIME_MODEL and V$SYS_TIME_MODEL views provide time model statistics. Using the common time instrumentation helps to identify quantitative effects on the database operations.

The most important of the time model statistics is DB time. This statistics represents the total time spent in database calls and is an indicator of the total instance workload. It is calculated by aggregating the CPU and wait times of all sessions not waiting on idle wait events (non-idle user sessions).

DB time is measured cumulatively from the time of instance startup. Because DB time it is calculated by combining the times from all non-idle user sessions, it is possible that the DB time can exceed the actual time elapsed after the instance started. For example, an instance that has been running for 30 minutes could have four active user sessions whose cumulative DB time is approximately 120 minutes.

The objective for tuning an Oracle system could be stated as reducing the time that users spend in performing some action on the database, or simply reducing DB time. Other time model statistics provide quantitative effects (in time) on specific actions, such as logon operations and hard and soft parses.

当优化数据库时，每个数据库部件都有自己的统计数据集。将系统将为一个整体看，需要有一个通用的比较尺度。基于这个原因，数据库的许多建议和报告都依据时间进行统计描述。另外， v$sess_time_model, v$sys_time_model 提供了时间模型统计。使用通用的时间手段有助于数据库操作的量化统计。

最重要的一个时间模型统计是DB TIME。这个统计代表了总的数据库调用时间，以及指示了总的实例压力（数据库的负荷）。它的计算方法是： CPU时间 + （未花费在idle等待事件上的）所有session所花费的等待时间（非idle的用户会话） ---&gt db time = cpu time + session wait time(non-idle user session)。上面介绍过比如连接到数据库以后，没有做任何的操作时，server process就是牌Idle状态的;只能接收到sql语句才会从Idle中醒过来。因此这个DB TIME就是所有系统花在非空闲状态的总时间。

DB TIME是从实例启动就开始累计了。因为DB TIME是一个包括了所有non-idle用户会话的时间，那么DB TIME有可能会超过实例启动后的总时间。例如，一个实例启动30分钟，但是一启动便有4个session都花费了30分钟去执行，那么DB TIME就大约是120分钟了。（也就是说session是并发的，他们在多CPU上是并发执行的，因此这个时间在生产系统上一定会超过数据库实例启动以来的总时间的。可能从v$instance中查看实例启动的时间，也可能从alert日志中查看）

数据库优化的目标就是减少用户花费在数据库上的时间，或者简单说就是减少DB TIME。其他的一个时间模型统计提供了特定操作的量化结果，比如登录操作，硬软解析操作。

5.1.1.3 Active Session History

The V$ACTIVE_SESSION_HISTORY view provides sampled session activity in the instance. Active sessions are sampled every second and are stored in a circular buffer in SGA. Any session that is connected to the database and is waiting for an event that does not belong to the Idle wait class is considered as an active session. This includes any session that was on the CPU at the time of sampling.

Each session sample is a set of rows and the V$ACTIVE_SESSION_HISTORY view returns one row for each active session per sample, returning the latest session sample rows first. Because the active session samples are stored in a circular buffer in SGA, the greater the system activity, the smaller the number of seconds of session activity that can be stored in the circular buffer. This means that the duration for which a session sample appears in the V$ view, or the number of seconds of session activity that is displayed in the V$ view, is completely dependent on the database activity.

As part of the AWR snapshots, the content of V$ACTIVE_SESSION_HISTORY is also flushed to disk. Because the content of this V$ view can get quite large during heavy system activity, only a portion of the session samples is written to disk.

By capturing only active sessions, a manageable set of data is represented with the size being directly related to the work being performed rather than the number of sessions allowed on the system. Using ASH enables you to examine and perform detailed analysis on both current data in the V$ACTIVE_SESSION_HISTORY view and historical data in the DBA_HIST_ACTIVE_SESS_HISTORY view, often avoiding the need to replay the workload to gather additional performance tracing information. ASH also contains execution plan information for each captured SQL statement. You can use this information to identify which part of SQL execution contributed most to the SQL elapsed time. The data present in ASH can be rolled up on various dimensions that it captures, including the following:

v$active_session_history 视图提供了实例的采样会话活动。 active session是每隔1秒钟采样一次，并且被存储在一个SGA中的循环缓存中。任何连接到数据库，正在等待事的session，只要它不是属于idle状态，就属于atcive session。这也包含了正在使用CPU资源的session采样。（也就是说属于DB TIME要统计范围的session都要被采样，如果在采样时有多个session在运行那么就记录多条，如果没有active session，那么在v$active_session_history表中就没有此时间的记录）

每个会话采样是一组记录（几行记录），v$active_session_history 视图每次采样为每个active session记录一行，返回最近的记录（按照sample_id,sample_time进行逆序排列的）。因为active session采样被存储在一个循环的SGA缓存中，那么系统压力越大（活跃性越强），那个这个缓存中能够存储的数据就越少（即只能存储几秒钟的数据了）。这就意味着一个session能够出现在v$视图中，或者能够存储几秒的活跃会话在v$视图中，是完全依赖于数据库的活跃性的（压力程度）

v$active_session_history 的内容作为awr快照的一部分也被存储到硬盘上。因为在系统繁忙时 v$ 视图里的内容非常的多，所以只有一部分的session采样被写到硬盘上了。

通过仅仅捕捉active session, 一个可管理的数据集展现出来，连同直接相关联的正在执行的任务规模，而不是目前系统中允许的session数。使用ASH，你将能检查和展开更详细的分析，在v$active_session_history 视图和 dba_hist_active_sess_history的历史视图中，这样常常可以避免系统重放的需要以便去收集额外的跟踪性能信息。ASH 还包含了每个捕捉到的SQL的执行计划信息。你可以使用这些信息去确定哪部件SQL执行消耗了时间。ASH中的数据可以被分成几个维度，包括以下：

SQL identifier of SQL statement ---&gt SQL语句
SQL plan identifier and hash value of the SQL plan used to execute the SQL statement ---&gt SQL的hash值
SQL execution plan information ---&gt 执行计划信息
Object number, file number, and block number ---&gt 对象号，文件号，块号（可以关联dba_objects,dba_data_files,dba_extents）
Wait event identifier and parameters ---&gt 等待事件
Session identifier and session serial number ---&gt sid和serial#(可以关联v$session)
Module and action name ---&gt module和action name（在v$session中的module,action列）
Client identifier of the session ---&gt session的客户端标识
Service hash identifier ---&gt service的hash标识（可以看到是从怎样连接上来的，如果按service分资源组的话，都可以推断出下来的consumer组）
Consumer group identifier ---&gt 消费组标识（可以看到当时它在哪个消费组中）

You can gather ASH information over a specified duration into a report.

Active session history sampling is also available for Active Data Guard physical standby instances and Oracle Automatic Storage Management (Oracle ASM) instances. On these instances, the current session activity is collected and displayed in the V$ACTIVE_SESSION_HISTORY view, but not written to disk.

你可以定期的将收集的ASH信息写到报表中。

活跃会话历史采样同样适用于 physical standby实例和ASM实例。在这些实例中，信息仍然收到到了v$active_session_history视图中，但是没有被写到硬盘上。（也就是说虽然有数据，但是在高峰期这些数据基本上看不到，因此收集了等于没有收集，中国电费很贵的！）

5.1.1.4 System and Session Statistics

A large number of cumulative database statistics are available on a system and session level through the V$SYSSTAT and V$SESSTAT views.

更大的一部分累积数据库统计是存储在 v$sysstat, v$sesstat 视图中的

5.1.2 Operating System Statistics

Operating system statistics provide information on the usage and performance of the main hardware components of the system, and the performance of the operating system itself. This information is crucial for detecting potential resource exhaustion, such as CPU cycles and physical memory, and for detecting bad performance of peripherals, such as disk drives.

Operating system statistics are an indication of how the hardware and operating system are working. Many system analysts react to a hardware resource shortage by installing more hardware. This is a reactionary response to a series of symptoms shown in the operating system statistics. It is best to consider operating system statistics as a diagnostic tool, similar to the way doctors use body temperature, pulse rate, and patient pain when making a diagnosis. To help identify bottlenecks, gather operating system statistics for all servers in the system under performance analysis.

操作系统统计数据提供信息：硬件部件的使用率和性能，以及操作系统本身的性能。在检查系统资源不足时这些系统非常重要，例如CPU运转和物理内存，极差的外围设备性能，例如磁盘驱动。

操作系统统计是一个关于硬件和操作系统工作的指示。许多系统分析反映硬件资源不足，需要增加硬件配置。这是一个对操作系统统计显示的一系列症状的保守响应。最好将操作系统统计作为一个诊断的工具，就像医生使用温度，脉搏率，患者痛苦程度做诊断一样。为了确定性能瓶颈，在性能分析时，收集所有系统的操作系统统计。

Operating system statistics include the following: 操作系统统计包含以下内容：

CPU Statistics CPU统计
Virtual Memory Statistics 虚拟内存统计
Disk I/O Statistics 磁盘IO统计
Network Statistics 网络统计

5.1.2.1 CPU Statistics

CPU utilization is the most important operating system statistic in the tuning process. Get CPU utilization for the entire system and for each individual CPU on multi-processor environments. Utilization for each CPU can detect single-threading and scalability issues.

Most operating systems report CPU usage as time spent in user space or mode and time spent in kernel space or mode. These additional statistics allow better analysis of what is actually being executed on the CPU.

On a system running Oracle Database, where only one application is typically running, the system runs database activity in user space. Activities required to service database requests (such as scheduling, synchronization, I/O, memory management, and process/thread creation and tear down) run in kernel mode. In a system where CPU is fully utilized, a healthy Oracle database runs between 65% and 95% in user space.

The V$OSSTAT view captures machine-level information in the database, making it easier for you to determine if hardware-level resource issues exist. The V$SYSMETRIC_HISTORY view shows a one-hour history of the Host CPU Utilization metric, a representation of percentage of CPU usage at each one-minute interval. The V$SYS_TIME_MODEL view supplies statistics on the CPU usage by the Oracle database. Using both sets of statistics enable you to determine whether the Oracle database or other system activity is the cause of the CPU problems.

在性能调优时，CPU使用率真是最重要的一个操作系统统计数据。获取整个系统的CPU使用率真，以及每个单独CPU的使用率。单独CPU统计可以侦测是单个线程和扩展性的问题。

多数操作系统会报告CPU使用率，以花费在用户空间或模式的时间和花费在内核空间或模式的时间（我个人通常叫内核模式，在读linux内核时这些词出现的会非常多）。这些额外的统计可以更好的分析CPU到底在执行些什么。

在一个运行oracle的系统上，通常只有一个应用在运行，数据库是运行在用户空间的（即通常来说数据库服务器只有数据库在运行，不会再让应用程序在上面跑）。而为数据库服务的请求运行在内核模式下（或者内核空间下）（比如：任务调度，同步，IO，内存管理，进程创建和删除）。在一个CPU合理高负荷运行的系统中，数据库应该使用65%~95%的CPU。（即如果CPU多数被操作系统占用了，那就有问题了）

v$osstat 视图捕捉机器级别的信息，当硬件资源问题出现时使你更容易地判断。（包括CPU个数，IDLE_TIME，BUSY_TIME，USER_TIME，SYS_TIME，IOWAIT_TIME，PHYSICAL_MEMORY_BYTES）

v$sysmetric_history 视图显示1小时的历史CPU使用度量，以1分钟的间隙去展示CPU使用率（CPU Usage Per Sec，CPU Usage Per Txn）。

v$sys_time_model 视图提供了数据库的CPU使用（DB time，DB CPU）。使用这几个统计数据，以确定是数据库还是其他程序消耗了CPU。

5.1.2.2 Virtual Memory Statistics

V irtual memory statistics should mainly be used as a check to validate that there is very little paging or swapping activity on the system. System performance degrades rapidly and unpredictably when paging or swapping occurs.

Individual process memory statistics can detect memory leaks due to a programming failure to deallocate memory taken from the process heap. These statistics are necessary to validate that memory usage does not increase after the system has reached a steady state after startup. This problem is particularly acute on shared server applications on middle tier computers where session state may persist across user interactions, and on completion state information that is not fully deallocated.

虚拟内存统计应该主要被用作检查验证是否有一些换页发生。一旦发生换页系统性能将迅速而不可预测的下降。

个别进程内存统计可以侦测由于程序未释放内存而造成内存泄露问题（一般是C语言malloc内存，这种分配的内存一定是在heap上的。如果程序死掉的话，操作系统会回收进程分配的内存，但最怕就是程序没死却一直在消耗内存；通常在系统运行稳定之后通过内存监控就可以得知是否有泄露发生，或者直接扫描程序也会发现此类情况）。这些统计是必须的，用于验证系统达到一个稳定状态后，内存的使用未一直增加。这种问题在共享服务应用的中间层上表现的比较严重，session状态可能存留在用户交互中，但在完成后却没有释放掉。

5.1.2.3 Disk I/O Statistics

Because the database resides on a set of disks, the performance of the I/O subsystem is very important to the performance of the database. Most operating systems provide extensive statistics on disk performance. The most important disk statistics are the current response time and the length of the disk queues. These statistics show if the disk is performing optimally or if the disk is being overworked.

Measure the normal performance of the I/O system; typical values for a single block read range from 5 to 20 milliseconds, depending on the hardware used. If the hardware shows response times much higher than the normal performance value, then it is performing badly or is overworked. This is your bottleneck. If disk queues start to exceed two, then the disk is a potential bottleneck of the system.

Oracle Database also maintains a consistent set of I/O statistics for the I/O calls it issues. These statistics are captured for both single and multi block read and write operations in the following dimensions:

因为数据库使用许多的硬盘，因此IO的性能对于数据库非常的重要。多数操作系统都提供了大量的硬盘性能统计。最重要的磁盘统计是当前的响应时间和磁盘队列长度。这些统计会显示出磁盘目前工作正常还是高负荷动作。

测量正常的IO系统性能；依据不同的硬件配置，通常磁盘单块读取时间是5~20微秒（我的虚拟机上大概是34微秒左右，通过DBMS_RESOURCE_MANAGER.CALIBRATE_IO可以测量出来）。假如硬件显示响应时间比正常要高的话，那么说明磁盘负荷很高或者出现问题了。这就是你的瓶颈。假如磁盘队列中有超过2个，那么磁盘就是系统一个潜在的瓶颈。

数据库也维护了一份关于IO调用的统计。这些统计是在单块或多块读或写操作时被捕捉的，表现在以下几个方面：

Consumer group 消费组

When Oracle Database Resource Manager is enabled, the V$IOSTAT_CONSUMER_GROUP view captures I/O statistics for all consumer groups that are part of the currently enabled resource plan. The database samples cumulative statistics every hour and stores them as historical statistics in the AWR.

当数据库资源管理功能打开的话， v$iostat_consumer_group 视图存储所有目前资源计划中的消费组的IO统计信息。数据库将每隔1小时采样一次将数据累积起来供AWR作为历史统计使用。

Database file 数据库文件

I/O statistics of database files that are or have been accessed are captured in the V$IOSTAT_FILE view.

数据库文件被访问的IO统计存储在 v$iostat_file 视图中

Database function 数据库功能

I/O statistics for database functions (such as the LGWR and DBWR) are captured in the V$IOSTAT_FUNCTION view.

数据库的后台IO统计（LGWR,DBWR所产生的IO消耗）存储在 v$iostat_function 视图中。

5.1.2.4 Network Statistics

You can use network statistics in much the same way as disk statistics to determine if a network or network interface is overloaded or not performing optimally. In today's networked applications, network latency can be a large portion of the actual user response time. For this reason, these statistics are a crucial debugging tool.

Oracle Database maintains a set of network I/O statistics in the V$IOSTAT_NETWORK view.

当网络接口负荷超标或者非最佳运行时，你可以像硬盘统计一样使用网络统计数据。在今天的网络应用中，网络延迟占据着用户响应时间的很大一部分。基于这个原因，这些统计数据是调试的重要工具。数据库将网络IO的统计信息存储在 v$iostat_network 视图中。

5.1.2.5 Operating System Data Gathering Tools

Table 5-1 shows the various tools for gathering operating statistics on UNIX.

下图显示了UNIX上的一些收集统计信息的工具

Table 5-1 UNIX Tools for Operating Statistics

Component	UNIX Tool
CPU	sar, vmstat, mpstat, iostat
Memory	sar, vmstat
Disk	sar, iostat
Network	netstat

5.1.3 Interpreting Statistics 统计解析

When initially examining performance data, you can formulate potential theories by examining your statistics. One way to ensure that your interpretation of the statistics is correct is to perform cross-checks with other data. This establishes whether a statistic or event is really of interest. Also, because foreground activities are tunable, it is better to first analyze the statistics from foreground activities before analyzing the statistics from background activities.

当开始检查性能数据时，你可以通过检查你的统计数据形成你的理论。一种确认你解析正确的方法是再检查其他的数据共同去验证。这样确认一个统计或事件是真正有用的。同时因为前台进程是可调优的，因此最好是先分析前台进程的统计数据，然后再分析后台的活动统计。

下面讨论一下一些常见的陷阱：

Some pitfalls are discussed in the following sections:

Hit ratios 命中率

When tuning, it is common to compute a ratio that helps determine whether there is a problem. Such ratios include the buffer cache hit ratio, the soft-parse ratio, and the latch hit ratio. Do not use these ratios as definitive identifiers of whether a performance bottleneck exists. Rather, use them as indicators. To identify whether a bottleneck exists, examine other related evidence.

当调优的时候，通常去计算一个比率以判断是否存在一个问题。一些比率包括缓存命中率，软解析率，以及latch命中率。不要使用这些比率去明确的判断一个性能瓶颈是否存在。而是把它作为一个参考。为了确认是否存在瓶颈，需要再去检查一下其他的证据。

Wait events with timed statistics 基于时间的等待事件

Setting TIMED_STATISTICS to true at the instance level directs the database to gather wait time for events, in addition to available wait counts. This data is useful for comparing the total wait time for an event to the total elapsed time between the data collections. For example, if the wait event accounts for only 30 seconds out of a 2-hour period, then little is to be gained by investigating this event, although it may be the highest ranked wait event when ordered by time waited. However, if the event accounts for 30 minutes of a 45-minute period, then the event is worth investigating.

在实例组设置 timed_statistics = true，让数据库收集等待事件，包含等待次数。这个数据对于比较某一数据收集段间一个事件的总等待时间与总的消耗时间。例如，假如在过去的2小时里有30秒的等待事件，那么调查这个事件就是没有太大意义的，尽管根据时间等待排序后它排在最前面也是没意义的。然而，如果在45分钟里有30分钟在等待，那么就需要去仔细地调查了。

Note:

Timed statistics are automatically collected for the database if the initialization parameter STATISTICS_LEVEL is set to TYPICAL or ALL. If STATISTICS_LEVEL is set to BASIC, then you must set TIMED_STATISTICS to TRUE to enable collection of timed statistics. Note that setting STATISTICS_LEVEL to BASIC disables many automatic features and is not recommended.

If you explicitly set DB_CACHE_ADVICE, TIMED_STATISTICS, or TIMED_OS_STATISTICS, either in the initialization parameter file or by using ALTER_SYSTEM or ALTER SESSION, then the explicitly set value overrides the value derived from STATISTICS_LEVEL.

时间统计在 statistics_level = [ typical | all ]时自动打开了。假如statistics_level = BASIC的话，那么你必须手动去打开timed_statistics。建议你永远别设置成BASIC。

假如你显式的设置了DB_CACHE_ADVICE, TIMED_STATISTICS 或者 TIME_OS_STATISTICS，无论是通过alter system还是alter session方式都将覆盖statistics_level的值。

Comparing Oracle Database statistics with other factors 使用其他因素对比数据库的统计

When looking at statistics, it is important to consider other factors that influence whether the statistic is of value. Such factors include the user load and the hardware capability. Even an event that had a wait of 30 minutes in a 45-minute period might not be indicative of a problem if you discover that there were 2000 users on the system, and the host hardware was a 64-node computer.

当查看统计数据的时候，考虑一些会影响统计数据的因素是非常重要的，即误导了统计数据的正确性。包括了用户的压力及硬件的能力。即使你发现在45分钟内有30分钟的等待事件也不意味着有问题，因为可能有2000个用户同时在访问数据库，同时主机是一个64-节点计算机（不清楚什么是64-node）

Wait events without timed statistics 非时间统计的等待事件

If TIMED_STATISTICS is false, then the amount of time waited for an event is not available. Therefore, it is only possible to order wait events by the number of times each event was waited for. Although the events with the largest number of waits might indicate the potential bottleneck, they might not be the main bottleneck. This can happen when an event is waited for a large number of times, but the total time waited for that event is small. The converse is also true: an event with fewer waits might be a problem if the wait time is a significant proportion of the total wait time. Without having the wait times to use for comparison, it is difficult to determine whether a wait event is really of interest.

假如timed_statistics设置成了FALSE，那么时间的统计是没有的。因此，只有一种可能就是通过每个事件的次数排序等待事件。尽管最大的等待时间事件可能暗示一个可能的瓶颈，他们可能不是主要瓶颈。这种情况在一个需要大量调用次数的等待时发生，但是单个的那个事件等待时间是比较小的。（简单说就是发生了10000次并发，每个并发处理时间是1ms，那么一共处理时间是10S，即使每个并发等待时间是0.1ms，加起来也有1S，这个时间不算什么，因为是10000个并发加起来的时间。但是如果有一个交易发生了2次，每次正常处理时间是0.1ms，但是等待事件发生了0.4S，那总的0.4S*2的等待时间是比10000个并发加起来的1S是少的，但是明显处理是慢的。因为人家10000个每一个等待时间只有0.1ms，是你2个交易0.4S = 400ms，因此人家的性能比你的高250倍。）。反过来也是正确的：一个等待事件仅仅发生几次，但是这个等待事件占据了等待事件的大部分时间就可能是一个问题。如果等待事件的发生次数作为比较，即使真是一个问题，判断问题所在也是非常困难的。

Idle wait events 空闲等待事件

Oracle Database uses some wait events to indicate if the Oracle server process is idle. Typically, these events are of no value when investigating performance problems, and they should be ignored when examining the wait events.

数据库使用一些等待事件指示那么idle的服务进程。通常情况下，这些事件中是没有值记录的，因此在统计的时候要把它们去掉（也就是说在where条件中加上 xxx <> ‘idle’）

Computed statistics 计算的统计

When interpreting computed statistics (such as rates, statistics normalized over transactions, or ratios), it is important to cross-verify the computed statistic with the actual statistic counts. This confirms whether the derived rates are really of interest: small statistic counts usually can discount an unusual ratio. For example, on initial examination, a soft-parse ratio of 50% generally indicates a potential tuning area. If, however, there was only one hard parse and one soft parse during the data collection interval, then the soft-parse ratio would be 50%, even though the statistic counts show this is not an area of concern. In this case, the ratio is not of interest due to the low raw statistic counts.

当研究计算的统计数据时（比如速率，事务的标准化统计，或者比率），使用交叉验证计算的统计与实际的统计数量值是非常重要的。这确保派生出来的比率是真正有用的：小的统计数可能使不寻常的比率大大折扣。例如，在一开始检查时，发现一个50%的软解析比率通常暗示一个可能的调优区域。但是假如现在只有1条硬解析和1个软解析发生在这段时间里，那么软解析就是50%，尽管这个统计数显示这个不是一个关注区域（感觉这里oracle写错了，应该是统计显示是要关注的地方，这样才能突出说明这个50%是没有意义的）。在这种情况下，这个比率就不应该是关注点，因为统计数太低。

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/1696240/viewspace-1175046/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/1696240/viewspace-1175046/

csnkza7163

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Oracle Performance Tuning 11g2 (5)

5 Automatic Performance Statistics This chapter discusses the gathering of performance statistics. This c...
复制链接

扫一扫