Using the Oracle Wait Interface to Troubleshoot I/O Issues

from: http://www.pythian.com/news/275/using-the-oracle-wait-interface-to-troubleshoot-io-issues

 

By: Shakir Sadikali

Or, Why I Like Broccoli and Wait Events

There’s been a fair bit of discussion with respect to whose vegetables are the freshest in Ottawa. From one DBA to another, I’m here to tell you that St. Laurent Fruit and Vegetable (beside the Mideast Meat Market) is probably the place to go for the best prices on… well… fruits and vegetables. They also carry lots of bags, and they’re strategically placed so you’ll never need old men to help you get more :-)

What’s more interesting about grocery shopping is that it lets you daydream and go over all the good work you’ve done over the last week (assuming you go weekly and a certain amount good work done). Last week was particularly tough. One of our more stable clients had some serious “mystery” production problems. To paraphrase, their app that was running “OK” starting running “sucky”. I knew that nothing had changed from our end and so being the intrepid super-sleuths that we are, we determined that the problems were related to I/O. How? Here’s how (forgive me, but I haven’t really gone into massive amounts of detail on this one, it’s more of a general framework).

Step 1

Run a handy query originally written by Steve Adams. I use this as my very first step to troubleshooting anything that resembles an Oracle performance issue, and work from there.

-----------------------------------------------------------
--
-- Script:      waiters.sql
-- Purpose:     to count the waiters for each event type
-- For:         8.0 and higher
--
-- Copyright:   (c) 2000 Ixora Pty Ltd
-- Author:      Steve Adams
--
-----------------------------------------------------------
select /*+ ordered */
  substr(n.name, 1, 29)  event,
  t0,
  t1,
  t2,
  t3,
  t4,
  t5,
  t6,
  t7,
  t8,
  t9
from
  sys.v_$event_name  n,
  (select event e0, count(*)  t0 from sys.v_$session_wait group by event),
  (select event e1, count(*)  t1 from sys.v_$session_wait group by event),
  (select event e2, count(*)  t2 from sys.v_$session_wait group by event),
  (select event e3, count(*)  t3 from sys.v_$session_wait group by event),
  (select event e4, count(*)  t4 from sys.v_$session_wait group by event),
  (select event e5, count(*)  t5 from sys.v_$session_wait group by event),
  (select event e6, count(*)  t6 from sys.v_$session_wait group by event),
  (select event e7, count(*)  t7 from sys.v_$session_wait group by event),
  (select event e8, count(*)  t8 from sys.v_$session_wait group by event),
  (select event e9, count(*)  t9 from sys.v_$session_wait group by event)
where
  n.name != 'Null event' and
  n.name != 'null event' and
  n.name != 'rdbms ipc message' and
  n.name != 'pipe get' and
  n.name != 'virtual circuit status' and
  n.name not like '%timer%' and
  n.name not like '%slave wait' and
  n.name not like 'SQL*Net message from %' and
  n.name not like 'io done' and
  n.name != 'queue messages' and
  e0 (+) = n.name and
  e1 (+) = n.name and
  e2 (+) = n.name and
  e3 (+) = n.name and
  e4 (+) = n.name and
  e5 (+) = n.name and
  e6 (+) = n.name and
  e7 (+) = n.name and
  e8 (+) = n.name and
  e9 (+) = n.name and
  nvl(t0, 0) + nvl(t1, 0) + nvl(t2, 0) + nvl(t3, 0) + nvl(t4, 0) +
  nvl(t5, 0) + nvl(t6, 0) + nvl(t7, 0) + nvl(t8, 0) + nvl(t9, 0) > 0
order by
  nvl(t0, 0) + nvl(t1, 0) + nvl(t2, 0) + nvl(t3, 0) + nvl(t4, 0) +
  nvl(t5, 0) + nvl(t6, 0) + nvl(t7, 0) + nvl(t8, 0) + nvl(t9, 0)
/

It’s really a nifty “group by” from v$session_wait. So, if you really want, you can go straight there. This query showed me the following:

SQL*Net more data from client 1 1 1 1 1 1 1 1 1 1
latch free   2 2 2 2 2 2 2 2 2 2
db file sequential read  2 2 2 2 2 2 2 2 2 2
db file scattered read  3 3 3 3 3 3 3 3 3 3
enqueue    8 8 8 8 8 8 8 8 8 8
buffer busy waits  15 15 15 15 15 15 15 15 15 15

Ok, so now I know we definitely had a problem with enqueues and buffer_busy_waits. In addition, when I ran this a few times, I also encountered significant latch free waits. So what’s the deal? Poking around the web, you’ll come up with a few standard answers for each of these. To summarize, here’s a great paper on resolving latch free waits.

Looking in v$session_wait told me the latch I was waiting on was the kghalo latch for the shared_pool. This was expected. Our client was known for not binding any SQL, and so we were parsing everything. This wasn’t unusual.

The P2 column in v$session_wait tells you the latch# it’s waiting for, which can be looked up in v$latchname. In addition, for some system stats, you can run this:

select * from v$latch_misses order by 5 desc;
Step 2

What was unusual, however, were the absolute number of enqueues and buffer_busy_waits. Linking the sid from v$session_wait to v$session and then to v$sqlarea, a trend emerged. All the buffer_busy_waits were against the same object (in memory). In addition, the enqueues were also on the same object. How did I know this? Well, the waiting queries were one indication, however, I also looked at v$session_wait to get the information I needed.

When you look at buffer_busy waits, the P1 column gives you the file_name and the P2 column gives you the block#. P3 gives you the reason you’re waiting. Run the following to get the object you’re waiting on.

select * from dba_extents where file_id = [[your_file] and [your_block] between block_id and block_id + blocks;

Typically, you’re hitting hot blocks. You can try to use some of the techniques discussed here.

Step 3

Troubleshooting enqueue uses a similar strategy. For enqueues, the P2 column gives you the object_id of the object, and from there you figure out what’s going on.

select * from dba_objects where object_id = [your_P2_object];

In 10G, the interface is a little more useful, giving you the actual type of enqueue and the reason. In 8173, we’re not so fortunate. Either way, we managed to isolate a number of bad running queries this way, and it was obvious that we were hitting an I/O bottleneck based on one table in the database. My recommendation to the client was to check the disks. After they assured us that everything was okay, I went ahead and looked for query plan flips, etc.

At the end of 3 days of query tuning, one of the clients’ sys admins finally discovered that a battery had failed on the disk controller, causing all write cacheing to no longer be available. I’m sure there must have been a way to see this, but I just don’t know how.

To summarize folks, always check your batteries. But if you don’t, and your DBA starts lifting the hood and poking around, they’re sure to find tons of other stuff that can be fixed and that came to light only when the hardware didn’t behave nicely. Kinda like a nice stress test, eh?

What my team and I discovered was that the object in question needed some serious love, since the client does plan on growing their operation. Because of all the queries we fixed, their app is that much faster. So in the long run, this was good for us. Just like broccoli.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
在信号处理领域,DOA(Direction of Arrival)估计是一项关键技术,主要用于确定多个信号源到达接收阵列的方向。本文将详细探讨三种ESPRIT(Estimation of Signal Parameters via Rotational Invariance Techniques)算法在DOA估计中的实现,以及它们在MATLAB环境中的具体应用。 ESPRIT算法是由Paul Kailath等人于1986年提出的,其核心思想是利用阵列数据的旋转不变性来估计信号源的角度。这种算法相比传统的 MUSIC(Multiple Signal Classification)算法具有较低的计算复杂度,且无需进行特征值分解,因此在实际应用中颇具优势。 1. 普通ESPRIT算法 普通ESPRIT算法分为两个主要步骤:构造等效旋转不变系统和估计角度。通过空间平移(如延时)构建两个子阵列,使得它们之间的关系具有旋转不变性。然后,通过对子阵列数据进行最小二乘拟合,可以得到信号源的角频率估计,进一步转换为DOA估计。 2. 常规ESPRIT算法实现 在描述中提到的`common_esprit_method1.m`和`common_esprit_method2.m`是两种不同的普通ESPRIT算法实现。它们可能在实现细节上略有差异,比如选择子阵列的方式、参数估计的策略等。MATLAB代码通常会包含预处理步骤(如数据归一化)、子阵列构造、旋转不变性矩阵的建立、最小二乘估计等部分。通过运行这两个文件,可以比较它们在估计精度和计算效率上的异同。 3. TLS_ESPRIT算法 TLS(Total Least Squares)ESPRIT是对普通ESPRIT的优化,它考虑了数据噪声的影响,提高了估计的稳健性。在TLS_ESPRIT算法中,不假设数据噪声是高斯白噪声,而是采用总最小二乘准则来拟合数据。这使得算法在噪声环境下表现更优。`TLS_esprit.m`文件应该包含了TLS_ESPRIT算法的完整实现,包括TLS估计的步骤和旋转不变性矩阵的改进处理。 在实际应用中,选择合适的ESPRIT变体取决于系统条件,例如噪声水平、信号质量以及计算资源。通过MATLAB实现,研究者和工程师可以方便地比较不同算法的效果,并根据需要进行调整和优化。同时,这些代码也为教学和学习DOA估计提供了一个直观的平台,有助于深入理解ESPRIT算法的工作原理。
The error message "The slave I/O thread is not running" indicates that the slave database server is not able to connect or synchronize with the master database server in a MySQL replication setup. Here are a few steps you can follow to troubleshoot and resolve this issue: 1. Check the connection settings: Verify that the slave database server has the correct configuration settings for connecting to the master database server. This includes the `master_host`, `master_user`, `master_password`, and `master_port` variables in the slave's MySQL configuration file (`my.cnf` or `my.ini`). 2. Verify network connectivity: Ensure that the slave database server can reach the master database server over the network. Check for any firewall rules or network restrictions that may be blocking the connection. 3. Check replication user permissions: Ensure that the replication user on the master database server has the necessary permissions to replicate data to the slave. The user should have the `REPLICATION SLAVE` privilege granted. 4. Check replication status: Use the following command on the slave database server to check the replication status: ``` SHOW SLAVE STATUS; ``` Look for any errors or warnings in the output, such as a failed connection or replication lag. Pay attention to fields like `Slave_IO_Running` and `Last_IO_Error` to identify potential issues. 5. Restart replication: If you have made any changes to the configuration or encountered errors, you can try restarting the replication process on the slave database server. Use the following commands: ``` STOP SLAVE; START SLAVE; ``` 6. Monitor logs: Check the MySQL error log file on both the master and slave database servers for any relevant error messages or warnings that might shed light on the issue. The log files are typically located at `/var/log/mysql/error.log` or `/var/log/mysqld.log`. If you are still unable to resolve the issue, it may be helpful to consult the MySQL documentation or seek assistance from the MySQL community or database administrators with expertise in MySQL replication.
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值