Monitoring Informatica from Oracle is a reasonably straightforward exercise if you know where to look and what to look for. I'll walk through a typical scenario that I encounter regularly at client sites.
During a daytime ETL run (as part of performance testing) I was called to have a look at a mapping that was "running slow". That, in the main, is usually the extent of the information available.
The first thing I did was log in to Informatica's Workflow Monitor to look at the performance statistics. Sure enough, there was a mapping processing around 100 rows per second.
The source row count pretty much matched the target row count, so in this scenario there was no evidence of a row-accumulating bottleneck such as a joiner or aggregator. If the target has processed around the same number of rows as the source, then rows are definitely making it through the mapping. However, this doesn't rule out transformation-related bottlenecks.
SOURCE QUALIFIER
Next, let's have a look at the source session in Oracle. I'm using the view v$session_wait to see what the session is waiting on at the moment I run the query.
DW_STATS@MIDDWHP> select event, wait_time, seconds_in_wait, state
  2  from v$session_wait
  3  where sid = 963
  4  /

EVENT                           WAIT_TIME SECONDS_IN_WAIT STATE
------------------------------- --------- --------------- -------
SQL*Net message from client             0               0 WAITING

Elapsed: 00:00:00.04

DW_STATS@MIDDWHP> /

EVENT                           WAIT_TIME SECONDS_IN_WAIT STATE
------------------------------- --------- --------------- -------
SQL*Net message from client             0               0 WAITING

Elapsed: 00:00:00.00

DW_STATS@MIDDWHP> /

EVENT                           WAIT_TIME SECONDS_IN_WAIT STATE
------------------------------- --------- --------------- -------
SQL*Net message from client             0               0 WAITING
To identify the SID relating to the source qualifier, in this case I simply looked at the SQL each session on the database was executing until I found the relevant one. Of course, this isn't always possible, especially if the session you are looking for is a cache build which hasn't started building its cache yet! But for this example, it sufficed.
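One way to search for the session is to join v$session to v$sql on the session's current SQL (10g and above; earlier releases would use sql_address/sql_hash_value instead of sql_id). This is only a sketch: the text fragment 'CUST_ACCOUNT' is a hypothetical piece of the source qualifier SQL, not taken from the actual mapping.

```sql
-- Find Oracle sessions whose current SQL contains a known fragment of the
-- source qualifier query. PROGRAM/MACHINE help confirm the session belongs
-- to the Informatica server process. 'CUST_ACCOUNT' is a hypothetical fragment.
select s.sid, s.serial#, s.username, s.program, s.machine, q.sql_text
from   v$session s,
       v$sql     q
where  s.sql_id = q.sql_id
and    upper(q.sql_text) like '%CUST_ACCOUNT%';
```

Note this only finds sessions that are actively executing; as mentioned above, a lookup session that hasn't fired its cache build SQL yet won't show up this way.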
So we can see from the above that, at the time I queried v$session_wait, the session was waiting on SQL*Net message from client. In other words, Oracle was idle, waiting for the client to tell it to do something; in this case the client is Informatica. To get a bigger picture of what is happening to the session we can go to another view, v$session_event. This view provides wait time information for the life of the session, aggregated by event. So using this view we can see what the session has spent its time on during its entire life, not just in real time.
DW_STATS@MIDDWHP>
  1  select event, total_waits, time_waited
  2  from v$session_event
  3  where sid = 963
  4* order by 3 desc
DW_STATS@MIDDWHP> /

EVENT                           TOTAL_WAITS TIME_WAITED
------------------------------- ----------- -----------
SQL*Net message from client            7497     8094790
db file scattered read                 4268       11449
db file sequential read               11703        2685
SQL*Net more data to client            5013          53

Elapsed: 00:00:00.00

DW_STATS@MIDDWHP> /

EVENT                           TOTAL_WAITS TIME_WAITED
------------------------------- ----------- -----------
SQL*Net message from client            7784     8095780
db file scattered read                 4564       12864
db file sequential read               11978        2691
SQL*Net more data to client            5078          91
So we can see that over the course of its life, this session has mainly been waiting on Informatica to send it something. However, we can also see that the counts for db file scattered read and db file sequential read are going up, so Oracle is actually doing work, not just waiting on Informatica. The event db file sequential read is a single-block read, usually indicating an index access path. The event db file scattered read is a multi-block read, usually indicating a full table scan or an index fast full scan. So just from looking at the wait events, we can start to see that the source qualifier isn't the bottleneck. Why?
If the source qualifier were the bottleneck, we would see much more work going on in Oracle and much less time spent on the event SQL*Net message from client. One word of caution before we move on: there are situations where this sort of profile from v$session_event could still signify a problem, so I'm not setting out a rule of thumb here. This is just one typical example; when applying it to your own environment, be careful to take all other factors into consideration.
LOOKUP
The next logical step would be to look at the cache build session. However, in this case we see that rows are actually reaching the target, suggesting that the cache build has already completed and therefore can't be the bottleneck in this particular scenario. I thought I would show you the profile of the cache build anyway, just for interest. When I ran the same v$session_event SQL as above, I got the following:
EVENT                           TOTAL_WAITS TIME_WAITED
------------------------------- ----------- -----------
SQL*Net message from client           87374     4878878
db file scattered read                 3834        2634
db file sequential read                 994         690
SQL*Net more data to client            8637          36
As can be seen, the cache build session actually waited a much greater percentage of its time on SQL*Net message from client. The reason lies in the way Informatica handles its connections to Oracle. When an Informatica session initialises, it creates all of the connections to the source, lookups and target that it will require during the life of the mapping. In this example, that meant it created three Oracle sessions: one for the source qualifier, one for the cache build and one for the target. The first statement executed against the database is the source qualifier. Only when the source qualifier returns rows is the cache build SQL fired off against the database. When the cache build finishes and the rows from the source qualifier begin their journey through the transformations on their way to the target, the cache build's Oracle session is not ended; it is kept open. It is therefore not unusual to see a cache build session wait on SQL*Net message from client for the majority of its life.
Let's move on to the target, then.
TARGET
Let's execute the same SQL against the target session and see what we get:
DW_STATS@MIDDWHP> select event, wait_time, seconds_in_wait, state
  2  from v$session_wait
  3  where sid = 854
  4  /

EVENT                           WAIT_TIME SECONDS_IN_WAIT STATE
------------------------------- --------- --------------- -------
db file sequential read                 0               0 WAITING

Elapsed: 00:00:00.03

DW_STATS@MIDDWHP> /

EVENT                           WAIT_TIME SECONDS_IN_WAIT STATE
------------------------------- --------- --------------- -------
db file sequential read                 0               0 WAITING

Elapsed: 00:00:00.01

DW_STATS@MIDDWHP> /

EVENT                           WAIT_TIME SECONDS_IN_WAIT STATE
------------------------------- --------- --------------- -------
db file sequential read                 0               0 WAITING

Elapsed: 00:00:00.01
It looks like, during our sample time, Oracle is spending its time on db file sequential read. As we have already stated, this wait event is a single-block read. But this is a target; why are we seeing reads? In this case, the SQL being executed consisted of straight INSERT statements, and in order to modify a block, Oracle has to read that block from disk. Perhaps that is what we are seeing: is this just an INSERT in action? Let's look at the overall life of the target session and see if that produces any more information.
DW_STATS@MIDDWHP> select event, total_waits, time_waited
  2  from v$session_event
  3  where sid = 854
  4  order by 3 desc
  5  /

EVENT                           TOTAL_WAITS TIME_WAITED
------------------------------- ----------- -----------
db file sequential read             2043339      947769
SQL*Net message from client            7374        2878
log file switch completion               45         102
library cache pin                         1           3
latch: cache buffers chains             169           2
So we can see that the target is spending the majority of its time reading blocks. The SQL*Net message from client time is, again, more than likely the result of Informatica's connection handling, i.e. the time the target session sat idle between the initialisation of the mapping and the arrival of the first rows at the target. We can check this assumption by looking at another of Oracle's performance views: v$active_session_history (10g and above). This view holds data sampled from v$session_wait, providing a way to see a breakdown of the wait events during a particular time period. If our assumption about the reason for SQL*Net message from client is correct, then this event should NOT appear in this session's history for the last 10 minutes (the target has been active for 2 hours at this stage). Let's see:
DW_STATS@MIDDWHP> select event, sum(time_waited)
  2  from v$active_session_history
  3  where session_id = 854
  4  and sample_time between sysdate-1/24/6 and sysdate
  5  group by event
  6  order by 2 desc
  7  /

EVENT                           SUM(TIME_WAITED)
------------------------------- ----------------
db file sequential read                  6647312
                                               0

Elapsed: 00:00:00.20
So we can see that in the last ten minutes 100% of this session's time was spent waiting on single-block reads, so our assumption is holding up.
From the data we have collected so far, we can chart how much of each session's life was spent active versus idle. It's not too difficult to pick out the bottleneck now: it's obviously the target.
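For reference, a breakdown like this can be generated straight from v$session_event. This is only a sketch using the two SIDs from this example (the cache build SID wasn't shown above); note also that v$session_event records wait time only, not CPU time, so it understates truly active sessions.

```sql
-- Per-session percentage breakdown of recorded wait time by event.
-- SIDs 963 (source qualifier) and 854 (target) are from this example;
-- substitute the SIDs of your own Informatica connections.
select sid,
       event,
       time_waited,
       round(100 * ratio_to_report(time_waited)
                   over (partition by sid), 1) as pct_of_session
from   v$session_event
where  sid in (963, 854)
order  by sid, time_waited desc;
```

A session dominated by SQL*Net message from client is idle from Oracle's point of view; one dominated by I/O or latch events is where the database time is actually going.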
But why is the target so slow? Can we break down that time spent in Oracle even further, to see exactly what it's doing? One way is to look at which objects it is actually reading. Using the ROW_WAIT_OBJ# column of v$session we can get the object_id of the object currently being operated on:
DW_STATS@MIDDWHP> select owner, object_name, object_type
  2  from dba_objects
  3  where object_id in ( select row_wait_obj#
  4                       from v$session
  5                       where sid = 933 )
  6  /

OWNER      OBJECT_NAME            OBJECT_TYPE
---------- ---------------------- -----------
DW_PROD    PK_CUST_ACCOUNT_IDX    INDEX

Elapsed: 00:00:01.24

DW_STATS@MIDDWHP> /

OWNER      OBJECT_NAME            OBJECT_TYPE
---------- ---------------------- -----------
DW_PROD    AK_CUST_ACCOUNT_IDX    INDEX

Elapsed: 00:00:01.24

DW_STATS@MIDDWHP> /

OWNER      OBJECT_NAME            OBJECT_TYPE
---------- ---------------------- -----------
DW_PROD    AK_CUST_ACCOUNT_IDX    INDEX

Elapsed: 00:00:01.24
From the above we can see that the time isn't actually being spent on the table itself; it's being spent maintaining indexes. Just to check that assumption, we'll go back to v$active_session_history:
DW_STATS@MIDDWHP> l
  1  select current_obj#, sum(time_waited)
  2  from v$active_session_history
  3  where session_id = 844
  4  and sample_time between sysdate-1/24/6 and sysdate
  5  group by current_obj#
  6* order by 2 desc
DW_STATS@MIDDWHP> /

CURRENT_OBJ# SUM(TIME_WAITED)
------------ ----------------
     2340991          2409213
     2341005           146570
These object IDs correspond to the same indexes we identified through v$session. So we can now say categorically that the mapping is going only as fast as the target: if we make the target faster, we make the mapping faster.
The issue in this case turned out to be very simple: large concatenated indexes on the target table. When an insert is performed, Oracle needs to maintain those indexes, and if those indexes are large, the time taken to perform that maintenance can be substantial. In this case the indexes were in fact the same size as the table itself! Marking the indexes unusable for the duration of the load provided the desired result and produced a much improved response profile.
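If disabling index maintenance during the load is acceptable, the approach can be sketched as follows. This uses the AK index name from the example output above; whether it is viable depends on your environment, and it is not an option for a unique index enforcing an enabled primary key or unique constraint, since the INSERTs would then error.

```sql
-- Ensure INSERTs silently skip maintenance of unusable indexes
-- (SKIP_UNUSABLE_INDEXES defaults to TRUE in 10g and above).
alter session set skip_unusable_indexes = true;

-- Disable maintenance of the large concatenated index during the load.
alter index dw_prod.ak_cust_account_idx unusable;

-- ... run the Informatica session here ...

-- Rebuild the index once the load is complete.
alter index dw_prod.ak_cust_account_idx rebuild;
```

The trade-off is the rebuild time at the end of the load, which for a bulk load is usually far cheaper than row-by-row index maintenance throughout it.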
With the indexes out of the picture, the target is no longer the bottleneck in the system. Neither is the source. We're now in the position where the "bottleneck" is Informatica itself. However, this sort of profile could equally come from a mapping that is absolutely fine; there are no obvious bottlenecks here. To identify further bottlenecks, we would need to look inside Informatica at the performance metrics it produces. I'll cover this in a later paper.
One thing to note is the rise in the cache build's share of the active time. The actual time taken to build the cache didn't change, but because the overall time of the mapping went down dramatically, the cache build now accounted for 10% of the overall time of the mapping, which in this case was reasonable.
Conclusion
We have seen from this simple example that it's possible to gather information about Informatica's performance directly from Oracle. By understanding how Informatica handles its connections, we can start to draw conclusions from the data we extract from Oracle. This essentially allows a developer or DBA to identify where the current bottlenecks are, and where expensive development or support time should be spent in future to gain the maximum benefit.
When this data is readily available to developers, testers and DBAs, it can have a dramatic effect on productivity and the quality of the work produced, saving development and testing time and potentially negating the need to buy ever bigger boxes!