Oracle8i Parallel Server Administration, Deployment, and Performance Release 2 (8.1.6) Part Number A76970-01
This chapter describes Oracle Parallel Server and Cache Fusion-related statistics and provides procedures that explain how to use these statistics to monitor and tune performance. This chapter also briefly explains how Cache Fusion resolves reader/writer conflicts in Oracle Parallel Server. It describes Cache Fusion's benefits in general terms that apply to most types of systems and applications.
The topics in this chapter include:
When a data block requested by one instance is in the memory cache of a remote instance, Cache Fusion resolves the read/write conflict using remote memory access, not disk access. The requesting instance sends a request for a consistent-read copy of the block to the holding instance. The Block Server Process (BSP) on the holding instance transmits the consistent-read image of the requested block directly from the holding instance's buffer cache to the requesting instance's buffer cache across a high speed interconnect.
As Figure 12-1 illustrates, Cache Fusion enables the buffer cache of one node to send data blocks directly to the buffer cache of another node by way of low latency, high bandwidth interconnects. This reduces the need for expensive disk I/O in parallel cache management.
Cache Fusion also leverages new interconnect technologies for low latency, user-space based, interprocessor communication. This potentially lowers CPU usage by reducing operating system context switches for inter-node messages. Oracle manages write/write contention using conventional disk-based Parallel Cache Management (PCM).
Cache Fusion only solves part of the block conflict resolution issue by providing improved scalability for applications that experience high levels of reader/writer contention. For applications with high writer/writer concurrency, you also need to accurately partition your application's tables to reduce the potential for writer/writer conflicts.
Cache Fusion improves application transaction throughput and scalability by providing:
Applications demonstrating high reader/writer conflict rates under disk-based PCM benefit the most from Cache Fusion. Packaged applications also scale more effectively as a result of Cache Fusion. Applications in which OLTP and reporting functions execute on separate nodes may also benefit from Cache Fusion.
Reporting functions that access data from tables modified by OLTP functions receive their versions of data blocks by way of high speed interconnects. This reduces the pinging of data blocks to disk. Performance gains derive primarily from reduced X-to-S lock conversions and the corresponding reduction in disk I/O.
Furthermore, the instance that was changing the cached data block before it received a read request for the same block from another instance would not have to request exclusive access to the block again for subsequent changes. This is because the instance retains the exclusive lock and the buffer after the block is shipped to the reading instance.
Because Cache Fusion exploits high speed IPCs, Oracle Parallel Server benefits from the performance gains of the latest technologies for low latency communication across cluster interconnects. Further performance gains can be expected with even more efficient protocols, such as Virtual Interface Architecture (VIA) and user-mode IPCs.
Cache Fusion reduces CPU utilization by taking advantage of user-mode IPCs, also known as "memory-mapped IPCs", for both Unix and NT based platforms. If the appropriate hardware support is available, operating system context switches are minimized beyond the basic reductions achieved with Cache Fusion alone. This also eliminates costly data copying and system calls.
User-mode IPCs, if efficiently implemented by hardware support, can reduce CPU use because user processes can communicate without using the operating system kernel. In other words, there is no need to switch from user execution mode to kernel execution mode.
Cache Fusion reduces expensive lock operations and disk I/O for data and undo segment blocks by transmitting consistent-read blocks directly from one instance's buffer cache to another. This can reduce the latency required to resolve reader/writer conflicts by as much as 90 percent.
Cache Fusion resolves reader/writer concurrency with approximately one tenth of the processing effort required by disk-based PCM, using little or no disk I/O. To do this, Cache Fusion only incurs overhead for processing the consistent-read request and for constructing a consistent-read copy of the requested block in memory and transferring it to the requesting instance. On some platforms this can take less than one millisecond.
The primary components affecting Cache Fusion performance are the interconnect and the protocols that process inter-node communication. The interconnect bandwidth, its latency, and the efficiency of the IPC protocol determine the speed with which Cache Fusion processes consistent-read block requests.
Once your interconnect is operative, you cannot significantly influence its performance. However, you can influence a protocol's efficiency by adjusting the IPC buffer sizes.
Interconnects that support Oracle Parallel Server and Cache Fusion use one of these protocols:
Oracle Parallel Server can use any interconnect product that supports these protocols. The interconnect product must also be certified for Oracle Parallel Server hardware cluster platforms.
Cache Fusion performance levels may vary in terms of latency and throughput from application to application. Performance is further influenced by the type and mixture of transactions your system processes.
The performance gains from Cache Fusion also vary with each workload. The hardware, the interconnect protocol specifications, and the operating system resource usage also affect performance.
If your application did not demonstrate a significant amount of consistent-read contention prior to Cache Fusion, your performance with Cache Fusion will likely remain unchanged. However, if your application experienced numerous lock conversions and heavy disk I/O as a result of consistent-read conflicts, your performance with Cache Fusion should improve significantly.
A comparison of the locking and I/O statistics for Oracle 8.1 and Oracle 8.0 reveals a major reduction in exclusive-to-shared lock requests and physical write I/O. The following section, "Monitoring Cache Fusion and Inter-Instance Performance", describes how to evaluate Cache Fusion performance in more detail.
This section describes how to obtain and analyze Oracle Parallel Server and Cache Fusion statistics to monitor inter-instance performance. Topics in this section include:
The main goal of monitoring Cache Fusion and Oracle Parallel Server performance is to determine the cost of global processing and quantify the resources required to maintain coherency and synchronize the instances. Do this by analyzing the performance statistics from several views as described in the following sections. Use these monitoring procedures on an ongoing basis to observe processing trends and to maintain processing at optimal levels.
Many statistics are available to measure the work done by different components of the database kernel, such as the cache layer, the transaction layer or the I/O layer. Moreover, timed statistics allow you to accurately determine the time spent on processing certain requests or the time waited for specific events.
From these statistics sources, work rates, wait time and efficiency ratios can be derived.
See Also:
Chapter 11 for additional suggestions on which statistics to collect and how to use them to compute performance ratios.
Oracle collects Cache Fusion-related performance statistics from the buffer cache and DLM layers. Oracle also collects general Oracle Parallel Server statistics for lock requests and lock waits. You can use several views to examine these statistics.
Maintaining an adequate history of system performance helps you identify trends as these statistics change. This facilitates identifying contributors to increasing response times and reduced throughput. It would also be helpful in spotting workload changes and peak processing requirements.
Procedures in this section use statistics that are grouped according to the following topics:
As described in Chapter 11, consider maintaining statistics from the V$SYSSTAT view and the V$SYSTEM_EVENT view on a per second and per transaction basis to obtain a general profile of the workload. Relevant observations from these views are:
By maintaining these statistics, you can accurately estimate the effect of an increasing cost for a certain type of operation on transaction response times. Major increases in work rates or average delays also contribute to identifying capacity issues.
You must set the parameter TIMED_STATISTICS to TRUE for Oracle to collect statistics for most views discussed in the procedures in this section. The timed statistics from views discussed in this chapter are displayed in hundredths of a second.
The SQL script CATPARR.SQL creates parallel server data dictionary views. To run this script, you must have SYSDBA privileges.
CATALOG.SQL creates the standard V$ dynamic views, as well as:
You can rerun CATPARR.SQL if you want the EXT_TO_OBJ table to contain the latest information after you add extents. If you drop objects without rerunning CATPARR.SQL, EXT_TO_OBJ may display misleading information.
Tuning and performance information for the Oracle database is stored in a set of dynamic performance tables known as the "V$ fixed views". Each active instance has its own set of fixed views. In Oracle Parallel Server, you can query a global dynamic performance (GV$) view to retrieve the V$ view information from all qualified instances. A global fixed view is available for all of the existing dynamic performance views except for V$ROLLNAME, V$CACHE_LOCK, V$LOCK_ACTIVITY, and V$LOCKS_WITH_COLLISIONS.
The global view contains all the columns from the local view, with an additional column, INST_ID (datatype INTEGER). This column displays the instance number from which the associated V$ information was obtained. You can use the INST_ID column as a filter to retrieve V$ information from a subset of available instances. For example, the query:
SELECT * FROM GV$LOCK WHERE INST_ID = 2 OR INST_ID = 5;
retrieves information from the V$ views on instances 2 and 5.
Each global view contains a GLOBAL hint that creates a parallel query to retrieve the contents of the local view on each instance.
If you have reached the limit of PARALLEL_MAX_SERVERS on an instance and you attempt to query a GV$ view, one additional parallel server process will be spawned for this purpose. The extra process is not available for parallel operations other than GV$ queries.
If you have reached the limit of PARALLEL_MAX_SERVERS on an instance and issue multiple GV$ queries, all but the first query will fail. In most parallel queries, if a server process could not be allocated this would result in either an error or a sequential execution of the query by the query coordinator.
See Also:
Oracle collects global cache statistics at the buffer cache layer within an instance. These statistics include counts and timings of requests for global resources.
Requests for global locks on data blocks originate in the buffer cache of the requesting instance. Before a request enters the DLM, Oracle allocates data structures in the System Global Area to track the state of the request. These structures are called "lock elements".
To monitor global cache statistics, query the V$SYSSTAT view and analyze its output as described in the following procedures.
Complete the following steps to analyze global cache statistics.
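The query itself parallels the one used later for global lock statistics; a sketch that retrieves these statistics might look like the following (the name pattern is an assumption based on the statistic names shown in the output):

```sql
-- Sketch: retrieve global cache statistics from V$SYSSTAT
-- (the LIKE pattern is assumed from the statistic names below)
SELECT NAME, VALUE
  FROM V$SYSSTAT
 WHERE NAME LIKE 'global cache%';
```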
Oracle responds with output similar to:
NAME                                                            VALUE
---------------------------------------------------------- ----------
global cache cr blocks received                                  7372
global cache cr block receive time                               2293
global cache cr blocks served                                    7882
global cache cr block serve time                                   60
global cache cr block send time                                   239
global cache cr block log flushes                                 119
global cache cr block log flush time                              140
global cache cr timeouts                                            2
global cache cr requests blocked                                    0
The result, which should typically be about 15 milliseconds depending on your system configuration and volume, is the average latency of a consistent-read request round trip from requesting instance, to holding instance, and back to the requesting instance. If your CPU has limited idle time and your system typically processes long-running queries, the latency may be higher. However, it is possible to have an average latency of less than one millisecond.
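Assuming the average latency is simply the accumulated receive time divided by the number of blocks received, the calculation can be sketched as follows (timed statistics are in hundredths of a second, hence the factor of 10 to obtain milliseconds):

```sql
-- Sketch: average consistent-read request round-trip latency in ms
SELECT 10 * t.VALUE / b.VALUE AS avg_cr_latency_ms
  FROM (SELECT VALUE FROM V$SYSSTAT
         WHERE NAME = 'global cache cr block receive time') t,
       (SELECT VALUE FROM V$SYSSTAT
         WHERE NAME = 'global cache cr blocks received')    b;
```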
Consistent-read server request latency can also be influenced by a high value for the DB_FILE_MULTIBLOCK_READ_COUNT parameter. This is because a requesting process may issue more than one request for a block depending on the setting of this parameter. Correspondingly, the requesting process may wait longer.
Track the average BSP service time per request and the total round-trip time per request as presented in this step.
To determine which part of the service time correlates most with the total service time, derive the time waited for a log flush and the time spent in sending the completed request using the following two equations:
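Assuming each average is the ratio of the accumulated timed statistic to its corresponding count (statistic names taken from the sample output above), the two equations can be sketched as:

```sql
-- Sketch: average log flush wait and average send time per CR request
-- (assumed to be simple ratios of the timed statistics to their counts)
SELECT ft.VALUE / fc.VALUE AS avg_log_flush_time,
       st.VALUE / sc.VALUE AS avg_send_time
  FROM (SELECT VALUE FROM V$SYSSTAT
         WHERE NAME = 'global cache cr block log flush time') ft,
       (SELECT VALUE FROM V$SYSSTAT
         WHERE NAME = 'global cache cr block log flushes')    fc,
       (SELECT VALUE FROM V$SYSSTAT
         WHERE NAME = 'global cache cr block send time')      st,
       (SELECT VALUE FROM V$SYSSTAT
         WHERE NAME = 'global cache cr blocks served')        sc;
```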
By calculating these averages, you can account for almost all the processing steps of a consistent read block request. The remaining difference between the total round-trip time and the BSP service time per request is attributable to LMD processing and network IPC time.
High convert times may indicate excessive global concurrency. A large number of global cache gets, global cache converts, and a rapid increase in average convert or get times indicates that there is excessive contention. Another cause may be that latencies for lock operations are high due to overall system workload or system problems. A reasonable value for a cache get is 20 to 30 milliseconds while converts should take 10 to 20 milliseconds on average.
Oracle increments global cache gets when a new lock on a resource is opened. A convert is counted when there is already an open lock and Oracle converts it to another mode.
The elapsed time for a get thus includes the allocation and initialization of new locks. If the average cache get or average convert times are excessive, your system may be experiencing timeouts.
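The average get and convert times can be derived along these lines; the statistic names used here are assumptions, so verify them against V$SYSSTAT on your release:

```sql
-- Sketch: average global cache get and convert times in milliseconds
-- (timed statistics in hundredths of a second; names assumed)
SELECT 10 * gt.VALUE / g.VALUE AS avg_get_ms,
       10 * ct.VALUE / c.VALUE AS avg_convert_ms
  FROM (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global cache get time')     gt,
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global cache gets')         g,
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global cache convert time') ct,
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global cache converts')     c;
```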
If the global cache convert times or global cache get times are high, refer to statistics in the V$SYSTEM_EVENT view to identify events with a high value for TIME_WAITED.
The following describes additional Cache Fusion statistics that you may find useful in diagnosing global cache and Cache Fusion operations. Use these statistics to monitor all the major operations of a consistent block request.
When a process requests a consistent read for a data block that it cannot satisfy from its local cache, it sends a request to another instance. Once the request is complete, in other words, the buffer has been received, Oracle decrements the request count.
This statistic records the total time it took for consistent read requests to complete, in other words, the accumulated round-trip time for all requests for consistent read blocks.
This statistic identifies a request for a consistent read block that has a long delay and that has timed out. This could be due to system performance problems, a slow interconnect network or dropped network packets. The value of this statistic should always be 0.
This is the number of requests for a consistent read block served by BSP. Oracle increments this statistic when the block is sent.
This statistic represents the accumulated time it took BSP to fill all incoming requests. For each request, the start time is recorded immediately after BSP takes a request off the request queue. The interval is computed after the block is sent.
This is the time required by BSP to initiate a send of a consistent read block. For each request, timing starts when the block is sent and stops when the send has completed. It is a part of the serve time. Note that this statistic only measures the time it takes to initiate the send; it does not measure the time elapsed before the block arrives at the requestor.
For changes to buffers containing a version of a data block that the block server process has produced, a log flush must be initiated. BSP handles the wait asynchronously by managing a completion queue. Once LGWR has completed flushing the changes to a buffer that is on the log flush queue, BSP can send it; therefore, BSP periodically checks the queue. Oracle increments this statistic when a log flush is queued.
This is the time waited for a log flush. It is part of the serve time.
Global lock statistics provide counts and timings for both PCM and non-PCM lock activity. Oracle collects global lock statistics from the DLM API layer. All Oracle clients to the DLM, of which the buffer cache is only one, make their requests to the DLM through this layer. Thus, global lock statistics include lock requests originating from all layers of the kernel, while global cache statistics relate to buffer cache Oracle Parallel Server activity.
Use the procedures in this section to monitor data from the V$SYSSTAT view to derive averages, latencies, and counts. This establishes a rough indicator of the Oracle Parallel Server workload generated by an instance.
Use the following procedures to view and analyze statistics from the V$SYSSTAT view for global lock processing.
SELECT * FROM V$SYSSTAT WHERE NAME LIKE 'global lock%';
Oracle responds with output similar to:
NAME                                                            VALUE
---------------------------------------------------------- ----------
global lock sync gets                                             703
global lock async gets                                          12748
global lock get time                                             1071
global lock sync converts                                         303
global lock async converts                                         41
global lock convert time                                           93
global lock releases                                              573
Use your V$SYSSTAT output to perform the calculations and analyses described in the remaining procedures in this group of procedures.
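Assuming the averages combine synchronous and asynchronous operations, the calculation can be sketched as follows (timed statistics are in hundredths of a second, hence the factor of 10):

```sql
-- Sketch: average global lock get and convert times in milliseconds,
-- combining synchronous and asynchronous operations
SELECT 10 * gt.VALUE / (sg.VALUE + ag.VALUE) AS avg_lock_get_ms,
       10 * ct.VALUE / (sc.VALUE + ac.VALUE) AS avg_lock_convert_ms
  FROM (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global lock get time')       gt,
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global lock sync gets')      sg,
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global lock async gets')     ag,
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global lock convert time')   ct,
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global lock sync converts')  sc,
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global lock async converts') ac;
```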
If the result is more than 20 or 30 milliseconds, query the TIME_WAITED column in the V$SYSTEM_EVENT view in descending order to identify which lock events are waited for most frequently, using this query:
SELECT EVENT, TIME_WAITED, AVERAGE_WAIT FROM V$SYSTEM_EVENT ORDER BY TIME_WAITED DESC;
Oracle increments global lock gets when a new lock on a resource is opened. A convert is counted when there is already an open lock and Oracle converts it to another mode.
The elapsed time for a get thus includes the allocation and initialization of new locks. If the average lock get or average lock convert times are excessive, your system may be experiencing timeouts.
If the global lock convert times or global lock get times are high, refer to statistics in the V$SYSTEM_EVENT view to identify events with a high value for TIME_WAITED.
If the result is more than 20 milliseconds, query the TIME_WAITED column in the V$SYSTEM_EVENT view in descending order to identify the event causing the delay.
Synchronous lock gets includes, for example, global lock sync gets. These are usually performed for lock requests for resources other than cached data blocks. To determine the proportion of the time required for synchronous lock gets, divide global lock get time or global lock convert time by the corresponding number of synchronous operations.
Asynchronous lock operations include, for example, global lock async gets. These are typically lock operations for global cache locks. You can derive the proportion of the total time using the same calculation as for synchronous operations. In this way, the proportion of work and the cost of global cache lock requests and other lock requests can be determined.
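The proportions of synchronous and asynchronous lock gets can be derived along these lines (a sketch; the same pattern applies to converts):

```sql
-- Sketch: fraction of lock gets performed synchronously vs. asynchronously
SELECT s.VALUE / (s.VALUE + a.VALUE) AS sync_get_fraction,
       a.VALUE / (s.VALUE + a.VALUE) AS async_get_fraction
  FROM (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global lock sync gets')  s,
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'global lock async gets') a;
```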
Normally, if the proportion of global lock requests for resources other than global cache lock requests dominates the cost for all lock operations, the V$SYSTEM_EVENT view shows high wait times for row cache locks, enqueues, or library cache pins.
Oracle collects DLM resource, lock, and message statistics at the DLM level. Use these statistics to monitor DLM latency and workloads. These statistics appear in the V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE views.
These views record average convert times, count information, and timed statistics for each type of lock request. The V$DLM_CONVERT_LOCAL view shows statistics for local lock operations. The V$DLM_CONVERT_REMOTE view shows values for remote conversions. The average convert times in these views are in hundredths of a second.
The DLM workload is an important aspect of Oracle Parallel Server and Cache Fusion performance because each consistent-read request results in a lock request. High DLM workloads as a result of heavy request rates can adversely affect performance.
The DLM performs local lock operations entirely within the local node, or in other words, without sending messages. Remote lock operations require sending messages to and waiting for responses from other nodes. Most down-converts, however, are local operations for the DLM.
The following procedures for analyzing DLM resource, locks, and message statistics appear in two groups. The first group of procedures explains how to monitor DLM resources and locks. The second group explains how to monitor message statistics.
Use the following procedures to obtain and analyze statistics from the V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE views for DLM resource processing.
You must enable event 29700 to populate the V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE views. Do this by entering this syntax:
EVENT="29700 TRACE NAME CONTEXT FOREVER"
SELECT CONVERT_TYPE, AVERAGE_CONVERT_TIME, CONVERT_COUNT FROM V$DLM_CONVERT_LOCAL;
Oracle responds with output similar to:
CONVERT_TYPE                           AVERAGE_CONVERT_TIME CONVERT_COUNT
-------------------------------------- -------------------- -------------
NULL -> SS                                                0             0
NULL -> SX                                                0             0
NULL -> S                                                 1           146
NULL -> SSX                                               0             0
NULL -> X                                                 1            92
SS -> SX                                                  0             0
SS -> S                                                   0             0
SS -> SSX                                                 0             0
SS -> X                                                   0             0
SX -> S                                                   0             0
SX -> SSX                                                 0             0
SX -> X                                                   0             0
S -> SX                                                   0             0
S -> SSX                                                  0             0
S -> X                                                    3            46
SSX -> X                                                  0             0

16 rows selected.
SELECT * FROM V$DLM_CONVERT_REMOTE;
Oracle responds with output identical in format to the output for the V$DLM_CONVERT_LOCAL view.
Use your output from the V$DLM_CONVERT_LOCAL and V$DLM_CONVERT_REMOTE views to perform the calculation described in the following procedure.
SELECT r.CONVERT_TYPE, r.AVERAGE_CONVERT_TIME, l.AVERAGE_CONVERT_TIME, r.CONVERT_COUNT, l.CONVERT_COUNT FROM V$DLM_CONVERT_LOCAL l, V$DLM_CONVERT_REMOTE r WHERE l.CONVERT_TYPE = r.CONVERT_TYPE;
For a quick estimate of the CPU time spent by LMD, you can transform the wait time event for LMD presented in the V$SYSTEM_EVENT view. To do this, look for the event name "lkmgr wait for remote messages" that represents the time that the LMD process is idle. The TIME_WAITED column contains the accumulated idle time for LMD in units of hundredths of a second.
To derive the busy time, divide the value for TIME_WAITED, normalized to seconds, by the length of the measurement interval in seconds. For example, a value of 17222 centiseconds is 172.22 seconds. The result is the fraction of time the LMD process was idle; subtract it from 1 to obtain the busy fraction. This is a fairly accurate estimate when compared with operating system utilities that provide information about CPU utilization per process.
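The estimate can be sketched as follows, assuming a 3600-second measurement interval (substitute your own interval length; because TIME_WAITED is cumulative since instance startup, a true interval measurement requires the difference between two snapshots):

```sql
-- Sketch: estimated LMD busy fraction over an assumed 3600-second interval
-- (TIME_WAITED is in hundredths of a second)
SELECT 1 - (TIME_WAITED / 100) / 3600 AS lmd_busy_fraction
  FROM V$SYSTEM_EVENT
 WHERE EVENT = 'lkmgr wait for remote messages';
```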
The DLM sends messages either directly or by using flow control. For both methods, the DLM attaches markers known as "tickets" to each message. The allotment of tickets for each DLM is limited. However, the DLM can re-use tickets indefinitely.
DLMs send messages directly until no more tickets are available. When a DLM runs out of tickets, messages must wait in a flow control queue until outstanding messages have been acknowledged and more tickets are available. Flow-controlled messaging is managed by the LMD process.
The rationing of tickets prevents one node from sending an excessive amount of messages to another node during periods of heavy inter-instance communication. This also prevents one node with heavy remote consistent-read block requirements from assuming control of messaging resources throughout a cluster at the expense of other, less-busy nodes.
The V$DLM_MISC view contains the following statistics about message activity:
Use the following procedures to obtain and analyze message statistics in the V$DLM_MISC view.
SELECT NAME, VALUE FROM V$DLM_MISC;
Oracle responds with output similar to:
STATISTIC# NAME                                      VALUE
---------- ----------------------------------- -----------
         0 dlm messages sent directly                29520
         1 dlm messages flow controlled               1851
         2 dlm messages received                     29668
         3 dlm total incoming msg queue length         297

4 rows selected.
Use your output from the V$DLM_MISC view to perform the following procedure.
Oracle increments the value for "total incoming message queue length" whenever a new request enters the LMD process' message queue. When messages leave the LMD queue to begin processing, Oracle increments the value for "messages received".
The size of the queue may increase if a large number of requests arrives at the LMD simultaneously. This can occur when the volume of locking activity is high or when the LMD processes a large quantity of consistent-read requests. Typically, the average receive queue length is less than 10.
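The average receive queue length can be derived as the ratio of the two statistics, a sketch (names as in the V$DLM_MISC output above):

```sql
-- Sketch: average LMD receive queue length
SELECT q.VALUE / r.VALUE AS avg_receive_queue_length
  FROM (SELECT VALUE FROM V$DLM_MISC
         WHERE NAME = 'dlm total incoming msg queue length') q,
       (SELECT VALUE FROM V$DLM_MISC
         WHERE NAME = 'dlm messages received')               r;
```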
In addition to the global cache and global lock statistics that were previously discussed, you can also use statistics in the V$SYSSTAT view to measure the I/O workload related to global cache synchronization. There are three important statistics in the V$SYSSTAT view for this purpose:
DBWR cross-instance writes, also known as forced writes, occur when Oracle resolves inter-instance data block contention by writing the requested block to disk before the requesting node can use it.
Cache Fusion minimizes the disk I/O for consistent-reads. This can lead to a substantial reduction in physical writes and reads performed by each instance. Before Cache Fusion, a consistent-read requesting data from a remote instance could result in up to three write I/Os on the remote instance and three corresponding read I/Os for the requesting instance: one for the data block, one for the rollback segment header, and one for a rollback segment block.
You can obtain the following statistics to quantify the write I/Os required for global cache synchronization.
SELECT NAME, VALUE FROM V$SYSSTAT WHERE NAME IN ('DBWR cross-instance writes', 'remote instance undo block writes', 'remote instance undo header writes', 'physical writes');
Oracle responds with output similar to:
NAME                                                           VALUE
--------------------------------------------------------- ----------
physical writes                                                41802
DBWR cross-instance writes                                      5403
remote instance undo block writes                                  0
remote instance undo header writes                                 2

4 rows selected.
Whereas the statistic "physical writes" covers all physical writes performed by DBWR from a particular instance, "DBWR cross-instance writes" accounts only for writes caused by flushing a dirty buffer containing a data block that another instance has requested for modification. Because cross-instance writes are also handled by DBWR, "DBWR cross-instance writes" is a subset of all "physical writes".
The other notable statistics, "remote instance undo block writes" and "remote instance undo header writes", record the number of times Oracle writes a rollback segment block to disk because another instance intends to build a consistent read version of a data block but the information required to roll back the block is not in that instance's cache. Both are a subset of "DBWR cross-instance writes". Their significance for performance is less critical in Oracle8i because Cache Fusion reduces the need to "export" and "import" rollback information. In most cases, instances send the complete version of a data block by way of the interconnect to the requesting instance.
Note that every lock conversion from Exclusive (X) to Null (N) or from Exclusive (X) to Shared (S) is associated with a write to disk when the buffer under the lock is dirty. However, in Oracle8i, the number of X to S lock conversions is reduced because Cache Fusion does not require them. In most cases, the holding instance retains the X lock.
You should see a noticeable decrease in this ratio between this calculation and pre-Cache Fusion statistics.
The ratio shows how much disk I/O is related to writes to rollback segments. With Cache Fusion, this ratio should be very low.
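Assuming the two ratios discussed above are the cross-instance share of all physical writes and the rollback-segment share of physical writes (an assumed definition; adjust to match your baseline measurements), a sketch:

```sql
-- Sketch: synchronization-related write ratios (definitions assumed)
SELECT x.VALUE / w.VALUE               AS cross_instance_write_ratio,
       (ub.VALUE + uh.VALUE) / w.VALUE AS undo_write_ratio
  FROM (SELECT VALUE FROM V$SYSSTAT
         WHERE NAME = 'DBWR cross-instance writes')         x,
       (SELECT VALUE FROM V$SYSSTAT
         WHERE NAME = 'physical writes')                    w,
       (SELECT VALUE FROM V$SYSSTAT
         WHERE NAME = 'remote instance undo block writes')  ub,
       (SELECT VALUE FROM V$SYSSTAT
         WHERE NAME = 'remote instance undo header writes') uh;
```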
The following formula computes the percentage of reads that are only for local work:
Where "lock buffers for read" represents the N to S lock conversions.
These so-called "forced reads" occur when a cached data block that was previously modified by the local instance had to be written to disk due to a "ping" from another instance and the block is then re-acquired by the local instance for a read.
See Also:
Chapter 11 for more observations regarding estimations of local and global work rates and percentages in Oracle Parallel Server clusters.
This section describes how to analyze output from three views to quantify lock conversions by type. The tasks and the views discussed in this section are:
The V$LOCK_ACTIVITY view summarizes how many lock up- and down-converts have occurred during an instance's lifetime. X-to-N down-converts denote the number of times a lock was down-converted because another instance wanted to modify a resource.
The other major type of down-convert is X-to-S. This type of down-convert occurs when an instance reads a resource that was last modified by a local instance. Both types of lock conversions involve I/O. However, Cache Fusion should reduce X-to-S down-converts because they are not needed for buffer locks.
The V$CLASS_PING view summarizes lock conversion activity by showing whether disk I/O is occurring on the following classes of blocks:
All X_2_NULL_FORCED_WRITE and X_2_S_FORCED_WRITE conversions involve write I/O. In other words, values in the columns for each block class provide an indicator of the cause of the disk I/O.
The V$PING view helps identify "hot" blocks and "hot" objects. The sum of each column, FORCED_READS and FORCED_WRITES, indicates the actual pinging activity on a particular block or object.
All three views provide different levels of detail. If you suspect that pinging or Oracle Parallel Server itself is the cause of a performance problem, monitor the V$LOCK_ACTIVITY view to generate an overall Oracle Parallel Server workload profile. Use information from the V$LOCK_ACTIVITY view to record the rate at which lock conversions occur.
For more details, use the V$CLASS_PING view to identify the type of block on which lock conversions and pinging are occurring. Once you have identified the class, use the V$PING view to obtain details about a particular table or index and the file and block numbers on which there is significant lock conversion activity.
If your response time or throughput requirements are no longer being met, you would normally examine the V$LOCK_ACTIVITY, V$CLASS_PING, V$CACHE, V$PING or V$FILE_PING views. In addition, you might also examine:
In summary, a change in the application profile and the work rates typically warrant a detailed analysis using the above-mentioned views. Apart from diagnosing performance problems of existing applications, these views are also useful when developing applications or when deciding on a partitioning strategy.
Latches are low-level locking mechanisms that protect System Global Area data structures. Excessive contention for latches degrades performance.
Use the V$DLM_LATCH and V$LATCH_MISSES views to monitor latch contention within the DLM. These views show information about a particular latch, its statistics, and the location in the code from where the latch is acquired.
For normal operations, the value of latch statistics is limited. In some cases, using multiple latches can improve performance for certain layers by a small amount. High latch contention is often the result of either:
The following procedures suggest what information is available. Oracle does not recommend that you monitor these statistics on a regular basis or draw conclusions based solely on latching issues. However, gathering this information might be useful to Oracle Support or Oracle's development staff. Latch tuning can also be part of advanced tuning activities, but in the majority of cases latches are not your actual performance problem.
On the other hand, record information from these procedures if the TIME_WAITED value for the "latch free" wait event is very high and ranks among the events that accrue the largest times as indicated by the V$SYSTEM_EVENT view.
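To check how the "latch free" event ranks, a query of the following form can be used; it reads the same V$SYSTEM_EVENT columns discussed later in this chapter.

```sql
-- Cumulative waits and time waited for the "latch free" event;
-- compare TIME_WAITED against the largest values in V$SYSTEM_EVENT
SELECT EVENT, TOTAL_WAITS, TIME_WAITED
  FROM V$SYSTEM_EVENT
 WHERE EVENT = 'latch free';
```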
Use the following procedures to analyze latch, Oracle Parallel Server, and DLM-related statistics.
SELECT * FROM V$LATCH;
Oracle responds with output similar to the following where the columns from left to right show the statistic name, gets, misses, and sleeps:
SELECT PARENT_NAME, "WHERE", SLEEP_COUNT FROM V$LATCH_MISSES ORDER BY SLEEP_COUNT DESC;
Oracle responds with output similar to:
Use your V$LATCH and V$LATCH_MISSES output to perform the following procedures.
High numbers of misses usually indicate contention for the same resources and locks. An acceptable ratio of gets that do not miss is 90 to 95%.
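As a minimal sketch, the ratio described above can be computed directly from V$LATCH; the HIT_RATIO expression here (gets that did not miss, as a percentage of all gets) is one reasonable interpretation of that ratio.

```sql
-- Latches ranked by misses; a HIT_RATIO below roughly 90% merits attention
SELECT NAME, GETS, MISSES, SLEEPS,
       ROUND((GETS - MISSES) / GETS * 100, 2) HIT_RATIO
  FROM V$LATCH
 WHERE GETS > 0
 ORDER BY MISSES DESC;
```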
The following section describes how to use the V$SYSTEM_EVENTS view in more detail.
Data about Cache Fusion and Oracle Parallel Server events appears in the V$SYSTEM_EVENT view. To identify the events for which processes have waited the longest, query the V$SYSTEM_EVENT view ordered by the TIME_WAITED column in descending order. The TIME_WAITED column shows the total wait time for each system event listed.
By generating an ordered list of event waits, you can easily locate performance bottlenecks. Each wait count represents a voluntary context switch. The TIME_WAITED value is the cumulative time that processes waited for particular system events. The values in the TOTAL_TIMEOUTS and AVERAGE_WAIT columns provide additional information about system efficiency.
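The ordered list described above can be generated with a query such as:

```sql
-- Wait events ranked by cumulative time waited, longest first
SELECT EVENT, TOTAL_WAITS, TOTAL_TIMEOUTS, TIME_WAITED, AVERAGE_WAIT
  FROM V$SYSTEM_EVENT
 ORDER BY TIME_WAITED DESC;
```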
Oracle recommends dividing the sum of values from the TOTAL_WAITS and TIME_WAITED columns by the number of transactions, as outlined in Chapter 11. Transactions can be defined as business transactions, for example, insurance quotes, order entry, and so on, or you can define them on the basis of "user commits" or "executions", depending on your perspective.
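As a sketch of this normalization, the following query defines a transaction as the "user commits" statistic from V$SYSSTAT, which is one of the possible definitions mentioned above.

```sql
-- Waits and time waited per transaction, where a transaction
-- is taken to be one "user commit"
SELECT e.EVENT,
       e.TOTAL_WAITS / s.VALUE WAITS_PER_TX,
       e.TIME_WAITED / s.VALUE TIME_WAITED_PER_TX
  FROM V$SYSTEM_EVENT e,
       V$SYSSTAT s
 WHERE s.NAME  = 'user commits'
   AND s.VALUE > 0
 ORDER BY e.TIME_WAITED DESC;
```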
The goal is to estimate which event type contributes most to transaction response times, since in general:

   response time = service time + wait time

By this rationale, the total wait time can be divided into subcomponents of the wait time, such as:

   wait time = tm(event 1) + tm(event 2) + ... + tm(event n)

where tm is "time waited".
It is also useful to derive the total wait time by adding the individual events and then observing the percentages that are spent waiting for each event to derive the major cost factors for transaction response times. Reducing the time for the largest proportion of the waits will have the most significant effect on response time.
The following events appearing in the V$SYSTEM_EVENT output represent waits for Oracle Parallel Server events:
In addition to those listed under the previous heading, you can monitor other events whose performance problems may be related to Oracle Parallel Server. These events are:
If the time waited for global cache events is high relative to other waits, look for increased latencies, contention, or excessive system workloads using V$SYSSTAT statistics and operating system performance monitors. A high number of global cache busy or buffer busy waits indicates increased contention in the buffer cache.
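The V$SYSSTAT statistics mentioned above can be inspected with a query such as the following; the LIKE pattern is an assumption that your release exposes the global cache statistics under names beginning with "global cache".

```sql
-- Global cache statistics relevant to Cache Fusion latencies and workload
SELECT NAME, VALUE
  FROM V$SYSSTAT
 WHERE NAME LIKE 'global cache%'
 ORDER BY NAME;
```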
In OLTP systems with data block address locking and a high degree of contention, it is not unusual for the global cache wait events to represent a high proportion of the sum of the total time waited.
If a large share of the wait time is consumed by waits for non-buffer cache resources, as indicated by the "row cache lock", "enqueue", and "library cache pin" statistics, monitor the V$ROWCACHE and V$LIBRARYCACHE views for Oracle Parallel Server-related issues. Specifically, observe the values in the DLM columns of each of these views.
Common Oracle Parallel Server problems arise from poorly managed space parameters or sequences that are not cached. In such cases, processes wait for row cache locks and enqueues and the V$ROWCACHE view will show a high number of conflicts for certain dictionary caches.
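For example, the dictionary caches with the most DLM conflicts can be listed with a query such as:

```sql
-- Dictionary caches ranked by DLM lock conflicts (V$ROWCACHE DLM columns);
-- high conflict counts often point to uncached sequences or
-- poorly managed space parameters
SELECT PARAMETER, DLM_REQUESTS, DLM_CONFLICTS, DLM_RELEASES
  FROM V$ROWCACHE
 ORDER BY DLM_CONFLICTS DESC;
```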
Copyright © 1996-2000, Oracle Corporation. All Rights Reserved.