24
Tuning Instance Recovery Performance

This chapter offers guidelines for tuning instance recovery.

This chapter contains the following sections:

Understanding Instance Recovery

Instance and crash recovery are the automatic application of redo log records to Oracle data blocks after a crash or system failure. If a single instance database crashes, or if all instances of an Oracle Parallel Server configuration crash, then Oracle performs instance recovery at the next startup. If one or more instances of an Oracle Parallel Server configuration crash, then a surviving instance performs recovery.

Instance and crash recovery occur in two phases. In phase one, Oracle applies all committed and uncommitted changes in the redo log files to the affected datablocks. In phase two, Oracle applies information in the rollback segments to undo changes made by uncommitted transactions to the data blocks.

How Oracle Applies Redo Log Information

During normal operations, Oracle's DBWn processes periodically write dirty buffers, or buffers that have in-memory changes, to disk. Periodically, Oracle records the highest system change number (SCN) of all changes to blocks, such that all data blocks with changes below that SCN have been written to disk by DBWn. This SCN is the checkpoint.

Records that Oracle appends to the redo log file after the change record that the checkpoint refers to are changes that Oracle has not yet written to disk. If a failure occurs, then only redo log records containing changes at SCNs higher than the checkpoint need to be replayed during recovery.

The duration of recovery processing is directly influenced by the number of data blocks that have changes at SCNs higher than the SCN of the checkpoint. For example, Oracle recovers a redo log with 100 entries affecting one data block faster than it recovers a redo log with 10 entries for 10 different data blocks. This is because for each log record processed during recovery, the corresponding data block (if it is not already in memory) must be read from disk by Oracle, so that the change represented by the redo log entry can be applied to that block.

Trade-offs of Minimizing Recovery Duration

The principal means of balancing the duration of instance recovery and daily performance is by influencing how aggressively Oracle advances the checkpoint. If you force Oracle to keep the checkpoint only a few blocks behind the most recent redo log record, then you minimize the number of blocks Oracle processes during recovery.

The trade-off for having minimal recovery time, however, is increased performance overhead for normal database operations. If daily operational efficiency is more important than minimizing recovery time, then decreasing the frequency of writes to the datafiles increases instance recovery time.

Tuning the Duration of Instance and Crash Recovery

There are several methods for tuning instance and crash recovery to keep the duration of recovery within user-specified bounds. For example:

Use initialization parameters to influence the number of redo log records and data blocks involved in recovery.
Size the redo log file to influence checkpointing frequency.
Use SQL statements to initiate checkpoints.
Parallelize instance recovery operations to further shorten the recovery duration.

The Oracle8i Enterprise Edition also offers fast-start fault recovery functionality to control instance recovery. This reduces the roll forward time by making it bounded and predictable, and it also eliminates the time required perform rollback. The foundation of fast-start fault recovery is fast-start checkpointing architecture. Instead of the conventional periodic checkpointing, as performed in earlier versions of Oracle, fast-start checkpointing occurs continuously, advancing the checkpoint time as blocks are written. Fast-start checkpointing always writes the oldest modified block first, ensuring that every write allows the checkpoint time to be advanced. Administrators specify a target (bounded) time to complete the roll forward phase of recovery, and Oracle automatically varies the checkpoint writes to meet that target.

Using Initialization Parameters to Influence Recovery Time

During recovery, Oracle performs two main tasks:

Reads redo logs to determine what has been changed.
Reads data blocks to determine whether to apply changes.

Fast-start checkpointing eliminates bulk writes and the resultant I/O spikes that occur with conventional checkpointing, yielding smooth and fast ongoing performance. Continuous advancement reduces roll forward by half, compared to conventional checkpoints at the same transaction rate. Administrators can specify a bound on the time to do roll forward, rather than specifying the frequency of checkpoints.

You can use three initialization parameters to influence how aggressively Oracle advances the checkpoint, as shown in Table 24-1:

Table 24-1 Initialization Parameters Influencing Checkpoints

Parameter	Purpose
LOG_CHECKPOINT_TIMEOUT	Limits the number of seconds between the most recent redo record and the checkpoint.
LOG_CHECKPOINT_INTERVAL	Limits the number of redo records between the most recent redo record and the checkpoint.
FAST_START_IO_TARGET	Limits instance recovery time by controlling the number of data blocks Oracle processes during instance recovery.

Note:
The FAST_START_IO_TARGET parameter is only available with the Oracle8i Enterprise Edition.

Using LOG_CHECKPOINT_TIMEOUT

Set the initialization parameter LOG_CHECKPOINT_TIMEOUT to a value n (where n is an integer) to require that the latest checkpoint position follow the most recent redo block by no more than n seconds. In other words, at most, n seconds worth of logging activity can occur between the most recent checkpoint position and the end of the redo log. This forces the checkpoint position to keep pace with the most recent redo block

You can also interpret LOG_CHECKPOINT_TIMEOUT as specifying an upper bound on the time a buffer can be dirty in the cache before DBWn must write it to disk. For example, if you set LOG_CHECKPOINT_TIMEOUT to 60, then no buffers remain dirty in the cache for more than 60 seconds. The default value for LOG_CHECKPOINT_TIMEOUT is 1800.

Note:
The minimum value for LOG_CHECKPOINT_TIMEOUT in the Standard Edition is 900, or 15 minutes. If you set the value below 900 in the Standard Edition, then Oracle rounds it to 900.

Using LOG_CHECKPOINT_INTERVAL

Set the initialization parameter LOG_CHECKPOINT_INTERVAL to a value n (where n is an integer) to require that the checkpoint position never follow the most recent redo block by more than n blocks. In other words, at most n redo blocks can exist between the checkpoint position and the last block written to the redo log. In effect, you are limiting the amount of redo blocks that can exist between the checkpoint and the end of the log.

Oracle limits the maximum value of LOG_CHECKPOINT_INTERVAL to 90% of the smallest log to ensure that the checkpoint advances into the current log before that log fills and a log switch is attempted.

LOG_CHECKPOINT_INTERVAL is specified in redo blocks. Redo blocks are the same size as operating system blocks. Use the LOG_FILE_SIZE_REDO_BLKS column in V$INSTANCE_RECOVERY to see the number of redo blocks corresponding to 90% of the size of the smallest log file.

Using FAST_START_IO_TARGET

Note:
The initialization parameter FAST_START_IO_TARGET and fast-start checkpointing are only available with the Oracle8i Enterprise Edition.
Oracle recommends using fast-start checkpointing to control the duration of the roll forward phase of recovery. This behavior is controlled by the FAST_START_IO_TARGET parameter. The parameter, DB_BLOCK_MAX_DIRTY_TARGET, is an Oracle8 parameter used to provide more limited control over roll forward duration, and it is included in Oracle8i only for backward compatibility.

Set this parameter to n, where n is an integer limiting to n the number of buffers that Oracle processes during crash or instance recovery. Because the number of I/Os to be processed during recovery correlates closely to the duration of recovery, the FAST_START_IO_TARGET parameter gives you the most precise control over the duration of recovery.

FAST_START_IO_TARGET advances the checkpoint, because DBWn uses the value of FAST_START_IO_TARGET to determine how much writing to do. Assuming that users are making many updates to the database, a low value for this parameter forces DBWn to write changed buffers to disk. As the changed buffers are written to disk, the checkpoint advances.

The smaller the value of FAST_START_IO_TARGET, the better the recovery performance, because fewer blocks require recovery. If you use smaller values for this parameter, however, then you impose higher overhead during normal processing, because DBWn must write more buffers to disk more frequently.

See Also:
For more information, see "Estimating Recovery Time" and "Calculating Performance Overhead". For more information on tuning checkpoints, see Chapter 20, "Tuning I/O". For more information about initialization parameters, see the Oracle8i Reference.

Using Redo Log Size to Influence Checkpointing Frequency

The size of a redo log file directly influences checkpoint performance. The smaller the size of the smallest log, the more aggressively Oracle writes dirty buffers to disk to ensure the position of the checkpoint has advanced to the current log before that log completely fills. Forcing the checkpoint to advance into the current log before it fills ensures that Oracle will not need to wait for the checkpoint to advance out of a redo log file before it can be reused. Oracle enforces this behavior by ensuring the number of redo blocks between the checkpoint and the most recent redo record is less than 90% of the size of the smallest log.

If your redo logs are small compared to the number of changes made against the database, then Oracle must switch logs frequently. If the value of LOG_CHECKPOINT_INTERVAL is less than 90% of the size of the smallest log, then the size of the smallest log file does not influence checkpointing behavior.

Although you specify the number and sizes of online redo log files at database creation, you can alter the characteristics of your redo log files after startup. Use the ADD LOGFILE clause of the ALTER DATABASE statement to add a redo log file and specify its size, or the DROP LOGFILE clause to drop a redo log.

The size of the redo log appears in the LOG_FILE_SIZE_REDO_BLKS column of the V$INSTANCE_RECOVERY dynamic performance. This value shows how the size of the smallest online redo log is affecting checkpointing. By increasing or decreasing the size of your online redo logs, you indirectly influence the frequency of checkpoint writes.

Using SQL Statements to Initiate Checkpoints

Besides setting initialization parameters and sizing your redo log files, you can also influence checkpoints with SQL statements. ALTER SYSTEM CHECKPOINT directs Oracle to record a checkpoint for the node, and ALTER SYSTEM CHECKPOINT GLOBAL directs Oracle to record a checkpoint for every node in a cluster.

SQL-induced checkpoints are heavyweight. This means that Oracle records the checkpoint in a control file shared by all the redo threads. Oracle also updates the datafile headers. SQL-induced checkpoints move the checkpoint position to the point that corresponded to the end of the log when the statement was initiated. These checkpoints can adversely affect performance, because the additional writes to the datafiles increase system overhead.

Monitoring Instance Recovery

Use the V$INSTANCE_RECOVERY view to see your current recovery parameter settings. You can also use statistics from this view to calculate which parameter has the greatest influence on checkpointing. V$INSTANCE_RECOVERY contains the columns shown in Table 24-2.

Table 24-2 V$INSTANCE_RECOVERY View

Column	Description
RECOVERY_ESTIMATED_IOS	The estimated number of blocks that would be processed during recovery. This estimate is based upon `FAST_START_IO_TARGET`, and it is not valid unless `FAST_START_IO_TARGET` is driving checkpointing behavior.
ACTUAL_REDO_BLKS	Current number of redo blocks required for recovery.
TARGET_REDO_BLKS	Goal for the maximum number of redo blocks to be processed during recovery. This value is the minimum of the next 4 columns.
LOG_FILE_SIZE_REDO_BLKS	Number of redo blocks to be processed during recovery to guarantee that a log switch never has to wait for a checkpoint. This is 90% of the smallest log file.
LOG_CHKPT_TIMEOUT_REDO_BLKS	Number of redo blocks that must be processed during recovery to satisfy `LOG_CHECKPOINT_TIMEOUT`.
LOG_CHKPT_INTERVAL_REDO_BLKS	Number of redo blocks that must be processed during recovery to satisfy `LOG_CHECKPOINT_INTERVAL`.
FAST_START_IO_TARGET_REDO_BLKS	Number of redo blocks that must be processed during recovery to satisfy `FAST_START_IO_TARGET`.

The value appearing in the TARGET_REDO_BLKS column equals a value appearing in another column in the view. This other column corresponds to the parameter or log file that is determining the maximum number of redo blocks that Oracle processes during recovery. The setting for the parameter in this column is imposing the heaviest requirement on redo block processing.

See Also:
For more information on the V$INSTANCE_RECOVERY view, see the Oracle8i Reference.

Determining the Strongest Checkpoint Influence

For example, assume your initialization parameter settings are as follows:

FAST_START_IO_TARGET = 1000
LOG_CHECKPOINT_TIMEOUT = 1800 # default
LOG_CHECKPOINT_INTERVAL = 0# default: disabled interval checkpointing

You execute the following query:

SELECT * FROM V$INSTANCE_RECOVERY;

Oracle responds with the following:

RECOVERY_ 
ESTIMATED_ 
IOS

ACTUAL_REDO_
BLKS

TARGET_REDO_
BLKS

LOG_FILE_ 
SIZE_REDO_ 
BLKS

LOG_CHKPT_ 
TIMEOUT_
REDO_BLKS

LOG_CHKPT_ 
INTERVAL_ 
REDO_BLKS

FAST_START_IO_
TARGET_REDO_ 
BLKS

4294967295

1 row selected.

As you can see by the values in the last three columns, the FAST_START_IO_TARGET parameter places heavier recovery demands on Oracle than the other two parameters. It requires that Oracle process no more than 4215 redo blocks during recovery. The LOG_FILE_SIZE_REDO_BLKS column indicates that Oracle can process up to 55,296 blocks during recovery, so the log file size is not the heaviest influence on checkpointing.

Note:
The value for LOG_CHKPT_INTERVAL_REDO_BLKS, 4294967295, corresponds to the maximum possible value indicating that this column does not have the greatest influence over checkpointing.

The TARGET_REDO_BLKS column shows the smallest value of the last five columns. This shows the parameter or condition that exerts the heaviest requirement for Oracle checkpointing. In this example, the FAST_START_IO_TARGET parameter is the strongest influence with a value of 4215.

Assume you make several updates to the database and query V$INSTANCE_RECOVERY three hours later. Oracle responds with the following:

RECOVERY_
ESTIMATED_
IOS

ACTUAL_
REDO_BLKS

TARGET_
REDO_BLKS

LOG_FILE_
SIZE_REDO_
BLKS

LOG_CHKPT_
TIMEOUT_
REDO_BLKS

LOG_CHKPT_
INTERVAL_
REDO_BLKS

FAST_START_
IO_TARGET_ 
REDO_BLKS

4294967295

1 row selected.

FAST_START_IO_TARGET is still exerting the strongest influence over checkpointing behavior, although the number of redo blocks corresponding to this target has changed dramatically. This change is not due to a change in FAST_START_IO_TARGET or the corresponding RECOVERY_ESTIMATED_IOS. Instead, this indicates that operations requiring I/O in the event of recovery are more frequent in the redo log, so fewer redo blocks now correspond to the same FAST_START_IO_TARGET.

Assume you decide that FAST_START_IO_TARGET is placing an excessive limit on the maximum number of redo blocks that Oracle processes during recovery. You adjust FAST_START_IO_TARGET to 8000, set LOG_CHECKPOINT_TIMEOUT to 60, and perform several updates. You reissue the query to V$INSTANCE_RECOVERY and Oracle responds with:

RECOVERY_
ESTIMATED_
IOS

ACTUAL_
REDO_BLKS

TARGET_
REDO_BLKS

LOG_FILE_
SIZE_REDO_
BLKS

LOG_CHKPT_
TIMEOUT_
REDO_BLKS

LOG_CHKPT_
INTERVAL_
REDO_BLKS

FAST_START_
IO_TARGET_ 
REDO_BLKS

4294967295

1 row selected.

Because the TARGET_REDO_BLKS column value of 6707 corresponds to the value in the LOG_CHKPT_TIMEOUT_REDO_BLKS column, LOG_CHECKPOINT_TIMEOUT is now exerting the most influence over checkpointing behavior.

Estimating Recovery Time

Use statistics from the V$INSTANCE_RECOVERY view to estimate recovery time using the following formula:

For example, if RECOVERY_ESTIMATED_IOS is 2500, and the maximum number of writes your system performs is 500 per second, then recovery time is 5 seconds. Note the following restrictions:

The value for the maximum I/Os per second that the system can perform is difficult to measure accurately. You can estimate this value by measuring the total number of reads and writes your system can perform under a peak load. The V$FILESTAT view provides information on the number of physical reads and writes performed since the instance started. Measure these values over a set time interval, and then divide this by the time interval to estimate your system's maximum I/Os per second. The following query can be used to measure the total I/Os since the instance started:
```
SELECT sum(PHYBLKRD+PHYBLKWRT) 
FROM v$filestat;
```
There is no guarantee the system will sustain the I/O rate during recovery.
This estimate for recovery time is only valid when FAST_START_IO_TARGET is both enabled and when this parameter is the determining influence on checkpointing behavior.

To adjust recovery time, change the initialization parameter that has the most influence over checkpointing. Use the V$INSTANCE_RECOVERY view as described in "Monitoring Instance Recovery" to determine which parameter to adjust. Then, either adjust the parameter to decrease or increase recovery time as required.

Adjusting Recovery Time: Example Scenario

For example, assume as in "Determining the Strongest Checkpoint Influence" that your initialization parameter settings are the following:

FAST_START_IO_TARGET = 1000
LOG_CHECKPOINT_TIMEOUT = 1800 # default
LOG_CHECKPOINT_INTERVAL = 0 # default: disabled interval checkpointing

You execute the following query:

SELECT * FROM V$INSTANCE_RECOVERY;

Oracle responds with the following:

RECOVERY_
ESTIMATED_
IOS

ACTUAL_
REDO_BLKS

TARGET_
REDO_BLKS

LOG_FILE_
SIZE_REDO_
BLKS

LOG_CHKPT_
TIMEOUT_
REDO_BLKS

LOG_CHKPT_
INTERVAL_
REDO_BLKS

FAST_START_
IO_TARGET_ 
REDO_BLKS

4294967295

1 row selected.

You calculate recovery time using the formula, where RECOVERY_ESTIMATED_IOS is 1025 and the maximum I/Os per second the system can perform is 500:

You decide you can afford slightly more than 2.05 seconds of recovery time: constant access to the data is not critical. You increase the value for the parameter FAST_START_IO_TARGET to 2000 and perform several updates. You then reissue the query and Oracle displays:

RECOVERY_
ESTIMATED_
IOS

ACTUAL_
REDO_BLKS

TARGET_
REDO_BLKS

LOG_FILE_
SIZE_REDO_
BLKS

LOG_CHKPT_
TIMEOUT_
REDO_BLKS

LOG_CHKPT_
INTERVAL_
REDO_BLKS

FAST_START_
IO_TARGET_ 
REDO_BLKS

4294967295

1 row selected.

Recalculate recovery time using the same formula:

You have increased your recovery time by 1.96 seconds. If you can afford more time, then repeat the procedure until you arrive at an acceptable recovery time.

Calculating Performance Overhead

To calculate performance overhead, use the V$SYSSTAT view. For example, assume you execute the following query:

SELECT NAME, VALUE FROM V$SYSSTAT
WHERE NAME IN ( 'PHYSICAL READS', 'PHYSICAL WRITES',);

Oracle responds with the following:

NAME                                 VALUE
physical reads                        2376
physical writes                      14932
physical writes non checkpoint       11165
3 rows selected.

The first row shows the number of data blocks retrieved from disk. The second row shows the number of data blocks written to disk. The last row shows the value of the number of writes to disk that would occur if you turned off checkpointing.

Use this data to calculate the overhead imposed by setting the FAST_START_IO_TARGET initialization parameter. To effectively measure the percentage of extra writes, mark the values for these statistics at different times, t_1 and t_2. Use the following formula where the variables stand for the following:

Variable Definition

*_1

Value of prefixed variable at time t_1, which is any time after the database has been running for a while

*_2

Value of prefixed variable at time t_2, which is later than t_1 and not immediately after changing any of the checkpoint parameters

PWNC

Physical writes non checkpoint

PW

Physical writes

PR

Physical reads

EIO

Percentage of estimated extra I/Os generated by enabling checkpointing

Variable	Definition
*_1	Value of prefixed variable at time `t_1`, which is any time after the database has been running for a while
*_2	Value of prefixed variable at time `t_2`, which is later than `t_1` and not immediately after changing any of the checkpoint parameters
PWNC	Physical writes non checkpoint
PW	Physical writes
PR	Physical reads
EIO	Percentage of estimated extra I/Os generated by enabling checkpointing

Calculate the percentage of extra I/Os generated by fast-start checkpointing using this formula:

It can take some time for database statistics to stabilize after instance startup or dynamic initialization parameter modification. After such events, wait until all blocks age out of the buffer cache at least once before taking measurements.

If the percentage of extra I/Os is too high, then increase the value for FAST_START_IO_TARGET. Adjust this parameter until you get an acceptable value for the RECOVERY_ESTIMATED_IOS in V$INSTANCE_RECOVERY as described in "Determining the Strongest Checkpoint Influence".

The number of extra writes caused by setting FAST_START_IO_TARGET to a non-zero value is application-dependent. An application that repeatedly modifies the same buffers incurs a higher write penalty because of fast-start checkpointing than an application that does not. The extra write penalty is not dependent on cache size.

Calculating Performance Overhead: Example Scenario

As an example, assume your initialization parameter settings are:

FAST_START_IO_TARGET = 2000
LOG_CHECKPOINT_TIMEOUT = 1800 # default
LOG_CHECKPOINT_INTERVAL = 0 # default: disabled interval checkpointing

After the statistics stabilize, you issue this query on V$SYSSTAT:

SELECT NAME, VALUE FROM V$SYSSTAT
WHERE NAME IN ('PHYSICAL READS', 'PHYSICAL WRITES',
'PHYSICAL WRITES NON CHECKPOINT');

Oracle responds with:

Name                               Value
physical reads                      2376
physical writes                    14932
physical writes non checkpoint     11165
3 rows selected.

After making updates for a few hours, you re-issue the query and Oracle responds with:

Name                               Value
physical reads                      3011
physical writes                    17467
physical writes non checkpoint     13231
3 rows selected.

Substitute the values from your select statements in the formula as described to determine how much performance overhead you are incurring:

[((17467 - 14932) - (13231 - 11165)) / ((3011 - 2376) + (17467 - 14932))] x 100% = 14.8%

As the result indicates, enabling fast-start checkpointing generates about 15% more I/O than would be required had you not enabled fast-start checkpointing. After calculating the extra I/O, you decide you can afford more system overhead if you decrease recovery time.

To decrease recovery time, reduce the value for the parameter FAST_START_IO_TARGET to 1000. After items in the buffer cache age out, calculate V$SYSSTAT statistics across a second interval to determine the new performance overhead. Query V$SYSSTAT:

SELECT NAME, VALUE FROM V$SYSSTAT
WHERE NAME IN ('PHYSICAL READS', 'PHYSICAL WRITES',
'PHYSICAL WRITES NON CHECKPOINT');

Oracle responds with:

Name                               Value
physical reads                      4652
physical writes                    28864
physical writes non checkpoint     21784
3 rows selected.

After making updates, re-issue the query and Oracle responds with:

Name                               Value
physical reads                      6000
physical writes                    35394
physical writes non checkpoint     26438
3 rows selected.

Calculate how much performance overhead you are incurring using the values from your two SELECT statements:

[(35394 - 28864) - (26438 - 21784)) / ((6000 - 4652) + (35394 - 28864))] x 100% = 23.8%

After changing the parameter, the percentage of I/Os performed by Oracle is now about 24% more than it would be if you disabled fast-start checkpointing.

Tuning the Phases of Instance Recovery

The work required to do roll forward processing is proportional to the rate of change to the database (update transactions per second) and the time between which consistent snapshots, or checkpoints, of the database are made. The work required to do roll back is proportional to the number and size of uncommitted transactions when the system fault occurred. The total recovery time is the sum of time to do roll forward and the time to do roll back.

Besides using checkpoints to tune instance recovery, you can also use a variety of parameters to control Oracle's behavior during the rolling forward and rolling back phases of instance recovery. In some cases, you can parallelize operations and thereby increase recovery efficiency.

This section contains the following topics:

Tuning the Rolling Forward Phase

Use parallel block recovery to tune the roll forward phase of recovery. Parallel block recovery uses a division of labor approach to allocate different processes to different data blocks during the roll forward phase of recovery. For example, during recovery the redo log is read, and blocks that require redo application are parsed out. These blocks are subsequently distributed evenly to all recovery slaves to be read into the buffer cache. Crash, instance, and media recovery of many datafiles on different disk drives are good candidates for parallel block recovery.

Use the RECOVERY_PARALLELISM initialization parameter to specify the number of concurrent recovery processes for instance or media recovery operations. Because crash recovery occurs at instance startup, this parameter is useful for specifying the number of processes to use for crash recovery. The value of this parameter is also the default number of processes used for media recovery if you do not specify the PARALLEL clause of the RECOVER statement. To use parallel processing, the value of RECOVERY_PARALLELISM must be greater than 1 and cannot exceed the value of the PARALLEL_MAX_SERVERS parameter. Parallel block recovery requires a minimum of eight recovery processes for it to be more effective than serial recovery.

Recovery is usually I/O bound on reads to data blocks. Consequently, parallelism at the block level may only help recovery performance if it increases total I/Os. In other words, parallelism at the block level by-passes operating system restrictions on asynchronous I/Os. Performance on systems with efficient asynchronous I/O typically does not improve significantly with parallel block recovery.

Tuning the Rolling Back Phase

During the second phase of instance recovery, Oracle rolls back uncommitted transactions. Oracle uses two features, fast-start on-demand rollback and fast-start parallel rollback, to increase the efficiency of this recovery phase.

Note:
These features are part of fast-start fault recovery and are only available in the Oracle8i Enterprise Edition.

This section contains the following topics:

Using Fast-Start On-Demand Rollback

Using the fast-start on-demand rollback feature, Oracle automatically allows new transactions to begin immediately after the roll forward phase of recovery completes. Should a user attempt to access a row that is locked by a dead transaction, Oracle rolls back only those changes necessary to complete the transaction, in other words, it rolls them back on demand. Consequently, new transactions do not have to wait until all parts of a long transaction are rolled back.

Note:
Oracle does this automatically. You do not need to set any parameters or issue statements to use this feature.

Using Fast-Start Parallel Rollback

In fast-start parallel rollback, the background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes. Essentially, fast-start parallel rollback is to rolling back what parallel block recovery is to rolling forward.

Fast-start parallel rollback is mainly useful when a system has transactions that run a long time before committing, especially parallel INSERT, UPDATE, and DELETE operations. When SMON discovers that the amount of recovery work is above a certain threshold, it automatically begins parallel rollback by dispersing the work among several parallel processes: process 1 rolls back one transaction, process 2 rolls back a second transaction, and so on. The threshold is the point at which parallel recovery becomes cost-effective, in other words, when parallel recovery takes less time than serial recovery.

One special form of fast-start parallel rollback is intra-transaction recovery. In intra-transaction recovery, a single transaction is divided among several processes. For example, assume 8 transactions require recovery with one parallel process assigned to each transaction. The transactions are all similar in size except for transaction 5, which is quite large. This means it takes longer for one process to roll this transaction back than for the other processes to roll back their transactions.

In this situation, Oracle automatically begins intra-transaction recovery by dispersing transaction 5 among the processes: process 1 takes one part, process 2 takes another part, and so on.

You control the number of processes involved in transaction recovery by setting the parameter FAST_START_PARALLEL_ROLLBACK to one of three values:

FALSE

Turns off fast-start parallel rollback.

LOW

Specifies that the number of recovery servers may not exceed twice the value of the CPU_COUNT parameter.

HIGH

Specifies that the number of recovery servers may not exceed four times the value of the CPU_COUNT parameter.

Parallel Rollback in an Oracle Parallel Server Configuration

In Oracle Parallel Server, you can perform fast-start parallel rollback on each instance. Within each instance, you can perform parallel rollback on transactions that are:

Online on a given instance.
Offline and not being recovered on instances other than the given instance.

After a rollback segment is online for a given instance, only this instance can perform parallel rollback on transactions on that segment.

Monitoring Progress of Fast-Start Parallel Rollback

Monitor the progress of fast-start parallel rollback by examining the V$FAST_START_SERVERS and V$FAST_START_TRANSACTIONS tables. V$FAST_START_SERVERS provides information about all recovery processes performing fast-start parallel rollback. V$FAST_START_TRANSACTIONS contains data about the progress of the transactions.

FALSE	Turns off fast-start parallel rollback.
LOW	Specifies that the number of recovery servers may not exceed twice the value of the `CPU_COUNT` parameter.
HIGH	Specifies that the number of recovery servers may not exceed four times the value of the `CPU_COUNT` parameter.

24 Tuning Instance Recovery Performance

Understanding Instance Recovery

How Oracle Applies Redo Log Information

Trade-offs of Minimizing Recovery Duration

Tuning the Duration of Instance and Crash Recovery

Using Initialization Parameters to Influence Recovery Time

Table 24-1 Initialization Parameters Influencing Checkpoints

Using LOG_CHECKPOINT_TIMEOUT

Using LOG_CHECKPOINT_INTERVAL

Using FAST_START_IO_TARGET

Using Redo Log Size to Influence Checkpointing Frequency

Using SQL Statements to Initiate Checkpoints

Monitoring Instance Recovery

Table 24-2 V$INSTANCE_RECOVERY View

Determining the Strongest Checkpoint Influence

Estimating Recovery Time

Adjusting Recovery Time: Example Scenario

Calculating Performance Overhead

Calculating Performance Overhead: Example Scenario

Tuning the Phases of Instance Recovery

Tuning the Rolling Forward Phase

Tuning the Rolling Back Phase

Using Fast-Start On-Demand Rollback

Using Fast-Start Parallel Rollback

Parallel Rollback in an Oracle Parallel Server Configuration

Monitoring Progress of Fast-Start Parallel Rollback

24
Tuning Instance Recovery Performance