Oracle8i Recovery Manager User's Guide and Reference Release 2 (8.1.6) Part Number A76990-01 |
|
This chapter describes how to troubleshoot and debug Recovery Manager. It includes the following topics:
Recovery Manager provides detailed error messages that can aid in troubleshooting problems. Also, the Oracle database server and third-party media vendors generate useful debugging output of their own. This section addresses the following topics:
Output that is useful for troubleshooting failed RMAN jobs is located in several different places, as explained in the following table:
Type of Output | Produced By | Location | Description |
---|---|---|---|
RMAN messages |
RMAN |
Direct this output to: |
Describes actions relevant to the RMAN job as well as error messages generated by RMAN, the server, and the media vendor. |
RMAN trace file |
RMAN's debug command |
The file specified using the trace parameter at the command line. |
Contains exhaustive output on the creation and execution of PL/SQL program units. |
|
Oracle database server |
The directory specified in the USER_DUMP_DEST initialization parameter. |
Contains a chronological log of errors, |
Oracle trace file |
Oracle database server |
The directory specified in the USER_DUMP_DEST initialization parameter. |
Contains detailed output generated by Oracle server processes. This file is created when an |
|
Third-party media management software |
The directory specified in the USER_DUMP_DEST initialization parameter. |
Contains information on the functioning of the media management device. |
Media manager log file |
Third-party media management software |
The filenames for any media manager logs other than |
Contains information on the functioning of the media management device. |
Typically, you find the following types of error codes in RMAN message stacks:
RMAN-
ORA-
Additional information:
Explanations for RMAN
and ORA
error codes are in Oracle8i Error Messages.
Table 9-1 indicates the error ranges for common RMAN error messages, all of which are described in Oracle8i Error Messages:
When errors occur through the media management API, RMAN returns an error message number prefixed as follows:
Additional information:
Below is the list of message numbers and their corresponding error text. In the error codes, O/S stands for Operating System. The errors prefixed with an asterisk are internal and should never be seen during normal operation.
Sometimes you may find it difficult to identify the useful messages in the RMAN error stack. Note the following tips and suggestions:
Additional information:
followed by an integer. This line indicates a media management error. The integer that follows refers to a code that is explained in the text of the error message.
You attempt the following backup of tablespace TBS_99 and receive the following message:
RMAN> run{allocate channel c1 type disk; backup tablespace tbs_99;} RMAN-03022: compiling command: allocate RMAN-03023: executing command: allocate RMAN-08030: allocated channel: c1 RMAN-08500: channel c1: sid=8 devtype=DISK RMAN-03022: compiling command: backup RMAN-03026: error recovery releasing channel resources RMAN-08031: released channel: c1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure during compilation of command RMAN-03013: command type: backup RMAN-06038: recovery catalog package detected an error RMAN-20202: tablespace not found in the recovery catalog RMAN-06019: could not translate tablespace name "TBS_99"
You first note that the error occurred after RMAN compiled and executed the allocate command, but before it could execute the backup command. You read the last two messages in the stack first and immediately see the problem: no tablespace called TBS_99 appears in the recovery catalog. You conclude that either TBS_99 does not exist in the database, or it does exist in the database but the recovery catalog does not yet know about it.
Assume that you attempt to back up your database and receive the following error:
RMAN> run{allocate channel c1 type disk;backup database;} RMAN-03022: compiling command: allocate RMAN-03026: error recovery releasing channel resources RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure during compilation of command RMAN-03013: command type: allocate RMAN-06004: ORACLE error from recovery catalog database: ORA-03113: end-of-file on communication channel RMAN-06097: text of failing SQL statement: begin dbms_rcvman . resetAll ; dbms_rcvman . set_package_constants ; : RMAN-06099: error occurred in source file: krmk.pc, line: 14531
As suggested, you start reading from the bottom up. The first message is numbered 6099, so you know that RMAN was not able to compile the command. The third message from the bottom is more specific: the error came from the recovery catalog database. The ORA-03113
message indicates that there was a problem communicating with the recovery catalog server.
You then query the recovery catalog database and discover that it was in the process of being shut down when you executed the run command. Consequently, the job failed.
Media management errors are not uncommon. Assume that you use a tape drive and receive the following output during a backup job:
RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03007: retryable error occurred during execution of command: allocate RMAN-07004: unhandled exception during command execution on channel c4 RMAN-10032: unhandled exception during execution of job step 4: ORA-06512: at line 158 RMAN-10035: exception raised in RPC: ORA-19624: operation failed, retry possible ORA-19506: failed to create sequential file, name="df_99_1", parms="" ORA-27007: failed to open file HP-UX Error: 1003: Unknown system error Additional information: 7004 Additional information: 1 ORA-06512: at "SYS.DBMS_BACKUP_RESTORE", line 410 RMAN-10031: ORA-19624 occurred during call to DBMS_BACKUP_RESTORE.DEVICEALLOCATE
Following the suggestions for reading error message stacks, you look for the Additional information:
line and notice:
Additional information: 7004
You discover that error 7004 means that RMAN is unable to connect to the tape device. For some reason, RMAN is not recognizing your media management software.
Note also that when you read from the bottom up, the first line says that an error occurred during a call to a PL/SQL program unit called DEVICEALLOCATE. Also, the first message below the stack banner says that there was an error executing the allocate command. All of this information indicates that RMAN was not able to allocate a tape channel because it was unable to recognize the tape device.
If the standard RMAN error message stack is not generating sufficient information, then use the RMAN debug command to generate more extensive output. Use debugging for the following purposes:
Because debugging output can be voluminous, redirect the output to a trace file to prevent overloading the current log file or standard output. The debugging output displays the following detailed information:
RMAN debugging output is so highly detailed that you may find yourself unable to distinguish the useful from the useless information. Assume that you execute the following command in debug mode:
run{ allocate channel c1 type disk; backup tablespace TBS_5, TBS_6; }
The command fails. See the following subset of the debug output for this failed command (note the commented lines in bold):
EXEC SQL AT RCVCAT begin :name := upper ( :name ) ; dbms_rcvman . translateTablespace ( :name ) ; end ; sqlcode=0 :b1 = "TBS_5" /* RMAN calls the translateTablespace procedure to translate tablespace * * name TBS_5 into its constituent datafiles and then calls getDataFile * * to get information about each of the datafiles in tablespace TBS_5. */ EXEC SQL AT RCVCAT begin dbms_rcvman . getDataFile ( :fno , :crescn , :cretime:cretime_i , :fname:fname_i , :tsname , :status:status_i , :blksize , :kbytes:kbytes_i , :blocks:blocks_i , :urscn , :stopscn:stopscn_i , :read_only ) ; end ; sqlcode=0 :b1 = 14 :b2 = 33410 :b3 = "11-JUN-99" :b4 = "/vobs/oracle/dbs/tbs_51.f" /* datafile name */ :b5 = "TBS_5" :b6 = <EMPTY STRING> :b7 = 2048 :b8 = 500 :b9 = 250 :b10 = 0 :b11 = NULL :b12 = 0 EXEC SQL AT RCVCAT begin dbms_rcvman . getDataFile ( :fno , :crescn , :cretime:cretime_i , :fname:fname_i , :tsname , :status:status_i , :blksize , :kbytes:kbytes_i , :blocks:blocks_i , :urscn , :stopscn:stopscn_i , :read_only ) ; end ; sqlcode=0 :b1 = 15 :b2 = 33416 :b3 = "11-JUN-99" :b4 = "/vobs/oracle/dbs/tbs_52.f" /* datafile name */ :b5 = "TBS_5" :b6 = <EMPTY STRING> :b7 = 2048 :b8 = 500 :b9 = 250 :b10 = 0 :b11 = NULL :b12 = 0 EXEC SQL AT RCVCAT begin dbms_rcvman . getDataFile ( :fno , :crescn , :cretime:cretime_i , :fname:fname_i , :tsname , :status:status_i , :blksize , :kbytes:kbytes_i , :blocks:blocks_i , :urscn , :stopscn:stopscn_i , :read_only ) ; end ; sqlcode=-1405 /* RMAN tries to translate tablespace name TBS_6 and get the datafiles, * * but is unable to identify any datafiles for this tablespace. */ EXEC SQL AT RCVCAT begin :name := upper ( :name ) ; dbms_rcvman . translateTablespace ( :name ) ; end ; sqlcode=0 :b1 = "TBS_6" EXEC SQL AT RCVCAT begin dbms_rcvman . getDataFile ( :fno , :crescn , :cretime:cretime_i , :fname:fname_i , :tsname , :status:status_i , :blksize , :kbytes:kbytes_i , :blocks:blocks_i , :urscn , :stopscn:stopscn_i , :read_only ) ; end ; sqlcode=-20202 krmicomp: error 6038 signalled during compilation RMAN-03026: error recovery releasing channel resources krmkdps: this_db_key=1 krmkdps: this_dbinc_key=2 krmkdps: until_scn= krmkdps: until_time= krmkdps: completed_after= krmkdps: completed_before= krmkdps: like_pattern= krmkdps: RA_kindMask=255 krmkdps: all_flag=0 krmqclean: the compiled command tree is: CMD type=cleanup id=1 status=NOT STARTED
As you can see, the debugging output is voluminous--and the sample included is just one portion of the total output. Nevertheless, the output is often useful for pinpointing which PL/SQL package, procedure, or function was unable to execute successfully.
On specified platforms, Oracle provides a diagnostic tool called sbttest
. This utility performs a simple test of the tape library by acting as the Oracle database server and attempting to communicate with the media manager.
On UNIX, the sbttest
utility is located in $ORACLE_HOME/bin
. If for some reason the utility is not included with your platform, then contact Oracle Support to obtain the C version of the program. You can compile this version of the program on all UNIX platforms.
Note that on platforms such as SunSolaris, you do not have to relink when using sbttest
. On other platforms, relinking may be necessary.
For online documentation of sbttest
, issue the following on the command line:
% sbttest
The program displays the list of possible arguments for the program:
Error: backup file name must be specified Usage: sbttest backup_file_name # this is the only required parameter <-dbname database_name> <-trace trace_file_name> <-remove_before> <-no_remove_after> <-read_only> <-no_regular_backup_restore> <-no_proxy_backup> <-no_proxy_restore> <-file_type n> <-copy_number n> <-media_pool n> <-os_res_size n> <-pl_res_size n> <-block_size block_size> <-block_count block_count> <-proxy_file os_file_name bk_file_name [os_res_size pl_res_size block_size block_count]>
The display also indicates the meaning of each argument. For example, following is the description for two optional parameters:
Optional parameters: -dbname specifies the database name which will be used by SBT to identify the backup file. The default is "sbtdb" -trace specifies the name of a file where the Media Management software will write diagnostic messages.
Use sbttest
to perform a quick test of the media manager. The following table explains how to interpret the output:
sbttest
at the command line:
% sbttest
If the program is operational, you should see a display of the online documentation.
some_file.f
and write the output to sbtio.log
:
% sbttest some_file.f -trace sbtio.log
You can also test a backup of an existing datafile. For example, this command tests datafile tbs_33.f
of database PROD:
% sbttest tbs_33.f -dbname prod
libobk.so could not be loaded. Check that it is installed properly, and that LD_ LIBRARY_PATH environment variable (or its equivalent on your platform) includes the directory where this file can be found. Here is some additional information on the cause of this error: ld.so.1: sbttest: fatal: libobk.so: open failed: No such file or directory
Sometimes it is useful to identify what a server session performing a backup or copy operation is doing. You have access to several views that can assist in monitoring the progress of or obtaining information about RMAN jobs:
RMAN allows you to perform the following checks:
To identify which server sessions correspond to which RMAN channels, use the set command with the command id parameter. The command id parameter enters the specified string into the CLIENT_INFO column of the V$SESSION dynamic performance view. Join V$SESSION with V$PROCESS to correlate the server session with the channel.
The CLIENT_INFO column of V$SESSION contains information for each Recovery Manager server session. The data appears in one of the following formats:
This form appears for the first connection to the target database established by RMAN.
This form appears for all allocated channels.
The SPID column of V$PROCESS identifies the operating system process number.
% rman target / catalog rman/rman@rcat
run { allocate channel t1 type disk; allocate channel t2 type disk; set command id to 'rman'; backup incremental level 0 filesperset 5 tablespace 'SYSTEM'; # optionally, issue a host command to access the operating system prompt host;sql 'ALTER SYSTEM ARCHIVE LOG ALL';
}
SELECT sid, spid, client_info FROM v$process p, v$session s WHERE p.addr = s.paddr AND client_info LIKE '%id=rman%'; SID SPID CLIENT_INFO ---------- --------- ---------------------------------------------------------------- 8 21973 id=rman 16 22057 id=rman 17 22068 id=rman,ch=t1 18 22070 id=rman,ch=t2
Monitor the progress of backups, copies, and restores by querying the view V$SESSION_LONGOPS.
Each server session performing a backup, restore, or copy reports its progress compared to the total amount of work required for that particular part of the restore. For example, if you perform a restore using two channels, and each channel has two backup sets to restore (a total of 4 sets), then each server session reports its progress through a single set. When that set is completely restored, RMAN starts reporting progress on the next set to restore.
% rman target / catalog rman/rman@rcat
run { allocate channel t1 type disk; backup database;
}
SELECT sid, serial#, context, sofar, totalwork, round(sofar/totalwork*100,2) "% Complete" FROM v$session_longops WHERE opname LIKE 'RMAN%' AND opname NOT LIKE '%aggregate%' AND totalwork != 0 AND sofar <> totalwork /
If you repeat the query while the backup progresses, then you see output such as the following:
SQL> @longops SID SERIAL# CONTEXT SOFAR TOTALWORK % Complete ---------- ---------- ---------- ---------- ---------- ---------- 8 19 1 10377 36617 28.34 SQL> @longops SID SERIAL# CONTEXT SOFAR TOTALWORK % Complete ---------- ---------- ---------- ---------- ---------- ---------- 8 19 1 21513 36617 58.75 SQL> @longops SID SERIAL# CONTEXT SOFAR TOTALWORK % Complete ---------- ---------- ---------- ---------- ---------- ---------- 8 19 1 29641 36617 80.95 SQL> @longops SID SERIAL# CONTEXT SOFAR TOTALWORK % Complete ---------- ---------- ---------- ---------- ---------- ---------- 8 19 1 35849 36617 97.9 SQL> @longops no rows selected
SQL> SELECT sid, seconds_in_wait AS sec_wait, event FROM v$session_wait 2 WHERE wait_time = 0 3 ORDER BY sid; SID SEC_WAIT EVENT ---------- ---------- ----------------------------------------------- 1 368383335 pmon timer 2 1097 rdbms ipc message 3 387928 rdbms ipc message 4 0 rdbms ipc message 5 1408 smon timer 6 386114 rdbms ipc message 7 387626 rdbms ipc message 8 1060 SQL*Net message from client 9 1060 SQL*Net message from client 12 1060 SQL*Net message from client 13 2366 SQL*Net message from client 14 2757 SQL*Net message from client 12 rows selected.
See Also:
|
Monitor backup and restore performance by querying V$BACKUP_SYNC and V$BACKUP_ASYNC_IO. For a complete description of the contents of these views and how you can use them to tune backup performance, see Oracle8i Designing and Tuning for Performance.
You may sometimes need to kill an RMAN job that is hanging. Often, hung jobs occur when RMAN is interfacing with a media manager. The best way to stop RMAN when the connections for the allocated channels are hung in the media manager is to kill the Oracle process of the connections. Be careful when executing this operation because killing the Oracle process may cause problems for the media manager.
The nature of an RMAN session depends on the operating system. In UNIX, an RMAN session has the following processes associated with it:
RMAN usually hangs because one of the channel connections is waiting in the media manager code for a tape resource. The catalog connection and the default channel seem to hang because they are waiting for RMAN to tell them what to do. Polling connections seem to be in an infinite loop while polling the RPC under the control of the RMAN process.
If you kill the RMAN process itself, then you also kill the catalog connection, the default channel, and the polling connections. Target connections that are not hung in the media manager code also terminate: only the target connection executing in the media management layer remains active. You must manually kill this process because terminating its session does not kill it. Even after termination, the media manager may keep resources busy or continue processing because it does not realize that the Oracle process is gone. This behavior is media manager-dependent.
Terminating the catalog connection does not cause RMAN to finish because RMAN is not performing catalog operations. Removing default channel and polling connections cause the RMAN process to detect that one of the channels has died and then proceed to exit. In this case, the connections to the hung channels remain active as described above.
The best way to terminate RMAN when the connections for the allocated channels are hung in the media manager is to kill the Oracle process of the connections. The RMAN process detects this termination and proceed to exit, removing all connections except target connections that are still operative in the media management layer. The caveat about the media manager resources still applies in this case.
This procedure is system-specific. See your operating system-specific documentation for the relevant commands.
pstack
located in /usr/proc/bin
to obtain the stack.
SBTxxxx
(normally sbtopen
) as one of its top calls. Note that other layers may appear on top of it.
kill -9
command.
This section describes the most common problems encountered when using RMAN:
In this scenario, you link the media manager with Oracle but still cannot make RMAN back up to tape. You see either of these error messages:
# error 1 ORA-19511: SBT error = 4110, errno = 0, BACKUP_DIR environment variable is not set # error 2 RMAN-008526: channel channel_name: WARNING: Oracle Test Disk API
When you install Oracle8i, the server kernel is linked with a shared library whose name and location differs depending on your operating system. For example, the library on SunSolaris in version 8.1 is called $ORACLE_HOME/lib/libobk.so
. This is a symbolic link that points to the so-called dummy API that Oracle links to by default. On SunSolaris, this library is called libdsbtsh8.so
.
This dummy shared library allows you to use RMAN to test writing to disk so long as you specify BACKUP_DIR in the parms parameter of the allocate channel command.
Executable | Shared Library |
---|---|
oracle.exe |
libobk.so -> libdsbtsh8.so |
An SBT error of 4110
for Oracle version 8.1 or an Additional information:
message of 4110
for Oracle version 8.0 indicates that Oracle is not linked with the media manager API. Instead, Oracle is linked with Oracle's own dummy API. These errors occur because the BACKUP_DIR environment variable is not specified for the channel servicing the backup. One way to set the BACKUP_DIR location is by using the parms parameter of the allocate channel command.
In Oracle version 8.1, if the RMAN-8526
message states that the disk API was used, then the BACKUP_DIR environment variable was successfully resolved by the Oracle channel and RMAN made the backup using the disk API. If you do not see the 4110
or RMAN-008526
errors, but RMAN is not making backups to the media manager, then follow the procedure below to determine whether Oracle is using the dummy API.
% cd $ORACLE_HOME/bin
% ldd oracle | grep libobk.so
libobk.so
is symbolically linked to libdsbtsh8.so
:
libobk.so -> libdsbtsh8.so
If so, then Oracle is linking to the dummy API instead of your media management API.
For example, you can issue:
run { allocate channel c1 type 'sbt_tape' parms="ENV=(BACKUP_DIR=/oracle/work)"; backup tablespace system; }
If you see RMAN-08526
and RMAN-08525
in the output, then the backup to disk using the dummy API was successful:
RMAN-08526: channel c1: WARNING: Oracle Test Disk API RMAN-08525: backup set complete, elapsed time: 00:00:25
If Oracle is linked to the dummy API instead of your media manager's shared library, then you must tell Oracle to link to the media manager's shared library instead.
libobk.so
link. For example, enter:
SQL> SHUTDOWN IMMEDIATE
libobk.so
:
% rm $ORACLE_HOME/lib/libok.so
libobk.so
and the shared library that you want to use. For example, on SunSolaris you can create the symbolic link libobk.so
to a shared library such as liblsm.so
using the ln
command:
% ln -s $ORACLE_HOME/lib/libobk.so $ORACLE_HOME/lib/liblsm.so
sbttest
test program to back up a file. For example, enter:
% sbttest -testfile
In this scenario, you install the media manager on the Windows platform, but cannot make RMAN back up to tape. Instead, RMAN does one or more of the following:
$ORACLE_HOME/database
directory on disk
4110
for Oracle version 8.1 or an additional information message of 4110
for Oracle version 8.0
If you specify 'sbt_tape' but RMAN does not use the media management API, then the following scenarios are possible:
orasbt.dll
in its default directory path.
DllMain()
function returned an error, or the vendor did not implement all of the SBT functions.
RMAN-008526: channel channel_name: WARNING: Oracle Test Disk API
If you see this information, then Oracle is linking to the dummy API instead of your media management API. If not, proceed to the next step.
If Oracle is using the dummy API successfully, then it cannot find the media management API in its default directory path. You must configure the system so that Oracle can find the correct API.
Follow this procedure:
orasbt.dll
in the SYSTEM path so that the Oracle service can find it.
In this scenario, an RMAN backup job starts as normal and then pauses inexplicably:
Recovery Manager: Release 8.1.5.0.0 - Production RMAN-06005: connected to target database: TORPEDO RMAN-06008: connected to recovery catalog database RMAN> run { 2> allocate channel t1 type "SBT_TAPE"; 3> backup 4> tablespace system,users; } RMAN-03022: compiling command: allocate RMAN-03023: executing command: allocate RMAN-08030: allocated channel: t1 RMAN-08500: channel t1: sid=16 devtype=SBT_TAPE RMAN-03022: compiling command: backup RMAN-03023: executing command: backup RMAN-08008: channel t1: starting datafile backupset RMAN-08502: set_count=15 set_stamp=338309600 RMAN-08010: channel t1: including datafile 2 in backupset RMAN-08010: channel t1: including datafile 1 in backupset RMAN-08011: channel t1: including current controlfile in backupset # Hanging here for 30 min now
If a backup job is hanging, that is, not proceeding, then several scenarios are possible:
Your first task is to try to determine which of these scenarios is the most likely cause.
% rman target / catalog rman/rman@catdb debug trace = /oracle/log
run { allocate channel c1 type 'sbt_tape'; backup tablespace system; }
krmxrpc: xc=6897512 starting longrunning RPC #13 to target: DBMS_BACKUP_RESTORE. BACKUPPIECECREATE krmxr: xc=6897512 started long running rpc
SELECT * FROM v$session_wait WHERE wait_time = 0 AND sid = 12;
Because the causes of a hung backup job can be varied, so are the solutions. The best practice is to look for the simplest solutions first. For example, it is quite common for backup jobs to hang simply because the tape device has completely filled the current cassette and is waiting for a new tape to be inserted.
To learn how to kill an RMAN session that is hanging, see "Terminating an RMAN Session".
In this scenario, you run a backup job and receive message output similar to the following:
RMAN-08010: channel c8: including datafile number 47 in backupset RMAN-10030: RPC call appears to have failed to start on channel c9 RMAN-10036: RPC call ok on channel c9 RMAN-08010: channel c3: including datafile number 18 in backupset
The RMAN-10030
error message does not usually indicate a problem. The RMAN-10030
error indicates one of the following:
Timing problems occur in this way. When RMAN posts a long-term RPC, it checks the V$SESSION performance view. The RPC updates the information in the view to indicate when it starts and finishes. Sometimes RMAN checks V$SESSION before the RPC has indicated it has started, which in turn generates the RMAN-10030
error message.
If the RMAN-10036
error message does not appear in the output along with RMAN-10030
, then the backup job encountered a problem. For example, sometimes the RMAN-10030
message appears when a backup to tape hangs at the beginning of the job.
SELECT * FROM V$SESSION_WAIT WHERE compnam = 'dbms_backup_restore';
If Oracle returns no information, then the PL/SQL program performing the backup is hung.
libobk
client. If the client does not receive the information, then Oracle does not get notified that the backup terminated due to MML problems.
In this scenario, you attempt a backup and receive the following error messages:
RMAN-3014: Implicit resync of recovery catalog failed RMAN-6038: Recovery catalog package detected an error RMAN-20035: Invalid high RECID error
You probably restored a backup control file created through a non-Oracle mechanism, and then opened the database without performing a RESETLOGS operation. If you had created the backup control file through the RMAN backup command or the ALTER DATABASE BACKUP CONTROLFILE statement, then Oracle would have required you to reset the logs.
The control file and the recovery catalog are now not synchronized. The database control file is older than the recovery catalog, because at one time the recovery catalog resynchronized using the old current control file, and now the database is using a backup control file. RMAN detects that the control file currently in use is older than the control file previously used to resynchronize.
You can follow either of these procedures, although the first procedure is safer and is strongly recommended:
% sqlplus sys/oracle@prod1
SQL> ALTER DATABASE MOUNT;
SQL> ALTER DATABASE RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE; SQL> ALTER DATABASE RECOVER CANCEL;
SQL> ALTER DATABASE OPEN RESETLOGS;
% rman target / rman/rman@rcat
reset database;
run { allocate channel c1 type disk; backup database; }
% sqlplus sys/oracle@prod1
SQL> ALTER DATABASE MOUNT;
SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
*** SESSION ID:(8.1) 1998.12.09.13.26.36.000 *** 1998.12.09.13.26.36.000 # The following statements will create a new control file and use it # to open the database. # Data used by the recovery manager will be lost. Additional logs may # be required for media recovery of offline data files. Use this # only if the current version of all online logs are available. STARTUP NOMOUNT CREATE CONTROLFILE REUSE DATABASE "PROD1" NORESETLOGS ARCHIVELOG MAXLOGFILES 32 MAXLOGMEMBERS 2 MAXDATAFILES 32 MAXINSTANCES 1 MAXLOGHISTORY 1012 LOGFILE GROUP 1 '/oracle/dbs/t1_log1.f' SIZE 200K, GROUP 2 '/oracle/dbs/t1_log2.f' SIZE 200K DATAFILE '/oracle/dbs/tbs_01.f', '/oracle/dbs/tbs_02.f', '/oracle/dbs/tbs_11.f', '/oracle/dbs/tbs_12.f', '/oracle/dbs/tbs_21.f', '/oracle/dbs/tbs_22.f', CHARACTER SET WE8DEC ; # Configure snapshot controlfile filename EXECUTE SYS.DBMS_BACKUP_RESTORE.CFILESETSNAPSHOTNAME('/oracle/dbs/snapcf_prod1.f'); # Recovery is required if any of the datafiles are restored backups, # or if the last shutdown was not normal or immediate. RECOVER DATABASE # All logs need archiving and a log switch is needed. ALTER SYSTEM ARCHIVE LOG ALL; # Database can now be opened normally. ALTER DATABASE OPEN; # No tempfile entries found to add.
SHUTDOWN IMMEDIATE
STARTUP NOMOUNT CREATE CONTROLFILE ...; EXECUTE ...; RECOVER DATABASE ALTER SYSTEM ARCHIVE LOG ALL; ALTER DATABASE OPEN ...;
In this scenario, a backup job fails because RMAN cannot make a snapshot control file. The message stack is as follows:
RMAN-08502: set_count=11 set_stamp=333299261 RMAN-08010: channel dev1: including datafile 1 in backupset RMAN-08512: waiting for snapshot controlfile enqueue RMAN-08512: waiting for snapshot controlfile enqueue RMAN-20029: cannot make a snapshot controlfile RMAN-03026: error recovery releasing channel resources RMAN-08031: released channel: dev1 RMAN-00569: ================error message stack follows================ RMAN-03006: non-retryable error occurred during execution of command: backup RMAN-07004: unhandled exception during command execution on channel dev1 RMAN-10032: unhandled exception during execution of job step 1: ORA-06512: at line 90 RMAN-10035: exception raised in RPC: ORA-00230: operation disallowed: snapshot controlfile enqueue unavailable ORA-06512: at "SYS.DBMS_BACKUP_RESTORE", line 1826 RMAN-10031: ORA-230 occurred during call to DBMS_BACKUP_RESTORE.CFILEMAKEANDUSESNAPSHOT
When RMAN needs to back up or resynchronize from the control file, it first creates a snapshot or consistent image of the control file. If one RMAN job is already backing up the control file while another needs to create a new snapshot control file, then you may see the following message:
RMAN-08512: waiting for snapshot controlfile enqueue
Under normal circumstances, a job that must wait for the control file enqueue waits for a brief interval and then successfully retrieves the enqueue. Recovery Manager makes up to five attempts to get the enqueue and then fails the job. The conflict is usually caused when two jobs are both backing up the control file, and the job that first starts backing up the control file waits for service from the media manager.
RMAN-08512: waiting for snapshot controlfile enqueue
message, start a new SQL*Plus session on the target database:
% sqlplus sys/sys_pwd@prod1
SELECT s.sid, username AS "User", program, module, action, logon_time "Logon", l.* FROM v$session s, v$enqueue_lock l WHERE l.sid = s.sid and l.type = 'CF' AND l.id1 = 0 and l.id2 = 2;
You should see output similar to the following (the output in this example has been truncated):
SID User Program Module Action Logon --- ---- -------------------- ------------------------- ---------------- --------- 9 SYS rman@h13 (TNS V1-V3) backup full datafile: c1 0000210 STARTED 21-JUN-99
After you have determined which job is creating the enqueue, you can do one of the following:
Commonly, enqueue situations occur when a job is writing to a tape drive, but the tape drive is waiting for a new cassette to be inserted. If you start a new job in this situation, you will probably receive the enqueue message because the first job cannot complete until the new tape is loaded.
In this scenario, your database archives automatically to two directories: /arc_dest
and arc_dest2
. You tell RMAN to perform a backup and delete the input archived redo logs afterwards:
run { allocate channel c1 type 'sbt_tape'; backup database; backup archivelog all delete input; }
You then perform a crosscheck to make sure that the logs are gone and discover the following:
RMAN> change archivelog all crosscheck; RMAN-03022: compiling command: change RMAN-06158: validation succeeded for archived log RMAN-08514: archivelog filename=/oracle/arch/dest2/arcr_1_964.arc recid=19 stamp=368726072
RMAN deleted one set of logs but not the other.
This problem is not an error. RMAN deletes only one copy of each input log, so even if you archive to five destinations, RMAN deletes logs from only one directory.
To force RMAN to delete all existing archived redo logs, allocate multiple channels and specify that each channel back up and delete logs from a different archiving destination. For example, enter:
run { allocate channel t1 type 'sbt_tape'; allocate channel t2 type 'sbt_tape'; backup archivelog like '/oracle/arch/dest1/%' channel t1 delete input archivelog like '/oracle/arch/dest2/%' channel t2 delete input; }
In this scenario, you schedule regular incremental backups of your database. Because RMAN can use incremental backups instead of archived redo logs to perform recovery, you use an operating system utility to delete all archived logs after each backup. The next time you take a backup, you receive this error:
RMAN-6089: archive log NAME not found or out of sync with catalog
This problem occurs when you delete the archived logs using an operating system command, which means that RMAN is unaware of the deletion. The RMAN-6089
error occurs because RMAN attempts to back up a log that it thinks still exists.
The easiest solution is to specify the delete input option when backing up archived logs. For example, enter:
run { allocate channel c1 type 'sbt_tape'; backup archivelog all delete input; }
The second easiest solution is to issue the following command at the RMAN prompt after deleting the logs using an operating system utility:
change archivelog all crosscheck;
If the compatible parameter in the catalog is set to 8.1.5 or lower, RMAN marks all archived logs it cannot located as having the status DELETED. If compatible is set to 8.1.6 or higher, then RMAN removes the records from the repository.
In this scenario, you are running an 8.1.5 version of RMAN and trying to connect to a version 8.0.4 target database. You receive the following error messages when you try to connect to the target database:
% rman catalog rman/rman@rcat Recovery Manager: Release 8.1.5.0.0 - Production RMAN-06008: connected to recovery catalog database RMAN> connect target sys/oscar123@nc0d RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ==== RMAN-00571: =========================================================== RMAN-04005: error from target database: ORA-06550: line 1, column 7: PLS-00201: identifier 'DBMS_BACKUP_RESTORE.SET_CHARSET' must be declared ORA-06550: line 1, column 7: PL/SQL: Statement ignored RMAN-04015: error setting target database character set to WE8ISO8859P1
Typically, this error message means that the DBMS_BACKUP_RESTORE package was not created during the installation of the database. Here are possible causes:
If you did not install the PL/SQL option, then install it. If you did install the PL/SQL option, then create the required packages by connecting to SQL*Plus with SYSDBA privileges and running the following scripts:
SQL> @$ORACLE_HOME/rdbms/admin/dbmsbkrs.sql SQL> @$ORACLE_HOME/rdbms/admin/prvtbkrs.plb
Recovery Manager fails with the following errors when attempting to connect to the target database:
% rman Recovery Manager: Release 8.1.5.0.0 - Production RMAN> connect target sys/change_on_install@inst1 RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-04005: error from target database: ORA-01017: invalid username/password; logon denied
Recovery Manager automatically requests a connection to the target database as SYSDBA. In order to connect to the target database as SYSDBA, you must either:
orapwd
command and the initialization parameter REMOTE_LOGIN_PASSWORDFILE.
If the target database does not have a password file, the user you are logged in as must be validated using operating system authentication.
Either create a password file for the target database or add yourself to the administrator list in your operating system. To learn how to create a password file, see Oracle8i Administrator's Guide.
In this scenario, you attempt to duplicate a database to the same host (although it could also be a remote host) using the duplicate command, but get the following error stack during compilation of the recover command:
RMAN-03022: compiling command: recover(3) RMAN-03023: executing command: recover(3) RMAN-08054: starting media recovery RMAN-08060: unable to find archivelog RMAN-08510: archivelog thread=1 sequence=6 RMAN-03022: compiling command: recover(4) RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-00601: fatal error in recovery manager RMAN-03012: fatal error during compilation of command RMAN-03028: fatal error code: 3015 RMAN-03013: command type: Duplicate Db RMAN-03015: error occurred in stored script Memory Script RMAN-03002: failure during compilation of command RMAN-03013: command type: recover RMAN-03002: failure during compilation of command RMAN-03013: command type: recover(4) RMAN-06038: recovery catalog package detected an error RMAN-20242: specification does not match any archivelog in the recovery catalog
The problem is probably that the backup of the datafiles is not consistent, that is, the following SQL statement was not issued after the datafile backup:
ALTER SYSTEM ARCHIVE LOG CURRENT
Consequently, the duplicate command is attempting to read the online redo logs for the necessary redo records.
When creating the duplication script, use the set until command to specify a log sequence number for incomplete recovery. For example, to stop recovery at log sequence 5 enter:
run { set until logseq 5 thread 1; allocate auxiliary channel dupdb1 type disk; duplicate target database to dupdb;
}
See Also:
"Creating a Non-Current Duplicate Database" for more information about performing incomplete recovery during the duplication operation. |
In this scenario, you list the database incarnations registered in the recovery catalog and see a database with the name UNKNOWN:
list incarnation of database; RMAN-03022: compiling command: list List of Database Incarnations DB Key Inc Key DB Name DB ID CUR Reset SCN Reset Time ------- ------- ------- ------ --- ---------- ---------- 56 57 SKDHRA 4052472287 YES 1 Sep 03 1998 06:45:51 1 19 UNKNOWN 4141147584 NO 1 Jan 08 1999 14:47:28 1 2 SKDHRC 4141147584 YES 14602 Jan 15 1999 15:32:57
One way you get the DB_NAME of UNKNOWN is when you register a database that was once opened with the RESETLOGS option. The DB_NAME can be changed during a RESETLOGS, so RMAN does not know what the DB_NAME was for those old incarnations of the database because it was not registered in the recovery catalog at the time. Consequently, RMAN sets the DB_NAME column to UNKNOWN when creating the DBINC record.
The UNKNOWN name entry is expected behavior: you should not attempt to remove UNKNOWN entries from the recovery catalog. Also, the backups of this incarnation are not usable--at least with this recovery catalog--so RMAN cannot restore them even if you issue a reset database to incarnation command.
|
![]() Copyright © 1996-2000, Oracle Corporation. All Rights Reserved. |
|