5
Managing Distributed Transactions

This chapter describes how to manage and troubleshoot distributed transactions. Topics include:

Setting Distributed Transaction Initialization Parameters

You can set initialization parameters that control the behavior of distributed transaction processing. The following tables describes initialization parameters relevant for distributed transaction processing:

Parameter Description

DISTRIBUTED_TRANSACTIONS

Specifies the maximum number of distributed transactions in which this database can concurrently participate.

DISTRIBUTED_LOCK_TIMEOUT

Specifies the number of seconds that a distributed transaction waits for locked resources.

DISTRIBUTED_RECOVERY_CONNECTION_HOLD_TIME

Specifies the number of seconds that Oracle holds open a remote connection after a distributed transaction fails.

COMMIT_POINT_STRENGTH

Specifies the value used to determine the commit point site in a distributed transaction.

Parameter	Description
DISTRIBUTED_TRANSACTIONS	Specifies the maximum number of distributed transactions in which this database can concurrently participate.
DISTRIBUTED_LOCK_TIMEOUT	Specifies the number of seconds that a distributed transaction waits for locked resources.
DISTRIBUTED_RECOVERY_CONNECTION_HOLD_TIME	Specifies the number of seconds that Oracle holds open a remote connection after a distributed transaction fails.
COMMIT_POINT_STRENGTH	Specifies the value used to determine the commit point site in a distributed transaction.

This section includes the following topics:

Limiting the Number of Distributed Transactions

The initialization parameter DISTRIBUTED_TRANSACTIONS limits the number of distributed transactions in which a given instance can concurrently participate, both as a client and a server. The default value for this parameter is operating system-dependent.

If Oracle reaches this limit and a subsequent user issues a SQL statement referencing a remote database, then the system rolls back the statement and returns the following error message:

ORA-2042: too many global transactions

For example, assume that you set the parameter as follows for a given instance:

DISTRIBUTED_TRANSACTIONS = 10

In this case, a maximum of 10 sessions can concurrently process a distributed transaction. If an additional session attempts to issue a DML statement requiring distributed access, then Oracle returns an error message to the session and rolls back the statement.

Note:
Oracle recommends setting the value for DISTRIBUTED_TRANSACTIONS equal to the total number of distributed database sites in your environment.

Increasing the Transaction Limit

Consider increasing the value of the DISTRIBUTED_TRANSACTIONS when an instance regularly participates in numerous distributed transactions and the ORA-2042 is frequently returned. Increasing the limit allows more users to concurrently issue distributed transactions.

Decreasing the Transaction Limit

If your site is experiencing an abnormally high number of network failures, you can temporarily decrease the value of DISTRIBUTED_TRANSACTIONS. This operation limits the number of in-doubt transactions in which your site takes part, and thereby limits the amount of locked data at your site, and the number of in-doubt transactions you might have to resolve

Disabling Distributed Transaction Processing

If DISTRIBUTED_TRANSACTIONS is set to zero, no distributed SQL statements can be issued in any session. Also, the RECO background process is not started at startup of the local instance. In-doubt distributed transactions that may be present cannot be automatically resolved by Oracle8i. Therefore, only set this initialization parameter to zero to prevent distributed transactions when a new instance is started and when it is certain that no in-doubt distributed transactions remained after the last instance shut down.

See Also:
Oracle8i Reference for more information about the DISTRIBUTED_TRANSACTIONS initialization parameter.

Specifying the Lock Timeout Interval

When you issue a SQL statement, Oracle8i attempts to lock the resources needed to successfully execute the statement. If the requested data is currently held by statements of other uncommitted transactions, however, and remains locked for a long time, a timeout occurs.

Consider the following scenarios involving data access failure:

Transaction Timeouts

A DML statement that requires locks on a remote database can be blocked if another transaction own locks on the requested data. If these locks continue to block the requesting SQL statement, then the following sequence of events occurs:

A timeout occurs.
Oracle rolls back the statement.

Oracle returns this error message to the user:

ORA-02049: time-out: distributed transaction waiting for lock

Because the transaction did not modify data, no actions are necessary as a result of the timeout. Applications should proceed as if a deadlock has been encountered. The user who executed the statement can try to re-execute the statement later. If the lock persists, then the user should contact an administrator to report the problem.

Setting the Timeout Interval

Use the initialization parameter DISTRIBUTED_LOCK_TIMEOUT to control the timeout interval, which is set in seconds (see "Specifying the Lock Timeout Interval"). For example, to set the timeout interval for an instance to 30 seconds, include the following line in the associated parameter file:

DISTRIBUTED_LOCK_TIMEOUT = 30

The default value for this parameter is 60 seconds. Normally, Oracle waits indefinitely for a lock to be released. With the above timeout interval, the timeout errors discussed in the previous section occur if a transaction cannot proceed after 30 seconds of waiting for unavailable resources.

See Also:
Oracle8i Reference for more information about the DISTRIBUTED_LOCK_TIMEOUT initialization parameter.

Locks From In-Doubt Transactions

A query or DML statement that requires locks on a local database can be blocked indefinitely due to the locked resources of an in-doubt distributed transaction. In this case, Oracle issues the following error message:

ORA-01591: lock held by in-doubt distributed transaction identifier

In this case, Oracle rolls back the SQL statement immediately. The user who executed the statement can try to re-execute the statement later. If the lock persists, the user should contact an administrator to report the problem, including the ID of the in-doubt distributed transaction.

The chances of the above situations occurring are rare considering the low probability of failures during the critical portions of the two-phase commit. Even if such a failure occurs, and assuming quick recovery from a network or system failure, problems are automatically resolved without manual intervention. Thus, problems usually resolve before they can be detected by users or database administrators.

Specifying the Interval for Holding Open Connections

If a distributed transaction fails, then the connection from the local site to the remote site may not close immediately. Instead, it remains open in case communication can be restored quickly, without having to re-establish the connection. Use the following initialization parameter to specify the length of time to hold open a remote connection after a distributed transaction fails:

DISTRIBUTED_RECOVERY_CONNECTION_HOLD_TIME

The default value for this parameter is 200 seconds. If you set larger values, then you minimize reconnection time but also consume local resources for a longer time period. The range of values is between 0 and 1800. You can set this parameter to a value greater than 1800, however, which simply means that the connection never closes.

Specifying the Commit Point Strength of a Node

The database with the highest commit point strength determines which node commits first in a distributed transaction. When specifying a commit point strength for each node, ensure that the most critical server will be non-blocking if a failure occurs during a prepare or commit phase. The following initialization parameter determines a node's commit point strength:

COMMIT_POINT_STRENGTH

The default value is operating system-dependent. The range of values is any integer from 0 to 255. For example, to set the commit point strength of a database to 200, include the following line in that database's initialization parameter file:

COMMIT_POINT_STRENGTH = 200

The commit point strength only determines the commit point site in a distributed transaction.

See Also:
"Commit Point Site" for a conceptual overview of commit points.

Suggestions for Setting the Commit Point Strength

When setting the commit point strength for a database, note the following considerations:

Because the commit point site stores information about the status of the transaction, the commit point site should not be a node that is frequently unreliable or unavailable in case other nodes need information about the transaction's status.
Set the commit point strength for a database relative to the amount of critical shared data in the database. For example, a database on a mainframe computer usually shares more data among users than a database on a PC. Therefore, set the commit point strength of the mainframe to a higher value than the PC.

Viewing Information About Distributed Transactions

The data dictionary of each database stores information about all open distributed transactions. You can use data dictionary tables and views to gain information about the transactions. This section contains the following topics:

Determining the ID Number and Status of Prepared Transactions

The following view show the database links that have been defined at the local database and stored in the data dictionary:

View Purpose

DBA_2PC_PENDING

Lists all in-doubt distributed transactions. The view is empty until populated by an in-doubt transaction. After the transaction is resolved, the view is purged.

View	Purpose
DBA_2PC_PENDING	Lists all in-doubt distributed transactions. The view is empty until populated by an in-doubt transaction. After the transaction is resolved, the view is purged.

Use this view to determine the global commit number for a particular transaction ID. You can use this global commit number when manually resolving an in-doubt transaction.

The following table shows the most relevant columns (for a description of all the columns in the view, see Oracle8i Reference):

Table 5-1 DBA_2PC_PENDING

Column	Description
LOCAL_TRAN_ID	Local transaction identifier in the format integer.integer.integer. Note: When the LOCAL_TRAN_ID and the GLOBAL_TRAN_ID for a connection are the same, the node is the global coordinator of the transaction.
GLOBAL_TRAN_ID	Global database identifier in the format global_db_name.db_hex_id.local_tran_id, where db_hex_id is an eight-character hexadecimal value used to uniquely identify the database. This common transaction ID is the same on every node for a distributed transaction. Note: When the LOCAL_TRAN_ID and the GLOBAL_TRAN_ID for a connection are the same, the node is the global coordinator of the transaction.
STATE	See Table 5-2, "STATE Column of DBA_2PC_PENDING".
MIXED	YES means that part of the transaction was committed on one node and rolled back on another node.
HOST	Name of the host machine.
COMMIT#	Global commit number for committed transactions.

Table 5-2 STATE Column of DBA_2PC_PENDING

collecting	This category normally applies only to the global coordinator or local coordinators. The node is currently collecting information from other database servers before it can decide whether it can prepare.
prepared	The node has prepared and may or may not have acknowledged this to its local coordinator with a prepared message. However, no commit request has been received. The node remains prepared, holding any local resource locks necessary for the transaction to commit.
committed	The node (any type) has committed the transaction, but other nodes involved in the transaction may not have done the same. That is, the transaction is still pending at one or more nodes.
forced commit	A pending transaction can be forced to commit at the discretion of a database administrator. This entry occurs if a transaction is manually committed at a local node by a database administrator.
forced abort (rollback)	A pending transaction can be forced to roll back at the discretion of a database administrator. This entry occurs if this transaction is manually rolled back at a local node by a database administrator.

Execute the following script to query pertinent information in DBA_2PC_PENDING (sample output included):

COL local_tran_id FORMAT a13
COL global_tran_id FORMAT a30
COL state FORMAT a8
COL mixed FORMAT a3
COL host FORMAT a10
COL commit# FORMAT a10

SELECT local_tran_id, global_tran_id, state, mixed, host, commit#
FROM dba_2pc_pending
/

SQL> @pending_txn_script

LOCAL_TRAN_ID GLOBAL_TRAN_ID                 STATE    MIX HOST       COMMIT#
------------- ------------------------------ -------- --- ---------- ----------
1.15.870      HQ.ACME.COM.ef192da4.1.15.870  commit   no  dlsun183   115499

This output indicates that local transaction 1.15.870 has been committed on this node, but it may be pending on one or more other nodes. Because LOCAL_TRAN_ID and the local part of GLOBAL_TRAN_ID are the same, the node is the global coordinator of the transaction.

Tracing the Session Tree of In-Doubt Transactions

The following view shows which in-doubt transactions are incoming from a remote client and which are outgoing to a remote server:

View Purpose

DBA_2PC_NEIGHBORS

Lists all incoming (from remote client) and outgoing (to remote server) in-doubt distributed transactions. It also indicates whether the local node is the commit point site in the transaction.
The view is empty until populated by an in-doubt transaction. After the transaction is resolved, the view is purged.

View	Purpose
DBA_2PC_NEIGHBORS	Lists all incoming (from remote client) and outgoing (to remote server) in-doubt distributed transactions. It also indicates whether the local node is the commit point site in the transaction. The view is empty until populated by an in-doubt transaction. After the transaction is resolved, the view is purged.

When a transaction is in-doubt, you may need to determine which nodes performed which roles in the session tree. Use to this view to determine:

All the incoming and outgoing connections for a given transaction.
Whether the node is the commit point site in a given transaction.
Whether the node is a global coordinator in a given transaction (because its local transaction ID and global transaction ID are the same).

The following table shows the most relevant columns (for an account of all the columns in the view, see Oracle8i Reference):

Table 5-3 DBA_2PC_NEIGHBORS

Column	Description
LOCAL_TRAN_ID	Local transaction identifier with the format integer.integer.integer. Note: When LOCAL_TRAN_ID and GLOBAL_TRAN_ID.DBA_2PC_PENDING for a connection are the same, the node is the global coordinator of the transaction.
IN_OUT	IN for incoming transactions; OUT for outgoing transactions.
DATABASE	For incoming transactions, the name of the client database that requested information from this local node; for outgoing transactions, the name of the database link used to access information on a remote server.
DBUSER_OWNER	For incoming transactions, the local account used to connect by the remote database link; for outgoing transactions, the owner of the database link.
INTERFACE	C is a commit message; N is either a message indicating a prepared state or a request for a read-only commit. When IN_OUT is OUT, C means that the child at the remote end of the connection is the commit point site and knows whether to commit or abort. N means that the local node is informing the remote node that it is prepared. When IN_OUT is IN, C means that the local node or a database at the remote end of an outgoing connection is the commit point site. N means that the remote node is informing the local node that it is prepared.

Execute the following script to query pertinent information in DBA_2PC_PENDING (sample output included):

COL local_tran_id FORMAT a13
COL in_out FORMAT a6
COL database FORMAT a25
COL dbuser_owner FORMAT a15
COL interface FORMAT a3

SELECT local_tran_id, in_out, database, dbuser_owner, interface
FROM dba_2pc_neighbors
/

SQL> CONNECT sys/sys_pwd@hq.acme.com
SQL> @neighbors_script

LOCAL_TRAN_ID IN_OUT DATABASE                  DBUSER_OWNER    INT
------------- ------ ------------------------- --------------- ---
1.15.870      out    SALES.ACME.COM            SYS             C

This output indicates that the local node sent an outgoing request to remote server SALES to commit transaction 1.15.870. If SALES committed the transaction but no other node did, then you know that SALES is the commit point site--because the commit point site always commits first.

Deciding How to Handle In-Doubt Transactions

A transaction is in-doubt when there is a failure during any aspect of the two-phase commit. Distributed transactions become in-doubt in the following ways:

A server machine running Oracle software crashes.
A network connection between two or more Oracle databases involved in distributed processing is disconnected.
An unhandled software error occurs.

See Also:
"In-Doubt Transactions" for a conceptual overview of in-doubt transactions.

You can manually force the commit or rollback of a local, in-doubt distributed transaction. Because this operation can generate consistency problems, perform it only when specific conditions exist.

This section contains the following topics:

Discovering Problems with a Two-Phase Commit

The user application that commits a distributed transaction is informed of a problem by one of the following error messages:

ORA-02050: transaction ID rolled back,
           some remote dbs may be in-doubt
ORA-02051: transaction ID committed,
           some remote dbs may be in-doubt
ORA-02054: transaction ID in-doubt

A robust application should save information about a transaction if it receives any of the above errors. This information can be used later if manual distributed transaction recovery is desired.

No action is required by the administrator of any node that has one or more in-doubt distributed transactions due to a network or system failure. The automatic recovery features of Oracle8i transparently complete any in-doubt transaction so that the same outcome occurs on all nodes of a session tree (that is, all commit or all roll back) after the network or system failure is resolved.

In extended outages, however, you can force the commit or rollback of a transaction to release any locked data. Applications must account for such possibilities.

Determining Whether to Perform a Manual Override

Override a specific in-doubt transaction manually only when one of the following situations exists:

The in-doubt transaction locks data that is required by other transactions. This situation occurs when the ORA-01591 error message interferes with user transactions.
An in-doubt transaction prevents the extents of a rollback segment from being used by other transactions. The first portion of an in-doubt distributed transaction's local transaction ID corresponds to the ID of the rollback segment, as listed by the data dictionary views DBA_2PC_PENDING and DBA_ROLLBACK_SEGS.
The failure preventing the two-phase commit phases to complete cannot be corrected in an acceptable time period. Examples of such cases include a telecommunication network that has been damaged or a damaged database that requires a long recovery time.

Normally, you should make a decision to locally force an in-doubt distributed transaction in consultation with administrators at other locations. A wrong decision can lead to database inconsistencies that can be difficult to trace and that you must manually correct.

If the conditions above do not apply, always allow the automatic recovery features of Oracle8i to complete the transaction. If any of the above criteria are met, however, consider a local override of the in-doubt transaction.

Analyzing the Transaction Data

If you decide to force the transaction to complete, analyze available information with the following goals in mind.

Find a Node That Committed or Rolled Back

Use the DBA_2PC_PENDING view to find a node that has either committed or rolled back the transaction. If you can find a node that has already resolved the transaction, then you can follow the action taken at that node.

Look For Transaction Comments

See if any information is given in the TRAN_COMMENT column of DBA_2PC_PENDING for the distributed transaction. Comments are included in the COMMENT parameter of the COMMIT command.

For example, an in-doubt distributed transaction's comment can indicate the origin of the transaction and what type of transaction it is:

COMMIT COMMENT 'Finance/Accts_pay/Trans_type 10B';

Look For Transaction Advice

See if any information is given in the ADVICE column of DBA_2PC_PENDING for the distributed transaction. An application can prescribe advice about whether to force the commit or force the rollback of separate parts of a distributed transaction with the ADVISE parameter of the SQL command ALTER SESSION.

The advice sent during the prepare phase to each node is the advice in effect at the time the most recent DML statement executed at that database in the current transaction.

For example, consider a distributed transaction that moves an employee record from the EMP table at one node to the EMP table at another node. The transaction can protect the record--even when administrators independently force the in-doubt transaction at each node--by including the following sequence of SQL statements:

ALTER SESSION ADVISE COMMIT;
INSERT INTO emp@hq ... ;    /*advice to commit at HQ */
ALTER SESSION ADVISE ROLLBACK;
DELETE FROM emp@sales ... ; /*advice to roll back at SALES*/

ALTER SESSION ADVISE NOTHING;

If you manually force the in-doubt transaction, the worst that can happen is that each node has a copy of the employee record; the record cannot disappear.

Manually Overriding In-Doubt Transactions

Use the COMMIT or ROLLBACK statement with the FORCE option and a text string that indicates either the local or global transaction ID of the in-doubt transaction to commit.

Note:
In all examples, the transaction is committed or rolled back on the local node, and the local pending transaction table records a value of forced commit or forced abort for the STATE column of this transaction's row.

This section contains the following topics:

Manually Committing an In-Doubt Transaction

Before attempting to commit the transaction, ensure that you have the proper privileges. Note the following requirements:

If the transaction was committed by... Then you must have this privilege...

You

FORCE TRANSACTION

Another user

FORCE ANY TRANSACTION

If the transaction was committed by...	Then you must have this privilege...
You	FORCE TRANSACTION
Another user	FORCE ANY TRANSACTION

Committing Using Only the Transaction ID

The following SQL statement is the command to commit an in-doubt transaction:

COMMIT FORCE 'transaction_id';

The variable transaction_id is the identifier of the transaction as specified in either the LOCAL_TRAN_ID or GLOBAL_TRAN_ID columns of the DBA_2PC_PENDING data dictionary view.

For example, assume that you query DBA_2PC_PENDING and determine the local transaction ID for a distributed transaction:

LOCAL_TRAN_ID          1.45.13

You then issue the following SQL statement to force the commit of this in-doubt transaction:

COMMIT FORCE '1.45.13';

Committing Using an SCN

Optionally, you can specify the SCN for the transaction when forcing a transaction to commit. This feature allows you to commit an in-doubt transaction with the SCN assigned when it was committed at other nodes.

Consequently, you maintain the synchronized commit time of the distributed transaction even if there is a failure. Specify an SCN only when you can determine the SCN of the same transaction already committed at another node.

For example, assume you want to manually commit a transaction with the following global transaction ID:

SALES.ACME.COM.55d1c563.1.93.29

First, query the DBA_2PC_PENDING view of a remote database also involved with the transaction in question. Note the SCN used for the commit of the transaction at that node. Specify the SCN when committing the transaction at the local node. For example, if the SCN is 829381993, issue:

COMMIT FORCE 'SALES.ACME.COM.55d1c563.1.93.29', 829381993;

Manually Rolling Back an In-Doubt Transaction

Before attempting to roll back the in-doubt distributed transaction, ensure that you have the proper privileges. Note the following requirements:

If the transaction was committed by... Then you must have this privilege...

You

FORCE TRANSACTION

Another user

FORCE ANY TRANSACTION

If the transaction was committed by...	Then you must have this privilege...
You	FORCE TRANSACTION
Another user	FORCE ANY TRANSACTION

The following SQL statement is the command to roll back an in-doubt transaction:

ROLLBACK FORCE 'transaction_id';

The variable transaction_id is the identifier of the transaction as specified in either the LOCAL_TRAN_ID or GLOBAL_TRAN_ID columns of the DBA_2PC_PENDING data dictionary view.

For example, to roll back the in-doubt transaction with the local transaction ID of 2.9.4, use the following statement:

ROLLBACK FORCE '2.9.4';

Note:
You cannot roll back an in-doubt transaction to a savepoint.

Purging Pending Rows from the Data Dictionary

Before RECO recovers an in-doubt transaction, the transaction appears in DBA_2PC_PENDING.STATE as COLLECTING, COMMITTED, or PREPARED. If you force an in-doubt transaction using COMMIT FORCE or ROLLBACK FORCE, then the states FORCED COMMIT or FORCED ROLLBACK may appear.

Automatic recovery normally deletes entries in these states. The only exception is when recovery discovers a forced transaction that is in a state inconsistent with other sites in the transaction; in this case, the entry can be left in the table and the MIXED column in DBA_2PC_PENDING has a value of YES.

If automatic recovery is not possible because a remote database has been permanently lost, then recovery cannot identify the re-created database because it receives a new database ID when it is re-created. In this case, you must use the PURGE_LOST_DB_ENTRY procedure in the DBMS_TRANSACTION package to clean up the entries. The entries do not hold up database resources, so there is no urgency in cleaning them up.

Executing the PURGE_LOST_DB_ENTRY Procedure

To manually remove an entry from the data dictionary, use the following syntax (where trans_id is the identifier for the transaction):

DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('trans_id');

For example, to purge pending distributed transaction 1.44.99, enter the following command in SQL*Plus:

EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('1.44.99');

Execute this procedure only if significant reconfiguration has occurred so that automatic recovery cannot resolve the transaction. Examples include:

Total loss of the remote database
Reconfiguration in software resulting in loss of two-phase commit capability
Loss of information from an external transaction coordinator such as a TPMonitor

See Also:
Oracle8i Supplied PL/SQL Packages Reference for more information about the DBMS_TRANSACTION package.

Determining When to Use DBMS_TRANSACTION

The following tables indicates what the various states indicate about the distributed transaction what the administrator's action should be:

STATE Column State of Global Transaction State of Local Transaction Normal Action Alternative Action

Collecting

Rolled back

Rolled back

None

PURGE_LOST_DB_ENTRY (only if autorecovery cannot resolve transaction)

Committed

Committed

Committed

None

PURGE_LOST_DB_ENTRY (only if autorecovery cannot resolve transaction)

Prepared

Unknown

Prepared

None

Force commit or rollback

Forced commit

Unknown

Committed

None

PURGE_LOST_DB_ENTRY (only if autorecovery cannot resolve transaction)

Forced rollback

Unknown

Rolled back

None

PURGE_LOST_DB_ENTRY (only if autorecovery cannot resolve transaction)

Forced commit

Mixed

Committed

Manually remove inconsistencies then use PURGE_MIXED

Forced rollback

Mixed

Rolled back

Manually remove inconsistencies then use PURGE_MIXED

STATE Column	State of Global Transaction	State of Local Transaction	Normal Action	Alternative Action
Collecting	Rolled back	Rolled back	None	PURGE_LOST_DB_ENTRY (only if autorecovery cannot resolve transaction)
Committed	Committed	Committed	None	PURGE_LOST_DB_ENTRY (only if autorecovery cannot resolve transaction)
Prepared	Unknown	Prepared	None	Force commit or rollback
Forced commit	Unknown	Committed	None	PURGE_LOST_DB_ENTRY (only if autorecovery cannot resolve transaction)
Forced rollback	Unknown	Rolled back	None	PURGE_LOST_DB_ENTRY (only if autorecovery cannot resolve transaction)
Forced commit	Mixed	Committed	Manually remove inconsistencies then use PURGE_MIXED
Forced rollback	Mixed	Rolled back	Manually remove inconsistencies then use PURGE_MIXED

See Also:
Oracle8i Supplied PL/SQL Packages Reference for more information about the DBMS_TRANSACTION package.

Manually Committing an In-Doubt Transaction: Example

The following example, illustrated in Figure 5-1, shows a failure during the commit of a distributed transaction. It explains how to go about gaining information before manually forcing the commit or rollback of the local portion of an in-doubt distributed transaction.

Figure 5-1 Example of an In-Doubt Distributed Transaction

In this failure case, the prepare phase completes. During the commit phase, however, the commit point site's commit confirmation never reaches the global coordinator, even though the commit point site committed the transaction.

You are the WAREHOUSE database administrator. The inventory data locked because of the in-doubt transaction is critical to other transactions. The data cannot be accessed, however, because the locks must be held until the in-doubt transaction either commits or rolls back. Furthermore, you understand that the communication link between sales and headquarters cannot be resolved immediately.

Therefore, you decide to manually force the local portion of the in-doubt transaction using the following steps:

Record user feedback.
Query the local DBA_2PC_PENDING view to obtain the global transaction ID and get other information about the in-doubt transaction.
Query the local DBA_2PC_NEIGHBORS view to begin tracing the session tree so that you can find a node that resolved the in-doubt transaction.
Check the mixed outcome flag after normal communication is re-established.

The following sections explain each step in detail for this example:

Step 1: Record User Feedback

The users of the local database system that conflict with the locks of the in-doubt transaction receive the following error message:

ORA-01591: lock held by in-doubt distributed transaction 1.21.17

In this case, 1.21.17 is the local transaction ID of the in-doubt distributed transaction. You should request and record this ID number from users that report problems to identify which in-doubt transactions should be forced.

Step 2: Query DBA_2PC_PENDING

After connecting with SQL*Plus to WAREHOUSE, query the local DBA_2PC_PENDING data dictionary view to gain information about the in-doubt transaction:

CONNECT sys/sys_pwd@warehouse.acme.com
SELECT * FROM sys.dba_2pc_pending WHERE local_tran_id = '1.21.17';

Oracle returns the following information:

Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.21.17
GLOBAL_TRAN_ID         SALES.ACME.COM.55d1c563.1.93.29
STATE                  prepared
MIXED                  no
ADVICE
TRAN_COMMENT           Sales/New Order/Trans_type 10B
FAIL_TIME              31-MAY-91
FORCE_TIME
RETRY_TIME             31-MAY-91
OS_USER                SWILLIAMS
OS_TERMINAL            TWA139:
HOST                   system1
DB_USER                SWILLIAMS
COMMIT#

Determining the Global Transaction ID

The global transaction ID is the common transaction ID that is the same on every node for a distributed transaction. It is of the form:

global_database_name.hhhhhhhh.local_transaction_id

where:

global_database_name

is the database name of the global coordinator.

hhhhhhh

is the internal database identifier of the global coordinator (in hexadecimal).

local_transaction_id

is the corresponding local transaction ID assigned on the global coordinator.

Note that the last portion of the global transaction ID and the local transaction ID match at the global coordinator. In the example, you can tell that WAREHOUSE is not the global coordinator because these numbers do not match:

LOCAL_TRAN_ID          1.21.17
GLOBAL_TRAN_ID         ... 1.93.29

Determining the State of the Transaction

The transaction on this node is in a prepared state:

STATE prepared

Therefore, WAREHOUSE waits for its coordinator to send either a commit or a rollback request.

Looking For Comments or Advice

The transaction's comment or advice can include information about this transaction. If so, use this comment to your advantage. In this example, the origin and transaction type is in the transaction's comment:

TRAN_COMMENT           Sales/New Order/Trans_type 10B

This information can reveal something that helps you decide whether to commit or rollback the local portion of the transaction. If useful comments do not accompany an in-doubt transaction, you must complete some extra administrative work to trace the session tree and find a node that has resolved the transaction.

Step 3: Query DBA_2PC_NEIGHBORS on Local Node

The purpose of this step is to climb the session tree so that you find coordinators, eventually reaching the global coordinator. Along the way, you may find a coordinator that has resolved the transaction. If not, you can eventually work your way to the commit point site, which will always have resolved the in-doubt transaction. To trace the session tree, query the DBA_2PC_NEIGHBORS view on each node.

In this case, you query this view on the WAREHOUSE database:

CONNECT sys/sys_pwd@warehouse.acme.com
SELECT * FROM dba_2pc_neighbors
  WHERE local_tran_id = '1.21.17'
  ORDER BY sess#, in_out;

Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.21.17
IN_OUT                 in
DATABASE               SALES.ACME.COM
DBUSER_OWNER           SWILLIAMS
INTERFACE              N
DBID                   000003F4
SESS#                  1
BRANCH                 0100

Obtaining Database Role and Database Link Information

The DBA_2PC_NEIGHBORS view provides information about connections associated with an in-doubt transaction. Information for each connection is different, based on whether the connection is inbound (IN_OUT = in) or outbound (IN_OUT = out):

IN_OUT Meaning DATABASE DBUSER_OWNER

in

Your node is a server of another node.

Lists the name of the client database that connected to your node.

Lists the local account for the database link connection that corresponds to the in-doubt transaction.

out

Your node is a client of other servers.

Lists the name of the database link that connects to the remote node.

Lists the owner of the database link for the in-doubt transaction.

IN_OUT	Meaning	DATABASE	DBUSER_OWNER
in	Your node is a server of another node.	Lists the name of the client database that connected to your node.	Lists the local account for the database link connection that corresponds to the in-doubt transaction.
out	Your node is a client of other servers.	Lists the name of the database link that connects to the remote node.	Lists the owner of the database link for the in-doubt transaction.

In this example, the IN_OUT column reveals that the WAREHOUSE database is a server for the SALES client, as specified in the DATABASE column:

IN_OUT                 in
DATABASE               SALES.ACME.COM

The connection to WAREHOUSE was established through a database link from the SWILLIAMS account, as shown by the DBUSER_OWNER column:

DBUSER_OWNER           SWILLIAMS

Determining the Commit Point Site

Additionally, the INTERFACE column tells whether the local node or a subordinate node is the commit point site:

INTERFACE              N

Neither WAREHOUSE nor any of its descendants is the commit point site, as shown by the INTERFACE column.

Step 4: Querying Data Dictionary Views on All Nodes

At this point, you can contact the administrator at the located nodes and ask each person to repeat Steps 2 and 3 using the global transaction ID.

Note:
If you can directly connect to these nodes with another network, you can repeat Steps 2 and 3 yourself.

For example, the following results are returned when Steps 2 and 3 are performed at SALES and HQ.

Checking the Status of Pending Transactions at SALES

At this stage, the SALES administrator queries the DBA_2PC_PENDING data dictionary view:

SQL> CONNECT sys/sys_pwd@sales.acme.com
SQL> SELECT * FROM sys.dba_2pc_pending
   > WHERE global_tran_id = 'SALES.ACME.COM.55d1c563.1.93.29';

Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.93.29
GLOBAL_TRAN_ID         SALES.ACME.COM.55d1c563.1.93.29
STATE                  prepared
MIXED                  no
ADVICE
TRAN_COMMENT           Sales/New Order/Trans_type 10B
FAIL_TIME              31-MAY-91
FORCE_TIME
RETRY_TIME             31-MAY-91
OS_USER                SWILLIAMS
OS_TERMINAL            TWA139:
HOST                   system1
DB_USER                SWILLIAMS
COMMIT#

Determining the Coordinators and Commit Point Site at SALES

Next, the SALES administrator queries DBA_2PC_NEIGHBORS to determine the global and local coordinators as well as the commit point site:

SELECT * FROM dba_2pc_neighbors
   WHERE global_tran_id = 'SALES.ACME.COM.55d1c563.1.93.29'
   ORDER BY sess#, in_out;

This query returns three rows:

The connection to WAREHOUSE
The connection to HQ
The connection established by the user

Reformatted information corresponding to the rows for the WAREHOUSE connection appears below:

Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.93.29
IN_OUT                 OUT
DATABASE               WAREHOUSE.ACME.COM
DBUSER_OWNER           SWILLIAMS
INTERFACE              N
DBID                   55d1c563
SESS#                  1
BRANCH                 1

Reformatted information corresponding to the rows for the HQ connection appears below:

Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.93.29
IN_OUT                 OUT
DATABASE               HQ.ACME.COM
DBUSER_OWNER           ALLEN
INTERFACE              C
DBID                   00000390
SESS#                  1
BRANCH                 1

The information from the previous query reveals the following:

SALES is the global coordinator because the local transaction ID and global transaction ID match.
Two outbound connections are established from this node, but no inbound connections. SALES is not the server of another node.
HQ or one of its servers is the commit point site.

Checking the Status of Pending Transactions at HQ:

At this stage, the HQ administrator queries the DBA_2PC_PENDING data dictionary view:

SELECT * FROM dba_2pc_pending
   WHERE global_tran_id = 'SALES.ACME.COM.55d1c563.1.93.29';

Column Name            Value
---------------------- --------------------------------------
LOCAL_TRAN_ID          1.45.13
GLOBAL_TRAN_ID         SALES.ACME.COM.55d1c563.1.93.29
STATE                  COMMIT
MIXED                  NO
ACTION
TRAN_COMMENT           Sales/New Order/Trans_type 10B
FAIL_TIME              31-MAY-91
FORCE_TIME
RETRY_TIME             31-MAY-91
OS_USER                SWILLIAMS
OS_TERMINAL            TWA139:
HOST                   SYSTEM1
DB_USER                SWILLIAMS
COMMIT#                129314

At this point, you have found a node that resolved the transaction. As the view reveals, it has been committed and assigned a commit ID number:

STATE                  COMMIT
COMMIT#                129314

Therefore, you can force the in-doubt transaction to commit at your local database. It is a good idea to contact any other administrators you know that could also benefit from your investigation.

Step 5: Commit the In-Doubt Transaction

You contact the administrator of the SALES database, who manually commits the in-doubt transaction using the global ID:

SQL> CONNECT sys/sys_pwd@sales.acme.com
SQL> COMMIT FORCE 'SALES.ACME.COM.55d1c563.1.93.29';

As administrator of the WAREHOUSE database, you manually commit the in-doubt transaction using the global ID:

SQL> CONNECT sys/sys_pwd@warehouse.acme.com
SQL> COMMIT FORCE 'SALES.ACME.COM.55d1c563.1.93.29';

Step 6: Check for Mixed Outcome Using DBA_2PC_PENDING

After you manually force a transaction to commit or roll back, the corresponding row in the pending transaction table remains. The state of the transaction is changed depending on how you forced the transaction.

Every Oracle8i database has a pending transaction table. This is a special table that stores information about distributed transactions as they proceed through the two-phase commit phases. You can query a database's pending transaction table through the DBA_2PC_PENDING data dictionary view (see Table 5-1, "DBA_2PC_PENDING").

Also of particular interest in the pending transaction table is the mixed outcome flag as indicated in DBA_2PC_PENDING.MIXED. You can make the wrong choice if a pending transaction is forced to commit or roll back. For example, the local administrator rolls back the transaction, but the other nodes commit it. Incorrect decisions are detected automatically, and the damage flag for the corresponding pending transaction's record is set (MIXED=yes).

The RECO (Recoverer) background process uses the information in the pending transaction table to finalize the status of in-doubt transactions. You can also use the information in the pending transaction table to manually override the automatic recovery procedures for pending distributed transactions.

All transactions automatically resolved by RECO are automatically removed from the pending transaction table. Additionally, all information about in-doubt transactions correctly resolved by an administrator (as checked when RECO reestablishes communication) are automatically removed from the pending transaction table. However, all rows resolved by an administrator that result in a mixed outcome across nodes remain in the pending transaction table of all involved nodes until they are manually deleted.

Simulating Distributed Transaction Failure

You can force the failure of a distributed transaction for the following reasons:

To observe RECO automatically resolving the local portion of the transaction.
To practice manually resolving in-doubt distributed transactions and observing the results.

The following sections describes the features available and the steps necessary to perform such operations.

Forcing a Distributed Transaction to Fail

You can include comments in the COMMENT parameter of the COMMIT statement. To intentionally induce a failure during the two-phase commit phases of a distributed transaction, include the following comment in the COMMENT parameter:

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-n';

where n is one of the following integers:

n

Effect

1

Crash commit point site after collect

2

Crash non-commit point site after collect

3

Crash before prepare (non-commit point site)

4

Crash after prepare (non-commit point site)

5

Crash commit point site before commit

6

Crash commit point site after commit

7

Crash non-commit point site before commit

8

Crash non-commit point site after commit

9

Crash commit point site before forget

10

Crash non-commit point site before forget

For example, the following statement returns the following messages if the local commit point strength is greater than the remote commit point strength and both nodes are updated:

COMMIT COMMENT 'ORA-2PC-CRASH-TEST-7';

ORA-02054: transaction 1.93.29 in-doubt
ORA-02059: ORA-CRASH-TEST-7 in commit comment

At this point, the in-doubt distributed transaction appears in the DBA_2PC_PENDING view. If enabled, RECO automatically resolves the transaction.

Disabling and Enabling RECO

The RECO background process of an Oracle8i instance automatically resolves failures involving distributed transactions. At exponentially growing time intervals, the RECO background process of a node attempts to recover the local portion of an in-doubt distributed transaction.

RECO can use an existing connection or establish a new connection to other nodes involved in the failed transaction. When a connection is established, RECO automatically resolves all in-doubt transactions. Rows corresponding to any resolved in-doubt transactions are automatically removed from each database's pending transaction table.

You can enable and disable RECO using the ALTER SYSTEM statement with the ENABLE/DISABLE DISTRIBUTED RECOVERY options. For example, you can temporarily disable RECO to force the failure of a two-phase commit and manually resolve the in-doubt transaction.

The following statement disables RECO:

ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;

Alternatively, the following statement enables RECO so that in-doubt transactions are automatically resolved:

ALTER SYSTEM ENABLE DISTRIBUTED RECOVERY;

Note:
Single-process instances (for example, a PC running MS-DOS) have no separate background processes, and therefore no RECO process. Therefore, when a single-process instance that participates in a distributed system is started, you must manually enable distributed recovery using the statement above.

See Also:
Your Oracle operating system-specific documentation for more information about distributed transaction recovery for single-process instances.

Managing Read Consistency

An important restriction exists in Oracle's implementation of distributed read consistency. The problem arises because each system has its own SCN, which you can view as the database's internal timestamp. The Oracle database server uses the SCN to decide which version of data is returned from a query.

The SCNs in a distributed transaction are synchronized at the end of each remote SQL statement and at the start and end of each transaction. Between two nodes that have heavy traffic and especially distributed updates, the synchronization is frequent. Nevertheless, no practical way exists to keep SCNs in a distributed system absolutely synchronized: a window always exists in which one node may have an SCN that is somewhat in the past with respect to the SCN of another node.

Because of the SCN gap, you can execute a query that uses a slightly old snapshot, so that the most recent changes to the remote database are not seen. In accordance with read consistency, a query can therefore retrieve consistent, but out-of-date data. Note that all data retrieved by the query will be from the old SCN, so that if a locally executed update transaction updates two tables at a remote node, then data selected from both tables in the next remote access contain data prior to the update.

One consequence of the SCN gap is that two consecutive SELECT statements can retrieve different data even though no DML has been executed between the two statements. For example, you can issue an update statement and then commit the update on the remote database. When you issue a SELECT statement on a view based on this remote table, the view does not show the update to the row. The next time that you issue the SELECT statement, the update is present.

You can use the following techniques to ensure that the SCNs of the two machines are synchronized just before a query:

Because SCNs are synchronized at the end of a remote query, precede each remote query with a dummy remote query to the same site, for example, SELECT * FROM dual@remote.
Because SCNs are synchronized at the start of every remote transaction, commit or roll back the current transaction before issuing the remote query.

global_database_name	is the database name of the global coordinator.
hhhhhhh	is the internal database identifier of the global coordinator (in hexadecimal).
local_transaction_id	is the corresponding local transaction ID assigned on the global coordinator.

n	Effect
1	Crash commit point site after collect
2	Crash non-commit point site after collect
3	Crash before prepare (non-commit point site)
4	Crash after prepare (non-commit point site)
5	Crash commit point site before commit
6	Crash commit point site after commit
7	Crash non-commit point site before commit
8	Crash non-commit point site after commit
9	Crash commit point site before forget
10	Crash non-commit point site before forget

5 Managing Distributed Transactions