Oracle9i Administrator's Reference
Release 2 (9.2.0.1.0) for UNIX Systems: AIX-Based Systems, Compaq Tru64 UNIX, HP 9000 Series HP-UX, Linux Intel, and Sun Solaris
Part No. A97297-01
Oracle Cluster Management Software (OCMS) is available with Oracle9i on Linux systems. This appendix describes its components and how to configure and start them.
OCMS is included with the Oracle9i Enterprise Edition for Linux. It provides cluster membership services, a global view of clusters, node monitoring, and cluster reconfiguration. It is a component of Oracle9i Real Application Clusters on Linux and is installed automatically when you choose Oracle9i Real Application Clusters. OCMS consists of the following components:
Watchdog Daemon
Cluster Manager
Figure F-1 shows how the Watchdog daemon provides services to the Cluster Manager.
Figure F-1 Oracle Instance and Components of OCMS
The Watchdog daemon (watchdogd) uses a software-implemented Watchdog timer to monitor selected system resources and prevent database corruption. The Watchdog timer is a feature of the Linux kernel. The Watchdog daemon is part of Oracle9i Real Application Clusters.
The Watchdog daemon monitors the Cluster Manager and passes notifications to the Watchdog timer at defined intervals. The behavior of the Watchdog timer is partially controlled by the CONFIG_WATCHDOG_NOWAYOUT configuration parameter of the Linux kernel.
If you use Oracle9i Real Application Clusters, you must set the value of the CONFIG_WATCHDOG_NOWAYOUT configuration parameter to Y. If the Watchdog timer detects an Oracle instance or Cluster Manager failure, it resets the node to avoid possible database corruption.
For information on how to set the CONFIG_WATCHDOG_NOWAYOUT parameter, see the /usr/src/linux/Documentation/configure.help file in the Linux kernel source code. For more information on Watchdog devices, see the /usr/src/linux/Documentation/watchdog.txt file in the Linux kernel source code.
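The kernel setting can be checked from a shell before you install. The following is a minimal sketch; it generates a sample config file so it is self-contained, and on a real node you would point config_file at the kernel's .config (for example, /usr/src/linux/.config — the exact path varies by distribution).

```shell
# Sketch: check a kernel configuration for CONFIG_WATCHDOG_NOWAYOUT.
# A sample config is generated here so the example is self-contained;
# on a real system, set config_file to the kernel's .config instead.
config_file=$(mktemp)
printf 'CONFIG_WATCHDOG=y\nCONFIG_WATCHDOG_NOWAYOUT=y\n' > "$config_file"

if grep -q '^CONFIG_WATCHDOG_NOWAYOUT=y' "$config_file"; then
    echo "CONFIG_WATCHDOG_NOWAYOUT is enabled"
else
    echo "CONFIG_WATCHDOG_NOWAYOUT is NOT enabled; rebuild the kernel with it set to Y"
fi
```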
The Cluster Manager maintains the status of the nodes and the Oracle instances across the cluster. The Cluster Manager process runs on each node of the Oracle9i Real Application Clusters cluster; each node has exactly one Cluster Manager, and Oracle9i Real Application Clusters does not limit the number of Oracle instances on each node. The Cluster Manager uses the following communication channels between nodes:
Private network
Quorum partition on the shared disk
During normal cluster operations, the Cluster Managers on each node of the cluster communicate with each other through heartbeat messages sent over the private network. The quorum partition is used as an emergency communication channel if a heartbeat message fails. A heartbeat message can fail for the following reasons:
The Cluster Manager terminates on a node
The private network fails
There is an abnormally heavy load on the node
The Cluster Manager uses the quorum partition to determine the reason for the failure. The Cluster Manager on each node periodically updates the designated block for that node on the quorum partition, and the other nodes check the timestamp of each block. If the heartbeat message from one of the nodes does not arrive, but the corresponding block on the quorum partition has a current timestamp, the network path between that node and the other nodes has failed.
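The decision logic can be sketched as follows. This is an illustration only: the real quorum block format is internal to the Cluster Manager, and the tick values and threshold below are hypothetical stand-ins for heartbeat intervals and the MissCount parameter.

```shell
# Sketch of the staleness decision (illustrative only; the on-disk quorum
# block format is internal to the Cluster Manager).
now=100                 # current time, in heartbeat ticks (hypothetical)
last_heartbeat=90       # tick of the last heartbeat received over the private network
last_quorum_update=99   # tick read from the node's block on the quorum partition
miss_threshold=5        # analogous to the MissCount parameter

heartbeat_age=$((now - last_heartbeat))
quorum_age=$((now - last_quorum_update))

if [ "$heartbeat_age" -ge "$miss_threshold" ] && [ "$quorum_age" -lt "$miss_threshold" ]; then
    echo "private network failure: node is still updating the quorum partition"
elif [ "$heartbeat_age" -ge "$miss_threshold" ]; then
    echo "node failure: no heartbeat and a stale quorum block"
else
    echo "node healthy"
fi
```

With these sample values the heartbeat is stale but the quorum block is current, so the first branch (a private network failure) is reported.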
Each Oracle instance registers with the local Cluster Manager. The Cluster Manager monitors the status of local Oracle instances and propagates this information to Cluster Managers on other nodes. If the Oracle instance fails on one of the nodes, the following events occur:
The Cluster Manager on the node with the failed Oracle instance informs the Watchdog daemon about the failure.
The Watchdog daemon requests the Watchdog timer to reset the failed node.
The Watchdog timer resets the node.
The Cluster Managers on the surviving nodes inform their local Oracle instances that the failed node is removed from the cluster.
Oracle instances in the surviving nodes start the Oracle9i Real Application Clusters reconfiguration procedure.
The node must be reset if its Oracle instance fails. This ensures that:
No physical I/O requests to the shared disks from the failed node occur after the Oracle instance fails.
Surviving nodes can start the cluster reconfiguration procedure without corrupting the data on the shared disk.
See Also: "Configuring Timing for Cluster Reconfiguration" and "Watchdog Daemon and Cluster Manager Starting Options" for more information on the Cluster Manager.
The following sections describe how to start OCMS:
Configuring Timing for Cluster Reconfiguration
Note: Oracle Corporation supplies the $ORACLE_HOME/oracm/bin/ocmstart.sh sample startup script. Run this script as the root user. Make sure that the ORACLE_HOME and PATH environment variables are set as described in the Oracle9i Installation Guide Release 2 (9.2.0.1.0) for UNIX Systems. After you are familiar with starting the Watchdog daemon and the Cluster Manager, you can use the script to automate the start-up process.
To start the Watchdog daemon, enter the following:
$ su root
# cd $ORACLE_HOME/oracm/bin
# watchdogd
Note: Always start the Watchdog daemon as the root user.
The default Watchdog daemon log file is $ORACLE_HOME/oracm/log/wdd.log.
The Watchdog daemon does not have configuration files. Table F-1 describes the arguments that you can use when starting the Watchdog daemon.
Table F-1 Watchdog Daemon Arguments

Argument | Valid Values | Default Value | Description
---|---|---|---
-l number | 0 or 1 | 1 | If the value is 0, no resources are registered for monitoring. This setting is used for debugging system configuration problems. If the value is 1, the Cluster Manager is registered for monitoring. Oracle Corporation recommends this setting for normal operations.
-m number | 5000 to 180000 ms | 5000 | The Watchdog daemon expects to receive heartbeat messages from all clients (oracm threads) within the time specified by this value. If a client fails to send a heartbeat message within this time, the Watchdog daemon stops sending heartbeat messages to the kernel Watchdog timer, causing the system to reset.
-d string | | /dev/watchdog | Path of the device file for the Watchdog timer.
-e string | | $ORACLE_HOME/oracm/log/wdd.log | Filename of the Watchdog daemon log file.
You must create the $ORACLE_HOME/oracm/admin/cmcfg.ora Cluster Manager configuration file on each node of the cluster before starting OCMS. Include the following parameters in this file:
PublicNodeNames
PrivateNodeNames
CmDiskFile
WatchdogTimerMargin
HostName
Before creating the cmcfg.ora file, verify that the /etc/hosts file on each node of the cluster has an entry for the public network (the public name of each node) and an entry for the private network (the private name of each node). The private network is used for Oracle9i Real Application Clusters internode communication. The CmDiskFile parameter defines the location of the Cluster Manager quorum partition. The CmDiskFile parameter on each node in a cluster must specify the same quorum partition.
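This verification can be scripted. The following is a minimal sketch; it generates a sample hosts file with hypothetical addresses and node names so it is self-contained, whereas on a real node you would set hosts_file to /etc/hosts and list your actual public and private names.

```shell
# Sketch: check that every public and private node name has a hosts entry.
# The sample file and all names/addresses below are hypothetical; on a real
# node, set hosts_file=/etc/hosts and list your cluster's node names.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.1.1   pubnode1
192.168.1.2   pubnode2
10.0.0.1      prinode1
10.0.0.2      prinode2
EOF

missing=0
for name in pubnode1 pubnode2 prinode1 prinode2; do
    if ! grep -qw "$name" "$hosts_file"; then
        echo "missing hosts entry: $name"
        missing=1
    fi
done
[ "$missing" -eq 0 ] && echo "all node names resolve via $hosts_file"
```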
The following example shows a cmcfg.ora file on the first node of a four-node cluster:

PublicNodeNames=pubnode1 pubnode2 pubnode3 pubnode4
PrivateNodeNames=prinode1 prinode2 prinode3 prinode4
CmDiskFile=/dev/raw1
WatchdogTimerMargin=1000
HostName=prinode1
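Because a missing required parameter prevents the Cluster Manager from starting correctly, it can be worth checking the file mechanically. A minimal sketch follows; it writes a sample cmcfg.ora to a temporary file so it is self-contained, and on a real node you would set cmcfg to $ORACLE_HOME/oracm/admin/cmcfg.ora.

```shell
# Sketch: confirm that cmcfg.ora defines every required Cluster Manager
# parameter. A sample file is generated so the example is self-contained;
# on a real node, set cmcfg=$ORACLE_HOME/oracm/admin/cmcfg.ora instead.
cmcfg=$(mktemp)
cat > "$cmcfg" <<'EOF'
PublicNodeNames=pubnode1 pubnode2 pubnode3 pubnode4
PrivateNodeNames=prinode1 prinode2 prinode3 prinode4
CmDiskFile=/dev/raw1
WatchdogTimerMargin=1000
HostName=prinode1
EOF

for param in PublicNodeNames PrivateNodeNames CmDiskFile WatchdogTimerMargin HostName; do
    if grep -q "^${param}=" "$cmcfg"; then
        echo "ok: $param"
    else
        echo "missing required parameter: $param"
    fi
done
```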
Table F-2 lists all of the configurable Cluster Manager parameters in the cmcfg.ora file.

Table F-2 Cluster Manager Parameters of the cmcfg.ora File
Parameter | Valid Values | Default Value | Description
---|---|---|---
CmDiskFile | Directory path, up to 256 characters in length | No default value. You must set the value explicitly. | Specifies the pathname of the quorum partition.
MissCount | 2 to 1000 | 5 | Specifies the time that the Cluster Manager waits for a heartbeat from a remote node before declaring that node inactive. The time in seconds is determined by multiplying the value of the MissCount parameter by 3.
PublicNodeNames | List of host names, up to 4096 characters in length | No default value. | Specifies the list of all host names for the public network, separated by spaces. List the host names in the same order on each node.
PrivateNodeNames | List of host names, up to 4096 characters in length | No default value. | Specifies the list of all host names for the private network, separated by spaces. List the host names in the same order on each node.
HostName | A host name, up to 256 characters in length | No default value. | Specifies the local host name for the private network. Define this name in the /etc/hosts file.
ServiceName | A service name, up to 256 characters in length | CMSrvr | Specifies the service name used for communication between Cluster Managers. If a Cluster Manager cannot find the service name in the /etc/services file, it uses the port specified by the ServicePort parameter. ServiceName is a fixed-value parameter in this release; use the ServicePort parameter if you need to choose an alternative port for the Cluster Manager to use.
ServicePort | Any valid port number | 9998 | Specifies the number of the port used for communication between Cluster Managers when the ServiceName parameter does not specify a service.
WatchdogTimerMargin | 1000 to 180000 ms | No default value | Must be the same as the value of the soft_margin parameter specified at Linux softdog startup. Note that the value of the soft_margin parameter is specified in seconds, while the value of the WatchdogTimerMargin parameter is specified in milliseconds. This parameter is part of the formula that determines the time between when the Cluster Manager on the local node detects an Oracle instance failure or join on any node and when it reports the cluster reconfiguration to the Oracle instance on the local node. See "Configuring Timing for Cluster Reconfiguration" for information on this formula.
WatchdogSafetyMargin | 1000 to 180000 ms | 5000 ms | Specifies the time between when the Cluster Manager detects a remote node failure and when the cluster reconfiguration is started. This parameter is part of the formula that determines the time between when the Cluster Manager on the local node detects an Oracle instance failure or join on any node and when it reports the cluster reconfiguration to the Oracle instance on the local node. See "Configuring Timing for Cluster Reconfiguration" for information on this formula.
To start the Cluster Manager:

1. Confirm that the Watchdog daemon is running.
2. Confirm that the host names specified by the PublicNodeNames and PrivateNodeNames parameters in the cmcfg.ora file are listed in the /etc/hosts file.
3. As the root user, start the oracm process as a background process, redirecting any output to a log file. For example, enter the following:

   $ su root
   # cd $ORACLE_HOME/oracm/bin
   # oracm </dev/null >$ORACLE_HOME/oracm/log/cm.out 2>&1 &

In the preceding example, all of the output messages and error messages are written to the $ORACLE_HOME/oracm/log/cm.out file.
The oracm process spawns multiple threads. To list all of the threads, enter the ps -elf command.

Table F-3 describes the arguments of the oracm executable.
Table F-3 Arguments for the oracm Executable

Argument | Valid Values | Default Value | Description
---|---|---|---
/a:action | 0 or 1 | 0 | Specifies the action taken when the LMON process or another Oracle process that can write to the shared disk terminates abnormally. If the value is 0, the node is not reset. If the value is 1, the node is reset.
/l:filename | Any | $ORACLE_HOME/oracm/log/cm.log | Specifies the pathname of the log file for the Cluster Manager. The maximum pathname length is 192 characters.
/? | None | None | Shows help for the arguments of the oracm executable. The Cluster Manager does not start if you specify this argument.
/m | Any | 25000000 | The size of the oracm log file in bytes.
To avoid database corruption when a node fails, there is a delay before the Oracle9i Real Application Clusters reconfiguration commences. Without this delay, simultaneous access of the same data block by the failed node and the node performing the recovery can cause database corruption. The length of the delay is defined by the sum of the following:
Value of the WatchdogTimerMargin parameter
Value of the WatchdogSafetyMargin parameter
Value of the Watchdog daemon -m command-line argument

See also: Table F-2 for more information on the WatchdogTimerMargin and WatchdogSafetyMargin parameters, and Table F-1 for more information on the Watchdog daemon -m command-line argument.
If you use the default values for the Linux kernel soft_margin and Cluster Manager parameters, the time between when the failure is detected and the start of the cluster reconfiguration is 70 seconds. For most workloads this time can be significantly reduced. The following example shows how to decrease the time of the reconfiguration delay from 70 seconds to 20 seconds:
Set the value of the WatchdogTimerMargin (soft_margin) parameter to 10 seconds (10000 ms).
Leave the value of the WatchdogSafetyMargin parameter at the default value, 5000 ms.
Leave the value of the Watchdog daemon -m command-line argument at the default value, 5000 ms.
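The delay arithmetic above can be checked directly. The sketch below assumes the Linux softdog default soft_margin of 60 seconds (60000 ms), which together with the two 5000 ms defaults accounts for the 70-second figure; all values are in milliseconds.

```shell
# Reconfiguration delay = WatchdogTimerMargin + WatchdogSafetyMargin
#                       + the watchdogd -m value (all in milliseconds).
delay_ms() {
    echo $(( $1 + $2 + $3 ))
}

# Defaults: soft_margin of 60 s (60000 ms) plus two 5000 ms margins.
default=$(delay_ms 60000 5000 5000)
echo "default delay: ${default} ms ($((default / 1000)) seconds)"

# Tuned: soft_margin of 10 s with the other two values left at their defaults.
tuned=$(delay_ms 10000 5000 5000)
echo "tuned delay: ${tuned} ms ($((tuned / 1000)) seconds)"
```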
To change the values of the WatchdogTimerMargin (soft_margin) and WatchdogSafetyMargin parameters:

1. Stop the Oracle instance.
2. Reload the softdog module with the new value of soft_margin. For example, enter:

   # /sbin/insmod softdog soft_margin=10

3. Change the value of the WatchdogTimerMargin parameter in the $ORACLE_HOME/oracm/admin/cmcfg.ora file. For example, edit the following line:

   WatchdogTimerMargin=10000

4. Restart watchdogd with the -m command-line argument set to 5000.
5. Restart the oracm executable.
6. Restart the Oracle instance.
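The cmcfg.ora edit above can be scripted. The following is a minimal sketch; a temporary sample file stands in for $ORACLE_HOME/oracm/admin/cmcfg.ora, and the new value of 10000 ms corresponds to the 10-second soft_margin.

```shell
# Sketch: rewrite WatchdogTimerMargin in a copy of cmcfg.ora.
# A temporary sample file stands in for $ORACLE_HOME/oracm/admin/cmcfg.ora.
cmcfg=$(mktemp)
echo 'WatchdogTimerMargin=60000' > "$cmcfg"

# Set the parameter to 10000 ms, matching soft_margin=10 (seconds).
sed 's/^WatchdogTimerMargin=.*/WatchdogTimerMargin=10000/' "$cmcfg" > "$cmcfg.new"
mv "$cmcfg.new" "$cmcfg"
grep '^WatchdogTimerMargin=' "$cmcfg"
```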
OCMS supports node fencing by completely resetting the node if an Oracle instance fails and the Cluster Manager thread malfunctions. This approach guarantees that the database is not corrupted.
However, it is not always necessary to reset the node if an Oracle instance fails. If the Oracle instance uses synchronous I/O, a node reset is not required. In addition, in some cases where the Oracle instance uses asynchronous I/O, a node reset is not necessary, depending on how asynchronous I/O is implemented in the Linux kernel. For a list of certified Linux kernels that do not require a node reset, see the Oracle Technology Network Web site at the following URL:
http://otn.oracle.com
The /a:action flag in the following command defines OCMS behavior when an Oracle process fails:

$ oracm /a:[action]
In the preceding example, if the action argument is set to 0, the node does not reset. By default, the Watchdog daemon starts with the -l 1 option and the oracm process starts with the /a:0 option. With these default values, the node resets only if the oracm or watchdogd process terminates. It does not reset if an Oracle process that can write to the disk terminates. This is safe if you are using a certified Linux kernel that does not require a node reset.
If the action argument is set to 1, the node resets if the oracm process, the watchdogd daemon, or an Oracle process that can write to the disk terminates. In these situations, a SHUTDOWN ABORT command on an Oracle instance resets the node and terminates all Oracle instances running on that node.
Copyright © 1996, 2002 Oracle Corporation. All rights reserved.