Standard Procedure for Adding an Oracle Node Back to the Cluster: Handling /u01 Corruption

SYMPTOMS

While attempting to resize the /u01 file system, the file system became corrupted and the clusterware on that node became unusable.

The output of fsck shows corrupted inode tables.

CHANGES

CAUSE

An attempt to resize the /u01 file system corrupted it, and the damage is not repairable.

This requires dropping and re-creating the /u01 file system.
 

SOLUTION

Drop and re-create the /u01 file system per the process below:

Nodes:

Failed node:      exadb01          <<< node to be rebuilt

Surviving node:   exadb02

If the /u01 file system is damaged but can still be mounted, back up whatever data is recoverable before proceeding.
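A minimal sketch of such a backup (the read-only remount and archive destination are illustrative assumptions; any location outside /u01 with enough free space will do):

# mount -o remount,ro /u01
# tar -czf /root/u01_backup.tar.gz /u01 2>/root/u01_backup.err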

Step 1: Remove the Failed Database Server from the Cluster

1.  Disable the listener that runs on the failed database server:

[oracle@surviving]$ srvctl disable listener -n exadb01

[oracle@surviving]$ srvctl stop listener -n exadb01

PRCC-1017 : LISTENER was already stopped on exadb01

2. Delete the Oracle Home from the Oracle inventory:

[oracle@surviving]$ cd ${ORACLE_HOME}/oui/bin

[oracle@surviving]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1 \
  "CLUSTER_NODES=exadb02"

Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 16383 MB    Passed

The inventory pointer is located at /etc/oraInst.loc

The inventory is located at /u01/app/oraInventory

'UpdateNodeList' was successful.

3. Verify that the failed database server is unpinned:

[oracle@surviving]$ olsnodes -s -t

exadb01 Inactive Unpinned

exadb02 Active Unpinned
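If the failed node were still reported as Pinned, it would need to be unpinned before it can be deleted. A minimal sketch (run as root on the surviving node, and only needed when olsnodes shows the node as Pinned):

[root@surviving]# crsctl unpin css -n exadb01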

4. Stop the VIP Resources for the failed database server and delete:

[root@surviving]# srvctl stop vip -i exadb01-vip

PRCC-1016 : exadb01-vip.####.com was already stopped

[root@surviving]# srvctl remove vip -i exadb01-vip

Please confirm that you intend to remove the VIPs exadb01-vip (y/[n]) y

5. Delete the node from the cluster:

[root@surviving]# crsctl delete node -n exadb01

CRS-4661: Node exadb01 successfully deleted.

6. Update the Oracle Inventory:

[oracle@surviving]$ cd ${ORACLE_HOME}/oui/bin

[oracle@surviving]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/11.2.0/grid \
  "CLUSTER_NODES=exadb02" CRS=TRUE

Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 16383 MB    Passed

The inventory pointer is located at /etc/oraInst.loc

The inventory is located at /u01/app/oraInventory

'UpdateNodeList' was successful.

7. Verify the node deletion is successful:

[oracle@surviving]$ cluvfy stage -post nodedel -n exadb01 -verbose

Performing post-checks for node removal

Checking CRS integrity...

The Oracle clusterware is healthy on node "exadb02"

CRS integrity check passed

Result:

Node removal check passed

Post-check for node removal was successful

Step 2: Drop and Re-create /u01

The /u01 file system is mounted from the logical volume /dev/mapper/VGExaDb-LVDbOra1.

  1. # umount /u01
  2. Run df -k to note the file system, and review /etc/fstab to confirm the file system type and mount options.
  3. Re-format /u01 to clean it out and re-create the inodes:
    1. # mkfs -t ext3 /dev/mapper/VGExaDb-LVDbOra1
       OR
    2. # mkfs.ext3 /dev/mapper/VGExaDb-LVDbOra1

Note: before creating the file system, check the file system type on a healthy node to determine whether it is ext3 or ext4, then create the same type. More recent factory images ship with ext4.
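A quick way to check the type (a sketch, assuming the surviving node uses the same LVM layout):

[root@surviving]# grep u01 /etc/fstab
[root@surviving]# mount | grep u01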

  1. # mount -t ext3 /dev/mapper/VGExaDb-LVDbOra1 /u01    (use the file system type determined above)
  2. On the failed node, create the directories:
    1. /u01/app
    2. /u01/app/11.2.0/grid
  3. Grant correct ownership and permissions on the directories:
    1. [root@replacement]# mkdir -p /u01/app/11.2.0/grid
    2. [root@replacement]# chown oracle /u01/app/11.2.0/grid
    3. [root@replacement]# chgrp -R oinstall /u01/app/11.2.0/grid
    4. [root@replacement]# chmod -R 775 /u01/
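Before moving on, the rebuilt file system and directory ownership can be sanity-checked, for example:

[root@replacement]# df -h /u01
[root@replacement]# ls -ld /u01/app /u01/app/11.2.0/grid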

Step 3: Add Node back to cluster:

Clone Oracle Grid Infrastructure to the Replacement Database Server

1. Verify the hardware and operating system installations with the Cluster Verification Utility (CVU):

[oracle@surviving]$ cluvfy stage -post hwos -n exadb01,exadb02 -verbose

At the end of the report, you should see the text:  “Post-check for hardware and operating system setup was successful.”

2. Verify peer compatibility:

[oracle@surviving]$ cluvfy comp peer -refnode exadb02 -n exadb01 -orainv oinstall -osdba dba | grep -B 3 -A 2 mismatched

Compatibility check: Available memory [reference node: exadb02]

  Node Name     Status                    Ref. node status          Comment
  ------------  ------------------------  ------------------------  ----------
  exadb01       31.02GB (3.2527572E7KB)   29.26GB (3.0681252E7KB)   mismatched

Available memory check failed

Compatibility check: Free disk space for "/tmp" [reference node: exadb02]

  Node Name     Status                    Ref. node status          Comment
  ------------  ------------------------  ------------------------  ----------
  exadb01       55.52GB (5.8217472E7KB)   51.82GB (5.4340608E7KB)   mismatched

Free disk space check failed

If the only components that failed are related to physical memory, swap space, and disk space, then it is safe to continue.

3. Perform requisite checks for node addition:

[oracle@surviving]$ cluvfy stage -pre nodeadd -n exadb01 -fixup -fixupdir /home/oracle/fixup.d

If the only component that fails is related to swap space, then it is safe to continue.

4. Add the replacement database server into the cluster: 

NOTE: addnode.sh may fail on files that are readable only by root, with an error similar to the one described in MOS note 1526405.1. Apply the workaround for those files and rerun addnode.sh.

[oracle@surviving]$ cd /u01/app/11.2.0/grid/oui/bin/

[oracle@surviving]$ ./addnode.sh -silent "CLUSTER_NEW_NODES={exadb01}" \
  "CLUSTER_NEW_VIRTUAL_HOSTNAMES={exadb01-vip}"

This initiates the OUI to copy the clusterware software to the replacement database server.

 

WARNING: A new inventory has been created on one or more nodes in this session.

However, it has not yet been registered as the central inventory of this system.

To register the new inventory please run the script at '/u01/app/oraInventory/orainstRoot.sh' with root privileges on nodes 'exadb01'.

If you do not register the inventory, you may not be able to update or patch the products you installed.

The following configuration scripts need to be executed as the "root" user in each cluster node:

/u01/app/oraInventory/orainstRoot.sh  #On nodes exadb01

/u01/app/11.2.0/grid/root.sh  #On nodes exadb01

To execute the configuration scripts:

a) Open a terminal window.

b) Log in as root.

c) Run the scripts on each cluster node

After the scripts are finished, you should see the following informational messages:

The Cluster Node Addition of /u01/app/11.2.0/grid was successful.

Please check '/tmp/silentInstall.log' for more details.

5. Run the orainstRoot.sh and root.sh scripts for the replacement database server: 

NOTE: orainstRoot.sh does not need to be run if only /u01 was re-created and the / file system was unchanged or restored, because the oraInst.loc and oratab files still exist.
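To confirm that orainstRoot.sh can be skipped, check that both files are still present on the replacement node, for example:

[root@replacement]# ls -l /etc/oraInst.loc /etc/oratab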

[root@replacement]# /u01/app/oraInventory/orainstRoot.sh

Creating the Oracle inventory pointer file (/etc/oraInst.loc)

Changing permissions of /u01/app/oraInventory.

Adding read,write permissions for group.

Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.

The execution of the script is complete.

[root@replacement]# /u01/app/11.2.0/grid/root.sh

Check /u01/app/11.2.0/grid/install/root_exadb01.####.com_2010-03-10_17-59-15.log for the output of root script

The output file created above will report that the LISTENER resource on the replaced database server failed to start.

This is the expected output:

PRCR-1013 : Failed to start resource ora.LISTENER.lsnr

PRCR-1064 : Failed to start resource ora.LISTENER.lsnr on node exadb01

CRS-2662: Resource 'ora.LISTENER.lsnr' is disabled on server 'exadb01'

start listener on node=exadb01 ... failed

6. Re-enable and start the listener resource that was stopped and disabled in Step 1:

[root@replacement]# /u01/app/11.2.0/grid/bin/srvctl enable listener -l LISTENER -n exadb01

[root@replacement]# /u01/app/11.2.0/grid/bin/srvctl start listener -l LISTENER -n exadb01
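The listener state can then be confirmed, for example:

[root@replacement]# /u01/app/11.2.0/grid/bin/srvctl status listener -l LISTENER -n exadb01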

Step 4: Clone Oracle Database Homes to Replacement Database Server

1. Add the RDBMS ORACLE_HOME on the replacement database server:

[oracle@surviving]$ cd /u01/app/oracle/product/11.2.0/dbhome_1/oui/bin/

[oracle@surviving]$ ./addnode.sh -silent "CLUSTER_NEW_NODES={exadb01}"

This command initiates the OUI (Oracle Universal Installer) to copy the Oracle Database software to the replacement database server. However, to complete the installation, you must run the root scripts on the replacement database server after the command completes.

 

WARNING: The following configuration scripts need to be executed as the "root" user in each cluster node.

/u01/app/oracle/product/11.2.0/dbhome_1/root.sh #On nodes exadb01

To execute the configuration scripts:

Open a terminal window.

Log in as root.

Run the scripts on each cluster node.

After the scripts are finished, you should see the following informational messages:

The Cluster Node Addition of /u01/app/oracle/product/11.2.0/dbhome_1 was successful.

Please check '/tmp/silentInstall.log' for more details.

2. Run the following scripts on the replacement database server:

[root@replacement]# /u01/app/oracle/product/11.2.0/dbhome_1/root.sh

Check /u01/app/oracle/product/11.2.0/dbhome_1/install/root_exadb01.####.com_2010-03-10_18-27-16.log for the output of root script

3. Validate the initialization parameter and password files

Verify that the init<SID>.ora file under $ORACLE_HOME/dbs references the spfile on ASM shared storage.

The password file that gets copied to $ORACLE_HOME/dbs during addnode needs to be renamed to orapw<SID>.
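A minimal sketch of both checks (the SID, instance name, and disk group paths below are illustrative assumptions; substitute the actual values for the environment):

[oracle@replacement]$ cat $ORACLE_HOME/dbs/initDBM1.ora
SPFILE='+DATA/DBM/spfiledbm.ora'

[oracle@replacement]$ mv $ORACLE_HOME/dbs/orapwdbm $ORACLE_HOME/dbs/orapwDBM1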
