RAC Ask: Excerpts

念经居士

Categories: 11g, 10g

Article link: https://blog.csdn.net/felix556/article/details/12750861


1. How do I determine which node in the cluster is the "Master" node?
For the cluster synchronization service (CSS), the master can be found by searching ORACLE_HOME/log/nodename/cssd/ocssd.log, where ORACLE_HOME is the Oracle Clusterware home (the Grid Infrastructure home in Oracle Database 11g Release 2).

For the master of an enqueue resource in Oracle RAC, you can select from v$ges_resource, which should have a master_node column.
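The log search can be illustrated with a rough Python sketch that scans ocssd.log text for the most recent master announcement. It assumes reconfiguration messages contain the phrase "master node number N". That wording is an assumption and may vary between versions, so confirm it against your own log first. The function name last_css_master is hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumption: CSS reconfiguration messages include "master node number N".
# Verify this phrase against a real ocssd.log for your version.
MASTER_RE = re.compile(r"master node number\s+(\d+)", re.IGNORECASE)


def last_css_master(lines: Iterable[str]) -> Optional[int]:
    """Return the node number from the latest matching line, or None if none match."""
    master = None
    for line in lines:
        match = MASTER_RE.search(line)
        if match:
            # Later reconfigurations override earlier ones.
            master = int(match.group(1))
    return master
```

To use it, open the log (for example with errors="replace" to tolerate odd bytes) and pass the file object. Because the function iterates line by line, even large logs are streamed rather than loaded whole.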

2. Are rcp and/or rsh required for normal Oracle RAC operation?
"rcp" and "rsh" are not required for normal Oracle RAC operation. However, in older versions "rsh" and "rcp" had to be enabled for Oracle RAC and patchset installation. In later releases, ssh is used for these operations.
Note: Oracle Enterprise Manager uses rsh.

3. What kind of hardware components do you recommend for the interconnect?
The general recommendation for the interconnect is to provide the highest-bandwidth interconnect together with the lowest-latency protocol available for a given platform. In practice, Gigabit Ethernet with UDP has proven sufficient in every case where it has been implemented, and it tends to be the lowest common denominator across platforms.

4. What application design considerations should I be aware of when moving to Oracle RAC?
The general principle is that RAC fundamentally requires no different design or coding practices; however, flaws in application design or execution have a higher impact in RAC. Performance and scalability in RAC are more sensitive to bad execution plans and bad schema design, and serializing contention makes applications less scalable. If your customer applies standard SQL and schema tuning, that solves more than 80% of performance problems.

Some of the scalability pitfalls they should look for are:
* Serializing contention on a small set of data/index blocks
--> monotonically increasing key
--> frequent updates of small cached tables
--> segment without automatic segment space management (ASSM) or Free List Group (FLG)

* Full table scans
--> Optimization for full scans in 11g can save CPU and latency

* Frequent invalidation and parsing of cursors
--> Requires data dictionary lookups and synchronizations

* Concurrent DDL (e.g. truncate/drop)

Look for:
* Indexes with right-growing characteristics
--> Use reverse key indexes
--> Eliminate indexes which are not needed

* Frequent updates and reads of “small” tables
--> “small” = fits into a single buffer cache
--> Use sparse blocks (PCTFREE 99) to reduce serialization

* SQL that scans large amounts of data
--> Perhaps more efficient when parallelized
--> Direct reads do not need to be globally synchronized (hence less CPU for global cache)
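The following deliberately simplified Python model shows why reverse key indexes and sparse blocks reduce contention. It does not reflect how Oracle actually stores keys. It treats the first character of a decimal key string as a stand-in for the index leaf block a key lands in, and it ignores block header and row overhead when estimating rows per block. Both function names are hypothetical.

```python
from typing import Iterable, Set


def leaf_buckets(keys: Iterable[int], reverse: bool) -> Set[str]:
    """Toy model: the first character of the stored key stands in for its leaf block."""
    buckets = set()
    for key in keys:
        text = str(key)
        # A reverse key index stores the key's bytes in reverse order.
        stored = text[::-1] if reverse else text
        buckets.add(stored[0])
    return buckets


def rows_per_block(block_size: int, pctfree: int, row_len: int) -> int:
    """Rows that fit before the PCTFREE reserve is reached (header overhead ignored)."""
    usable = block_size * (100 - pctfree) // 100
    return usable // row_len
```

In this model, ten sequential keys (1000 to 1009) all share the same leading character and pile onto one "leaf", but reversed they spread across ten. With an 8 KB block and 40-byte rows, PCTFREE 10 allows 184 rows per block, while PCTFREE 99 allows 2. Fewer rows per block means fewer sessions contend for the same block.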

5. Are there any issues for the interconnect when sharing the same switch as the public network by using VLANs to separate the networks?
Oracle RAC and Oracle Clusterware deployment best practices recommend that the interconnect be deployed on a stand-alone, physically separate, dedicated switch.

Many customers, however, have consolidated these stand-alone switches into larger managed switches. A consequence of this consolidation is a merging of IP networks on a single shared switch, segmented by VLANs. There are caveats associated with such deployments.

The Oracle RAC cache fusion protocol exercises the IP network more rigorously than non-RAC Oracle databases. The latency and bandwidth requirements as well as availability requirements of the Oracle RAC / Oracle Clusterware interconnect IP network are more in-line with high performance computing.

Deploying the Oracle RAC / Oracle Clusterware interconnect on a shared switch, segmented by a VLAN may expose the interconnect links to congestion and instability in the larger IP network topology.

If deploying the interconnect on a VLAN, there should be a 1:1 mapping of the VLAN to a non-routable subnet and the VLAN should not span multiple VLANs (tagged) or multiple switches.

Deployment concerns in this environment include Spanning Tree loops when the larger IP network topology changes, asymmetrical routing that may cause packet flooding, and a lack of fine-grained monitoring of the VLAN/port.

6. What is SCAN?

Single Client Access Name (SCAN) is a single name that allows clients to connect to any database in an Oracle cluster, regardless of which node in the cluster the database (or service) is currently running on. The SCAN should be used in all client connection strings and does not change when you add or remove nodes from the cluster. SCAN allows clients to use EZConnect or the thin JDBC URL:

sqlplus system/manager@sales1-scan:1521/oltp

jdbc:oracle:thin:@sales1-scan:1521/oltp

The SCAN is defined as a single name resolving to 3 IP addresses in either the cluster's GNS or your corporate DNS.
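Both connection strings above share the same host:port/service shape. That is what makes SCAN convenient: clients only ever see the SCAN name, never the node list. The Python sketch below builds both forms and, separately, lists the distinct addresses a name resolves to (for a SCAN, typically three). The helper names are hypothetical. The resolver uses the standard socket.getaddrinfo call.

```python
import socket
from typing import List


def ezconnect(host: str, port: int, service: str) -> str:
    """EZConnect descriptor: host:port/service."""
    return f"{host}:{port}/{service}"


def jdbc_thin_url(host: str, port: int, service: str) -> str:
    """Thin JDBC URL wrapping the same EZConnect descriptor."""
    return f"jdbc:oracle:thin:@{ezconnect(host, port, service)}"


def resolve_all(name: str, port: int = 1521) -> List[str]:
    """Distinct IP addresses the name resolves to, sorted for stable output."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Running resolve_all against your SCAN name is a quick way to confirm that the DNS or GNS configuration returns the expected number of addresses.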

7. If my OCR and Voting Disks are in ASM, can I shut down the ASM instance?
No. You will have to stop the Oracle Clusterware stack on the node on which you need to stop the Oracle ASM instance. Either use "crsctl stop cluster -n node_name" or "crsctl stop crs" for this purpose.

8. I have the 11.2 Grid Infrastructure installed and now I want to install an earlier version of Oracle Database (11.1 or 10.2). Is this supported?
Yes, however you need to "pin" the nodes in the cluster before trying to create a database using an earlier version of Oracle Database (i.e. not 11.2). The command to pin a node is crsctl pin css -n nodename. You should also apply the patch for Bug 8288940 to make DBCA work in an 11.2 cluster.

9. Why do I get an error with DBCA from 10.2 or 11.1 after installing the 11.2 Grid Infrastructure?
You will need to apply the patch for Bug 8288940 to your database home so that it recognizes ASM running from the new Grid Infrastructure home. Also make sure you have "pinned" the nodes:

crsctl pin css -n nodename

10. How many NICs do I need to implement Oracle RAC?
At a minimum you need two: external (public) and interconnect (private). When storage for Oracle RAC is provided over Ethernet-based networks (e.g. NAS/NFS or iSCSI), you need a third interface for I/O, so a minimum of three. Anything less will cause performance and stability problems under load. From an HA perspective, you want each of these to be redundant, for a total of six.
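The NIC arithmetic above can be written out as a small Python helper so that each case is explicit. Start with two roles, add one for Ethernet-based storage, and double the total for redundancy. The function name is hypothetical.

```python
def min_nics(ethernet_storage: bool, redundant: bool) -> int:
    """Minimum interface count following the rule of thumb described above."""
    roles = 2  # public network + private interconnect
    if ethernet_storage:
        roles += 1  # dedicated storage I/O network (NAS/NFS or iSCSI)
    # Redundancy means a second interface for every role.
    return roles * 2 if redundant else roles
```

With Ethernet storage and full redundancy this gives the six interfaces mentioned above. Without Ethernet storage, a redundant setup needs four.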

11. Can I run Oracle RAC 10g with Oracle RAC 11g?
Yes. Oracle Clusterware should always run at the highest version. With Oracle Clusterware 11g, you can run both Oracle RAC 10g and Oracle RAC 11g databases. If you are using ASM for storage, you can use either Oracle Database 10g ASM or Oracle Database 11g ASM; however, to get the 11g features, you must be running Oracle Database 11g ASM. Using Oracle Database 11g ASM is recommended.
Note: When you upgrade to 11g Release 2, you must upgrade both Oracle Clusterware and Automatic Storage Management to 11g Release 2. This will support Oracle Database 10g and Oracle Database 11g (both RAC and single instance).
Yes, you can run Oracle 9i RAC in the cluster as well. 9i RAC requires the clusterware that is certified with Oracle 9i RAC to be running in addition to Oracle Clusterware 11g.

12. Can I run more than one clustered database on a single Oracle RAC cluster?
You can run multiple databases in an Oracle RAC cluster. Options include one instance per node (with different databases using different subsets of the cluster's nodes), multiple instances per node (all databases running across all nodes), or some combination in between. Running multiple instances per node does cause memory and resource fragmentation, but this is no different from running multiple instances on a single node in a single-instance environment, which is quite common. It does provide the flexibility of sharing CPU on the node, but Oracle Resource Manager does not currently limit resources between multiple instances on one node. You will need an OS-level resource manager to do this.

13. Can I have multiple public networks accessing my Oracle RAC?
Yes, you can have multiple networks; however, with Oracle RAC 10g and Oracle RAC 11g Release 1, the cluster can only manage a single public network with a VIP, and the database can only load balance across a single network. FAN will only work on the public network with the Oracle VIPs.
Oracle RAC 11g Release 2 supports multiple public networks. You must set the new init.ora parameter LISTENER_NETWORKS so users are load balanced across their network. Services are tied to networks so users connecting with network 1 will use a different service than network 2. Each network will have its own VIP.

14. Is it supported to install Oracle Clusterware and Oracle RAC as different users?
Yes, Oracle Clusterware and Oracle RAC can be installed as different users. The Oracle Clusterware user and the Oracle RAC user must both have OINSTALL as their primary group. Every Database home can have a different OSDBA group with a different username.

15. How do I check for network problems on my interconnect?
1. Confirm that full duplex is set correctly for all interconnect links on all interfaces at both ends. Do not rely on auto-negotiation.
2. ifconfig -a will give you an indication of collisions, errors, overruns and dropped packets.
3. netstat -s will give you a listing of receive packet discards, fragmentation and reassembly errors for IP and UDP.
4. Set the UDP buffers correctly.
5. Check your cabling.

Note: If you are seeing issues with RAC, keep in mind that RAC uses UDP as its protocol, while Oracle Clusterware uses TCP/IP.
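For step 1, Linux exposes the negotiated duplex of each interface in /sys/class/net/<interface>/duplex. The Python sketch below assumes a Linux host. It reads those files and flags anything that does not report "full". Interfaces without a readable duplex file, such as loopback or a link that is down, are reported as "unknown". The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def interface_duplex(sys_net: str = "/sys/class/net") -> Dict[str, str]:
    """Map each interface name to the duplex mode reported by the kernel."""
    result = {}
    for iface in sorted(Path(sys_net).iterdir()):
        try:
            result[iface.name] = (iface / "duplex").read_text().strip()
        except OSError:
            # Missing file (e.g. loopback) or unreadable value (e.g. link down).
            result[iface.name] = "unknown"
    return result


def not_full_duplex(duplex: Dict[str, str]) -> List[str]:
    """Interfaces that are not confirmed to be running full duplex."""
    return [name for name, mode in duplex.items() if mode != "full"]
```

Run this on every node, because the setting must be correct at both ends of each interconnect link. The sys_net parameter exists only so the function can be pointed at a copy of the directory for testing.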

16. Does changing the uid or gid of the Oracle user affect Oracle Clusterware?
Yes. Many files inside and outside the Oracle Clusterware home are chgrp'ed to the appropriate groups for security and proper access. The filesystem records the numeric uid and gid (not the user or group name), so if you change the IDs or exchange the names, the files are now owned by the wrong user or group.
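Because ownership is stored as numbers, a quick check after any uid/gid change is to walk the affected tree and list entries whose numeric owner or group differs from what you expect. Here is a minimal Python sketch with a hypothetical function name. You supply the expected uid and gid.

```python
import os
from typing import List


def files_not_owned_by(root: str, uid: int, gid: int) -> List[str]:
    """Paths under root whose numeric owner or group differs from uid/gid."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: check symlinks themselves, not their targets
            if st.st_uid != uid or st.st_gid != gid:
                mismatched.append(path)
    return sorted(mismatched)
```

Because the Clusterware home deliberately mixes owners (some files belong to root), expect legitimate mismatches. Treat the output as a list to review, not a list of errors.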

17. Why does netca always create a listener that listens on the public IP and not only on the VIP?
This is for backward compatibility with existing clients: consider a pre-10g to 10g server upgrade. If the upgraded listener listened only on the VIP, clients that had not been upgraded would no longer be able to reach it.

18. Does Oracle support rolling upgrades in a cluster?
This answer is for clusters running the Oracle stack. If third-party vendor clusterware is included, check with the vendor about their support for rolling upgrades.

By a rolling upgrade, we mean upgrading software (Oracle Database, Oracle Clusterware, ASM or the OS itself) while the cluster is operational by shutting down a node, upgrading the software on that node, and then reintegrating it into the cluster, and so forth one node at a time until all the nodes in the cluster are at the new software level.
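The one-node-at-a-time procedure can be modelled as a loop. Each iteration takes down exactly one node, upgrades it, and reintegrates it before moving on, so the rest of the cluster stays available throughout. This is a hypothetical Python simulation of the sequencing only. The upgrade callback stands in for whatever patching steps actually apply.

```python
from typing import Callable, Dict, List


def rolling_upgrade(nodes: List[str], upgrade: Callable[[str], None]) -> List[Dict[str, bool]]:
    """Upgrade nodes one at a time; return node availability while each node is down."""
    up = {node: True for node in nodes}
    history = []
    for node in nodes:
        up[node] = False          # shut down only this node
        history.append(dict(up))  # snapshot: every other node is still serving
        upgrade(node)             # bring this node to the new software level
        up[node] = True           # reintegrate it into the cluster
    return history
```

Each snapshot in the returned history should show exactly one node down. That property is what separates a rolling upgrade from a full-outage upgrade.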

For the Oracle Database software, a rolling upgrade is possible only for certain individual patches that are marked as rolling upgrade compatible. Most bundle patches and Critical Patch Updates (CPUs) are rolling upgradeable. Patchsets and database version changes (10g to 11g) are not supported in a rolling fashion. One reason is that, across major releases, there may be incompatible versions of the system tablespace, for example. To perform these upgrades in a rolling fashion, you need to use a logical standby with Oracle Database 10g or 11g; see Note: 300479.1 for details.

Read the MAA Best Practice on Rolling Database Upgrades using Data Guard SQL Apply or, with Oracle RAC 11g, Rolling Database Upgrades for Physical Standby Databases using Transient Logical Standby 11g.

The Oracle Clusterware software always fully supports rolling upgrades, while the ASM software is rolling upgradeable at version 11.1.0.6 and beyond.

For Oracle Database 11g Release 2, Oracle Clusterware and ASM binaries are combined into a single ORACLE_HOME called the grid infrastructure home. This home fully supports rolling upgrades for patches, bundles, patchsets and releases. (If you are upgrading ASM from Oracle Database 10g to 11g Release 2, you will not be able to upgrade ASM in a rolling fashion.)

Oracle Clusterware and Oracle Real Application Clusters both support rolling upgrades of the OS software when the version of the Oracle Database is certified on both releases of the OS. The OS itself must be the same: no mixing Linux and Windows, AIX and Solaris, or 32-bit and 64-bit, etc. This can apply to an operating system patch, a patchset (such as EL4u4 to EL4u6) or a release (EL4 to EL5).

Stay within a 24-hour upgrade window and fully test this path, as it is not possible for Oracle to test all of these different paths and combinations.

19. How do I determine whether or not a one-off patch is "rolling upgradeable"?

After you have downloaded a patch, you can go into the directory where you unpacked the patch:

> pwd
/ora/install/4933522
Then use the following OPatch command:
> opatch query is_rolling_patch
...
Query ...
Please enter the patch location:
/ora/install/4933522
---------- Query starts ------------------
Patch ID: 4933522
....
Rolling Patch: True.
---------- Query ends -------------------
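If you check many patches, the answer can be extracted from the captured OPatch output. The Python sketch below assumes the output contains a line like "Rolling Patch: True." as in the example above. The function name is hypothetical, and it returns None when no such line is found.

```python
import re
from typing import Optional

# Matches the "Rolling Patch: True." / "Rolling Patch: False." line shown above.
ROLLING_RE = re.compile(r"Rolling Patch:\s*(True|False)", re.IGNORECASE)


def is_rolling(opatch_output: str) -> Optional[bool]:
    """True/False from the query output, or None if the line is absent."""
    match = ROLLING_RE.search(opatch_output)
    if match is None:
        return None
    return match.group(1).lower() == "true"
```

Returning None rather than False keeps "not rolling" distinct from "could not tell", which matters if the query failed or the output format differs.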

20. What do I do if I see GC CR BLOCK LOST in my Top 5 Timed Events in my AWR report?

You should never see this event or BLOCK RETRY events. This is most likely due to a fault in your interconnect network. Work with your system administrator and/or network administrator to find the fault. Check netstat -s:

Ip:
84884742 total packets received
1201 fragments dropped after timeout
3384 packet reassembles failed

You do not want to see fragments dropped or packet reassemblies failed.
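The two counters above can be extracted from captured netstat -s text with a short Python sketch, assuming the Linux wording shown in the sample. These counters accumulate from boot, so the more useful signal is how much they grow between two snapshots taken a few minutes apart. A helper for that comparison is included. The names are hypothetical.

```python
import re
from typing import Dict

# Counter phrases as they appear in the Linux "netstat -s" sample above.
PATTERNS = {
    "fragments_dropped": re.compile(r"(\d+) fragments dropped after timeout"),
    "reassembles_failed": re.compile(r"(\d+) packet reassembles failed"),
}


def fragment_counters(netstat_s: str) -> Dict[str, int]:
    """Extract the IP fragmentation counters; a missing line counts as 0."""
    counts = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(netstat_s)
        counts[name] = int(match.group(1)) if match else 0
    return counts


def counter_growth(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Counters are cumulative since boot, so report the increase between snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Any non-zero growth while the database is under load is worth raising with the network team.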

ifconfig -a:

eth0 Link encap:Ethernet HWaddr 00:0B:DB:4B:A2:04
inet addr:130.35.25.110 Bcast:130.35.27.255 Mask:255.255.252.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
TX packets:273120 errors:0 dropped:0 overruns:0 carrier:0

You do not want to see a high number of errors.
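Similarly, the RX/TX lines can be parsed to compare errors against packet counts. This sketch assumes the older net-tools ifconfig layout shown above (newer versions print a different format) and output for a single interface. The names are hypothetical.

```python
import re
from typing import Dict

# Assumes the older layout, e.g. "RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95".
LINE_RE = re.compile(r"([RT]X) packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+)")


def interface_counters(ifconfig_text: str) -> Dict[str, Dict[str, int]]:
    """Per-direction (RX/TX) counters for one interface's ifconfig block."""
    counters = {}
    for direction, packets, errors, dropped, overruns in LINE_RE.findall(ifconfig_text):
        counters[direction] = {
            "packets": int(packets),
            "errors": int(errors),
            "dropped": int(dropped),
            "overruns": int(overruns),
        }
    return counters


def error_ratio(stats: Dict[str, int]) -> float:
    """Errors per packet; 0.0 when no packets have been counted."""
    return stats["errors"] / stats["packets"] if stats["packets"] else 0.0
```

For the sample above, the RX error ratio is small (135 errors in about 21.7 million packets), but as with netstat, the growth of the counter during a busy period tells you more than its absolute value.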

21. How can I validate the scalability of my shared storage? (Tightly related to RAC / Application scalability)

Storage vendors tend to focus their sales pitch mainly on the storage unit's capacity in terabytes (1000 GB) or petabytes (1000 TB). For RAC scalability, however, it is critical to also look at the storage unit's ability to process I/Os per second (throughput) in a scalable fashion, specifically from multiple sources (nodes). If that criterion is not met, RAC / application scalability will most probably suffer. It depends partly on storage scalability and partly on a solid and capable interconnect (for network traffic between nodes).

Storage vendors may sometimes discourage such testing, boasting about their amazing front-end or back-end battery-backed memory caches that "eliminate" all I/O bottlenecks. This is all great, and you should take advantage of such caches as much as possible. However, there is no substitute for a real-world test: you may uncover that the HBA (Host Bus Adapter) firmware or driver versions are outdated (before you claim poor RAC / application scalability).

It is highly recommended to test storage scalability early on so that expectations are set accordingly. On Linux there is a freely available tool released on OTN called ORION (Oracle I/O test tool), which simulates Oracle I/O. Note: Starting with 11.2, the orion tool is included with the RDBMS/RAC software; see ORACLE_HOME/bin. Warning: if you are performing write tests, be prepared to lose any data stored on the LUNs!

On other Unix platforms (as well as Linux) you can use IOzone. If a prebuilt binary is not available, build it from source. Make sure to use version 3.271 or later, and if testing raw/block devices, add the "-I" flag.

In a basic read test, you try to demonstrate that a certain I/O throughput can be maintained as nodes are added. Simulate your database I/O patterns as closely as possible, e.g. block size, number of simultaneous readers, rates, etc.

For example, on a 4-node cluster, you measure 20 MB/sec from node 1. You then start a read stream on node 2 and see another 20 MB/sec while the first node shows no decrease. A stream on node 3 adds another 20 MB/sec. In the end you run 4 streams on 4 nodes and get an aggregate of 80 MB/sec or close to it. This proves that the shared storage is scalable. Conversely, if you see poor scalability in this phase, it will carry over and be observed or interpreted as poor RAC / application scalability.
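The check in this example reduces to a ratio. Divide the aggregate throughput by the single-node throughput times the number of nodes. Values near 1.0 mean the storage scales linearly, and values that fall off as nodes are added point at the I/O subsystem. Here is a small Python sketch with a hypothetical function name:

```python
from typing import List


def scaling_efficiency(single_node_mb_s: float, aggregate_mb_s: List[float]) -> List[float]:
    """aggregate_mb_s[i] is the total throughput with i + 1 nodes reading concurrently."""
    return [
        total / (single_node_mb_s * (nodes))
        for nodes, total in enumerate(aggregate_mb_s, start=1)
    ]
```

With the numbers from the example (20 MB/sec per node and aggregates of 20, 40, 60 and 80 MB/sec), every step gives 1.0. If two nodes together delivered only 30 MB/sec, the second value would drop to 0.75.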

In many cases RAC / application scalability is blamed for no real reason, when in fact the underlying I/O subsystem is not scalable.