Veritas Cluster Debugging Tips

Initial Notes

Veritas Cluster Server is high-availability software: when a server fails, its processes switch over to the other server. All database processes are run through the cluster, so it needs to run smoothly. Note that the Oracle processes should only be running on the server that is active. On monitoring tools, the procs light for whichever box is secondary should be yellow, because Oracle is not running there. The cluster itself, however, runs on both systems.

Cluster Not Up -- HELP

  • The normal debugging steps are: checking the status, restarting if there are no faults, checking licenses, clearing faults if needed, and checking the logs.

    To find out Current Status:

      • /opt/VRTSvcs/bin/hastatus -summary
        This gives the general status of each machine and its processes.

        /opt/VRTSvcs/bin/hares -display
        This gives much more detail, down to the resource level.

        If hastatus fails on both machines (it reports that the cluster is not up, or returns nothing), try to start the cluster:

          • /opt/VRTSvcs/bin/hastart

            /opt/VRTSvcs/bin/hastatus -summary
            This will tell you whether the processes started properly. hastart will NOT start processes on a FAULTED system.

    Starting Single System NOT Faulted

      • If the system is NOT FAULTED and only one system is up, the cluster probably needs to have GAB started manually with gabconfig. Do this by running:

          • /sbin/gabconfig -c -x

            /opt/VRTSvcs/bin/hastart

            /opt/VRTSvcs/bin/hastatus -summary

        If the system is faulted, check licenses and clear the faults as described next.

    To check licenses:

      • vxlicense -p

        Make sure all licenses are current and NOT expired! If any are expired, that is your problem. Call VERITAS to get temporary licenses.

        There is a BUG with Veritas licenses: Veritas will not run if there are ANY expired licenses, even if you have the valid ones you need. To get Veritas to run, you will need to MOVE the expired licenses. [Note: from what I understand, you will minimally need the VxFS, VxVM and RAID licenses to NOT be expired.]

          • vxlicense -p
            Note the NUMBER after each expired license (e.g.: Feature name: DATABASE_EDITION [100])

            cd /etc/vx/elm
            mkdir old
            mv lic.number old [do this for all expired licenses]
            vxlicense -p [make sure there are no expired licenses AND your good licenses are still there]
            hastart

            If it still fails, call Veritas for temporary licenses. Otherwise, be certain to do the same on your second machine.

    To clear FAULTS:

      • hares -display

        For each resource that is faulted, run:

          hares -clear resource-name -sys faulted-system

        If all of these clear, run hastatus -summary and make sure that the faults are gone. If some don't clear, you MAY be able to clear them at the group level. Only do this as a last resort:

          hagrp -disableresources groupname
          hagrp -flush group -sys sysname
          hagrp -enableresources groupname

        To get a group to go online:

          hagrp -online group -sys desired-system

        If it did NOT clear, did you check licenses?

    Bringing up Machines when fault will NOT clear:

      • System has the following EXACT status:
        gedb002# hastatus -summary

        -- SYSTEM STATE
        -- System               State                Frozen

        A  gedb001              RUNNING              0
        A  gedb002              RUNNING              0

        -- GROUP STATE
        -- Group           System               Probed     AutoDisabled    State

        B  oragrp          gedb001              Y          N               OFFLINE
        B  oragrp          gedb002              Y          N               OFFLINE

        gedb002#  hares -display | grep  ONLINE
        nic-qfe3  State           gedb001   ONLINE
        nic-qfe3  State           gedb002   ONLINE

        gedb002# vxdg list
        NAME         STATE           ID
        rootdg       enabled  957265489.1025.gedb002

        gedb001# vxdg list
        NAME         STATE           ID
        rootdg       enabled  957266358.1025.gedb001

        Recovery Commands:

          hastop -all
          on one machine: hastart
          wait a few minutes
          on the other machine: hastart

    Reviewing Log Files:

      If you are still having trouble, look at the logs in /var/VRTSvcs/log. Look at the most recent ones for debugging purposes (ls -ltr). Here is a short description of the logs in /var/VRTSvcs/log:

      • hashadow-log_A: hashadow checks whether the HA cluster daemon (had) is up and restarts it if needed. This is the log of that process.
      • engine.log_A: the primary log; usually what you will be reading for debugging
      • Oracle_A: Oracle process log (related to the cluster only)
      • Sqlnet_A: sqlnet process log (related to the cluster only)
      • IP_A: related to the shared IP
      • Volume_A: related to Volume Manager
      • Mount_A: related to mounting the actual filesystems
      • DiskGroup_A: related to Volume Manager/Cluster Server
      • NIC_A: related to the actual network device

      By looking at the most recent logs, you can see what failed last (most recently). You can also tell what did NOT run, which may be just as much of a clue. Of course, if none of this helps, open a call with Veritas tech support.
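The log review step can be sketched as two small shell functions. This is a minimal sketch, not a Veritas tool. `recent_logs` and `engine_tail` are hypothetical helper names. `LOG_DIR` defaults to /var/VRTSvcs/log as described above but can be pointed elsewhere. The sketch assumes a POSIX `ls` and `tail`.

```shell
# Hypothetical helpers for reviewing VCS logs; source this file, then call them.
# LOG_DIR defaults to the VCS log directory named in these notes.
LOG_DIR=${LOG_DIR:-/var/VRTSvcs/log}

recent_logs() {
    # $1 = how many of the newest logs to list (default 5).
    # ls -tr sorts oldest first, so the most recently written logs end up at the bottom.
    count=${1:-5}
    ls -tr "$LOG_DIR" | tail -n "$count"
}

engine_tail() {
    # $1 = number of lines to show from the primary log (default 50).
    lines=${1:-50}
    tail -n "$lines" "$LOG_DIR/engine.log_A"
}
```

For example, `recent_logs 3` shows which agents logged last, and `engine_tail 100` shows the end of the primary log.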

    Calling Tech Support:

    If you have tried the previously described debugging methods, call Veritas tech support: 800-634-4747. Your company needs to have a Veritas support contract.

 

Restarting Services:

If a system is gracefully shut down and it was running Oracle or other high-availability services, it will NOT transfer them. Services are only transferred when the system crashes or has an error.

  • hastart

    hastatus -summary
    will tell you whether processes started properly. hastart will NOT start processes on a FAULTED system. If the system is faulted, clear the faults as described above.
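After hastart, the `hastatus -summary` output can be scanned for trouble with a small awk filter. This is a sketch with several assumptions:

- The output follows the layout of the sample status in the debugging section. "A" lines describe systems, with the state in the third column. "B" lines describe groups, with the state in the sixth column.
- A faulted state contains the word FAULTED.
- `ha_problems` is a hypothetical helper name.

Other VCS versions may lay the summary out differently.

```shell
# Hypothetical filter: read hastatus -summary output on stdin and report problems.
ha_problems() {
    awk '
        # System lines: name in $2, state in $3. Anything other than RUNNING is reported.
        $1 == "A" && $3 != "RUNNING" { print "system " $2 " is " $3 }
        # Group lines: group in $2, system in $3, state in $6.
        $1 == "B" {
            if ($6 ~ /FAULTED/) print "group " $2 " is " $6 " on " $3
            if ($6 == "ONLINE") online[$2] = 1
            seen[$2] = 1
        }
        # A group that is not ONLINE anywhere is not being served (order of groups is unspecified).
        END {
            for (g in seen) if (!(g in online)) print "group " g " is not ONLINE on any system"
        }
    '
}
```

Run it as `hastatus -summary | ha_problems`. For the sample status in the debugging section, it would report that oragrp is not ONLINE on any system.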

Doing Maintenance on DBs:

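BEFORE working on DBs:

  • Run hastop -all -force (this stops the cluster daemons on all systems but leaves Oracle and the other applications running)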

 

AFTER working on DBs:

  • You MUST bring up Oracle on the same machine. Once Oracle is up, run:

    • hastart on the same machine you started the work on (the first system, the one with Oracle running)
      wait 3-5 minutes
      then run hastart on the other system

    If you need the instance to run on the other system, you can run: hagrp -switch oragrp -to othersystem
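The before/after maintenance sequence can be wrapped in two shell functions. This is a sketch under stated assumptions:

- The function names are hypothetical.
- `HABIN` is a hypothetical variable holding the /opt/VRTSvcs/bin path used earlier in these notes.
- `DRY_RUN` and `WAIT_SECS` are hypothetical knobs. `DRY_RUN=1` prints the commands instead of running them. `WAIT_SECS` defaults to 4 minutes, within the suggested 3-5.
- The second hastart still has to be run by hand on the other system.

```shell
# Hypothetical wrappers for the DB maintenance steps in these notes.
HABIN=${HABIN:-/opt/VRTSvcs/bin}

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

before_db_work() {
    # Stop the cluster daemons on all systems without taking Oracle down.
    run "$HABIN/hastop" -all -force
}

after_db_work() {
    # Run this on the machine where the work was started (the one running Oracle).
    run "$HABIN/hastart"
    # Give the local node time to come up before starting the other one.
    run sleep "${WAIT_SECS:-240}"
    echo "Now run hastart on the other system."
}
```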

Shutting down DB machines:

If you shut down the active machine (the one running the clustered services such as Oracle), those services will NOT start on the other machine. They only fail over if the machine crashes, so you need to switch them manually before a planned shutdown. To switch processes:

  • Find out which groups to transfer over:
    hagrp -display
    Switch over each group:
    hagrp -switch group-to-move -to new-system

    Then shut down the machine as desired. When it reboots, it will start the cluster daemon automatically.
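The switch-over can be scripted as a loop over the group names you found with `hagrp -display`. This is a minimal sketch:

- `switch_groups` and `HABIN` are hypothetical names.
- Group names are passed as arguments rather than parsed from `hagrp` output.
- `DRY_RUN=1` prints the commands instead of running them.

```shell
# Hypothetical helper: switch each named service group to another system.
# Usage: switch_groups new-system group [group ...]
HABIN=${HABIN:-/opt/VRTSvcs/bin}

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

switch_groups() {
    if [ $# -lt 2 ]; then
        echo "usage: switch_groups new-system group [group ...]" >&2
        return 1
    fi
    target=$1
    shift
    for group in "$@"; do
        # Stop at the first group that fails to switch.
        run "$HABIN/hagrp" -switch "$group" -to "$target" || return 1
    done
}
```

For example, `switch_groups gedb002 oragrp` moves oragrp to gedb002 before gedb001 is shut down.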

Doing Maintenance on Admin Network:

If the admin network that the Veritas cluster uses is brought down, Veritas WILL fault both machines AND bring down Oracle (cleanly). You will need to do the following to recover:

  • hastop -all
    On ONE machine: hastart
    wait 5 minutes
    On other machine: hastart

Manual start/stop WITHOUT veritas cluster:

THIS IS ONLY USED WHEN THERE ARE DB FAILURES

If possible, use the section on DB Maintenance. Only use this if the system fails to come up AND you KNOW that it is due to a DB configuration error. If you manually start up filesystems/Oracle, manually shut them down and restart using hastart when done.

To start up:

Make sure that ONLY the rootdg disk group is active on BOTH nodes. This is EXTREMELY important: if the Oracle disk group is active on both nodes, corruption occurs. [i.e. oradg or xxoradg is NOT present]

  • vxdg list
    hastatus (the cluster should be stopped on both, as you are faulted on both machines)
    hastop -all (if either was active, make sure you are truly shut down!)
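The `vxdg list` check can be automated so that it fails loudly when anything other than rootdg is imported. This is a sketch with two assumptions:

- `vxdg list` prints a NAME header followed by one line per imported disk group, as in the sample output earlier in these notes.
- `only_rootdg` is a hypothetical helper name.

Run it on BOTH nodes.

```shell
# Hypothetical check: exit 0 only if rootdg is the sole imported disk group.
# Usage: vxdg list | only_rootdg
only_rootdg() {
    awk '
        BEGIN { extra = 0 }
        $1 == "NAME" { next }   # header line
        NF == 0 { next }        # blank lines
        # Any other group (for example oradg or xxoradg) means it is still imported here.
        $1 != "rootdg" { print "imported: " $1; extra = 1 }
        END { exit extra }
    '
}
```

For example, `vxdg list | only_rootdg || echo "do NOT import oradg on the other node"` stops you before the double import that causes corruption.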

Once you have confirmed that the Oracle disk group is not active, on ONE machine do the following:

  • vxdg import oradg [this may be xxoradg, where xx is the client's two-character code]

    vxvol -g oradg startall

    mount -F vxfs /dev/vx/dsk/oradg/name /mountpoint [Find volumes and mount points in /etc/VRTSvcs/conf/config/main.cf]

    Let DBAs do their stuff
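The manual startup can be sketched as one function. It assumes the rootdg-only check above has already passed on both nodes. The following names are hypothetical:

- `manual_start` is the function itself.
- Each `volume:mountpoint` argument stands for a pair you would look up in /etc/VRTSvcs/conf/config/main.cf.
- `DRY_RUN=1` prints the commands instead of running them.

The `mount -F vxfs` form is the Solaris syntax used in these notes.

```shell
# Hypothetical sketch of the manual startup, run on ONE node only.
# Usage: manual_start diskgroup [volume:mountpoint ...]

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

manual_start() {
    if [ $# -lt 1 ]; then
        echo "usage: manual_start diskgroup [volume:mountpoint ...]" >&2
        return 1
    fi
    dg=$1
    shift
    run vxdg import "$dg" || return 1
    run vxvol -g "$dg" startall || return 1
    for pair in "$@"; do
        vol=${pair%%:*}   # text before the first colon
        mnt=${pair#*:}    # text after the first colon
        run mount -F vxfs "/dev/vx/dsk/$dg/$vol" "$mnt" || return 1
    done
}
```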

 

To shut down:

  • umount /mountpoint [for each mount point]

    vxvol -g oradg stopall

    vxdg deport oradg

    clear faults; start cluster as described above
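The manual shutdown can be sketched the same way. The ordering matters: the volumes are stopped while the disk group is still imported, and only then is the group deported. `manual_stop` and `DRY_RUN` are hypothetical names. The mount points are whatever you mounted at startup.

```shell
# Hypothetical sketch of the manual shutdown.
# Usage: manual_stop diskgroup [mountpoint ...]

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

manual_stop() {
    if [ $# -lt 1 ]; then
        echo "usage: manual_stop diskgroup [mountpoint ...]" >&2
        return 1
    fi
    dg=$1
    shift
    for mnt in "$@"; do
        run umount "$mnt" || return 1
    done
    # Stop the volumes first; after a deport the disk group is no longer visible to vxvol.
    run vxvol -g "$dg" stopall || return 1
    run vxdg deport "$dg"
}
```

Afterwards, clear faults and start the cluster with hastart as described above.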

A wonderful reference book for Veritas Clusters is:
