I. Fault Information

1.1. Fault Information

Log excerpt

System Configuration: Oracle Corporation  sun4u Sun Fire 880

System clock frequency: 150 MHz

Memory size: 8192 Megabytes


========================= CPUs ===============================================

          Run   E$    CPU      CPU
Brd  CPU  MHz   MB    Impl.    Mask
---  ---  ----  ----  -------  ----
 A    0    900  8.0   US-III+  2.3
 B    1    900  8.0   US-III+  2.3
 A    2    900  8.0   US-III+  2.3
 B    3    900  8.0   US-III+  2.3


========================= Memory Configuration ===============================

          Logical  Logical  Logical
     MC   Bank     Bank     Bank        DIMM    Interleave  Interleaved
Brd  ID   num      size     Status      Size    Factor      with
---  ---  -------  -------  ----------  ------  ----------  -----------
 A    0     0      512MB    no_status   256MB   8-way       0
 A    0     1      512MB    no_status   256MB   8-way       0
 A    0     2      512MB    no_status   256MB   8-way       0
 A    0     3      512MB    no_status   256MB   8-way       0
 B    1     0      512MB    no_status   256MB   8-way       1
 B    1     1      512MB    no_status   256MB   8-way       1
 B    1     2      512MB    no_status   256MB   8-way       1
 B    1     3      512MB    no_status   256MB   8-way       1
 A    2     0      512MB    no_status   256MB   8-way       0
 A    2     1      512MB    no_status   256MB   8-way       0
 A    2     2      512MB    no_status   256MB   8-way       0
 A    2     3      512MB    no_status   256MB   8-way       0
 B    3     0      512MB    no_status   256MB   8-way       1
 B    3     1      512MB    no_status   256MB   8-way       1
 B    3     2      512MB    no_status   256MB   8-way       1
 B    3     3      512MB    no_status   256MB   8-way       1


========================= IO Cards =========================

                            Bus   Max
      IO    Port  Bus       Freq  Bus   Dev,
Brd   Type  ID    Side Slot MHz   Freq  Func  State  Name                               Model
----  ----  ----  ---- ---- ----  ----  ----  -----  ---------------------------------  -----
I/O   PCI   8     B    3    33    33    2,0   ok     SUNW,qlc-pci1077,2300.1077.106.1+
I/O   PCI   9     B    6    33    33    2,0   ok     SUNW,qlc-pci1077,2300.1077.106.1+
I/O   PCI   9     B    5    33    33    3,0   ok     SUNW,qlc-pci1077,2300.1077.106.1+
I/O   PCI   9     B    4    33    33    4,0   ok     SUNW,qlc-pci1077,2300.1077.106.1+


No failures found in System

===========================



========================= Environmental Status =========================

System Temperatures (Celsius):
-------------------------------
Device      Temperature     Status
---------------------------------------
CPU0        69              OK
CPU1        69              OK
CPU2        70              OK
CPU3        66              OK
MB          32              OK
IOB         29              OK
DBP0        30              OK


=================================


Front Status Panel:
-------------------
Keyswitch position: NORMAL

System LED Status:

    GEN FAULT             REMOVE
     [ ON]                [OFF]

    DISK FAULT            POWER FAULT
     [OFF]                [OFF]

    LEFT THERMAL FAULT    RIGHT THERMAL FAULT
     [ ON]                [OFF]

    LEFT DOOR             RIGHT DOOR
     [OFF]                [OFF]


=================================


Disk Status:
          Presence    Fault LED   Remove LED
DISK0:    [PRESENT]   [OFF]       [OFF]
DISK1:    [PRESENT]   [OFF]       [OFF]
DISK2:    [PRESENT]   [OFF]       [OFF]
DISK3:    [PRESENT]   [OFF]       [OFF]
DISK4:    [PRESENT]   [OFF]       [OFF]
DISK5:    [PRESENT]   [OFF]       [OFF]
DISK6:    [EMPTY]
DISK7:    [EMPTY]
DISK8:    [EMPTY]
DISK9:    [EMPTY]
DISK10:   [EMPTY]
DISK11:   [EMPTY]


=================================


Fan Bank :
----------

Bank                  Speed     Status       Fan State
                      ( RPMS )
------------------------------------------------------
CPU0_PRIM_FAN         2189      [ENABLED]    OK
CPU1_PRIM_FAN         2290      [ENABLED]    OK
CPU0_SEC_FAN          0         [DISABLED]   OK
CPU1_SEC_FAN          0         [DISABLED]   OK
IO0_PRIM_FAN          0         [DISABLED]   ERROR
IO1_PRIM_FAN          0         [DISABLED]   ERROR
IO0_SEC_FAN           3947      [ENABLED]    OK
IO1_SEC_FAN           3896      [ENABLED]    OK
IO_BRIDGE_PRIM_FAN    3333      [ENABLED]    OK
IO_BRIDGE_SEC_FAN     0         [DISABLED]   OK


=================================


Power Supplies:
---------------
Current Drain:
Supply   Status   Fan Fail   Temp Fail   CS Fail   3.3V   5V   12V   48V
------------------------------------------------------------------------
PS0      GOOD                                      6      3    2     3
PS1      GOOD                                      6      3    2     3
PS2      GOOD                                      6      3    2     3



========================= HW Revisions =======================================

System PROM revisions:
----------------------
OBP 4.6.7 2002/07/24 15:42

IO ASIC revisions:
------------------
          Port
Model     ID    Status  Version
--------  ----  ------  -------
Schizo    8     ok      4
Schizo    9     ok      4


# dmesg

Jan 15 11:39:50 csu picld[203]: [ID 625010 daemon.error] WARNING: Device IO0_PRIM_FAN failure detected

Jan 15 13:16:53 csu picld[208]: [ID 625010 daemon.error] WARNING: Device IO0_PRIM_FAN failure detected

Jan 15 14:46:34 csu picld[204]: [ID 625010 daemon.error] WARNING: Device IO0_PRIM_FAN failure detected
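
The repeated picld warnings above are easier to judge when tallied per device. The following is a minimal sketch, assuming syslog lines worded exactly as in this excerpt; the names `PICLD_FAILURE`, `count_fan_warnings`, and `SAMPLE_DMESG` are illustrative and not part of any Solaris tool.

```python
import re
from collections import Counter

# Matches picld failure warnings in the form shown in the dmesg excerpt.
# Assumption: the message wording is the same on the system being checked.
PICLD_FAILURE = re.compile(
    r"picld\[\d+\]:.*WARNING: Device (?P<device>\S+) failure detected"
)


def count_fan_warnings(lines):
    """Count picld failure warnings per device name."""
    counts = Counter()
    for line in lines:
        match = PICLD_FAILURE.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts


# The three warnings from the excerpt above.
SAMPLE_DMESG = [
    "Jan 15 11:39:50 csu picld[203]: [ID 625010 daemon.error] "
    "WARNING: Device IO0_PRIM_FAN failure detected",
    "Jan 15 13:16:53 csu picld[208]: [ID 625010 daemon.error] "
    "WARNING: Device IO0_PRIM_FAN failure detected",
    "Jan 15 14:46:34 csu picld[204]: [ID 625010 daemon.error] "
    "WARNING: Device IO0_PRIM_FAN failure detected",
]

for device, hits in count_fan_warnings(SAMPLE_DMESG).items():
    print(f"{device}: {hits} warning(s)")
```

For this excerpt the script reports three warnings, all for IO0_PRIM_FAN. Note that the picld process ID differs on each line.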





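1.2. Fault Localization

The customer reported that the thermal fault LED on the front panel of the Sun V880 was lit amber.

The system diagnostic output shows:

IO0_PRIM_FAN          0         [DISABLED]   ERROR

IO1_PRIM_FAN          0         [DISABLED]   ERROR

dmesg shows that a failure was detected on IO0_PRIM_FAN. A fault in IO1_PRIM_FAN, or excessive dust inside the chassis, cannot be ruled out as a contributing cause.

Therefore, prepare two PCI I/O fan trays and a vacuum cleaner in advance (dust buildup inside the machine may be what triggered the fan alarms), so that the fault can be resolved in a single visit.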
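
The fan check described above can be sketched as a script that scans the "Fan Bank" rows of the diagnostic output (the listing has the form of `prtdiag -v` output) and reports every fan whose state is not OK. This is only a sketch: it assumes four whitespace-separated fields per row (bank, speed, status, state), and `FAN_ROW`, `failed_fans`, and `FAN_BANK` are hypothetical names introduced for illustration.

```python
import re

# One fan row: bank name, speed in RPM, [ENABLED]/[DISABLED], fan state.
# Assumption: columns are separated by whitespace, as in terminal output.
FAN_ROW = re.compile(
    r"^(?P<bank>\w+_FAN)\s+(?P<rpm>\d+)\s+\[(?P<status>\w+)\]\s+(?P<state>\w+)$"
)


def failed_fans(lines):
    """Return (bank, rpm, status) for every fan row whose state is not OK."""
    failures = []
    for line in lines:
        match = FAN_ROW.match(line.strip())
        if match and match.group("state") != "OK":
            failures.append(
                (match.group("bank"), int(match.group("rpm")), match.group("status"))
            )
    return failures


# Fan Bank rows from the log excerpt, with column spacing restored.
FAN_BANK = """\
CPU0_PRIM_FAN         2189      [ENABLED]    OK
CPU1_PRIM_FAN         2290      [ENABLED]    OK
CPU0_SEC_FAN          0         [DISABLED]   OK
CPU1_SEC_FAN          0         [DISABLED]   OK
IO0_PRIM_FAN          0         [DISABLED]   ERROR
IO1_PRIM_FAN          0         [DISABLED]   ERROR
IO0_SEC_FAN           3947      [ENABLED]    OK
IO1_SEC_FAN           3896      [ENABLED]    OK
IO_BRIDGE_PRIM_FAN    3333      [ENABLED]    OK
IO_BRIDGE_SEC_FAN     0         [DISABLED]   OK
"""

for bank, rpm, status in failed_fans(FAN_BANK.splitlines()):
    print(f"{bank}: {rpm} RPM, {status}")
```

The filter keys on the state column rather than on the speed: the secondary CPU fans also show 0 RPM, but they are reported as OK, so a speed of zero alone is not treated as a fault here.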


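II. Fault Resolution

2.1. Prerequisites

Caution

Make sure the system is shut down and the power is disconnected.

Wear an anti-static wrist strap while working.

Back up your data before adding or replacing hardware components. An incorrectly installed part may cause data loss.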


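2.2. Preparation

Preparation checklist

Type       Item                                          Status
---------  --------------------------------------------  ------
Hardware   One laptop                                    Ready
           One serial cable                              Ready
           One flat-head and one Phillips screwdriver    Ready
           One anti-static wrist strap                   Ready
           Two IO_PRIM_FAN units                         Ready
Software   -                                             -
Other      -                                             -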

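2.3. Procedure

Procedure checklist

No.  Step                                                         Details                                                                         Status
---  -----------------------------------------------------------  ------------------------------------------------------------------------------  ------
1    Confirm the system is shut down                              Advise the customer to back up applications and business data
2    Run POST diagnostics                                         Pinpoint the exact location of the fault
3    Put on the anti-static wrist strap                           Confirm the strap is worn and attached to an unpainted part of the cabinet
4    Disconnect power                                             Disconnect the primary and secondary power supplies
5    Remove the service access cover
6    Remove the processor board
7    Place the removed processor board on an anti-static surface
8    Open and remove the processor front cover
9    Locate the IO_PRIM_FAN
10   Take the new IO_PRIM_FAN out of its anti-static packaging
11   Install the IO_PRIM_FAN
12   Reinstall the processor board
13   Confirm the fault is cleared                                 Confirm the newly replaced hardware raises no alarms
                                                                  Confirm the new hardware is ready in the system
                                                                  Have the user confirm that applications and business data are unaffected
14   Wrap up                                                      Clean up the site and finish the work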



III. Reference Information

[Image: wKioL1Lciw2xO2vpAAJX1l528HM425.jpg]

[Image: wKiom1Lci0iBsuPjAAWTkOkU5B4607.png]

[Image: wKiom1Lci2GyvHPDAAKoYp2byGI598.jpg]