| SRDB ID | Synopsis | Date |
| 48491 | Sun Fire[TM] 12K/15K: Dstop: CP0 demand bus parity error | 1 Nov 2002 |
| Status | Issued |
| Description |
- Problem Statement:
Dstop: CP[01] demand bus parity error
- Symptoms:
The 'wfail' output is similar to the following:
01 redxl> dumpf load dsmd.dstop.020506.2128.46
02 Created Mon May 6 21:28:47 2002
03 By hpost v. 1.2 Generic 112488-03 Feb 15 2002 13:40:50 executing as pid=6862
04 On ssc name = rasputin-sc0.SD_RASCAL.West.Sun.COM
05 Domain = 0=A Platform = rasputin
06 Boards in dump: master SC CPs/CSBs[1:0]: 3
07 EXB[17:0]: 12100
08 Slot0[17:0]: 12100
09 Slot1[17:0]: 12100
10 -D option, -d
11 "DSMD DomainStop Dump"
12 0 errors occurred while creating this dump.
13 redxl> wfail
14 SDI EX08/S0 Master_Stop_Status0[31:0] = E004000F
15 MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
16 SDI EX08/S0 Dstop0[31:0] = 00418040
17 Dstop0[16]: D DARB texp requests all Dstop (M)
18 Dstop0[22]: D 1E SDI internal CP port requested Dstop
19 SDI EX08/S0 CP_Error0[31:0] = 2004A004 Mask = 580067FF
20 CPErr0[18]: D 1E CP0 demand bus parity error (M)
21 cp0_{dembusp,texp,unload,demand[1:0]} = 01
22 CPErr0[29]: D 1E CP arbiter lockstep consistency check error (M)
23 cp0_{dembusp,texp,unload,demand[1:0]} = 01
24 cp1_{dembusp,texp,unload,demand[1:0]} = 00
25 FAIL EXB EX8: Dstop/Rstop detected by SDI EX8/S0.
26 Primary service FRU is EXB EX8.
27 FAIL EXB EX8 with CP C0: Dstop/Rstop detected by SDI.
28 Primary service FRU is EXB EX8.
29 Secondary service FRU is CSB C0 or the logic centerplane.
30 SDI EX13/S0: All SDI is DStopped and RStopped, requested by DARB.
31 SDI EX16/S0: All SDI is DStopped and RStopped, requested by DARB.
32 DARB C0: enabled ports (expanders) [17:0]: 16100
33 DARB C0: other darb req Dstop+Rstop for exps[17:0]: 00100
34 DARB C1: enabled ports (expanders) [17:0]: 16100
35 DARB C1: other darb req Dstop+Rstop for exps[17:0]: 00100
SOLUTION SUMMARY:
- Troubleshooting:
The dump header shows that this Dstop was generated by dsmd (lines 10, 11)
while a domain was active. This is also evident from the dump file name:
dsmd.dstop files are created by dsmd as part of an ASR. Walking the
error chain:
- Master SDI on EX8 calls for Dstop as directed by itself (line 18)
- Master SDI on EX8 reports errors in the CPErr0 register (lines 20,22)
- EX8 is FAILed from the configuration and named as the primary FRU (lines 25, 26)
- EX8's low centerplane half is FAILed from the configuration (line 27)
- EX8 is named as the primary FRU, and CSB C0 (CS0) or the logic centerplane
  as the secondary FRU (lines 28, 29)
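When walking the chain, it can help to check which bits are actually set in each register value. The following is a minimal sketch, not part of 'wfail' or redx. The helper name set_bits is hypothetical, and reading the EXB[17:0] header value as a hexadecimal bitmask with one bit per expander is an assumption, although it agrees with EX08, EX13, and EX16 appearing in the output (lines 14, 30, 31).

```python
def set_bits(value, width=32):
    """Return the positions of all bits set in value, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]

# Register values copied from the 'wfail' output (lines 16 and 19).
dstop0 = 0x00418040
cp_error0 = 0x2004A004

print(set_bits(dstop0))     # [6, 15, 16, 22]; wfail annotates bits 16 and 22
print(set_bits(cp_error0))  # [2, 13, 15, 18, 29]; wfail annotates bits 18 and 29

# Assumption: EXB[17:0] in the dump header (line 07) is a hex bitmask,
# one bit per expander.
print(set_bits(0x12100, width=18))  # [8, 13, 16]
```

Only the bits that 'wfail' annotates are explained in this article; the sketch makes no claim about the meaning of the other set bits or of the Mask value.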
Each DARB sources a parity-protected demand signal to an expander's Master
SDI. The demand tells the SDI to expect data to arrive four cycles later
(four and five cycles later if the centerplane is degraded). The 'wfail'
output shows the captured demand signals (lines 23, 24). The low two bits
of each value make up the demand:
00 = target is slot 0 [cp1 above]
01 = target is slot 1 [cp0 above]
10 = not used
11 = idle state (no demand event in progress)
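The encoding above can be applied to a captured value with a few lines of code. This is a sketch under stated assumptions: the function name decode_capture is hypothetical, the captured values are treated as hexadecimal, and the bit positions of dembusp (bit 4) and texp (bit 3) are inferred from the order of names in the cpN_{...} label. Only the demand bits (1:0) and the unload bit (2) are confirmed by this article.

```python
# Demand encodings as listed in the article.
DEMAND_MEANING = {
    0b00: "target is slot 0",
    0b01: "target is slot 1",
    0b10: "not used",
    0b11: "idle (no demand event in progress)",
}

def decode_capture(value):
    """Split a cpN_{dembusp,texp,unload,demand[1:0]} value into named fields.

    Assumed layout, leftmost name = highest bit:
    dembusp = bit 4, texp = bit 3, unload = bit 2, demand = bits 1:0.
    """
    demand = value & 0b11
    return {
        "dembusp": (value >> 4) & 1,
        "texp": (value >> 3) & 1,
        "unload": (value >> 2) & 1,
        "demand": demand,
        "meaning": DEMAND_MEANING[demand],
    }

cp0 = decode_capture(0x01)  # value from line 23
cp1 = decode_capture(0x00)  # value from line 24
print("cp0:", cp0["meaning"])  # cp0: target is slot 1
print("cp1:", cp1["meaning"])  # cp1: target is slot 0
```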
In this example, DARB0 indicated slot 1 as the target (line 23) while DARB1
indicated slot 0 as the target (line 24). The demand signal from DARB0 had
a parity error (line 20), which accounts for the flip in bit 0. This is
also why 'wfail' fails centerplane half 0 from the configuration.
Because the DARBs disagree, the SDI also sees a loss of lockstep in the
centerplane and records the CP arbiter lockstep error (line 22). The
lockstep error is a result of the parity error.
- Resolution:
Repair/replace EX8.
If errors persist, investigate CS0, since it drives the low half of the
centerplane. If CS0 has no fault history, repair/replace the centerplane.
- Summary of part numbers and patch IDs
http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html
- References and bug IDs
SunSolve Article 48122
SunSolve Article 48223
DARB ASIC Specification
- Additional background information:
By using the capture information in the SDI, the specific bit in error in the
demand signal can be determined. Another example of a demand bus parity error:
36 SDI EX08/S0 CP_Error0[31:0] = 2004A004 Mask = 580067FF
37 CPErr0[18]: D 1E CP0 demand bus parity error (M)
38 cp0_{dembusp,texp,unload,demand[1:0]} = 04
39 CPErr0[29]: D 1E CP arbiter lockstep consistency check error (M)
40 cp0_{dembusp,texp,unload,demand[1:0]} = 04
41 cp1_{dembusp,texp,unload,demand[1:0]} = 00
Here, bit 2 differs, indicating a parity error on the unload signal from DARB0.
The unload signal is a unidirectional signal sent from the DARB to the Master SDI.
During operation, the SDI keeps track of the DARB input buffer fullness. The
unload signal asserted by the DARB is an indicator to the SDI that the DARB has
unloaded a prior request, thus freeing up a buffer slot.
This does not change the diagnosis listed earlier.
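The comparison used in both examples can be sketched as an XOR of the two captured values. This is an illustration only: the function name differing_signals is hypothetical, and the bit positions for dembusp and texp are assumed from the order of names in the cpN_{...} label. It also assumes, as in both examples here, that the value from the DARB without the parity error (cp1) is the correct one.

```python
# Assumed bit layout of cpN_{dembusp,texp,unload,demand[1:0]}.
FIELD_BITS = {
    4: "dembusp",
    3: "texp",
    2: "unload",
    1: "demand[1]",
    0: "demand[0]",
}

def differing_signals(cp0, cp1):
    """Name the signals whose captured bits differ between the two DARBs."""
    diff = cp0 ^ cp1  # a 1 bit marks a position where the DARBs disagree
    return [name for bit, name in FIELD_BITS.items() if (diff >> bit) & 1]

print(differing_signals(0x01, 0x00))  # first example:  ['demand[0]']
print(differing_signals(0x04, 0x00))  # second example: ['unload']
```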
- Meta-Data/Problem categorization:
Product/Platform: SF12K/SF15K
Category:
- Keywords
15K, 12K, SF15K, SF12K, Sun Fire 15K, Enterprise, Server, Sun Fire 12K,
starcat, dstop, demand bus parity error
INTERNAL SUMMARY:
SUBMITTER: Scott Davenport
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS: