| SRDB ID | Synopsis | Date |
| 48197 | Sun Fire[TM] 12K/15K: Dstop: Select command parity error | 31 Oct 2002 |
| Status | Issued |
| Description |
- Problem Statement:
Dstop: Select command parity error
- Symptoms:
'wfail' output reports something similar to the following:
01 redxl> dumpf load dsmd.dstop.020429.0840.40
02 Created Mon Apr 29 08:40:41 2002
03 By hpost v. 1.2 Generic 112488-03 Feb 15 2002 13:40:50 executing as pid=6794
04 On ssc name = rasputin-sc0.SD_RASCAL.West.Sun.COM
05 Domain = 0=A Platform = rasputin
06 Boards in dump: master SC CPs/CSBs[1:0]: 1 Requested/not enabled: 2
07 EXB[17:0]: 12100
08 Slot0[17:0]: 12100
09 Slot1[17:0]: 12100
10 'Not enabled' refers to the Console Bus master port on the parent board.
11 -D option, -d
12 "DSMD DomainStop Dump"
13 0 errors occurred while creating this dump.
14 redxl> wfail
15 SDI EX08/S0 Master_Stop_Status0[31:0] = C00400CF
16 MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
17 SDI EX08/S0 Dstop0[31:0] = 30019000
18 Dstop0[16]: D DARB texp requests all Dstop (M)
19 Dstop0[28]: D 1E Slot0 asserted Error, enabled to cause Dstop (M)
20 Dstop0[29]: D Slot1 asserted Error, enabled to cause Dstop (M)
21 EPLD SB08 Err1_Dom0: Mask= 00 Err= 41 1stErr= 40
22 Err1[0]: Error reported by AR
23 Err1[6]: 1E+ Error reported by BBC0
24 BBC SB08/BB0 Device_Err_Stat[31:0] = 80008010
25 DevErr[ 4]: 1E DCDS asserted error
26 DCDSs SB08/DG0 slice 5 CPU[1:0]_Cmd_Err[22:0] = 008008 008008
27 C0CE[ 3]: 1E C0 Select command parity error
28 C1CE[ 3]: 1E C1 Select command parity error
29 FAIL Port SB8/P0: Dstop detected by DCDS.
30 Primary service FRU is Slot SB8.
31 FAIL Port SB8/P1: Dstop detected by DCDS.
32 Primary service FRU is Slot SB8.
33 SDI EX13/S0: All SDI is DStopped and RStopped, requested by DARB.
34 SDI EX16/S0: All SDI is DStopped and RStopped, requested by DARB.
SOLUTION SUMMARY:
- Troubleshooting:
The dump header tells us that this Dstop was generated by dsmd (lines 11,12)
while a domain was active. This is also evident from the dumpf file name:
dsmd.dstop files are created by dsmd as part of an ASR. Walking the
error chain:
- The SDI on EX8 calls for Dstop as directed by its Slot 0 board, SB8 (line 19).
There is also a Slot 1 error asserted, but it is not the first error (line 20).
- The EPLD on SB8 indicates that BBC0 asserted the first error (line 23).
- BBC0 indicates that the DCDS asserted error (line 25).
- DCDS slice 5 reports select command parity errors (lines 26-28).
- The DCDSs off of BBC0 serve processors 0 and 1. Hence, 'wfail' FAILs
SB8/P0 and SB8/P1 (lines 29,31).
- The FRU called out is SB8 (lines 30,32).
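The bit numbers that 'wfail' annotates in the chain above can be cross-checked against the raw register values. The following is a minimal sketch in Python, not part of redx; the helper name set_bits is hypothetical. It only lists which bit positions are set. The meaning of any bit that 'wfail' did not annotate is not given in this article.

```python
def set_bits(value: int) -> list[int]:
    """Return the positions of all bits set in value, lowest first."""
    return [pos for pos in range(value.bit_length()) if (value >> pos) & 1]


# Register values copied from the wfail output (lines 17, 21, 24, 26).
registers = {
    "Dstop0": 0x30019000,
    "Err1 Err": 0x41,
    "Err1 1stErr": 0x40,
    "Device_Err_Stat": 0x80008010,
    "CPU0_Cmd_Err": 0x008008,
}

for name, value in registers.items():
    print(f"{name:16} = {value:08X} -> bits {set_bits(value)}")
```

Dstop0 shows bits 16, 28 and 29 among those set, 1stErr shows only bit 6 (BBC0), Device_Err_Stat includes bit 4 (DCDS), and the command error register includes bit 3 (select command parity error), matching the annotated lines.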
The DCDSs are slave ASICs, and all transactions are controlled via select
commands sourced by processors. The select lines are parity protected. DCDS
slice 5 is configured as the parity checker, hence its detection of the error.
The select line pathways between the processors and DCDSs are entirely contained
within the system board, so the board is the FRU. In the general case, this error
could also occur on a MaxCPU board.
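As a general illustration of how a parity checker such as DCDS slice 5 can detect a corrupted command, the sketch below computes a parity bit over a command word and shows that a single flipped bit is caught. Everything specific here is an assumption made for illustration: the 8-bit command value, the use of even parity, and the function names. The actual width and parity sense of the select lines are not given in this article.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if word has an odd number of 1 bits, else 0."""
    return bin(word).count("1") & 1


def parity_ok(word: int, received_parity: int) -> bool:
    """True if the received parity bit matches the one computed from word."""
    return parity_bit(word) == received_parity


command = 0b1011_0010          # hypothetical select command value
sent_parity = parity_bit(command)

# An undisturbed command passes the check.
print(parity_ok(command, sent_parity))

# Flipping any single bit on the way to the checker is detected.
corrupted = command ^ (1 << 3)
print(parity_ok(corrupted, sent_parity))
```

A single parity bit detects any odd number of flipped bits but cannot identify which line failed, which is consistent with the whole board being called out as the FRU.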
- Resolution:
Repair/replace SB8.
In general, repair/replace the board reporting the error.
- Summary of part numbers and patch IDs
http://infoserver.central.sun.com/data/syshbk/Devices/System_Board/SYSBD_SunFire_USIIICu.html
http://infoserver.central.sun.com/data/sshandbook/Devices/CPU_Module/UltraSPARC_MaxCPU.html
- References and bug IDs
SunSolve Article 48122
15K System Controller Specification
- Additional background information:
In the dump header, there's an indication of communication problems to CSB 1
(line 06). This indicates that console bus access to this component was
disabled (line 10). Console bus fans out from the SCM ASICs. The SCMs are
physically located on the CSBs, but are part of the SC's power domain. So
even if a CSB is powered off, the SCMs still have power.
Examining the SCMs, we do not see CSB 1 enabled (lines 40, 56):
35 redxl> shscm 0
36 Note: Data is displayed from the currently loaded dump file.
37 scm 0 Component ID = 215C007D
38 DevTemp[8:0] = 041: Valid 43.83 DegC
39 CBus_Config[31:0] = 3FFF8103
40 0x103 MasterPortEnbl[9:0] CbCnf[9:0] EXBs 06100
41 0 CBH_SlavePortEnbl CbCnf[13]
42 0 ShortTimeout CbCnf[14]
43 0x7FFF PortErrMask[14:0] CbCnf[29:15]
44 0 DisableArb CbCnf[30] With other SC's CBH
45 0 ForceBusy CbCnf[31] To other SC's CBH
46 ResetStat[26:0] = 01000000
47 SCM_Mapping_Reg[5:0] = 10
48 CBus_PortErr[ 0][25:0] = 0000000 (EXB 14 (master))
49 CBus_PortErr[ 1][25:0] = 0000000 (EXB 13 (master))
50 CBus_PortErr[ 8][25:0] = 0000000 (EXB 8 (master))
51 redxl> shscm 1
52 Note: Data is displayed from the currently loaded dump file.
53 scm 1 Component ID = 215C007D
54 DevTemp[8:0] = 03F: Valid 42.50 DegC
55 CBus_Config[31:0] = 3FFF8030
56 0x030 MasterPortEnbl[9:0] CbCnf[9:0] EXBs 10000 CSBs 1
57 0 CBH_SlavePortEnbl CbCnf[13]
58 0 ShortTimeout CbCnf[14]
59 0x7FFF PortErrMask[14:0] CbCnf[29:15]
60 0 DisableArb CbCnf[30] With other SC's CBH
61 0 ForceBusy CbCnf[31] To other SC's CBH
62 ResetStat[26:0] = 01000000
63 SCM_Mapping_Reg[5:0] = 11
64 CBus_PortErr[ 4][25:0] = 0000000 (CSB 0 (master))
65 CBus_PortErr[ 5][25:0] = 0000000 (EXB 16 (master))
The platform logs for this system should be investigated to determine why
CSB1 was deconfigured.
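The board masks in the dump header and in the 'shscm' output can be expanded into board numbers to confirm which boards are present and which CSB is missing. This is a minimal sketch, assuming that bit n of each mask corresponds to board n. That assumption is suggested by the [17:0] and [1:0] ranges and is consistent with 'wfail' naming EX08, EX13 and EX16. The helper name boards_in_mask is hypothetical.

```python
def boards_in_mask(mask: int, prefix: str) -> list[str]:
    """Expand a hex board mask into names, assuming bit n means board n."""
    return [f"{prefix}{n}" for n in range(mask.bit_length()) if (mask >> n) & 1]


# Values copied from the dump header (lines 06, 07) and shscm (lines 40, 56).
print("Domain EXBs:     ", boards_in_mask(0x12100, "EX"))
print("CSBs in dump:    ", boards_in_mask(0x1, "CS"))
print("CSBs not enabled:", boards_in_mask(0x2, "CS"))
print("scm 0 EXB ports: ", boards_in_mask(0x06100, "EX"))
print("scm 1 EXB ports: ", boards_in_mask(0x10000, "EX"))
```

Under that assumption the domain mask expands to EX8, EX13 and EX16, and the requested but not enabled CSB mask expands to CSB 1 only.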
- Meta-Data/Problem categorization:
Product/Platform: SF12K/SF15K
Category:
- Keywords
15K, 12K, SF15K, SF12K, starcat, dstop, Select command parity error
SUBMITTER: Scott Davenport
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000