| SRDB ID | Synopsis | Date |
| 48187 | Sun Fire[TM] 12K/15K: Dstop: Steering bus A input parity error | 31 Oct 2002 |
| Status | Issued |
| Description |
- Problem Statement:
Dstop: Steering bus A input parity error
- Symptoms:
'wfail' output reports something similar to the following:
01 redxl> dumpf load dsmd.dstop.020410.1454.52
02 Created Wed Apr 10 14:54:53 2002
03 By hpost v. 1.2 Generic 112488-03 Feb 15 2002 13:40:50 executing as pid=26063
04 On ssc name = f15k-02-sc0-hme0.
05 Domain = 0=A = omis320 Platform = f15k-02
06 Boards in dump: master SC CPs/CSBs[1:0]: 3
07 EXB[17:0]: 3FFFF
08 Slot0[17:0]: 3FFFF
09 Slot1[17:0]: 3FFFF
10 -D option, -d
11 "DSMD DomainStop Dump"
12 0 errors occurred while creating this dump.
13 redxl> wfail
14 SDI EX00/S0: All SDI is DStopped and RStopped, requested by DARB.
15 SDI EX01/S0: All SDI is DStopped and RStopped, requested by DARB.
16 SDI EX02/S0: All SDI is DStopped and RStopped, requested by DARB.
17 SDI EX03/S0: All SDI is DStopped and RStopped, requested by DARB.
18 SDI EX04/S0: All SDI is DStopped and RStopped, requested by DARB.
19 SDI EX05/S0: All SDI is DStopped and RStopped, requested by DARB.
20 SDI EX06/S0: All SDI is DStopped and RStopped, requested by DARB.
21 SDI EX07/S0: All SDI is DStopped and RStopped, requested by DARB.
22 SDI EX08/S0: All SDI is DStopped and RStopped, requested by DARB.
23 SDI EX09/S0: All SDI is DStopped and RStopped, requested by DARB.
24 SDI EX10/S0: All SDI is DStopped and RStopped, requested by DARB.
25 SDI EX11/S0: All SDI is DStopped and RStopped, requested by DARB.
26 SDI EX12/S0: All SDI is DStopped and RStopped, requested by DARB.
27 SDI EX13/S0: All SDI is DStopped and RStopped, requested by DARB.
28 SDI EX14/S0: All SDI is DStopped and RStopped, requested by DARB.
29 SDI EX15/S0 Dstop1[31:0] = 00088008
30 Dstop1[19]: D 1E SDI Slave 2 requested all Dstop
31 SDI EX15/S0 Master_Stop_Status0[31:0] = 3004000F
32 MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
33 SDI EX15/S0 Dstop0[31:0] = 00010001
34 Dstop0[16]: D DARB texp requests all Dstop (M)
35 SDI EX15/S2 Master_Stop_Status0[31:0] = 00000008
36 MStop0[3]: SDI is Recordstopped
37 SDI EX15/S2 Dstop0[31:0] = 00088008
38 Dstop0[19]: D 1E SDI internal core requested Dstop
39 SDI EX15/S2 Core_Error0[31:0] = 00108010 Mask = 7FE8FFFF
40 CoreErr0[20]: D 1E Steering bus A input parity error (S)
41 {steera_parin,steera_in[32:0]} = 0.00000020
42 FAIL EXB EX15: Dstop/Rstop detected by SDI EX15/S2.
43 Primary service FRU is EXB EX15.
44 SDI EX16/S0: All SDI is DStopped and RStopped, requested by DARB.
45 SDI EX17/S0: All SDI is DStopped and RStopped, requested by DARB.
46 DARB C0: enabled ports (expanders) [17:0]: 3FFFF
47 DARB C0: other darb req Dstop+Rstop for exps[17:0]: 08000
48 DARB C1: enabled ports (expanders) [17:0]: 3FFFF
49 DARB C1: other darb req Dstop+Rstop for exps[17:0]: 08000
SOLUTION SUMMARY:
- Troubleshooting:
The dump header tells us that this Dstop was generated by dsmd (lines 10-11) while a
domain was active. This is also evident from the dump file name: dsmd.dstop files are
created by dsmd as part of an ASR. Walking the error chain:
- EX15 reports the first error in the domain. The slave SDI2 requests the Dstop (line 30).
- The specific errors in SDI2 are reported next. We have a Steering bus A input parity
error (lines 39-41).
- All other expanders are error free. We can quickly determine this because these
expanders only have a single line of output in wfail.
- wfail then tells us to FAIL EX15 (lines 42-43) as the primary FRU.
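The walk above can be sketched in a few lines of Python. This is only an illustration, assuming the wfail output has been saved as text with this article's line-number prefixes removed. The function name triage and the rule of treating any SDI line that does not end in "requested by DARB." as carrying its own error detail are assumptions made for this example, not part of redx.

```python
import re

# Hand-copied excerpt of the wfail output above, line-number prefixes removed.
WFAIL = """\
SDI EX14/S0: All SDI is DStopped and RStopped, requested by DARB.
SDI EX15/S0 Dstop1[31:0] = 00088008
Dstop1[19]: D 1E SDI Slave 2 requested all Dstop
SDI EX15/S2 Core_Error0[31:0] = 00108010 Mask = 7FE8FFFF
CoreErr0[20]: D 1E Steering bus A input parity error (S)
FAIL EXB EX15: Dstop/Rstop detected by SDI EX15/S2.
Primary service FRU is EXB EX15.
SDI EX16/S0: All SDI is DStopped and RStopped, requested by DARB.
"""

# Expanders that were merely stopped by the DARB report this single line.
BYSTANDER = "requested by DARB."


def triage(text):
    """Return (expanders with their own error detail, expanders on FAIL lines)."""
    suspects, frus = [], []
    for line in text.splitlines():
        line = line.strip()
        sdi = re.match(r"SDI (EX\d+)/S\d", line)
        if sdi and not line.endswith(BYSTANDER) and sdi.group(1) not in suspects:
            suspects.append(sdi.group(1))
        fail = re.match(r"FAIL EXB (EX\d+):", line)
        if fail:
            frus.append(fail.group(1))
    return suspects, frus


print(triage(WFAIL))
```

On this excerpt both lists contain only EX15, which agrees with the manual reading.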
The steering busses direct data flow through the SDI. Steering is generated in the Master
SDI and driven to the slave SDIs. The steering tells the SDIs where to look for the next
transfer of data. For example, if the centerplane wants to transfer to Slot 0, steering
tells the Slot 0 port of the SDIs to take data from the Centerplane. Referring back to
the wfail output, EX15/S0 is our Master SDI and EX15/S2 is the slave SDI reporting the
error. Thus, the parity error occurred between EX15/S0 and EX15/S2. Since the steering bus
is completely contained within the expander, EX15 is the faulty FRU.
- Resolution:
Replace the Expander reporting the steering parity error. In this example, replace EX15.
- Summary of part numbers and patch IDs
501-5179 Expander
- References and bug IDs
SunSolve Article 48122
- Additional background information:
Looking deeper, the history of the SDIs can be examined further to illustrate the parity
error. Let's start with the steering history on EX15/S0:
50 redxl> shsdi 15 0 steera
51 Note: Data is displayed from the currently loaded dump file.
52 SDI EX15/S0 Output history of Steer A
53 <----- STEERA ---->
54 STEERA STOP
55 [32:0] P DEMA P entry
56 1FFFFFFDF 1 1 1 0 old
57 1FFFFFF9F 0 1 1 1
58 1FFFFFFDF 1 1 1 2
59 1FFFFFF9F 0 1 1 3
60 1FFFFFFDF 1 1 1 4
61 1FFFFFF9F 0 1 1 5
62 1FFFFFFDF 1 1 1 6
63 1FFFFFF9F 0 1 1 7
64 1FFFFFFDF 1 1 1 8
65 1FFFFFF9F 0 1 1 9
66 1FFFFFFDF 1 1 1 10
67 1FFFFFF9F 0 1 1 11
68 1FFFFFFDF 1 1 1 12
69 1FFFFFF9F 0 1 1 13
70 1FFFFFFDF 1 1 1 14
71 1FFFFFF9F 0 1 1 15
72 1FFFFFFDF 1 1 1 16
73 1FFFFFF9F 0 1 1 17
74 1FFFFFFDF 1 1 1 18
75 1FFFFFF9F 0 1 1 19
76 1FFFFFFDF 1 1 1 20
77 1FFFFFF9F 0 1 1 21
78 1FFFFFFDF 1 1 1 22
79 1FFFFFF9F 0 1 1 23
80 1FFFFFFDF 1 1 1 24
81 1FFFFFF9F 0 1 1 25
82 1FFFFFFDF 1 0 0 26<
83 1FFFFFF9F 0 1 1 27
84 1FFFFFFDF 1 1 1 28
85 1FFFFFF9F 0 1 1 29
86 1FFFFFFDF 1 1 1 30
87 1FFFFFF9F 0 1 1 31 new
The cycle of interest is cycle 26 (line 82), tagged with a <, where we have a steering
value of 1FFFFFFDF and a parity of 1. The steering busses are protected by even parity,
so already we've got a disconnect: 1FFFFFFDF has 32 1's, so parity should be zero. Now
for the steering history on EX15/S2:
88 redxl> shsdi 15 2 steera
89 Note: Data is displayed from the currently loaded dump file.
90 SDI EX15/S2 Output history of Steer A
91 <----- STEERA ---->
92 STEERA STOP
93 [32:0] P DEMA P entry
94 1FFFFFFFF 1 1 1 0 old
95 1FFFFFFFF 1 1 1 1
96 1FFFFFFFF 1 1 1 2
97 1FFFFFFFF 1 1 1 3
98 1FFFFFFFF 1 1 1 4
99 1FFFFFFFF 1 1 1 5
100 1FFFFFFFF 1 1 1 6
101 1FFFFFFFF 1 1 1 7
102 1FFFFFFFF 1 1 1 8
103 1FFFFFFFF 1 1 1 9
104 1FFFFFFFF 1 1 1 10
105 1FFFFFFFF 1 1 1 11
106 1FFFFFFFF 1 1 1 12
107 1FFFFFFFF 1 1 1 13
108 1FFFFFFFF 1 1 1 14
109 1FFFFFFFF 1 1 1 15
110 1FFFFFFFF 1 1 1 16
111 1FFFFFFFF 1 1 1 17
112 1FFFFFFFF 1 1 1 18
113 1FFFFFFFF 1 1 1 19
114 1FFFFFFFF 1 1 1 20
115 1FFFFFFFF 1 1 1 21
116 1FFFFFFFF 1 1 1 22
117 1FFFFFFFF 1 1 1 23
118 1FFFFFFFF 1 1 1 24
119 1FFFFFFFF 1 1 1 25
120 1FFFFFFFF 1 1 1 26<
121 1FFFFFFFF 1 1 1 27
122 1FFFFFFFF 1 1 1 28
123 1FFFFFFFF 1 1 1 29
124 1FFFFFFFF 1 1 1 30
125 1FFFFFFFF 1 1 1 31 new
On cycle 26 (line 120), all values are high. The steering value is 1FFFFFFFF with
a parity of 1. This parity is correct, since 1FFFFFFFF has 33 1's. Comparing
1FFFFFFDF (EX15/S0) to this, bit 5 is flipped. The bit flip shows up in SDI2
(line 137) as the XOR of the two steering values on that cycle, 00000020.
126 redxl> shsdi -e 15 2
127 Note: Data is displayed from the currently loaded dump file.
128 SDI EX15/S2 Component ID = 64317049
129 Master_Stop_Status0[31:0] = 00000008
130 MStop0[3]: SDI is Recordstopped
131 Master_Stop_Status1[31:0] = 7F7F0000
132 Dstop0[31:0] = 00088008
133 Dstop0[19]: D 1E SDI internal core requested Dstop
134 Recordstop0[31:0] = 00000000
135 Core_Error0[31:0] = 00108010 Mask = 7FE8FFFF
136 CoreErr0[20]: D 1E Steering bus A input parity error (S)
137 {steera_parin,steera_in[32:0]} = 0.00000020
138 Core_ErrData[4:2][31:0] = 00000000 00080600 00000020
139 Core_ErrData[1:0][31:0] = 00000002 02DD3000
140 Core_Error1[31:0] = 00000000 Mask = FFFFFFFF
141 CP_Error0[31:0] = 00000000 Mask = 7F3F67FF
142 Slot0_Error0[31:0] = 00000000 Mask = 703FFFFF
143 Slot0_Error1[31:0] = 00000000 Mask = FFFF4FFF
144 Slot0_Error2[31:0] = 00000000 Mask = FFFFFFFF
145 Slot1_Error0[31:0] = 00000000 Mask = 703FFFFF
146 Slot1_Error1[31:0] = 00000000 Mask = FFFF4FFF
147 Slot1_Error2[31:0] = 00000000 Mask = FFFFFFFF
We also saw this in the initial wfail (line 41).
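The parity and XOR arithmetic above can be checked with a short Python sketch. The values are copied from cycle 26 of the two steering histories. The helper names ones and even_parity_bit are made up for this example, and the 8-digit hex formatting is chosen only to mirror the display in line 137.

```python
def ones(value):
    """Count the 1 bits in a steering value."""
    return bin(value).count("1")


def even_parity_bit(value):
    """Parity bit that makes the total count of 1s (data plus parity) even."""
    return ones(value) % 2


s0 = 0x1FFFFFFDF  # EX15/S0 steering value on cycle 26 (line 82)
s2 = 0x1FFFFFFFF  # EX15/S2 steering value on cycle 26 (line 120)

print(ones(s0), even_parity_bit(s0))  # 32 ones, so even parity expects 0
print(ones(s2), even_parity_bit(s2))  # 33 ones, so even parity expects 1

diff = s0 ^ s2                 # bits that differ between the two SDIs
print(f"{diff:08X}")           # 00000020, as reported in line 137
print(diff.bit_length() - 1)   # 5, the index of the single flipped bit
```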
As an aside, a steering state of all 1's is the idle state for the bus. If we look
at the steering history for the remaining SDIs on EX15 (left as an exercise for the
reader), we'd see that all of the other slave SDIs are in the idle state.
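For anyone attempting that exercise, the idle check can be sketched in Python as follows. This assumes the shsdi history rows have already been reduced to (entry, steering value) pairs. The name non_idle_entries and the two sample histories, rebuilt here from the listings above, are illustrative only.

```python
IDLE = (1 << 33) - 1  # all 33 bits of steera[32:0] high, i.e. 0x1FFFFFFFF


def non_idle_entries(history):
    """Return the entry numbers whose steering value is not the idle pattern."""
    return [entry for entry, value in history if value != IDLE]


# EX15/S2 listing: every entry is 1FFFFFFFF.
s2_history = [(n, 0x1FFFFFFFF) for n in range(32)]
# EX15/S0 listing: even entries are 1FFFFFFDF, odd entries are 1FFFFFF9F.
s0_history = [(n, 0x1FFFFFFDF if n % 2 == 0 else 0x1FFFFFF9F) for n in range(32)]

print(non_idle_entries(s2_history))       # [] : the slave saw only idle
print(len(non_idle_entries(s0_history)))  # 32 : the master was never idle
```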
- Meta-Data/Problem categorization:
Product/Platform: SF12K/SF15K
Category:
- Keywords
15K, 12K, SF15K, SF12K, starcat, dstop, Steering bus A input parity error
INTERNAL SUMMARY:
SUBMITTER: Scott Davenport
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS: