| SRDB ID | Synopsis | Date |
| 48204 | Sun Fire[TM] 12K/15K: Dstop: Data path command parity error detected by SDI(M) | 31 Oct 2002 |
| Status | Issued |
| Description |
- Problem Statement:
Dstop: Data path command parity error detected by SDI(M).
- Symptoms:
The redx wfail command output reports the following failure signature:
redxl> dumpf load dsmd.dstop.020507.2037.16
Created Tue May 7 20:37:18 2002
By hpost v. 1.2 Generic 112488-04 Mar 18 2002 14:43:00 executing as pid=4959
On ssc name = rasputin-sc0.SD_RASCAL.West.Sun.COM
Domain = 0=A Platform = rasputin
Boards in dump: master SC CPs/CSBs[1:0]: 3
EXB[17:0]: 12100
Slot0[17:0]: 12100
Slot1[17:0]: 12100
-D option, -d
"DSMD DomainStop Dump"
0 errors occurred while creating this dump.
redxl> wfail
SDI EX08/S0 Master_Stop_Status0[31:0] = B004000F
MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
SDI EX08/S0 Dstop0[31:0] = 00098008
Dstop0[16]: D DARB texp requests all Dstop (M)
Dstop0[19]: D 1E SDI internal core requested Dstop
SDI EX08/S0 Core_Error0[31:0] = 00208020 Mask = 0051FFFF
CoreErr0[21]: D 1E AXQ Data path command parity error (M)
{dat_cmdp,dat_cmd[23:0]} = 0000001. {retired,half_used} = 3
NOTE: Compare dat+par to AXQ out history to isolate 1-bit errors.
FAIL EXB EX8: Dstop/Rstop detected by AXQ.
Primary service FRU is EXB EX8.
SDI EX13/S0: All SDI is DStopped and RStopped, requested by DARB.
SDI EX16/S0: All SDI is DStopped and RStopped, requested by DARB.
DARB C0: enabled ports (expanders) [17:0]: 16100
DARB C0: other darb req Dstop+Rstop for exps[17:0]: 00100
DARB C1: enabled ports (expanders) [17:0]: 16100
DARB C1: other darb req Dstop+Rstop for exps[17:0]: 00100
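The expander fields in the dump header and in the DARB lines of the wfail output are hex bitmasks over expanders [17:0]. The following is a minimal Python sketch for reading them, assuming bit n corresponds to expander n (an assumption, though it agrees with the SDI EX08, EX13, and EX16 lines in this output). The helper name set_bits is hypothetical and is not part of redx.

```python
def set_bits(hex_value, width=18):
    """Return the positions of the bits set in a hex string, lowest bit first."""
    value = int(hex_value, 16)
    return [bit for bit in range(width) if (value >> bit) & 1]

# Values copied from the dump header and the wfail output.
boards_in_dump = set_bits("12100")      # EXB[17:0]
darb_enabled = set_bits("16100")        # DARB enabled ports (expanders)
darb_requested = set_bits("00100")      # other darb req Dstop+Rstop for exps
dstop0 = set_bits("00098008", width=32) # SDI EX08/S0 Dstop0

print(boards_in_dump)   # [8, 13, 16]
print(darb_enabled)     # [8, 13, 14, 16]
print(darb_requested)   # [8]
print(dstop0)           # [3, 15, 16, 19]
```

Under that assumption, the single expander for which the DARBs requested Dstop+Rstop is expander 8, which matches the FAIL EXB EX8 line. The Dstop0 result includes bits 16 and 19, the two bits that wfail annotates; this document does not describe the other set bits.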
redxl> shsdi -e 8
Note: Data is displayed from the currently loaded dump file.
SDI EX08/S0 Component ID = 64317049
Master_Stop_Status0[31:0] = B004000F
MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
Master_Stop_Status1[31:0] = E8E8000E
0x08 CP1StopExp[4:0] MSS1[20:16]
3 CP1StopSlot[0:1] MSS1[22:21] Dstop is 1st stop
1 CP1StopInfoValid MSS1[23]
0x08 CP0StopExp[4:0] MSS1[28:24]
3 CP0StopSlot[0:1] MSS1[30:29] Dstop is 1st stop
1 CP0StopInfoValid MSS1[31]
Dstop0[31:0] = 00098008
Dstop0[16]: D DARB texp requests all Dstop (M)
Dstop0[19]: D 1E SDI internal core requested Dstop
Dstop1[31:0] = 00000000
Recordstop0[31:0] = 00018001
Rstop0[16]: R 1E DARB texp request Recordstop (M)
Recordstop1[31:0] = 00000000
Core_Error0[31:0] = 00208020 Mask = 0051FFFF
CoreErr0[21]: D 1E AXQ Data path command parity error (M)
{dat_cmdp,dat_cmd[23:0]} = 0000001. {retired,half_used} = 3
NOTE: Compare dat+par to AXQ out history to isolate 1-bit errors.
Core_ErrData[4:2][31:0] = 00000000 00080700 00000060
Core_ErrData[1:0][31:0] = 00000007 00001001
Core_Error1[31:0] = 00000000 Mask = FFFFFFFF
Sysreg_Error[31:0] = 00000000 Mask = 780377FF
STB_Error[31:0] = 00000000 Mask = 7F00FFFF
CP_Error0[31:0] = 00000000 Mask = 580067FF
CP_Error1[31:0] = 00000000 Mask = 7FFCFFFF
Slot0_Error0[31:0] = 00000000 Mask = 7000FFFF
Slot0_Error1[31:0] = 00000000 Mask = 31444EBF
Slot0_Error2[31:0] = 00000000 Mask = 7FFCFFFF
Slot1_Error0[31:0] = 00000000 Mask = 3000FFFF
Slot1_Error1[31:0] = 00000000 Mask = 31404EBF
Slot1_Error2[31:0] = 00000000 Mask = 7FFCFFFF
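The Master_Stop_Status1 breakdown printed by shsdi can be reproduced by hand from the raw register value. The following is a minimal Python sketch that uses only the bit ranges shown in the output (MSS1[20:16], MSS1[22:21], and so on). The helper name field is hypothetical.

```python
def field(value, hi, lo):
    """Extract bits [hi:lo], inclusive, from an integer register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

mss1 = 0xE8E8000E  # Master_Stop_Status1[31:0] from the shsdi output

decoded = {
    "CP1StopExp": field(mss1, 20, 16),
    "CP1StopSlot": field(mss1, 22, 21),
    "CP1StopInfoValid": field(mss1, 23, 23),
    "CP0StopExp": field(mss1, 28, 24),
    "CP0StopSlot": field(mss1, 30, 29),
    "CP0StopInfoValid": field(mss1, 31, 31),
}

for name, value in decoded.items():
    print(f"{name} = {value:#x}")
```

Both centerplane views report expander 0x08 with the valid bit set, in agreement with the values shsdi prints. The slot fields are labeled [0:1] in the redx output; the value 3 reads the same in either bit order, so this sketch makes no claim about which order redx uses.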
redxl> shaxq 8 h
Note: Data is displayed from the currently loaded dump file.
AXQ EX08 Ecc-compressed output history[6:0] to AMX, RMX, and SDI.
<---- AMX ----> RMX SDI DpCmd Sysreg
1.1 1.0 0.1 0.0 0 1 OE Ecc OE Ecc entry
15 15 15 15 05 05 1 68 0 49 0 old
15 15 15 15 05 05 1 68 0 49 1
15 15 15 15 05 05 1 68 0 49 2
15 15 15 15 05 05 1 68 0 49 3
15 15 15 15 05 05 1 68 0 49 4
15 15 15 15 05 05 1 68 0 49 5
15 15 15 15 05 05 1 68 0 49 6
15 15 15 15 05 05 1 68 0 49 7
15 15 15 15 05 05 1 68 0 49 8
15 15 15 15 05 05 1 68 0 49 9
15 15 15 15 05 05 1 68 0 49 10
15 15 15 15< 05 05< 1 68 0 49 11
15 15 15 15 05 05 1 68 0 49 12
15 15 15 15 05 05 1 68 0 49 13
15 15 15 15 05 05 1 68 0 49 14
15 15 15 15 05 05 1 68 0 49 15
15 15 15 15 05 05 1 68 0 49 16
15 15 15 15 05 05 1 68 0 49 17
15 15 15 15 05 05 1 68 0 49 18
15 15 15 15 05 05 1 68 0 49 19
15 15 15 15 05 05 1 68 0 49 20
15 15 15 15 05 05 1 68 0 49 21
15 15 15 15 05 05 1 68 0 49 22
15 15 15 15 05 05 1 68 0 49 23
15 15 15 15 05 05 1 68 0 49 24
15 15 15 15 05 05 1 68 0 49 25
15 15 15 15 05 05 1 68< 0 49 26
15 15 15 15 05 05 1 68 0 49 27
15 15 15 15 05 05 1 68 0 49 28
15 15 15 15 05 05 1 68 0 49 29
15 15 15 15 05 05 1 68 0 49 30
15 15 15 15 05 05 1 68 0 49 31 new
NOTE: If a parity error was detected by a receiving AMX, RMX, or SDI, the ecc history
entry indicated by '<' in this display can be compared to the receiver's data capture
to isolate 1-bit errors. Use the command "parse axqoh" to do this analysis.
This assumes only a single error exists in the system; multiple
errors can delay recordstop, causing the history of interest to be in an indeterminate
older entry in the output history.
redxl> parse axqoh d x0000001 x68
SDI Dpath cmd capture[24:0] = 0000001. Computed ecc = 47. AXQ hist ecc = 68.
Could be a 1-bit error in bit 0 (as used to compute AXQ oh ecc).
SOLUTION SUMMARY:
- Troubleshooting:
The dump header shows that this Dstop dump file was generated by dsmd while the domain was running. The dump file name confirms this: dsmd.dstop files are created by dsmd as part of an ASR.
Note the following two first errors (1E), one in each of two error registers:
Dstop0 - SDI internal core requested Dstop
CoreErr0 - AXQ Data path command parity error (M)
Note FAIL EXB EX8. This is the component POST would choose to deconfigure during its run in order to recover the domain with the largest fault-free configuration, given the fault implied by this error.
Note the recommended FRU to replace in order to remove the fault: Primary service FRU is EXB EX8.
AXQ sends data commands and domain/record stop information to the SDI(M) over the 24-bit unidirectional data path command interface data_cmd_l. One command can be sent per cycle. Even parity is provided concurrently with the transfer, which allows a quiescent state of all highs on the active-low bus.
Using redx on the AXQ output history to isolate 1-bit errors (redxl> parse axqoh d x0000001 x68) shows a possible 1-bit error in bit 0.
- Resolution:
This Data Path Command signal is on the Expander Board, from AXQ to SDI(M). The service FRU is EXB 8.
- Summary of part number and patch IDs:
http://infoserver.central.sun.com/data/syshbk/Systems/SunFire15K/component.centerplane.html
- References and bug IDs:
Specification for an ASIC - SDI.
- Additional background information:
- Meta-Data/Problem categorization:
Product/Platform: SF12K/SF15K
Category:
- Keywords:
15K, 12K, SF15K, SF12K, starcat, dstop, AXQ, SDI(M), axqoh, Data path Command parity error
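The even-parity check described under Troubleshooting can be illustrated with a short Python sketch. It assumes that the captured value {dat_cmdp,dat_cmd[23:0]} = 0000001 is reported in logical (active-high) sense, with the parity bit as bit 24, so that the idle bus reads as all zeros. Both points are assumptions, not statements from the SDI specification, and the function name even_parity_ok is hypothetical.

```python
def even_parity_ok(capture, width=25):
    """True if the {parity, data[23:0]} word contains an even number of 1 bits."""
    masked = capture & ((1 << width) - 1)
    return bin(masked).count("1") % 2 == 0

idle = 0x0000000      # quiescent bus under the stated assumption
capture = 0x0000001   # {dat_cmdp,dat_cmd[23:0]} reported in Core_Error0

print(even_parity_ok(idle))         # True
print(even_parity_ok(capture))      # False: one bit set, parity bit clear
print(even_parity_ok(capture ^ 1))  # True: flipping bit 0 restores even parity
```

Parity alone only shows that an odd number of bits is wrong; flipping any single bit of the capture would also restore even parity. Identifying bit 0 specifically relies on the comparison against the AXQ output history ecc performed by parse axqoh, whose ecc computation is not described here and is not reproduced by this sketch.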
INTERNAL SUMMARY:
SUBMITTER: Tong-Pheng Koh
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS: