| SRDB ID | Synopsis | Date |
| 48190 | Sun Fire[TM] 12K/15K: Rstop: No components would be failed based on this state | 31 Oct 2002 |
| Status | Issued |
| Description |
- Problem Statement: Rstop: No components would be failed based on this state
- Symptoms: 'wfail' output reports something similar to the following:
01 redxl> dumpf load dsmd.rstop.020805.1100.46
02 Created Mon Aug 5 11:00:46 2002
03 By hpost v. 1.2 Generic 112488-04 Mar 18 2002 14:43:00 executing as pid=22831
04 On ssc name = sc0.
05 Domain = 4=E = etuac21 Platform = sfgedas1
06 Boards in dump: master SC CPs/CSBs[1:0]: 3
07 EXB[17:0]: 00C00
08 Slot0[17:0]: 00400
09 Slot1[17:0]: 00C00
10 -D option, -d
11 "DSMD RecordStop Dump"
12 0 errors occurred while creating this dump.
13 redxl> wfail
14 SDI EX10/S0: SDI is RStopped, requested by DARB.
15 SDI EX11/S0 Master_Stop_Status0[31:0] = F0040308
16 MStop0[3]: SDI is Recordstopped
17 SDI EX11/S0 Recordstop0[31:0] = 04018400
18 Rstop0[16]: R DARB texp request Recordstop (M)
19 Rstop0[26]: R 1E Slot0 asserted EccErr, enabled to cause Rstop (M)
20 Note: SDI EX11/S0 detects error from Slot SB11, not in dump. Ignored.
21 DARB C0: enabled ports (expanders) [17:0]: 03FFF
22 DARB C0: exps request Rstop [17:0]: 00800
23 DARB C0: other darb req Rstop for exps [17:0]: 00800
24 DARB C1: enabled ports (expanders) [17:0]: 03FFF
25 DARB C1: exps request Rstop [17:0]: 00800
26 DARB C1: other darb req Rstop for exps [17:0]: 00800
27 No components would be failed based on this state.
SOLUTION SUMMARY:
- Troubleshooting:
The dump header tells us that this Rstop was generated by dsmd (lines 10 and 11) while
a domain was active. This is also evident from the dump file name: dsmd.rstop files are
created by dsmd as part of error capturing. Walking the error chain:
- EX11/S0 (SDI0) reports a first error from its Slot 0 board, SB11 (line 19).
- However, on line 20, wfail notes that SB11 is not in the dump. The "Ignored"
statement means that SB11 is not considered in selecting a component to FAIL.
- As no other errors are present, the diagnosis returns no failures (line 27).
Therefore, the source of the error is attributed to an ECC error from a board that is
not in this domain. We can confirm that SB11 was indeed not included in the dump
by checking the Slot 0 board mask (line 08). Since POST refers to the PCD to
determine which boards are part of a domain, this tells us that SB11 is not part
of Domain E in the PCD.
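The board-mask check above can be sketched in a few lines of Python. This is only an illustration, not part of redx: it assumes that bit N of each [17:0] mask stands for expander or board number N, which is consistent with the way the masks are read in this article. The helper name set_bits is hypothetical.

```python
def set_bits(mask_hex, width=18):
    """Return the bit positions set in a hex mask string, lowest first."""
    value = int(mask_hex, 16)
    return [bit for bit in range(width) if value & (1 << bit)]

# Masks copied from the dump header (lines 07-09) and the DARB lines (22, 25).
EXB_MASK = "00C00"
SLOT0_MASK = "00400"
SLOT1_MASK = "00C00"
DARB_RSTOP_MASK = "00800"

# Assumption: bit N stands for expander or board number N.
expanders = ["EX%d" % n for n in set_bits(EXB_MASK)]
slot0_boards = ["SB%d" % n for n in set_bits(SLOT0_MASK)]
slot1_boards = ["IO%d" % n for n in set_bits(SLOT1_MASK)]
rstop_requesters = ["EX%d" % n for n in set_bits(DARB_RSTOP_MASK)]

print("Expanders in dump:      ", expanders)
print("Slot 0 boards in dump:  ", slot0_boards)
print("Slot 1 boards in dump:  ", slot1_boards)
print("Expanders asking Rstop: ", rstop_requesters)
print("SB11 in dump:           ", "SB11" in slot0_boards)
```

Under that assumption the Slot 0 mask 00400 has only bit 10 set, so SB11 is absent from the dump, while the DARB request mask 00800 points at expander 11.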
At this point in the analysis, it is wise to examine activity on other domains
around the time of this Rstop. Looking through the explorer output, we see:
28 % ls sf15k/[A-R]/adm/dump/dsmd.rstop.020805.*.
29 D/adm/dump/dsmd.rstop.020805.2042.40 E/adm/dump/dsmd.rstop.020805.1100.46
30 F/adm/dump/dsmd.rstop.020805.1101.08
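When many dump files are present, the comparison of timestamps in the listing above can be automated. The following Python sketch assumes the file names follow the dsmd.rstop.YYMMDD.HHMM.SS pattern seen here (the name of the Domain E dump matches its "Created" time on line 02). The function names and the 300-second window are hypothetical choices for illustration.

```python
from datetime import datetime

def dump_time(path):
    """Parse the timestamp out of a dsmd.rstop.YYMMDD.HHMM.SS file name."""
    name = path.rsplit("/", 1)[-1]
    stamp = name.split(".", 2)[2]          # e.g. "020805.1100.46"
    return datetime.strptime(stamp, "%y%m%d.%H%M.%S")

def nearby_dumps(paths, reference, window_seconds=300):
    """Return (domain, seconds_apart) for dumps close in time to reference."""
    ref_time = dump_time(reference)
    hits = []
    for path in paths:
        if path == reference:
            continue
        apart = int(abs((dump_time(path) - ref_time).total_seconds()))
        if apart <= window_seconds:        # window is an arbitrary assumption
            hits.append((path.split("/")[0], apart))
    return hits

# Paths as printed by the ls command above (domain letter is the first element).
dumps = [
    "D/adm/dump/dsmd.rstop.020805.2042.40",
    "E/adm/dump/dsmd.rstop.020805.1100.46",
    "F/adm/dump/dsmd.rstop.020805.1101.08",
]

print(nearby_dumps(dumps, "E/adm/dump/dsmd.rstop.020805.1100.46"))
```

With the three files listed, only the Domain F dump falls inside the window, 22 seconds after the Domain E dump; the Domain D dump was taken more than nine hours later.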
There's also an Rstop on Domain F shortly after the Domain E Rstop. And we find
a relationship between Domains E and F:
31 % grep "^[SI][BO]11" sf15k/showboards_-v.out
32 IO11/C3V0 On C3V - - etuac21
33 IO11/C5V0 On C5V - - etuac21
34 IO11/C3V1 On C3V - - etuac21
35 IO11/C5V1 On C5V - - etuac21
36 SB11 On CPU Active Passed etuac31
37 IO11 On HPCI Active Passed etuac21
38 % grep "^[EF]" sf15k/showplatform_-v.out | head -2
39 E etuac21 etuac21 Running Solaris
40 F etuac31 etuac31 Running Solaris
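The same relationship can be found for every expander at once by grouping the showboards entries by board number. The sketch below is a rough illustration only: it assumes the board name is the first field and the domain is the last field of each line, as in the excerpt above, and that SBn and IOn sit on expander EXn. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Lines copied from the showboards excerpt above (listing numbers removed).
SHOWBOARDS = """\
IO11/C3V0 On C3V - - etuac21
IO11/C5V0 On C5V - - etuac21
IO11/C3V1 On C3V - - etuac21
IO11/C5V1 On C5V - - etuac21
SB11 On CPU Active Passed etuac31
IO11 On HPCI Active Passed etuac21
"""

def domains_by_expander(text):
    """Map expander number -> set of domains owning a board on it."""
    owners = defaultdict(set)
    for line in text.splitlines():
        match = re.match(r"(?:SB|IO)(\d+)", line)
        if not match:
            continue
        domain = line.split()[-1]
        if domain != "-":                  # assumption: "-" means unassigned
            owners[int(match.group(1))].add(domain)
    return owners

def split_expanders(text):
    """Return expanders whose boards belong to more than one domain."""
    return {exp: sorted(doms)
            for exp, doms in domains_by_expander(text).items()
            if len(doms) > 1}

print(split_expanders(SHOWBOARDS))
```

For the excerpt above this reports expander 11 with both etuac21 and etuac31.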
Domains E and F share EX11, and SB11 is assigned to Domain F. We can determine
that EX11 is split from the dump file as well:
41 redxl> shsdi -v 11
42 Note: Data is displayed from the currently loaded dump file.
43 SDI EX11/S0 Component ID = 64317049
44 Master_Reset_Config[31:0] = 0B000000
45 0 SDI_diserrlog MResC[0] => SDI Intern Reset
46 0 Slot0_diserrlog MResC[1]
47 0 Slot1_diserrlog MResC[2]
48 0x0B ExpID[4:0] MResC[28:24]
49 0 Mode[2:0] MResC[31:29] Master (0)
50 Master_Stop_Config[31:0] = 41001997
51 1 DstopEnbl MStopC[0]
52 1 RstopEnbl MStopC[1]
53 1 SCIntEnbl MStopC[2]
54 0 L1Err->ErrPause MStopC[3]
55 1 Dstop->ErrPause MStopC[4]
56 0 L1Ecc->ScInt[1:0] MStopC[6:5]
57 3 L1Ecc->Rstop[1:0] MStopC[8:7]
58 0 L1Err->ScInt[1:0] MStopC[10:9] L1Slot asserted err
59 3 L1Err->Dstop[1:0] MStopC[12:11] L1Slot asserted err
60 0 SBBCErr->SCInt MStopC[13]
61 0 SBBCErr->Dstop MStopC[14]
62 1 EnblStopReqChk MStopC[24]
63 0 L1Dstop->ExpDStop MStopC[28]
64 0 AnyDstop->ExpDStop MStopC[29]
65 1 Dstop->DReset MStopC[30] For split exp
66 0 ShiftErrPausePhase MStopC[31]
67 Core_Config[21:0] = 0DB3E2
68 0 Pass4TargIDDisbl CoreC[0] Rev 4+
69 1 Slot1=SerDom1 CoreC[1] Rev 4+
70 1 SplitSlotEnbl CoreC[5] In master SDI (0)
However, SB11's assignment is only available from explorer (or, of course, a live SC).
Because EX11 is a split expander, and the source of the error is within that
expander's boardset, Domain E suffers a residual Rstop. It can be ignored.
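The field values that shsdi prints can be cross-checked against the raw register words with a small bit-field extractor. This Python sketch uses only the register values and bit positions shown in the shsdi output above; the helper name field is hypothetical.

```python
def field(value, high, low=None):
    """Extract bits [high:low] of value; a single bit if low is omitted."""
    if low is None:
        low = high
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)

# Raw register words from the shsdi output (lines 44, 50 and 67).
MASTER_RESET_CONFIG = 0x0B000000
MASTER_STOP_CONFIG = 0x41001997
CORE_CONFIG = 0x0DB3E2

print("ExpID          MResC[28:24] =", hex(field(MASTER_RESET_CONFIG, 28, 24)))
print("RstopEnbl      MStopC[1]    =", field(MASTER_STOP_CONFIG, 1))
print("L1Ecc->Rstop   MStopC[8:7]  =", field(MASTER_STOP_CONFIG, 8, 7))
print("Dstop->DReset  MStopC[30]   =", field(MASTER_STOP_CONFIG, 30))
print("SplitSlotEnbl  CoreC[5]     =", field(CORE_CONFIG, 5))
```

SplitSlotEnbl (CoreC[5]) and Dstop->DReset (MStopC[30], annotated "For split exp") both decode to 1, matching the shsdi listing and the conclusion that EX11 is a split expander.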
- Resolution:
The real source of the problem is a stop condition on a different domain
that shares this expander (Domain F in the example above). Analyze that
stop dump.
- Summary of part numbers and patch IDs
- References and bug IDs
SunSolve Article 48122
- Additional background information:
http://cpre-amer.west.sun.com/esg/hsg/starcat/xctt/hw_expander_split.html
- Meta-Data/Problem categorization:
Product/Platform: SF12K/SF15K
Category:
- Keywords
15K, 12K, SF15K, SF12K, starcat, rstop, split, expander, no, components, failed
INTERNAL SUMMARY:
SUBMITTER: Scott Davenport
APPLIES TO: Hardware/Sun Fire /15000, Hardware/Sun Fire /12000
ATTACHMENTS: