| SRDB ID | Synopsis | Date |
| 48834 | Sun Fire[TM] 3800-6800: Troubleshooting NCPQ_TO errors | 9 Dec 2002 |
| Status | Issued |
| Description |
Problem Statement:
This document aids in troubleshooting Non Cacheable Pending Queue Time Outs (NCPQ_TO) on Sun Fire 3800-6800 systems. An NCPQ_TO occurs when a data request to Non Cacheable address space does not complete its transaction. Non Cacheable address space comprises the Safari device configuration space and the I/O address space.
Symptoms:
Error messages indicating that an NCPQ_TO occurred appear on the Domain Console. They are also stored in the Domain Console buffer and can be retrieved with the Sun Fire System Controller (SSC) command showlogs. If a loghost is configured, the error messages are also stored on the loghost. NCPQ_TOs can occur during normal operation of the Domain or during POST. Here is an example log of an NCPQ_TO error:
Feb 26 10:46:02 sq1sc Domain-C.SC: ErrorMonitor:Domain C has a SYSTEM ERROR
Feb 26 10:46:02 sq1sc Domain-C.SC: /N0/SB1 encountered the first error
Feb 26 10:46:02 sq1sc Domain-C.SC: RepeaterSbbcAsic reported first error on /N0/SB1
Feb 26 10:46:02 sq1sc Domain-C.SC: /partition1/domain0/SB1/bbcGroup0/sbbc0:
FE [15:15] : 0x1
ErrSum [31:31] : 0x1
SafErr [09:08] : 0x1 Fireplane device asserted an error
Feb 26 12:20:47 SunFireSc0 Domain-C.SC: /partition1/domain0/SB1/bbcGroup0/cpuAB/cpusafariagent0:
AFAR (high)[0x531] : 0x0000063c
AFAR [42:32] [10:00] : 0x63c
AFAR (low)[0x541] : 0xff800000
AFAR_2 (high)[0x571] : 0x0000063c
AFAR_2 [42:32] [10:00] : 0x63c
AFAR_2 (low)[0x581] : 0xff800000
AFSR (high)[0x551] : 0x00080000
PERR [19:19] : 0x1
AFSR_2 (high)[0x591] : 0x00080000
PERR [19:19] : 0x1
EMU B[0x511] : 0x03000000
AID_LK [24:24] : 0x1
NCPQ_TO [25:25] : 0x1
Interpretation:
A system error was detected and Domain C is PAUSED. The device path in the error messages shows that the error was detected on SB1 CPU A:
/partition1/domain0/SB1/bbcGroup0/cpuAB/cpusafariagent0
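Reading the board and CPU out of such a device path can be sketched in a few lines of Python. This is only an illustration: the function name parse_cpu_path is hypothetical, the path layout is inferred from the single example above, and the rule that cpusafariagent0 under cpuAB means CPU A (and agent 1 would mean CPU B) is an assumption based on the interpretation given in this document.

```python
import re

# Path layout inferred from the one example path in this document (assumption).
PATH_RE = re.compile(
    r"^/partition(?P<partition>\d+)/domain(?P<domain>\d+)"
    r"/(?P<board>[A-Z]+\d+)/bbcGroup(?P<bbc_group>\d+)"
    r"/cpu(?P<cpu_pair>[A-Z]{2})/cpusafariagent(?P<agent>\d+)$"
)


def parse_cpu_path(path):
    """Return board and CPU letter for a cpusafariagent path, or None."""
    # Log lines end the path with a colon, so drop it before matching.
    match = PATH_RE.match(path.strip().rstrip(":"))
    if match is None:
        return None
    pair = match.group("cpu_pair")
    agent = int(match.group("agent"))
    if agent >= len(pair):
        return None
    # Assumption: agent index N selects the Nth letter of the CPU pair.
    return {"board": match.group("board"), "cpu": pair[agent]}


info = parse_cpu_path("/partition1/domain0/SB1/bbcGroup0/cpuAB/cpusafariagent0:")
print(info)
```

Paths that do not name a CPU Safari agent, such as the sbbc0 path earlier in the log, return None and have to be interpreted by hand.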
The error type is an NCPQ_TO. Using the Address Space Assignment in InfoDoc, the address reported in AFAR and AFAR_2 maps to:
Non Cacheable Schizo Device Pair, Agent ID 1E, Leaf B (I/O Boat 9, Slots 0, 1, 2).
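The bit-field lines in the example log can be cross-checked against the raw register values printed with them. The following minimal Python sketch extracts bits [high:low] from a register word; the register values and bit positions are copied from the log above, while the helper names bit_field and decode are hypothetical.

```python
def bit_field(value, high, low):
    """Return bits [high:low] (inclusive) of an integer register value."""
    if low < 0 or high < low:
        raise ValueError("expected high >= low >= 0")
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


# Raw register values copied from the example log.
REGISTERS = {
    "AFAR (high)": 0x0000063C,
    "AFSR (high)": 0x00080000,
    "EMU B": 0x03000000,
}

# (register, field name, high bit, low bit) as printed in the log.
# AFAR [42:32] of the full address is bits [10:00] of the high word.
FIELDS = [
    ("AFAR (high)", "AFAR [42:32]", 10, 0),
    ("AFSR (high)", "PERR", 19, 19),
    ("EMU B", "AID_LK", 24, 24),
    ("EMU B", "NCPQ_TO", 25, 25),
]


def decode(registers, fields):
    """Map each field name to the value extracted from its register."""
    return {
        name: bit_field(registers[reg], high, low)
        for reg, name, high, low in fields
    }


decoded = decode(REGISTERS, FIELDS)
for name, value in decoded.items():
    print(f"{name}: {value:#x}")
```

For the values in the log this yields 0x63c for the AFAR field and 0x1 for PERR, AID_LK and NCPQ_TO, which agrees with the decoded lines the System Controller printed.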
Possible Causes:
There are many possible hardware and software root causes for NCPQ_TOs. They can be caused by faulty CPUs, I/O bridge ASICs (Schizo), or PCI cards, as well as by bugs in the microcode of cPCI/PCI cards. The following scenarios have been known to cause NCPQ_TOs on Sun Fire 3800-6800 systems:
Troubleshooting:
In general, the device indicated by AFAR_2 is the likely cause of the NCPQ_TO. However, the device reporting the error can also be the cause. If an NCPQ_TO occurs, take the following steps to isolate the suspect FRU:
Run POST with a diag level set to default or higher.
0x00000400.0a400010 -> Safari Agent ID 14(hex), CPU0 on CPU/Memory board 5.
0x00000402.61000380 -> Safari Agent 18(hex) Schizo 0 Leaf B, on I/O Boat 6 P0 B1.
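The addresses above are written as two 32-bit halves separated by a dot, while the log prints AFAR (high) and AFAR (low) on separate lines. A small Python sketch for converting between the two forms follows. It assumes that the part before the dot is the high word and the part after it is the low word, which is consistent with the eight-digit halves shown here; the function names are hypothetical. It does not decode the Safari Agent ID, which still requires the address space assignment table.

```python
def join_afar(high, low):
    """Combine the AFAR high and low 32-bit words into one address."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def to_dotted(addr):
    """Format an address as 0xHHHHHHHH.LLLLLLLL (assumed high.low order)."""
    return "0x%08x.%08x" % (addr >> 32, addr & 0xFFFFFFFF)


def from_dotted(text):
    """Parse the dotted form back into a single integer address."""
    text = text.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    high, low = text.split(".")
    return join_afar(int(high, 16), int(low, 16))


# AFAR (high) and AFAR (low) values from the example log.
afar = join_afar(0x0000063C, 0xFF800000)
print(to_dotted(afar))
print(hex(from_dotted("0x00000400.0a400010")))
```

With the values from the example log, the first line printed is 0x0000063c.ff800000, the form in which the address can be looked up in the address space assignment.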
References and bug IDs:
Keyword:
Sun Fire 6800, Sun Fire 4800, Sun Fire 3800, NCPQ_TO
INTERNAL SUMMARY: