TCO-number: 7.1000
Written-by: MCCOLLUM
Creation-date: 24-Nov-86 13:27:38
Edit-date: 5-Jan-87 14:09:29
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Problem: This is a test TCO for release 7.0
Diagnosis: As above.
Solution: Write release 7.0
[End of TCO 7.1000]

TCO-number: 7.1002
Written-by: RASPUZZI
Creation-date: 28-May-87 15:36:47
Edited-by: RASPUZZI
Edit-date: 5-Jun-87 15:09:14
Edit-checked: Yes
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: Yes
Program: MONITOR
Routines-affected: APRSRV GLOB PAGEM PAGUTL PHYKLP PHYKNI PHYP2 PHYP4 PHYSIO STG
Problem: Some of the I/O that TOPS-20 does uses the simulation of PMOVE/PMOVEM. These instructions now exist in the KL microcode.
Diagnosis: It would be advantageous to use these instructions in the monitor instead of simulating them.
Solution: Implement PMOVE/PMOVEM instructions in several modules that reference physical memory and watch performance get better (hopefully).
[End of TCO 7.1002]

TCO-number: 7.1003
Written-by: RASPUZZI
Creation-date: 28-May-87 16:21:21
Edited-by: RASPUZZI
Edit-date: 5-Jun-87 15:10:58
Edit-checked: Yes
Document: Yes
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem: During system startup, a brain damaged operator can walk away from the CTY and leave the "Why Reload?" and "Run CHECKD?" questions unanswered.
Diagnosis: The monitor does a RDTTY% to obtain an answer to these questions and the RDTTY% does not have a timeout feature.
Solution: Redo the code around the two questions for the benefit of these neanderthal operators so that the questions will time out in 60 seconds and the system will continue to boot.
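The timeout behavior described in the Solution can be illustrated with a small Python sketch. This is not the monitor code: the queue stands in for the CTY, and the function name, the default answer, and the "[timed out]" text are all hypothetical. Only the 60-second limit comes from the TCO.

```python
import queue

def ask_with_timeout(prompt, answers, timeout_s=60.0, default=""):
    """Return the operator's answer, or `default` if none arrives in time.

    `answers` is a queue.Queue standing in for the console terminal (CTY);
    it is a hypothetical substitute for the monitor's RDTTY% read.
    """
    print(prompt, end=" ", flush=True)
    try:
        # Block for at most timeout_s seconds waiting for a typed line.
        return answers.get(timeout=timeout_s)
    except queue.Empty:
        # Nobody answered: note the timeout and let startup continue.
        print("[timed out]")
        return default
```

A caller would ask each startup question in turn and carry on with the default when the function returns without operator input.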
[End of TCO 7.1003]

TCO-number: 7.1005
Written-by: GSCOTT
Creation-date: 29-May-87 10:38:36
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: pagutl
Problem: CFZCNT BUGHLTs
Diagnosis: Between the time that DDMPF selects an OFN for DDOCFS to process and DDOCFS starts working on the OFN at process level, a vote can come in at interrupt level that causes the OFN to become cached. The code at DDOCFS+12 attempts to detect this and avoid doing anything with the cached OFN, and goes OKSKED then goes to DDOCF1 which called CFSFOD. CFSFOD was already called when we cached the OFN, and should not be called again after the OFN is cached.
Solution: Don't call CFSFOD if the OFN is cached; just JRST down to DDGOD.
[End of TCO 7.1005]

TCO-number: 7.1006
Written-by: GSCOTT
Creation-date: 1-Jun-87 19:03:36
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem: Various problems with edit 7456 to 6.1 monitor
Diagnosis: There was one edit lost and there are three minor changes needed to MEXEC to make it all work right.
Solution: Fix MEXEC to report proper times on logout, not account for not-logged-in jobs, and correct entries in LOGLST and LOGLSD tables.
[End of TCO 7.1006]

TCO-number: 7.1009
Written-by: RASPUZZI
Creation-date: 3-Jun-87 10:53:15
Edited-by: RASPUZZI
Edit-date: 5-Jun-87 15:11:53
Edit-checked: Yes
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: SCAPAR
Problem: There is no way to determine how many words are in the user area of an SCA buffer.
Diagnosis: No one defined a mnemonic for it.
Solution: Add C%MUDA (Maximum User Data Area) to SCAPAR so anyone who needs to know this can use C%MUDA.
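One likely use of such a mnemonic is cutting data into pieces that each fit one buffer's user area. The Python sketch below shows the idea only. The function name is hypothetical and the buffer size is passed in as a parameter, because this log does not give the actual value of C%MUDA.

```python
def split_into_buffers(words, max_user_words):
    """Split a list of words into chunks of at most max_user_words each.

    max_user_words plays the role of C%MUDA; its real value is not stated
    in this log, so the caller must supply it.
    """
    if max_user_words <= 0:
        raise ValueError("buffer user area must hold at least one word")
    # Step through the data one buffer-sized slice at a time.
    return [words[i:i + max_user_words]
            for i in range(0, len(words), max_user_words)]
```

With a user area of 4 words, 10 words of data would occupy three buffers, the last one only partly filled.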
[End of TCO 7.1009]

TCO-number: 7.1010
Written-by: MCCOLLUM
Creation-date: 5-Jun-87 11:25:43
Edit-checked: No
Document: Yes
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: ENACT RUNDI1
Related-SPR: 21635
Problem: ILMNRF BUGHLTs when ACCOUNTS-TABLE.BIN is an empty file.
Diagnosis: Routine ENACT is called to map the first page of the ACCOUNTS-TABLE.BIN into the monitor's address space at location HSHPG. No check is made to see if the file is empty. Later when this page is touched by routine VERACT, an ILMNRF BUGHLT results if the file was, in fact, empty.
Solution: In routine ENACT, check to see if the file is zero pages long. If it is, don't attempt to map in the first page of the file and return an error to the user. Also, fix up routine RUNDI1 in MEXEC that calls ENACT to display a more useful error message when ENACT fails.
[End of TCO 7.1010]

TCO-number: 7.1011
Written-by: LOMARTIRE
Creation-date: 15-Jun-87 09:03:32
Edit-checked: Yes
Document: Yes
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: Monitor
Routines-affected: CFSJYN
Problem: When TOPS-20 initializes, one of its many functions is to ensure that it has "joined" the cluster completely. In this case, "joining" means that this node has an open virtual circuit to every other node on the CI that is answering REQUEST-IDs and that there exists a CFS connection to all such nodes which are KLs running TOPS-20. This "joining" check is done early in the initialization of the system and before any shared file I/O can take place. This process ensures that the file system remains intact while systems leave and enter the cluster. However, sometimes there are problems with this "joining" process. When this occurs, the system appears hung and, until the problem is resolved, will not initialize.
Diagnosis: When this occurs, it would be helpful to indicate what the problem is so that some steps can be taken to resolve it.
The blocking can be caused by two factors:
  1. The system sees a node on the CI which is answering REQUEST-IDs but a System Block has not yet been formed by the CI Port Driver (PHYKLP). This can occur if the START/STACK/ACK sequence is not completing.
  2. The system sees another TOPS-20 node on the CI, for which a System Block has been formed, but for which there does not exist a CFS connection.
Solution: When one of these problems is detected in the "joining" process, a message will be printed on the CTY describing the condition. This message will appear at most once a minute for any node to which "joining" is blocked. Also, no message will be printed for the first 5 seconds of these conditions to allow them to resolve themselves. The messages will be of the following format:
  %CANNOT JOIN CLUSTER WITH NODE nn BECAUSE: No System Block created
  - OR -
  %CANNOT JOIN CLUSTER WITH NODE nn BECAUSE: No CFS connection
Note that the node number will be printed in decimal.
[End of TCO 7.1011]

TCO-number: 7.1012
Written-by: MCCOLLUM
Creation-date: 16-Jun-87 15:03:36
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: CFMDSN
Problem: CFSSUF BUGHLTs when dismounting a structure.
Diagnosis: Problems arise when there are two disk drives with the same serial numbers on the same system. When a structure is mounted, CFS acquires tokens for the DSN of each disk drive in the structure. If two structures contain disks with the same serial number, then the DSN token is released when the first of these two structures is dismounted. Later, when MOUNTR attempts to dismount the second structure, it tries to gain exclusive access for the structure in the CFS cluster. Routine CFSSUG attempts to look up the DSN token in the CFS data base and crashes with a CFSSUF BUGHLT when it is not found.
Solution: Having two disk drives with the same serial number is an illegal configuration in a CFS environment, but TOPS-20 should handle the situation better. Routine CFMDSN, which registers DSN tokens during a structure mount, will be changed to fail when the DSN token is already in use for another disk structure. It will also now issue a CFDDSN BUGCHK informing the system manager or operator of this illegal configuration so Field Service can change a disk serial number. This change also requires that routine CFSSDI be changed. CFSSDI currently removes all CFS tokens acquired when a structure mount failed. This routine must be changed to release only the DSN tokens that were actually acquired during the process of the failed mount. This can be done by checking the HSHCNT in the DSN token resource block. If the value in HSHCNT is one, the block can be released. If it is greater than one, then HSHCNT is decremented but the block is not released.
[End of TCO 7.1012]

TCO-number: 7.1013
Written-by: RASPUZZI
Creation-date: 23-Jun-87 14:27:51
Edited-by: RASPUZZI
Edit-date: 23-Jun-87 14:39:02
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: LLMOP
Problem: NSKDIS BUGHLTs when using LLMOP% JSYS.
Diagnosis: Fix fat fingered mistake. Routine RCRWAI was mistakenly moved to XRESCD when LLMOP was dropped into section 6. Scheduler tests must be in RESCD.
Solution: Move RCRWAI out of XRESCD into RESCD.
[End of TCO 7.1013]

TCO-number: 7.1014
Written-by: RASPUZZI
Creation-date: 29-Jun-87 15:48:47
Edit-checked: No
Document: Yes
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: GTJFN LOOKUP DIRECT STG GLOBS COMND
Problem: It would be nice if TOPS-20 did partial file recognition and partial COMND% keyword/switch recognition.
Diagnosis: COMND, GTJFN and DIRECT do not have that functionality yet.
Solution: Give COMND, GTJFN and DIRECT the ability to partially recognize files and command keyword/switches.
[End of TCO 7.1014]

TCO-number: 7.1015
Written-by: LOMARTIRE
Creation-date: 30-Jun-87 12:00:17
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: DRMIO
Problem: TCSOFN BUGHLTs occur when they are not necessary.
Diagnosis: During the caching process, it is possible for an OFN to be swapped out before it is completely cached. So, when it is referenced again (i.e. when it is uncached), the OFN will have to be swapped in. The current code will TCSOFN BUGHLT if a cached OFN is swapped in or out. It is not necessary to BUGHLT if the OFN is swapped in.
Solution: Only issue the TCSOFN BUGHLT on a disk write of a cached OFN.
[End of TCO 7.1015]

TCO-number: 7.1016
Written-by: RASPUZZI
Creation-date: 7-Jul-87 14:14:03
Edited-by: RASPUZZI
Edit-date: 7-Jul-87 14:15:06
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: STG
Problem: Not enough SNOOP pages to run WATCH and SYSDPY at the same time.
Diagnosis: SNPDPC is set to a mere 12 pages.
Solution: Increase SNPDPC to a number (like 30) so that SYSDPY and WATCH can coexist.
[End of TCO 7.1016]

TCO-number: 7.1017
Written-by: RASPUZZI
Creation-date: 7-Jul-87 14:20:56
Edited-by: RASPUZZI
Edit-date: 7-Jul-87 14:21:29
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: JNTMAN
Problem: The NODE% JSYS always takes the first six characters when verifying a DECnet node.
Diagnosis: DECnet nodes only have 6 characters. However, who's to say that a moronic user can't pass NODE% a string longer than 6 characters.
Solution: Have the .NDVFY function return COMX19 when the user passes a string of more than 6 characters.
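The change in behavior can be sketched in Python as follows. This is an illustration only, not the JNTMAN code: the exception class stands in for the COMX19 error return, and the function name and the set of known nodes are hypothetical.

```python
MAX_NODE_NAME_LEN = 6  # DECnet node names have at most 6 characters (per the TCO)

class NodeNameTooLong(ValueError):
    """Stand-in for the COMX19 error return; the class name is hypothetical."""

def verify_node_name(name, known_nodes):
    """Return True if `name` is a known node; reject over-long names."""
    # Old behavior: silently look only at name[:6]. New behavior: fail.
    if len(name) > MAX_NODE_NAME_LEN:
        raise NodeNameTooLong(name)
    return name in known_nodes
```

Under the old behavior a 7-character string whose first six characters matched a real node would have verified successfully; with the length check it is rejected instead.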
[End of TCO 7.1017]

TCO-number: 7.1018
Written-by: RASPUZZI
Creation-date: 9-Jul-87 09:36:26
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: STG
Problem: Can't build monitor.
Diagnosis: Too many SNOOP pages.
Solution: Remove TCO 7.1016. WATCH and SYSDPY will have to fight it out.
[End of TCO 7.1018]

TCO-number: 7.1019
Written-by: RASPUZZI
Creation-date: 9-Jul-87 16:11:39
Edit-checked: No
Document: Yes
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Problem: The "R" in the "Why reload?" question is capitalized when the question times out.
Diagnosis: This is a major catastrophic inconsistency.
Solution: Shoot engineer, then make the "R" be lowercase.
[End of TCO 7.1019]

TCO-number: 7.1020
Written-by: RASPUZZI
Creation-date: 14-Jul-87 14:11:32
Edit-checked: No
Document: Yes
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Problem: A system administrator can expire any user's password but the unprivileged user can override the system administrator's work.
Diagnosis: Half baked implementation of password expiration. Non-WHEELies should not be able to change their password expiration.
Solution: Only allow WHEELed (or OPERATORs) to change the .CMPMU and .CMPED words when doing a CRDIR%.
[End of TCO 7.1020]

TCO-number: 7.1021
Written-by: LOMARTIRE
Creation-date: 15-Jul-87 10:11:11
Edited-by: LOMARTIRE
Edit-date: 6-Aug-87 15:34:26
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: Monitor
Routines-affected: PISC7 CLOSV4 LSNUP CFSDMP CFDLSN CFSJYN
Related-TCO: 7.1033
Problem: There is no easy way to obtain a simultaneous dump of the entire cluster.
Diagnosis: No code to do it.
Solution: Add the cluster dump facility.
[End of TCO 7.1021]

TCO-number: 7.1022
Written-by: RASPUZZI
Creation-date: 17-Jul-87 10:40:37
Edited-by: RASPUZZI
Edit-date: 17-Jul-87 10:43:53
Edit-checked: Yes
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: MEXEC
Related-SPR: 21621
Problem: ILMNRF crashes out of JOBCOF.
Diagnosis: The monitor assumes that the controlling terminal will not go away in routine LDTACH before it puts this terminal number in a STKVAR variable. Unfortunately, there is a case where the controlling terminal can go away and cause the monitor to pick up a bogus value out of this STKVAR location.
Solution: Save the controlling terminal in the STKVAR variable as soon as it has been discovered that the controlling terminal still exists.
[End of TCO 7.1022]

TCO-number: 7.1023
Written-by: RASPUZZI
Creation-date: 18-Jul-87 09:47:18
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: FUTILI
Problem: DLUSER doesn't work.
Diagnosis: Edit 7411 attempted to prevent ILMNRFs in the wrong place.
Solution: Remove edit 7411 and figure out where the fix really should be.
[End of TCO 7.1023]

TCO-number: 7.1024
Written-by: MCCOLLUM
Creation-date: 21-Jul-87 10:53:43
Edited-by: MCCOLLUM
Edit-date: 21-Aug-87 17:08:57
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: LATSRV CTHSRV
Related-TCO: 7.1037 7.1042 7.1043
Problem: Section 0/1 address space is full. The Increase Structure Limit project requires section 0/1 address space.
Diagnosis: LATSRV and CTHSRV can be made to run in section XCDSEC.
Solution: Move the code in LATSRV.MAC and CTHSRV.MAC into section XCDSEC.
[End of TCO 7.1024]

TCO-number: 7.1026
Written-by: RASPUZZI
Creation-date: 23-Jul-87 14:51:32
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: TCPJFN
Related-SPR: 21611
Problem: Repeated IPGCOL and INTFR6 BUGINFs and IP free space has disappeared.
Diagnosis: Doing a GTJFN% on a TCP: device causes a prototype TCB to be allocated from IP free space. If the user attempts to OPENF% this JFN and the open fails, the free space is lost when the user discards the JFN.
Solution: Teach TCPOP5 to return IP free space when the open on the TCP connection fails for the TCP: JFN.
[End of TCO 7.1026]

TCO-number: 7.1027
Written-by: RASPUZZI
Creation-date: 28-Jul-87 14:24:53
Edited-by: RASPUZZI
Edit-date: 28-Jul-87 14:26:14
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: TCPTCP
Related-SPR: 21610
Problem: ILMNRF or possible ILPSEC BUGHLTs.
Diagnosis: Routine TRMPKT is not resetting an AC properly before calling routines to free IP free space blocks.
Solution: Have TRMPKT reset T1 to the appropriate value before getting to code that calls RETPKT.
[End of TCO 7.1027]

TCO-number: 7.1029
Written-by: RASPUZZI
Creation-date: 29-Jul-87 08:15:51
Edited-by: LOMARTIRE
Edit-date: 6-Aug-87 15:38:53
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: CFSSRV
Problem: CFCTNF BUGHLTs.
Diagnosis: When CFS does garbage collection to clean up tokens that are no longer in use, it calls routine CFSSPC. Eventually, CFSSPC gets to routine CFSRS0 which attempts to collect all unused CFS tokens. The problem is this routine is cleaning up cached tokens also. Somehow, the keep bit is cleared for this particular cached token. Once the token has been removed and the OFN is garbage collected, the cached token will not be found and the CFCTNF BUGHLT will result.
Solution: Have CFSRS0 not remove any cached token. Also, ensure that the keep bit is set every time the token access marker (indicating a cached token) is set. Finally, when routine CFSAWT/CFSAWP exits, check to see if the keep bit is set. If not, then issue a CFKBNS BUGHLT.
[End of TCO 7.1029]

TCO-number: 7.1030
Written-by: RASPUZZI
Creation-date: 30-Jul-87 14:23:43
Edited-by: RASPUZZI
Edit-date: 30-Jul-87 14:24:41
Edit-checked: Yes
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: MSTR
Problem: MOUNTR is getting a local job number from the MONITOR during structure mount increments and decrements.
Diagnosis: Yet another forgotten spot where a global job number should be used.
Solution: Have the MONITOR send the global job number in the IPCF packet to MOUNTR instead of the local job number.
[End of TCO 7.1030]

TCO-number: 7.1032
Written-by: RASPUZZI
Creation-date: 4-Aug-87 14:21:18
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: IO
Problem: Device or data errors being returned through the SIN% JSYS when everything appears to be fine with the device.
Diagnosis: Edit 7391 checks the error code explicitly for IOX7 (JSB free space exhausted) and if the error is not IOX7, it changes it to IOX5. This is bad if the routine is entered with MONX02 error.
Solution: Have the routine check for MONX02. If the error was MONX02, have it changed to IOX7 and not IOX5.
[End of TCO 7.1032]

TCO-number: 7.1033
Written-by: LOMARTIRE
Creation-date: 6-Aug-87 15:33:05
Edited-by: LOMARTIRE
Edit-date: 6-Aug-87 15:34:25
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: Monitor
Routines-affected: CFSDMP
Related-TCO: 7.1021
Problem: PITRAP BUGHLTs when doing cluster dump.
Diagnosis: Routine CFSDMP sets the stack to be the extended dump stack and then does an EA.ENT to enter section 1.
Well, $EAENT does a HRRZS 0(P) in order to zero the flags of the (assumed) return PC which is on the top of the stack. CFSDMP is called from section 0 from PISC7, so the HRRZS was done to 0,,address instead of 11,,address. This produces indeterminate results - usually BUGHLTs!
Solution: Replace the EA.ENT with a XJRST [MSEC1,,.+1] to enter section 1.
[End of TCO 7.1033]

TCO-number: 7.1034
Written-by: RASPUZZI
Creation-date: 10-Aug-87 14:05:06
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: IPCF MEXEC GLOBS
Problem: Cluster INFO% and Dump on BUGCHK need some 0/1 space.
Diagnosis: Section 0/1 space is a commodity.
Solution: Move IPCF out of section 0/1. This gains about 4 pages of space.
[End of TCO 7.1034]

TCO-number: 7.1035
Written-by: GSCOTT
Creation-date: 11-Aug-87 15:52:23
Edited-by: GSCOTT
Edit-date: 14-Aug-87 10:09:19
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: PHYSIO
Related-SPR: 94
Problem: PDBSTA BUGCHKs followed by OVRDTA BUGCHK when booting system with two or more drives dual ported to the same pair of RH20s on the same system.
Diagnosis: CHBDON gets the UDBST1 flags from the secondary UDB when a home block check is done using the secondary path. It discovers that UDBST1 is zero, and the result is the PDBSTA.
Solution: Add code at PDBSTA to point P3 back to the primary UDB if P3 is pointing to the secondary UDB.
[End of TCO 7.1035]

TCO-number: 7.1036
Written-by: LOMARTIRE
Creation-date: 12-Aug-87 10:30:29
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: Monitor
Routines-affected: PAGEM PAGUTL GLOBS
Related-QAR: 21659
Problem: The "OFN lock tracer" feature enabled via the SPTDSW switch does not work anymore.
Diagnosis: It was broken during 6.1 development and also additional trace points are needed.
Solution: Add the additional trace points and make routine SPTRAC global.
[End of TCO 7.1036]

TCO-number: 7.1037
Written-by: MCCOLLUM
Creation-date: 12-Aug-87 12:45:14
Edited-by: LOMARTIRE
Edit-date: 2-Sep-87 14:17:05
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: SCAMPI
Related-TCO: 7.1024 7.1042 7.1043 7.1048
Problem: More section 0/1 address space is needed for the Increase Structure Limit project.
Diagnosis: SCAMPI can be made to run in section XCDSEC.
Solution: Move the code in SCAMPI.MAC into section XCDSEC.
[End of TCO 7.1037]

TCO-number: 7.1040
Written-by: LOMARTIRE
Creation-date: 19-Aug-87 12:30:04
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: ENQ
Problem: We still need more 0/1 space.
Diagnosis: Most of ENQ.MAC can be moved into XCDSEC.
Solution: Move all of ENQ.MAC into XCDSEC except for routines: .ENQ, .DEQ, .ENQC, ENQCD, ENQTST, ENQFKR, ENQCLS, and ENQINI. Also, repaginate the listing to make it more readable.
[End of TCO 7.1040]

TCO-number: 7.1041
Written-by: LOMARTIRE
Creation-date: 19-Aug-87 12:33:50
Edit-checked: Yes
Document: Yes
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: ENQ
Problem: Functions 1 and 2 of ENQC% do not work correctly when a global job number or -1 is supplied for the job number.
Diagnosis: An IFNSK. has to be changed to an IFSKP..
Solution: Do it. Also, there is no mention in the documentation that a -1, supplied as the job number in the argument block, means your own job.
[End of TCO 7.1041]

TCO-number: 7.1042
Written-by: MCCOLLUM
Creation-date: 19-Aug-87 16:30:24
Edited-by: MCCOLLUM
Edit-date: 21-Aug-87 17:09:02
Edit-checked: No
Document: Yes
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: STG
Related-TCO: 7.1037 7.1024 7.1043
Problem: TOPS-20 will only allow a maximum of 32 (decimal) structures to be mounted system-wide.
Diagnosis: There is not enough "units" free space in the section 0/1 resident free space pool to allow for more than 32 structures.
Solution: Create more section 0/1 free space by moving code to XCDSEC. See the related TCOs for more details on this. Increase the "units" free space from 17500 (octal) words to 37200 words. Reorganize the JSB to allow the JSSTRT table (which is based on the number of structures) to increase in size. Increase STRN from 32 to 64 (decimal).
[End of TCO 7.1042]

TCO-number: 7.1043
Written-by: MCCOLLUM
Creation-date: 21-Aug-87 17:08:57
Edited-by: MCCOLLUM
Edit-date: 21-Aug-87 17:10:16
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: SCSJSY
Related-TCO: 7.1024 7.1037 7.1042
Problem: Section 0/1 address space is nearly exhausted.
Diagnosis: Increasing the structure limit to 64 (decimal) via TCO 7.1042 used up much section 0/1 address space.
Solution: Reclaim the section 0/1 address space by moving the code in SCSJSY.MAC to section XCDSEC.
[End of TCO 7.1043]

TCO-number: 7.1044
Written-by: RASPUZZI
Creation-date: 27-Aug-87 14:49:45
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: JSYSF
Related-SPR: 21672
Problem: The RFTAD% JSYS does not handle argument blocks that are not in the local section when it is called from a different section outside of the argument block.
Diagnosis: RFTAD% uses a BLT to user space when initializing the argument block. This BLT appears to be a mysterious NOP.
Solution: Make RFTAD% call routine BLTUU to initialize the user's argument block.
[End of TCO 7.1044]

TCO-number: 7.1045
Written-by: RASPUZZI
Creation-date: 28-Aug-87 09:08:45
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: IPCF
Problem: ILMNRFs when attempting to delete a directory with archived files.
Diagnosis: Slight oversight when moving IPCF into section 6. Mainly, CPTAB is in GTJFN and is not an immediate value.
Solution: Make instruction that loads the byte from CPTAB run in section 1.
[End of TCO 7.1045]

TCO-number: 7.1046
Written-by: RASPUZZI
Creation-date: 28-Aug-87 09:11:59
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Problem: NTINF% does not work with terminal line 0.
Diagnosis: Fence post error that was accidentally introduced with edit 7410.
Solution: Make sure NTRRH1 checks for a 0 line too.
[End of TCO 7.1046]

TCO-number: 7.1047
Written-by: RASPUZZI
Creation-date: 1-Sep-87 14:47:18
Edited-by: RASPUZZI
Edit-date: 1-Sep-87 14:49:08
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: COMND
Problem: Cannot define logical names with _^Vs in them.
Diagnosis: Semi brain dead part of edit 7413. It checks the break mask when parsing a _^V and then, get this, decides to RETBAD a COMNX4! This puts COMND in a funny state and all sorts of wild things come back to the user.
Solution: Prevent COMND from going to Garkland by checking the function when a _^V is seen. If the function is .CMUSR or .CMDIR, then have COMND return to the user via its NOPARS macro.
[End of TCO 7.1047]

TCO-number: 7.1048
Written-by: LOMARTIRE
Creation-date: 2-Sep-87 14:17:03
Edit-checked: Yes
Document: No
TCO-tested: Yes
Maintenance-release: No
Hardware-related: No
Program: Monitor
Routines-affected: SCAMPI
Related-TCO: 7.1037
Problem: Routines SC.ABF and SC.ALD now return "6,,return-address" as the return address.
Diagnosis: The XMOVEI now gets section 6 as a result of the code move.
Solution: Change the instructions to MOVE AC,[MSEC1,,return-address].
[End of TCO 7.1048]

TCO-number: 7.1049
Written-by: GSCOTT
Creation-date: 3-Sep-87 16:01:18
Edited-by: GSCOTT
Edit-date: 3-Sep-87 16:26:47
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: JSYSA
Problem: Bad terminal numbers passed via USAGE% JSYS to monitor can cause various BUGHLTs including ILPPT3. Also USAGE% JSYSes can fail with "default item not allowed" errors when the line number we are trying to supply is -1, indicating that the job is detached (this is seen when running LPTSPL under job 0).
Diagnosis: No line number range checking is done and the code doesn't handle -1 properly for a line number.
Solution: Add code in JSYSA to properly range check the line number and allow -1 to be specified.
[End of TCO 7.1049]

TCO-number: 7.1050
Written-by: MCCOLLUM
Creation-date: 4-Sep-87 09:42:29
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: FREE
Problem: PHYICE BUGINFs when booting a system.
Diagnosis: When MSCP units are coming online, PHYMSC calls routine ASGRES to obtain section 0 resident free space with which to build the unit's UDB. If ASGRES determines that there are not enough pages of resident free space locked into core, it calls routine GRORES to attempt to lock down additional pages. Unfortunately, MSCP disks come online at interrupt level and page faults cannot be tolerated.
Therefore, GRORES fails, a PHYICE BUGINF is generated and all further disks on the current controller are ignored (and possibly the controller itself).
Solution: At system startup, routine RESLCK is called to lock down a fixed number of resident free space pages. During normal operation of the system, this many spare pages of free space always remain locked down. While this is sufficient at most times, more pages are required during system startup. Add an additional entry point to RESLCK (RESLCI) that can be called to lock down twice as many resident free space pages as normal. Call this entry point from RUNDD which runs at system startup. Later, CHKR will call the RESLCK entry point which will unlock any pages above and beyond the normal value.
[End of TCO 7.1050]

TCO-number: 7.1051
Written-by: GSCOTT
Creation-date: 8-Sep-87 16:11:17
Edit-checked: No
Document: Yes
TCO-tested: No
Maintenance-release: No
Hardware-related: Yes
Program: MONITOR
Routines-affected: MEXEC JSYSA globs STG
Problem: SYSTAT NODE * command may be slow in a cluster environment.
Diagnosis: A large number of INFO% or GETJI% JSYSes are needed to find all of the active jobs on the cluster; each system must have jobs 1 to 511 checked for a SYSTAT NODE * command. In addition, the header printed by the SYSTAT command shows the number of user and operator jobs logged in. In order to get this information jobs 1 to 511 must be checked for each system that is displayed by the SYSTAT NODE * command.
Solution: Add two new GETAB words to the SYSTAT table. The first word (ACTJOB) will contain the lowest,,highest job number in use on each system. This will be used by SYSTAT NODE * to only search a range of jobs on a system rather than searching jobs 1 to 511. The second GETAB word (WHOJOB) will contain operator,,user jobs logged into the system. This word can be directly used by the SYSTAT command rather than searching jobs 1 to 511 looking for which jobs are logged in.
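The "left,,right" notation used for ACTJOB and WHOJOB denotes two 18-bit halves packed into one 36-bit word. The Python sketch below shows how a caller might unpack such a word and derive the job range to scan. The function names are hypothetical, and treating the range as inclusive of both ends is an assumption.

```python
HALF_BITS = 18
HALF_MASK = (1 << HALF_BITS) - 1  # 0o777777, the largest 18-bit value

def pack_halves(left, right):
    """Build a 36-bit word from two 18-bit halves (left,,right)."""
    if not (0 <= left <= HALF_MASK and 0 <= right <= HALF_MASK):
        raise ValueError("each half must fit in 18 bits")
    return (left << HALF_BITS) | right

def unpack_halves(word):
    """Split a 36-bit word into its (left, right) halves."""
    return (word >> HALF_BITS) & HALF_MASK, word & HALF_MASK

def jobs_to_scan(actjob_word):
    """Return the job numbers worth checking, given an ACTJOB-style word."""
    lowest, highest = unpack_halves(actjob_word)
    # Assumption: both the lowest and highest job numbers are included.
    return range(lowest, highest + 1)
```

If the word held 3,,5, only jobs 3 through 5 would be examined instead of all of jobs 1 to 511.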
The addition of these two words should make the *** PERFORMANCE *** of the SYSTAT NODE * command bearable.
[End of TCO 7.1051]

TCO-number: 7.1052
Written-by: WADDINGTON
Creation-date: 8-Sep-87 16:41:34
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: latsrv
Problem: LHDSP macro can cause LINK to loop during monitor builds.
Diagnosis: We changed various IFIWs to XADDR.s. LHDSP (and LSLDT) use .ORG to fill values into various tables. This smashes fixup chains of various types, which can cause LINK to loop, or code to be loaded incorrectly, or other bizarre symptoms.
Solution: Get rid of LHDSP and LSLDT.
[End of TCO 7.1052]

TCO-number: 7.1053
Written-by: GSCOTT
Creation-date: 10-Sep-87 07:21:35
Edited-by: GSCOTT
Edit-date: 10-Sep-87 16:21:56
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: monsym
Problem: ORION complains "Invalid type argument in message" from some component. Another symptom is that you never get "DECnet link messages" from the monitor.
Diagnosis: MONSYM defines .QBDMX to be 5 then redefines it to be 4. DECnet link messages are type 5, so ORION thinks that they are invalid since .QBDMX ends up being 4.
Solution: Remove brain damage in MONSYM, so that .QBDMX is 5.
[End of TCO 7.1053]

TCO-number: 7.1054
Written-by: GSCOTT
Creation-date: 10-Sep-87 16:25:32
Edited-by: GSCOTT
Edit-date: 10-Sep-87 19:52:14
Edit-checked: No
Document: No
TCO-tested: No
Maintenance-release: No
Hardware-related: No
Program: MONITOR
Routines-affected: phyx2
Problem: Many PHYX2 BUGCHKs don't supply enough information such as channel, DX20 number, and drive number. Some of the "Action:" areas could also be improved.
Diagnosis: Several of the BUGCHKs weren't documented when the code was written, and since DX20s are so reliable we don't see too many DX2xxx BUGCHKs.
Solution: Fix up PHYX2's BUG. macros as needed.
[End of TCO 7.1054] TCO-number: 7.1055 Written-by: RASPUZZI Creation-date: 15-Sep-87 15:45:55 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: GLOBS PAGUTL DISC JSYSF Problem: CFCTNF BUGHLTs along with CFKBNS BUGHLTs. Diagnosis: When a user deletes a directory, the CFS resource block describing this directory is placed on the free list. However, cached OFNs may still exist. A new OFN may now grab this block and use it for a file open token. Now we have a scenario where the block is being used for 2 resources. The cached OFN may release this block and cause the keep bit to be cleared. When this happens, the CFS garbage collector returns the resource block back to CFS' free list. This is bad as some other OFN was using this block as a file open token. Solution: Currently, a directory cannot be destroyed if some file is still open in this directory. However, REMALC is always called and, hence, always removes the resource block and returns it to CFS. While this is not incorrect, it is inherently dangerous. Have a new routine uncache all OFNs associated with this resource block before deleting the directory. This will help prevent CFS from allowing a block to be used for 2 different resources. [End of TCO 7.1055] TCO-number: 7.1056 Written-by: GSCOTT Creation-date: 15-Sep-87 16:12:05 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: TTYSRV DTESRV Problem: [DECSYSTEM-20 continued] messages come out on lines that are not connected to the console front end when it crashes. Diagnosis: Monitor (DTESRV) sends to all lines when the system is reloaded; it should send the reloaded message only to front end lines. Solution: Implement a -2 line number argument to TTMSG, which can only be specified by the monitor, which will do a sendall to only front end lines. Then have DTESRV use this new function. 
[End of TCO 7.1056] TCO-number: 7.1057 Written-by: GSCOTT Creation-date: 15-Sep-87 16:19:50 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: IPFREE Problem: Occasional failure to allocate blocks of the size requested. Diagnosis: Code which ensures that block sizes are quantized is a bit too conservative. Solution: In IPFREE, at GETBB2, change to allow split blocks. [End of TCO 7.1057] TCO-number: 7.1058 Written-by: GSCOTT Creation-date: 17-Sep-87 14:35:36 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: IMPDV Problem: TCP/IP-20 does not support class B and C networks as the SPD states. Diagnosis: No code to support B and C networks. Solution: Add code to support class B and C networks to IMPDV. [End of TCO 7.1058] TCO-number: 7.1059 Written-by: GSCOTT Creation-date: 17-Sep-87 16:23:22 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JSYSF DISC Problem: Byte counts of 34359738367, used by LIBOL files, get turned into byte counts of 1073741823 when these files are copied. Diagnosis: OFNLEN truncates the byte count to 30 bits so that it can put the byte size in bits 0-5. A real file of that length is impossible to create and 34359738367(36) is meaningful to COBOL. When the monitor truncates the byte count, COBOL programs get confused. Solution: Set OFNLEN to -1 if the byte count is 34359738367(36), and each time that OFNLEN is referenced and is -1, report the byte count and size to be 34359738367(36). [End of TCO 7.1059] TCO-number: 7.1060 Written-by: RASPUZZI Creation-date: 21-Sep-87 15:45:09 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: SCAMPI GLOBS SCAPAR Problem: A routine is needed to break data up and store it in a series of SCA buffers. 
Diagnosis: SCAMPI does not have a routine to do this. Solution: Be ambitious and write SC.BRK. [End of TCO 7.1060] TCO-number: 7.1062 Written-by: LOMARTIRE Creation-date: 22-Sep-87 16:45:11 Edit-checked: Yes Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: PAGUTL Related-SPR: 21567 Problem: FILBAT BUGINFs no longer issued after edit 7247. Diagnosis: RELOFN no longer returns the SPTH word in T2. Solution: Make RELOFN do so and document the behavior in the routine header. [End of TCO 7.1062] TCO-number: 7.1063 Written-by: MCCOLLUM Creation-date: 23-Sep-87 10:05:32 Edited-by: MCCOLLUM Edit-date: 29-Sep-87 18:32:25 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: Yes Program: MONITOR Routines-affected: PHYSIO JSYSA GTJFN JSYSF DISC DSKALC MEXEC MSTR PAGUTL STG Problem: Jobs hang accessing disk structures that are composed of at least one disk unit that is offline. Also, DDMPNR and CHKRNR BUGHLTs may be seen. Diagnosis: If a disk unit goes offline and jobs subsequently attempt to access the structure to which the unit belongs, they will become hung. This occurs because they attempt to do I/O to the disk and the monitor permits it, despite the fact that the monitor is aware that the disk is offline. The monitor routine DDOCFS, which is a part of the DDMP process, writes OFNs out to disk at the request of CFS. However, DDOCFS does not check the state of the structure before attempting this operation. If the structure is composed of one or more offline disks, then DDMP will hang. This can result in the above mentioned BUGHLTs. With the addition of CI disks in release 6.0 of the monitor, the observed frequency of these problems has increased dramatically. Solution: Implement the Offline Structures feature. This feature involves the addition of a new bit to the SDBSTS word of the Structure Data Block (SDB). 
This bit, MS%OFS, will be turned on and off by a new routine, UDBCHK, in PHYSIO. When UDBCHK notices that a disk unit has gone offline, it will time stamp word SDBTMR in the SDB of the structure associated with the unit. After a variable timeout period has passed, UDBCHK will turn on the MS%OFS bit in the SDB of the structure. This timeout interval can be set via the SMON% .SFOFS function. When the MS%OFS bit is lit, no new access to the disk will be allowed. This will be done by having JSYS's ACCES%, CHFDB%, CHKAC%, CRDIR%, DELDF%, DELF%, DELNF%, DIRST%, DSKOP%, GTDAL%, GTDIR%, GTFDB%, GTJFN%, OPENF%, RCDIR%, SIZEF%, and MSTR% check the MS%OFS bit and return the STRX10 (Structure is offline) error message to the process. This will prevent the process from attempting the I/O that will hang it up. UDBCHK will continue to monitor the state of the disk unit that caused MS%OFS to be turned on. When the disk unit goes online again, MS%OFS will be cleared in the SDB if all other units in the structure are also online. The following changes are made as a part of this project: 1) SETSPD will have two new commands added to it. The first is DISABLE OFFLINE-STRUCTURES. This command will turn this feature off. When this is done, behaviour will be identical to pre-release 7.0 systems. The second command is the ENABLE OFFLINE-STRUCTURES command. This command takes as an argument a timeout interval. Valid intervals are controlled by the SMON% .SFOFS function and are 1 to 900 seconds. Note that the default state of the Offline Structures feature is ENABLED with a timeout interval of 60 seconds. 2) The EXEC will have two new commands added to it. They are ^ESET options. The first is OFFLINE-STRUCTURES followed by a timeout interval or a confirmation. Confirmation implies use the default value of 60 seconds. The second is NO OFFLINE-STRUCTURES. This disables Offline Structures. 
Also, the INFORMATION SYSTEM-STATUS command will display the state of this feature along with the timeout interval if it is enabled and the INFORMATION STRUCTURE command will state that the structure is offline if MS%OFS is on in the results of an MSTR% .MSGSS function. 3) DDOCFS will call routine STROFL to ensure the structure to which it is writing out an OFN is accessible. If it is not, no attempt will be made to write out the OFN. This should reduce the number of DDMPNR and NOCHKR BUGHLTs observed. [End of TCO 7.1063] TCO-number: 7.1064 Written-by: GSCOTT Creation-date: 24-Sep-87 14:31:35 Edited-by: GSCOTT Edit-date: 24-Sep-87 14:33:49 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: TAPE Problem: There are various problems when reading or writing ANSI labeled tapes. When the file name field was 17 characters without an embedded dot, IOX5 errors were returned by the monitor resulting in the message "?File data error on file ..." from the EXEC COPY command. Generation numbers ending in 00 were turned into generation numbers ending in 44. Wildcarded copies from tape stopped after the first file. Diagnosis: In the TAPE module, the monitor did not handle 17 character tape filenames properly; it only allowed the length of the file name and file type field to be 16 characters (it was accounting for the dot that separates the file name and file type fields even if there was no file type specified). Also in TAPE, the monitor didn't split the generation number into the two fields on tape (generation number and generation version) as specified in DEC STD 149 if the last two digits of the generation number were zero. In GTJFN, the VANISH routine, added by edit 7393, broke GNJFN% wildcarding on labeled tapes by trying to find the "old" file before stepping versions. 
Solution: Insert proper code in TAPE to account for the dot in the filespec only if the file type field is non-zero; insert code to properly handle generation numbers and generation versions as specified in DEC STD 149; make VANISH only do the extra lookup if the file is on disk. [End of TCO 7.1064] TCO-number: 7.1065 Written-by: RASPUZZI Creation-date: 25-Sep-87 13:27:58 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PAGUTL Problem: PLKRPQ BUGHLTs. Diagnosis: There seems to be a race between the swapper and the free space grower. The free space grower is attempting to lock a page that currently has a write in progress. As part of the PLKMOD fix, this is not really allowed. When the swapper completes, it notices that it had swapped out a locked page (because the free space grower hadn't finished diddling the page age) and that's when the PLKRPQ BUGHLT occurs. Solution: In MLKPG3, turn PI interrupts off while we diddle the page age in routine AGESET to prevent this race. When we have changed the age in AGESET, PIs will be turned back on and the swapper will then do the right thing. [End of TCO 7.1065] TCO-number: 7.1066 Written-by: RASPUZZI Creation-date: 29-Sep-87 14:23:21 Edited-by: RASPUZZI Edit-date: 3-Nov-87 13:48:59 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PAGUTL Problem: INVDIR no worky. Diagnosis: Engineer has been drinking too many wine coolers. He used the Tx registers to save important information and routine OC.UNC was trashing this information. Solution: Make INVDIR use the Qx registers for important things. Also make INVDIR preserve these registers. 
[End of TCO 7.1066] TCO-number: 7.1067 Written-by: SHREFFLER Creation-date: 29-Sep-87 15:25:25 Edited-by: RASPUZZI Edit-date: 29-Sep-87 15:26:42 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PAGUTL DISC GLOBS Problem: When renaming files between directories, sometimes the count of pages used will not match the actual usage. It may even go negative but it always corrects itself in time. Diagnosis: When a file is renamed its OFN is cached. This cached OFN follows the file to the new directory. The problem is that this cached OFN has a pointer to the disk allocation block of the old directory. If this file is renamed to another directory and the cached OFN is still around, the disk pages are subtracted from the alloc block for the original directory instead of the current directory. Solution: In the rename code, after releasing the OFN for the source file, uncache the OFN. [End of TCO 7.1067] TCO-number: 7.1069 Written-by: RASPUZZI Creation-date: 13-Oct-87 14:15:17 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: SCAMPI Problem: Routine SC.BR1 does not get data out of the user's address space from the correct section. Diagnosis: SC.BR1 was calling routine BLTUM to get the data. This routine always gets stuff from section 0 of user address space. Solution: Have SC.BR1 use routine BLTUM1. This is the correct routine as it uses the correct section number for the XBLT. [End of TCO 7.1069] TCO-number: 7.1070 Written-by: RASPUZZI Creation-date: 13-Oct-87 14:18:39 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: TTYSRV GLOBS STG Problem: Every second the scheduler is calling routine TTYCHK and this routine is wasting time doing absolutely nothing. Diagnosis: Some time ago this routine was used to check DZ lines. Now, it is just a dinosaur. 
Solution: Remove TTYCHK from CLK2CL and remove the routine from TTYSRV. [End of TCO 7.1070] TCO-number: 7.1072 Written-by: LOMARTIRE Creation-date: 19-Oct-87 08:03:59 Edited-by: LOMARTIRE Edit-date: 24-May-88 16:18:37 Edit-checked: No Document: Yes TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: CFSSRV ENQ ENQPAR ENQSRV GLOBS MEXEC PHYKLP SCAPAR STG SYSFLG MONSYM Related-TCO: 7.1088 7.1096 7.1103 7.1111 7.1115 7.1138 7.1142 7.1145 7.1172 7.1179 7.1180 7.1286 7.1292 Problem: Under 6.1, the ENQ/DEQ functionality is limited to system-wide only. No cluster-wide lock coordination is possible. Diagnosis: No code was written to support this feature. Solution: Add cluster-wide ENQ/DEQ functionality. This new feature will not be on for the process by default. The process must do a new function to ENQ% (.ENECL) in order to have its ENQ%/DEQ%/ENQC% requests assume the cluster-wide implementation. [End of TCO 7.1072] TCO-number: 7.1074 Written-by: RASPUZZI Creation-date: 20-Oct-87 15:53:19 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CFSSRV Problem: CFS has no way to return a CFS node name to other places in the monitor without knowledge of CFS' host tables. Diagnosis: No code to do it. Solution: Add routine CFSNOD to CFSSRV to return a node name given a CI node number. [End of TCO 7.1074] TCO-number: 7.1075 Written-by: GSCOTT Creation-date: 20-Oct-87 16:06:30 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: ALL Problem: Monitor modules without updated Table Of Contents. Diagnosis: Sometimes the TOC wasn't updated. Solution: Update modules to have a new TOC using EMACS. 
[End of TCO 7.1075] TCO-number: 7.1076 Written-by: RASPUZZI Creation-date: 20-Oct-87 21:40:38 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: GLOBS CLUDGR CLUFRK CLUPAR MEXEC JSYSA MONSYM STG FUTILI Problem: Currently, TOPS-20 offers no easy way for systems that are clustered together to share information. The only way this can be done is if the user writes his own SYSAP using the unsupported SCS% JSYS. Also, this JSYS can only be used by users with WHEEL or OPERATOR privileges. Diagnosis: No code to do it. Solution: Add a new monitor SYSAP, the CLUDGR SYSAP, to aid users in gathering information within a cluster by using the INFO% JSYS. Also, add code in the EXEC to support cluster sendalls (which is part of the CLUDGR SYSAP because of new hooks in TTMSG%) and also add code in the EXEC to support cluster SYSTATs. [End of TCO 7.1076] TCO-number: 7.1078 Written-by: MCCOLLUM Creation-date: 21-Oct-87 16:53:50 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: GTJFN Problem: A JFNS% performed on a parse only JFN always shows the structure name, even if you are currently connected to that structure. Diagnosis: TCO 7.1063 changed GTJFN% to not save the unique code of a structure in the JFN block if the user specified parse only. However, JFNS% compares this field against the connected structure and outputs the structure name if they differ. Since it is always zero for a parse only, the structure name is always displayed. Solution: Leave the unique code in the JFN block for parse-only JFNs. [End of TCO 7.1078] TCO-number: 7.1079 Written-by: MCCOLLUM Creation-date: 21-Oct-87 16:58:55 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MSTR MONSYM Problem: STROFF BUGCHKs. 
Diagnosis: The STROFF BUGCHK was moved from routine STROFL to CKSTOF when the offline structures feature was implemented. STROFL was changed to call CKSTOF. CKSTOF is also called by a variety of JSYSes to verify structure numbers. However, these JSYSes fetch the structure number from user supplied arguments. If the user gives a bad argument, CKSTOF will issue a STROFF BUGCHK. Solution: Move the STROFF BUGCHK back to routine STROFL. In CKSTOF, if the structure number is invalid, return the STRX11 error code. [End of TCO 7.1079] TCO-number: 7.1080 Written-by: RASPUZZI Creation-date: 23-Oct-87 12:44:14 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MONSYM STG CLUDGR Problem: System PID table is not large enough. There is no room in it to add NEBULA's PID. Diagnosis: SPIDTB not big enough and no mnemonic in MONSYM for it. Solution: Add .SPNEB for system PID and .SDNEB for private GALAXY's to MONSYM. Also make SPIDTB big enough to hold these. [End of TCO 7.1080] TCO-number: 7.1081 Written-by: GSCOTT Creation-date: 23-Oct-87 14:47:35 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: APRSRV DSKALC GLOBS MEXEC MSTR PAGUTL PHYSIO SETSPD MONSYM DOB Problem: There is an occasional need to take a dump and continue the system. Diagnosis: No code to implement continuable dumps. Solution: Add DOB% JSYS and code in new module DOB to take continuable dumps. Change modules APRSRV, DSKALC, GLOBS, MEXEC, MSTR, PAGUTL, PHYSIO to support DOB. Change MONSYM and SETSPD utilities to support DOB. [End of TCO 7.1081] TCO-number: 7.1082 Written-by: WADDINGTON Creation-date: 23-Oct-87 15:30:53 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: monsym Problem: No symbols for new LATOP% JSYS Functions. 
Diagnosis: Add them to MONSYM.MAC. Solution: Add them to MONSYM.MAC. [End of TCO 7.1082] TCO-number: 7.1086 Written-by: RASPUZZI Creation-date: 27-Oct-87 15:47:39 Edited-by: RASPUZZI Edit-date: 3-Nov-87 13:44:54 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MEXEC Problem: When a system in the cluster is being shut down, there is no way for other systems in the cluster to know this. Diagnosis: No code to do it. Solution: Now that cluster sendalls exist, make routine THSYS use this feature to let other machines in the cluster know when a shutdown is going to occur. This message is only broadcast throughout the cluster at the 60 minute, 5 minute and 1 minute marks. Also, change the "System going down" message to say which system is going down. [End of TCO 7.1086] TCO-number: 7.1087 Written-by: RASPUZZI Creation-date: 27-Oct-87 15:55:40 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUFRK Problem: INFO% does not return job class, share or usage information when a system has class scheduling turned on. Diagnosis: Wrong test used on AVALON to see if class scheduling is turned on. Solution: Change SKIPGE to SKIPL to see if AVALON indicates class scheduling. [End of TCO 7.1087] TCO-number: 7.1088 Written-by: LOMARTIRE Creation-date: 27-Oct-87 15:56:59 Edit-checked: No Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: ENQ Related-TCO: 7.1072 Problem: FSPSCC BUGHLTs, FSPPRE BUGHLTs, FSPBPC BUGHLTs, FSPBND BUGHLTs and jobs hung in the ENQ related JSYSes. Diagnosis: The code which manages the EQLBLT in routine LOKREL was wrong and was corrupting the hash table and the various ENQ blocks. Solution: Fix the code in LOKREL to do the removal from EQLBLT correctly. 
[End of TCO 7.1088] TCO-number: 7.1090 Written-by: RASPUZZI Creation-date: 28-Oct-87 10:14:08 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUPAR CLUDGR CLUFRK Problem: Cluster SYSTAT appears to be somewhat slow on a loaded cluster. This may be an offshoot of the fact that CLUDGR sends over the maximum number of words per SCA message, when, in fact, it could send over fewer words. Diagnosis: When CLUDGR calls SC.SMG, it always sends C%MUDA words across the CI. This means that if there is only 1 data word to be sent, CLUDGR will send far too many words across the wire. Solution: Make the send routines in CLUDGR and CLUFRK figure out how many words to send over and send no more than necessary. This will help increase performance but cluster SYSTAT may still be slower than a local one. [End of TCO 7.1090] TCO-number: 7.1094 Written-by: RASPUZZI Creation-date: 28-Oct-87 14:08:00 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUFRK CLUDGR Problem: INFO% returns a remote failure of "Structure is not mounted" when the structure is mounted on the remote system. Diagnosis: Routine CLFMSR that handles the remote execution of the MSTR% is not getting the structure name from the right place in the data area. Solution: Teach CLFMSR to get structure name from correct spot. Also change part of INFMSR to return structure physical ID to user address space correctly. [End of TCO 7.1094] TCO-number: 7.1095 Written-by: MCCOLLUM Creation-date: 28-Oct-87 14:13:06 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: GTJFN Problem: WILD% returns "Structure is dismounted" when comparing a JFN against another which is parse-only. 
Diagnosis: The recent changes to the JFN block when a parse-only JFN is created have caused CHKJFN to return an incorrect error message to WILD%. Solution: Set up the JFN block for parse-only JFNs as it was before. [End of TCO 7.1095] TCO-number: 7.1096 Written-by: LOMARTIRE Creation-date: 28-Oct-87 14:13:51 Edit-checked: No Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: ENQSRV GLOBS CFSSRV Related-TCO: 7.1072 Problem: SKDPF1 and PITRAP BUGHLTs from routine ENQRSV. Diagnosis: This routine was being called at CI interrupt level or scheduler level and was trying to set a bit in the Lock-Blocks. However, these reside in swappable free space and so could be swapped out. Solution: Change the algorithm for reacting to a cluster state change. Add flag EQCSTF which will be set to -1 by CFSRSV. The ENQ Resched fork will notice that EQCSTF is non-zero and will call ENQRSV to set bit EN.SDO in each Lock-Block on the system. Since the ENQ Resched fork runs at process context, this will avoid the BUGHLTs. [End of TCO 7.1096] TCO-number: 7.1098 Written-by: MCCOLLUM Creation-date: 29-Oct-87 15:03:38 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: STG Problem: The Internet host table is too big for TOPS-20. Diagnosis: The table size needs to be increased. Solution: Increase NHOSTS to 7001, decimal. [End of TCO 7.1098] TCO-number: 7.1102 Written-by: RASPUZZI Creation-date: 29-Oct-87 16:03:30 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUDGR Problem: System going down messages appear on nodes that have cluster sendalls disabled. Diagnosis: <SYSTEM>INFO runs under job 0 and that's when the system going down messages are broadcast. CHKPID in CLUDGR assumes that this process is part of GALAXY and allows the send to happen. 
Solution: Remove the check to see if the PID belongs to <SYSTEM>INFO. If this process ever needs to do an INFO% JSYS or cluster TTMSG%, then this will have to be changed. Also note that any GALAXY process running under job 0 will cause the system going down messages to go through to nodes that have cluster sendalls disabled. It is not advisable to run GALAXY jobs under job 0 though. [End of TCO 7.1102] TCO-number: 7.1103 Written-by: LOMARTIRE Creation-date: 29-Oct-87 16:11:28 Edit-checked: No Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: ENQSRV Related-TCO: 7.1072 Problem: Cluster-wide ENQC% fails for locks described by a user code. Diagnosis: Routine HDRFIP is not correctly putting the user code into the VRQA so it is not getting sent to the other systems. This prevents the other systems from finding the lock and reporting the status. Solution: Make HDRFIP get the user code from the user's block and not from P2. [End of TCO 7.1103] TCO-number: 7.1104 Written-by: RASPUZZI Creation-date: 29-Oct-87 16:15:08 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUFRK Problem: The CLUDGR fork cannot execute the MTOPR% JSYS correctly. Diagnosis: The MTOPR% JSYS assumes that when MTOPR% is used from any non-zero section that T1 will contain an OWGB instead of a universal device designator. Solution: Write gross code that makes the CLUDGR fork execute the MTOPR% in section 0. [End of TCO 7.1104] TCO-number: 7.1105 Written-by: RASPUZZI Creation-date: 29-Oct-87 16:18:00 Edited-by: RASPUZZI Edit-date: 3-Nov-87 13:49:34 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUFRK Problem: The CLUDGR fork still does not return class scheduling information correctly. Diagnosis: Brain dead TCO 7.1087 uses AVALON instead of CLASSF when checking for class scheduling. 
Solution: Redo the TCO once again and make the code do the right thing. [End of TCO 7.1105] TCO-number: 7.1107 Written-by: GSCOTT Creation-date: 2-Nov-87 11:34:23 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MEXEC Problem: Jobs occasionally get hung in a DOBE% at LOGDOB. Diagnosis: Characters in output buffer but terminal line is disconnected, causing the job to hang forever in the DOBE. Solution: Since this DOBE is really meant for the use of something besides .NULIO in LOGDES, we really don't want to do the DOBE unless T1 matches LOGDES. [End of TCO 7.1107] TCO-number: 7.1108 Written-by: GSCOTT Creation-date: 3-Nov-87 14:42:53 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DOB Problem: PSINSKs among other things when using DOB. Diagnosis: When the DOB lock is locked we go CSKED. We can't get a page failure while CSKED, and the symbol table and bug list storage is swappable. Solution: When getting the DOB lock go ECSKED, then when releasing it go CSKED first. This allows us to page fault inside of the DOB JSYS. [End of TCO 7.1108] TCO-number: 7.1111 Written-by: LOMARTIRE Creation-date: 4-Nov-87 12:31:46 Edit-checked: No Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: STG Related-TCO: 7.1072 Problem: ENQ% and ENQC% JSYSes hang. The ENQ reply fork on the remote systems hangs in a loop and does not return any replies. Diagnosis: Routine LOCLOK uses the hash index as a starting point for the search of the lock block described in the VRQA. Unfortunately, HSHLEN is defined as 3*NJOBS so it can be different on various systems depending upon the number of jobs. So, the hash index is invalid and the ENQ reply fork loops while in the search. 
Solution: Make HSHLEN be 605 (which is 389 decimal which is prime) on each system so that the hash index is valid on all nodes. This means that HSHLEN must be defined to be the same value on every 7.0 system in the cluster! [End of TCO 7.1111] TCO-number: 7.1112 Written-by: MCCOLLUM Creation-date: 4-Nov-87 13:58:37 Edited-by: MCCOLLUM Edit-date: 4-Nov-87 14:00:05 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DISC DSKALC FLINI FUTILI GLOBS JSYSA JSYSF MEXEC MSTR PHYKLP PHYSIO PROLOG STG SYSERR Problem: In a clustered environment, there is no way for multiple systems to share the same username data base. Diagnosis: The username data base always comes from the boot structure. Since this structure must be a massbus device, the shared MSCP disks cannot be used. Solution: Add a new startup routine, FNDLGS, to DSKALC to find a separate structure to use as a Login Structure. This routine will be called after the CI-20 has been started and the MSCP disks are available to the system. This routine will try to find a complete structure that has the local system's CPU number saved in the home block words HOMLS1, HOMLS2, HOMLS3, or HOMLS4. If found, the structure will be manually mounted and the STRTAB index for it will be saved in global cell LGSIDX. Routines needing to get to the username data base can use the contents of this cell as the structure number for the Login Structure. If no separate Login Structure is found, this cell will contain zero, indicating that the Login Structure is the same as the Boot Structure. This TCO involves utility changes to CHECKD, SETSPD, ACTGEN, and GALAXY to support it. [End of TCO 7.1112] TCO-number: 7.1114 Written-by: RASPUZZI Creation-date: 4-Nov-87 16:25:15 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUFRK Problem: ILMNRFs and KLPHOGs out when using the INFO% JSYS. 
Diagnosis: Remote system sending back incorrect information for the .INLNS function. Specifically, the number of words sent back is off by a large count and the system thinks more SCA buffers came in to the local system than really did. This causes PHYKLP's port queues to get so fouled up, you don't want to even consider looking at the dump. Solution: Change routine CLFLNS to correctly calculate the number of words to send back to the remote system. [End of TCO 7.1114] TCO-number: 7.1115 Written-by: LOMARTIRE Creation-date: 5-Nov-87 14:56:07 Edited-by: LOMARTIRE Edit-date: 5-Nov-87 15:35:10 Edit-checked: No Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: ENQ ENQSRV GLOBS CFSSRV Related-TCO: 7.1072 Problem: Various hangs in ENQ%/ENQC%. Incorrect vote replies being returned. Diagnosis: There are two problems. First, the VRQA is used by both the interrupt level routine EQMSG and by the process level routine EQATOP. EQMSG assembles an incoming Request Message Set into the VRQA to be processed by the Vote Responder routines. EQATOP is the fork which calls the appropriate Vote Responder routine to send the reply to the vote request in the VRQA. There is no way now for the Responder routines to guarantee that the VRQA stays stable through cluster state changes. So, this could cause problems during the formation of the reply. The second problem is that the setting of bit VPRTY is now in a process context routine. This bit is set when the cluster changes state and will allow a fork in EVWAIT to wake up and try the vote again. But, with the bit setting in process context, it can potentially be blocked from occurring, thus hanging the job in ENQ%/ENQC% waiting for a vote reply which will never come. Solution: Create a new data structure called the Vote Answer Area (VANA). When EQATOP is run to send a reply, it will copy the VRQA into the VANA. 
All the Vote Responder routines will work with the VANA in deciding the correct response. Also, the setting of bit VPRTY in the VRPA must be moved into CFSRSV so that state changes are correctly handled. Finally, add word EQLBCT to keep the count of the number of Lock-Blocks on the system (they will be linked on the EQLBLT). [End of TCO 7.1115] TCO-number: 7.1116 Written-by: GSCOTT Creation-date: 6-Nov-87 10:49:27 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PHYM78 Problem: TM78's KDB active bit is set with no active UDBs. Support of TU79s makes some BUG messages inappropriate since they talk about TU78s. Kontroller reset failures are not logged or noticed. Diagnosis: Somewhere the TM78's active bit is not getting cleared. Nobody thought you could have a follow-on to the TU78. Kontroller resets usually work. Solution: If the TM78 isn't active and the active bit is on, reset it and BUGCHK. Change textual messages from "TU78" to "tape drive". BUGCHK if the kontroller reset didn't work. [End of TCO 7.1116] TCO-number: 7.1117 Written-by: GSCOTT Creation-date: 6-Nov-87 10:54:14 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: GLOBS MEXEC PHYSIO STG Problem: Monitor too big and too slow. Diagnosis: Historic code exists in MEXEC and other places that appears to be left over from pre-PHYSIO days (e.g. Tenex). Solution: Remove code in MEXEC (CHKR) and PHYSIO (DRMINI) that shouldn't be there. Some time in the future more of the drum stuff could be ripped out but this will help **PERFORMANCE**, so let's do this part now. [End of TCO 7.1117] TCO-number: 7.1119 Written-by: GSCOTT Creation-date: 6-Nov-87 14:38:50 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MEXEC Problem: Monitor too big. Diagnosis: Use of literal ASCIZ/CRLF/ when CRLF is in STG.
Solution: Teach engineer to use CRLF in STG rather than a literal at LOGCR. [End of TCO 7.1119] TCO-number: 7.1120 Written-by: WADDINGTON Creation-date: 6-Nov-87 16:55:51 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: latsrv Problem: TOPS-20 can't talk to reverse-lat devices, such as LN03's. Diagnosis: TOPS-20 uses the LAT 5.0 protocol. Reverse-lat requires the LAT 5.1 protocol. Solution: Upgrade TOPS-20 to LAT 5.1 protocol. [End of TCO 7.1120] TCO-number: 7.1121 Written-by: RASPUZZI Creation-date: 10-Nov-87 14:52:29 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUDGR Problem: None seen but CLDISC and CLUCONs are annoying. Diagnosis: Why see these if they are annoying? Solution: Put CLDISC and CLUCON under CIBUGX. Also, change CLORBF to a BUGINF and put it under CIBUGX. Now watch things start to break in massive quantities. [End of TCO 7.1121] TCO-number: 7.1122 Written-by: MCCOLLUM Creation-date: 10-Nov-87 15:55:19 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MSTR Problem: MOUNTR refuses mount requests with "insufficient MOUNTR resources" after a few forced dismounts of disk structures. Diagnosis: When MSTR% is called to dismount a structure, it sends an IPCF message to MOUNTR informing it of the structure dismount. MOUNTR scans its account blocks looking for users of that structure. When the blocks are located, accounting is done and the blocks are removed. However, MSTR% is not providing the correct structure unique code in this block due to a problem with the placement of a TRVAR in the JSYS code. This causes MOUNTR to not find the matching account blocks and therefore they are never released. If this scenario is repeated often enough, MOUNTR runs out of space for the account blocks.
Solution: Fix MSTR% to send the correct structure unique code to MOUNTR when a structure dismount is done. [End of TCO 7.1122] TCO-number: 7.1125 Written-by: GSCOTT Creation-date: 10-Nov-87 16:57:38 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DOB Problem: DOB% JSYS function .DBTIM doesn't work and returns DOBX08. Diagnosis: Several problems: the first is that the correct offset into the argument block for .DBTIM was not used; the second is that the argument block was not checked using the right symbol; the third is that the wrong AC was being used to range check the timeout value. Solution: Replace 4 of 8 lines in DBTIM routine in DOB. [End of TCO 7.1125] TCO-number: 7.1126 Written-by: MCCOLLUM Creation-date: 10-Nov-87 17:05:17 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JSYSA Problem: The TMON% .SFLGS will not accurately reflect the state of the Login Structure feature if someone disables Login Structures after the system has been booted. Diagnosis: The TMON% .SFLGS function simply returns the value of the Login Structure flag. Once the system has been booted, the Login Structure flag has no meaning. Therefore, if the state of the feature is changed through SMON%, the TMON% will no longer be reliable. Solution: Instead of returning the value of the LGSFLG, return the negative of the LGSIDX. LGSIDX will contain 1 if the system is running with a Login Structure, and 0 otherwise. TMON% will then show whether or not a Login Structure is in use. [End of TCO 7.1126] TCO-number: 7.1127 Written-by: MCCOLLUM Creation-date: 11-Nov-87 10:24:20 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: SCSJSY Problem: OPERATOR jobs cannot execute the SCS% JSYS. New GALAXY components will not run under user OPERATOR.
Diagnosis: The SCS% JSYS requires WHEEL, MAINTENANCE, or ARPANET-WIZARD privileges. When the OPERATOR account is created at structure initialization time, it is built with only OPERATOR privileges. Solution: Change SCS% to allow users with OPERATOR privileges enabled to execute the JSYS. [End of TCO 7.1127] TCO-number: 7.1129 Written-by: MCCOLLUM Creation-date: 11-Nov-87 10:59:20 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MSTR Problem: ILMNRF BUGHLTs when doing an MSTR% .MSRUS function. Diagnosis: .MSRUS and .MSRNU share code. When .MSRUS is done, the address of the UDB for the given unit is not saved in the MSTUDB STKVAR. If the unit goes offline during the function processing, an ILMNRF BUGHLT can result if the call to MSTRHB fails. .MSRNU saves the UDB address and therefore does not have this problem. Solution: Move the instruction that saves the UDB address for .MSRNU to common code so that it is saved in all cases. [End of TCO 7.1129] TCO-number: 7.1130 Written-by: MCCOLLUM Creation-date: 11-Nov-87 12:06:12 Edited-by: MCCOLLUM Edit-date: 11-Nov-87 12:15:51 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: SCLINK SCJSYS Problem: ILMNRF BUGHLTs out of routine NETSQX in SCJSYS. Diagnosis: The PTSTS field of the port block contains invalid data. NETSQX uses this address to index into a dispatch table. The bad value in PTSTS causes NETSQX to dispatch into a random monitor location. PTSTS is set up by copying the SAAST of the SAB. This cell is being corrupted earlier by a call to routine SCTNSF when the channel is in a transition state. The port block and the SAB exist, but the link block has been cleaned up. SCTNSF calls routine SCTNIS which detects that the link block no longer exists. This routine then calls SCTNIX which expects the pointer to the link block to be negative if there is no link block.
But it is zero and SCTNIX proceeds to store a bogus value in SAAST (actually the address of the SJB) which subsequently is copied into PTSTS. Solution: If a zero value is passed to SCTNIX as the value of the link block, just leave SAAST as zero. Fix up the dispatch table in NETSQX to return a device or data error when the PTSTS value is zero. [End of TCO 7.1130] TCO-number: 7.1132 Written-by: RASPUZZI Creation-date: 12-Nov-87 15:06:41 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUFRK Problem: Cluster SYSTAT still a little on the slow side when systems in the cluster get loaded. Diagnosis: What do you want from a KL? It's doing its best. However, it would help to give the CLUDGR fork a priority boost since it really is not a hog. Solution: Light JP%SYS in JOBBIT for CLUDGR fork's PSB and make it run at about the same priority as CHKR. [End of TCO 7.1132] TCO-number: 7.1133 Written-by: GSCOTT Creation-date: 12-Nov-87 16:07:24 Edited-by: GSCOTT Edit-date: 12-Nov-87 16:08:25 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DOB Problem: Can you say LOKODR? Can you say CSKBUG? Diagnosis: We should have used the LOCK and UNLOCK macros rather than LOKK and UNLOKK. Solution: Use LOCK and UNLOCK macros, making things a lot smaller and quicker, not to mention bug free. [End of TCO 7.1133] TCO-number: 7.1134 Written-by: GSCOTT Creation-date: 12-Nov-87 16:10:31 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PHYKLP Problem: PHYKLP doesn't know about the new IPALOD created by that extra fine program IPAGEN. Diagnosis: Code not cleaned up from 6.1 days. Solution: (1) Check microcode version when the microcode is loaded into resident storage by PHYKLP.
(2) When read counters is done to get the version, make sure it reports the same version as the one we think we loaded. If not then BUGINF. (3) Clean up comments relating to IPALOD and IPADMP. [End of TCO 7.1134] TCO-number: 7.1138 Written-by: LOMARTIRE Creation-date: 16-Nov-87 14:22:14 Edit-checked: Yes Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: STG GLOBS CFSSRV ENQPAR ENQSRV Related-TCO: 7.1072 Problem: CFSSRV won't build anymore! Diagnosis: MACRO complains of "not enough core" if CFSSRV searches ENQPAR. Solution: Reorganize code so that CFSSRV no longer has to search ENQPAR. [End of TCO 7.1138] TCO-number: 7.1141 Written-by: RASPUZZI Creation-date: 18-Nov-87 10:53:46 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUDGR Problem: ILMNRFs when ACJ denies the use of the INFO% JSYS. Diagnosis: The GTOKM macro can only be used in section 1. INFO% is in section 6. It attempts to execute the GTOKM macro through the use of another macro, S1XCT. Unfortunately, the address to return to upon ACJ denial is given as an address in section 6. Since the denial happened out of section 1, the JRST only looks at the low 18 bits and winds up in the weeds. Solution: In the GTOKM macro, change it so that when ACJ denies INFO% requests it will jump back to section 6 and ITERR back to the user. [End of TCO 7.1141] TCO-number: 7.1142 Written-by: LOMARTIRE Creation-date: 18-Nov-87 13:07:14 Edit-checked: No Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: ENQSRV Related-TCO: 7.1072 Problem: There are still jobs which hang in ENQ%/DEQ%/ENQC%. Diagnosis: After a cluster state change, all Lock-Blocks on the system are rescheduled. Also, any vote in progress is marked for a retry. Finally, any answer which is being made is aborted.
It is possible for a retried vote to arrive on a node, then have that node notice that the cluster changed state and abort the answering process. Therefore, the sending node will be hung waiting for a reply which will never arrive. Solution: Do not abort answering attempts after a cluster state change. Should the CALL to SC.SMG fail in routine ANSWER, check the error code. If it was SCSNEC (not enough credit), retry the answer attempt. If it should fail for any other reason, or should it succeed, check the unique code of the vote we are replying to (held in the VANA) against the one that EQMSG put in the left half of EQFKFL. If they are the same, then we just replied to the correct vote request so clear the left half of EQFKFL. Otherwise, some other vote request has arrived and must be processed. Do not clear EQFKFL in this case. [End of TCO 7.1142] TCO-number: 7.1144 Written-by: GSCOTT Creation-date: 19-Nov-87 15:57:09 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DOB Problem: Engineer confused about structure offline bit. Diagnosis: MS%OFL is returned by MSTR; MS%OFS is in SDBSTS. Solution: Use MS%OFS in CKSTR in DOB. [End of TCO 7.1144] TCO-number: 7.1145 Written-by: LOMARTIRE Creation-date: 20-Nov-87 07:08:39 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: ENQSRV Related-TCO: 7.1072 Problem: The ENQ Answer fork and ENQ Resched fork are not as responsive as they could be. Diagnosis: When they are created, they are not given any special consideration like the other special system forks. Solution: Make both forks special system forks (JP%SYS) and ensure that they fall no lower than scheduler queue 1.
[End of TCO 7.1145] TCO-number: 7.1147 Written-by: MCCOLLUM Creation-date: 24-Nov-87 15:33:11 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: EXECSE STG Problem: The offline structures default timeout interval is too large. Diagnosis: It is now 60 seconds; a lower value is more reasonable. Solution: Modify STG to make the default value 5 seconds. Also change the EXEC to set it to five seconds when no value is supplied in the ^ESET OFFLINE-STRUCTURES command. [End of TCO 7.1147] TCO-number: 7.1148 Written-by: GSCOTT Creation-date: 24-Nov-87 16:57:40 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PHYSIO Problem: When a disk drive is dual ported to two RH20s per system the monitor makes the primary UDB the first path it finds to the drive, and the secondary UDB the second path it finds to the drive. This lowers performance when several drives are dual ported in this manner since (1) the monitor always does seeks on the primary UDB and (2) the monitor always tries the transfer on the primary UDB first. Diagnosis: Monitor should allocate the primary UDBs across the channels in a fair fashion. Routine PHYDUA in PHYSIO is stupid. Solution: Allocate odd numbered drives to odd numbered RH20 and even numbered drives to even numbered RH20. This should make the primary/secondary split across RH20s more efficient. Change PHYDUA to allocate drives in this manner. [End of TCO 7.1148] TCO-number: 7.1149 Written-by: WADDINGTON Creation-date: 25-Nov-87 13:44:37 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: FILMSC Problem: CLOSF causes reverse LAT tty's to go away. Diagnosis: In TTYCLZ, we call TTHNGU to lower DTR. TTHNGU does a TDCALL to LTHNGU which terminates the LAT connection, and invokes the cleanup code to blow away the TTY.
Solution: In TTYCLZ, only call TTHNGU for RSX20F lines. [End of TCO 7.1149] TCO-number: 7.1150 Written-by: WADDINGTON Creation-date: 25-Nov-87 15:36:27 Edited-by: LOMARTIRE Edit-date: 13-Dec-87 15:27:46 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Problem: Reverse LAT requests that are queued by the server and then canceled, are not cleaned up properly. They remain in the server's queue. Diagnosis: In LATSRV, routine FMCAN is called to format a command message. The code in FMCAN is just plain wrong. Solution: Rewrite FMCAN to properly format a cancel command message. [End of TCO 7.1150] TCO-number: 7.1151 Written-by: RASPUZZI Creation-date: 1-Dec-87 14:54:44 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: COMND Problem: STI% in COMND% is causing partial recognition to beep after completing a keyword or switch. Diagnosis: The STI% appears to be extraneous. Solution: Either NOP the STI% or remove it totally. Since we must preserve 0/1 space, I suggest removing it. [End of TCO 7.1151] TCO-number: 7.1155 Written-by: WADDINGTON Creation-date: 3-Dec-87 14:48:50 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: latsrv Problem: 1) LATNSC Type 2 and 12 2) Bogus PRSTS value if pending request is queued by the server 3) Missing Test of Master/Slave Bit in MSGNXT Diagnosis: 1) MSGSID got transposed into MSGDID when we merged TOPS-10/TOPS-20 LATSRVs. 2) Developer needs a vacation. Did a LOAD AC,LITERAL instead of MOVX AC,LITERAL. 3) The LAT 5.1 spec says that all messages from Servers will have the Master/Slave bit lit. This is not true. In particular, the bit is clear on Response Information Messages and Status Messages. Consequently, Reverse LAT wouldn't work at all because we ignored all messages which had the bit clear. 
We commented out the test until we better understood the problem. Solution: 1) Change MSGDID to MSGSID at MTTSTP+5 lines. 2) Change LOAD T1,.LAQUE to MOVX T1,.LAQUE near the bottom of RDSTA. 3) Now that we understand the problem, restore the test in NXTMSG, but allow Response Information messages and Status messages in. [End of TCO 7.1155] TCO-number: 7.1156 Written-by: RASPUZZI Creation-date: 3-Dec-87 14:54:12 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PHYKLP Problem: Can't build debug=1 monitor. Diagnosis: Misspelled word in PHYKLP under debug conditional. Solution: Win spelling bee. [End of TCO 7.1156] TCO-number: 7.1158 Written-by: GSCOTT Creation-date: 10-Dec-87 14:37:29 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: APRSRV Problem: BUGNAM doesn't get set right if you have a problem calling SEBCPY. Diagnosis: If there is some problem queueing the ERROR.SYS block then we don't store the BUGNAM. Solution: Store the BUGNAM early on when doing a BUGHLT. This is the only case when the BUGNAM doesn't appear to be set right - it could be a problem with CHKs and INFs, but this hasn't been seen yet. Also APRSRV lacks a TOC and needs to be repaginated (I hate those messy page breaks). [End of TCO 7.1158] TCO-number: 7.1159 Written-by: RASPUZZI Creation-date: 10-Dec-87 14:44:12 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: COMND Problem: When user types an invisible keyword followed by an escape, the monitor complains that his command input buffer is too small. Diagnosis: Partial recognition is deficient in this case. It is taking nulls as the suffix (there is no suffix on invisible or norec keywords) and depositing them infinitely in the user's command buffer until space is exhausted. Solution: Teach COMND% to ignore norec keywords.
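A minimal sketch of the behavior this solution asks for, written in Python rather than MACRO and not taken from the monitor sources. It assumes partial recognition appends the longest unambiguous suffix, and that invisible and norec entries are simply left out of the candidate set so that no empty suffix is ever deposited. The Keyword type, its two flag names, and complete_partial are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    text: str
    invisible: bool = False  # hypothetical flag: keyword is never completed
    norec: bool = False      # hypothetical flag: excluded from recognition


def complete_partial(typed, table):
    """Return the suffix to append after `typed`, or "" if nothing applies."""
    typed = typed.upper()
    # Invisible and norec keywords take no part in partial recognition.
    candidates = [k.text for k in table
                  if k.text.startswith(typed) and not (k.invisible or k.norec)]
    if not candidates:
        return ""
    # Shrink the first candidate until it is a prefix of every candidate.
    prefix = candidates[0]
    for text in candidates[1:]:
        while not text.startswith(prefix):
            prefix = prefix[:-1]
    return prefix[len(typed):]
```

With a table holding DELETE, DEFINE, and an invisible DEBUG, typing DEL completes to DELETE, while typing DEB appends nothing instead of looping on an empty suffix.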
[End of TCO 7.1159] TCO-number: 7.1162 Written-by: RASPUZZI Creation-date: 15-Dec-87 13:06:00 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: STG Problem: MONPDL BUGHLTs when using a debug=1 monitor. Diagnosis: BUGPDL is not big enough when debugging code is on. Solution: Increase BUGPDL in debug monitors but leave it the same in regular monitors. [End of TCO 7.1162] TCO-number: 7.1164 Written-by: WADDINGTON Creation-date: 16-Dec-87 09:21:51 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: LATSRV Problem: LAT maximum active circuits can be set higher than maximum allocated circuits, resulting in a lot of LATNSCs. Also, users cannot customize the default max active circuits. MINACB is a bad name for default Maximum Allocated circuits blocks. Diagnosis: The LCP>SET MAX CIRCUITS 50 command changes the maximum active circuits (HNMAC) to the desired value, but does not change the maximum allocated circuit blocks (HNMXC). A LATOP% JSYS call to either set or clear a parameter is done by executing an instruction out of the TBEXEC table, but unfortunately this table just moves the new maximum value into HNMAC. Solution: Change PARTB. for the set max circuits to jump to SETMAC, instead of just storing the new value into HNMAC. Add routine SETMAC which got dropped when we merged TOPS-10 and TOPS-20 sources. (We dropped it because it never got called in 6.1 or 7.03) Also, change symbol MINACB to MAXACB, add symbol MAXACC (for Max Active Circuits), and move both symbols to STG so customers can customize them if desired. [End of TCO 7.1164] TCO-number: 7.1165 Written-by: RASPUZZI Creation-date: 17-Dec-87 16:26:29 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JSYSA Problem: Cannot set SNOOP% breakpoints in non 0/1 sections.
This is a handicap for attempting to time monitor routines for performance purposes. Diagnosis: SNOOP% uses a full 30 bit address but when it stores the breakpoint in the breakpoint block, it steps on the section number with a "JRST" OP code. Solution: Add another word to the breakpoint block SNPADR. This will contain the 30 bit address of the breakpoint (to be used during insertion and removal of the breakpoint). It is still OK to use 18 bit addressing for JRSTs in the breakpoint block because it will be section relative and the monitor will be in the appropriate section already. [End of TCO 7.1165] TCO-number: 7.1167 Written-by: SHREFFLER Creation-date: 23-Dec-87 09:27:17 Edited-by: RASPUZZI Edit-date: 23-Dec-87 09:28:54 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PAGEM GLOBS PAGUTL Problem: SWOFCT BUGCHKs Diagnosis: On long files when the SMAP% Monitor call unmaps a file section it should check to see if the share count on the OFN is going to zero and if so it should cache it. It doesn't do this. As a result uncached unshared index blocks can be left lying around and the system may decide to swap them out if memory gets tight. This causes SWOFCTs. Solution: Have SMAP% call the OFN caching routine when it unmaps a file section and sees the share count going to zero. [End of TCO 7.1167] TCO-number: 7.1168 Written-by: RASPUZZI Creation-date: 28-Dec-87 11:16:03 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: GTJFN MACSYM Problem: ASTJFN BUGHLTs. Diagnosis: GTJFN is working with a nonexistent file specification that contains a wild card character. This is a no-no and can occur when a user types "COPY TTY: FOO.BAR;*". Solution: Make GTJFN smart enough to realize that ";*" is an undefined file attribute. We don't do VMS file specifications here.
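The check described in this solution can be illustrated roughly as follows. This is a Python sketch under stated assumptions, not GTJFN's parser: it assumes attributes follow the name as semicolon-separated items whose first character selects the attribute, and it recognizes only an illustrative subset (;A account, ;P protection, ;T temporary). The names KNOWN_ATTRIBUTES, UndefinedAttribute, and split_attributes are hypothetical.

```python
KNOWN_ATTRIBUTES = {"A", "P", "T"}  # illustrative subset, assumed for this sketch


class UndefinedAttribute(ValueError):
    """Raised when a file specification carries an attribute we do not know."""


def split_attributes(spec):
    """Split 'NAME.TYP;Xvalue;...' into (name part, [(letter, value), ...])."""
    name, *raw = spec.split(";")
    attrs = []
    for item in raw:
        letter = item[:1].upper()
        # ";*" lands here: "*" is not an attribute letter, so reject it
        # up front instead of treating it as a wild card field.
        if letter not in KNOWN_ATTRIBUTES:
            raise UndefinedAttribute("undefined file attribute ;" + item)
        attrs.append((letter, item[1:]))
    return name, attrs
```

In this sketch "FOO.BAR;T" parses normally, while "FOO.BAR;*" is rejected with an error before any later code has to cope with a wild card in the attribute position.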
[End of TCO 7.1168] TCO-number: 7.1169 Written-by: RASPUZZI Creation-date: 31-Dec-87 10:27:17 Edited-by: RASPUZZI Edit-date: 31-Dec-87 10:32:10 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: TAPE Problem: GDSTS% and SDSTS% JSYS leave JFN locked when returning with an error out of routines MTGTSX and MTSTSX. Diagnosis: Edit 7446 added an ITERR to prevent crashes. However, when this path is taken, the JFN is never unlocked in either routine. Solution: Release file lock by calling UNLCKF before taking error return out of MTGTSX and MTSTSX. [End of TCO 7.1169] TCO-number: 7.1172 Written-by: LOMARTIRE Creation-date: 6-Jan-88 13:54:12 Edit-checked: Yes Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: Yes Program: Monitor Routines-affected: ENQ ENQSRV Related-TCO: 7.1072 Problem: ILLUUO and ILFPTE BUGHLTs when an ENQ% is done on systems which do not have a CI installed (or NOKLIP is set). Also, some strange results can occur when ENQs and DEQs are done. Diagnosis: Location EQLBLT is initialized in routine EQSINI when SCA starts up the SYSAPs. But, if there is no CI, then SCA is never started and EQLBLT is never initialized. This causes problems in later ENQ%/DEQ% attempts. Solution: Move the initialization of EQLBLT and EQLBCT into routine ENQINI. This routine is called at system startup. [End of TCO 7.1172] TCO-number: 7.1174 Written-by: WADDINGTON Creation-date: 13-Jan-88 09:01:50 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: latsrv Problem: SKDPF1 while trying to print to LAT printers. Diagnosis: Using T1 instead of SB. Not setting up T1 prior to call to PRWAKE. CALL PRWAKE should be CALLRET PRWAKE. Solution: Don't let the developer try to write code, cope with Thanksgiving, Christmas, an impending baby, and start field test, all at the same time. 
[End of TCO 7.1174] TCO-number: 7.1175 Written-by: MCCOLLUM Creation-date: 13-Jan-88 09:51:02 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: fsprem Problem: A routine is needed in FREE.MAC to determine the amount of free space left in a given extended space swappable pool. Diagnosis: As above. Solution: Add routine FSPREM. This routine takes as an argument the swappable free space pool number (an index into FSPTAB) and returns the amount of free space remaining in the pool in words. [End of TCO 7.1175] TCO-number: 7.1177 Written-by: MCCOLLUM Creation-date: 14-Jan-88 14:44:56 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DTIND6 Problem: DN60 front ends cannot be initialized. Diagnosis: Routine DTIND6 in DTESRV is called when the BOOT% .BTIPR function is used. This function attempts to initialize DN60 protocol on a given DTE. This routine attempts to acquire 1000 (octal) words from the section 0/1 resident free space general pool. However, the initialization is done after system startup and the request fails. This is because the new DOB feature of the monitor acquires 1000 (octal) words of the same free space at system startup. DOB does not release these words and there are not enough left in the pool to satisfy DTESRV's subsequent request. Solution: Increase the size of the section 0/1 resident free space general pool by 1000 (octal) words. This is defined by the .RESGQ parameter in STG.
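The arithmetic behind this failure can be sketched in a few lines of Python. The pool model is deliberately simplistic and hypothetical: it ignores fragmentation and every other consumer of the pool, and the starting size OLD_POOL_WORDS is an invented figure, since the TCO does not state the real value of .RESGQ. Only the two 1000 (octal) word requests come from the text above.

```python
DOB_WORDS = 0o1000       # taken by DOB at system startup and never released
DN60_WORDS = 0o1000      # requested later by DTIND6 for DN60 initialization
OLD_POOL_WORDS = 0o1400  # hypothetical pool size, for illustration only


class Pool:
    """Toy free space pool that only tracks a count of remaining words."""

    def __init__(self, size_words):
        self.remaining = size_words

    def acquire(self, words):
        if words > self.remaining:
            return False
        self.remaining -= words
        return True


def dn60_init_succeeds(pool_words):
    pool = Pool(pool_words)
    pool.acquire(DOB_WORDS)          # startup: DOB gets its block first
    return pool.acquire(DN60_WORDS)  # later: the BOOT% .BTIPR request
```

Under these assumptions dn60_init_succeeds(OLD_POOL_WORDS) is False, and growing the pool by 1000 (octal) words makes the same request succeed.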
[End of TCO 7.1177] TCO-number: 7.1179 Written-by: LOMARTIRE Creation-date: 14-Jan-88 15:33:49 Edited-by: LOMARTIRE Edit-date: 15-Jan-88 15:49:59 Edit-checked: Yes Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: ENQ ENQSRV Related-TCO: 7.1072 Problem: When a vote is received for a cluster-wide non-file lock, the system will respond even if it only has the lock in non-cluster-wide mode. This can cause the cluster-wide request to fail incorrectly. Also, vote requests are being sent out from interface routines EQPOOL and EQLKSD for non-cluster-wide locks. This is simply extra work. Diagnosis: The EN.CLL bit must be used to represent a cluster-wide lock and not just a cluster-wide file lock. Solution: Make EN.CLL represent a cluster-wide lock in the Lock-Block if EN.CLL is lit in any of the associated Q-Blocks. Have LOCLOK return failure if the Lock-Block is found on the system but EN.CLL is not lit. Have EQPOOL and EQLKSD check EN.CLL before attempting to send the vote request. Routine EQRSTS should always vote if the process has cluster-wide capabilities even if the Lock-Block does not have EN.CLL set. [End of TCO 7.1179] TCO-number: 7.1180 Written-by: LOMARTIRE Creation-date: 14-Jan-88 16:52:53 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: ENQ GLOBS STG Related-TCO: 7.1072 Problem: FSPOUT BUGINFs during ENQ usage from the ENQ free pool. Diagnosis: The Lock-Block caching feature will make a large number of Lock-Blocks remain on the system instead of being returned to the pool when the last Q-Block is dequeued. The variable ENQSPC keeps track of the amount of ENQ free pool space used and this amount is used to determine if garbage collection of long-term locks should occur. It appears that ENQSPC can drift from the real allocation and so garbage collection is incorrectly postponed.
So, when an attempt to get free space is made, the pool is exhausted, an FSPOUT results, and a forced garbage collection is done. This will release the cached (long-term) locks and provide enough space to continue. In the dumps seen so far, ENQSPC is much greater than the real allocation and there are many Lock-Blocks on EQLBLT. Solution: Remove ENQSPC and use the new FSPREM routine to determine the amount of space left in the ENQ free pool. [End of TCO 7.1180] TCO-number: 7.1181 Written-by: RASPUZZI Creation-date: 15-Jan-88 10:54:37 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JSYSF Problem: The RFTAD% JSYS initializes one too many words in the user's argument block. Diagnosis: TCO 7.1044 has a fencepost error in it. Solution: Adjust the count before calling BLTUU out of the RFTAD% JSYS so it initializes the correct number of words. [End of TCO 7.1181] TCO-number: 7.1183 Written-by: GSCOTT Creation-date: 15-Jan-88 14:41:22 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DATIME Problem: Monitor keeps running out of section 0/1 space. Diagnosis: Monitor needs code moved to section 6. Solution: Move DATIME into section 6. This gets a little over 2100 (octal) words out of SWAPCD and into XSWAPCD. [End of TCO 7.1183] TCO-number: 7.1185 Written-by: MCCOLLUM Creation-date: 15-Jan-88 16:17:02 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JNTMAN Problem: The available section 0/1 address space in the monitor is getting low. Diagnosis: Solution: Move the NODE% JSYS and all other code in module JNTMAN to XCDSEC. This gets back about 1400 (octal) words of section 0/1 space, mostly from NRCOD.
[End of TCO 7.1185] TCO-number: 7.1186 Written-by: WADDINGTON Creation-date: 15-Jan-88 20:50:29 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: LATSRV Problem: LATOP% Function .LASHC returns local job number. This is not overly useful. Diagnosis: LATOP% Function .LARHC calls routine LARHC. LARHC calls CKRHC. CKRHC stashes a copy of JOBNO in the PR block. JOBNO is the local job number. Solution: In CKRHC, use GBLJNO instead of JOBNO. In JBTHC, use GBLJNO instead of JOBNO when determining whether to terminate all requests for a particular job. [End of TCO 7.1186] TCO-number: 7.1190 Written-by: RASPUZZI Creation-date: 20-Jan-88 11:28:09 Edited-by: RASPUZZI Edit-date: 20-Jan-88 11:29:12 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CFSSRV CFSPAR CFSUSR Problem: It is no longer possible to obtain a CREF of CFSSRV. Diagnosis: CFSSRV is just so big right now that MACRO can't handle all the symbols that are generated. Solution: Like AT&T, split up CFSSRV. Put the user related stuff in CFSUSR and put the JSYS code into SWAPCD. Leave the SCA and vote stuff in module CFSSRV. Also, remove the MSKSTRs and DEFSTRs from CFSSRV and put them in CFSPAR. [End of TCO 7.1190] TCO-number: 7.1191 Written-by: MCCOLLUM Creation-date: 21-Jan-88 10:26:18 Edited-by: MCCOLLUM Edit-date: 21-Jan-88 10:31:09 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: ARPSND Problem: None observed, but code looks wrong. Diagnosis: Replies to Address Resolution Protocol (ARP) requests are sent to the Ethernet broadcast address rather than the sender's hardware address. While this behavior does not specifically violate RFC 826 pertaining to ARP, it does cause extra packets to be processed by all TCP/IP nodes on the Ethernet for no gain.
Solution: When replying to an ARP request, direct the reply only to the sender's hardware address. ARP requests will continue to be sent to the broadcast address as specified in the RFC. [End of TCO 7.1191] TCO-number: 7.1192 Written-by: EVANS Creation-date: 21-Jan-88 11:19:20 Edited-by: EVANS Edit-date: 21-Jan-88 14:33:33 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JSYSF Related-SPR: 21695 Related-QAR: 5 Problem: ILMNRFs during creation of long directory names. Diagnosis: Edit 7416 to JSYSF didn't allow enough words for 39-character device, 39-character directory. Long names overflow the space allotted in JSB free space. Solution: Increase the size of UDGRNM to 2*MAXLW+1 (17 words). In addition, change the SOUT% which copies the name to terminate on 2*MAXLC+5 characters instead of null. [End of TCO 7.1192] TCO-number: 7.1193 Written-by: MCCOLLUM Creation-date: 21-Jan-88 14:46:18 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PHYMSC Problem: Too many MSCCDF BUGINFs. Diagnosis: These aren't worth seeing. When they occur there are other cluster problems that make themselves obvious. Solution: Save some paper. Put MSCCDF under CIBUGX. [End of TCO 7.1193] TCO-number: 7.1194 Written-by: GSCOTT Creation-date: 22-Jan-88 10:52:56 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: POSTLD Problem: POSTLD doesn't correct PSECTs properly when ENCOD overlays NPVAR. Diagnosis: POSTLD just checks to see that ENCODZ is less than <XCDSEC,,<NPVAR-1>> but doesn't do anything about NPVAR's start when this happens. Solution: Write code to fix NPVAR's origin when ENCOD overlaps it.
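The origin fix in this Solution might look roughly like the sketch below. It is written in Python rather than MACRO purely for illustration and is not the POSTLD code. The function name, its arguments, and the choice to round the new origin up to the next page boundary are assumptions; the TCO only says that NPVAR's origin is fixed when ENCOD overlaps it.

```python
# Hypothetical sketch only: models moving NPVAR's origin past ENCOD.
PAGE_WORDS = 0o1000  # 512 words per page


def fix_npvar_origin(encod_end, npvar_start):
    """Return the origin NPVAR should use.

    encod_end   -- last in-section address used by ENCOD (ENCODZ in the TCO)
    npvar_start -- NPVAR's current in-section origin
    """
    if encod_end < npvar_start:
        return npvar_start  # no overlap, leave the origin alone
    # Assumption: move NPVAR up to the first page boundary past ENCOD.
    return (encod_end // PAGE_WORDS + 1) * PAGE_WORDS
```

With these assumptions, an ENCOD that ends at 400123 (octal) pushes an NPVAR that started at 400000 up to 401000.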
[End of TCO 7.1194] TCO-number: 7.1196 Written-by: WADDINGTON Creation-date: 25-Jan-88 09:41:50 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: latsrv Problem: Global job number gets stored in device tables for reverse LAT ttys. Need local job number... Diagnosis: TCO 7.1186 is wrong. Solution: Rip out TCO 7.1186. Instead of storing the global job number in the PR block, call LCL2GL to translate the local job number to a global job number when needed. [End of TCO 7.1196] TCO-number: 7.1197 Written-by: GSCOTT Creation-date: 25-Jan-88 23:46:32 Edited-by: GSCOTT Edit-date: 26-Jan-88 00:49:31 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PAGEM Problem: MDDT JSYS seems to hang when swappable monitor not locked down. Diagnosis: Section XCDSEC is mapped roughly where NRCOD is mapped in section 0/1, and the rest of the section is mapped to section 0/1 through MMAP. When code was moved to XCDSEC recently by someone with good intentions, he caused XRCOD+XNCOD+ENCOD to exceed the size of NRCOD, something which was apparently unthinkable in 6.1. When MDDT is run it just happens to go to section ENCOD, which can be at a higher address than the end of NRCOD in the AN-MONDCN monitor. When routine FPTAXC is called to resolve the page fault (assuming that SWPMLK wasn't called sometime after GOTSWM to lock down this page of ENCOD), the address of the first page in ENCOD appears to be after the end of NRCOD, so FPTAXC thinks that this is time for a section 0/1 page fault and dispatches to take care of that. We then start looping in the code trying to resolve this page fault in section 6 by trying to substitute a page in section 0/1 space that isn't mapped.
Solution: Change FPTAXC in PAGEM to use the address range of XRCODP (first page in section 6 address space) and ENCODL (last page in section 6 address space) rather than using NRCODP and NRCODL. A recent edit to POSTLD makes it move NPVAR up so that it won't collide with ENCOD or NRCOD. [End of TCO 7.1197] TCO-number: 7.1200 Written-by: GSCOTT Creation-date: 26-Jan-88 16:28:20 Edited-by: GSCOTT Edit-date: 26-Jan-88 16:35:01 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MEXEC JSYSA JSYSM Problem: MEXEC needs to go on a code diet. JSYSA is also too big. Not only that, but we are low on section 0/1 space. Diagnosis: MEXEC and JSYSA are two of the oldest modules and therefore are two of the biggest modules. We always need more section 0/1 space, don't we? Solution: Split MEXEC into MEXEC and JSYSM, moving all JSYS code to JSYSM. This makes MEXEC about half its previous size. Also move CRJOB, LOGIN, and USAGE code from JSYSA to JSYSM. Move GJINF, TIME, RUNTM, GTRPI, GTDAL, SYSGT, GETAB, SETSN, SETNM, GETNM, GETJI, SWTCH, LITES, USRIO, PEEK, and XPEEK to XCDSEC. This gets the free section 0/1 space up to 10 pages in the AN-MONDCN monitor. [End of TCO 7.1200] TCO-number: 7.1201 Written-by: MCCOLLUM Creation-date: 27-Jan-88 09:53:32 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: LATSRV Problem: Throughput on LAT terminal lines is not great. Diagnosis: The LAT slot size on TOPS-20 is only 40 bytes. Throughput would be improved with any increase in this size. Solution: Due to address space limitations, the number of terminal buffers available to LAT lines cannot be increased. However, the current buffer size of 123 (decimal) bytes would allow for slightly larger slot sizes. Increase the slot size to 60 (decimal) bytes.
[End of TCO 7.1201] TCO-number: 7.1202 Written-by: RASPUZZI Creation-date: 28-Jan-88 14:45:55 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: Yes Program: MONITOR Routines-affected: PHYKLP Problem: PHYKLP does not gather statistics for how long it spends servicing interrupts. Diagnosis: No code to do it. Solution: Make PHYKLP do something similar to PHYKNI when it starts the processing of an interrupt. Mainly, save the time the interrupt service starts, and then calculate the time spent in interrupt service. This will be added to a location (TOTKLP) that will hold the total amount of time PHYKLP spent servicing interrupts. [End of TCO 7.1202] TCO-number: 7.1203 Written-by: MCCOLLUM Creation-date: 28-Jan-88 15:12:06 Edited-by: MCCOLLUM Edit-date: 28-Jan-88 15:13:08 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: STG Problem: DCN=0 monitors crash at startup. Diagnosis: When DCN=0, the NODE% JSYS is defined in module STG in section 0/1. JSTAB assumes that NODE% is in section XCDSEC. Solution: Move the NODE% JSYS that is under DCN=0 conditional in STG to section XCDSEC. [End of TCO 7.1203] TCO-number: 7.1204 Written-by: RASPUZZI Creation-date: 2-Feb-88 14:51:52 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DIRECT Problem: ILMNRF BUGHLTs when doing partial recognition on directories. Diagnosis: Partial recognition on directories is not implemented yet routine STRFND thought it would go ahead and do it without asking us. Solution: Don't make STRFND do partial recognition work if we are recognizing a directory specification. 
[End of TCO 7.1204] TCO-number: 7.1205 Written-by: RASPUZZI Creation-date: 2-Feb-88 14:54:27 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUDGR Problem: Weird, wonderful and wild BUGHLTs when using the INFO% JSYS. Diagnosis: Some of the JSYS have been moved into section 6. These include XPEEK%, GETAB%, GETJI% and SYSGT%. INFO% does an IMCALL to each one of these if the local node has been specified in the INFO% argument block. Solution: Have INFO% do IMCALLs to these JSYS but make sure that the IMCALL uses the correct section. [End of TCO 7.1205] TCO-number: 7.1206 Written-by: RASPUZZI Creation-date: 2-Feb-88 15:00:19 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MSTR IPCF Problem: MOUNTR is complaining about not writing accounting records for mount requests. Diagnosis: Edit 7508 introduced a change to make the monitor pass the global job number to MOUNTR. Unfortunately, this edit added a TRVAR variable in 2 routines in MSTR. This is bad as routine DISMES in IPCF depends on the order of the TRVAR. Solution: Shoot person who wrote DISMES. Since that is not feasible, nor is it legal, then we will just fix the TRVAR in MSTR and make a little note for our future siblings to be careful when treading in this code. [End of TCO 7.1206] TCO-number: 7.1207 Written-by: WADDINGTON Creation-date: 2-Feb-88 15:11:50 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: LATSRV STG MONSYM Problem: New LATOP% function .LARHC doesn't talk to the ACJ. Diagnosis: It's in the spec but we didn't have time to implement it prior to FT1. Solution: Add it now, before we freeze for FT2... 
[End of TCO 7.1207] TCO-number: 7.1208 Written-by: RASPUZZI Creation-date: 2-Feb-88 16:02:31 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: STG Problem: Remote possibility of a SKDPF1 BUGHLT in working set management code. Diagnosis: When the monitor adjusts a user's working set, it may decide to swap out a set of pages. This decision is made at scheduler level. When the swap out occurs, the monitor then calls CLROFN which calls DASOFN. DASOFN then references the ALOC tables. These tables are not resident and could possibly be swapped out at the time. Page faults in the scheduler are illegal. Solution: Move the ALOC tables from non-resident storage to resident storage. [End of TCO 7.1208] TCO-number: 7.1209 Written-by: MCCOLLUM Creation-date: 3-Feb-88 13:25:30 Edited-by: MCCOLLUM Edit-date: 3-Feb-88 13:41:21 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: LOKWAI Problem: System load average and scheduler overhead increase when an incoming CTERM job is XOFFed. Diagnosis: The scheduler test LOKWAI is used to cause a job to block when a running CTERM circuit is XOFFed. A bad compare instruction in this routine causes it to return +2 (success) and the fork is woken up. Because the fork is not ready to do output, it blocks again immediately. This cycle is repeated until the CTERM circuit is XONed. Solution: Fix the compare instruction in the LOKWAI scheduler test. If the circuit's state goes from "run" to some other state, then the fork should wake up so it can be killed off. [End of TCO 7.1209] TCO-number: 7.1210 Written-by: GSCOTT Creation-date: 3-Feb-88 15:52:34 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MANY Problem: Many BUGs not set "normally not dumpable".
Diagnosis: We only recently decided which ones were "normally not dumpable" for DOB. Solution: Change the ones we decided, and only the ones we decided, to have the "normally not dumpable" bit set in their BUG. macros. Many monitor modules will be changed for this. [End of TCO 7.1210] TCO-number: 7.1212 Written-by: RASPUZZI Creation-date: 4-Feb-88 13:23:24 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUDGR Problem: ILMNRF BUGHLTs out of INFO% JSYS. Diagnosis: When a remote system is out of free space and cannot create a request block for the request that it just got, it does not correctly report the INFX08 error to the local system. The local system now thinks that it just got a good result back from the remote system and things go downhill from there. The local system uses wrong free space size and address and tries to do something with this bogus address. Solution: When an INFO% request has arrived and the system cannot create a request block (due to insufficient resources), make sure that the INFX08 error is returned but also ensure that CL%ERR is lit in the flag word of the response. This will make the requesting system handle the error case correctly. [End of TCO 7.1212] TCO-number: 7.1213 Written-by: GSCOTT Creation-date: 4-Feb-88 13:54:18 Edited-by: GSCOTT Edit-date: 5-Feb-88 11:07:56 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: Yes Program: MONITOR Routines-affected: PHYH2 Problem: Class A massbus errors cause ILLGO BUGHLTs. Diagnosis: If the RH20 has a stacked transfer pending and if the massbus device experiences a Class A Massbus error, the monitor does not clear the stacked transfer before starting a retry of the primary transfer. This leads to a hung transfer.
When the monitor clears the RH20 to start the transfer over it does not clear the secondary transfer registers, and at the end of the hung transfer code an extra interrupt is generated. This interrupt causes the monitor to believe that one of the transfers it just restarted has completed. The monitor then throws away the IORB for that first transfer and stacks yet another transfer (if one is available). The original first transfer then completes properly. Then the monitor checks out the RH20 logout area with the IORB it thinks belongs to the transfer (the second transfer started after the hung code tried to reset the RH20), and since things smell funny the monitor gives an ILLGO BUGHLT. A detailed investigation was performed by Nat Gillespie, System Support Group, UKCSC Basingstoke, UK. Solution: Change at RH2HNG from MOVEI T1,0 to MOVEI T1,1, in order to do a complete reset of the RH20 (including the secondary registers), which prevents the extra interrupt from being generated. This then prevents the ILLGO BUGHLT. [End of TCO 7.1213] TCO-number: 7.1214 Written-by: GSCOTT Creation-date: 4-Feb-88 14:03:30 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PHYM78 Problem: TM8FKRs aren't too informative. Diagnosis: Only the channel and unit numbers are printed. Solution: Print the useful parts of the additional information in the BUGCHK. [End of TCO 7.1214] TCO-number: 7.1215 Written-by: GSCOTT Creation-date: 4-Feb-88 16:50:24 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DOB Problem: DOB can trash multipack structures. Also it may not be able to read them properly (if super index block and/or index block for page 0 are not on the first unit of a multipack structure). Diagnosis: Code looks wrong and generally stinks.
Solution: Fix several places where multipack structures aren't treated right: in GETPGS don't forget to do TLZ DSKMSK; remove routine NEXTXB and put 3 instructions inline in GETPGS instead; fix strange AC usage in MAPXB; move disk address computation from CCWSET to UDBSET. Also comment out code that searches for a specific generation on disk, and add small routine that prints out the usual "? DOB Error:" string. Output message if we are aborting the dump. [End of TCO 7.1215] TCO-number: 7.1216 Written-by: GSCOTT Creation-date: 5-Feb-88 11:03:02 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: POSTLD Problem: Many BUGs are set not normally dumpable. It is a royal pain to figure out which ones are. Diagnosis: It would be logical to expect this type of information in BUGSTRINGS.TXT Solution: Add code to POSTLD in DOBUGS to output a "*" in column 1 of BUGSTRINGS.TXT if the BUG is normally not dumpable and a space in column 1 if the BUG is normally dumpable. [End of TCO 7.1216] TCO-number: 7.1218 Written-by: GSCOTT Creation-date: 9-Feb-88 10:48:37 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: ALL Problem: Copyrights are out of date. Diagnosis: It's up to me to fix them. Solution: Use EMACS/DIRED and update to corporate copyrights. [End of TCO 7.1218] TCO-number: 7.1219 Written-by: GSCOTT Creation-date: 9-Feb-88 14:19:51 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DTESRV Problem: Front end dumps end up in PS:<SYSTEM> rather than BS:<SYSTEM>. Diagnosis: DTESRV copies them to <SYSTEM>nDUMP11.BIN. It should use BS:. Solution: Make DTESRV know about even more BS than it knows now.
[End of TCO 7.1219] TCO-number: 7.1220 Written-by: RASPUZZI Creation-date: 11-Feb-88 15:24:46 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CFSUSR GLOBS STG Problem: CFSUSR allocates 128 slots of global jobs at system startup. It is not necessary to have so many unused slots in use on the local system. It also prevents one from adding a fifth or even sixth KL to the cluster. Diagnosis: Routine CFGTJB does this allocation at system startup. It is not necessary to grab so many slots. Solution: At system initialization, only have CFGTJB allocate 64 jobs for the local system. When all 64 are in use, then routine JBGET1 will call CFSGJB (an alternate entry point to CFGTJB) and this will attempt to get 32 more global job slots for the system. 64 seemed like a good number because this will allow one to hook up as many as 8 KL CPUs in the cluster and then all global job slots will be in use. [End of TCO 7.1220] TCO-number: 7.1221 Written-by: RASPUZZI Creation-date: 11-Feb-88 15:32:54 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PHYKNI KNILDR Problem: Every time a new edit of the NI microcode appears, you have to hack KNILDR after loading the CRAM addresses to make KNILDR fool the monitor into believing this is the right NI ucode. Diagnosis: This may come as a surprise, but no one ever changed KNILDR or the monitor to use the right bit mask for the major version of the NI ucode. Also, in PHYKNI, the monitor will not load and start the KLNI if the wrong "version" of the NI ucode is loaded. Solution: Make PHYKNI check the edit level of the NI ucode. If it is not up to 171 for 7.0, then BUGCHK but still start the KLNI. For 6.1, we will BUGCHK if the ucode is not at least 167. 
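As a rough illustration of the check described in this Solution, here is a minimal sketch in Python (not MACRO, and not the PHYKNI code). The names `MIN_NI_EDIT` and `check_ni_ucode` and the returned pair are hypothetical. Only the two thresholds and the "BUGCHK but still start the KLNI" behavior come from the TCO, which does not say whether 171 and 167 are octal or decimal, so the sketch treats them as plain integers.

```python
# Hypothetical model of the NI microcode edit-level check.
MIN_NI_EDIT = {"7.0": 171, "6.1": 167}  # minimum edit levels quoted in the TCO


def check_ni_ucode(release, edit_level):
    """Return (start_klni, bugchk) for the loaded NI microcode.

    The KLNI is always started; a BUGCHK is raised only when the
    microcode edit level is below the minimum for this release.
    """
    bugchk = edit_level < MIN_NI_EDIT[release]
    return True, bugchk
```

The point of the design is that an out-of-date microcode edit is reported but no longer prevents the KLNI from being loaded and started.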
[End of TCO 7.1221] TCO-number: 7.1222 Written-by: RASPUZZI Creation-date: 11-Feb-88 15:37:50 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: STG Problem: Can't build a DEBUG=1 monitor. Diagnosis: Not enough section 0/1 space. Solution: Decrease JSB free space by a few pages and drop a few SNOOP pages when building a DEBUG=1 monitor. [End of TCO 7.1222] TCO-number: 7.1223 Written-by: RASPUZZI Creation-date: 11-Feb-88 20:12:30 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: RSXSRV Problem: Can't compile RSXSRV. Diagnosis: When the copyright was updated, the exclamation point that was used to delimit a big comment field was deleted. This caused the next exclamation point seen to be used to delimit the comment field. Naturally, this now shifted the field and ate some code so there were various undefines and oddities. Solution: Put back the ! that was removed. [End of TCO 7.1223] TCO-number: 7.1225 Written-by: GSCOTT Creation-date: 16-Feb-88 13:45:04 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DISC Problem: Files can go from 34359738367(36) to 1073741823(36). Diagnosis: TCO 7.1059 incomplete - didn't change code at UPDLEN. Solution: Add code in DISC to do the same OFNLEN/-1 hack for UPDLEN that was implemented in TCO 7.1059. [End of TCO 7.1225] TCO-number: 7.1226 Written-by: RASPUZZI Creation-date: 16-Feb-88 14:22:55 Edited-by: RASPUZZI Edit-date: 16-Feb-88 14:23:52 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MEXEC BOOT Related-SPR: 21571 Problem: TOPS-20 does not ask "Why reload?" or "Run CHECKD?" when a new monitor is booted when RSX20F version 15-50 is on the front end. 
Diagnosis: RSX20F version 15-50 will now give the KL the date and time (if it has it set) when the KL asks for it. RSX20F never used to do this. When TOPS-20 gets the date and time from the front end, it assumes that it is being reloaded because of a BUGHLT. This may not be the case if a new monitor is being loaded. Solution: The solution of this problem is twofold. First, have BOOT tell TOPS-20 if it is being reloaded because of a BUGHLT. Second, have TOPS-20 decide what to do about the startup questions based on the information provided by BOOT. BOOT will put a positive number in location BOOTFL if the system is being manually restarted and it will put a negative value in BOOTFL if the system is being restarted because of a BUGHLT. For backwards compatibility, the monitor will do what it used to do if BOOTFL contains a 0. In this case, an older boot will be running. [End of TCO 7.1226] TCO-number: 7.1227 Written-by: WADDINGTON Creation-date: 16-Feb-88 14:35:09 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: LATSRV Problem: LATNSC information not always helpful. In particular, the Ethernet Address is sometimes trashed. Diagnosis: Edit 7369 added the Ethernet Address to the additional data in the LATNSC BUGINF. Unfortunately, we used the Ethernet Address from the Circuit Block. In some cases, a LATNSC can occur when there is no Circuit Block. Consequently, the Ethernet address is garbage in these cases. Solution: Get the Ethernet Address from the Receive Buffer. This should always be correct. In addition add a little more info to the description of the BUGINF.
[End of TCO 7.1227] TCO-number: 7.1230 Written-by: GSCOTT Creation-date: 18-Feb-88 15:13:28 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MEXEC JSYSM TTYSRV GLOBS Problem: CTY doesn't stay as an LA120 after system starts up - first <CRLF> on CTY turns it back into SYSTEM-DEFAULT. Diagnosis: Code in MEXEC and JSYSM and TTYSRV doesn't special case the CTY properly. Solution: Fix code to (1) make the CTY an LA120 in INICTY (TTYSRV), create new routine BLINKS in JSYSM that does the usual TLINK JSYS when we are done with a terminal to break links and advice, modify 4 places in MEXEC and JSYSM to call BLINKS, and have the BLINKS routine set the CTY to be an LA120 after doing the TLINK. In this way, whenever a job is not logged in on the CTY it will be an LA120. [End of TCO 7.1230] TCO-number: 7.1231 Written-by: RASPUZZI Creation-date: 18-Feb-88 15:19:32 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: EXECIN EXECSE GLOBS STG JSYSA JSYSF MONSYM SETSPD Problem: The monitor does not restrict the length of a password that a user can set on a directory. Diagnosis: No intelligent code to do it. Solution: Add code to CRDIR% to make sure that passwords pass the minimum length criterion. This is done before the password is encrypted so if CRDIR% is given an already encrypted password, it will not count its size. Add a new SMON% function to set the minimum password length and add a corresponding TMON% function to read this length. Add a new ^ESET MINIMUM-PASSWORD-LENGTH command to the EXEC so that this can be set easily. Also, make a similar ENABLE/DISABLE command in SETSPD so minimum lengths can be set at system startup.
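The length rule in this Solution can be sketched as follows. This is an illustrative Python model, not the CRDIR% code: the function name and its parameters are hypothetical, and the caller is assumed to know whether the password it was handed is already encrypted.

```python
# Hypothetical model of the CRDIR% minimum-length rule.
def password_acceptable(password, already_encrypted, min_length):
    """Return True if the password may be set on the directory.

    The check runs before encryption, so a password that arrives
    already encrypted is accepted without counting its size.
    """
    if already_encrypted:
        return True
    return len(password) >= min_length
```

In this sketch a minimum length of 0 accepts every password; whether the monitor treats 0 as "no restriction" is an assumption, not something the TCO states.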
[End of TCO 7.1231] TCO-number: 7.1237 Written-by: GSCOTT Creation-date: 22-Feb-88 10:56:49 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DOB Problem: Bugs set not normally dumpable aren't. Diagnosis: DB%NND check lost from DOB.MAC. Solution: Reinsert check for DB%NND in DOB.MAC, start checking REDITs a little more carefully. [End of TCO 7.1237] TCO-number: 7.1238 Written-by: GSCOTT Creation-date: 22-Feb-88 18:09:03 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MANY Problem: BUGs under DEBUG monitor dump when they shouldn't. Diagnosis: Many BUGs not looked at because the BUGSTRINGS used was from a DEBUG=0 monitor. Solution: Set a few more BUGs not normally dumpable. [End of TCO 7.1238] TCO-number: 7.1240 Written-by: MCCOLLUM Creation-date: 23-Feb-88 15:42:44 Edited-by: MCCOLLUM Edit-date: 23-Feb-88 15:44:18 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: LATSRV Problem: COMMMS BUGHLTs. Diagnosis: When LATSRV hands a buffer to PHYKNI to be transmitted, bit DLL.FL is turned on in the transmit buffer header. When the transmit completes, DLL.FL is turned off and the buffer is placed on the unacknowledged queue. If no acknowledgement is received from LATSRV before the circuit timer expires, routine XUNAKQ is called to retransmit the buffers. This routine neglects to turn on DLL.FL in the buffer header. If for any reason the circuit is stopped while the buffer is in PHYKNI, LATSRV will release the free space associated with all buffers on the unacknowledged queue that have this bit turned off. Since XUNAKQ never lights this bit, this is true for all buffers, even the ones currently in PHYKNI. When the retransmit subsequently completes, LATSRV attempts to release the free space again and a COMMMS BUGHLT results.
Solution: Turn on bit DLL.FL in XUNAKQ when the buffer is handed over to PHYKNI via routine DLLUNI. If DLLUNI fails to queue the buffer, turn DLL.FL off. If the circuit goes away while the buffer is in PHYKNI, the free space will be released when the transmit completes. [End of TCO 7.1240] TCO-number: 7.1241 Written-by: RASPUZZI Creation-date: 25-Feb-88 08:29:15 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: FILMSC Related-QAR: 105 Problem: ILMNRF BUGHLTs. Diagnosis: In FILMSC, we call routine TTYSCN and if this routine fails then we JRST to TTYCL1. TTYCL1 expects to have T1 set up with an index into the device tables and we have not done so. Solution: At TTYCL1, load T1 with the index into the device tables with the item saved in STKVAR variable location TTYCLX. [End of TCO 7.1241] TCO-number: 7.1243 Written-by: GSCOTT Creation-date: 26-Feb-88 10:17:55 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MNETDV Problem: We are all real sick and tired of waiting for HSTINI to carefully polish and store all of the hosts in a 572K character HOSTS.TXT file when we are debugging the monitor. Diagnosis: Cretinous code doesn't know that DBUGSW is set up for debugging. Solution: Make MNETDV know about DBUGSW and if it is greater than 1 try to load SYSTEM:HOSTS.DEBUG rather than SYSTEM:HOSTS.TXT. [End of TCO 7.1243] TCO-number: 7.1244 Written-by: GSCOTT Creation-date: 28-Feb-88 17:33:08 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DOB Problem: DOB has problems after TCO 7.1215 is installed. Diagnosis: This TCO busted GETPGS. This discovered a problem with IORBER. It causes page faults since it is damaged. Also, IORBs showing an error after DOB has continued the system cause KPALVH. 
Also, an error in the middle of the dump detected by PHYSIO doesn't abort the dump since the error bits are not checked in SAVMEM's loop. Solution: Fix GETPGS, IORBER. Use a 64 page chunk size to avoid overruns too. [End of TCO 7.1244] TCO-number: 7.1245 Written-by: GSCOTT Creation-date: 1-Mar-88 14:45:42 Edited-by: GSCOTT Edit-date: 1-Mar-88 15:02:27 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: T20-AN LN2070 Problem: Monitor builds too slow for development. Or maybe I'm just not patient enough in my old age. Diagnosis: T20-AN70 always builds AN-MONBIG, AN-MONMAX, and AN-MONDCN, however only AN-MONDCN is ever used for development. LN2070 always builds 2060-MONBIG and 2060-MONMAX, however only 2060-MONMAX is ever used for development. You can't just use labels ARPDCN or MONMAX since they don't compile the sources and append the REL files. Developers commonly copy the CTL files and remove the building of the monitors that are not used. It seems reasonable to standardize this. Solution: Add new tag MONDEV which compiles the sources, appends REL files, then builds just AN-MONDCN or 2060-MONMAX. [End of TCO 7.1245] TCO-number: 7.1246 Written-by: GSCOTT Creation-date: 1-Mar-88 15:04:21 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DSKALC Problem: The ASAASG, DEAUNA, and ASGBPG BUGs are set normally dumpable. Diagnosis: They shouldn't be. Solution: Make them not normally dumpable. [End of TCO 7.1246] TCO-number: 7.1247 Written-by: RASPUZZI Creation-date: 1-Mar-88 15:22:29 Edited-by: RASPUZZI Edit-date: 4-Mar-88 10:18:20 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUDGR Problem: XBLTAL BUGHLTs when using INFO% JSYS under mysterious circumstances. Diagnosis: This is a dandy. 
Basically, 2 forks on the remote system got the same unique CLUDGR ID (I guess they weren't unique then, were they?). This confused the remote system because it received 2 packets both indicating that they were 1 of 1. Nasty things then happen when the CLUDGR fork attempts to transfer the information in the working data page of the CLUDGR fork. Solution: Well, I wrote this code and I haven't got a clue as to how this could happen. The CLUID is obtained while a process is NOSKED and the obtainment of CLUID uses an AOS Q3,CLUID instruction which is not interruptible either. So, what do we do? DEBUG code has been added to CLUDGR to be defensive about a process using an old unique code. The only gotcha about this code is that a system may crash with a CLUFUD BUGHLT when the CLUID word wraps around 18 bits. [End of TCO 7.1247] TCO-number: 7.1251 Written-by: RASPUZZI Creation-date: 3-Mar-88 13:25:48 Edited-by: RASPUZZI Edit-date: 4-Mar-88 10:19:43 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: D36PAR Problem: NTMSQF BUGCHKs during system startup. Diagnosis: It appears that the signal queue is not large enough for a machine that is used as a designated router. Solution: Increase the potential size of the signal queue. [End of TCO 7.1251] TCO-number: 7.1252 Written-by: RASPUZZI Creation-date: 3-Mar-88 14:53:44 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: FREE Problem: RELRNG BUGCHKs. Diagnosis: RELRNGs are serious enough that they should be BUGHLTs but only for the duration of field test. They must be changed back to BUGCHKs when 7.0 is shipped to the SDC. Solution: Change RELRNG to BUGHLT for now.
[End of TCO 7.1252] TCO-number: 7.1253 Written-by: GSCOTT Creation-date: 6-Mar-88 19:36:03 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MEXEC GLOBS Problem: DDMPNRs during the time that SETSPD is copying a DOB dump. Diagnosis: SETSPD takes too long to copy large files, and since CHKR is blocked waiting for SETSPD to run, DDMPNRs are the result. Solution: Change MEXEC's routine DOBSSP to just start the DOB copy then return, storing the fork handle in DOBFRK. Have CHKR see if there is a DOBFRK and call a little routine (DOBKSP) to kill SETSPD if it has finished. Don't start a new SETSPD in DOBSSP unless DOBFRK is zero. [End of TCO 7.1253] TCO-number: 7.1254 Written-by: GSCOTT Creation-date: 6-Mar-88 23:00:45 Edited-by: GSCOTT Edit-date: 8-Mar-88 17:46:50 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: DOB STG Problem: (1) Monitor and/or RSX-20F dies in the middle of a DOB error. RSX20F hangs seen at the beginning or end of a DOB session. (2) Still problems with DOI. (3) Section 0/1 space wasted. (4) Dump may have incorrect information. Diagnosis: (1) PIs on at bad times (like when calling MRTOFF, WATEPT and UNWEPT). Printing out error messages when called back by PHYSIO is a bad idea. (2) Apparently PHYSIO gets gastric distress when fed a rich meal of a number of IORBs each transferring many pages. (3) We got a page then divided it up into NUMIOR size pieces; this is wasteful since the transfers can only be XFRPAG pages long. (4) IORB still busy writing data when we turn on timesharing, this causes memory to be modified before it can be dumped. Solution: (1) Routine SAVPI should return PIOFF, then we can call mysterious routines, then go PION. Shut off the PI system before calling UNWEPT and PFHRST, letting RESTPI restore and reenable the PI system. 
Don't try to print out the errors using DOB routines when being called back by PHYSIO. IORBDN will set IORBER to the address of the IORB that had an error and light DB%ERR in DOBSTS. IORBER is now called by SAVMEM when DB%ERR is set. (2) Use one IORB and a generous number of contiguous pages for dumping to avoid DOI and let PHYSIO recover from any overrun. (3) Get section 0/1 space for the CCWs using an RS macro based on the computed largest possible IORB (XFRPAG pages) or a CCW size of (XFRPAG/XFRSIZ)+3. The resident general pool can now be cut back to 1400 words, removing TCO 7.1177. (4) Wait for all I/O to complete before shutting off and resetting the PI system (and returning back to APRSRV). [End of TCO 7.1254] TCO-number: 7.1257 Written-by: RASPUZZI Creation-date: 17-Mar-88 19:08:45 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JSYSF Problem: Starting a monitor with the 143 dialog causes CFCLDPs. Diagnosis: Edit 7440 made to DIRINI assumed that CRDIR% was the only caller of DIRINI. As the 143 dialog demonstrates, this is not the case. DIRINI calls CRDSWH which attempts to store something in a JSBVAR location which does not exist because we did not come through here via CRDIR%. Solution: Make DIRINI have 2 entry points. One for CRDIR% and one for all other callers. Only call CRDSWH when DIRINI is entered via CRDIR%. [End of TCO 7.1257] TCO-number: 7.1258 Written-by: RASPUZZI Creation-date: 17-Mar-88 19:20:22 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: ENQ FREE Related-QAR: 88 Problem: FSPOUT while running a program that does like many ENQs and like many DEQs like a lot. Diagnosis: Routine FSPREM like only returns like the total amount of the free space like left in the ENQ pool. 
ENQ like checks this and fer sure the count is high enough but there may not be a block in the pool that is big enough to like satisfy ENQ's hungry request. Solution: Like add some code to that gnarly routine FSPREM and have it also return the largest block that is remaining in the pool. Then like teach ENQ that like this is the number that it really cares about. If the largest block is like too small, then have ENQ clean up those tubular cached lock blocks. [End of TCO 7.1258] TCO-number: 7.1259 Written-by: RASPUZZI Creation-date: 17-Mar-88 19:26:36 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUDGR Problem: LCLWAT scheduler test just doesn't work. Users hang in the INFO% JSYS because of it. Diagnosis: The scheduler test data is not put in the right place before MDISMSing to the scheduler test. Solution: Put scheduler test data in the left half of T1 and not the right half. This may be the last 7.0 TCO. On to autopatch! [End of TCO 7.1259] TCO-number: 7.1260 Written-by: GSCOTT Creation-date: 18-Mar-88 10:26:40 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JSYSM Problem: Jobs don't logout when logged out from another job. Diagnosis: When ELOGO (JSYSM) is called to do the logout it just sets the logout bit in the top fork of the target job. This bit is in FKINT. However if the job is stuck in TCOTST (because output buffers are full since the bozo hit BREAK from a LAT session or terminal is ^Sed) FKINT is not looked at by the scheduler since you never get out of TCOTST since you never output any characters to the terminal. Solution: Clear output buffers immediately after setting the logout bit in FKINT when logging out the target job. This causes the bit in FKINT to be noticed by the scheduler since you get out of TCOTST. 
Also there is a CFOBF in the logout code that leaves AC1 setup with 1000 from the DISMS in the code that waits for the "Killed by ..." message to complete. [End of TCO 7.1260] TCO-number: 7.1261 Written-by: GSCOTT Creation-date: 18-Mar-88 10:48:24 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MNETDV Problem: NOADDR takes a DOB. Diagnosis: NOADDR just means that the file SYSTEM:INTERNET.ADDRESS is missing or owie. We should not dump in this case. Solution: Set NOADDR normally not dumpable. [End of TCO 7.1261] TCO-number: 7.1262 Written-by: RASPUZZI Creation-date: 23-Mar-88 11:49:01 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUDGR Problem: XBLTAL BUGHLTs. Diagnosis: CL.ENT is doing a stupid thing. Instead of following the design spec and sending the local port's number over in the SCA buffers, it is sending the destination's port in the SCA buffers. This can wreak havoc on the remote system if 2 requests come from 2 different systems with the same CLUID. Solution: Slap up CL.ENT into using MYPOR4 when slam dunking the CI node number into SCA buffers when calling routine FILLIN. [End of TCO 7.1262] TCO-number: 7.1264 Written-by: RASPUZZI Creation-date: 29-Mar-88 14:42:08 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: IMPDV Problem: SKDPF1 and PITRAP BUGHLTs when booting an ARPAnet monitor. Diagnosis: Code in IMPDV is doing things in places that it shouldn't. These places can get swapped out and IMPDV runs at interrupt level. Solution: REPEAT 0 out the code that is not needed. Why is it not needed? Well, it appears that the Internet fork does the work for us. 
[End of TCO 7.1264] TCO-number: 7.1266 Written-by: RASPUZZI Creation-date: 5-Apr-88 15:10:30 Edited-by: RASPUZZI Edit-date: 5-Apr-88 15:11:16 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: MEXEC Related-QAR: 158 Problem: When someone hits the ENABLE/DISK buttons, the system does not ask "Why Reload?" or "Run CHECKD?" Diagnosis: Small oversight in TCO 7.1226. Solution: Check to see if the front end knew the time. If it didn't, then someone hit enable disk or loaded the system fresh. In this case, flag that the questions must be asked. [End of TCO 7.1266] TCO-number: 7.1267 Written-by: LOMARTIRE Creation-date: 7-Apr-88 10:57:43 Edit-checked: Yes Document: Yes TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: DSKALC Related-QAR: 149 Problem: There is no way for a user to encrypt the system structure during the 143 dialog. Diagnosis: The saga continues....we have finally decided to put this in (again). Solution: Add a question to the 143 dialog to allow the user to set the system structure either encrypted or unencrypted. This question, which will follow "Do you want the default size bootstrap area?", will look like: Do you want to enable password encryption for the system structure? The help text printed if the user hits ? is: [Type 'YES' to enable password encryption or 'NO' to disable password encryption] Only YES (or Y) or NO (or N) can be entered at this point. The Installation manual will have to change to reflect this new question. [End of TCO 7.1267] TCO-number: 7.1268 Written-by: RASPUZZI Creation-date: 7-Apr-88 15:11:30 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: Yes Program: MONITOR Routines-affected: PHYMSC Problem: When PHYMSC builds a UDB, it does not set the US.UNA (disk is unavailable) bit. This is bad as the system now assumes access to the disk when the UDB is built. 
Diagnosis: This may be causing bad side effects in the login structures code. Some development is planned for the login structures code to use the fact that US.UNA will BE set at UDB creation. This bit will be cleared when the HSC disk has been onlined (unless some bozo puts a 16 bit HDA disk out there in which case the bit will not get cleared). Solution: Add US.UNA to the foray of bits that are set in UDBSTS during UDB creation. [End of TCO 7.1268] TCO-number: 7.1270 Written-by: WADDINGTON Creation-date: 8-Apr-88 15:55:26 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: LATSRV Problem: LATOP% .LARHC interrupts don't. Diagnosis: We're grabbing the PSI channel from the wrong location in the argument block. Solution: Get PSI channel from correct location. Add a range check just to be on the safe side [End of TCO 7.1270] TCO-number: 7.1272 Written-by: GSCOTT Creation-date: 12-Apr-88 14:36:25 Edited-by: GSCOTT Edit-date: 12-Apr-88 14:48:07 Edit-checked: Yes Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JSYSM Problem: When edit 7456 was installed to correct a number of accounting problems, the path taken through ATACH with AT%TRM (aka "proxy attach") was not tested or considered since no DEC software makes use of AT%TRM. Edit 7456 fixed a number of accounting problems when session records were not written when session critical data was changed (particularly when the account string or terminal number was changed). Customers do use proxy attach and an SPR was the result. The ATACH JSYS code writes a session record for the target job (specified in AC 1) immediately before the job is detached in preparation for attachment to the target terminal (specified in AC 4). This is done to insure that a session record is written showing the time used in the session during which the ATACH JSYS was started. 
However, each time a proxy attach (AT%TRM) is done, a bad session record is written which includes a blank username string, a zero session start time, zero runtime, zero session elapsed time and other bad fields. Diagnosis: At ATACHB, it has been determined that the target job (as specified in AC1) is attached to a terminal and needs to be detached before it is attached to the target terminal number (the controlling terminal or the terminal specified in AC4 if AT%TRM is set). Before the job's terminal number is changed, a session record reflecting the time used on the target job's current terminal must be written. First, the target job's JSB is mapped by calling routine MAPJSB, then routine DETREC is called to write the session record. If a proxy attach and the caller's job number is specified as the target job, MAPJSB is called with our own job number. In this case, SETJSB (called by MAPJSB) maps nothing and returns zero as the JSB offset rather than mapping the target job's JSB into FPG1. DETREC depends on the target job's JSB mapping to FPG1, as noted by milestones in MAPJSB. DETREC assumes that the JSB is mapped to FPG1 since the offsets for the USAGE JSYS refer to FPG1A. Since a zero page is used to write the session record instead of a JSB, the USAGE JSYS in DETREC passes lots of zeroes to the accounting file (GIGO), resulting in a very strange session record. Solution: If DETREC is called with a JSB offset of zero then we are writing a session record for our own job, and routine DETSES should do the work instead of DETREC (insert "JUMPE T1,DETSES" at DETREC). 
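The dispatch described in this solution can be sketched as follows. This is an illustrative Python model only, not monitor code: the function names mirror the routines named above, but their bodies, arguments, return values, and the nonzero offset are assumptions made for the example.

```python
def setjsb(target_job, own_job):
    """Model of SETJSB: maps nothing and returns offset 0 for our own job."""
    if target_job == own_job:
        return 0
    return 0o1000  # hypothetical nonzero offset of a JSB mapped into FPG1


def detses():
    """Stand-in for DETSES: writes the record from our own (already mapped) JSB."""
    return "session record from own JSB"


def detrec(jsb_offset):
    """Stand-in for DETREC with the new check at its entry."""
    # Equivalent of the inserted "JUMPE T1,DETSES": a zero offset means
    # no other job's JSB was mapped, so let DETSES do the work.
    if jsb_offset == 0:
        return detses()
    return "session record from JSB mapped at FPG1"


own_job = 5
proxy_self = detrec(setjsb(5, own_job))   # proxy attach naming our own job
other_job = detrec(setjsb(7, own_job))    # attach naming a different job
```

Without the zero-offset check, the first call would fall through and build the record from an unmapped (zero) page, which is the bad session record described in the problem statement.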
[End of TCO 7.1272] TCO-number: 7.1273 Written-by: RASPUZZI Creation-date: 12-Apr-88 14:43:59 Edited-by: RASPUZZI Edit-date: 12-Apr-88 14:45:52 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: CLUDGR Problem: When a system in the cluster crashes, all nodes get INFO% errors even if they are waiting for information from a different system than the crashing one. Diagnosis: CLWAKE is obviously braindead. It went ahead and woke up all forks waiting for cluster information instead of the ones waiting on the crashing system only. Solution: Make CLWAKE check to see if the node of the request is the same as the node that is crashing. If so, then wake up the corresponding fork. If not, let the request remain for an answer. [End of TCO 7.1273] TCO-number: 7.1274 Written-by: RASPUZZI Creation-date: 13-Apr-88 20:49:14 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: IMPDV Problem: TCO 7.1264 was installed without thinking (that happens). Diagnosis: SKDPF1s and PITRAPs only occur when HSTSTS is bigger than 400000 and is not the case in the clock tape monitor. Therefore, 7.1264 should not be in the clock tape monitor. The code that was removed, does, in fact, appear to be OK. Solution: Restore the REPEAT 0ed code that 7.1264 took out. [End of TCO 7.1274] TCO-number: 7.1275 Written-by: GSCOTT Creation-date: 14-Apr-88 14:33:11 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PHYM78 Problem: TM8FKRs when they aren't deserved. Diagnosis: At TM8ZAP we do a TM CLEAR to the TM78 and then we enter a loop waiting for TM READY to come up. This loop consists of a massbus register read and a test for TM READY. When this loop doesn't get TM READY in 10000 (octal) tries, a TM8FKR is printed and TM8ZAP returns. 
Callers of TM8ZAP don't really care if TM READY comes up (at startup time we will fail to see some drives; after a TU FAULT, TM READY is checked before restarting the I/O). Solution: Investigation shows that this loop counter may get as high as 30000 (octal) before TM READY comes up. Further investigation shows that TOPS-blue uses 40000 as a loop counter when it wants to wait for TM READY after TM CLEAR comes up. So, the best thing to do is to change the loop counter to 40000, and if anyone gets a TM8FKR it will hopefully be from a broken TM78. [End of TCO 7.1275] TCO-number: 7.1276 Written-by: RASPUZZI Creation-date: 20-Apr-88 10:22:45 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: Yes Program: MONITOR Routines-affected: PHYMSC PHYPAR DSKALC Related-QAR: 170 Problem: Sometimes when a system boots in a clustered environment, it does not find the login structure. Diagnosis: It appears that MSCP activity has not settled enough in the dust to let FNDLGS do its thing. Solution: Wait for 10 seconds at the top of FNDLGS first. Then introduce a new bit (U1.NOL) that appears in the second status word of a unit's UDB. This bit will be set upon creation of the UDB for a disk unit and cleared when a unit is onlined. Have CHKUDB wait for this bit to be cleared but only for MSCP disks. [End of TCO 7.1276] TCO-number: 7.1278 Written-by: RASPUZZI Creation-date: 20-Apr-88 11:14:24 Edit-checked: No Document: Yes TCO-tested: No Maintenance-release: No Hardware-related: Yes Program: MONITOR Routines-affected: PHYSIO FREE Problem: Lots of ONSTR/OFFSTRs and RELRNG BUGHLTs. Diagnosis: ONSTRs and OFFSTRs are informational and RELRNGs are not serious enough to be BUGHLTs (except in field test). Solution: For the official release, ONSTR/OFFSTRs will be under CIBUGX and RELRNGs will be BUGCHKs. DOB% can be used to take a RELRNG dump. 
[End of TCO 7.1278] TCO-number: 7.1279 Written-by: GSCOTT Creation-date: 21-Apr-88 16:05:05 Edited-by: GSCOTT Edit-date: 21-Apr-88 16:10:49 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JSYSM Problem: MOUNTR gives "accounting record not written: No such job" when a job logs out with a tape or disk mounted. The system must be under load to see this problem. Diagnosis: When a user logs out, the monitor sends an IPCF message to MOUNTR. When MOUNTR gets this IPCF message, it issues USAGE records for the regulated structures or magtapes that were mounted by the job. If the job finishes logging out before MOUNTR does the USAGE JSYS, the USAGE fails with a "No such job" error. The call to GL2LCL in UFNINI returns this error. Solution: The reason that GL2LCL is called is to validate the job and get the local job number to put into the block that is queued to job 0 to update the checkpoint file. The checkpoint file will not be updated when the USAGE record queued is not a session type record (e.g. the USAGE function is .USENT). There is no need to call GL2LCL when the checkpoint file is not being updated. The cure is to check the entry type (as specified in AC1 of the USAGE JSYS call) and not call GL2LCL if the function is .USENT. During the investigation into this problem it was discovered that there are a number of places in the accounting code where "PS:[ACCOUNTS]" is referred to. These will be changed to "ACCOUNT:" (as changed by the Login Structures Project). [End of TCO 7.1279] TCO-number: 7.1280 Written-by: GSCOTT Creation-date: 25-Apr-88 13:20:10 Edited-by: GSCOTT Edit-date: 25-Apr-88 13:36:18 Edit-checked: Yes Document: No TCO-tested: Yes Maintenance-release: Yes Hardware-related: No Program: MONITOR Routines-affected: DTESRV Problem: Edit 7449 (to 6.1) attempted to prevent SKDCL1s at power fail restart time by replacing the routine DTBELL in DTESRV. 
After edit 7449 is installed, DN60s only load every other try. Diagnosis: Edit 7449 replaced DTBELL which had a scheduler test in it for the usual DBTMR scheduler test. Since at power fail restart time we are at scheduler level it is impolite to try a scheduler test. At system boot time and other times that DTBELL is called we are not at scheduler level. The replacement code doesn't make use of a scheduler test (hence no pesky SKDCL1s), but it doesn't allow the DN60 to be reliably loaded. We won't mention names here, but I guess you can infer that some engineer didn't test edit 7449 thoroughly ("No more SKDCL1s? Ship it!"). Since the only time that DTBELL is called at scheduler level is when we are in a power fail restart, and since most power fail restarts fail on non-core memory systems (and for other reasons not related to this problem), it seems reasonable to put the old code back for release 7. A better fix will have to go out on a future Autopatch tape. Solution: Remove edit 7449, and reinsert the old code. [End of TCO 7.1280] TCO-number: 7.1281 Written-by: WADDINGTON Creation-date: 3-May-88 14:51:29 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: LATSRV Problem: Bad multicast messages being generated by TOPS-10/20. Diagnosis: We don't interlock the Multicast Buffer, so it can get changed while it is still in the DLL, thereby corrupting the buffer. Solution: Interlock the Multicast buffer by setting bit DLL.FL in the UID Field. Clean up LAINTX's handling of multicast buffers. Remove a (now) redundant test from XMTDON. Test for the DLL.FL bit at the beginning of LATXMC. [End of TCO 7.1281] TCO-number: 7.1282 Written-by: RASPUZZI Creation-date: 3-May-88 16:37:48 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: PAGEM Related-SPR: 21881 Problem: NSKDIS BUGHLTs. 
Diagnosis: Routine SECMAP goes NOSKED and then calls LCKOFN. LCKOFN is assumed to be called OKSKED because it may wind up waiting for an OFN which is locked to be freed. It does this by calling WTOFNS. SECMAP is violating this rule by calling LCKOFN NOSKED. Solution: Instead of calling LCKOFN, have SECMAP simulate the code in line as per RELP4 has done. This will ensure that WTOFNS will be called in the correct state. [End of TCO 7.1282] TCO-number: 7.1283 Written-by: RASPUZZI Creation-date: 5-May-88 15:09:34 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: IPCF IPIPIP Problem: None observed but there are code changes to IPCF.MAC and IPIPIP.MAC that users of the GTDOM% JSYS would like to have. Mainly, it is code that makes page mode transfers work in monitor context. Diagnosis: As above. Solution: Add code to IPCF.MAC and make a routine global in IPIPIP.MAC. [End of TCO 7.1283] TCO-number: 7.1285 Written-by: LOMARTIRE Creation-date: 6-May-88 06:52:57 Edit-checked: Yes Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: PAGUTL Related-QAR: 9 Problem: OFJFBD BUGHLTs. Diagnosis: Routine CHKLAC is used to insure that a long file is being opened consistently with regard to its former short file access. If page table zero is open on the system, then CHKACC is called to do the validity check. If PT0 had previously only been opened unrestricted, and the new opening is "real", then the PT0 access flags in SPTH must be updated to reflect the PTT access. However, the wrong instruction was used to do this and the OFN2XB bit for PT0 was getting cleared. This was being detected later and the OFJFBD resulted. Solution: Change the XORM to an IORM to set the bits in SPTH of PT0. 
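The difference between the two instructions can be seen in a small sketch. Python is used purely for illustration; the bit values below are hypothetical stand-ins and do not reflect the real SPTH layout, and the assumption that the mask overlapped a bit already set in the word is one plausible way the clearing could occur, not something this TCO states.

```python
# Hypothetical bit assignments, for illustration only.
OFN2XB = 0o400000     # stand-in for the OFN2XB bit
OLD_ACCESS = 0o000001 # stand-in for an access flag already present
NEW_ACCESS = 0o000002 # stand-in for the access flag to be added


def xorm(word, mask):
    """Model of XORM: exclusive-OR the mask into memory (toggles bits)."""
    return word ^ mask


def iorm(word, mask):
    """Model of IORM: inclusive-OR the mask into memory (only sets bits)."""
    return word | mask


spth_pt0 = OFN2XB | OLD_ACCESS   # word as it stands before the update
mask = OFN2XB | NEW_ACCESS       # assumed: mask carries a bit that is already set

after_xor = xorm(spth_pt0, mask) # OFN2XB is toggled off
after_ior = iorm(spth_pt0, mask) # OFN2XB is preserved, new flag is set
```

An exclusive OR is only safe for setting bits when every bit in the mask is known to be clear in the target word; an inclusive OR has no such precondition.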
[End of TCO 7.1285] TCO-number: 7.1286 Written-by: LOMARTIRE Creation-date: 9-May-88 08:40:51 Edited-by: LOMARTIRE Edit-date: 9-May-88 10:28:47 Edit-checked: Yes Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: ENQ Related-TCO: 7.1072 Related-QAR: 200 Problem: Jobs on different systems can end up waiting forever in ENQTST for the same lock. Also, a job on one system is not always correctly notified of the release of a lock from another system. Diagnosis: Routine QSKDRC is an alternate entry point to QSKD. It is called by Vote Responder routine EVQSKD when an incoming Q-Block scheduling query is received. The existing logic in QSKD does not take into account that a Lock-Block can now have its first (and possibly only) Q-Block be unlocked. The existing code incorrectly assumes that the first block is locked and this causes a "No" reply to the incoming vote request. TCO 7.1179 attempted to cut down on needless broadcasts for non-cluster-wide locks. Unfortunately, when the last Q-Block is DEQed, the EN.CLL bit is cleared in the Lock-Block (by routine QDLBFS) and then LOKSKD is called to cause other nodes to be notified. But, since EN.CLL is cleared, no notification will be sent. Solution: First, make QSKD smarter. If QSKDRC is called, check EN.LOK in the Q-Block during the scanning process. If it is not set, then ignore that Q-Block in the verification process. Second, rearrange the code at QDEQ0 so that LOKSKD is called before QDLBFS. In this way, the state of the Lock-Block will accurately reflect the last locker (before being reset by QDLBFS). [End of TCO 7.1286] TCO-number: 7.1287 Written-by: RASPUZZI Creation-date: 13-May-88 16:32:43 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: STG Related-QAR: 211 Problem: SKDPF1 BUGHLTs when a reverse LAT connection is completed. 
Diagnosis: LATSRV attempts to write into the DEVCHR table to assign the TTY device to the job doing the host initiated connect. However, DEVCHR is in the swappable monitor and may be swapped out at the time the scheduler decides to do this. Solution: Make DEVCHR resident. [End of TCO 7.1287] TCO-number: 7.1288 Written-by: RASPUZZI Creation-date: 17-May-88 14:22:44 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: Yes Program: MONITOR Routines-affected: PHYKLP Problem: KLPNOM BUGHLTs and no useful code in PHYKLP being used. Diagnosis: Routine SANCHK is called when someone gives back a buffer to the port. Unfortunately, this routine has been RETed because it caused CI problems at one time. Since this code exists under the KLPDBG conditional, it is beneficial to have it usable. No one runs a DEBUG monitor unless they are having serious troubles and need traces (like we do for our KLPNOMs). Solution: Remove the RET in SANCHK and have it perform the sanity checks it was predestined to do. [End of TCO 7.1288] TCO-number: 7.1289 Written-by: WADDINGTON Creation-date: 19-May-88 14:58:56 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: LATSRV Problem: COMMMS Bughalts (again...) Diagnosis: TCO 7.1281 moved the testing/clearing of DLL.FL and SAV.FL from scheduler level to interrupt level, thereby opening a large window where transmit buffers could be released twice, which of course results in a COMMMS. Solution: Move the DLL.FL/SAV.FL code back to XMTDON where it belongs. 
[End of TCO 7.1289] TCO-number: 7.1290 Written-by: GSCOTT Creation-date: 19-May-88 16:08:05 Edited-by: GSCOTT Edit-date: 19-May-88 16:14:36 Edit-checked: No Document: No TCO-tested: No Maintenance-release: No Hardware-related: No Program: MONITOR Routines-affected: JSYSF Problem: When a structure is created and minimum password length is more than 6, then the new structure's <OPERATOR> directory is full of zeroes (which is not a legal format for a directory). Obvious result is DIRPG0 and inability to load files into <OPERATOR>. Diagnosis: It would appear that the well intentioned minimum password length project has reared its ugly head again. The check for minimum password length should not be enforced if the CRDIR is done from monitor context (FILINI building initial file structure directories in FILCRD). Solution: Check for previous context of monitor in CRDI2A plus some and don't do the minimum password length check if CRDIR called from monitor. God, I'm really glad we caught this one 2 days before the clock tape freeze! [End of TCO 7.1290] TCO-number: 7.1292 Written-by: LOMARTIRE Creation-date: 24-May-88 16:18:36 Edited-by: LOMARTIRE Edit-date: 25-May-88 16:00:09 Edit-checked: Yes Document: No TCO-tested: Yes Maintenance-release: No Hardware-related: No Program: Monitor Routines-affected: ENQSRV Related-TCO: 7.1072 Related-QAR: 210 Problem: Fork hung in EVWAIT waiting for a cluster-wide ENQ vote reply to be returned. Diagnosis: It is possible to have the ENQ Answer Fork running on the same system which is trying to issue a vote request. This is a very basic violation of the rule that says: "If thy ENQ Answer Fork is running, then thy ENQ Database Lock Token must be heldth on another node (thy one which issueith thy vote)." This violation will allow multiple people to be fooling around with the VRQA. This will make the voting results indeterminate and mess up the count of outstanding replies (VOTVCT). Solution: Do not exit EVWAIT if a "No" reply is received. 
This will insure that the Answer fork runs on all systems while the voting system has the Database Lock Token. Also, do not exit ASK4IT if a "No" reply is received during the voting loop. Instead, jump to ASKCHK to wait for replies from any votes sent thus far. In ASKCHK, don't return success if a "No" reply was received. [End of TCO 7.1292]
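
The control flow in the solution to TCO 7.1292 can be sketched roughly as below. This is a simplified Python model, not the monitor's code: the function name, the callback, and the rule that a reply is handled one send behind are all assumptions chosen to keep the example small. It shows only the two points the TCO makes: a "No" stops further voting but still waits for every outstanding reply, and the final check reports failure if any "No" was seen.

```python
def ask_for_it(nodes, reply_for):
    """Send a vote to each node; return (success, replies_still_outstanding).

    reply_for(node) is a hypothetical callback returning "yes" or "no".
    """
    pending = []      # votes sent whose replies have not been handled yet
    saw_no = False

    # Voting loop (the role ASK4IT plays in the TCO text).
    for node in nodes:
        pending.append(node)                  # vote request sent
        if len(pending) > 1:                  # an earlier reply has arrived
            if reply_for(pending.pop(0)) == "no":
                saw_no = True
                break                         # stop voting, but do not exit yet

    # Reply check (the role ASKCHK plays): drain every outstanding reply.
    while pending:
        if reply_for(pending.pop(0)) == "no":
            saw_no = True

    # Never report success if any "No" was received.
    return (not saw_no, len(pending))


replies = {"A": "yes", "B": "no", "C": "yes"}
result_with_no = ask_for_it(["A", "B", "C"], replies.get)
result_all_yes = ask_for_it(["A", "C"], replies.get)
```

In both cases the count of outstanding replies returns to zero before the routine finishes, which is the property the original early exit violated.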