EXTENDED ADDRESSING ON THE DECSYSTEM-20 Programming Implications and Methodology Dan Murphy The basic architecture of the DECsystem-10 family of processors up through the KL10 provided an 18-bit, 256K-word addressing space. This means in particular that: 1. The Program Counter register is 18 bits; 2. Each and every instruction executed computes an 18-bit effective address. The contents of this address may or may not be referenced depending on the actual instruction, but the algorithm for calculating it (including indexing and indirecting rules) is the same for all instructions. The above is true regardless of the size of the physical core memory available on any particular configuration. Note also that paging (or relocation on earlier processors) will determine if a particular virtual address corresponds to an existing physical word in memory, but the fundamental size of the virtual address space is a constant. During the design of the KL10 processor, the need for a larger virtual address space was recognized. A design was developed which provides an extended address space to new programs while still allowing existing unmodified programs to execute correctly on the same processor. Although most of the essential data paths were provided in the original KL10 implementation, various design changes caused actual availability of extended addressing operation as described here to be deferred to the "model B" KL10 CPU. Page 2 VIRTUAL ADDRESS SPACE Under the extended addressing design, the virtual address space of the machine is now 30 bits, or 1,073,741,824 words. Although one can think of this as a single homogeneous space, it is generally more useful to consider an address as consisting of two components, the section number, and the word number. 0 5 6 17 18 35 - - - - -------------------------------------- ! ! ! ! ! ! section ! word ! ! ! 12 bits ! 18 bits ! - - - - -------------------------------------- The word (more precisely word-within-section) field consists of 18 bits and thus represents a 256K-word address space similar to the single address space on earlier machines. The section number field is 12 bits and thus provides 4096 separate sections, each of 256K words. Each section is further divided into pages of 512 words each just as on earlier machines. The paging facilities allow the monitor to independently establish the existence and protection of each section as a unit. In order to implement this extended 30-bit address space, the PC must be extended to hold a full address. The PC is always considered to consist of a section field and a word field, and the incrementing of the PC never allows a carry from the word field into the section field. That is, if a program allows flow of control to reach the last word of a section and the instruction in that location is not a jump, then the PC will "wrap-around", and the next instruction fetched will be word 0 of the same section. This will in fact be AC 0 as described below. The consequence of this is that flow of control of a program cannot inmplicitly cross section boundaries. In general, it would be a programming error to allow the PC to reach the last location of a section and execute a non-jump instruction or to execute a skipping instruction in either of the last two locations of a section. Page 3 COMPATIBILITY In order to allow efficient use of the extended address space, it was necessary to modify the operation of various machine instructions and to change the algorithm for the calculation of effective addresses. Because these changes have a high probability of causing any existing program to malfunction, the following convention was adopted: If a program is executing in section 0, all instructions are executed exactly as they would be on a non-extended machine. If a program is executing in any section other than 0, the extended addressing algorithms are used for effective address calculation and instruction execution. A program is said to be executing in section 0 when the section field of the PC contains 0. Effective address calculations and instruction executions are performed exactly as on a non-extended machine; hence any existing program will work correctly if run in section 0. Note however, that this also implies that no addresses outside of section 0 can be generated, either for data references or for jumps. That is, a program executing in section 0 cannot leave section 0 except via a monitor call.* It is easy however to write or generate code which is compatible with both extended and non-extended execution. This was a specific goal of the design, and in general requires only that certain precautions must be taken regarding previously unused fields. Since these precautions must be taken however, one cannot generally assume that any existing program will have observed them and so execute correctly in an extended section. - - - - - - - - - - - - - - - - - *The only exception to this is the XJRSTF instruction. Page 4 EFFECTIVE ADDRESS COMPUTATION Unless explicitly stated otherwise, everything in the following discussion refers to execution of instructions with the PC in a non-0 section; nothing applies to instructions executed from section 0. 1. Instruction format The format of a machine instruction is the same as on a non-extended machine. In particular, the effective address computation is dependent on three quantities from the instruction, the Y (address) field, the X (index) field, and the I (indirect) field. These are 18 bits, 4 bits, and 1 bit respectively. 0 8 9 12 13 14 27 18 35 --------------------------------------------------- ! ! ! ! ! ! ! OPCODE ! AC ! I! X ! Y ! ! ! ! ! ! ! --------------------------------------------------- Depending on the format of the index and indirect words, the effective address algorithm will perform either 18-bit or 30-bit address computations. When a 30-bit quantity is indicated, an explicit section number is being specified and the address is called a global address. When an 18-bit quantity is indicated, the section field is defaulted from some other quantity (e.g., the PC), and the address is thus local to the default section and is called a local address. In the simplest case, consider an instruction which specifies no indexing or indirection. E.g., 3,,400/ MOVE T,1000 Here the effective address computation yields a local address 1000, and the section used for the reference is 3, i.e., the section from which the instruction itself was fetched. Precisely stated: Whenever an instruction or indirect word specifies a local address, the default section is the section from which the word containing that address was taken. This means that the default section will change during the course of an effective address calculation which uses indirection. The default section will always be the section from which the last indirect word was fetched. Page 5 2. Indexing The first step in the effective address calculation is to perform indexing if specified by the instruction. The calculation performed depends on the contents of the specified index register: 1. If the left half of the contents of X is negative or 0, the right half of X is added to Y (from the instruction word) to yield an 18-bit local address. This is consistent with indexing on a non-extended machine, and means, for example, that the usual AOBJN and stack pointer formats can be used for tables and stacks which are in the same section as the program. 2. If the left half of the contents of X is positive and non-0, the entire contents of X are added to Y (sign extended) to yield a 30-bit global address. This means that index registers may be used to hold complete addresses which are referenced via indexed instructions. A Y field of 0 will commonly be used to reference exactly the address contained in X. Small positive or negative offsets (magnitude less than 2**17) may also be specified by the Y field, e.g., for referencing data structure items in other sections. 3. Indirection If indirection is specified by the instruction, an indirect word is fetched from the address determined by Y and indexing (if any). The indirect word is considered to be "instruction format" if bit 0 is a 1, and "extended format" if bit 0 is a 0. 1. Instruction format indirect word (IFIW): This word contains Y, X, and I fields of the same size and in the same position as instructions, i.e., in bits 13-35. Bit 1 must be 0 (its use is reserved for future hardware); bits 2-12 are not used. The effective address computation continues with the quantities in this word just as for the original instruction. That is, indexing may be specified and may be local or global depending on the left half of the index, and further indirection may be specified. Note that the default section for any local addresses produced from this indirect word will be the section from which the word itself was fetched. 2. Extended Format Indirect Word (EFIW): This word contains Y, X, and I fields also, but in a different format such as to allow a full 30-bit address field. Page 6 0 1 2 5 6 17 18 35 ----------------------------------------------------- ! ! ! ! ! ! ! ! ! ! <-----------! -----------------> ! !0 ! I! X ! (section) ! Y (word) ! ! ! ! ! ! ! ----------------------------------------------------- If indexing is specified in this indirect word, the entire contents of X are added to the 30-bit Y to produce a global address. A local address is never produced by this operation, and the type of operation is not dependent on the contents of X. Hence, either Y or C(X) may be used as an address or an offset within the extended address space just as is done in the 18-bit address space. If further indirection is specified, the next indirect word is fetched from Y as modified by indexing if any. The next indirect word may be instruction format or extended format, and its interpretation does not depend on the format of the previous indirect word. 4. Some examples 1. Simple instruction reference within the current PC section: 3,,400/ MOVE T,1000 ;fetches from 3001000 JRST 2000 ;jumps to 3002000 2. Local tables may be scanned with standard AOBJN loops: MOVSI X,-SIZ LP: CAMN T,TABL(X) ;TABL in current section JRST FOUND AOBJN X,LP Page 7 3. Global tables may be scanned with full address and index: MOVEI X,0 LP: CAMN T,@[EFIW TABL,X] ;TABL(X) in ext fmt JRST FOUND CAIGE X,SIZ-1 AOJA X,LP 4. Subroutine argument pointer may be passed to subroutine in another section: word in arglist: IFIW @VAR(X) ;indexing and indirecting ;if specified will be relative ;to the section in which this ;pointer resides, i.e., the ;section of the caller IMMEDIATE INSTRUCTIONS All effective address computations yield a 30-bit address defaulting the section if necessary, as described above. Immediate instructions use only the low-order 18-bits of this as their operand however, setting the high-order 18 bits to 0. Hence, instructions such as MOVEI, CAI, etc. produce identical results regardless of the section in which they are executed. Two immediate instructions are implemented which do retain the section field of their effective addresses. Page 8 1. XMOVEI (opcode 415) Extended Move Immediate: This instruction loads the entire 30-bit effective address into the designated AC, setting bits 0-5 to 0. If no indexing or indirection is specified, the current PC section will appear in the section field of the result. This instruction would replace MOVEI in those cases where an address (rather than a small constant) is being loaded, and the full address is needed. Example: calling a subroutine in another section (assuming arglist in same section as caller): XMOVEI AP,ARGLIST PUSHJ P,@[EFIW SUBR] The subroutine could reference arguments as: MOVE T,@1(AP) or could construct argument addresses by: XMOVEI T,@2(AP) In both cases, the arglist pointer would be found in the caller's section because of the global address in AP. The actual section of the effective address is determined by the caller, and is implicitly the same as the caller if an IFIW is used as the arglist pointer, or is explicitly given if an EFIW is used. 2. XHLLI (opcode 501) This instruction replaces the left half of the designated accumulator with the section number of its effective address. It is convenient where global addresses must be constructed. Page 9 AC REFERENCES Any reference to a local address in the range 0-17(8) will be made to the hardware ACs. Also, any gobal reference to an address in section 1 in the range 0-17(8) (i.e., 1000000-1000017) will be made to the hardware ACs. Global references to locations 0-17(8) in any section other than section 1 will reference memory. Thus: 1. Simple addresses in the usual AC range will be reference ACs as expected, e.g., MOVE 2,3 will fetch from hardware AC 3 regardless ot the current section. 2. To pass a global pointer to an AC, a section number of 1 must be included. 3. Very large arrays may lie across section boundaries; they will be referenced with global addresses which will always go to memory, never to the hardware ACs. 4. PC references are always considered local references; hence a jump instruction which yields an effective address of 0-17 in any section will cause code to be executed from the ACs. SPECIAL CASE INSTRUCTIONS In addition to the differences in effective address calculation, certain instructions are affected in other ways by extended addressing. 1. Instructions which store/load the PC; PUSHJ, POPJ, JSR, JSP. These instructions store a 30-bit PC without flags and with bits 0-5 of the destination word set to 0. POPJ restores the entire 30-bit PC from the stack word. JSA and JRA are not affected by extended addressing and store/load only 18 bits of PC. Hence they are not useful for inter-section calls. 2. Stack instructions (PUSHJ, POPJ, PUSH, POP, ADJSP) use a local or global address for the stack according to the contents of the stack register following the same convention as for indexing. That is, if the left half of the stack pointer is 0 or negative (prior to incrementing or decrementing), a local address using the right half of the stack pointer is computed and used for the stack reference. If the left half of the stack pointer is positive and non-0, the entire word is taken as a global address. In the latter case, incrementing and decrementing of the stack pointer is done by adding of subtracting 1, not 1000001. Hence, a global stack for routines in many sections may be used in a similar manner to present stacks. Stack overflow and underfow protection would be done by making the pages before and after stack inaccessible since a space count field is not present in a gobal stack pointer. Page 10 3. Byte instructions. Two formats of byte pointer are recognized by the byte instructions. The non-extended format is identical to the present standard byte pointer format and is recognized if bit 12 (previously unused) is 0. If bit 12 is 1, a two-word extended byte pointer format is recognized which contains the fields as shown: 0 5 6 11 12 13 17 18 35 ------------------------------------------------------------- ! ! ! ! ! ! ! P ! S ! 1 ! MBZ ! AVAIL TO SFTW. ! ! ! ! ! ! ! ------------------------------------------------------------- ! ! ! ! ! ! ! ! I!X! <------------------- ! Y ---------------> ! ! ! ! ! ! ! ------------------------------------------------------------- 0 1 2 5 35 Note that the second word is identical to the Extended Format Indirect Word (EFIW). The right half of the first word is specifically reserved to software for byte counts, etc. Incrementing of this format of byte pointer is consistent with non-extended format; the P field is reduced until the end of the word is reached, whereupon the address in the second word in incremented. Incrementing may cause a carry from the word field to the section field of the address; hence extended byte arrays may lie across section boundaries. 4. Other new or modified instructions are LUUOs, BLT, XBLT, XCT, XJRSTF, XJEN, XPCW, XSFM. Some of these are valid only in exec mode. Consult the System Reference Manual or chapter 2.2 of the KL10 Functional Specifications for details. COMPATIBLE PROGRAMMING It is possible to generate code which works correctly in both extended and non-extended environments. Such code may even utilize inter-section references when running in an extended environment; it is not limited to local addressing. The basic rule is to observe the extended addressing rules in the construction or computation of: 1. Index words: be sure the left half is cleared or contains a negative quantity (e.g., a count) when setting up and using an index register. This will cause a local reference in the extended environment which will produce the same result as on the non-extended machine. 2. Indirect words: always set bit 0 of indirect words used in argument lists, etc. so as to produce local addresses. Page 11 3. Stack pointers: the most common present format (negative count in left half) works consistently under extended addressing; modifying a program to use an extended stack should require no change to stack instructions except if the code expects to find the processor flags in the stacked PC words. When generating addresses to be passed to subroutines or used by other code, XMOVEI and XHLLI instructions may be used. In the non-extended environment, these opcodes are SETMI (equivalent to MOVEI) and HLLI respectively, which always load 0 into the left half. PROGRAM ARCHITECTURE AND FACILITIES Ultimately, the extended addressing hardware in conjunction with the monitor should provide some of the power of general segmentation and some other useful conventions and facilities. The following are some of the ideas which have been advanced. 1. The fact that instructions are generally executed relative to the current sections suggests that subroutine libraries and facilities packages can be written which can be dynamically loaded into any available section and used by many programs. An important point to observe in connection with this and most of the following conventions is that programs must not be compiled with fixed section numbers built into the code. Programs should be built so as to be loadable into any section and to request additional section allocations from the monitor as necessary. This is the only reasonable way to ensure that conflicting use of particular sections is avoided. 2. An entire section can be mapped by the monitor as quickly as a single page. Hence, an entire file (up to 256K) can be mapped into a section, and the data therein manipulated with ordinary instructions. 3. Programs can greatly reduce memory management logic by merely assuming very large sizes for all data bases. It is not however, very efficient to use many sections, each with only a small amount of data. A reasonable middle ground should be chosen.