# **CRAY® COMPUTER SYSTEMS** CRAY-2 COMPUTER SYSTEM FUNCTIONAL DESCRIPTION HR-2000

Copyright © 1985, 1986, 1987 by CRAY RESEARCH, INC. This manual or parts thereof may not be reproduced in any form without permission of CRAY RESEARCH, INC.

Each time this manual is revised and reprinted, all changes issued against the previous version are incorporated into the new version and the new version is assigned an alphabetic level. Every page changed by a reprint with revision has the revision level in the lower righthand corner. Changes to part of a page are noted by a change bar in the margin directly opposite the change. A change bar in the margin opposite the page number indicates that the entire page is new. If the manual is rewritten, the revision level changes but the manual does not contain change bars.

Requests for copies of Cray Research, Inc. publications should be directed to Logistics and comments about these publications should be directed to:

CRAY RESEARCH, INC.
ATTENTION: Publications
890 Industrial Blvd.
Technical Operations Building
Chippewa Falls, WI 54729

| Revision | Description |
|----------|-------------|
| | May 1985 - Original printing. |
| A | October 1986 - This reprint with revision corrects various errata and improves the format of the manual. New instructions and CAL examples were added to section 3. A reference to pseudobanking was added to section 4. The name of a controller was changed to External I/O controller from Front-end Interface due to confusion with another device. All previous versions are obsolete. All trademarks are now documented in the record of revision. |
| B | February 1987 - This reprint with revision incorporates the HSX channel and the 128-Mword Common Memory four-processor and two-processor versions of the CRAY-2 computer system. |

The UNICOS operating system is derived from the AT&T UNIX System V operating system. UNICOS is also based in part on the Fourth Berkeley Software Distribution under license from The Regents of the University of California.

CRAY, CRAY-1, SSD, and UNICOS are registered trademarks and APML, CFT, CFT77, CFT2, COS, CRAY-2, CRAY X-MP, CSIM, IOS, SEGLDR, SID, and SUPERLINK are trademarks of Cray Research, Inc. HYPERchannel is a registered trademark of Network Systems Corporation. UNIX is a registered trademark of AT&T.

#### PREFACE

This manual describes the functions of the CRAY-2 computer system and the Cray Assembly Language (CAL) Version 2 symbolic machine instructions specifically used with this machine. It is written to assist programmers and engineers, and it assumes that readers are familiar with digital computers and assemblers.

The manual describes the overall computer system, including its configuration and characteristics. It also describes the operation of the Common Memory, Foreground Processor, and Background Processors. This manual explains both the machine code and the associated symbolic machine instructions.

Site planning information for the CRAY-2 computer system is available in the CRAY-2 Site Planning Reference Manual, publication HR-2001.

Additional information on the Cray Assembly Language (CAL) Version 2 is available in the CAL Version 2 Assembler Reference Manual, publication SR-2003.

## WARNING

This device operates in accordance with FCC Part 15, Subpart J Rules under exemption Rule 15.801 (c)(3) adopted August 5, 1986.
# CONTENTS

- 1. INTRODUCTION
  - 1.1 CRAY-2 COMPUTER SYSTEM FEATURES
    - 1.1.1 Physical characteristics
    - 1.1.2 Architecture and design
  - 1.2 CONVENTIONS
    - 1.2.1 Examples
  - 1.3 ORGANIZATION
- 2. BACKGROUND PROCESSOR
  - 2.1 CONTROL SECTION
    - 2.1.1 Instruction issue and control
      - Program Address register
      - Instruction buffers
      - Instruction issue
    - 2.1.2 Real-time clock
    - 2.1.3 Semaphore flags
    - 2.1.4 Common Memory field protection
      - Base Address register
      - Limit Address register
      - Memory range error
  - 2.2 OPERATING REGISTERS
    - 2.2.1 Address registers
    - 2.2.2 Scalar registers
    - 2.2.3 Vector registers
  - 2.3 VECTOR CONTROL REGISTERS
    - 2.3.1 Vector Length register
    - 2.3.2 Vector Mask register
  - 2.4 FUNCTIONAL UNITS
    - 2.4.1 Address Add functional unit
    - 2.4.2 Address Multiply functional unit
    - 2.4.3 Scalar Integer functional unit
    - 2.4.4 Scalar Shift functional unit
    - 2.4.5 Scalar Logical functional unit
    - 2.4.6 Vector Integer functional unit
    - 2.4.7 Vector Logical functional unit
    - 2.4.8 Floating-point Add functional unit
  - 2.5 ARITHMETIC OPERATIONS
    - 2.5.2 Floating-point arithmetic
      - Normalizing
      - Floating-point addition
      - Floating-point subtraction
      - Floating-point to integer conversion
      - Integer to floating-point conversion
      - Floating-point product
      - Reciprocal approximation
      - Reciprocal iteration
- 3. BACKGROUND PROCESSOR SYMBOLIC MACHINE INSTRUCTIONS
  - 3.1 SYMBOLIC INSTRUCTION FORMAT
- 4. COMMON MEMORY
  - 4.1 MEMORY ADDRESSING
  - 4.2 MEMORY ACCESS
  - 4.3 MEMORY CONFLICTS
  - 4.4 MEMORY BACKUP
  - 4.5 MEMORY ERROR CORRECTION
- 5. FOREGROUND SYSTEM
  - 5.1 FOREGROUND COMMUNICATION CHANNELS
  - FOREGROUND PROCESSOR

APPENDIX SECTION

- A. SYMBOLIC MACHINE INSTRUCTIONS LISTED BY FUNCTIONALITY
  - A.1 SYMBOLIC NOTATION
  - A.2 BRANCH INSTRUCTIONS
    - A.2.1 Conditional branches
    - A.2.2 Unconditional jumps
    - A.2.3 Exits
  - A.3 PASS INSTRUCTIONS
  - A.4 SEMAPHORE INSTRUCTIONS
  - A.5 REGISTER ENTRY INSTRUCTIONS
    - A.5.1 Entries into A registers
    - A.5.2 Entries into S registers
    - A.5.3 Entries into V registers
  - A.6 INTER-REGISTER TRANSFER INSTRUCTIONS
    - A.6.1 Transfers to A registers
    - A.6.2 Transfers to S registers
    - A.6.3 Transfers to V registers
    - A.6.4 Transfer to Vector Mask register
    - A.6.5 Transfer to Vector Length register
  - A.7 MEMORY TRANSFER INSTRUCTIONS
    - A.7.1 Stores
      - Local Memory writes
      - Common Memory writes
    - A.7.2 Loads
      - Local Memory reads
      - Common Memory reads
      - Memory Range Error flags
  - A.8 INTEGER ARITHMETIC OPERATION INSTRUCTIONS
    - A.8.1 Integer sums
    - A.8.2 Integer differences
    - A.8.3 Integer products
  - A.9 FLOATING-POINT ARITHMETIC INSTRUCTIONS
    - A.9.1 Floating-point sums
    - A.9.2 Reciprocal iterations
    - A.9.3 Reciprocal approximations
    - A.9.4 Floating-point differences
    - A.9.5 Integer to floating-point conversions
    - A.9.6 Floating-point to integer conversions
    - A.9.7 Floating-point products
    - A.9.8 Square root iterations
    - A.9.9 Square root approximations
    - A.9.10 Floating-point errors
  - A.10 LOGICAL OPERATION INSTRUCTIONS
    - A.10.1 Logical products
    - A.10.2 Logical sums
    - A.10.3 Vector streaming
    - A.10.4 Logical differences
  - A.12 SHIFT INSTRUCTIONS
    - A.12.1 Left shifts
    - A.12.2 Right shifts

# FIGURES

| Figure | Title |
|--------|-------|
| 1-1 | CRAY-2 Computer System Mainframe |
| 1-2 | CRAY-2 Four Background Processor Computer System Mainframe Configuration |
| 2-1 | Control and Data Paths in One Background Processor |
| 2-2 | Floating-point Data Format |
| 2-3 | 48-by-48 Bit Matrix Used for Floating-point Product |
| 2-4 | 48-by-48 Bit Matrix Used for Reciprocal Iteration |
| 2-5 | 48-by-48 Bit Matrix Used for Square Root Iteration |
| 3-1 | Instruction Parcel Format |
| 4-1 | Memory Address for Common Memory |
| 4-2 | Error Correction Matrix |
| 5-1 | Channel Loop |
| A-1 | CRAY-2 Computer System Symbolic Machine Instructions |

#### 1. INTRODUCTION

The CRAY-2 computer system is a powerful, general-purpose computer system with extremely high processing rates. Scalar and vector capabilities in a multiprocessing environment combined with integrated foreground processing achieve these high rates.

# 1.1 CRAY-2 COMPUTER SYSTEM FEATURES

The CRAY-2 computer system mainframe contains either two or four independent Background Processors, each more powerful than a CRAY-1 computer system processor. Featuring a clock-cycle time faster than any other computer system available, each of these processors offers exceptional scalar and vector processing capabilities. The Background Processors can operate independently on separate jobs or concurrently on a single problem. The very high speed Local Memory integral to each Background Processor is available for temporary storage of vector and scalar data.

Common Memory is one of the most important features of the CRAY-2 computer system. It consists of either 128 or 256 Mwords, each 64 bits long, randomly accessible from any of the Background Processors and from any of the high-speed and common data channels. The memory is arranged in quadrants with 128 interleaved banks. All memory access is performed automatically by the hardware.
Any user may use all or part of the memory not being used by the operating system.

Control of network access equipment and the high-speed disk drives is integral to the CRAY-2 computer system mainframe hardware. A single Foreground Processor coordinates the data flow between the system's Common Memory and all the external devices across either two or four high-speed I/O channels. The synchronous operation of the Foreground Processor with the Background Processors and the external devices provides a significant increase in data throughput.

The most important CRAY-2 computer system features are:

- Extremely large directly addressable Common Memory
- Fastest cycle time available in a computer system
- Scalar, vector, and multiprocessing combined in one system
- Integral Foreground Processor
- Elegant architecture
- Extremely high reliability
- High density memory chips and extremely fast silicon logic chips
- Liquid immersion cooling

#### 1.1.1 PHYSICAL CHARACTERISTICS

The CRAY-2 computer system mainframe is elegant in appearance as well as in architecture (see figure 1-1). The memory, computer logic, and DC power supplies are integrated into a compact mainframe composed of 14 vertical columns arranged in a 300° arc. The upper part of each column contains a stack of logic modules and the lower part contains power supplies for the system. Total cabinet height, including the power supplies, is 45 in. (114.3 cm); the diameter of the mainframe is 53 in. (134.6 cm). Thus, the "footprint" of the mainframe is a mere 16 ft<sup>2</sup> (1.49 m<sup>2</sup>).

An inert fluorocarbon liquid circulates in the mainframe cabinet in direct contact with the integrated circuit packages. This liquid immersion cooling technology allows for the small size of the CRAY-2 computer system mainframe and is thus largely responsible for the high computation rates.
Significant CRAY-2 computer system physical characteristics are:

- Occupies only 16 ft<sup>2</sup> (1.49 m<sup>2</sup>) of floor space
- Stands 45 in. (114.3 cm) high; diameter is 53 in. (134.6 cm)
- Contains 14 columns arranged in a 300° arc
- Contains 3-dimensional modules
- Uses liquid immersion cooling
- Uses chilled water heat exchange

Figure 1-1. CRAY-2 Computer System Mainframe

#### 1.1.2 ARCHITECTURE AND DESIGN

In addition to the cooling technology, the extremely high processing rates are achieved by a balanced integration of scalar and vector capabilities and a large Common Memory in a multiprocessing environment.

Significant architectural components of the CRAY-2 computer system include the following:

- Two or four independent Background Processors capable of vector and scalar operation. Synchronization of the Background Processors is achieved through the Foreground Processor and semaphore flags in the Background Processors.
- 128 or 256 Mwords of dynamic Common Memory
- A foreground system that controls and monitors system operation, including:
  - A Foreground Processor for system supervision
  - Two or four high-speed synchronous communication channels
  - Up to 40 I/O devices
  - Disk controllers to control up to 36 disk storage units (DSUs)
  - Two or four Common Memory ports for data transfer
  - Two or four Background Processor ports to allow Foreground Processor control
  - External I/O controllers (from one to as many as four per channel)
  - HSX controllers (two maximum per channel)

The identical Background Processors each contain registers and functional units to perform both vector and scalar operations. The single Foreground Processor supervises the Background Processors. The large Common Memory complements the processors and provides architectural balance, thus assuring extremely high throughput rates (see figure 1-2).

Figure 1-2 shows the four-processor model. The two-processor version has two high-speed synchronous communication channels.
The contents of a channel are the same in each version of the system. On-site maintenance is possible through the maintenance control console.

Figure 1-2. CRAY-2 Four Background Processor Computer System Mainframe Configuration

# 1.2 CONVENTIONS

This manual uses the following conventions:

| Convention | Description |
|------------|-------------|
| lowercase italics | Variable information |
| X or x or X | An ignored value |
| n | An unknown variable value |
| (Xx) | The contents of a register designated by the Xx value |
| Register bit designators | Numbered right to left as powers of 2, starting with 2<sup>0</sup>. |

Unless otherwise indicated, numbers in this manual are decimal numbers. Octal numbers are indicated with an 8 subscript. Exceptions are instruction parcels in instruction buffers and instruction forms, which are given in octal without the subscript.

# 1.2.1 EXAMPLES

The following examples illustrate the above conventions.

| Example | Description |
|---------|-------------|
| Transmit (Ak) to Si | Transmit the contents of the A register specified by the *k* designator to the S register specified by the *i* designator |
| 167*ixk* | Machine instruction 167 where the *j* register designator is not used and is an ignored value |
| Read *n* words from memory | Read an unknown variable number of words from memory. You can read, within the stated restrictions, as few or as many words from memory as you wish. |
| Bit 2<sup>63</sup> of an S or V register | Value represents the most significant bit |
| Bit 2<sup>31</sup> of an A register | Value represents the most significant bit |
| VM register element | The VM register contains 64 bits, each corresponding to a word element in a Vector register. Bit 2<sup>63</sup> corresponds to element 0; bit 2<sup>0</sup> corresponds to element 63. |

# 1.3 ORGANIZATION

This manual is organized into the following sections:

| Section | Description |
|---------|-------------|
| 1 | Contains the introduction to this manual |
| 2 | Describes the CRAY-2 computer system Background Processor. The registers, functional units, and algorithms used are described. |
| 3 | Provides detailed information on the CAL instructions that operate on the CRAY-2 computer system. Each machine instruction can be represented symbolically in Cray Assembly Language (CAL) Version 2. The instructions are listed octally in a box format that provides the Cray Assembly Language (CAL) Version 2 syntax format, an operand if required, a brief description of each instruction, and the machine instruction. Following the boxed information is a detailed description of the instruction and an example using the instruction. |
| 4 | Describes the CRAY-2 Common Memory, phased memory access, and single-error correction/double-error detection (SECDED) |
| 5 | Describes the CRAY-2 Foreground System, which handles the I/O |
| Appendix A | Lists the symbolic machine instructions by function. The octal machine code can be used as an index when referring to section 3 for a detailed description of the instruction. |

#### 2. BACKGROUND PROCESSOR

The CRAY-2 computer system has either two or four identical Background Processors, each containing operating and vector control registers and functional units to perform both vector and scalar operations. The Foreground Processor supervises the Background Processors.

A Background Processor performs arithmetic and logical calculations. These operations, and the other functions of a Background Processor, are coordinated through the control section. Figure 2-1 shows the control and data paths for one Background Processor.

#### 2.1 CONTROL SECTION

Each Background Processor contains an identical, independent control section of registers and instruction buffers for instruction issue and control. This section describes the following control mechanisms:

- Instruction issue and control
- Real-time clock
- Semaphore flags
- Common Memory field protection

### 2.1.1 INSTRUCTION ISSUE AND CONTROL

Each Background Processor contains a Program Address register, an instruction buffer with eight fields, and an instruction issue control mechanism to implement instruction issue and control.

## Program Address register

Each Background Processor has a 32-bit Program Address (P) register indicating the address of the program instruction parcel currently in the issue position during normal operation.
The Foreground Processor loads the P register with data at the beginning of a computation period. As each parcel issues from the instruction queue, the contents of the P register advance by 1. The P register contents are reset to the branch destination address when a jump instruction is executed.

Figure 2-1. Control and Data Paths in One Background Processor

## Instruction buffers

Each Background Processor has a buffer with eight independent fields to allow program loops to execute without additional Common Memory references. Programs can loop within the instruction buffer using any of the branch instructions.

Each independent field contains 16 words. The total instruction buffer size is 128 words. The next sequential instruction out of the instruction buffer or a branch out of the instruction buffer discards the oldest data field and replaces it with 16 words of new data.

## Instruction issue

Background instructions are translated in several steps and are allowed to issue sequentially by an instruction issue control mechanism. The words are disassembled into 16-bit parcels that are placed in a queue where the translation occurs.

The instruction issue process involves checking the reservation flags for the registers and functional unit involved in the instruction sequence. The parcel waits in issue position in the instruction queue until all required resources are free.

Instruction parcels and 16-bit constants are intermixed in the instruction queue. The constant parcels are passed through the instruction queue without test.

### 2.1.2 REAL-TIME CLOCK

Each Background Processor has a 64-bit register that counts continuously at the clock period rate. This count value determines the passage of real time to an accuracy of 1 clock period (CP). The real-time clocks in the Background Processors are synchronized at deadstart. Instruction 115 reads the real-time clock.
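
The word-to-parcel disassembly and the buffer sizes given under "Instruction buffers" and "Instruction issue" above can be illustrated with a short Python sketch. This is a reader's model only, not part of the CRAY-2 hardware or of CAL: the function names are hypothetical, and the ordering of parcels within a word (high-order parcel first) is an assumption that this section does not state.

```python
WORD_BITS = 64          # Common Memory word length (from the text)
PARCEL_BITS = 16        # instruction parcel length (from the text)
FIELDS = 8              # independent instruction buffer fields (from the text)
WORDS_PER_FIELD = 16    # words held by each field (from the text)

PARCELS_PER_WORD = WORD_BITS // PARCEL_BITS


def split_word_into_parcels(word):
    """Disassemble one 64-bit word into four 16-bit parcels.

    ASSUMPTION: the high-order parcel comes first. The manual section
    modeled here does not specify the parcel order within a word.
    """
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 64 bits")
    mask = (1 << PARCEL_BITS) - 1
    shifts = range(WORD_BITS - PARCEL_BITS, -1, -PARCEL_BITS)
    return [(word >> shift) & mask for shift in shifts]


def buffer_capacity():
    """Return the instruction buffer size as (words, parcels)."""
    words = FIELDS * WORDS_PER_FIELD
    return words, words * PARCELS_PER_WORD


if __name__ == "__main__":
    print([oct(p) for p in split_word_into_parcels(0x0001000200030004)])
    print(buffer_capacity())
```

Under these assumptions, the eight 16-word fields hold 128 words, which corresponds to 512 parcel positions available to a program loop without further Common Memory references.
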
### 2.1.3 SEMAPHORE FLAGS

Eight semaphore flags in the background system interlock Common Memory references when multiple Background Processors are executing a single job. One semaphore flag is assigned to each currently active job in the background system. A Background Processor assigned to a job is assigned a semaphore flag at the same time.

The Background Processor uses four instructions in synchronizing its Common Memory references: 004, 005, 006, and 007. A 004 or 005 instruction requests the semaphore flag when the Background Processor program is accessing a Common Memory area that can interfere with other processors assigned to the job. The branch instruction results determine when the processor has exclusive access to this Common Memory area. The program must clear the semaphore flag to release the Common Memory area to another processor assigned to the same job.

#### 2.1.4 COMMON MEMORY FIELD PROTECTION

At execution time, each object program has a designated field of Common Memory holding instructions and data. The foreground functions specify the field limits when the object program is loaded and initiated. Field limits are contained in the Base Address (BA) register and the Limit Address (LA) register.

All memory addresses contained in the object program code are relative to the base address beginning the defined field. An object program cannot read or alter any Common Memory location with an absolute address lower than the base address. Each object program reference to Common Memory is checked against the limit and base addresses to determine if the address is within the assigned bounds.

## Base Address register

Each Background Processor has a 32-bit BA register. The BA register defines the lower boundary of the Common Memory address field. The Foreground Processor enters data into this register while the Background Processor is in idle mode.
The data remains in the register for the duration of the Background Processor computation period. Each Common Memory reference from the Background Processor includes the addition of the BA register contents to the other parts of the memory reference address. All Background Processor references to Common Memory are relative to the base address boundary.

## Limit Address register

Each Background Processor has a 32-bit LA register. The LA register defines the upper boundary of the Common Memory address field. The Foreground Processor enters data into this register while the Background Processor is in idle mode. The data remains in this register for the duration of the Background Processor computation period.

# Memory range error

When a memory reference exceeds the range limits, a memory range error occurs. Each Common Memory reference from the Background Processor includes a test of the resulting absolute Common Memory address against the contents of the BA and LA registers. An error signal is sent to the status register if the resulting absolute Common Memory address is less than the base address, or equal to or greater than the limit address. In this case, a read reference returns zero data and a write reference is aborted.

# 2.2 OPERATING REGISTERS

Each Background Processor contains the following independent set of operating registers:

- Address
- Scalar
- Vector

Operating registers, a primary programmable resource of the Background Processor, enhance the speed of the system by satisfying heavy demands for data made by functional units. Different functional units can be used concurrently.

#### 2.2.1 ADDRESS REGISTERS

Eight 32-bit Address (A) registers are used primarily to store memory locations for Local Memory and Common Memory references. A registers are used for 32-bit integer calculations and to move data directly from Local Memory. Data is also transferred between Address and Scalar registers.
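
The Common Memory field protection check described in subsection 2.1.4 can be sketched in Python as follows. This is an illustrative model under stated assumptions, not a description of the hardware implementation: the class and method names are hypothetical, Common Memory is modeled as a dictionary, the status register error signal is reduced to a single boolean flag, and wraparound of the address sum at 32 bits is an assumption made so that an address below the base can arise in the model.

```python
ADDRESS_MASK = (1 << 32) - 1  # BA and LA are 32-bit registers (from the text)


class FieldProtection:
    """Hypothetical model of the BA/LA range check on Common Memory references."""

    def __init__(self, base_address, limit_address):
        self.base = base_address & ADDRESS_MASK
        self.limit = limit_address & ADDRESS_MASK
        self.range_error = False  # stand-in for the status register error signal

    def absolute(self, relative_address):
        # The BA contents are added to every reference.
        # ASSUMPTION: the sum wraps at 32 bits.
        return (self.base + relative_address) & ADDRESS_MASK

    def in_bounds(self, absolute_address):
        # Error if address < base, or address >= limit (from the text).
        return self.base <= absolute_address < self.limit

    def read(self, memory, relative_address):
        address = self.absolute(relative_address)
        if not self.in_bounds(address):
            self.range_error = True
            return 0  # an out-of-range read returns zero data
        return memory.get(address, 0)

    def write(self, memory, relative_address, value):
        address = self.absolute(relative_address)
        if not self.in_bounds(address):
            self.range_error = True
            return False  # an out-of-range write is aborted
        memory[address] = value
        return True


if __name__ == "__main__":
    memory = {}
    field = FieldProtection(base_address=1000, limit_address=2000)
    field.write(memory, 5, 42)
    print(field.read(memory, 5), field.range_error)
    print(field.read(memory, 1000), field.range_error)
```

In this sketch a program with base 1000 and limit 2000 may use relative addresses 0 through 999; relative address 1000 maps to absolute address 2000, which equals the limit and therefore raises the error flag.
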
### 2.2.2 SCALAR REGISTERS

Eight 64-bit Scalar (S) registers serve as source and destination for operands executing scalar arithmetic and logical instructions. S registers can furnish one operand in vector instructions.

The eight 64-bit S registers in a Background Processor support Vector (V) registers in operations when one element of the computation is a constant value. The S registers function as computational way stations between Common Memory and the functional units where vector implementation of the work is not possible.

#### 2.2.3 VECTOR REGISTERS

The major computational registers of the Background Processor are eight Vector (V) registers, each having 64 elements. Each V register element has 64 bits. When associated data is grouped into successive elements of a V register, the register quantity is treated as a vector. Examples of vector quantities are rows or columns of a matrix, and elements of a table.

Computational efficiency is achieved by identically processing each element of a vector. Vector instructions provide for the iterative processing of successive V register elements. A vector operation begins by obtaining operands from the first element of one or more V registers and delivering the result to the first element of a V register. Successive elements are provided during each CP, and as each operation is performed, the result is delivered to successive elements of the result V register.

Vector operation continues until the number of operations performed by the instruction equals a count specified by the contents of the Vector Length register (described in subsection 2.3). Since many vectors exceed 64 elements, longer vectors are processed as one or more 64-element segments and a possible remainder of less than 64 elements.

The instruction issue control mechanism reserves the V registers that are involved in a functional unit operation. One, two, or three V registers can be involved, depending on the specific instruction.
The functional unit is reserved at the same time as the V registers. The instruction sequence can then proceed to the next instruction and initiate concurrent activity as long as the resources reserved are not required.

The i, j, and k designators in a vector instruction can have the same value; however, the i designator should always have a unique value. In the case of identical source operands, the data is streamed from the same V register to both data paths. In the case of a destination register that is the same as a source register, the V register writing function takes priority over reading. When this occurs, the reading vector delivers all zero words to the functional unit.

## 2.3 VECTOR CONTROL REGISTERS

The Vector Length (VL) register and the Vector Mask (VM) register provide control information needed in the performance of vector operations.

## 2.3.1 VECTOR LENGTH REGISTER

The Vector Length (VL) register is a 6-bit special purpose register explicitly referenced in the Background Processor instructions. The VL register holds the vector length during a portion of the background computation. All vector operations capture the vector length at the time of instruction issue from the VL register.

Vector registers always begin a read or write operation at the zero element position in the V register. Elements are read or written sequentially for the length of the current vector data. A short vector after a long vector leaves the old vector data in those positions not replaced with new data.

Values allowed in the VL register are 0 through 63. A zero value is interpreted as 64. Background instructions 025 and 036 communicate explicitly with the VL register.

#### 2.3.2 VECTOR MASK REGISTER

The Vector Mask (VM) register is a 64-bit special purpose register explicitly referenced by the Background Processor instructions. The VM register merges vector data according to a set of precomputed element flags.
In effect, it provides a vehicle for implementing vector branch operations.

One bit of the VM register is associated with each element in the 64-element vector registers. The high-order bit (2<sup>63</sup>) of the vector mask corresponds to element 0 of the vector data. The bits of the mask then proceed in order to represent the following vector elements.

The vector mask data can be formed by a vector operation in which each element is evaluated for a specific criterion. Instructions 030 through 033 perform these tests. The VM register is cleared at the beginning of these instruction sequences and then bits are entered one at a time as the vector stream passes the test station.

The vector mask data can be used to merge two vector streams into a single result stream. Instructions 146 and 147 are used for this purpose. Elements of the j operand are selected when the mask contains 1 bits. Elements of the k operand are selected when the mask contains 0 bits.

Instructions 034 and 114 move data between the VM register and an S register.

## 2.4 FUNCTIONAL UNITS

Each Background Processor has a set of functional units to implement algorithms for the instruction set. A number of functional units can operate simultaneously. Each functional unit produces one result per CP.

No information is retained in a functional unit for reference by subsequent instructions. A functional unit receives operands from registers and delivers the result to a register when the function has been performed. Functional units operate essentially in three-address mode.

Nonvector functional units can accept operands as fast as the instructions can issue. A functional unit engaged in a vector operation remains busy for the duration and cannot participate in other operations. In this state, the functional unit is reserved. Other instructions requiring the same functional unit do not issue until the previous operation is completed.
Only one functional unit of each type is available to the vector instruction hardware. When the vector operation completes, the reservation is dropped and the functional unit is then available for another operation.

Each Background Processor has the following set of functional units:

- Address Add
- Address Multiply
- Scalar Integer
- Scalar Shift
- Scalar Logical
- Vector Integer
- Vector Logical
- Floating-point Add
- Floating-point Multiply

In addition, a Background Processor contains a Local Memory, which is a buffer for the A, S, and V register data.

## 2.4.1 ADDRESS ADD FUNCTIONAL UNIT

The Address Add unit performs 32-bit integer addition and subtraction of two A register operands. (Instruction 020 performs integer sums and 021 performs integer differences.) This unit can accept address operands as fast as the instructions can issue.

## 2.4.2 ADDRESS MULTIPLY FUNCTIONAL UNIT

The Address Multiply unit performs 32-bit integer multiplication of two A register operands. (Instructions 022 and 023 perform integer products.) This unit can accept address operands as fast as the instructions can issue.

#### 2.4.3 SCALAR INTEGER FUNCTIONAL UNIT

The Scalar Integer unit performs 64-bit integer addition and subtraction of S register operands. (Instruction 104 performs integer sums and instruction 105 performs integer differences.) It also performs population count (instruction 106ij0), population count parity (instruction 106ij1), and leading zero count (instruction 107). This unit can accept scalar operands as fast as the instructions can issue.

#### 2.4.4 SCALAR SHIFT FUNCTIONAL UNIT

The Scalar Shift unit shifts the entire 64-bit contents of an S register (instruction 110 left or 111 right) or the double 128-bit contents of two concatenated S registers (instruction 112 left or 113 right). This unit can accept scalar operands as fast as the instructions can issue.
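As an illustration of the Scalar Integer unit operations named in subsection 2.4.3, the following Python sketch models population count, population count parity, and leading zero count on a 64-bit word. The function names are hypothetical. Treating parity as the low-order bit of the population count, and returning 64 as the leading zero count of an all-zero word, are assumptions rather than statements from this manual.

```python
MASK64 = (1 << 64) - 1  # an S register holds a 64-bit word


def population_count(s):
    """Return the number of 1 bits in the 64-bit word s."""
    return bin(s & MASK64).count("1")


def population_parity(s):
    """Return the low-order bit of the population count (assumed definition)."""
    return population_count(s) & 1


def leading_zero_count(s):
    """Return the number of 0 bits above the highest 1 bit.

    An all-zero word is assumed to give 64.
    """
    return 64 - (s & MASK64).bit_length()


if __name__ == "__main__":
    word = 0o7  # three low-order bits set
    print(population_count(word), population_parity(word), leading_zero_count(word))
```

The sketch describes results only; it does not model issue timing or the instruction encodings.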
## 2.4.5 SCALAR LOGICAL FUNCTIONAL UNIT

The Scalar Logical unit manipulates bit-by-bit the 64-bit quantities obtained from S registers. (Instruction 100 performs logical products, instruction 101 performs logical products complemented, instruction 102 performs logical differences, and instruction 103 performs logical sums.) This unit can accept scalar operands as fast as the instructions can issue.

#### 2.4.6 VECTOR INTEGER FUNCTIONAL UNIT

The Vector Integer unit performs the following operations:

- Vector shifts (instruction 150 for left single, instruction 151 for right single, instruction 152 for left double, and instruction 153 for right double)
- Vector integer arithmetic (instructions 160 and 161 for integer sums and instructions 162 and 163 for integer differences)
- Vector population count (instruction 164ij0 for population count and instruction 164ij1 for population parity)
- Vector leading zero count (instruction 165)
- Compressed iota (instruction 176)

The unit can accept operand data each CP, and after a transit time delay, can deliver a result each CP.

#### 2.4.7 VECTOR LOGICAL FUNCTIONAL UNIT

The Vector Logical unit manipulates bit-by-bit the 64-bit quantities from two V registers or from V registers and S registers (instructions 140 and 141 perform logical products, instructions 142 and 143 perform logical differences, and instructions 144 and 145 perform logical sums). The unit can accept operand data each CP, and after a transit time delay, can deliver a result each CP.

#### 2.4.8 FLOATING-POINT ADD FUNCTIONAL UNIT

The Floating-point Add unit performs addition or subtraction of 64-bit operands in floating-point format for both scalar and vector operations. It also performs the conversion between integer and floating-point. See subsection 2.5.2, Floating-point Arithmetic, for a description of the instructions that use this unit.

The unit is reserved for the time of a vector stream during execution of vector addition instructions.
The unit can accept vector operand data each CP, and after a transit time delay, can deliver a result each CP. The unit can accept scalar references as fast as they issue if the unit is not processing vector data.

#### 2.4.9 FLOATING-POINT MULTIPLY FUNCTIONAL UNIT

The Floating-point Multiply unit performs full multiplication of 64-bit operands in floating-point format for both scalar and vector operations. It also performs reciprocal approximation, reciprocal square root approximation, reciprocal iteration, and reciprocal square root iteration. See subsection 2.5.2, Floating-point Arithmetic, for a description of the instructions that use this unit.

The unit is reserved for the time of a vector stream during execution of vector Floating-point Multiply unit instructions. The unit can accept vector operand data each CP, and after a transit time delay, can deliver a result each CP. The unit can accept scalar multiply, reciprocal iteration, and reciprocal square root iteration references as fast as they issue if the unit is not processing vector data. Scalar reciprocal approximation and reciprocal square root approximation references place a 4 CP reservation on the functional unit.

## 2.4.10 LOCAL MEMORY

Each Background Processor contains 16,384 64-bit words of Local Memory. This memory holds scalar operands during a computation period. The Local Memory also can be used for temporary storage of vector elements when these elements are used more than once in a computation in the V registers.

Instructions that use Local Memory are:

- 044 and 046 read from Local Memory to A register
- 045 and 047 write to Local Memory from A register
- 054 and 056 read from Local Memory to S register
- 055 and 057 write to Local Memory from S register
- 074 read from Local Memory to V register
- 075 write to Local Memory from V register

## 2.5 ARITHMETIC OPERATIONS

Functional units in the Background Processor perform either twos complement integer arithmetic or floating-point arithmetic.
#### 2.5.1 INTEGER ARITHMETIC

All integer arithmetic, whether 32 bits or 64 bits, is twos complement. The Address Add and Address Multiply units perform 32-bit arithmetic. The Scalar Integer unit performs scalar 64-bit arithmetic and the Vector Integer unit performs vector 64-bit arithmetic.

Representations of the integers 0, +1, and -1 in 32-bit and 64-bit format are shown using octal notation.

| Integer | 32-bit Format | 64-bit Format |
|---------|---------------|------------------------|
| 0 | 00000000000 | 0000000000000000000000 |
| +1 | 00000000001 | 0000000000000000000001 |
| -1 | 37777777777 | 1777777777777777777777 |

Multiplication of two scalar integer operands is accomplished by using the floating-point multiply instruction. Division is done by using an algorithm; the particular algorithm used depends on the number of bits in the quotient.

## 2.5.2 FLOATING-POINT ARITHMETIC

Floating-point numbers are represented in a standard format throughout the Background Processor. This format is a packed representation of a binary coefficient and an exponent. The coefficient is a 48-bit signed fraction. As figure 2-2 shows, the sign of the coefficient is separated from the rest of the coefficient. Since the coefficient is signed magnitude, it is not complemented for negative values.

Figure 2-2. Floating-point Data Format

The exponent portion of the floating-point format is represented as a biased integer in bits $2^{62}$ through $2^{48}$. The bias that is added to the exponents is $40000_8$. The positive range of exponents is $40000_8$ through $57777_8$. The negative range of exponents is $37777_8$ through $20000_8$.
Thus, the unbiased range of exponents is the following (the negative range is one larger):

$$2^{-20000_8} \text{ through } 2^{+17777_8}$$

In terms of decimal values, the floating-point format of the Background Processor allows the accurate expression of numbers to about 15 decimal digits in the approximate decimal range of $10^{-2466}$ through $10^{+2466}$.

A floating-point representation of the integers 0, +1, and -1 in normalized form is shown using octal notation for each of the three fields (sign, exponent, and coefficient).

| Integer | Floating-point Representation |
|---------|-------------------------------|
| 0 | 0 00000 0000000000000000 |
| +1 | 0 40001 4000000000000000 |
| -1 | 1 40001 4000000000000000 |

# Normalizing

A nonzero floating-point number is normalized if the most significant bit of the coefficient is nonzero. This condition implies the coefficient has been shifted as far left as possible and the exponent adjusted accordingly. Therefore, the floating-point number has no leading zeros in the coefficient. The exception is that a normalized floating-point zero is all zeros.

When a floating-point number is created by inserting an exponent of $40060_8$ into a 48-bit integer word, the result should be normalized before being used in a floating-point operation. Normalization can be accomplished by adding the unnormalized floating-point operand to 0 (see subsection Integer to Floating-point Conversion, later in this section).

## Range errors

Exponent values of $60000_8$ and greater are considered to have overflowed the exponent range. Hardware tests are performed for these values to indicate floating-point range error.

Exponent values less than $20000_8$ are considered to have underflowed the floating-point range. Such values are treated as if they had a zero value. The hardware does not indicate when a computation underflows the floating-point range.

Whether or not range errors are enabled, when an overflow condition is detected by the hardware the result exponent is forced to an overflow value.
Each floating-point operation forces a signature exponent as follows:

| Operation | Signature exponent |
|------------------------------------------|--------------------|
| Floating-point add/subtract | $60000_8$ |
| Floating-point multiply | $60001_8$ |
| Floating-point reciprocal approximation | $60002_8$ |
| Floating-point square root approximation | $60004_8$ |

## Floating-point addition

The Floating-point Add unit forms the sum of two operands in floating-point format and delivers a result in floating-point format. The result is always normalized regardless of source operand status. Instructions 120, 170, and 171 use the Floating-point Add sequence.

In the process of adding two floating-point operands, one operand coefficient is shifted right for exponent matching. The coefficient from this shifting operation is rounded up.

A special test is made for all 0 bits in the result coefficient. When this occurs, the exponent field in the result is also cleared. A word of all zeros is delivered to the destination register.

A special test is made for one or both operands with an overflow exponent. An error signal is sent to the Background Port Status register (see section 5) if range errors are enabled, and an overflow exponent $(60000_8)$ is forced in the result delivered to the destination register.

# Floating-point subtraction

The Floating-point Add unit forms the difference of two operands in floating-point format and delivers a result in floating-point format. Instructions 121, 172, and 173 use the floating-point subtraction sequence.

## Floating-point to integer conversion

The Floating-point Add unit forms an integer representation of a floating-point operand. This process is accomplished by adding the operand to a constant integer. Instructions 122 and 174 use this form of the floating-point add sequence.

The maximum size of the resulting integer value is 48 bits. A positive or negative result is sign extended to form a 64-bit integer result.
An operand with a floating-point value greater than a 48-bit integer is an error condition. An error signal is sent to the Background Port Status register if floating-point range errors are enabled, and a zero result is delivered to the destination register.

# Integer to floating-point conversion

The Floating-point Add unit forms a floating-point representation of an integer operand. This process is accomplished by adding the operand to a constant and using the floating-point normalize hardware to form the proper floating-point result. Instructions 123 and 175 use this form of the floating-point add sequence.

The maximum allowable size of the integer operand is 48 bits. If the operand is larger, no error is flagged; the bits above 48 bits are discarded during the operation.

## Floating-point product

The Floating-point Multiply unit forms the product of two operands in floating-point format and delivers a result in floating-point format. If both operands are normalized, the result is also normalized. Instructions 124, 154, and 155 use this sequence.

The 48 by 48 matrix of logical product bits is truncated 8 bit positions below the low-order result coefficient bit (see figure 2-3). Round bits are added to this lower field to give an equal population of high and low round errors for random operands. A round bias exists over narrow ranges of operands because of the 1-bit correction shift after the round operation.

The following special cases are treated in floating-point multiplication for operands out of range:

1. One or both operands have overflow exponent.
2. Sum of operand exponents is an overflow.
3. Sum of exponents is an underflow.
4. Both exponents are all zeros.

For instructions 124, 132, 133, 154, 166, and 167, bits $2^{-49}$ through $2^{-56}$ are used for rounding. Bits $2^{-50}$ and $2^{-51}$ are the round bits and bits $2^{-53}$ through $2^{-56}$ compensate for truncation.

Figure 2-3.
48-by-48 Bit Matrix Used for Floating-point Product

Cases 1 and 2 cause a Floating-point Error signal to be sent to the Background Port Status register if the floating-point range errors are enabled. The result delivered to the destination register is forced to an overflow exponent value $(60001_8)$. Case 3 results in an all-zero word sent to the destination register. Case 4 computes the coefficients with no normalize correction. The resulting exponent for this case is 0, which aids multiple-precision and integer calculations.

## Reciprocal approximation

The Floating-point Multiply unit forms an approximation to the reciprocal of a floating-point operand value. Instructions 132 and 166 use this sequence.

The values from a table are used in a linear interpolation computation. The following example shows the form of this computation.

## Example:

In this example, A is a reciprocal approximation for the high-order 12 bits of operand coefficient, B is the operand coefficient, and R is the better reciprocal approximation. Then the iteration step for interpolation is:

$$R = 2A - A*A*B$$

The two approximations read from a table are 2A and -A\*A. The normal multiply mechanism is then used to form the product with the additional term included in the summing process.

Two special cases occur in the reciprocal approximation sequence.

- Operand exponent has overflow value.
- Operand exponent has underflow value.

Both cases cause an error signal to be sent to the Background Port Status register if the floating-point range error is enabled and cause the computational result exponent to be forced to an overflow value $(60002_8)$.

## Reciprocal iteration

\*\*\*\*\*\*\*\*\*\*\*\*\*\*

#### CAUTION

The reciprocal iteration instructions (126 and 156) should be used only with the reciprocal approximation instructions (132 and 166) and should only be used for one additional iteration.
Operands not generated by the reciprocal approximation instructions may not deliver the expected result.

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

The Floating-point Multiply unit forms a floating-point number that is used in a second iteration for the reciprocal of a full-precision operand. The first iteration is formed in the reciprocal approximation previously described. The second iteration uses the same process to form a reciprocal approximation with 46 bits of coefficient accuracy. Instructions 126 and 156 use this sequence (see figure 2-4).

The division algorithm that computes S1/S2 to full precision requires four operations.

| Step | Operation | Description |
|------|---------------|-------------------------------------------------------|
| 1. | S1 = a | Dividend |
| | S2 = b | Divisor |
| | S3 = /HS2 | $1/b_1$ - Half-precision reciprocal |
| 2. | S4 = S2 * IS3 | $c = (2 - S2 * S3)$ - Correction factor |
| 3. | S5 = S3 * FS4 | $1/b_2 = (1/b_1 * c)$ - reciprocal |
| 4. | S6 = S1 * FS5 | $x = (a * 1/b_2)$ - full precision reciprocal |

For instructions 126 and 156, bits $2^{-49}$ through $2^{-56}$ are used for rounding. Bits $2^{-50}$ and $2^{-51}$ are the round bits and bits $2^{-53}$ through $2^{-56}$ compensate for truncation.

Figure 2-4. 48-by-48 Bit Matrix Used for Reciprocal Iteration

## Reciprocal square root approximation

The Floating-point Multiply unit forms an approximation to the reciprocal square root of a floating-point operand value. Instructions 133 and 167 use this sequence.

The values from the table are used in a linear interpolation computation. The following example shows the form of this computation.

## Example:

In this example, A is a reciprocal square root approximation for the operand coefficient, B is the operand coefficient, and R is the better reciprocal square root approximation. The iteration step for interpolation is:

$$R = (3A/2) - (A*A*A*B/2)$$

The two approximations read from the table are 3A/2 and -A\*A\*A/2.
The normal multiply mechanism is then used to form the product with the additional term included in the summing process.

Three special cases occur in the reciprocal square root approximation sequence.

1. Operand exponent has overflow value.
2. Operand exponent has value of 0 through 3.
3. Operand is a negative value.

Cases 1 and 3 cause an error signal to be sent to the Background Port Status register. All three cases cause the computational result exponent to be forced to an overflow value $(60004_8)$.

## Reciprocal square root iteration

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

### CAUTION

The square root iteration instructions (127 and 157) should be used only with the reciprocal square root approximation instructions (133 and 167) and should only be used for one additional iteration. Operands not generated by the reciprocal square root approximation instructions may not deliver the expected result.

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

For instructions 127 and 157, bits $2^{-49}$ through $2^{-56}$ are used for rounding. Bits $2^{-50}$ and $2^{-51}$ are the round bits and bits $2^{-53}$ through $2^{-56}$ compensate for truncation.

Figure 2-5. 48-by-48 Bit Matrix Used for Square Root Iteration

The Floating-point Multiply unit forms a floating-point number which is used in a second iteration for the reciprocal square root of an operand. The first iteration is formed in the reciprocal square root approximation previously described. The second iteration uses the same process to form a reciprocal square root with 46 bits of coefficient accuracy. Instructions 127 and 157 use this sequence (see figure 2-5).

The square root algorithm that computes the square root of S1 requires five operations.

| Step | Operation | Description |
|------|----------------|-------------------------------------------------------------------------|
| 1. | S1 = x | Find square root of x |
| | S2 = \*QS1 | y = 1/sqrt(x) - Half-precision reciprocal square root approximation |
| 2. | S3 = 1 | |
| | S4 = S1 ! S3 | Force x odd before doing the iteration |
| 3. | S5 = S4 \* FS2 | x \* y |
| 4. | S6 = S2 \* QS5 | z = (3 - x \* y \* y)/2 - Square root iteration correction factor |
| 5. | S7 = S5 \* FS6 | sqrt(x) = (x \* y) \* z - Full precision square root |

#### 3. BACKGROUND PROCESSOR SYMBOLIC MACHINE INSTRUCTIONS

This section contains detailed information about individual instructions or groups of related instructions. Each instruction begins with boxed information consisting of the Cray Assembly Language (CAL) Version 2 syntax format, an operand (if required), a brief description of each instruction, and the machine instruction (octal code sequence defined by the f field). Following the boxed information is a more detailed description of the instruction and an example using the instruction.

## 3.1 SYMBOLIC INSTRUCTION FORMAT

The following special characters can appear in the operand field of symbolic machine instructions and are used by the assembler in determining the operation to be performed.

| Character | Description |
|-----------|-------------------------------------------------------------|
| + | Integer sum of adjoining registers |
| +F,+f | Floating-point sum of adjoining registers |
| - | Integer difference of adjoining registers |
| -F,-f | Floating-point difference of adjoining registers |
| \* | Integer product of adjoining registers |
| \*F,\*f | Floating-point product of adjoining registers |
| \*I,\*i | Floating-point reciprocal iteration of adjoining registers |
| \*Q,\*q | Floating-point square root approximation |
| \*Q,\*q | Floating-point square root iteration of adjoining registers |
| /H,/h | Floating-point reciprocal approximation |
| # | Use ones complement |
| > | Shift value or form mask from left to right |
| < | Shift value or form mask from right to left |
| & | Logical product of adjoining registers |
| ! | Logical sum of adjoining registers |
| \ | Logical difference of adjoining registers |
| CI,ci | Compressed iota |
| F,f | Full load (64 bits) |
| FIX,fix | Convert from floating-point to integer |
| FLT,flt | Convert from integer to floating-point |
| H,h | Half load (32 bits) |
| L,l | Left load (32 bits) |
| M,m | Negative |
| N,n | Nonzero |
| P,p | Parcel load (16 bits) |
| P,p | Population count |
| P,p | Positive |
| Q,q | Parity count |
| S,s | Short load (6 bits) |
| Z,z | Leading-zero count |
| Z,z | Zero |

### 3.2 MACHINE INSTRUCTION FORMAT

The Background Processors translate instructions in 16-bit parcels of data. These parcels are packed 4 per word in the Common Memory. The parcels are addressed as if the Common Memory had four times as many locations and the data were 16 bits long. Figure 3-1 illustrates the format of a 16-bit instruction parcel.

| Field | f | i | j | k |
|-------|---|---|---|---|
| Bits | 7 | 3 | 3 | 3 |

Figure 3-1. Instruction Parcel Format

As shown in figure 3-1, the f designator is the operation code. The i, j, and k designators generally refer to V, S, or A registers in a three-address format. The i designator generally specifies the destination register for the functional computation. The j and k designators generally specify the source operands.

Uppercase or lowercase designators for the registers are allowed in CAL. Registers can be entered in mixed case letters and have the same meaning. Mnemonics can be entered in all uppercase or all lowercase and have the same meaning. Both cases are used in the symbolic instruction descriptions. The instructions are listed in lowercase and the written descriptions in uppercase for visual clarity.

Some instructions include additional parcels of constant data.
An instruction can contain the following parcels of constant data depending on the specific instruction:

- 1 ($m_1$)
- 2 ($m_1$ and $m_2$)
- 4 ($m_1$, $m_2$, $m_3$, and $m_4$)

Single parcel constants generally address the Local Memory. Two parcel constants address Common Memory or enter a 32-bit value into an A or S register. Four parcel constants enter 64-bit values in the S registers.

When instructions read constants from the following parcels in the instruction stream, the program address is advanced over these data parcels to point to the next instruction. The high-order data parcel is read first for multiparcel data.

## 3.3 INSTRUCTION DESCRIPTIONS

The instruction descriptions begin with the octal code for the high-order 7 bits of the parcel (f designator). The three octal register designators (i, j, and k) then follow. An x appears in the description where a register's designator is ignored. CAL inserts a zero for every x.

## INSTRUCTIONS 000 - 001

| Result | Operand | Description | Machine Instruction |
|--------|---------|--------------------|---------------------|
| err | | Error exit | 000x00 |
| exit | | Normal exit | 000x01 |
| exit | exp | Normal exit | 000xjk |
| | | Executes as 000xjk | 001xjk |

Instructions 000 and 001 stop the current program sequence, place the Background Processor in idle mode, and set the Exit Mode and Idle Mode flags in the Background Port Status register. The 6-bit jk value is entered into the Background Port Status register.
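The parcel layout of figure 3-1 and the 6-bit jk value used by the exit instructions can be sketched in Python as follows. This is a minimal illustration with hypothetical function names. It assumes that the i, j, and k designators follow the f designator in decreasing bit significance, which is consistent with the six-digit octal codes shown in the listings of this section.

```python
def decode_parcel(parcel):
    """Split a 16-bit instruction parcel into (f, i, j, k).

    f is the high-order 7 bits; i, j, and k are assumed to be the
    next three 3-bit fields in decreasing significance.
    """
    if not 0 <= parcel <= 0o177777:
        raise ValueError("an instruction parcel is 16 bits")
    f = (parcel >> 9) & 0o177
    i = (parcel >> 6) & 0o7
    j = (parcel >> 3) & 0o7
    k = parcel & 0o7
    return f, i, j, k


def exit_jk_value(parcel):
    """Return the 6-bit jk value carried by an exit instruction (000 or 001)."""
    f, _i, j, k = decode_parcel(parcel)
    if f not in (0o000, 0o001):
        raise ValueError("not an exit instruction")
    return (j << 3) | k


if __name__ == "__main__":
    print(decode_parcel(0o020123))   # fields of parcel 020123
    print(exit_jk_value(0o000004))   # jk value of parcel 000004
```

Only the first parcel of an instruction is decoded here; constant parcels ($m_1$ through $m_4$) that follow some instructions are not modeled.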
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
| | 1 | 10 | 20 | 35 |
| 000000 | | err | | |
| 000001 | | exit | | |
| 000004 | | exit | 4 | |

## INSTRUCTION 002

| Result | Operand | Description | Machine Instruction |
|---------|---------|-------------------------------------------------------|---------------------|
| r,$a_i$ | $a_k$ | Register jump to $(a_k)$ with return address to $a_i$ | 002ixk |
| j | $a_k$ | Register jump to $(a_k)$, value in $a_k$ erased | 002kxk |

Instruction 002 stops the current program sequence and begins a new sequence at a computed parcel address read from the $A_k$ register. The parcel address for the next instruction in the current program sequence is entered into the $A_i$ register.
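The register jump can be modeled with a short Python sketch. The function name and the list-based register file are hypothetical. The sketch assumes the jump target is read from the $A_k$ register before the return address is written, which is what allows the 002kxk form to jump correctly while erasing the value in $a_k$.

```python
MASK32 = 0xFFFFFFFF  # A registers hold 32-bit values


def register_jump(a_regs, i, k, next_parcel_address):
    """Model instruction 002ixk.

    a_regs is a list of the eight A register values and is updated
    in place. Returns the parcel address where execution continues.
    """
    target = a_regs[k] & MASK32                # read the target first (assumed order)
    a_regs[i] = next_parcel_address & MASK32   # return address to A_i
    return target


if __name__ == "__main__":
    a = [0] * 8
    a[2] = 0o1000
    # r,a1 a2: jump to (a2), return address saved in a1
    print(oct(register_jump(a, 1, 2, 0o101)), oct(a[1]))
    # j a1: i equals k, so the old value of a1 is overwritten
    print(oct(register_jump(a, 1, 1, 0o2000)), oct(a[1]))
```

With i equal to k, the register that supplied the target ends up holding the return address, matching the "value in $a_k$ erased" wording in the table.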
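| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
| | 1 | 10 | 20 | 35 |
| 002102 | | r,a1 | a2 | |
| 002101 | | j | a1 | |

## INSTRUCTION 003

| Result | Operand | Description | Machine Instruction |
|--------|---------|--------------------|---------------------|
| j | exp | Unconditional jump | 003xxx $m_1$ $m_2$ |

Instruction 003 stops the current program sequence and begins a new sequence at a specified constant parcel address read from the next 2 parcels in the instruction queue.

For the expression:

- A word address is not allowed.
- An immobile relative attribute is not allowed.
- A parcel address is forced if the expression has a value attribute.
- If the expression is relocatable, it must be relative to either a mixed or code section targeted for Common Memory.

| Code Generated | Location | Result | Operand | Comment |
|---------------------|----------|--------|---------|---------|
| | 1 | 10 | 20 | 35 |
| 003000 00000000012d | | j | +43 | |

## INSTRUCTIONS 004 - 005

| Result | Operand | Description | Machine Instruction |
|--------|---------|------------------------------------------------------------|---------------------|
| jcs | exp | Jump to constant parcel if Semaphore clear; set Semaphore. | 004xxx $m_1$ $m_2$ |
| jss | exp | Jump to constant parcel if Semaphore set; set Semaphore. | 005xxx $m_1$ $m_2$ |

Instructions 004 and 005 conditionally stop the current instruction sequence and begin a new sequence at a specified constant parcel address read from the next 2 parcels in the instruction queue. The branch is conditional on the state of the Semaphore flag assigned to this Background Processor. The Background Port Status register points to the Semaphore flag.

The Semaphore flag is set for either instruction if it was not previously set. The Semaphore flag bit in the Background Port Status register is set if either instruction alters the state of the flag from 0 to 1.

For the expression:

- A word address is not allowed.
- An immobile relative attribute is not allowed.
- A parcel address is forced if the expression has a value attribute.
- If the expression is relocatable, it must be relative to either a mixed or code section targeted for Common Memory.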
| Code Generated | Location | Result | Operand | Comment |
|---------------------|----------|--------|---------|---------|
| | 1 | 10 | 20 | 35 |
| 004000 00000000025a | | jcs | 1+83 | |
| 005000 00000000025a | | jss | 83+1 | |

## INSTRUCTION 006

| Result | Operand | Description | Machine Instruction |
|--------|---------|---------------|---------------------|
| ssm | | Set Semaphore | 006xxx |

Instruction 006 sets the Semaphore flag assigned to this Background Processor without regard to its previous state. The Semaphore flag bit in the Background Port Status register is set if the previous state of the Semaphore flag was a 0. The operating system program uses this instruction to restore Semaphore flag values at the time of job restart.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
| | 1 | 10 | 20 | 35 |
| 006000 | | ssm | | |

## INSTRUCTION 007

| Result | Operand | Description | Machine Instruction |
|--------|---------|-----------------|---------------------|
| csm | | Clear Semaphore | 007xxx |

Instruction 007 clears the Semaphore flag assigned to this Background Processor without regard to its previous value. When this instruction executes, the semaphore bit in the Background Port Status register is cleared. A Background Processor program may use this instruction to release access to a privileged area of Common Memory for other processors assigned to this job.

This instruction issues without delay. Execution of the function may be delayed, however, by activity in the Common Memory port. The following instruction does not issue until the Common Memory quadrant buffers are clear. The delay ensures that any Common Memory write operations have been completed before another processor is allowed access to the privileged area.
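The semaphore instructions 004 through 007 behave like a test-and-set flag. The following Python sketch models only the flag state and the branch decision; the class and method names are hypothetical, and the sketch does not model the Background Port Status register bit, the Common Memory quadrant buffer delay, or the hardware arbitration between processors.

```python
class SemaphoreFlag:
    """Single-threaded model of one Semaphore flag."""

    def __init__(self, is_set=False):
        self.is_set = is_set

    def jcs(self):
        """Instruction 004: report a branch if the flag was clear; the flag ends up set."""
        was_clear = not self.is_set
        self.is_set = True
        return was_clear

    def jss(self):
        """Instruction 005: report a branch if the flag was set; the flag ends up set."""
        was_set = self.is_set
        self.is_set = True
        return was_set

    def ssm(self):
        """Instruction 006: set the flag regardless of its previous state."""
        self.is_set = True

    def csm(self):
        """Instruction 007: clear the flag regardless of its previous state."""
        self.is_set = False


if __name__ == "__main__":
    flag = SemaphoreFlag()
    print(flag.jss())  # flag was clear: no branch, caller now holds the flag
    print(flag.jss())  # flag already set: branch taken (for example, back to a wait loop)
    flag.csm()         # release access to the privileged area
```

One possible use, under these assumptions, is a wait loop in which jss branches back to itself while the flag is held by another processor and falls through once the flag was found clear; csm then releases the privileged area.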
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
| | 1 | 10 | 20 | 35 |
| 007000 | | csm | | |

#### INSTRUCTIONS 010 - 013

| Result | Operand | Description | Machine Instruction |
|--------|-----------|-------------------------------|---------------------|
| jz | $a_k$,exp | Branch if $(a_k)$ is zero | 010xxk $m_1$ $m_2$ |
| jn | $a_k$,exp | Branch if $(a_k)$ is nonzero | 011xxk $m_1$ $m_2$ |
| jp | $a_k$,exp | Branch if $(a_k)$ is positive | 012xxk $m_1$ $m_2$ |
| jm | $a_k$,exp | Branch if $(a_k)$ is negative | 013xxk $m_1$ $m_2$ |

Instructions 010 through 013 conditionally stop the current instruction sequence and begin a new sequence at a specified constant parcel address read from the next 2 parcels in the instruction queue. The contents of the $A_k$ register determine the branch condition. The current program sequence is continued if the branch criterion is not met.

For the expression:

- A word address is not allowed.
- An immobile relative attribute is not allowed.
- A parcel address is forced if the expression has a value attribute.
- If the expression is relocatable, it must be relative to either a mixed or code section targeted for Common Memory.
| Code Generated | Location | Result | Operand | Comment |
|---------------------|----------|--------|---------|---------|
| | 1 | 10 | 20 | 35 |
| 010001 00000000000a | | jz | a1,0 | |
| 011007 00000000000b | | jn | a7,1 | |
| 012005 00000000000c | | jp | a5,2 | |
| 013002 00000000000d | | jm | a2,3 | |

#### INSTRUCTIONS 014 - 017

| Result | Operand | Description | Machine Instruction |
|--------|-----------|-------------------------------|---------------------|
| jz | $s_j$,exp | Branch if $(s_j)$ is zero | 014xjx $m_1$ $m_2$ |
| jn | $s_j$,exp | Branch if $(s_j)$ is nonzero | 015xjx $m_1$ $m_2$ |
| jp | $s_j$,exp | Branch if $(s_j)$ is positive | 016xjx $m_1$ $m_2$ |
| jm | $s_j$,exp | Branch if $(s_j)$ is negative | 017xjx $m_1$ $m_2$ |

Instructions 014 through 017 conditionally stop the current instruction sequence and begin a new sequence at a specified constant parcel address read from the next 2 parcels in the instruction queue. The contents of the $S_j$ register determine the branch condition as previously indicated. The current program sequence is continued if the branch criterion is not met.

For the expression:

- A word address is not allowed.
- An immobile relative attribute is not allowed.
- A parcel address is forced if the expression has a value attribute.
- If the expression is relocatable, it must be relative to either a mixed or code section targeted for Common Memory.
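Because parcels are packed 4 per word and addressed as if Common Memory had four times as many locations, a parcel address splits into a word address and a parcel position within the word. The Python sketch below illustrates that split. The printed form (an 11-digit octal word address followed by a letter a through d) is an inference from the branch examples in this section, not a rule stated in the text, and the function names are hypothetical.

```python
PARCEL_LETTERS = "abcd"  # assumed listing letters for parcel positions 0 through 3


def split_parcel_address(parcel_address):
    """Return (word_address, parcel_position) for a parcel address."""
    if parcel_address < 0:
        raise ValueError("parcel address must be nonnegative")
    return divmod(parcel_address, 4)


def listing_form(parcel_address):
    """Format a parcel address as an octal word address plus a parcel letter."""
    word, position = split_parcel_address(parcel_address)
    return format(word, "011o") + PARCEL_LETTERS[position]


if __name__ == "__main__":
    for target in (1, 43, 84):
        print(target, listing_form(target))
```

Under this reading, a branch target given as a plain value is taken as a parcel address, in line with the rule that a parcel address is forced if the expression has a value attribute.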
| Code Generated | Location | Result | Operand | Comment |
|---------------------|----------|--------|---------|---------|
|                     | 1        | 10     | 20      | 35      |
| 014010 00000000001a |          | jz     | s1,4    |         |
| 015040 00000000001b |          | jn     | s4,5    |         |
| 016060 00000000001c |          | jp     | s6,6    |         |
| 017020 00000000001d |          | jm     | s2,7    |         |

#### INSTRUCTIONS 020 - 021

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | a<sub>j</sub>+a<sub>k</sub> | Integer sum of $(a_j)$ and $(a_k)$ to $a_i$ | 020<i>ijk</i> |
| a<sub>i</sub> | a<sub>j</sub>-a<sub>k</sub> | Integer difference of $(a_j)$ and $(a_k)$ to $a_i$ | 021<i>ijk</i> |

Instructions 020 and 021 perform 32-bit integer arithmetic in the A registers. The operands are obtained from registers $A_j$ and $A_k$, and the result is delivered to register $A_i$. Instruction 020 forms the 32-bit integer sum. Instruction 021 forms the 32-bit integer difference.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 020123         |          | a1     | a2+a3   |         |
| 021123         |          | a1     | a2-a3   |         |

#### INSTRUCTIONS 022 - 023

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | a<sub>j</sub>*a<sub>k</sub> | Integer product of $(a_j)$ and $(a_k)$ to $a_i$ | 022<i>ijk</i> |
|        |         | Executes the same as 022<i>ijk</i> | 023<i>ijk</i> |

Instruction 022 forms the integer product of two 32-bit integer operands. The operands are obtained from the $A_j$ and $A_k$ registers. The low-order 32 bits of the result data are delivered to the $A_i$ register.
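As a rough illustration of the 32-bit A-register arithmetic described for instructions 020 through 022, the sketch below models each result as the low-order 32 bits of the exact integer result. The function names are hypothetical. The product case follows the text directly; treating sum and difference as silently wrapping modulo 2<sup>32</sup> is an assumption, since the text says only that a 32-bit result is formed.

```python
MASK32 = (1 << 32) - 1  # A registers are 32 bits wide


def a_add(a_j: int, a_k: int) -> int:
    """Instruction 020: 32-bit integer sum (wraparound is assumed)."""
    return (a_j + a_k) & MASK32


def a_sub(a_j: int, a_k: int) -> int:
    """Instruction 021: 32-bit integer difference (wraparound is assumed)."""
    return (a_j - a_k) & MASK32


def a_mul(a_j: int, a_k: int) -> int:
    """Instruction 022: low-order 32 bits of the integer product."""
    return (a_j * a_k) & MASK32
```

With this model, `a_mul(0x10000, 0x10000)` yields 0 because the only set bit of the full product lies above bit 31.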
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 022123         |          | a1     | a2*a3   |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | s<sub>j</sub> | Copy $(s_j)$ to $a_i$ | 024<i>ijx</i> |

Instruction 024 reads a 64-bit word from the $S_j$ register and enters the low-order 32 bits into the $A_i$ register.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 024120         |          | a1     | s2      |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | vl | Copy (vl) to $a_i$ | 025<i>ixx</i> |

Instruction 025 forms a 32-bit word from the data in the VL register. The low-order 6 bits are copied from the VL data. The high-order 26 bits are 0. The result data is delivered to the $A_i$ register.
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 025400         |          | a4     | vl      |         |

#### INSTRUCTIONS 026 - 027

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | exp | Load $a_i$ with a value | 026<i>ijk</i> |
| a<sub>i</sub> | exp,s | Load $a_i$ with a 6-bit value | 026<i>ijk</i> |
| a<sub>i</sub> | exp,s,p | Load $a_i$ with a 6-bit positive value | 026<i>ijk</i> |
| a<sub>i</sub> | exp | Load $a_i$ with a value | 027<i>ijk</i> |
| a<sub>i</sub> | exp,s | Load $a_i$ with a 6-bit value | 027<i>ijk</i> |
| a<sub>i</sub> | exp,s,m | Load $a_i$ with a 6-bit negative value | 027<i>ijk</i> |

Instructions 026 and 027 form a 32-bit word from the <i>jk</i> data in the instruction parcel. The low-order 6 bits are copied from the instruction parcel. For instruction 026, the high-order 26 bits are zeros. For instruction 027, the high-order 26 bits are ones. The result data is delivered to the $A_i$ register.

The $A_i$ exp instruction maps into either an 026, 027, 040, 041, or an 042 opcode. If all symbols within the expression have been previously defined within the currently enabled qualifier, CAL maps this instruction into the proper opcode with the fewest number of parcels into which the expression will fit. Otherwise, this instruction is mapped into the 042 opcode.

CAL maps the $A_i$ exp,S instruction into the 027 opcode if the expression is negative and has a relative attribute of absolute. Otherwise, this instruction is mapped into the 026 opcode.

Instruction 026 loads the $A_i$ register with positive <i>jk</i>. Instruction 027 loads the $A_i$ register with negative <i>jk</i>.
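The "fewest number of parcels" rule can be sketched as follows for an absolute expression whose symbols are all previously defined. This is a hedged model rather than CAL's actual implementation: the function names `encode_a_load` and `listing` are hypothetical, and the value ranges are inferred from the stated fill widths (6 bits for 026 and 027, 16 bits for 040 and 041, 32 bits for 042). The parcel layout assumed is a 7-bit opcode followed by the 3-bit <i>i</i>, <i>j</i>, and <i>k</i> designators.

```python
MASK32 = 0xFFFFFFFF


def encode_a_load(i: int, value: int):
    """Pick the shortest opcode for `a_i value`.

    Returns (instruction parcel, constant, constant width in bits);
    the constant is None for the one-parcel forms 026 and 027.
    Assumes the expression is absolute and already defined.
    """
    if not 0 <= i <= 7:
        raise ValueError("i designator must be 0..7")
    if not -(1 << 31) <= value <= MASK32:
        raise ValueError("value does not fit in 32 bits")
    if 0 <= value < 64:                      # 026: high-order 26 bits zero
        return (0o026 << 9) | (i << 6) | value, None, 0
    if -64 <= value < 0:                     # 027: high-order 26 bits ones
        return (0o027 << 9) | (i << 6) | (value & 0o77), None, 0
    if 0 <= value < (1 << 16):               # 040: high-order 16 bits zero
        return (0o040 << 9) | (i << 6), value, 16
    if -(1 << 16) <= value < 0:              # 041: high-order 16 bits ones
        return (0o041 << 9) | (i << 6), value & 0xFFFF, 16
    return (0o042 << 9) | (i << 6), value & MASK32, 32   # 042: full 32 bits


def listing(i: int, value: int) -> str:
    """Format the encoding the way the Code Generated column shows it."""
    parcel, const, bits = encode_a_load(i, value)
    text = f"{parcel:06o}"
    if bits == 16:
        text += f" {const:06o}"
    elif bits == 32:
        text += f" {const:011o}"
    return text
```

A forward reference cannot use this selection, because the value is not yet known when the instruction is assembled; as the text states, CAL then emits the three-parcel 042 form.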
| Code Generated     | Location | Result | Operand    | Comment |
|--------------------|----------|--------|------------|---------|
|                    | 1        | 10     | 20         | 35      |
| 026001             |          | a0     | 1          |         |
| 026102             |          | a1     | 2,s        |         |
| 026104             |          | a1     | 4,s,p      |         |
| 027177             |          | a1     | -1         |         |
| 027177             |          | a1     | -1,s       |         |
| 027106             |          | a1     | 6,s,m      |         |
| 026501             |          | a5     | possym,s   |         |
| 026101             |          | a1     | possym,s,p |         |
| 027201             |          | a2     | possym,s,m |         |
| 042500 00000000001 |          | a5     | possym     | ; forward reference |
| 1                  | possym   | =      | 1          | ; symbol with positive value |
| 026401             |          | a4     | possym     | ; backward reference |
| 027376             |          | a3     | negsym,s   |         |
| 026776             |          | a7     | negsym,s,p |         |
| 027076             |          | a0     | negsym,s,m |         |
| 042100 37777777776 |          | a1     | negsym     | ; forward reference |
| -2                 | negsym   | =      | -2         | ; symbol with negative value |
| 027376             |          | a3     | negsym     | ; backward reference |

#### INSTRUCTIONS 030 - 033

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| vm | v<sub>k</sub>,z | Set vm from zero elements of $(v_k)$ | 030<i>xxk</i> |
| vm | v<sub>k</sub>,n | Set vm from nonzero elements of $(v_k)$ | 031<i>xxk</i> |
| vm | v<sub>k</sub>,p | Set vm from positive elements of $(v_k)$ | 032<i>xxk</i> |
| vm | v<sub>k</sub>,m | Set vm from negative elements of $(v_k)$ | 033<i>xxk</i> |

Instructions 030 through 033 create a vector mask in the VM register based on the results of testing the contents of the elements of register $V_k$. The VM register is initially cleared, and a bit is entered in the VM register where elements of the vector stream meet the test criterion. The high-order bit position in the VM register corresponds to the first element of the vector. The bit positions are then assigned in order for the remainder of the vector stream. These instructions are performed in the Vector Logical unit.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 030001         |          | vm     | v1,z    |         |
| 031001         |          | vm     | v1,n    |         |
| 032001         |          | vm     | v1,p    |         |
| 033001         |          | vm     | v1,m    |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| vm | s<sub>j</sub> | Copy $(s_j)$ to vm | 034<i>xjx</i> |

Instruction 034 enters the VM register with a 64-bit word from the $S_j$ register.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 034020         |          | vm     | s2      |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| dri |  | Disable halt on memory field range error | 035<i>xx</i>0 |
| eri |  | Enable halt on memory field range error | 035<i>xx</i>1 |
| dfi |  | Disable halt on floating-point error | 035<i>xx</i>2 |
| efi |  | Enable halt on floating-point error | 035<i>xx</i>3 |

Instruction 035 alters 2 status bits (bits 21 and 22) in the Background Port Status register depending on the value of the <i>k</i> designator in the instruction parcel.
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 035000         |          | dri    |         |         |
| 035001         |          | eri    |         |         |
| 035002         |          | dfi    |         |         |
| 035003         |          | efi    |         |         |

#### INSTRUCTIONS 036 - 037

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| vl | a<sub>k</sub> | Copy $(a_k)$ to vl | 036<i>xxk</i> |
|    |               | Executes the same as 036<i>xxk</i> | 037<i>xxk</i> |

Instruction 036 enters the low-order 6 bits of data from the $A_k$ register into the VL register. A value of 0 in the VL register is interpreted as 64.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 036004         |          | vl     | a4      |         |

#### INSTRUCTIONS 040 - 041

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | exp | Load $a_i$ with a value | 040<i>ixx</i> m<sub>1</sub> |
| a<sub>i</sub> | exp,p | Load $a_i$ with a 16-bit value | 040<i>ixx</i> m<sub>1</sub> |
| a<sub>i</sub> | exp,p,p | Load $a_i$ with a 16-bit positive value | 040<i>ixx</i> m<sub>1</sub> |
| a<sub>i</sub> | exp | Load $a_i$ with a value | 041<i>ixx</i> m<sub>1</sub> |
| a<sub>i</sub> | exp,p | Load $a_i$ with a 16-bit value | 041<i>ixx</i> m<sub>1</sub> |
| a<sub>i</sub> | exp,p,m | Load $a_i$ with a 16-bit negative value | 041<i>ixx</i> m<sub>1</sub> |

Instructions 040 and 041 enter a 32-bit constant into the $A_i$ register. The low-order 16 bits are read from the following parcel in the instruction queue.
The $A_i$ exp instruction maps into either an 026, 027, 040, 041, or an 042 opcode. If all symbols within the expression have been previously defined within the currently enabled qualifier, CAL maps this instruction into the proper opcode with the fewest number of parcels into which the expression will fit. Otherwise, this instruction is mapped into the 042 opcode.

CAL maps the $A_i$ exp,P instruction into the 041 opcode if the expression is negative and has a relative attribute of absolute. Otherwise, this instruction is mapped into the 040 opcode.

For instruction 040, the high-order 16 bits are zero-filled. For instruction 041, the high-order 16 bits are set to ones.

| Code Generated     | Location | Result | Operand    | Comment |
|--------------------|----------|--------|------------|---------|
|                    | 1        | 10     | 20         | 35      |
| 040100 000174      |          | a1     | 124        |         |
| 040100 000007      |          | a1     | 7,p        |         |
| 040100 000007      |          | a1     | 7,p,p      |         |
| 041100 177604      |          | a1     | -124       |         |
| 041100 177604      |          | a1     | -124,p     |         |
| 041100 000007      |          | a1     | 7,p,m      |         |
| 026100             |          | a1     | 0          |         |
| 040100 000000      |          | a1     | 0,p        |         |
| 040600 004321      |          | a5     | possym,p   |         |
| 040000 004321      |          | a0     | possym,p,p |         |
| 041300 004321      |          | a3     | possym,p,m |         |
| 042200 00000004321 |          | a2     | possym     | ; forward reference |
| 4321               | possym   | =      | o'4321     | ; symbol with positive value |
| 040500 004321      |          | a5     | possym     | ; backward reference |

Examples (continued):

| Code Generated     | Location | Result | Operand    | Comment |
|--------------------|----------|--------|------------|---------|
|                    | 1        | 10     | 20         | 35      |
| 027477             |          | a4     | -1         |         |
| 041400 177777      |          | a4     | -1,p       |         |
| 041300 176544      |          | a3     | negsym,p   |         |
| 040700 176544      |          | a7     | negsym,p,p |         |
| 041000 176544      |          | a0     | negsym,p,m |         |
| 042100 37777776544 |          | a1     | negsym     | ; forward reference |
| -1234              | negsym   | =      | -o'1234    | ; symbol with negative value |
| 041300 176544      |          | a3     | negsym     | ; backward reference |

#### INSTRUCTIONS 042 - 043

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | exp | Load $a_i$ with a value | 042<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |
| a<sub>i</sub> | exp,h | Load $a_i$ with a 32-bit value | 042<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |
|               |       | Executes the same as 042<i>ixx</i> | 043<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |

Instruction 042 loads the $A_i$ register with a 32-bit constant read from the next 2 parcels in the instruction queue.

The $A_i$ exp instruction maps into either an 026, 027, 040, 041, or an 042 opcode.
If all symbols within the expression have been previously defined within the currently enabled qualifier, CAL maps this instruction into the proper opcode with the fewest number of parcels into which the expression will fit. Otherwise, this instruction is mapped into the 042 opcode.

| Code Generated     | Location | Result | Operand   | Comment |
|--------------------|----------|--------|-----------|---------|
|                    | 1        | 10     | 20        | 35      |
| 042100 00004172107 |          | a1     | 1111111   |         |
| 042100 00000000007 |          | a1     | 7,h       |         |
| 026601             |          | a6     | 1         |         |
| 042600 00000000001 |          | a6     | 1,h       |         |
| 042200 00007654321 |          | a2     | possym    | ; forward reference |
| 7654321            | possym   | =      | o'7654321 | ; symbol with positive value |
| 042500 00007654321 |          | a5     | possym    | ; backward reference |

Examples (continued):

| Code Generated     | Location | Result | Operand    | Comment |
|--------------------|----------|--------|------------|---------|
|                    | 1        | 10     | 20         | 35      |
| 027376             |          | a3     | -2         |         |
| 042300 37777777776 |          | a3     | -2,h       |         |
| 042100 37776543211 |          | a1     | negsym     | ; forward reference |
| -1234567           | negsym   | =      | -o'1234567 | ; symbol with negative value |
| 042300 37776543211 |          | a3     | negsym     | ; backward reference |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | [exp] | Read from location <i>exp</i> in Local Memory to $a_i$ | 044<i>ixx</i> m<sub>1</sub> |

Instruction 044 enters the $A_i$ register with the low-order 32 bits of a data word in Local Memory. The Local Memory address is obtained from the following parcel in the instruction queue.

If the expression has a relative attribute of relocatable, it must be relative to a Local Memory section. Local Memory section is defined in the Section Assignment subsection of the Pseudo Instruction section in CAL Assembler Version 2 Reference Manual, CRI publication SR-2003. If the expression is immobile or relocatable relative to a task common section, CAL issues a warning message.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 044100 000003  |          | a1     | [1+2]   |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| [exp] | a<sub>k</sub> | Write $(a_k)$ to location <i>exp</i> in Local Memory | 045<i>xxk</i> m<sub>1</sub> |

Instruction 045 writes one 64-bit word in Local Memory. The Local Memory address is obtained from the following parcel in the instruction queue. The data word is obtained by sign extending the content of the $A_k$ register through the high-order 32 bit positions of the 64-bit word.

If the expression has a relative attribute of relocatable, it must be relative to a Local Memory section. Local Memory section is defined in the Section Assignment subsection of the Pseudo Instruction section in CAL Assembler Version 2 Reference Manual, CRI publication SR-2003.
If the expression is immobile or relocatable relative to a task common section, CAL issues a warning message.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 045001 000003  |          | [1+2]  | a1      |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | [a<sub>k</sub>] | Read from location $a_k$ in Local Memory to $a_i$ | 046<i>ixk</i> |

Instruction 046 enters the $A_i$ register with the low-order 32 bits of a word in Local Memory. The Local Memory address is obtained from the $A_k$ register.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 046102         |          | a1     | [a2]    |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| [a<sub>k</sub>] | a<sub>j</sub> | Write $(a_j)$ to location $a_k$ in Local Memory | 047<i>xjk</i> |

Instruction 047 writes one 64-bit word in Local Memory. The Local Memory address is obtained from the $A_k$ register. The write data word is obtained by sign extending the contents of the $A_j$ register through the high-order 32 bit positions of the 64-bit word.
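The width conversions between the 32-bit A registers and 64-bit Local Memory words, as described for instructions 044 through 047, can be sketched as below. The function names are hypothetical; the behavior follows the text: writes sign extend through the high-order 32 bits, and reads keep only the low-order 32 bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
SIGN32 = 0x80000000


def a_to_local_word(a_value: int) -> int:
    """Instructions 045/047: sign extend a 32-bit A value to a 64-bit word."""
    a_value &= MASK32
    if a_value & SIGN32:
        # Sign bit set: fill the high-order 32 bit positions with ones.
        return (a_value | ~MASK32) & MASK64
    return a_value


def local_word_to_a(word: int) -> int:
    """Instructions 044/046: keep the low-order 32 bits of a 64-bit word."""
    return word & MASK32
```

Writing an A value and reading it back returns the original 32 bits, since sign extension changes only the high-order half of the word.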
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 047012         |          | [a2]   | a1      |         |

#### INSTRUCTIONS 050 - 052

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | exp | Load $s_i$ with a value | 050<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |
| s<sub>i</sub> | exp,h | Load $s_i$ with a 32-bit value | 050<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |
| s<sub>i</sub> | exp,h,p | Load $s_i$ with a 32-bit positive value | 050<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |
| s<sub>i</sub> | exp | Load $s_i$ with a value | 051<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |
| s<sub>i</sub> | exp,h | Load $s_i$ with a 32-bit value | 051<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |
| s<sub>i</sub> | exp,h,m | Load $s_i$ with a 32-bit negative value | 051<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |
| s<sub>i</sub> | exp,l | Load $s_i$ left side with a 32-bit value | 052<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |

The $S_i$ exp instruction maps into either an 050, 051, 052, 053, 116, or a 117 opcode. If all the symbols within the expression have been previously defined within the currently enabled qualifier, CAL maps this instruction into the proper opcode with the fewest number of parcels into which the expression will fit. Otherwise, this instruction is mapped into the 053 opcode.

CAL maps the $S_i$ exp,H instruction into the 051 opcode if the expression is negative and has a relative attribute of absolute. Otherwise, this instruction is mapped into the 050 opcode.

Instructions 050 through 052 load a 64-bit value into the $S_i$ register. Instruction 050 reads the low-order 32 bits from the next 2 parcels in the instruction queue. The high-order 32 bits are zero-filled. Instruction 051 reads the low-order 32 bits from the next 2 parcels in the instruction queue. The high-order 32 bits are filled with ones.
Instruction 052 reads the high-order 32 bits of a constant from the next 2 parcels in the instruction queue. The low-order 32 bits are zero-filled.

| Code Generated                | Location | Result | Operand    | Comment |
|-------------------------------|----------|--------|------------|---------|
|                               | 1        | 10     | 20         | 35      |
| 050100 00004172107            |          | s1     | 1111111    |         |
| 050100 00000000007            |          | s1     | 7,h        |         |
| 050100 00000000007            |          | s1     | 7,h,p      |         |
| 051100 37773605671            |          | s1     | -1111111   |         |
| 051100 37773605671            |          | s1     | -1111111,h |         |
| 051100 00000000007            |          | s1     | 7,h,m      |         |
| 052100 00000000007            |          | s1     | 7,l        |         |
| 116403                        |          | s4     | 3          |         |
| 050400 00000000003            |          | s4     | 3,h        |         |
| 050700 00000004321            |          | s7     | possym,h   |         |
| 050700 00000004321            |          | s7     | possym,h,p |         |
| 051300 00000004321            |          | s3     | possym,h,m |         |
| 053000<br>0000000000000000004321 |       | s0     | possym     | ; forward reference |
| 4321                          | possym   | =      | o'4321     | ; symbol with positive value |
| 050400 00000004321            |          | s4     | possym     | ; backward reference |

Examples (continued):

| Code Generated                | Location | Result | Operand    | Comment |
|-------------------------------|----------|--------|------------|---------|
|                               | 1        | 10     | 20         | 35      |
| 117775                        |          | s7     | -3         |         |
| 051700 37777777775            |          | s7     | -3,h       |         |
| 051200 37777776544            |          | s2     | negsym,h   |         |
| 050600 37777776544            |          | s6     | negsym,h,p |         |
| 051500 37777776544            |          | s5     | negsym,h,m |         |
| 053100<br>1777777777777777776544 |       | s1     | negsym     | ; forward reference |
| -6544                         | negsym   | =      | -o'1234    | ; symbol with negative value |
| 051400 37777776544            |          | s4     | negsym     | ; backward reference |
| 052200 10000300000            |          | s2     | 1.0        |         |
| 052300 30000300000            |          | s3     | -1.0       |         |
| 052500 00000000001            |          | s5     | 1,l        | ; force left side opcode |
| 053700<br>0400036000000000000000 |       | s7     | sym        | ; forward reference |
| 0400036000000000000000        | sym      | =      | 6.0        |         |
| 052600 10000740000            |          | s6     | sym        | ; backward reference |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | exp | Load $s_i$ with a value | 053<i>ixx</i><br>m<sub>1</sub> m<sub>2</sub> m<sub>3</sub> m<sub>4</sub> |
| s<sub>i</sub> | exp,f | Load $s_i$ with a 64-bit value | 053<i>ixx</i><br>m<sub>1</sub> m<sub>2</sub> m<sub>3</sub> m<sub>4</sub> |

The $S_i$ exp instruction maps into either an 050, 051, 052, 053, 116, or a 117 opcode. If all the symbols within the expression have been previously defined within the currently enabled qualifier, CAL maps this instruction into the proper opcode with the fewest number of parcels into which the expression will fit. Otherwise, this instruction is mapped into the 053 opcode.

Instruction 053 loads the $S_i$ register with a 64-bit constant read from the following 4 parcels in the instruction queue.
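To make the fill rules for instructions 050 through 053 concrete, the sketch below builds the 64-bit S-register value from the opcode and the constant carried in the trailing parcels. The function name `s_load` is hypothetical, and the constant is passed as a single integer so that no parcel ordering has to be assumed.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def s_load(opcode: int, constant: int) -> int:
    """Return the 64-bit value placed in s_i by opcode 050, 051, 052 or 053."""
    if opcode == 0o050:        # low-order 32 bits, high-order half zero-filled
        return constant & MASK32
    if opcode == 0o051:        # low-order 32 bits, high-order half ones
        return (MASK32 << 32) | (constant & MASK32)
    if opcode == 0o052:        # high-order 32 bits, low-order half zero-filled
        return (constant & MASK32) << 32
    if opcode == 0o053:        # full 64-bit constant from 4 parcels
        return constant & MASK64
    raise ValueError("opcode must be 050 through 053 (octal)")


# 052 with the left-side constant 10000300000 (octal) yields a word whose
# 15-bit exponent field is 40001 (octal) and whose coefficient has only its
# high-order bit set.
one = s_load(0o052, 0o10000300000)
```

The left-side form is why a floating-point constant with a short coefficient fits in three parcels: only the high-order 32 bits carry information, so 052 can replace the five-parcel 053 form.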
| Code Generated                   | Location | Result | Operand               | Comment |
|----------------------------------|----------|--------|-----------------------|---------|
|                                  | 1        | 10     | 20                    | 35      |
| 053100<br>0000000020126330410707 |          | s1     | 1111111111111         |         |
| 053100<br>0000000000000000000007 |          | s1     | 7,f                   |         |
| 116607                           |          | s6     | 7                     |         |
| 053200<br>0000000000000000000007 |          | s2     | 7,f                   |         |
| 053700<br>0001234567012345670123 |          | s7     | sym                   | ; forward reference |
| 1234567012345670123              | sym      | =      | o'1234567012345670123 |         |
| 053000<br>0001234567012345670123 |          | s0     | sym                   | ; backward reference |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | [exp] | Read from location <i>exp</i> in Local Memory | 054<i>ixx</i> m<sub>1</sub> |

Instruction 054 enters the $S_i$ register with a 64-bit data word from the Local Memory. The Local Memory address is obtained from the following parcel in the instruction queue.

If the expression has a relative attribute of relocatable, it must be relative to a Local Memory section. Local Memory section is defined in the Section Assignment subsection of the Pseudo Instruction section in CAL Assembler Version 2 Reference Manual, CRI publication SR-2003. If the expression is immobile or relocatable relative to a task common section, CAL issues a warning message.
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 054100 000001  |          | s1     | [1]     |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| [exp] | s<sub>j</sub> | Write $(s_j)$ to location <i>exp</i> in Local Memory | 055<i>xjx</i> m<sub>1</sub> |

Instruction 055 writes one 64-bit word into the Local Memory. The Local Memory address is obtained from the following parcel in the instruction queue. The 64-bit word is obtained from the $S_j$ register.

If the expression has a relative attribute of relocatable, it must be relative to a Local Memory section. Local Memory section is defined in the Section Assignment subsection of the Pseudo Instruction section in CAL Assembler Version 2 Reference Manual, CRI publication SR-2003. If the expression is immobile or relocatable relative to a task common section, CAL issues a warning message.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 055010 000001  |          | [1]    | s1      |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | [a<sub>k</sub>] | Read from location $(a_k)$ in Local Memory | 056<i>ixk</i> |

Instruction 056 enters the $S_i$ register with a 64-bit data word from Local Memory. The Local Memory address is obtained from the $A_k$ register.
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 056102         |          | s1     | [a2]    |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| [a<sub>k</sub>] | s<sub>i</sub> | Write $(s_i)$ to location $(a_k)$ in Local Memory | 057<i>ixk</i> |

Instruction 057 stores one 64-bit word in Local Memory. The Local Memory address is obtained from the $A_k$ register. The 64-bit word is obtained from the $S_i$ register.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 057102         |          | [a2]   | s1      |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | (a<sub>j</sub>,a<sub>k</sub>) | Read from Common Memory location $(a_j)+(a_k)$ to $s_i$ | 060<i>ijk</i> |

Instruction 060 reads one 64-bit word from Common Memory and enters it in the $S_i$ register. The relative Common Memory location is determined by adding the contents of register $A_j$ to the contents of register $A_k$.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 060123         |          | s1     | (a2,a3) |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| (a<sub>j</sub>,a<sub>k</sub>) | s<sub>i</sub> | Write $(s_i)$ to Common Memory at location $(a_j)+(a_k)$ | 061<i>ijk</i> |

Instruction 061 stores one 64-bit word into Common Memory from the $S_i$ register.
The relative Common Memory location is determined by adding the contents of register $A_j$ to the contents of register $A_k$.

| Code Generated | Location | Result  | Operand | Comment |
|----------------|----------|---------|---------|---------|
|                | 1        | 10      | 20      | 35      |
| 061123         |          | (a2,a3) | s1      |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | (a<sub>k</sub>) | Read from Common Memory at location $(a_k)$ to $s_i$ | 062<i>ixk</i> |

Instruction 062 reads one 64-bit word from Common Memory and enters it in the $S_i$ register. The relative Common Memory location is obtained from the $A_k$ register.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 062102         |          | s1     | (a2)    |         |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| (a<sub>k</sub>) | s<sub>i</sub> | Write $(s_i)$ to Common Memory at location $(a_k)$ | 063<i>ixk</i> |

Instruction 063 writes one 64-bit word in the Common Memory. The relative Common Memory location is obtained from the $A_k$ register. The 64-bit word is obtained from the $S_i$ register.
| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 063102         |               | (a2)         | s1            |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | (a<sub>k</sub>,exp) | Read from Common Memory at location $(a_k)+exp$ to $s_i$ | 064<i>ixk</i> m<sub>1</sub> m<sub>2</sub> |

Instruction 064 reads one 64-bit word from Common Memory and enters it in the $S_i$ register. The relative Common Memory location is determined by adding the contents of register $A_k$ to a 32-bit constant from the next 2 parcels in the instruction queue.

If the expression has a relative attribute of relocatable, it must be relative to a Common Memory section. Common Memory section is defined in the Section Assignment subsection of the Pseudo Instruction section in CAL Assembler Version 2 Reference Manual, CRI publication SR-2003. Also, the parcel must not have a parcel address attribute.

An instruction that would normally translate into a 064<i>ixk</i> m<sub>1</sub> m<sub>2</sub> instruction and that contains a zero expression can be converted by the assembler into a 062<i>ixk</i> instruction. For this conversion to occur, all symbols within the expression must be previously defined and must be defined within the currently enabled qualifier. Also, the value of the expression must be zero and have a relative attribute of either absolute or relocatable relative to a stack section.

| Code Generated     | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|--------------------|---------------|--------------|---------------|---------------|
| 064102 00000000001 |               | s1           | (a2,1)        |               |
| 062204             |               | s2           | (a4,0)        |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| (a<sub>k</sub>,exp) | s<sub>i</sub> | Write $(s_i)$ to Common Memory at location $(a_k)+exp$ | 065<i>ixk</i> m<sub>1</sub> m<sub>2</sub> |

Instruction 065 writes one 64-bit word into Common Memory. The relative Common Memory location is determined by adding the contents of the $A_k$ register to a 32-bit constant from the next 2 parcels in the instruction queue. The 64-bit word is obtained from the $S_i$ register.

If the expression has a relative attribute of relocatable, it must be relative to a Common Memory section. Common Memory section is defined in the Section Assignment subsection of the Pseudo Instruction section in CAL Assembler Version 2 Reference Manual, CRI publication SR-2003. Also, the parcel must not have a parcel address attribute.

An instruction that would normally translate into a 065<i>ixk</i> m<sub>1</sub> m<sub>2</sub> instruction and that contains a zero expression can be converted by the assembler into a 063<i>ixk</i> instruction. For this conversion to occur, all symbols within the expression must be previously defined and must be defined within the currently enabled qualifier. Also, the value of the expression must be zero and have a relative attribute of either absolute or relocatable relative to a stack section.

| Code Generated     | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|--------------------|---------------|--------------|---------------|---------------|
| 065102 00000000001 |               | (a2,1)       | s1            |               |
| 063306             |               | (a6,0)       | s3            |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | (exp) | Read from Common Memory location $exp$ to $s_i$ | 066<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |

Instruction 066 reads one 64-bit word from Common Memory and enters it in the $S_i$ register. The relative memory location is obtained from the next 2 parcels in the instruction queue.

If the expression has a relative attribute of relocatable, it must be relative to a Common Memory section. Common Memory section is defined in the Section Assignment subsection of the Pseudo Instruction section in CAL Assembler Version 2 Reference Manual, CRI publication SR-2003. Also, the parcel must not have a parcel address attribute. If the expression is immobile or relocatable relative to a task common section, CAL issues a warning message.

| Code Generated     | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|--------------------|---------------|--------------|---------------|---------------|
| 066100 00000000003 |               | s1           | (1+2)         |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| (exp) | s<sub>i</sub> | Write $(s_i)$ to Common Memory at location $exp$ | 067<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |

Instruction 067 writes one 64-bit word into Common Memory. The relative Common Memory location is obtained from the next 2 parcels in the instruction queue. The data word is obtained from the $S_i$ register.

If the expression has a relative attribute of relocatable, it must be relative to a Common Memory section. Common Memory section is defined in the Section Assignment subsection of the Pseudo Instruction section in CAL Assembler Version 2 Reference Manual, CRI publication SR-2003.
Also, the parcel must not have a parcel address attribute. If the expression is immobile or relocatable relative to a task common section, CAL issues a warning message.

| Code Generated    | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|-------------------|---------------|--------------|---------------|---------------|
| 067100 0000000003 |               | (1+2)        | s1            |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | (a<sub>j</sub>,a<sub>k</sub>) | Read from Common Memory location $(a_j)$ incremented by $(a_k)$ to $v_i$ | 070<i>ijk</i> |

Instruction 070 reads a vector stream of 64-bit words from Common Memory and enters it into the $V_i$ register. The contents of the VL register determine the length of the stream.

The first address for the Common Memory reference is formed by adding the contents of the $A_j$ register to the Background Processor base address. Subsequent addresses are separated by a constant increment or decrement (the stride). The stride is read from register $A_k$, which can contain a positive, zero, or negative value.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 070123         |               | v1           | (a2,a3)       |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| (a<sub>j</sub>,a<sub>k</sub>) | v<sub>i</sub> | Write $(v_i)$ to Common Memory location $(a_j)$ incremented by $(a_k)$ | 071<i>ijk</i> |

Instruction 071 writes a vector stream of 64-bit words from the $V_i$ register into Common Memory.
The contents of the VL register determine the length of the stream.

The first address for the Common Memory reference is formed by adding the contents of the $A_j$ register to the Background Processor base address. Subsequent addresses are separated by a constant increment or decrement (the stride). The stride is read from register $A_k$, which can contain a positive, zero, or negative value.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 071123         |               | (a2,a3)      | v1            |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | (a<sub>k</sub>,v<sub>j</sub>) | Gather from Common Memory locations $(a_k)+(v_j)$ to $v_i$ | 072<i>ijk</i> |

Instruction 072 reads a vector stream of 64-bit words from Common Memory into the $V_i$ register. The contents of the VL register determine the length of the stream.

The relative Common Memory location is computed separately for each element of the vector. The contents of the $A_k$ register are read at the beginning of instruction execution and held in the Common Memory port. The contents of the $V_j$ register are streamed to the Common Memory port. The high-order 32 bits of this data are discarded. The low-order 32 bits are used as components in the address calculation.

The first address for the Common Memory reference is formed by adding the first element of $V_j$ data to the $A_k$ data and the Background Processor base address. Each subsequent address is formed by adding the next element of $V_j$ data to the $A_k$ data and the Background Processor base address.
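The address streams described above for the strided references (070 and 071) and the gather reference (072) can be sketched in Python. This is only an illustrative model, not the hardware algorithm: the function and parameter names (`base`, `aj`, `ak`, `vj`, `vl`) are hypothetical, register values are passed as ordinary Python integers, address wraparound is not modeled, and treating the retained low-order 32 bits of each $V_j$ element as an unsigned quantity is an assumption.

```python
MASK32 = (1 << 32) - 1  # selects the low-order 32 bits of a 64-bit element


def strided_addresses(base, aj, ak, vl):
    """Model of 070/071 addressing: start at base + (Aj), then step by the stride (Ak)."""
    return [base + aj + n * ak for n in range(vl)]


def gather_addresses(base, ak, vj, vl):
    """Model of 072 addressing: base + (Ak) + low-order 32 bits of each Vj element."""
    return [base + ak + (vj[n] & MASK32) for n in range(vl)]


# A negative stride walks backward through memory.
backward = strided_addresses(base=1000, aj=16, ak=-2, vl=4)

# High-order bits of a Vj element (here 7 << 32) do not affect the address.
gathered = gather_addresses(base=1000, ak=16, vj=[3, (7 << 32) | 1, 0], vl=3)
```

In this model `vl` plays the role of the VL register, so only the first `vl` elements of `vj` contribute addresses.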
| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 072132         |               | v1           | (a2,v3)       |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| (a<sub>k</sub>,v<sub>j</sub>) | v<sub>i</sub> | Scatter $(v_i)$ to Common Memory locations $(a_k)+(v_j)$ | 073<i>ijk</i> |

Instruction 073 stores a vector stream of 64-bit words into Common Memory from the $V_i$ register. The contents of the VL register determine the length of the stream.

The relative Common Memory location is computed separately for each element of this vector stream. The contents of the $A_k$ register are read at the beginning of instruction execution and held in the Common Memory port. The contents of the $V_j$ register are streamed to the Common Memory port. The high-order 32 bits of this data stream are discarded. The low-order 32 bits are used as components in the address calculation.

The first address for the Common Memory reference is formed by adding the first element of $V_j$ data to the $A_k$ data and the Background Processor base address. Each subsequent address is formed by adding the next element of $V_j$ data to the $A_k$ data and the Background Processor base address.
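As a rough illustration of the scatter store just described, the sketch below models Common Memory as a Python dictionary keyed by address. The names `scatter`, `memory`, `base`, `ak`, `vj`, `vi`, and `vl` are hypothetical. Performing the stores in element order, and treating the retained low-order 32 bits as unsigned, are assumptions of the sketch rather than statements from this manual.

```python
MASK32 = (1 << 32) - 1  # selects the low-order 32 bits of a 64-bit element


def scatter(memory, base, ak, vj, vi, vl):
    """Model of 073: store vi[n] at base + (Ak) + low-order 32 bits of vj[n]."""
    for n in range(vl):
        address = base + ak + (vj[n] & MASK32)
        memory[address] = vi[n]
    return memory


# Three data words are scattered to offsets 2, 0, and 5 from base + (Ak).
common_memory = scatter({}, base=100, ak=8, vj=[2, 0, 5], vi=[11, 22, 33], vl=3)
```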
| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 073132         |               | (a2,v3)      | v1            |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | [a<sub>k</sub>] | Read from Local Memory location $(a_k)$ to $v_i$ | 074<i>ixk</i> |

Instruction 074 reads a stream of 64-bit words from Local Memory at consecutive locations. The initial Local Memory address is obtained from the $A_k$ register. The data stream is entered into the $V_i$ register. The contents of the VL register determine the length of the stream.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 074102         |               | v1           | [a2]          |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| [a<sub>k</sub>] | v<sub>i</sub> | Write $(v_i)$ to Local Memory location $(a_k)$ | 075<i>ixk</i> |

Instruction 075 stores a vector stream of 64-bit words into Local Memory at consecutive locations. The initial Local Memory address is obtained from the $A_k$ register. The $V_i$ register contains the data stream, and the contents of the VL register determine the length of the stream.
| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 075102         |               | [a2]         | v1            |               |

### INSTRUCTIONS 076 - 077

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| pass   |         | Pass        | 076<i>xxx</i> |
| pass   | exp     | Pass        | 076<i>ijk</i> |
|        |         | Executes same as 076<i>xxx</i> | 077<i>xxx</i> |

Instructions 076 and 077 issue without functional activity.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 076000         |               | pass         |               |               |
| 076001         |               | pass         | 1             |               |

### INSTRUCTIONS 100 - 103

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>&s<sub>k</sub> | Logical product of $(s_j)$ and $(s_k)$ to $s_i$ | 100<i>ijk</i> |
| s<sub>i</sub> | #s<sub>k</sub>&s<sub>j</sub> | Logical product of $(s_j)$ and complement $(s_k)$ to $s_i$ | 101<i>ijk</i> |
| s<sub>i</sub> | s<sub>j</sub>\s<sub>k</sub> | Logical difference of $(s_j)$ and $(s_k)$ to $s_i$ | 102<i>ijk</i> |
| s<sub>i</sub> | s<sub>j</sub>!s<sub>k</sub> | Logical sum of $(s_j)$ and $(s_k)$ to $s_i$ | 103<i>ijk</i> |
| s<sub>i</sub> | s<sub>j</sub> | S register copy $(j=k)$ | 103<i>ijj</i> |

Instructions 100 through 103 perform scalar logical operations. The operands are obtained from registers $S_j$ and $S_k$, and the result is returned to register $S_i$.

Instructions 100 and 101 read two 64-bit scalar operands and form the bit-by-bit logical product. Instruction 101 complements the $S_k$ data before the logical product is formed.

Instruction 102 reads two 64-bit scalar operands and forms the bit-by-bit logical difference.
Instruction 103 reads two 64-bit scalar operands and forms the bit-by-bit logical sum.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 100123         |               | s1           | s2&s3         |               |
| 101132         |               | s1           | #s2&s3        |               |
| 102123         |               | s1           | s2\s3         |               |
| 103123         |               | s1           | s2!s3         |               |
| 103122         |               | s1           | s2            |               |

### INSTRUCTIONS 104 - 105

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>+s<sub>k</sub> | Integer sum of $(s_j)+(s_k)$ to $s_i$ | 104<i>ijk</i> |
| s<sub>i</sub> | s<sub>j</sub>-s<sub>k</sub> | Integer difference of $(s_j)-(s_k)$ to $s_i$ | 105<i>ijk</i> |

Instructions 104 and 105 perform integer arithmetic. The operands are obtained from registers $S_j$ and $S_k$, and the result is returned to register $S_i$.

Instruction 104 reads two 64-bit scalar operands and forms the integer sum. Instruction 105 reads two 64-bit scalar operands and forms the integer difference.
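The scalar logical and integer operations of instructions 100 through 105 can be modeled on 64-bit values as in the following Python sketch. The function names are hypothetical, and two points are assumptions rather than statements from this section: that the "logical difference" is the exclusive OR, and that integer results wrap modulo $2^{64}$ (two's complement).

```python
MASK64 = (1 << 64) - 1  # keeps results within a 64-bit S register


def s_and(sj, sk):       # 100: logical product
    return sj & sk


def s_and_not(sj, sk):   # 101: logical product with (Sk) complemented first
    return sj & (~sk & MASK64)


def s_xor(sj, sk):       # 102: logical difference (assumed to be exclusive OR)
    return sj ^ sk


def s_or(sj, sk):        # 103: logical sum; with j = k this copies (Sj)
    return sj | sk


def s_add(sj, sk):       # 104: integer sum, assumed to wrap modulo 2**64
    return (sj + sk) & MASK64


def s_sub(sj, sk):       # 105: integer difference, assumed to wrap modulo 2**64
    return (sj - sk) & MASK64
```

For example, `s_or(x, x)` returns `x`, which mirrors the register-copy form 103<i>ijj</i>.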
| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 104123         |               | s1           | s2+s3         |               |
| 105123         |               | s1           | s2-s3         |               |

### INSTRUCTIONS 106 - 107

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | ps<sub>j</sub> | Population count of $(s_j)$ to $s_i$ | 106<i>ij</i>0 |
| s<sub>i</sub> | qs<sub>j</sub> | Population count parity of $(s_j)$ to $s_i$ | 106<i>ij</i>1 |
| s<sub>i</sub> | zs<sub>j</sub> | Leading zero count of $(s_j)$ to $s_i$ | 107<i>ijx</i> |

Instruction 106<i>ij</i>0 reads a 64-bit operand from the $S_j$ register and forms a count of the number of 1 bits in the operand. This count is delivered as a positive integer to the $S_i$ register.

Instruction 106<i>ij</i>1 counts the number of bits set to 1 in the $S_j$ register. The low-order bit of this count, which shows whether the count is odd or even, is then transferred to the low-order bit position of the $S_i$ register. The high-order 63 bits are cleared. The actual population count is not transferred.

Instruction 107 reads a 64-bit operand from the $S_j$ register and forms a count of the number of leading zeros in the operand. The operand is considered a field of 64 individual bits in this operation. The resulting count can have the values 0 through 64. The result is delivered to the $S_i$ register as a positive integer.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 106120         |               | s1           | ps2           |               |
| 106121         |               | s1           | qs2           |               |
| 107120         |               | s1           | zs2           |               |

### INSTRUCTIONS 110 - 111

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>i</sub>&lt;exp | Shift $(s_i)$ left $exp=64-jk$ places to $s_i$ | 110<i>ijk</i> |
| s<sub>i</sub> | s<sub>i</sub>>exp | Shift $(s_i)$ right $exp=jk$ places to $s_i$ | 111<i>ijk</i> |

Instructions 110 and 111 shift 64-bit values in an S register by an amount specified by <i>jk</i>.

Instruction 110 reads a 64-bit operand from the $S_i$ register, shifts the data to the left, and returns it to the $S_i$ register. The shift count is a constant from the instruction parcel. This constant has a value of 64 minus the low-order 6 bits in the parcel, so its range is 1 through 64. The CAL assembler, however, allows a range of 0 through 64. When 0 is specified, CAL changes the opcode from 110 to 111 and inserts zero into the <i>jk</i> field. Thus, as expected, $S_i$ is shifted zero bits. The data is shifted left in an open-ended manner. That is, zero bits are inserted from the right as bits shift off to the left. A shift count of 64 results in a word of all zeros.

Instruction 111 reads a 64-bit operand from the $S_i$ register, shifts the data to the right, and returns it to the $S_i$ register. The shift count is a constant from the instruction parcel. This constant has a value equal to the low-order 6 bits in the parcel, so its range is 0 through 63. The CAL assembler, however, allows a range of 0 through 64.
When 64 is specified, CAL changes the opcode from 111 to 110 and inserts zero into the <i>jk</i> field. Thus, as expected, $S_i$ is zeroed. The data is shifted right in an open-ended manner. That is, zero bits are inserted from the left as bits shift off to the right.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 110177         |               | s1           | s1&lt;1       |               |
| 111100         |               | s1           | s1&lt;0       |               |
| 111302         |               | s3           | s3>2          |               |
| 110300         |               | s3           | s3>d'64       |               |
| 110300         |               | s3           | s3>o'100      |               |

### INSTRUCTIONS 112 - 113

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>i</sub>,s<sub>j</sub>&lt;a<sub>k</sub> | Shift ($s_i$ and $s_j$) left $(a_k)$ places to $s_i$ | 112<i>ijk</i> |
| s<sub>i</sub> | s<sub>j</sub>,s<sub>i</sub>>a<sub>k</sub> | Shift ($s_i$ and $s_j$) right $(a_k)$ places to $s_i$ | 113<i>ijk</i> |

Instructions 112 and 113 shift 128-bit values formed from two S registers. The data is shifted in an open-ended manner. That is, as bits shift off one end of the register, zeros are inserted in the other end.

Instruction 112 reads two 64-bit operands from registers $S_i$ and $S_j$. The data is concatenated in a 128-bit field with the low-order bit of $S_i$ next to the high-order bit of $S_j$ data.

Instruction 113 reads two 64-bit operands from registers $S_i$ and $S_j$. The data is concatenated in a 128-bit field with the low-order bit of $S_j$ next to the high-order bit of $S_i$ data.

The result field is taken from the 64-bit window corresponding to the original $S_i$ data. The shift count is read from the $A_k$ register.
The A register contents are treated as a 32-bit positive integer. Shift counts greater than or equal to 128 result in a zero data field, a shift count of 64 results in the $S_j$ data, and a shift count of 0 results in the original $S_i$ data.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 112123         |               | s1           | s1,s2&lt;a3   |               |
| 113123         |               | s1           | s2,s1>a3      |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | vm | Transmit (vm) to $s_i$ | 114<i>ixx</i> |

Instruction 114 reads the 64-bit mask from the VM register and enters it into the $S_i$ register.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 114100         |               | s1           | vm            |               |

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | rt | Transmit real-time count to $s_i$ | 115<i>ixx</i> |

Instruction 115 reads the 64-bit real-time clock and enters the count into the $S_i$ register.
| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 115100         |               | s1           | rt            |               |

### INSTRUCTIONS 116 - 117

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | exp | Load $s_i$ with a value | 116<i>ijk</i> |
| s<sub>i</sub> | exp,s | Load $s_i$ with a 6-bit value | 116<i>ijk</i> |
| s<sub>i</sub> | exp,s,p | Load $s_i$ with a 6-bit positive value | 116<i>ijk</i> |
| s<sub>i</sub> | exp | Load $s_i$ with a value | 117<i>ijk</i> |
| s<sub>i</sub> | exp,s | Load $s_i$ with a 6-bit value | 117<i>ijk</i> |
| s<sub>i</sub> | exp,s,m | Load $s_i$ with a 6-bit negative value | 117<i>ijk</i> |

The $S_i$ exp instruction maps into an 050, 051, 052, 053, 116, or 117 opcode. If all the symbols within the expression have been previously defined within the currently enabled qualifier, CAL maps this instruction into the proper opcode with the fewest parcels into which the expression will fit. Otherwise, this instruction is mapped into the 053 opcode.

CAL maps the $S_i$ exp,S instruction into the 117 opcode if the expression is negative and has a relative attribute of absolute. Otherwise, this instruction is mapped into the 116 opcode.

Instructions 116 and 117 form a 64-bit word from the <i>jk</i> data in the instruction parcel. The low-order 6 bits are copied from the instruction parcel. The result is delivered to the $S_i$ register. For instruction 116, the high-order bits are zeros. For instruction 117, the high-order bits are ones.
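The way instructions 116 and 117 build a 64-bit word from the 6-bit <i>jk</i> field can be sketched as follows. The function names are hypothetical, and the helper `as_signed`, which reinterprets the word as a two's complement integer, is an assumption added only to make the negative values readable.

```python
MASK64 = (1 << 64) - 1  # a full 64-bit word of ones
JK_MASK = 0o77          # the low-order 6 bits taken from the instruction parcel


def load_116(jk):
    """Model of 116: low-order 6 bits from jk, high-order 58 bits zero."""
    return jk & JK_MASK


def load_117(jk):
    """Model of 117: low-order 6 bits from jk, high-order 58 bits one."""
    return (MASK64 & ~JK_MASK) | (jk & JK_MASK)


def as_signed(word):
    """Reinterpret a 64-bit word as a two's complement integer (assumed convention)."""
    return word - (1 << 64) if word >> 63 else word


four = load_116(0o04)                      # jk field of encoding 116104
minus_one = as_signed(load_117(0o77))      # jk field of encoding 117177
minus_three = as_signed(load_117(0o75))    # jk field of encoding 117175
```

Under these assumptions a <i>jk</i> field of 77 with opcode 117 yields a word of all ones, which reads as -1, and a field of 75 reads as -3.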
| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 116101         |               | s1           | 1             |               |
| 116102         |               | s1           | 2,s           |               |
| 116104         |               | s1           | 4,s,p         |               |
| 117177         |               | s1           | -1            |               |
| 117177         |               | s1           | -1,s          |               |
| 117106         |               | s1           | 6,s,m         |               |
| 116404         |               | s4           | possym,s      |               |
| 116004         |               | s0           | possym,s,p    |               |
| 117504         |               | s5           | possym,s,m    |               |
| 053100<br>00000000000000000000000004 | | s1 | possym | ; forward reference |
| 4              | possym        | =            | 4             | ; symbol with positive value |
| 116704         |               | s7           | possym        | ; backward reference |
| 117675         |               | s6           | negsym,s      |               |
| 116375         |               | s3           | negsym,s,p    |               |
| 117275         |               | s2           | negsym,s,m    |               |
| 053700<br>17777777777777777777777 | | s7 | negsym | ; forward reference |
| -3             | negsym        | =            | -3            | ; symbol with negative value |
| 117175         |               | s1           | negsym        | ; backward reference |

### INSTRUCTIONS 120 - 121

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>+fs<sub>k</sub> | Floating-point sum of $(s_j)$ and $(s_k)$ to $s_i$ | 120<i>ijk</i> |
| s<sub>i</sub> | s<sub>j</sub>-fs<sub>k</sub> | Floating-point difference of $(s_j)$ and $(s_k)$ to $s_i$ | 121<i>ijk</i> |

Instructions 120 and 121 perform floating-point arithmetic operations.

Instruction 120 forms the 64-bit floating-point sum of two 64-bit floating-point operands read from registers $S_j$ and $S_k$. The result is delivered to the $S_i$ register.
Instruction 121 forms the 64-bit floating-point difference of two 64-bit floating-point operands. The minuend is read from the $S_j$ register and the subtrahend from the $S_k$ register. The result is delivered to the $S_i$ register.

Subsection 2.4.8, Floating-point Add Functional unit, describes special case treatment of instructions 120 and 121.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 120123         |               | s1           | s2+fs3        |               |
| 121123         |               | s1           | s2-fs3        |               |

### INSTRUCTIONS 122 - 123

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | fix,s<sub>k</sub> | Convert $(s_k)$ from floating point to integer and enter into $s_i$ | 122<i>ixk</i> |
| s<sub>i</sub> | flt,s<sub>k</sub> | Convert $(s_k)$ from integer to floating point and enter into $s_i$ | 123<i>ixk</i> |

Instructions 122 and 123 perform conversions between floating-point and integer (fixed-point) formats.

Instruction 122 reads a floating-point operand from the $S_k$ register and delivers an integer result to the $S_i$ register. The conversion from floating-point to integer is accomplished by adding the operand to a constant in the Floating-point Add unit. The result is then sign extended to form a 64-bit integer.

Instruction 123 reads an integer operand from the $S_k$ register and delivers a floating-point result to the $S_i$ register. The conversion from integer to floating-point is accomplished by adding the operand to a constant in the Floating-point Add unit.

Subsection 2.4.8, Floating-point Add Functional unit, describes special case treatment of instructions 122 and 123.
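This section does not give the constant or the bit-level steps used by the Floating-point Add unit, so the following Python sketch is only an analogy that shows why adding a suitable constant can perform a float-to-integer conversion. It uses IEEE 754 double precision, which is not the CRAY-2 floating-point format. The constant `MAGIC`, the function name, and the 32-bit result field are all assumptions of the sketch, and its rounding behavior is that of IEEE addition, which may differ from the machine's.

```python
import struct

# Assumption: an IEEE 754 constant (1.5 * 2**52), not the CRAY-2 constant.
# Adding it aligns the integer part of x with the low-order fraction bits.
MAGIC = 1.5 * 2.0 ** 52


def fix_by_constant_add(x):
    """Convert a small float to an integer by adding a constant and reading the fraction bits."""
    total = x + MAGIC
    bits = struct.unpack("<Q", struct.pack("<d", total))[0]
    low = bits & 0xFFFFFFFF                     # integer now sits in the low 32 bits
    # Sign-extend the 32-bit field, in the spirit of the sign extension in the text.
    return low - (1 << 32) if low & 0x80000000 else low


three = fix_by_constant_add(3.0)
minus_two = fix_by_constant_add(-2.0)
```

The sketch is valid only for inputs whose integer value fits in the 32-bit field it extracts.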
| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 122102         |               | s1           | fix,s2        |               |
| 123102         |               | s1           | flt,s2        |               |

### INSTRUCTIONS 124 - 125

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>\*fs<sub>k</sub> | Floating-point product of $(s_j)$ and $(s_k)$ to $s_i$ | 124<i>ijk</i> |
|        |         | Executes same as 124<i>ijk</i> | 125<i>ijk</i> |

Instruction 124 forms the 64-bit floating-point product of two 64-bit floating-point operands. The operands are read from registers $S_j$ and $S_k$. The result is delivered to the $S_i$ register.

Subsection 2.4.9, Floating-point Multiply Functional unit, describes special case treatment of instruction 124.

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 124123         |               | s1           | s2\*fs3       |               |

### INSTRUCTIONS 126 - 127

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>\*is<sub>k</sub> | Reciprocal iteration of $2-(s_j)*(s_k)$ to $s_i$ | 126<i>ijk</i> |
| s<sub>i</sub> | s<sub>j</sub>\*qs<sub>k</sub> | Reciprocal square root iteration of $[3-(s_j)*(s_k)]/2$ to $s_i$ | 127<i>ijk</i> |

Instruction 126 forms the 64-bit floating-point quantity used in the reciprocal iteration algorithm. The operands are read from registers $S_j$ and $S_k$. The result is delivered to the $S_i$ register.

Instruction 127 forms a floating-point quantity used in the reciprocal square root iteration algorithm. The operands are read from registers $S_j$ and $S_k$.
The result is delivered to the $S_i$ register. See subsection 2.4.9, Floating-point Multiply Functional unit, for a description of this sequence.

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

#### CAUTION

Instruction 126 should be used only with the reciprocal approximation instruction (132), and instruction 127 should be used only with the reciprocal square root approximation instruction (133).

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

| Code Generated | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|----------------|---------------|--------------|---------------|---------------|
| 126123         |               | s1           | s2\*is3       |               |
| 127112         |               | s1           | s1\*qs2       |               |

Divide Sequence

| Code Generated     | Location<br>1 | Result<br>10 | Operand<br>20 | Comment<br>35 |
|--------------------|---------------|--------------|---------------|---------------|
| 052100 10001300000 |               | s1           | 16.           |               |
| 052200 10000700000 |               | s2           | 4.            |               |
| 132320             |               | s3           | /hs2          | ; reciprocal approx. |
| 126423             |               | s4           | s2\*is3       | ; correction factor |
| 124534             |               | s5           | s3\*fs4       | ; reciprocal |
| 124615             |               | s6           | s1\*fs5       | ; quotient |

### INSTRUCTIONS 130 - 131

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | a<sub>k</sub> | Transmit $(a_k)$ to $s_i$ with no sign extension | 130<i>ixk</i> |
| s<sub>i</sub> | +a<sub>k</sub> | Transmit $(a_k)$ to $s_i$ with sign extension | 131<i>ixk</i> |

Instructions 130 and 131 read a 32-bit operand from the $A_k$ register and transmit it to the $S_i$ register. Instruction 130 zero-fills the high-order 32 bits, creating a 64-bit result. Instruction 131 fills the high-order 32 bits with copies of bit $2^{31}$, creating a 64-bit result.
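The difference between the zero-filled and sign-extended transfers of instructions 130 and 131 can be modeled as below. The function names are hypothetical, and register contents are represented as non-negative Python integers holding the raw bit patterns.

```python
MASK32 = (1 << 32) - 1  # the 32 bits held by an A register


def transmit_130(ak):
    """Model of 130: copy the 32-bit (Ak) and zero-fill the high-order 32 bits."""
    return ak & MASK32


def transmit_131(ak):
    """Model of 131: copy the 32-bit (Ak) and fill the high-order 32 bits with bit 2**31."""
    value = ak & MASK32
    if value & (1 << 31):
        value |= MASK32 << 32
    return value


zero_filled = transmit_130(0xFFFFFFFE)
sign_filled = transmit_131(0xFFFFFFFE)
```

The two results differ only when bit $2^{31}$ of the A register is set.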
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 130102         |          | s1     | a2      |         |
| 131102         |          | s1     | +a2     |         |

### INSTRUCTIONS 132 - 133

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | /hs<sub>j</sub> | Floating-point reciprocal approximation of $(s_j)$ to $s_i$ | 132<i>ijx</i> |
| s<sub>i</sub> | *qs<sub>j</sub> | Floating-point reciprocal square root approximation of $(s_j)$ to $s_i$ | 133<i>ijx</i> |

Instruction 132 forms a floating-point first approximation to the reciprocal of a floating-point operand. The operand is read from the $S_j$ register, and the result is delivered to the $S_i$ register.

Instruction 133 forms a floating-point first approximation to the reciprocal square root of a floating-point operand. The operand is read from the $S_j$ register, and the result is delivered to the $S_i$ register. See subsection 2.4.9, Floating-point Multiply Functional unit, for details of the sequence.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 132120         |          | s1     | /hs2    |         |
| 133120         |          | s1     | *qs2    |         |

Square Root Sequence

| Code Generated     | Location | Result | Operand | Comment                   |
|--------------------|----------|--------|---------|---------------------------|
|                    | 1        | 10     | 20      | 35                        |
| 052100 10001300000 |          | s1     | 16.     |                           |
| 133210             |          | s2     | *qs1    | ; square root approx.     |
| 124312             |          | s3     | s1*fs2  | ; half-prec. square root  |
| 127423             |          | s4     | s2*qs3  | ; square root iteration   |
| 124534             |          | s5     | s3*fs4  | ; square root             |

### INSTRUCTIONS 134 - 137

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
|        |         | Pass        | 134xxx                 |
|        |         | Pass        | 135xxx                 |
|        |         | Pass        | 136xxx                 |
|        |         | Pass        | 137xxx                 |

Instructions 134 through 137 issue without functional activity. The assembler does not use these instructions. See the 076 opcode.

### INSTRUCTIONS 140 and 141

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub>&v<sub>k</sub> | Logical products of $(s_j)$ and $(v_k)$ to $v_i$ | 140<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>&v<sub>k</sub> | Logical products of $(v_j)$ and $(v_k)$ to $v_i$ | 141<i>ijk</i> |

Instruction 140 reads a stream of vector elements from the $V_k$ register, processes the data in the Vector Logical unit, and delivers a stream of result elements to register $V_i$. Data is read from the $S_j$ register and is held in the Vector Logical unit during the streaming operation.

Instruction 141 reads two streams of vector elements, processes them in the Vector Logical unit, and delivers result elements to register $V_i$. The source streams are from the $V_j$ and $V_k$ registers.

For both instructions, the VL register determines the number of operations performed. Each element of the vector is processed independently of the other elements in the stream. A bit-by-bit logical product is formed between the two source operands. The resulting 64 logical products are then delivered as one element to the destination stream.
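The scalar-held and two-stream forms can be compared with a short Python model. This is an illustrative sketch rather than a description of the hardware: vector registers are modeled as lists of unsigned integers, the function names are hypothetical, and only the `vl` result elements are returned because this section does not say what happens to destination elements beyond the vector length.

```python
MASK64 = (1 << 64) - 1  # one 64-bit vector element


def vector_and_scalar(s_j, v_k, vl):
    """Model of instruction 140: the scalar is held and ANDed with each element."""
    return [(s_j & element) & MASK64 for element in v_k[:vl]]


def vector_and_vector(v_j, v_k, vl):
    """Model of instruction 141: element pairs from two streams are ANDed."""
    return [(a & b) & MASK64 for a, b in zip(v_j[:vl], v_k[:vl])]


# With VL = 2, only the first two elements take part in the operation.
print(vector_and_scalar(0xFF, [0x1234, 0xABCD, 0xFFFF], 2))
print(vector_and_vector([0b1100, 0b1010], [0b1010, 0b0110], 2))
```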
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 140123         |          | v1     | s2&v3   |         |
| 141123         |          | v1     | v2&v3   |         |

### INSTRUCTIONS 142 and 143

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub>\v<sub>k</sub> | Logical differences of $(s_j)$ and $(v_k)$ to $v_i$ | 142<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>\v<sub>k</sub> | Logical differences of $(v_j)$ and $(v_k)$ to $v_i$ | 143<i>ijk</i> |
| v<sub>i</sub> | 0 | Clear v<sub>i</sub> | 143<i>iii</i> † |

† Special syntax form

Instruction 142 reads a stream of vector elements from register $V_k$, processes the data in the Vector Logical unit, and delivers a stream of result elements to the $V_i$ register. Data is read from the $S_j$ register and is held in the Vector Logical unit during the streaming operation.

Instruction 143 reads two streams of vector elements, processes them in the Vector Logical unit, and delivers a stream of result elements to register $V_i$. The source streams are from registers $V_j$ and $V_k$.

For both instructions, the VL register determines the operation length. Each element of the vector stream is processed independently of the other elements in the stream. A bit-by-bit logical difference is formed between the two source operands. The resulting 64 logical differences are delivered as one element to the destination stream.
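The clear form uses the same register for all three designators. A plausible reading, sketched below in Python, is that it relies on the identity that the logical difference of a value with itself is zero. The function name and the list-based register model are assumptions made for illustration only.

```python
MASK64 = (1 << 64) - 1  # one 64-bit vector element


def vector_xor(v_j, v_k, vl):
    """Model of instruction 143: bit-by-bit logical difference of element pairs."""
    return [(a ^ b) & MASK64 for a, b in zip(v_j[:vl], v_k[:vl])]


# Using one register as both sources yields zero in every element,
# which is the effect the special syntax form describes as "Clear".
v6 = [0xDEADBEEF, 7, MASK64]
cleared = vector_xor(v6, v6, len(v6))
print(cleared)
```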
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 142123         |          | v1     | s2\v3   |         |
| 143123         |          | v1     | v2\v3   |         |
| 143666         |          | v6     | 0       |         |

### INSTRUCTIONS 144 and 145

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub>!v<sub>k</sub> | Logical sums of $(s_j)$ and $(v_k)$ to $v_i$ | 144<i>ijk</i> |
| v<sub>i</sub> | s<sub>j</sub> | Copy $(s_j)$ to $v_i$ | 144<i>iji</i> † |
| v<sub>i</sub> | v<sub>j</sub>!v<sub>k</sub> | Logical sums of $(v_j)$ and $(v_k)$ to $v_i$ | 145<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub> | V register copy $(j=k)$ | 145<i>ijj</i> |

† Special syntax form

Instruction 144 reads a stream of vector elements from register $V_k$, processes the data in the Vector Logical unit, and delivers a stream of result elements to the $V_i$ register. Data is read from the $S_j$ register and is held in the Vector Logical unit during the streaming operation.

Instruction 145 reads two streams of vector elements, processes them in the Vector Logical unit, and delivers a stream of result elements to register $V_i$. The source streams are from registers $V_j$ and $V_k$.

For both instructions, the VL register determines the operation length. Each element of the vector stream is processed independently of the other elements in the stream. A bit-by-bit logical sum is formed between the two source operands. The resulting 64 logical sums are delivered as one element to the destination stream.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 144123         |          | v1     | s2!v3   |         |
| 144121         |          | v1     | s2      |         |
| 145123         |          | v1     | v2!v3   |         |
| 145122         |          | v1     | v2      |         |

### INSTRUCTION 146

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub>!v<sub>k</sub>&vm | Transmit $(s_j)$ if vm bit=1; $(v_k)$ if vm bit=0 to $v_i$ | 146<i>ijk</i> |

Instruction 146 reads a stream of vector elements in sequence from the $V_k$ register, processes the data in the Vector Logical unit, and delivers a stream of result elements to the $V_i$ register. Data is read from the $S_j$ register and is held in the Vector Logical unit during the streaming operation. The contents of the VL register determine the vector stream length.

The VM register works as a control mechanism to select either the S register data or the vector element data as each element arrives at the Vector Logical unit. A bit of VM register data is associated with each element. The high-order bit of VM data is associated with the first vector element. Successive bits of VM register data correspond to successive vector elements.

The S register data is selected as a result element if the VM register contains a 1 in the designated element position. The $V_k$ register element is selected as a result element if the VM register contains a 0 in the designated element position.

| Code Generated | Location | Result | Operand  | Comment |
|----------------|----------|--------|----------|---------|
|                | 1        | 10     | 20       | 35      |
| 146123         |          | v1     | s2!v3&vm |         |

### INSTRUCTION 147

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | v<sub>j</sub>!v<sub>k</sub>&vm | Transmit $(v_j)$ if vm bit=1; $(v_k)$ if vm bit=0 to $v_i$ | 147<i>ijk</i> |

Instruction 147 reads two streams of vector elements, processes them in the Vector Logical unit, and delivers a stream of result elements to the $V_i$ register. The source streams are from registers $V_j$ and $V_k$. The contents of the VL register determine the length of each vector stream.

The VM register works as a control mechanism to select either the $V_j$ data or the $V_k$ data as each element pair arrives at the Vector Logical unit. A bit of VM register data is associated with each element. The high-order bit of VM data is associated with the first vector element. Successive bits of VM register data correspond to successive vector elements.

The $V_j$ data is selected as a result element if the VM register contains a 1 in the designated element position. The $V_k$ register element is selected as a result element if the VM register contains a 0 in the designated element position.

| Code Generated | Location | Result | Operand  | Comment |
|----------------|----------|--------|----------|---------|
|                | 1        | 10     | 20       | 35      |
| 147123         |          | v1     | v2!v3&vm |         |

### INSTRUCTIONS 150 and 151

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | v<sub>j</sub>&lt;a<sub>k</sub> | Shift $(v_j)$ left $(a_k)$ bits with zero-fill, results to $v_i$ | 150<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>&gt;a<sub>k</sub> | Shift $(v_j)$ right $(a_k)$ bits with zero-fill, results to $v_i$ | 151<i>ijk</i> |

Instructions 150 and 151 read a stream of vector elements in sequence from the $V_j$ register,
process the data in the Vector Integer unit, and deliver a stream of result elements to the $V_i$ register. Data is read from the $A_k$ register and is held in the Vector Integer unit during the streaming operation. The contents of the VL register determine the vector stream length.

Instruction 150 shifts data to the left and instruction 151 shifts data to the right. Each element of the vector stream is processed independently of the other elements in the stream. Each element is shifted by the number of bit positions indicated by the $A_k$ register value. Zero bits are inserted as bits shift off. The contents of the $A_k$ register are treated as a 32-bit positive integer. Shift counts equal to or greater than 64 produce a zero data field.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 150123         |          | v1     | v2<a3   |         |
| 151123         |          | v1     | v2>a3   |         |

### INSTRUCTIONS 152 and 153

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | v<sub>j</sub>,v<sub>j</sub>&lt;a<sub>k</sub> | Double shift $(v_j)$ left $(a_k)$ places to $v_i$ | 152<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>,v<sub>j</sub>&gt;a<sub>k</sub> | Double shift $(v_j)$ right $(a_k)$ places to $v_i$ | 153<i>ijk</i> |

Instructions 152 and 153 process the elements of data from the $V_j$ register in pairs for this sequence. Each element is concatenated with the following element and the resulting 128-bit field is shifted by the number of bit positions in the $A_k$ register data. A 64-bit field from the original element window is then delivered to the destination vector stream.

Instruction 152 shifts data to the left. The first element of $V_j$ data is positioned in the high-order 64 bits of the 128-bit shift field.
The second element of $V_j$ data is positioned in the low-order 64 bits of the 128-bit shift field. The 128-bit field then shifts left by the amount of the shift count. A first result element is read from that portion of the 128-bit field originally occupied by the first element of data.

The second element of $V_j$ data is then positioned in the high-order 64 bits of the 128-bit shift field. The third element of $V_j$ data is entered in the low-order 64 bits of the field. This 128-bit field is then shifted left by the amount of the shift count. A second result element is read from the high-order 64 bits of the 128-bit field originally occupied by the second element of data.

This process continues until the last element of data is entered in the high-order 64 bits of the 128-bit shift field. A zero field is entered in the low-order 64 bits. This 128-bit field is then shifted left by the amount of the shift count. The last result element is read from the high-order 64 bits of the shift field.

The $A_k$ register contents are treated as a 32-bit positive integer. Shift counts greater than 128 result in a zero data field. Zero bits are inserted at the right end of the 128-bit shift field as bits are shifted off to the left.

Instruction 153 shifts data to the right. The first element of $V_j$ data is positioned in the low-order 64 bits of the 128-bit shift field. The high-order 64 bits of the 128-bit shift field are cleared. The 128-bit field then shifts to the right by the amount of the shift count. A first result element is read from the low-order 64 bits of the 128-bit field originally occupied by the first element of data.

The second element of $V_j$ data is then positioned in the low-order 64 bits of the 128-bit shift field. The first element of $V_j$ data is entered in the high-order 64 bits of the field. This 128-bit field is then shifted right by the amount of the shift count.
A second result element is read from the low-order 64 bits of the 128-bit field originally occupied by the second element of data.

This process continues until the last element of data is entered in the low-order 64 bits of the 128-bit shift field. The preceding element is entered in the high-order 64 bits. This 128-bit field is then shifted right by the amount of the shift count. The last result element is read from the low-order 64 bits of the field.

The $A_k$ register contents are treated as a 32-bit positive integer. Shift counts greater than 128 result in a zero data field. Zero bits are inserted at the left end of the 128-bit shift field as bits are shifted off to the right.

| Code Generated | Location | Result | Operand  | Comment |
|----------------|----------|--------|----------|---------|
|                | 1        | 10     | 20       | 35      |
| 152123         |          | v1     | v2,v2<a3 |         |
| 153123         |          | v1     | v2,v2>a3 |         |

### INSTRUCTION 154

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub>*fv<sub>k</sub> | Floating-point product of $(s_j)$ and $(v_k)$ to $v_i$ | 154<i>ijk</i> |

Instruction 154 reads a stream of vector elements in sequence from the $V_k$ register, processes the data in the Floating-point Multiply unit, and delivers a stream of result elements to the $V_i$ register. Data is read from the $S_j$ register and is held in the Floating-point Multiply unit during the streaming operation. The contents of the VL register determine the vector stream length.

Each element of the vector stream is processed independently of the other elements in the stream. The Floating-point Multiply unit forms the 64-bit floating-point product of the arriving vector element and the scalar operand held in the unit.
The result element is delivered to the $V_i$ register. See subsection 2.4.9, Floating-point Multiply Functional unit, for details and special case treatment.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 154123         |          | v1     | s2*fv3  |         |

### INSTRUCTION 155

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | v<sub>j</sub>*fv<sub>k</sub> | Floating-point product of $(v_j)$ and $(v_k)$ to $v_i$ | 155<i>ijk</i> |

Instruction 155 reads two streams of vector elements, processes them in the Floating-point Multiply unit, and delivers a result stream to the $V_i$ register. The source streams are from registers $V_j$ and $V_k$. The VL register determines the length of each vector stream.

Each element of the vector stream is processed independently of the other elements in the stream. The Floating-point Multiply unit forms the 64-bit floating-point product of the arriving vector elements. The result element is delivered to the $V_i$ register. See subsection 2.4.9, Floating-point Multiply Functional unit, for details and special case treatment.
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 155123         |          | v1     | v2*fv3  |         |

### INSTRUCTIONS 156 and 157

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | v<sub>j</sub>*iv<sub>k</sub> | Reciprocal iteration of $2-(v_j)*(v_k)$ to $v_i$ | 156<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>*qv<sub>k</sub> | Reciprocal square root iteration of $[3-(v_j)*(v_k)]/2$ to $v_i$ | 157<i>ijk</i> |

Instructions 156 and 157 read two streams of vector elements, process them in the Floating-point Multiply unit, and deliver a result stream to the $V_i$ register. The source streams are from registers $V_j$ and $V_k$. The contents of the VL register determine the length of each vector stream.

For instruction 156, the Floating-point Multiply unit forms a 64-bit floating-point quantity used in the reciprocal iteration algorithm from each pair of arriving vector elements.

For instruction 157, the Floating-point Multiply unit forms a 64-bit floating-point quantity used in the reciprocal square root iteration algorithm from each pair of arriving elements. See subsection 2.4.9, Floating-point Multiply Functional unit, for details and special case treatment.
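The two iteration quantities are the correction factors of a Newton-style refinement, applied per element exactly as in the scalar Divide Sequence and Square Root Sequence given for instructions 126, 127, 132, and 133. The Python sketch below follows those sequences for a single element. It uses ordinary Python floats rather than the machine's floating-point format, the function names are hypothetical, and the `seed` arguments are arbitrary stand-ins for the hardware approximation instructions, whose accuracy is not stated in this section.

```python
def reciprocal_iteration(a, b):
    """Quantity formed by the *i operand form: 2 - a*b."""
    return 2.0 - a * b


def rsqrt_iteration(a, b):
    """Quantity formed by the *q operand form: (3 - a*b) / 2."""
    return (3.0 - a * b) / 2.0


def divide(numerator, denominator, seed):
    """Follow the Divide Sequence; seed stands in for the /h approximation."""
    correction = reciprocal_iteration(denominator, seed)  # s4 = s2*is3
    reciprocal = seed * correction                        # s5 = s3*fs4
    return numerator * reciprocal                         # s6 = s1*fs5


def square_root(x, seed):
    """Follow the Square Root Sequence; seed stands in for the *q approximation."""
    half_precision = x * seed                             # s3 = s1*fs2
    correction = rsqrt_iteration(seed, half_precision)    # s4 = s2*qs3
    return half_precision * correction                    # s5 = s3*fs4


# Deliberately crude seeds: 0.24 for 1/4 and 0.26 for 1/sqrt(16).
print(divide(16.0, 4.0, 0.24))
print(square_root(16.0, 0.26))
```

In both cases one iteration step moves the result noticeably closer to the exact value (4.0) than the unrefined product of the operand and the seed.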
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 156123         |          | v1     | v2*iv3  |         |
| 157123         |          | v1     | v2*qv3  |         |

### INSTRUCTIONS 160 and 161

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub>+v<sub>k</sub> | Integer sums of $(s_j)$ and $(v_k)$ to $v_i$ | 160<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>+v<sub>k</sub> | Integer sums of $(v_j)$ and $(v_k)$ to $v_i$ | 161<i>ijk</i> |

Instruction 160 reads a stream of vector elements from the $V_k$ register, processes the data in the Vector Integer unit, and delivers a stream of result elements to the $V_i$ register. Data is read from the $S_j$ register and is held in the Vector Integer unit during the streaming operation.

Instruction 161 reads two streams of vector elements, processes them in the Vector Integer unit, and delivers a stream of result elements to the $V_i$ register. The source streams are from registers $V_j$ and $V_k$.

For both instructions, the VL register determines the vector stream length. Each element of the vector stream is processed independently of the other elements in the stream. The Vector Integer unit forms the integer sum of the two operands. The result is delivered as one element of the destination stream.
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 160123         |          | v1     | s2+v3   |         |
| 161123         |          | v1     | v2+v3   |         |

### INSTRUCTIONS 162 and 163

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub>-v<sub>k</sub> | Integer differences of $(s_j)$ and $(v_k)$ to $v_i$ | 162<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>-v<sub>k</sub> | Integer differences of $(v_j)$ and $(v_k)$ to $v_i$ | 163<i>ijk</i> |
| v<sub>i</sub> | -v<sub>k</sub> | Copies twos complement of $(v_k)$ to $v_i$ | 163<i>iik</i> † |

† Special syntax form

Instruction 162 reads a stream of vector elements from the $V_k$ register, processes the data in the Vector Integer unit, and delivers a stream of result elements to the $V_i$ register. Data is read from the $S_j$ register and is held in the Vector Integer unit during the streaming operation.

Instruction 163 reads two streams of vector elements, processes them in the Vector Integer unit, and delivers a stream of result elements to the $V_i$ register. The source streams are from registers $V_j$ and $V_k$.

For both instructions, the VL register determines the vector stream length. Each element of the vector stream is processed independently of the other elements in the stream. The Vector Integer unit forms the integer difference of the two operands. The result is delivered as one element of the destination stream.
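The twos complement result named in the special syntax form can be illustrated with a small Python model. This is a sketch under stated assumptions: elements are treated as 64-bit quantities whose differences wrap modulo $2^{64}$, which this section does not state explicitly, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # one 64-bit vector element


def vector_int_diff(v_j, v_k, vl):
    """Model of instruction 163: element-wise difference, assumed to wrap at 64 bits."""
    return [(a - b) & MASK64 for a, b in zip(v_j[:vl], v_k[:vl])]


def vector_negate(v_k, vl):
    """Twos complement of each element, as produced by the -vk operand form."""
    return [(-element) & MASK64 for element in v_k[:vl]]


print([hex(x) for x in vector_negate([1, 0, 5], 3)])
print(vector_int_diff([10, 3], [4, 3], 2))
```

Adding an element to its twos complement gives zero in 64-bit arithmetic, which is a quick way to check a negated vector.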
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 162123         |          | v1     | s2-v3   |         |
| 163123         |          | v1     | v2-v3   |         |
| 163774         |          | v7     | -v4     |         |

### INSTRUCTIONS 164 - 165

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | pv<sub>j</sub> | Population counts of $(v_j)$ to $v_i$ | 164<i>ij</i>0 |
| v<sub>i</sub> | qv<sub>j</sub> | Population count parity of $(v_j)$ to $v_i$ | 164<i>ij</i>1 |
| v<sub>i</sub> | zv<sub>j</sub> | Leading zero count of $(v_j)$ to $v_i$ | 165<i>ijx</i> |

Instruction 164 reads a stream of vector elements in sequence from the $V_j$ register, processes the data in the Vector Integer unit, and delivers a stream of result elements to the $V_i$ register. The contents of the VL register determine the vector stream length.

Each element of the vector stream is processed independently of the other elements in the stream. The Vector Integer unit counts the number of 1 bits in each vector element and delivers the count as a positive integer to the result stream.

Instruction 164ij0 counts the number of bits set to 1 in each element of $V_j$ and enters the results into corresponding elements of $V_i$. The results are entered into the low-order 7 bits of each $V_i$ element; the remaining high-order bits of each $V_i$ element are zeroed.

Instruction 164ij1 counts the number of bits set to 1 in each element of $V_j$. The least significant bit of each result shows whether the result is an odd or even number. Only the least significant bit of each result is transferred to the least significant bit position of the corresponding element of register $V_i$. The remainder of the result is set to zeros.
The actual population count results are not transferred.

Instruction 165ijx reads a stream of vector elements in sequence from the $V_j$ register, processes the data in the Vector Integer unit, and delivers a stream of result elements to the $V_i$ register. The contents of the VL register determine the vector stream length.

Each element of the vector stream is processed independently of the other elements in the stream. The Vector Integer unit counts the number of leading zeros in each element. The element is considered as a field of 64 individual bits in this operation. This count is delivered as a positive integer to the result stream.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 164120         |          | v1     | pv2     |         |
| 164121         |          | v1     | qv2     |         |
| 165120         |          | v1     | zv2     |         |

### INSTRUCTIONS 166 - 167

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | /hv<sub>k</sub> | Floating-point reciprocal approximations of $(v_k)$ to $v_i$ | 166<i>ixk</i> |
| v<sub>i</sub> | *qv<sub>k</sub> | Floating-point reciprocal square root approximations of $(v_k)$ to $v_i$ | 167<i>ixk</i> |

Instructions 166 and 167 read a stream of vector elements in sequence from the $V_k$ register, process the data in the Floating-point Multiply unit, and deliver a stream of result elements to the $V_i$ register. The contents of the VL register determine the length of the vector stream. See subsection 2.4.9, Floating-point Multiply Functional unit, for details of this sequence.

For instruction 166, the Floating-point Multiply unit forms a floating-point quantity which is a first approximation to the reciprocal of the arriving vector element.
For instruction 167, the Floating-point Multiply unit forms a floating-point quantity which is a first approximation to the reciprocal square root of the arriving vector element.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 166102         |          | v1     | /hv2    |         |
| 167103         |          | v1     | *qv3    |         |

### INSTRUCTIONS 170 - 171

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub>+fv<sub>k</sub> | Floating-point sum of $(s_j)$ and $(v_k)$ to $v_i$ | 170<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>+fv<sub>k</sub> | Floating-point sum of $(v_j)$ and $(v_k)$ to $v_i$ | 171<i>ijk</i> |

Instruction 170 reads a stream of vector elements in sequence from the $V_k$ register, processes the data in the Floating-point Add unit, and delivers a stream of result elements to the $V_i$ register. Data is read from the $S_j$ register and is held in the Floating-point Add unit during the streaming operation.

Instruction 171 reads two streams of vector elements, processes them in the Floating-point Add unit, and delivers a result stream to the $V_i$ register. The source streams are from registers $V_j$ and $V_k$.

For both instructions, the contents of the VL register determine the vector stream length. Each element of the vector stream is processed independently of the other elements in the stream. The Floating-point Add unit forms the 64-bit floating-point sum of the two operands. The result is delivered to register $V_i$. See subsection 2.4.8, Floating-point Add Functional unit, for details and special case treatment.
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 170123         |          | v1     | s2+fv3  |         |
| 171123         |          | v1     | v2+fv3  |         |

### INSTRUCTIONS 172 - 173

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub>-fv<sub>k</sub> | Floating-point difference of $(s_j)$ and $(v_k)$ to $v_i$ | 172<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>-fv<sub>k</sub> | Floating-point difference of $(v_j)$ and $(v_k)$ to $v_i$ | 173<i>ijk</i> |
| v<sub>i</sub> | -fv<sub>k</sub> | Copy normalized negative of $(v_k)$ to $v_i$ | 173<i>iik</i> † |

† Special syntax form

Instruction 172 reads a stream of vector elements in sequence from the $V_k$ register, processes the data in the Floating-point Add unit, and delivers a stream of result elements to the $V_i$ register. Data is read from the $S_j$ register and is held in the Floating-point Add unit during the streaming operation.

Instruction 173 reads two streams of vector elements, processes them in the Floating-point Add unit, and delivers a result stream to the $V_i$ register. The source streams are from registers $V_j$ and $V_k$.

For both instructions, the contents of the VL register determine the vector stream length. Each element of the vector stream is processed independently of the other elements in the stream. The Floating-point Add unit forms the 64-bit floating-point difference of the two operands. The result is delivered to register $V_i$. See subsection 2.4.8, Floating-point Add Functional unit, for details and special case treatment.
| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|---------|---------|
|                | 1        | 10     | 20      | 35      |
| 172123         |          | v1     | s2-fv3  |         |
| 173123         |          | v1     | v2-fv3  |         |
| 173556         |          | v5     | -fv6    |         |

### INSTRUCTIONS 174 - 175

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | fix,v<sub>k</sub> | Integer form of floating-point $(v_k)$ to $v_i$ | 174<i>ixk</i> |
| v<sub>i</sub> | flt,v<sub>k</sub> | Floating-point form of integer $(v_k)$ to $v_i$ | 175<i>ixk</i> |

Instructions 174 and 175 read a stream of vector elements in sequence from the $V_k$ register, process the data in the Floating-point Add unit, and deliver a stream of result elements to the $V_i$ register. The contents of the VL register determine the vector stream length.

Instruction 174 performs the conversion from floating-point to integer format by adding the operand to a constant in the Floating-point Add unit. The result is sign extended to form a 64-bit integer.

Instruction 175 performs the conversion from integer to floating-point format by adding the operand to a constant in the Floating-point Add unit. The result is delivered to the $V_i$ register. See subsection 2.4.8, Floating-point Add Functional unit, for details and special case treatment.
| Code Generated | Location | Result | Operand<br>20 | Comment<br>35 |
|----------------|----------|--------|---------------|---------------|
| 174102 | | v1 | fix,v2 | |
| 175102 | | v1 | flt,v2 | |

#### INSTRUCTIONS 176 - 177

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | ci,s<sub>j</sub>&s<sub>k</sub> | Enter $v_i$ with compressed iota $s_j$ and $s_k$ | 176<i>ijk</i> |
| | | Executes same as 176<i>ijk</i> | 177xxx |

Instruction 176 forms a vector from two scalar operands. The first scalar operand is a 64-bit mask from the $S_j$ register. The second scalar operand is a 32-bit vector stride from the $S_k$ register. The stride is taken from the low-order 32 bits of the $S_k$ register data.

The Vector Integer unit forms a 64-element iota vector from the stride. This is a vector whose first element has a zero value, and whose subsequent elements are spaced by the stride increment. The sequence of element values is as follows:

$$0*S_k,\ 1*S_k,\ 2*S_k,\ 3*S_k,\ 4*S_k,\ 5*S_k,\ \ldots$$

The two scalar operands are captured and held in the Vector Integer unit. The $S_k$ value is repeatedly added to the accumulated sum to form the iota vector. The 64-bit mask is shifted to the left 1 bit position per clock period.

The Vector Integer unit then compresses the iota vector, using the mask data, and delivers the resulting vector to register $V_i$. An element of the iota vector is delivered to the result vector where there is a 1 bit in the mask. An element of the iota vector is skipped, and the position compressed, where there is a 0 bit in the mask. The resulting vector has the same number of elements as there were 1 bits in the mask.

The first mask bit tested is the high-order bit. Bits are then tested in order to the low-order bit. A zero test is made on the remaining mask bits to stop the sequence.
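The compressed iota operation described above can be modeled in a few lines of Python. This is only an illustrative sketch, not a description of the hardware: the function name `compressed_iota` is hypothetical, the stride is treated as an unsigned 32-bit value (an assumption, since the text does not state its signedness), and result element width and timing are not modeled.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def compressed_iota(mask: int, stride: int) -> list[int]:
    """Model of instruction 176: compress the iota vector 0*S, 1*S, 2*S, ...

    mask   -- 64-bit mask (contents of Sj), tested from the high-order bit down
    stride -- low-order 32 bits of Sk (treated as unsigned here: an assumption)
    """
    mask &= MASK64
    stride &= MASK32
    result = []
    accumulated = 0          # current iota element, starts at 0*stride
    while mask:              # zero test on the remaining mask bits ends the loop
        if mask >> 63:       # high-order bit set: deliver this iota element
            result.append(accumulated)
        accumulated += stride            # stride is added once per mask bit
        mask = (mask << 1) & MASK64      # shift left 1 bit position per step
    return result
```

For example, a mask whose four high-order bits are 1011 selects iota elements 0, 2, and 3, so `compressed_iota(0b1011 << 60, 3)` yields `[0, 6, 9]`. The loop runs only until the remaining mask bits are all zero, which is why the number of steps depends on the mask.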
Execution time therefore varies with the mask contents.

| Code Generated | Location | Result | Operand | Comment |
|----------------|----------|--------|----------|---------|
| | 1 | 10 | 20 | 35 |
| 176123 | | v1 | ci,s2&s3 | |

#### 4. COMMON MEMORY

Common Memory contains 128 or 256 Mwords of dynamic memory. The dynamic memory consists of 128 banks. Each 72-bit word consists of 64 data bits and 8 error correction bits.

Common Memory is organized into quadrants with 32 banks in each quadrant. Each memory quadrant has a data path to each of four Common Memory ports. A Background Processor and a foreground communication channel are connected to each Common Memory port. Total memory bandwidth is 64 Gbits/s. Total memory capacity is up to 17 Gbits.

The Foreground Processor, Background Processors, external I/O devices, and disk controllers share Common Memory. Common Memory contains program code for the Background Processors, data for problem solution, and Foreground Processor system tables.

## 4.1 MEMORY ADDRESSING

A word in memory is addressed by 32 bits. The low-order 2 bits select the quadrant and the next 5 bits select the bank. Figure 4-1 shows the format of the memory address for Common Memory.

Figure 4-1. Memory Address for Common Memory

## 4.2 MEMORY ACCESS

The Background Processors are locked into a phased access time scheme with the memory quadrants through the Common Memory ports. Through its Common Memory port, a Background Processor can access any given quadrant but only in the processor's own phase time, that is, every fourth clock period (CP). If a Background Processor requests a quadrant out of its phase time, the request is delayed until the correct time.

For example, assume the Background Processors are A through D, and the quadrants are 0 through 3. Also assume processor A is locked into quadrant 0 at phase time 0.
If processor A references quadrant 0 at phase time 1, it must wait until the next phase time 0 (CP 4) to have access to memory in that quadrant. Memory banks in a quadrant share a data path to each Common Memory port. Because of the phased access time between the quadrants and the Common Memory ports, however, only 1 bank accesses the path in a given 4-CP time slot. Because 2 banks never compete for the same data path in the same time slot, each bank functionally has an independent path to each of the four Common Memory ports. ## 4.3 MEMORY CONFLICTS To prevent memory conflicts, each memory bank has two Bank Busy flags. Each bank is divided logically into two pseudo banks. This enables quicker access to the not busy half of the bank. When a bank has been accessed it sets both of its busy flags. A long count busy applies to the pseudo bank that is actually busy, while a short count busy applies to the pseudo bank that is not. If the bank is busy, the quadrant sends a rejected signal to the requesting memory port. The requesting port retries the data. ## 4.4 MEMORY BACKUP Memory back-up occurs when too many memory references arrive at a single memory quadrant. Each Common Memory port has four quadrant buffers, one for each quadrant. Each buffer can hold two memory references for its memory quadrant. Therefore, references can continue to the memory port when the reference is not in the proper phase time. When a quadrant buffer in a memory port is filled, and another reference to that quadrant is made, the memory port begins a back-up procedure. The memory port back-up procedure stops instruction issue for the associated Background Processor if that processor is making a memory reference. Vector streams initiated in the Background Processor and associated with a Common Memory reference are held. After all references have been submitted for retry, stop issue is released allowing additional references to issue. 
A conflict during the retry process causes the back-up procedure to begin again at the point where the conflict occurred, which could be the original back-up reference or another reference buffered during back-up.

#### NOTE

Special timing exists for execution of Background Processor instruction 072 (the gather instruction). This instruction allows addresses in any sequence with respect to the low-order 2 bits, quadrant select. Without special treatment of this instruction, the data could arrive at the Vector Destination register out of order. Therefore, the hardware forces a maximum memory reference pattern of four references and 12 null references, which averages to one reference every 4 CPs.

## 4.5 MEMORY ERROR CORRECTION

A single-error correction/double-error detection (SECDED) network is used between the Background Processors and memory. SECDED ensures that data written into memory is returned to the Background Processors with consistent precision.

If a single bit of a data word is altered before the data word is passed to the computer, SECDED corrects the error automatically. If 2 bits of the same data word are altered, the double error is detected but not corrected. In either case, the Background Processors can be interrupted, depending on interrupt options selected, to allow processing of the error. For 3 or more bits in error, results are ambiguous.

The 8 check bits and the data word are stored in memory at the same location. When read from memory, the 64-bit matrix, shown in figure 4-2, generates a new set of check bits, which are compared with the old check bits that were stored in memory. The resulting 8 comparison bits are called syndrome (S) bits. The states of these S bits are symptomatic of any error that occurred (1 = no compare). If all syndrome bits are 0, no memory error is assumed. Any change of state of a single bit in memory causes an odd number of S bits to be set to 1.
A double error (an error in 2 bits) appears as an even number of S bits set to 1.

The x's in the matrix of figure 4-2 determine which syndrome bits are affected by a failing memory word bit. For example, if memory word bit $2^{63}$ fails, S bits 1 through 7 are forced to ones. Each memory word bit and each check bit has a unique pattern of S bits, which identifies a failure of that bit. The matrix is designed so that:

- If all syndrome bits are 0, no error is assumed.
- If only 1 syndrome bit is 1, the associated check bit is in error.
- If more than 1 syndrome bit is 1 and the parity of all syndrome bits is odd, then a single correctable error is assumed to have occurred. The syndrome bits can be decoded to identify the bit in error.
- If 3 or more memory bits are in error, the parity of all syndrome bits is odd and results are ambiguous.
- If more than 1 syndrome bit is 1 and the parity of all syndrome bits S0 through S7 is even, then a double error (or an even number of bit errors) occurred within the data bits or check bits.
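The decoding rules above can be illustrated with a small Python model built from the x positions of figure 4-2. This is a sketch under stated assumptions rather than a description of the actual network: it assumes each check bit is the even parity of the data bits marked in its row, and the names `check_bits`, `column`, and `classify` are hypothetical.

```python
# Eight-bit x patterns that recur in the rows of figure 4-2, high-order bit first.
ALL, NONE = 0b11111111, 0b00000000
ALT, PAIR, QUAD, PAT = 0b10101010, 0b11001100, 0b11110000, 0b10010110

# ROWS[c]: pattern for check bit c over data bytes 63-56, 55-48, ..., 7-0.
ROWS = [
    (NONE, ALL, ALL, ALL, ALT, ALT, ALT, ALT),
    (ALL, NONE, ALL, ALL, PAIR, PAIR, PAIR, PAIR),
    (ALL, ALL, NONE, ALL, QUAD, QUAD, QUAD, QUAD),
    (ALL, ALL, ALL, NONE, PAT, PAT, PAT, PAT),
    (ALT, ALT, ALT, ALT, NONE, ALL, ALL, ALL),
    (PAIR, PAIR, PAIR, PAIR, ALL, NONE, ALL, ALL),
    (QUAD, QUAD, QUAD, QUAD, ALL, ALL, NONE, ALL),
    (PAT, PAT, PAT, PAT, ALL, ALL, ALL, NONE),
]
MASKS = [int.from_bytes(bytes(row), "big") for row in ROWS]


def parity(value: int) -> int:
    """Return 1 if value has an odd number of 1 bits, else 0."""
    return bin(value).count("1") & 1


def check_bits(data: int) -> int:
    """Generate the 8 check bits for a 64-bit word (even parity assumed)."""
    return sum(parity(data & mask) << c for c, mask in enumerate(MASKS))


def column(bit: int) -> int:
    """Syndrome pattern produced when data bit `bit` alone is in error."""
    return sum(((mask >> bit) & 1) << c for c, mask in enumerate(MASKS))


COLUMN_TO_BIT = {column(bit): bit for bit in range(64)}


def classify(data: int, stored_check: int):
    """Apply the syndrome rules; return (diagnosis, possibly corrected data)."""
    syndrome = check_bits(data) ^ stored_check
    if syndrome == 0:
        return "no error", data
    if parity(syndrome) == 0:
        return "double error", data              # detected, not corrected
    if syndrome & (syndrome - 1) == 0:
        return "check bit error", data           # exactly one S bit set
    bit = COLUMN_TO_BIT.get(syndrome)
    if bit is None:
        return "ambiguous", data                 # odd parity, no matching column
    return "single error corrected", data ^ (1 << bit)
```

In this model `column(63)` evaluates to `0b11111110`, that is, S bits 1 through 7 set, which agrees with the example given for memory word bit $2^{63}$.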
Bits 2<sup>71</sup> through 2<sup>64</sup> are the check byte.

| | 2<sup>71</sup> | 2<sup>70</sup> | 2<sup>69</sup> | 2<sup>68</sup> | 2<sup>67</sup> | 2<sup>66</sup> | 2<sup>65</sup> | 2<sup>64</sup> | 2<sup>63</sup> | 2<sup>62</sup> | 2<sup>61</sup> | 2<sup>60</sup> | 2<sup>59</sup> | 2<sup>58</sup> | 2<sup>57</sup> | 2<sup>56</sup> | 2<sup>55</sup> | 2<sup>54</sup> | 2<sup>53</sup> | 2<sup>52</sup> | 2<sup>51</sup> | 2<sup>50</sup> | 2<sup>49</sup> | 2<sup>48</sup> |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| check bit 0 | | | | | | | | x | | | | | | | | | x | x | x | x | x | x | x | x |
| check bit 1 | | | | | | | x | | x | x | x | x | x | x | x | x | | | | | | | | |
| check bit 2 | | | | | | x | | | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x |
| check bit 3 | | | | | x | | | | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x |
| check bit 4 | | | | x | | | | | x | | x | | x | | x | | x | | x | | x | | x | |
| check bit 5 | | | x | | | | | | x | x | | | x | x | | | x | x | | | x | x | | |
| check bit 6 | | x | | | | | | | x | x | x | x | | | | | x | x | x | x | | | | |
| check bit 7 | x | | | | | | | | x | | | x | | x | x | | x | | | x | | x | x | |

| | 2<sup>47</sup> | 2<sup>46</sup> | 2<sup>45</sup> | 2<sup>44</sup> | 2<sup>43</sup> | 2<sup>42</sup> | 2<sup>41</sup> | 2<sup>40</sup> | 2<sup>39</sup> | 2<sup>38</sup> | 2<sup>37</sup> | 2<sup>36</sup> | 2<sup>35</sup> | 2<sup>34</sup> | 2<sup>33</sup> | 2<sup>32</sup> | 2<sup>31</sup> | 2<sup>30</sup> | 2<sup>29</sup> | 2<sup>28</sup> | 2<sup>27</sup> | 2<sup>26</sup> | 2<sup>25</sup> | 2<sup>24</sup> |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| check bit 0 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | | x | | x | | x | |
| check bit 1 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | | | x | x | | |
| check bit 2 | | | | | | | | | x | x | x | x | x | x | x | x | x | x | x | x | | | | |
| check bit 3 | x | x | x | x | x | x | x | x | | | | | | | | | x | | | x | | x | x | |
| check bit 4 | x | | x | | x | | x | | x | | x | | x | | x | | | | | | | | | |
| check bit 5 | x | x | | | x | x | | | x | x | | | x | x | | | x | x | x | x | x | x | x | x |
| check bit 6 | x | x | x | x | | | | | x | x | x | x | | | | | x | x | x | x | x | x | x | x |
| check bit 7 | x | | | x | | x | x | | x | | | x | | x | x | | x | x | x | x | x | x | x | x |

| | 2<sup>23</sup> | 2<sup>22</sup> | 2<sup>21</sup> | 2<sup>20</sup> | 2<sup>19</sup> | 2<sup>18</sup> | 2<sup>17</sup> | 2<sup>16</sup> | 2<sup>15</sup> | 2<sup>14</sup> | 2<sup>13</sup> | 2<sup>12</sup> | 2<sup>11</sup> | 2<sup>10</sup> | 2<sup>9</sup> | 2<sup>8</sup> | 2<sup>7</sup> | 2<sup>6</sup> | 2<sup>5</sup> | 2<sup>4</sup> | 2<sup>3</sup> | 2<sup>2</sup> | 2<sup>1</sup> | 2<sup>0</sup> |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| check bit 0 | x | | x | | x | | x | | x | | x | | x | | x | | x | | x | | x | | x | |
| check bit 1 | x | x | | | x | x | | | x | x | | | x | x | | | x | x | | | x | x | | |
| check bit 2 | x | x | x | x | | | | | x | x | x | x | | | | | x | x | x | x | | | | |
| check bit 3 | x | | | x | | x | x | | x | | | x | | x | x | | x | | | x | | x | x | |
| check bit 4 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x |
| check bit 5 | | | | | | | | | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x |
| check bit 6 | x | x | x | x | x | x | x | x | | | | | | | | | x | x | x | x | x | x | x | x |
| check bit 7 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | | | | | | | | |

Figure 4-2. Error Correction Matrix

#### 5. FOREGROUND SYSTEM

The CRAY-2 computer system contains a foreground system to control and monitor system operations.
The foreground system contains the following:

- Either two or four high-speed synchronous communication channels to interconnect the Background Processors, Foreground Processor, disk controllers, HSX controllers, and External I/O controllers
- Foreground channel ports
  - Either two or four Common Memory ports to control data transfer between Common Memory and the Foreground Processor, disk storage units (DSUs), HSX controllers, and the External I/O controllers
  - Either two or four Background Processor ports to allow the Foreground Processor to monitor and control the Background Processors
- Up to 40 attached I/O devices
  - Disk controllers to control up to 36 DSUs
  - External I/O controllers to connect the CRAY-2 computer system mainframe to external devices at 6 Mbyte/s (Front-end Interface) or 12 Mbyte/s (HYPERchannel)
  - HSX controllers to connect the CRAY-2 computer system mainframe to high-speed external devices at 100 Mbyte/s
- A Foreground Processor to supervise overall system activity and respond to requests for interaction among the system members
- A maintenance control console to deadstart the CRAY-2 computer system mainframe and monitor system operation

## 5.1 FOREGROUND COMMUNICATION CHANNELS

Either two or four high-speed communication channels in the foreground system link the Common Memory, Background Processors, Foreground Processor, disk controllers, HSX controllers, and External I/O controllers. The Foreground Processor supervises the channels. Data blocks are generally 512 Common Memory words. Each channel accesses one Common Memory port and one Background Processor port.

Each channel in the system can have up to four External I/O controllers and two HSX controllers. Disk controllers are generally divided equally among the channels. The disk controller configuration can be adjusted, however, for special system requirements.
A channel interconnects the Foreground Processor, disk controllers, External I/O controllers, HSX controllers, a Background Processor port, and a Common Memory port in a continuous channel loop. Figure 5-1 shows a configuration of a single channel loop.

Figure 5-1. Channel Loop

Each member of the loop is called a channel node. Each channel node receives data on the path during each clock period and transmits that data to the next node in the following clock period. Data can then move about the loop from any transmitting node to any receiving node.

## 5.2 FOREGROUND CHANNEL PORTS

Two independent sets of channel ports exist in the Foreground Processor: Common Memory ports and Background Processor ports. The Common Memory ports contain controls and status information for transfer of data to and from Common Memory. The Background Processor ports contain controls and status information used by the Foreground Processor to control the Background Processors.

#### 5.2.1 COMMON MEMORY PORTS

The foreground system contains either two or four Common Memory ports. One Common Memory port is associated with each of the Background Processors. A foreground channel is associated with each of the Common Memory ports.

The Foreground Processor makes Common Memory requests through the Common Memory port for those foreground devices on the same channel. Background Processor Common Memory requests have priority over foreground system requests, with one exception: the refresh has priority over the background operand references.

The Common Memory port accepts requests according to the following priority scheme, from highest to lowest priority.

1. Background Processor operand references
2. Background Processor instruction references
3. Foreground channel transfer references

#### 5.2.2 BACKGROUND PROCESSOR PORTS

Each Background Processor has a Background Processor port connecting it to one of the channels in the foreground system.
This port allows the Foreground Processor to control the operation of the Background Processor. #### 5.3 DISK STORAGE UNITS The Foreground Processor spends considerable time transferring data between the DSUs and Common Memory. The system has provision for up to 36 DSUs. Control for these units is on an individual DSU basis so that all 36 DSUs can operate concurrently. #### 5.3.1 DISK SYSTEM ORGANIZATION The disk storage system on the CRAY-2 computer system has the option of operating in a synchronous mode with all DSUs running in parallel in a lock step mode. For this approach to be practical, the buffer size for individual disk references must be of the order of 100,000 words. A system configuration with 16 DSUs can illustrate the synchronous mode of operation. The Foreground Processor is given a disk address consisting of a pseudo-track number. This number is the cylinder and head group for a disk file with no flaws. A table look-up converts this pseudo-track into a physical track for each DSU. All DSUs are positioned in parallel. The Foreground Processor reads angular position for each disk surface to determine the sector currently under the recording head. It then begins a data stream from Common Memory to disk surfaces, choosing the portion of the Common Memory buffer appropriate for the current angular position of each DSU. Data to 15 of the DSUs is moved directly from the Common Memory buffer. Data for the 16th DSU is a logical difference data stream using the word-by-word data from the desired file. All 16 DSUs write one track of data as the basic reservation unit. On data readback, the 16th DSU is read concurrently with the other 15 DSUs. If the cyclic redundancy code (CRC) detectors indicate no data errors, the 16th DSU data is discarded. If an error has occurred, it can be corrected with minor CPU overhead and no time loss in the data stream. 
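The logical difference (exclusive OR) stream written to the 16th DSU, and its use to re-create data from a failed unit, can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: each DSU stream is represented as a list of integer words, the function names `logical_difference` and `rebuild_missing` are hypothetical, and the word values are made up for the example.

```python
from functools import reduce
from operator import xor


def logical_difference(streams):
    """Word-by-word exclusive OR across equal-length word streams."""
    return [reduce(xor, words) for words in zip(*streams)]


def rebuild_missing(good_streams, check_stream):
    """Re-create the one missing data stream from the surviving data streams
    and the logical difference stream read from the 16th DSU."""
    return logical_difference(list(good_streams) + [check_stream])


# Fifteen data streams of four words each (illustrative values only).
data = [[dsu * 1000 + word for word in range(4)] for dsu in range(15)]
check = logical_difference(data)      # stream written to the 16th DSU

lost = 6                              # suppose DSU 6 returns a CRC error
survivors = data[:lost] + data[lost + 1:]
recovered = rebuild_missing(survivors, check)
```

Because exclusive OR is its own inverse, combining the 14 surviving data streams with the check stream returns exactly the words of the missing stream, so `recovered` equals `data[lost]`.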
The correction process recreates the missing data by using the word-by-word logical difference of the 15 DSUs supplying good data.

The overhead introduced by this arrangement is one DSU for every 15 DSUs used. The following three benefits occur:

- The data rate is 15 times faster than a single DSU data transfer.
- The DSU rotational latency has been reduced to 1/2 of a sector time.
- A DSU can fail completely due to a head crash or motor failure with no loss of data and little time loss.

A DSU failure in this system can be corrected during system operation by removing the defective DSU and replacing it with another unit. The new unit can then be brought online by running a background job that takes approximately 2.5 minutes of disk system time to record the faulty unit's data from the data on the other 15 DSUs.

## 5.4 EXTERNAL I/O CONTROLLER

The CRAY-2 computer system mainframe is connected to a front-end computer system through a controller in the foreground system. The External I/O controller can support a 6 Mbyte per second channel or a 12 Mbyte per second channel. Each channel loop can hold up to four External I/O controllers.

Each controller contains a buffer of 512 64-bit words. The data block can be of arbitrary word length up to this limit.

#### 5.5 HSX CONTROLLER

The HSX channel controller connects high-speed external devices to the CRAY-2 computer system. The HSX channel controller is a 100 Mbyte/s full duplex channel. A foreground channel loop can hold up to two HSX channels.

The HSX channel controller is made up of two independent parts, an input channel and an output channel. Each part contains two alternating buffers of 512 64-bit words. The data blocks can be of arbitrary length.

### 5.6 FOREGROUND PROCESSOR

The Foreground Processor supervises system operation by responding to Background Processor requests and sequencing Channel Communication signals. The user programs reside in the Common Memory in a protected area and are executed in Background Processors.
The Foreground Processor code is loaded at deadstart from a diskette at the maintenance control console. (See subsection 5.7, Maintenance Control Console, for a description.) The code is firmware and is not altered during the system operation.

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

## CAUTION

A Foreground Processor program code error is as fatal to system operation as a hardware failure.

\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*

The primary function of the Foreground Processor program is real-time response to various signals from a variety of sources in the foreground system. As many as 50 simultaneous real-time sequences can be operating in an interleaved manner in the Foreground Processor. Many of these responses must be of the order of a microsecond or less.

The Foreground Processor contains the following sections:

- Instruction Memory
- Local Data Memory
- Arithmetic functions
- Real-time clock
- Error checking
- Instruction issue mechanism
- Instruction set

The Foreground Processor performs arithmetic functions on 32-bit integers. The following functions are performed:

- Add
- Subtract
- Shift left open ended
- Shift right open ended
- Logical product
- Logical difference
- Logical sum

A detailed description of the Foreground Processor and its functional units is beyond the scope of this manual. The Foreground Processor is transparent to the user of the CRAY-2 computer system.

## 5.7 MAINTENANCE CONTROL CONSOLE

The maintenance control console deadstarts the system and exchanges data with the Foreground Processor. Instructions for execution in the Foreground Processor are loaded into the Foreground Instruction Memory at deadstart from a diskette at the maintenance control console. This memory is a Read-only Memory during system operation. Data for supervision of the system is maintained in Common Memory and is moved to the Foreground Processor Local Memory as required.

# **APPENDIX SECTION**
#### A. SYMBOLIC MACHINE INSTRUCTIONS LISTED BY FUNCTIONALITY

Instructions are listed in numerical order and explained in section 3. The octal machine code can be used to cross-reference instructions in this appendix to their descriptions in section 3. See section 2 for descriptions of functional units.

## A.1 SYMBOLIC NOTATION

This appendix lists the symbolic machine instructions by functionality. Instructions are described in the following functional categories:

- Branch instructions
- Pass instructions
- Semaphore instructions
- Register entry instructions
- Inter-register transfer instructions
- Memory transfer instructions
- Integer arithmetic operation instructions
- Floating-point arithmetic operation instructions
- Logical operation instructions
- Bit count instructions
- Shift operation instructions

## A.2 BRANCH INSTRUCTIONS

Instructions that perform conditional branches, unconditional jumps, or exits are listed in this group.

Figure A-1. CRAY-2 Computer System Symbolic Machine Instructions (summary chart with panels for Register Entry Instructions, Integer Arithmetic Operations, Floating Point Operations, Inter Register Transfers, Bit Count Instructions, Logical Operations, Shift Instructions, Pass Instructions, Semaphore Instructions, Memory Transfers, and Branch Instructions)

## A.2.1 CONDITIONAL BRANCHES

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| jz | a<sub>k</sub>,exp | Branches if $(a_k)$ is zero | 010xxk m<sub>1</sub> m<sub>2</sub> |
| jn | a<sub>k</sub>,exp | Branches if $(a_k)$ is nonzero | 011xxk m<sub>1</sub> m<sub>2</sub> |
| jp | a<sub>k</sub>,exp | Branches if $(a_k)$ is positive | 012xxk m<sub>1</sub> m<sub>2</sub> |
| jm | a<sub>k</sub>,exp | Branches if $(a_k)$ is negative | 013xxk m<sub>1</sub> m<sub>2</sub> |
| jz | s<sub>j</sub>,exp | Branches if $(s_j)$ is zero | 014xjx m<sub>1</sub> m<sub>2</sub> |
| jn | s<sub>j</sub>,exp | Branches if $(s_j)$ is nonzero | 015xjx m<sub>1</sub> m<sub>2</sub> |
| jp | s<sub>j</sub>,exp | Branches if $(s_j)$ is positive | 016xjx m<sub>1</sub> m<sub>2</sub> |
| jm | s<sub>j</sub>,exp | Branches if $(s_j)$ is negative | 017xjx m<sub>1</sub> m<sub>2</sub> |
| jcs | exp | Jumps to constant parcel if Semaphore flag is clear; sets Semaphore flag. | 004xxx m<sub>1</sub> m<sub>2</sub> |
| jss | exp | Jumps to constant parcel if Semaphore flag is set; sets Semaphore flag. | 005xxx m<sub>1</sub> m<sub>2</sub> |

## A.2.2 UNCONDITIONAL JUMPS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| j | exp | Unconditional jump | 003xxx m<sub>1</sub> m<sub>2</sub> |
| r,a<sub>i</sub> | a<sub>k</sub> | Register jump to $(a_k)$ with return address to $a_i$ | 002ixk |
| j | a<sub>k</sub> | Register jump to $(a_k)$; value in $a_k$ is erased | 002kxk |

## A.2.3 EXITS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| err | | Error exit | 000x00 |
| exit | | Normal exit | 000x01 |
| exit | exp | Normal exit | 000xjk |

## A.3 PASS INSTRUCTIONS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| pass | | Pass | 076xxx |
| pass | exp | Pass | 076ijk |

# A.4 SEMAPHORE INSTRUCTIONS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| ssm | | Sets Semaphore flag | 006xxx |
| csm | | Clears Semaphore flag | 007xxx |

## A.5 REGISTER ENTRY INSTRUCTIONS

Instructions that load the A or S registers are listed in this group.
## A.5.1 ENTRIES INTO A REGISTERS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | exp | Loads $a_i$ with a value | 026<i>ijk</i> or<br>027<i>ijk</i> or<br>040<i>ijk</i> m<sub>1</sub> or<br>041<i>ijk</i> m<sub>1</sub> or<br>042<i>ijk</i> m<sub>1</sub> m<sub>2</sub> † |
| a<sub>i</sub> | exp,s | Loads $a_i$ with a 6-bit value | 026<i>ijk</i> or<br>027<i>ijk</i> †† |
| a<sub>i</sub> | exp,s,p | Loads $a_i$ with a 6-bit positive value | 026<i>ijk</i> ††† |
| a<sub>i</sub> | exp,s,m | Loads $a_i$ with a 6-bit negative value | 027<i>ijk</i> ††† |
| a<sub>i</sub> | exp,p | Loads $a_i$ with a 16-bit value | 040<i>ixx</i> m<sub>1</sub> or<br>041<i>ixx</i> m<sub>1</sub> †† |
| a<sub>i</sub> | exp,p,p | Loads $a_i$ with a 16-bit positive value | 040<i>ixx</i> m<sub>1</sub> ††† |
| a<sub>i</sub> | exp,p,m | Loads $a_i$ with a 16-bit negative value | 041<i>ixx</i> m<sub>1</sub> ††† |
| a<sub>i</sub> | exp,h | Loads $a_i$ with a 32-bit value | 042<i>ixx</i> m<sub>1</sub> m<sub>2</sub> ††† |

† Forces one of five opcodes

†† Forces one of two opcodes

††† Forces a single opcode

## A.5.2 ENTRIES INTO S REGISTERS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | exp | Loads $s_i$ with a value | 050<i>ixx</i> m<sub>1</sub> m<sub>2</sub> or<br>051<i>ixx</i> m<sub>1</sub> m<sub>2</sub> or<br>052<i>ixx</i> m<sub>1</sub> m<sub>2</sub> or<br>053<i>ixx</i> m<sub>1</sub> m<sub>2</sub> m<sub>3</sub> m<sub>4</sub> or<br>116<i>ijk</i> or<br>117<i>ijk</i> † |
| s<sub>i</sub> | exp,s | Loads $s_i$ with a 6-bit value | 116<i>ijk</i> or<br>117<i>ijk</i> †† |
| s<sub>i</sub> | exp,s,p | Loads $s_i$ with a 6-bit positive value | 116<i>ijk</i> ††† |
| s<sub>i</sub> | exp,s,m | Loads $s_i$ with a 6-bit negative value | 117<i>ijk</i> ††† |
| s<sub>i</sub> | exp,h | Loads $s_i$ with a 32-bit value | 050<i>ixx</i> m<sub>1</sub> m<sub>2</sub> or<br>051<i>ixx</i> m<sub>1</sub> m<sub>2</sub> †† |
| s<sub>i</sub> | exp,h,p | Loads $s_i$ with a 32-bit positive value | 050<i>ixx</i> m<sub>1</sub> m<sub>2</sub> ††† |
| s<sub>i</sub> | exp,h,m | Loads $s_i$ with a 32-bit negative value | 051<i>ixx</i> m<sub>1</sub> m<sub>2</sub> ††† |
| s<sub>i</sub> | exp,l | Loads $s_i$ left side with a 32-bit value | 052<i>ixx</i> m<sub>1</sub> m<sub>2</sub> ††† |
| s<sub>i</sub> | exp,f | Loads $s_i$ with a 64-bit value | 053<i>ixx</i> m<sub>1</sub> m<sub>2</sub> m<sub>3</sub> m<sub>4</sub> ††† |

† Forces one of six opcodes

†† Forces one of two opcodes

††† Forces a single opcode

## A.5.3 ENTRIES INTO V REGISTERS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | 0 | Clears $v_i$ | 143<i>iii</i> † |

† Special syntax form

## A.6 INTER-REGISTER TRANSFER INSTRUCTIONS

Instructions in this group provide for transferring the contents of one register to another register. In some cases, the register contents can be complemented, converted to floating-point format, or sign extended as a function of the transfer.
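The difference between a transfer with sign extension and one without can be sketched in Python. The sketch assumes a 32-bit A register and a 64-bit S register; these widths and the function names are assumptions made for illustration and are not stated in this appendix.

```python
A_BITS = 32   # assumed A register width
S_BITS = 64   # assumed S register width


def copy_no_sign_extension(a_value):
    """Model of `s_i a_k`: the upper bits of the result are zero."""
    return a_value & ((1 << A_BITS) - 1)


def copy_sign_extension(a_value):
    """Model of `s_i +a_k`: the top bit of the A value is replicated upward."""
    low = a_value & ((1 << A_BITS) - 1)
    if low & (1 << (A_BITS - 1)):                      # sign bit is set
        low |= ((1 << S_BITS) - 1) & ~((1 << A_BITS) - 1)
    return low


minus_one = 0xFFFFFFFF
print(hex(copy_no_sign_extension(minus_one)))   # 0xffffffff
print(hex(copy_sign_extension(minus_one)))      # 0xffffffffffffffff
```

A nonnegative A value gives the same result under both forms; the two differ only when the top bit of the A value is set.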
## A.6.1 TRANSFERS TO A REGISTERS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | s<sub>j</sub> | Copies $(s_j)$ to $a_i$ | 024<i>ijx</i> |
| a<sub>i</sub> | vl | Copies (vl) to $a_i$ | 025<i>ixx</i> |

## A.6.2 TRANSFERS TO S REGISTERS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub> | Copies $(s_j)$ to $s_i$ ($j=k$) | 103<i>ijj</i> |
| s<sub>i</sub> | a<sub>k</sub> | Copies $(a_k)$ to $s_i$ with no sign extension | 130<i>ixk</i> |
| s<sub>i</sub> | +a<sub>k</sub> | Copies $(a_k)$ to $s_i$ with sign extension | 131<i>ixk</i> |
| s<sub>i</sub> | vm | Copies (vm) to $s_i$ | 114<i>ixx</i> |
| s<sub>i</sub> | rt | Copies real-time count to $s_i$ | 115<i>ixx</i> |

## A.6.3 TRANSFERS TO V REGISTERS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub> | Copies $(s_j)$ to $v_i$ | 144<i>iji</i> † |
| v<sub>i</sub> | v<sub>j</sub> | Copies $(v_j)$ to $v_i$ ($j=k$) | 145<i>ijj</i> |
| v<sub>i</sub> | -v<sub>k</sub> | Copies twos complement of $(v_k)$ to $v_i$ | 163<i>iji</i> † |
| v<sub>i</sub> | -fv<sub>k</sub> | Copies normalized negative of $(v_k)$ to $v_i$ | 173<i>iik</i> † |

† Special syntax form

## A.6.4 TRANSFER TO VECTOR MASK REGISTER

The following syntax and its special form transmit the contents of register $S_j$ to the VM register. The VM register is zeroed if the $j$ designator is 0; the special form accommodates this case. This instruction can be used in conjunction with the vector merge instructions where an operation is performed depending on the VM register contents.
| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| vm | s<sub>j</sub> | Copies $(s_j)$ to vm | 034<i>xjx</i> |

## A.6.5 TRANSFER TO VECTOR LENGTH REGISTER

The following syntax and its special form enter the low-order 7 bits of the contents of register $A_k$ into the VL register. The VL register contents determine the number of operations performed by a vector instruction. Since a Vector register has 64 elements, from 1 to 64 operations can be performed. The number of operations is (VL) modulo 64. A special case exists: when (VL) modulo 64 is 0, the number of operations performed is 64.

In this manual, a reference to register $V_i$ implies operations involving the first $n$ elements, where $n$ is the vector length, unless a single element is explicitly noted as in the instructions $S_i\ V_j,A_k$ and $V_i,A_k\ S_j$.

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| vl | a<sub>k</sub> | Copies $(a_k)$ to vl | 036<i>xxk</i> |

Vector operations controlled by the VL register contents begin with element 0 of the Vector registers.

## A.7 MEMORY TRANSFER INSTRUCTIONS

This category includes instructions that transfer data between registers and memory.

## A.7.1 STORES

Several instructions store data from registers into memory.
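The vector stores listed below transfer the number of elements selected by the VL register (A.6.5). A minimal Python sketch of that count, following the rule stated in A.6.5, is given here; the function name is hypothetical.

```python
def vector_operation_count(a_k):
    """Number of element operations after `vl a_k`, per the rule in A.6.5."""
    vl = a_k & 0o177                       # low-order 7 bits of (a_k) enter VL
    count = vl % 64                        # the count is (VL) modulo 64 ...
    return 64 if count == 0 else count     # ... except that 0 selects 64


for value in (1, 63, 64, 65, 0):
    print(value, vector_operation_count(value))
```

Under this rule a full-length operation can be requested with either 0 or 64 in $A_k$, and a value of 65 selects a single operation.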
## Local Memory writes

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| [exp] | a<sub>k</sub> | Writes $(a_k)$ to location <i>exp</i> in Local Memory | 045<i>xxk</i> m<sub>1</sub> |
| [a<sub>k</sub>] | a<sub>j</sub> | Writes $(a_j)$ to location $(a_k)$ in Local Memory | 047<i>xjk</i> |
| [exp] | s<sub>j</sub> | Writes $(s_j)$ to location <i>exp</i> in Local Memory | 055<i>xjx</i> m<sub>1</sub> |
| [a<sub>k</sub>] | s<sub>i</sub> | Writes $(s_i)$ to location $(a_k)$ in Local Memory | 057<i>ixk</i> |
| [a<sub>k</sub>] | v<sub>i</sub> | Writes $(v_i)$ to Local Memory location $(a_k)$ | 075<i>ixk</i> |

## Common Memory writes

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| (exp) | s<sub>i</sub> | Writes $(s_i)$ to Common Memory at location <i>exp</i> | 067<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |
| (a<sub>k</sub>) | s<sub>i</sub> | Writes $(s_i)$ to Common Memory at location $(a_k)$ | 063<i>ixk</i> |
| (a<sub>k</sub>,exp) | s<sub>i</sub> | Writes $(s_i)$ to Common Memory at location $(a_k)$+<i>exp</i> | 065<i>ixk</i> m<sub>1</sub> m<sub>2</sub> |
| (a<sub>j</sub>,a<sub>k</sub>) | s<sub>i</sub> | Writes $(s_i)$ to Common Memory at location $(a_j)+(a_k)$ | 061<i>ijk</i> |
| (a<sub>j</sub>,a<sub>k</sub>) | v<sub>i</sub> | Writes $(v_i)$ to Common Memory location $(a_j)$ incremented by $(a_k)$ | 071<i>ijk</i> |
| (a<sub>k</sub>,v<sub>j</sub>) | v<sub>i</sub> | Scatters $(v_i)$ to Common Memory locations $(a_k)+(v_j)$ | 073<i>ijk</i> |

## A.7.2 LOADS

Several instructions can be used to load data from memory into registers.
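The Common Memory address forms shared by the stores above and the loads that follow can be sketched as address generators. This Python sketch assumes that "$(a_j)$ incremented by $(a_k)$" means element $n$ is addressed at $(a_j) + n \times (a_k)$, and that the gather and scatter forms pair element $n$ of $v_j$ with element $n$ of $v_i$. Both readings, the dictionary used as a stand-in for Common Memory, and all function names are assumptions.

```python
def strided_addresses(a_j, a_k, vector_length):
    """Addresses for the (a_j,a_k) vector form: base a_j, stride a_k (assumed)."""
    return [a_j + n * a_k for n in range(vector_length)]


def indexed_addresses(a_k, v_j, vector_length):
    """Addresses for the (a_k,v_j) form: base a_k plus each index element."""
    return [a_k + v_j[n] for n in range(vector_length)]


def scatter(memory, a_k, v_j, v_i, vector_length):
    """Model of `(a_k,v_j) v_i`: one word written per index element."""
    for addr, value in zip(indexed_addresses(a_k, v_j, vector_length), v_i):
        memory[addr] = value


def gather(memory, a_k, v_j, vector_length):
    """Model of `v_i (a_k,v_j)`: one word read per index element."""
    return [memory[addr] for addr in indexed_addresses(a_k, v_j, vector_length)]


memory = {}
scatter(memory, 1000, [0, 8, 3], [11, 22, 33], 3)
print(strided_addresses(100, 4, 3))         # [100, 104, 108]
print(gather(memory, 1000, [3, 0, 8], 3))   # [33, 11, 22]
```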
## Local Memory reads

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | [exp] | Reads from location <i>exp</i> in Local Memory to $a_i$ | 044<i>ixx</i> m<sub>1</sub> |
| a<sub>i</sub> | [a<sub>k</sub>] | Reads from location $(a_k)$ in Local Memory to $a_i$ | 046<i>ixk</i> |
| s<sub>i</sub> | [exp] | Reads from location <i>exp</i> in Local Memory to $s_i$ | 054<i>ixx</i> m<sub>1</sub> |
| s<sub>i</sub> | [a<sub>k</sub>] | Reads from location $(a_k)$ in Local Memory to $s_i$ | 056<i>ixk</i> |
| v<sub>i</sub> | [a<sub>k</sub>] | Reads from Local Memory location $(a_k)$ to $v_i$ | 074<i>ixk</i> |

## Common Memory reads

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | (exp) | Reads from Common Memory location <i>exp</i> to $s_i$ | 066<i>ixx</i> m<sub>1</sub> m<sub>2</sub> |
| s<sub>i</sub> | (a<sub>k</sub>) | Reads from Common Memory at location $(a_k)$ to $s_i$ | 062<i>ixk</i> |
| s<sub>i</sub> | (a<sub>k</sub>,exp) | Reads from Common Memory at location $(a_k)$+<i>exp</i> to $s_i$ | 064<i>ixk</i> m<sub>1</sub> m<sub>2</sub> |
| s<sub>i</sub> | (a<sub>j</sub>,a<sub>k</sub>) | Reads from Common Memory location $(a_j)+(a_k)$ to $s_i$ | 060<i>ijk</i> |
| v<sub>i</sub> | (a<sub>j</sub>,a<sub>k</sub>) | Reads from Common Memory location $(a_j)$ incremented by $(a_k)$ to $v_i$ | 070<i>ijk</i> |
| v<sub>i</sub> | (a<sub>k</sub>,v<sub>j</sub>) | Gathers from Common Memory locations $(a_k)+(v_j)$ to $v_i$ | 072<i>ijk</i> |

## Memory Range Error flags

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| dri | | Disables halt on memory field range error | 035<i>xx</i>0 |
| eri | | Enables halt on memory field range error | 035<i>xx</i>1 |

## A.8 INTEGER ARITHMETIC OPERATION INSTRUCTIONS

Integer arithmetic operations obtain operands from registers and return results to registers. No direct memory references are allowed.

## A.8.1 INTEGER SUMS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | a<sub>j</sub>+a<sub>k</sub> | Integer sum of $(a_j)$ and $(a_k)$ to $a_i$ | 020<i>ijk</i> |
| s<sub>i</sub> | s<sub>j</sub>+s<sub>k</sub> | Integer sum of $(s_j)$ and $(s_k)$ to $s_i$ | 104<i>ijk</i> |
| v<sub>i</sub> | s<sub>j</sub>+v<sub>k</sub> | Integer sums of $(s_j)$ and $(v_k)$ to $v_i$ | 160<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>+v<sub>k</sub> | Integer sums of $(v_j)$ and $(v_k)$ to $v_i$ | 161<i>ijk</i> |

## A.8.2 INTEGER DIFFERENCES

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | a<sub>j</sub>-a<sub>k</sub> | Integer difference of $(a_j)$ and $(a_k)$ to $a_i$ | 021<i>ijk</i> |
| s<sub>i</sub> | s<sub>j</sub>-s<sub>k</sub> | Integer difference of $(s_j)$ and $(s_k)$ to $s_i$ | 105<i>ijk</i> |
| v<sub>i</sub> | s<sub>j</sub>-v<sub>k</sub> | Integer differences of $(s_j)$ and $(v_k)$ to $v_i$ | 162<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>-v<sub>k</sub> | Integer differences of $(v_j)$ and $(v_k)$ to $v_i$ | 163<i>ijk</i> |

## A.8.3 INTEGER PRODUCTS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| a<sub>i</sub> | a<sub>j</sub>*a<sub>k</sub> | Integer product of $(a_j)$ and $(a_k)$ to $a_i$ | 022<i>ijk</i> |

## A.9 FLOATING-POINT ARITHMETIC INSTRUCTIONS

All floating-point arithmetic operations use registers as the source of operands and return results to registers.
## A.9.1 FLOATING-POINT SUMS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>+fs<sub>k</sub> | Floating-point sum of $(s_j)$ and $(s_k)$ to $s_i$ | 120<i>ijk</i> |
| v<sub>i</sub> | s<sub>j</sub>+fv<sub>k</sub> | Floating-point sums of $(s_j)$ and $(v_k)$ to $v_i$ | 170<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>+fv<sub>k</sub> | Floating-point sums of $(v_j)$ and $(v_k)$ to $v_i$ | 171<i>ijk</i> |

## A.9.2 RECIPROCAL ITERATIONS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>*is<sub>k</sub> | Reciprocal iteration step, $2-(s_j)*(s_k)$ to $s_i$ | 126<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>*iv<sub>k</sub> | Reciprocal iteration step, $2-(v_j)*(v_k)$ to $v_i$ | 156<i>ijk</i> |

## A.9.3 RECIPROCAL APPROXIMATIONS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | /hs<sub>j</sub> | Floating-point reciprocal approximation of $(s_j)$ to $s_i$ | 132<i>ijx</i> |
| v<sub>i</sub> | /hv<sub>k</sub> | Floating-point reciprocal approximation of $(v_k)$ to $v_i$ | 166<i>ixk</i> |

## A.9.4 FLOATING-POINT DIFFERENCES

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>-fs<sub>k</sub> | Floating-point difference of $(s_j)$ and $(s_k)$ to $s_i$ | 121<i>ijk</i> |
| v<sub>i</sub> | s<sub>j</sub>-fv<sub>k</sub> | Floating-point differences of $(s_j)$ and $(v_k)$ to $v_i$ | 172<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>-fv<sub>k</sub> | Floating-point differences of $(v_j)$ and $(v_k)$ to $v_i$ | 173<i>ijk</i> |

## A.9.5 FLOATING-POINT TO INTEGER CONVERSIONS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | fix,s<sub>k</sub> | Converts $(s_k)$ from floating-point to integer and enters the result into $s_i$ | 122<i>ixk</i> |
| v<sub>i</sub> | fix,v<sub>k</sub> | Integer form of floating-point $(v_k)$ to $v_i$ | 174<i>ixk</i> |

## A.9.6 INTEGER TO FLOATING-POINT CONVERSIONS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | flt,s<sub>k</sub> | Converts $(s_k)$ from integer to floating-point and enters the result into $s_i$ | 123<i>ixk</i> |
| v<sub>i</sub> | flt,v<sub>k</sub> | Floating-point form of integer $(v_k)$ to $v_i$ | 175<i>ixk</i> |

## A.9.7 FLOATING-POINT PRODUCTS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>*fs<sub>k</sub> | Floating-point product of $(s_j)$ and $(s_k)$ to $s_i$ | 124<i>ijk</i> |
| v<sub>i</sub> | s<sub>j</sub>*fv<sub>k</sub> | Floating-point products of $(s_j)$ and $(v_k)$ to $v_i$ | 154<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>*fv<sub>k</sub> | Floating-point products of $(v_j)$ and $(v_k)$ to $v_i$ | 155<i>ijk</i> |

## A.9.8 SQUARE ROOT ITERATIONS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>*qs<sub>k</sub> | Square root iteration of $[3-(s_j)*(s_k)]/2$ to $s_i$ | 127<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>*qv<sub>k</sub> | Square root iteration of $[3-(v_j)*(v_k)]/2$ to $v_i$ | 157<i>ijk</i> |

## A.9.9 SQUARE ROOT APPROXIMATIONS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | *qs<sub>j</sub> | Square root approximation of $(s_j)$ to $s_i$ | 133<i>ijx</i> |
| v<sub>i</sub> | *qv<sub>k</sub> | Square root approximation of $(v_k)$ to $v_i$ | 167<i>ixk</i> |

## A.9.10 FLOATING-POINT ERRORS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| dfi | | Disables halt on floating-point error | 035<i>xx</i>2 |
| efi | | Enables halt on floating-point error | 035<i>xx</i>3 |

## A.10 LOGICAL OPERATION INSTRUCTIONS

Instructions that perform logical products, logical sums, vector streaming, logical differences, vector mask operations, or compressed iota are listed in this group.

## A.10.1 LOGICAL PRODUCTS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>&s<sub>k</sub> | Logical product of $(s_j)$ and $(s_k)$ to $s_i$ | 100<i>ijk</i> |
| s<sub>i</sub> | #s<sub>k</sub>&s<sub>j</sub> | Logical product of $(s_j)$ and complement of $(s_k)$ to $s_i$ | 101<i>ijk</i> |
| v<sub>i</sub> | s<sub>j</sub>&v<sub>k</sub> | Logical products of $(s_j)$ and $(v_k)$ to $v_i$ | 140<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>&v<sub>k</sub> | Logical products of $(v_j)$ and $(v_k)$ to $v_i$ | 141<i>ijk</i> |

## A.10.2 LOGICAL SUMS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>!s<sub>k</sub> | Logical sum of $(s_j)$ and $(s_k)$ to $s_i$ | 103<i>ijk</i> |
| v<sub>i</sub> | s<sub>j</sub>!v<sub>k</sub> | Logical sums of $(s_j)$ and $(v_k)$ to $v_i$ | 144<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>!v<sub>k</sub> | Logical sums of $(v_j)$ and $(v_k)$ to $v_i$ | 145<i>ijk</i> |

## A.10.3 VECTOR STREAMING

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | s<sub>j</sub>!v<sub>k</sub>&vm | Transmits $(s_j)$ to $v_i$ where the vm bit is 1 and $(v_k)$ where the vm bit is 0 | 146<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>!v<sub>k</sub>&vm | Transmits $(v_j)$ to $v_i$ where the vm bit is 1 and $(v_k)$ where the vm bit is 0 | 147<i>ijk</i> |

## A.10.4 LOGICAL DIFFERENCES

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>j</sub>\s<sub>k</sub> | Logical difference of $(s_j)$ and $(s_k)$ to $s_i$ | 102<i>ijk</i> |
| v<sub>i</sub> | s<sub>j</sub>\v<sub>k</sub> | Logical differences of $(s_j)$ and $(v_k)$ to $v_i$ | 142<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>\v<sub>k</sub> | Logical differences of $(v_j)$ and $(v_k)$ to $v_i$ | 143<i>ijk</i> |

## A.10.5 VECTOR MASK

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| vm | v<sub>k</sub>,z | Sets vm from zero elements of $(v_k)$ | 030<i>xxk</i> |
| vm | v<sub>k</sub>,n | Sets vm from nonzero elements of $(v_k)$ | 031<i>xxk</i> |
| vm | v<sub>k</sub>,p | Sets vm from positive elements of $(v_k)$ | 032<i>xxk</i> |
| vm | v<sub>k</sub>,m | Sets vm from negative elements of $(v_k)$ | 033<i>xxk</i> |

## A.10.6 COMPRESSED IOTA

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| v<sub>i</sub> | ci,s<sub>j</sub>&s<sub>k</sub> | Enters $v_i$ with compressed iota $(s_j)$ and $(s_k)$ | 176<i>ijk</i> |

## A.11 BIT COUNT INSTRUCTIONS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | ps<sub>j</sub> | Population count of $(s_j)$ to $s_i$ | 106<i>ij</i>0 |
| v<sub>i</sub> | pv<sub>j</sub> | Population count of $(v_j)$ to $v_i$ | 164<i>ij</i>0 |
| s<sub>i</sub> | qs<sub>j</sub> | Population count of parity of $(s_j)$ to $s_i$ | 106<i>ij</i>1 |
| v<sub>i</sub> | qv<sub>j</sub> | Population count of parity of $(v_j)$ to $v_i$ | 164<i>ij</i>1 |
| s<sub>i</sub> | zs<sub>j</sub> | Leading zero count of $(s_j)$ to $s_i$ | 107<i>ijx</i> |
| v<sub>i</sub> | zv<sub>j</sub> | Leading zero count of $(v_j)$ to $v_i$ | 165<i>ijx</i> |

## A.12 SHIFT INSTRUCTIONS

Instructions that perform left or right shifts are listed in this group.

## A.12.1 LEFT SHIFTS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>i</sub>&lt;exp | Shifts $(s_i)$ left <i>exp</i>=64-<i>jk</i> places to $s_i$ | 110<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>&lt;a<sub>k</sub> | Shifts $(v_j)$ left $(a_k)$ bits with zero-fill. Results to $v_i$. | 150<i>ijk</i> |
| s<sub>i</sub> | s<sub>i</sub>,s<sub>j</sub>&lt;a<sub>k</sub> | Shifts ($s_i$ and $s_j$) left $(a_k)$ places to $s_i$ | 112<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>,v<sub>j</sub>&lt;a<sub>k</sub> | Double shift $(v_j)$ left $(a_k)$ places to $v_i$ | 152<i>ijk</i> |

## A.12.2 RIGHT SHIFTS

| Result | Operand | Description | Machine<br>Instruction |
|--------|---------|-------------|------------------------|
| s<sub>i</sub> | s<sub>i</sub>>exp | Shifts $(s_i)$ right <i>exp</i>=<i>jk</i> places to $s_i$ | 111<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>>a<sub>k</sub> | Shifts $(v_j)$ right $(a_k)$ bits with zero-fill. Results to $v_i$. | 151<i>ijk</i> |
| s<sub>i</sub> | s<sub>j</sub>,s<sub>i</sub>>a<sub>k</sub> | Shifts ($s_j$ and $s_i$) right $(a_k)$ places to $s_i$ | 113<i>ijk</i> |
| v<sub>i</sub> | v<sub>j</sub>,v<sub>j</sub>>a<sub>k</sub> | Double shift $(v_j)$ right $(a_k)$ places to $v_i$ | 153<i>ijk</i> |
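The scalar shift counts in A.12 are carried in the <i>jk</i> designators: a left shift of <i>exp</i> places is encoded as <i>jk</i> = 64 - <i>exp</i>, and a right shift as <i>jk</i> = <i>exp</i>. The Python sketch below packs such an instruction into one parcel. It assumes a 16-bit parcel made of a 7-bit opcode followed by 3-bit <i>i</i>, <i>j</i>, and <i>k</i> designators, which is consistent with the three-octal-digit opcodes in these tables but is not stated in this appendix; the function names are hypothetical.

```python
def pack_parcel(opcode, i, j, k):
    """Pack an opcode and three designators into one parcel (assumed layout)."""
    assert 0 <= opcode < 0o200 and all(0 <= d < 8 for d in (i, j, k))
    return (opcode << 9) | (i << 6) | (j << 3) | k


def scalar_shift_parcel(i, exp, left):
    """Encode `s_i s_i<exp` (110ijk) or `s_i s_i>exp` (111ijk)."""
    jk = 64 - exp if left else exp
    if not 0 <= jk <= 0o77:
        raise ValueError("shift count does not fit in the jk designators")
    opcode = 0o110 if left else 0o111
    return pack_parcel(opcode, i, jk >> 3, jk & 7)


print(format(scalar_shift_parcel(2, 3, left=True), "06o"))    # 110275
print(format(scalar_shift_parcel(2, 3, left=False), "06o"))   # 111203
```

With a 6-bit <i>jk</i> field, this encoding admits left shifts of 1 through 64 places and right shifts of 0 through 63 places.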