The history of the RIM10B loader is included at the end of this
   document.
   
   This program is the bootstrap loader that was used to do a "cold
   start" on a PDP-10 (model KA or KI) via the paper-tape reader. It
   reads 36-bit words from 6 consecutive 8-bit paper-tape frames, loads
   them into core memory, and verifies checksums. The entire program is
   less than 16 words long, and therefore can fit into locations 000000
   through 000017 (octal) - the locations used by the accumulators when
   the "FM Enable" switch is off. (FM = Fast Memory = 16 accumulators, 36
   bits each, implemented as flip-flops instead of core memory.)
   
                                 RIM10B Loader
                                       
                        RIM10B  ; Causes RIM10B loader to be punched
00/   777762,,0                 XWD     -16,0
01/   710600,,60        ST:     CONO    PTR,60
02/   541400,,4         ST1:    HRRI    A,RD+1
03/   710740,,10        RD:     CONSO   PTR,10
04/   254000,,3                  JRST   .-1
05/   710470,,7                 DATAI   PTR,@TBL1-RD+1(A)
06/   256010,,7                 XCT          TBL1-RD+1(A)
07/   256010,,12                 XCT         TBL2-RD+1(A)
10/   364400,,0         A:      SOJA    A,      ; Magic occurs here ****
11/   312740,,16        TBL1:   CAME    CKSM,ADR
12/   270756,,1                  ADD    CKSM,1(ADR)
13/   331740,,16                SKIPL   CKSM,ADR
14/   254200,,1         TBL2:    JRST   4,ST
15/   253700,,3         AOBJN   ADR,RD
16/   254000,,2         ADR:    JRST    ST1
17/                     CKSM=ADR+1

   Here is an example of a two-word program as output by RIM10B
17/   777776,,777               LOC     1000    ; Set starting address
20/   201740,,3777      START:  MOVEI   17,4000-1
21/   505740,,777600            HLRI    17,-200
22/   707677,,4576                              ; Sum of previous 3 words
23/   254000,,1000              END     START

                                   Analysis
                                       
   RIM
          When the Read-In Mode (RIM) switch is pressed on the console of
          a KA or KI, it sends a reset pulse down the I/O bus, sets the
          PC flags to zero, and executes "DATAI D,0" (where D is the
          device code selected by a set of 7 switches, the paper tape
          reader is device 104). The DATAI reads in an IOWD, which has
          the negative word count in the left half and starting address
          minus one in the right half. The CPU then repeatedly executes
          "BLKI D,0" until the left half of location 0 reaches zero.
          ("BLKI D,X" increments both halves of location X, reads in a
          word from device D, and stores it the address that the right
          half of location X now points to.)
          
   00/ XWD -16,0
          Transfer 16 octal (14 decimal) words, starting at location 1.
          
   01/ST: CONO PTR,60
          Start paper tape reader in binary mode
          
   02/ST1: HRRI A,RD+1
          Reset finite-state machine to looking for IOWD
          
State RD+1 = Looking for IOWD or JRST

   03/RD: CONSO PTR,10
          Read paper tape reader status, skip if "DONE" bit is set
          
   04/ JRST .-1
          Not set, keep looping until the bit does get set
          
   05/ DATAI PTR,@TBL1-RD+1(A)
          Index register A has RD+1, indexing TBL1-RD+1+RD+1 is TBL1+2,
          which is the SKIPL CKSM,ADR instruction, therefore the
          effective address is ADR. Store the IOWD in ADR.
          
   06/ XCT TBL1-RD+1(A)
          Same effective address, "SKIPL CKSM,ADR" loads the IOWD into
          accumulator CKSM, and skips next instruction because its
          negative.
          
   07/ XCT TBL2-RD+1(A)
          Not executed first time around. At the end of the tape, a JRST
          instruction will be read in instead of an IOWD. (JRST is opcode
          254, which is postitive). TBL2-RD+1+RD+1 is TBL2+2, which is
          ADR. The JRST instruction which was just read in is executed,
          and that causes the PC to jump to the beginning of the program.
          
   10/A: SOJA A,RD+1
          Set the PC to RD+1, subtract one from index register A (so it
          now has RD in the right half, then jump to the original address
          (RD+1).
          
          Note: This is a self-modifying instruction. The CPU, however,
          remembers the effective address that the instruction used to
          have.
          
   04/ JRST .-1
          Jump to location 3.
          
State RD+0 = Reading in data words

   03/RD: CONSO PTR,10
          Read paper tape reader status, skip if "DONE" bit is set
          
   04/ JRST .-1
          Not set, keep looping until the bit does get set
          
   05/ DATAI PTR,@TBL1-RD+1(A)
          Index register A has RD+0, indexing TBL1-RD+1+RD+0 is TBL1+1,
          which is the ADD CKSM,1(ADR) instruction, therefore the
          effective address is one greater than what ADR points to. Store
          the data in memory.
          
   06/ XCT TBL1-RD+1(A)
          Same effective address, "ADD CKSM,1(ADR)" adds the word read in
          to the additive checksum in accumulator CKSM.
          
   07/ XCT TBL2-RD+1(A)
          The address is TBL2-RD+1+RD+0 which is TBL2+1. That location
          has "AOBJN ADR,RD". Add one to both halves of accumulator ADR.
          If the result is still negative, loop back to RD (location 3).
          If non-negative, continue on at location 10.
          
   10/A: SOJA A,RD+0
          Set the PC to RD+0, subtract one from index register A (so it
          now has RD-1 in the right half, then jump to the original
          address (RD+0).
          
          Note: This is a self-modifying instruction. The CPU, however,
          remembers the effective address that the instruction used to
          have.
          
State RD-1 = Reading in checksum

   03/RD: CONSO PTR,10
          Read paper tape reader status, skip if "DONE" bit is set
          
   04/ JRST .-1
          Not set, keep looping until the bit does get set
          
   05/ DATAI PTR,@TBL1-RD+1(A)
          Index register A has RD-1, indexing TBL1-RD+1+RD-1 is TBL1+0,
          which is the CAME CKSM,ADR instruction, therefore the effective
          address is ADR. Store the expected checksum in ADR.
          
   06/ XCT TBL1-RD+1(A)
          Same effective address, "CAME CKSM,ADR" compares the calculated
          checksum in accumulator CKSM with the expected checksum stored
          in memory location ADR. Skip the next instruction if they're
          equal.
          
   07/ XCT TBL2-RD+1(A)
          The address is TBL2-RD+1+RD-1 which is TBL2+0. That location
          has "JRST 4,ST" which is a HALT instruction. If the previous
          compare instruction failed, set the program counter to ST and
          halt the CPU. This allows the operator to back up the paper
          tape reader and try again. If the CAME succeeded, this HALT is
          not executed.
          
   10/A: SOJA A,RD+1
          Set the PC to RD+1, subtract one from index register A (so it
          now has RD-2 in the right half, then jump to the original
          address (RD+1). This jumps to location ST1, which resets the
          finite-state machine.
          
Dispatch table for finite-state machine

   11/TBL1: CAME CKSM,ADR
          In state RD-1, read expected checksum into ADR, then compare
          calculated checksum with expected checksum.
          
   12/ ADD CKSM,1(ADR)
          In state RD+0, store data word into memory, then add data word
          into running checksum.
          
   13/ SKIPL CKSM,ADR
          In state RD+1, store IOWD or JRST in ADR, then load that word
          into accumulator CKSM and skip if the word is negative.
          
   14/TBL2: JRST 4,ST
          If the checksum comparison fails, halt the CPU, with ST in the
          PC.
          
   15/ AOBJN ADR,RD
          In state RD+0, increment the IOWD and jump to RD if more to go.
          
   16/ADR: JRST ST1
          This is the last word of the RIM10B loader. When the hardware
          read-in process is completed, this instruction is executed to
          start the program.
          
   17/CKSM=ADR+1
          The additive checksum is calculated using this accumulator.
          
Storing bootstrap in core memory

   The FM ENB switch enables Fast Memory, causing references to the
   accumulators (locations 00 through 17) to go to RAM instead of core
   memory. When FM ENB is off, the above bootstrap can be toggled into
   locations 01 through 16. (Locations 00 and 17 need not be
   initialized.)
     _________________________________________________________________
   
                                     Notes
                                       
History of the RIM10B loader

   From: Bob Clements
   To: inwap@best.com
   Subject: Re: RIM10B bootstrap loader for the PDP-10
   Newsgroups: alt.folklore.computers,alt.sys.pdp10
   
   Hi, Joe,
   
   As I said in a previous article about RUNOFF, I'm avoiding USENET
   postings until I get a little spare time to fake my address to avoid
   email spam. Feel free to post this if you omit my email address except
   in the form I put on the last line.
>>The true magic of the PDP-10 instruction set was in the RIM-10B loader.
>>The program is 14 instructions long and uses 2 accumlators for data.
>>It reads in 36-bit words from an 8-bit paper tape reader, deposits the
>>data in memory, and verifies the checksums of the program it is loading.

   Add "and can be restarted on a block boundary in case of a checksum
   error", which was another requirement of the task.
>Somewhere I read the story of how it was created; a really bright hacker
>was tricked into creating it when his colleagues kept saying that it
>couldn't be done.  Anyone have the details?
>       -Joe

   No trickery. Just the challenge of doing it.
   
   I think I posted this some years ago. If anyone has an old copy they
   can compare my current fading memory to the old version. Here's how I
   remember it now.
   
   This loader was written in an all-out brainstorming effort. It
   happened at DEC, Maynard, building 5 rather than at TMRC or Project
   MAC where other such fests happened.
   
   Somehow the challenge came up of writing a paper tape loader that
   would not require the use of any fixed memory locations. The idea was
   that any program might be loaded in pieces, and you wouldn't want to
   clobber any previous part with storage/code used by the loader. Also,
   to take dumps of a dead program, we didn't want to clobber any core.
   This should fit entirely in the ACs.
   
   A few previous attempts had been done, but they all took somewhat more
   than sixteen words. Finally, a bunch of serious bit bummers decided to
   work on it and get it solved.
   
   My memory may be wrong, but I think the group that worked on it
   included Alan Kotok, Tom Eggers, Dave Gross and myself, and a couple
   more whom I'm not so sure of. Maybe Peter Hurley? Maybe Tom Hastings?
   
   Anyway, we came up with a LOT of ways of doing it in fifteen
   instructions plus the two registers to hold the checksum and the AOBJN
   pointer. Seventeen words in all. We considered using location 40 for
   the 17th word, but that didn't feel fair.
   
   Then, out of the blue, Dave Gross came in with this wonderful hack,
   using two indexed XCT instructions, which was totally unlike anything
   else we had tried. It used thirteen instructions plus the registers,
   so it could fit in the ACs WITH the count word needed by the RIM
   hardware! Register zero was actually a spare. Most other attempts had
   used it for the checksum.
   
   There was great rejoicing, and the quiet, reserved Dave Gross actually
   looked quite pleased with himself.
   
   Bob Clements, K1BC, my-last-name at BBN dot COM. (w) +1 617 USE K1BC
   
Disclaimer

   From: JCGreen@ix.netcom.com (John C Green Jr)
   
   By the time the 1971 edition of "PDP-10 Reference Handbook" was
   published there had been so many questions asked by people using it as
   an example of good programming technique that this comment was added
   in the margin:
   
     This loader is written for min-
     imum size and is quite com-
     plex. Do not approach it as a
     simple programming example.
     
Note from Dave

   From: "Dave Gross HLO2-2/B10, pole G13, dtn 225-4317 31-Mar-1998 1403
   -0500"
   Subject: RE: History of Rim10b loader
   Date: Tue, 31 Mar 98 14:03:56 EST
   
   I am the Dave Gross mentioned by Bob Clements in the History of the
   rim10b loader page. I don't remember Peter Hurley or Tom Hastings
   working on the code, but there was someone else. I'm not sure
   who...maybe Peter Conklin or Dave Stone ... who motivated the effort.
   
   The problem was presented to me as a theoretical one: is it possible
   for a paper tape loader to fit in the register space, load data,
   compute and check checksums, and jump to the loaded program when done?
   The others came close as Bob pointed out. My challenge was to "bum"
   one more location out of the loader. I don't remember how I found that
   SOJA hack, but when the dust settled, the loader was 2 instructions
   shorter. Indeed, I nearly broke my arm patting myself on the back.
   
   Then, to my surprise, we actually made use of the loader for most
   paper tapes. The loader was written up in the programming manuals but
   very few programmers could figure it out. I kept getting phone calls
   about that loader for years afterward - right up to the time the 10/20
   line was retired.
   
   Dave
   
Effective address calculation

   Later versions of the "Processor Reference Manual" had this paragraph
   repeated twice:
   
                              PLEASE READ THIS
                                      
     The calculation of E is the first step in the execution of every
     instruction. No other action taken by any instruction, no matter
     what it is, can possibly precede that calculation. There is
     absolutely nothing whatsoever that any instruction can do to any
     accumulator or memory location that can in any way affect its own
     effective address calculation.
     
   Note that "A: AOJA A," does not mean "increment accumulator 10 and
   then set the PC to the current value of that accumulator". Instead,
   the effective address E is calculated first, then the accumulator is
   incremented, then the PC is set to the remembered value of E.