System/380 Principles of Operation - Version 1.0

This document outlines the principles and history behind both MVS/380 1.0 (released in June 2009) and VM/380 0.9 (released in January 2009).

DRIVING FORCE
-------------

The lack of a free C compiler on z/OS meant that some people, both past and present, were unable to cost-justify the expense of a C compiler and thus had to write in a non-preferred language. The free GCC compiler has had an S/370 target for approximately two decades, but it was always a case of "so close but yet so far", as unfortunately the GCC compiler itself had not been written with a non-Unix-based C90 environment in mind. Even just opening a file - open() would often be used instead of fopen(), despite the fact that GCC was basically a text processing application that could have been written in pure C90.

The effort to overcome these problems is a story in itself (documented further down), but the end result was that GCC (which inherently generates 31-bit clean code) was ported to the (free) 24-bit MVS 3.8j operating system. However, the memory constraint of approximately 9 MB, for both the 3 MB executable and its data, placed severe restrictions on how large a program being compiled could be before running out of memory. GCC was, for example, unable to recompile all of itself on MVS. On VM/370, where nearly 16 MB total is available, the situation was different - GCC could indeed recompile itself, although not all modules at full optimization (to do that required approximately 23 MB, including 3 MB for the compiler itself).

Basically, GCC needed to be executed as a 31-bit application rather than being constrained to 24 bits by the operating system and hardware. With z/OS in mind, being 31-bit was never a problem. However, z/OS is not available for free, so hobbyists were not directly able to write software targeting z/OS. Going through hoops, it was possible to verify that z/OS worked, but what was ideally needed was a 31-bit MVS, even if it was substandard compared to z/OS.

Independently of the effort to make GCC work on an EBCDIC platform, a C runtime library (PDPCLIB) had been under development since 1994. Initially written on the PC, when access to C/370 was suddenly made available (in 1997), the mainframe (then OS/390) was able to be targeted. The design of PDPCLIB was such that all OS (in this case, MVS) access was done via one assembler file less than 1000 lines long. GCC meanwhile was of the order of 400,000 lines of C code, which then became 700,000 lines of assembler. The important thing about this generated assembler was that it was "stock standard" - no OS calls, just clean 31-bit-capable code. Whether the executable was 24-bit or 31-bit came down to just 1000 lines of hand-written assembler. And those 1000 lines of code would determine the fate of EVERY C program, not just the GCC compiler itself. And some of those programs require even more memory than GCC.

These 1000 lines of assembler code were eventually made AMODE 31, RMODE ANY for a good z/OS target. The cleanliness of the generated code and the (deliberate) isolation of OS-dependent code had always held out the hope that one day those 1000 lines could be replaced with something that would allow the rest of the compiler to run 31-bit, free from the constraints of a 24-bit operating system. Something like a standalone program.
When it was time to analyze what could be done, it was noted that those 1000 lines could cope with being executed in AMODE 24, even if the caller was running AMODE 31, using ATL (above the 16 MB line) data, because the data that the assembler operated on was almost all obtained by the assembler code itself in a previous call, or resided on the stack (the DSA - dynamic save area - which was allocated in the startup code, the other very small bit of assembler). A simple AMODE switch to 24-bit should have been all that was required.

What this meant was that if there was some way to get into 31-bit mode - noting the possibility that interrupts may need to have been disabled to prevent the 24-bit operating system from interfering and getting confused - the C code would be able to run freely until it hit the assembler, at which point it could switch mode back and reenable interrupts and no-one would be any the wiser. Sophisticated OS services like SPIE would possibly not be available, and multitasking may have needed to be temporarily halted, but none of these things were actually required in the situation in question (a hobbyist system trying to do a large compile, and when complete, return to business as usual). MVS (24-bit) would have already loaded the 31-bit-capable code into memory, so it would just be sitting there waiting for an appropriate machine to execute it. ie something like this sequence:

1. Suspend the entire traditional S/370 machine when ready to enter 31-bit mode.

2. Switch in an artificial machine (resembling S/390 to some extent) that could cope with 31-bit memory addresses, all in real memory, thus allowing application (GCC) logic to access large data structures, but not requiring operating system services.

3. When the application was ready to do I/O, and thus switch back to 24-bit mode ready for interaction with the operating system, at the point of mode transition, switch out the artificial machine and switch in the S/370 machine, which would be unaware that anything had actually happened unless there was some timing issue or interrupt issue.

The question was probably not whether it was possible, but rather how much work was required to construct a machine capable of fulfilling such a requirement. In the end a method was found that only involved about 20 lines of code changes to the S/370 system provided by the Hercules emulator to produce an artificial S/380 system that was able to slot in very cleanly indeed. Interrupts did not need to be disabled. The machine type didn't need to be repeatedly swapped. Real memory was not required.

The simple technique involving mapping memory was first made to work in November 2007 (by Paul Edwards) and the new architecture was formally released in December 2007. A new SVC (233) was introduced (by "somitcw") with MVS/380 0.2 to obtain and release ATL memory. By MVS/380 0.4 (March 2008) the SVC was hidden inside the GETMAIN (thus allowing source code compatibility with z/OS), and this remained the approach until August 2008 (although not formally released until January 2009 with MVS/380 0.9), when Gerhard Postpischil developed a technique to intercept SVC 120, which provided 31-bit binary compatibility between MVS/380 and z/OS (at least for applications that conformed to the 31-bit interface that MVS/380 supported). At the same time (January 2009), CMS had also been modified (by Robert O'Hara) to allow the same source and binary compatibility between VM/380 0.9 and z/VM.
In March 2009, Jason Winter introduced a far more sophisticated flavor of S/380 - providing for memory protection, multiple memory requests and even a semblance of virtual memory support. At the time of writing, this version requires people to roll their own version of Hercules, and is dependent on Jason's JCC compiler. But technically, memory protection is available for those who see its absence as a barrier to adoption. Either implementation (or others in the pipeline) is transparent to the actual application (at least applications that use a heap and thus only make one memory request of the OS - which GCCMVS is able to generate, including when it generates itself).

S/380 HARDWARE ARCHITECTURE
---------------------------

Hercules was used to create the necessary "hardware". The existing S/370 was used as a base, and basically renamed to S/380, to avoid the need to create a 4th machine type. There is a flag to downgrade S/380 to S/370, but it is S/370 that is considered to be an "option".

All instructions that are common to both S/390 and z/Arch, with one exception involving I/O, were added to S/380. The way Hercules is constructed, this was a minor modification. The S/370 I/O remains. This is absolutely essential since all of MVS 3.8j uses it, and the goal is to not have to rewrite MVS, where the complete source code is not even available (not even PL/S-generated "source"). Adding the S/390 instructions means that the BSM instruction is available to switch into 31-bit mode. Some small changes (e.g. not using a fixed 24-bit address mask) were necessary to get S/380 to respect that mode change.

The biggest change was what to do about the ATL (above the 16 MB line) memory. All of MVS 3.8j is 24-bit. Neither the operating system nor any applications ever reference memory above the line. Similarly, the S/370 architecture meant that there was no expectation for more than 16 MB of real memory to be used. All virtual memory references resolved to BTL (below the line) addresses. So there was nothing in existence (it was never logically possible to create such a thing) to interfere with any use of ATL memory. As such, the change required was decidedly simple - simply map any ATL reference into the equivalent ATL address of real memory. This means that all address spaces resolve to the same ATL addresses, so you can only run one 31-bit application at a time if you want to be assured of memory protection. Given the young state of S/380, and the body of current users, in practice this is a non-issue, as most people don't even run one 31-bit application, never mind having a requirement to multitask multiple 31-bit applications. In addition, storage keys were ignored so that the operating system didn't require modifications to set the storage key of ATL memory to 8 for use by problem-state programs. As noted earlier, these tradeoffs don't exist in Jason Winter's version.
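The effect of the mapping can be pictured with a small conceptual sketch in C. This is not the actual Hercules/380 code (the real change was about 20 lines inside Hercules itself); the function names and the stand-in for normal S/370 translation are invented purely to illustrate the rule that BTL references are translated as before while ATL references map straight to the same real storage:

    /* Conceptual sketch only - NOT the actual Hercules/380 code.  It
       expresses the rule described above: references below the 16 MB
       line go through normal S/370 translation, while references above
       the line map straight to the same real address, so every address
       space sees the same ATL storage. */
    #include <stdio.h>

    #define LINE_16MB 0x01000000UL

    /* stand-in for the normal S/370 dynamic address translation */
    static unsigned long dat_translate(unsigned long vaddr)
    {
        return vaddr;   /* details irrelevant to the sketch */
    }

    static unsigned long s380_translate(unsigned long vaddr)
    {
        if (vaddr >= LINE_16MB)
        {
            return vaddr;                /* ATL: virtual equals real */
        }
        return dat_translate(vaddr);     /* BTL: existing S/370 behaviour */
    }

    int main(void)
    {
        printf("%08lx -> %08lx\n", 0x00123456UL, s380_translate(0x00123456UL));
        printf("%08lx -> %08lx\n", 0x01234567UL, s380_translate(0x01234567UL));
        return 0;
    }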
In order to run 31-bit applications under CMS, one more modification was required (that was not required for MVS). Since CMS runs under CP, and neither CP nor CMS are 31-bit aware, when a CMS application does an SVC, CMS doesn't simply load the old PSW to return. It instead constructs a new PSW, losing the old 31-bit status in the process. Actually, CP, running in ECMODE, does save the fact that the interrupt occurred in 31-bit mode, so when returning from hardware interrupts, there is no problem. The problem only arises because CMS runs in BCMODE and thus gets an inherently 24-bit PSW. But it is CMS that needs to decide what the return address will be, and obviously with zero knowledge of 31-bit, it can't construct the required PSW.

This problem was circumvented by saving the return address during an SVC and then, when an LPSW is done, checking the address being returned to, seeing whether it is the same address previously noted and, if so, restoring 31-bit mode prior to returning to the application. When CP is eventually modified to include similar logic, this change can be removed from Hercules/380 (where it doesn't really belong). Note that while this change satisfies the requirements for most SVCs, there are some SVCs that return control to a different address, thus bypassing S/380's ability to detect it. If calling these SVCs, the application is required to switch to 24-bit mode prior to invoking the SVC (this is not particularly onerous, since the application will already have to do such AMODE switches whenever calling the file I/O routines, which are done as calls, and aren't (or weren't) 31-bit clean, at least in the XA days). MVS does not have this problem, as the SVCs are not intercepted and the entire ECMODE (31-bit aware) PSW context is saved and restored on SVC return.

MVS/380 PROGRAMMING INTERFACE
-----------------------------

MEMORY

While there is nothing currently (ie this is subject to change without notice - and Jason Winter has a version where this does not apply) physically preventing an application from directly accessing ATL memory, the official interface is via the normal z/OS GETMAIN with the LOC=ANY parameter. The MVS 3.8j macro was updated to allow this parameter, and the SVC 120 which it invokes is intercepted by an add-on program (SVC120I) that is usually run at system startup. SVC120I also allows the operator to partition the ATL memory to allow 31-bit programs that go through the proper interface to multitask while sharing the memory (although at the time of writing, in the non-JW version, there is nothing preventing such applications from overwriting each other's memory). Programs that use the official interface are portable at both source and binary level to z/OS, since z/OS uses this exact facility to provide memory to applications.

Currently the (non-JW) GETMAIN enhancement does not allow more than one simultaneous ATL memory request from the same program, although if the memory is first freed, it is then available to be reobtained. Depending on your application, this may or may not be a problem. C programs usually use a heap, thus a single request for a large chunk of memory is quite sufficient, and generally preferred for performance. This is certainly the case for users of PDPCLIB, and people using this library can have this done automatically for them, as sketched below.

One more restriction on the (non-JW) GETMAIN is that only requests for a chunk of memory equal to or greater than 16 MB will go to ATL memory. The reason for this is that any such large request would otherwise fail, so there is no harm done by only honoring a single request for a block such as this. However, an application that codes LOC=ANY may do that to signify that it doesn't care if the memory resides above the line, even if it only requests a small amount of memory. So an application that requests 3 blocks of 1 MB of memory would fail on the second request if ATL memory is used, but would succeed if BTL memory is obtained. So ATL memory is (currently, non-JW) reserved for use in a very specific GETMAIN request. This restriction is also expected to be lifted in the future to provide the same facilities that Jason Winter's version has.
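Below is a minimal sketch, in plain C90, of the single-request heap approach. The names, the alignment rule and the 32 MB figure are all hypothetical, and the real PDPCLIB/memmgr code is more elaborate; the point is simply that one large request up front (which, when 16 MB or more, is the request the intercept can direct above the line) is enough to serve many small allocations:

    /* Minimal sketch (all names hypothetical) of why a single memory
       request is enough for a C program: one large block is obtained up
       front and everything else is sub-allocated from it, in the spirit
       of a heap manager.  Freeing of individual pieces is omitted. */
    #include <stdio.h>
    #include <stdlib.h>

    static char *heap_base;    /* the single large block */
    static size_t heap_size;   /* its total size */
    static size_t heap_used;   /* how much has been handed out */

    static int heap_init(size_t size)
    {
        /* shown as a plain C90 malloc(); under PDPCLIB the heap is
           ultimately obtained with one OS memory request, which is the
           request that MVS/380 can satisfy from ATL storage */
        heap_base = malloc(size);
        if (heap_base == NULL) return -1;
        heap_size = size;
        heap_used = 0;
        return 0;
    }

    static void *heap_alloc(size_t bytes)
    {
        void *p;

        bytes = (bytes + 7) & ~(size_t)7;         /* doubleword alignment */
        if (heap_used + bytes > heap_size) return NULL;
        p = heap_base + heap_used;
        heap_used += bytes;
        return p;
    }

    int main(void)
    {
        if (heap_init(32UL * 1024 * 1024) != 0)    /* illustrative 32 MB */
        {
            printf("could not obtain heap\n");
            return EXIT_FAILURE;
        }
        /* every subsequent request is carved from the one block */
        printf("first piece at %p\n", heap_alloc(1000));
        printf("second piece at %p\n", heap_alloc(5000));
        free(heap_base);
        return 0;
    }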
ASSEMBLER

IFOX00 has some constraints that have been lifted, bringing it a little closer to IEV90. IEV90 has been defined as an alias to allow assembly JCL to be more compatible between MVS/380 and z/OS. It is possible to construct assembly JCL in such a way that the same JCL will work on both real IFOX00 and real IEV90, and it is this style that is thus portable between the two systems. Access to the S/390 instructions has been provided by copying SYS1.ZMACLIB into SYS1.MACLIB. This implementation is subject to change without notice.

PDPCLIB

Users of any C program linked with PDPCLIB will need to define 3 standard DDs - SYSIN, SYSPRINT and SYSTERM, corresponding to stdin, stdout and stderr. DCB information will need to be provided for new output datasets, unless the IBM default of RECFM=U is desired. The startup code is designed to expect parameters in either TSO or BATCH style and will adjust automatically. Files opened via fopen() can have a filename of "dd:xxx", which signifies that a DDNAME of "XXX" exists to be opened. Otherwise, the filename will be dynamically allocated to a generated DDNAME (PDPxxx).

Text files (ie second parameter of fopen = "r" or "w") are processed as follows. If the file is F or FB, then trailing blanks will be stripped from records on input and replaced with a single newline character. On writing, the newline is stripped and records are blank-padded. If the line is longer than the LRECL, extraneous characters are silently discarded. If the file is V or VB, then on input the BDW and RDW are stripped, and a newline is added at the end of the record - unless the record consists of a single space, in which case the space is stripped as well. On write, empty lines have a space added. This is consistent with the handling of Variable records on MVS (e.g. when using the ISPF editor), and some standard IBM utilities (e.g. IEBCOMPR) cannot cope with truly empty records (RDW=4). If a line is longer than the maximum LRECL, extra characters are silently dropped. If RECFM=U, on read the BDW is stripped and the byte stream is presented unchanged to the application. Unlike IBM's C compiler, block boundaries don't get newline characters inserted into the byte stream. The reason for this is that doing so would prevent a binary read of a text file from preserving the data, since the block boundaries would disappear in such a scenario. When writing to RECFM=U text files, data is written until a block is full. Unlike IBM's implementation of C, newline characters do not cause a new block to be written. Once again, this allows a binary transmit of a RECFM=U file to have the line separators preserved when the data arrives at, say, the PC side. No special handling of the block boundaries needs to be done.

Binary files (ie second parameter of fopen = "rb" or "wb") are processed as follows. If the file is F or FB, then on input, data will be presented to the application unchanged. On output, data is also written unchanged, except that the last record will be padded with NUL characters if required. If the file is V or VB, then on input the BDW will be stripped, but the full RDW will be presented to the application. This makes the byte stream compatible with what a PC application would see when reading a VB file transferred via ftp with the "rdw" option. On write, an RDW needs to be provided by the application. Any invalid RDW causes a file error condition, and no further data is written - with one exception: an RDW of all-NUL is a signal to discard any further data written. This allows for a binary copy of a V dataset to an F dataset to be copied back to V without change or error, even if NUL-padding was required. (Note that this consideration doesn't apply to text files, since no RDW is provided by the application.) If a provided RDW is greater than the maximum LRECL, then the RDW will be silently adjusted and the extra data silently discarded.

Opening a PDS without a member for read will cause the directory to be read and presented as a byte stream. Any attempt to write to a PDS directory will cause an abend.
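The following is a hedged sketch of both access styles from C. The DDNAMEs (INFILE and OUTFILE) are hypothetical and would be supplied in the JCL, and the RDW constructed for the binary V write uses the conventional layout of a 2-byte big-endian length (which includes the 4-byte RDW itself) followed by two zero bytes:

    /* Hedged sketch of PDPCLIB-style file access on MVS/380.  INFILE is
       assumed to be an existing text dataset and OUTFILE a RECFM=VB
       output dataset, both allocated in the JCL. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *in;
        FILE *out;
        char line[200];
        const char *data = "HELLO FROM MVS/380";
        unsigned char rdw[4];
        size_t dlen = strlen(data);

        /* text read via an existing DD called INFILE */
        in = fopen("dd:INFILE", "r");
        if (in != NULL)
        {
            while (fgets(line, sizeof line, in) != NULL)
            {
                fputs(line, stdout);   /* records arrive newline-terminated */
            }
            fclose(in);
        }

        /* binary write of one V-format record via a DD called OUTFILE;
           the application supplies the RDW itself */
        out = fopen("dd:OUTFILE", "wb");
        if (out != NULL)
        {
            rdw[0] = (unsigned char)((dlen + 4) >> 8);
            rdw[1] = (unsigned char)((dlen + 4) & 0xff);
            rdw[2] = 0;
            rdw[3] = 0;
            fwrite(rdw, 1, 4, out);
            fwrite(data, 1, dlen, out);
            fclose(out);
        }
        return 0;
    }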
ADDRESSING MODE

Programs desiring 31-bit execution may or may not be entered in 31-bit mode directly, and are required to detect what mode they were called in, make the appropriate switch, then restore the caller's AMODE on exit.

VM/380 PROGRAMMING INTERFACE
----------------------------

MEMORY

The interface for CMS programs is identical to that for MVS users. GETMAIN with LOC=ANY will obtain ATL storage. There is no partitioning facility available in CMS, but partitioning is not really a CMS-level concept anyway. It would be a concept for CP, but there is no facility in CP for partitioning, nor any communication from CMS to CP. So only one guest OS should run an ATL-using application. CMS applications that obtain memory via this interface can also be ported to z/VM at both the source and binary levels.

PARAMETERS

VM/380 provides EPLIST support. This is the same in VM/380 as in z/VM. Parameters should be obtained the same way, by chaining back via the save areas. Once again, this is handled automatically for users of PDPCLIB.

EXEC2

VM/380 provides limited EXEC2 support similar to z/VM. As with z/VM, this is activated via &TRACE. For portable scripting (between VM/380 and z/VM), only EXEC2 is guaranteed to have EPLIST available, and only the subset of EXEC2 commands that are present in EXEC should be used.

ASSEMBLER

On z/VM, the "ASSEMBLE" assembler is quite limited, and for programs with a large number of symbols you need to use "ASMAHL" instead. VM/380 has simulated this by adding some limited enhancements to "ASSEMBLE", copying that to ASMAHL, and updating the maclibs to provide macros such as BSM. Naturally this is subject to change without notice, but the programming interface remains the same.

MACROS

z/VM rearranged the macro libraries (e.g. replacing CMSLIB with DMSGPI). To allow application portability, the macro library was copied to its new name, as well as having the GETMAIN macro updated (sourced from MVS) and having a BSM macro added to compensate for it not being internally defined in the assembler.

PDPCLIB

Users of any C program linked with PDPCLIB can either define the 3 standard DDs - SYSIN, SYSPRINT and SYSTERM, corresponding to stdin, stdout and stderr - or these will be allocated to the terminal dynamically. New files can either be defined with FILEDEF and opened by DDNAME by specifying a filename of "dd:xxx", where xxx is the DDNAME, or else they can be a full filename. If a full filename is specified, then on creation of an output binary file, DCB attributes are set to RECFM=F, LRECL=800. An output text file is set to RECFM=V, LRECL=2000 by default. Dynamically allocated files are given generated DDNAMEs of the format PDPxxx, where xxx is a number. An example of both naming styles appears after this section.

The startup code is designed to detect an EPLIST, otherwise it gets parameters from a PLIST. However, if a SYSPARM filedef is in place, the parameters are obtained from the first line of that file instead. If both a SYSPARM (even a dummy one) and a parameter are provided, then special processing is signalled, on the assumption that this is an EXEC environment where only a PLIST is available, and the user has difficulty passing long and mixed-case parameters to the application. The parameter list will be lowercased, and only characters preceded by a "_" will be uppercased. Spaces will be stripped unless preceded by "_". If the first parameter is "_+" then the lower/upper rules are swapped. Two underscores will create a single one.

Text files (ie second parameter of fopen = "r" or "w") are processed as follows. If the file is F, then trailing blanks will be stripped from records on input and replaced with a single newline character. On writing, the newline is stripped and records are blank-padded. If the line is longer than the LRECL, extraneous characters are silently discarded. If the file is V, then on input the BDW and RDW are stripped, and a newline is added at the end of the record - unless the record consists of a single space, in which case the space is stripped as well. On write, empty lines have a space added. This is consistent with the handling of Variable records on MVS (e.g. when using the ISPF editor). If a line is longer than the maximum LRECL, characters are silently dropped.

Binary files (ie second parameter of fopen = "rb" or "wb") are processed as follows. If the file is F or FB, then on input, data will be presented to the application unchanged. On output, data is also written unchanged, except that the last record will be padded with NUL characters if required. If the file is V, then on input the BDW will be stripped, but the full RDW will be presented to the application. This makes the byte stream compatible with what a PC application would see when reading a VB file transferred via ftp with the "rdw" option. On write, an RDW needs to be provided by the application. Any invalid RDW causes a file error condition, and no further data is written - with one exception: an RDW of all-NUL is a signal to discard any further data written. This allows for a binary copy of a V dataset to an F dataset to be copied back to V without change or error, even if NUL-padding was required.
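As an illustration of the two naming styles mentioned above, here is a hedged C sketch; the fileids, the DDNAME and the FILEDEF command shown in the comment are all made up for the example:

    /* Hedged sketch of the two CMS naming styles described above.  The
       names are hypothetical; "PROFILE EXEC A" stands for any existing
       fileid and "OUTDD" for a DDNAME previously set up with FILEDEF. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f;
        char line[300];

        /* open by full CMS fileid (text, read) */
        f = fopen("PROFILE EXEC A", "r");
        if (f != NULL)
        {
            while (fgets(line, sizeof line, f) != NULL)
            {
                fputs(line, stdout);
            }
            fclose(f);
        }

        /* open by DDNAME previously established with a FILEDEF such as
           FILEDEF OUTDD DISK OUTPUT DATA A  (shown only for context) */
        f = fopen("dd:OUTDD", "w");
        if (f != NULL)
        {
            /* new text output files default to RECFM=V, LRECL=2000 */
            fprintf(f, "written from VM/380\n");
            fclose(f);
        }
        return 0;
    }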
ADDRESSING MODE

Programs desiring 31-bit execution may or may not be entered in 31-bit mode directly, and are required to detect what mode they were called in, make the appropriate switch, then restore the caller's AMODE on exit.

COMMON C CALLING CONVENTION
---------------------------

GCCMVS and GCCCMS (the C compilers bundled with MVS/380 and VM/380) generate a special entry point @@MAIN when a program with main() defined is processed. All function names are uppercased and truncated to 8 characters, and "_" is converted to "@". As such, @@MAIN is distinct from MAIN. @@MAIN simply branches to the assembler startup code (@@CRT0) and control is never returned to it.

GCC-generated code pretty much follows the standard OS linkage conventions, except that the list of addresses passed to the called program via R1 is not terminated with a 1 in the high-order bit of the last address. In C you are expected to know how many arguments you'll have. In addition, integer parameters are not stored as addresses; instead their actual value is used. This is expected to change in the future to be compatible with IBM and the Language Environment, so macros should be used in preparation for this change.

When @@CRT0 is invoked, it sets up a stack. The first 18 words of the stack are a standard OS register save area. @@CRT0 calls @@START, which in turn calls MAIN (the user's "main") - which is NOT @@MAIN (the entry point to the executable). Each routine's save area comes from the GCC stack allocated in @@CRT0. These save areas are chained following OS conventions, ie savearea+4 points to the previous save area, savearea+8 points to the next one. A routine's save area includes space for its local variables. This amount is calculated by the compiler, and passed as the FRAME= parameter of the PDPPRLG macro.
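A small, hypothetical C example of the name mapping may make this clearer; the function names are invented, and the comments show the external names that the stated rules (uppercase, truncate to 8 characters, "_" becomes "@") would produce:

    /* Hedged illustration of the external-name mapping described above. */
    #include <stdio.h>

    static int counter = 0;

    int get_next_token(void);   /* external name GET@NEXT (truncated)  */
    int parse_input(void);      /* external name PARSE@IN (truncated)  */

    int main(void)              /* MAIN, reached via the @@MAIN stub    */
    {
        printf("%d\n", parse_input());
        return 0;
    }

    int parse_input(void)
    {
        return get_next_token() + get_next_token();
    }

    int get_next_token(void)
    {
        return ++counter;
    }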
FUTURE DIRECTION
----------------

The following projects are either underway or being considered:

o Upgrade GCC from 3.2.3 to 3.4.6 (the last version with an S/370 target - the GCC folk decided to drop the "i370" target in GCC 4).

o Port PDPCLIB and GCC to DOS/VS(E) using private source statement libraries (RECFM=F, LRECL=80) with Unix-style data (embedded newlines), plus NUL-padding on the last record.

o Create an S/380 version of DOS/VS.

o Port OpenCobol to MVS/380.

o Port PDPCLIB to other mainframe C compilers (C/370, Dignus).

o Port PDPCLIB to Linux, which will aid in using that environment to discover problems with MVS-targeted code.

o Possible enablement of languages other than C in GCC (the C++ target could perhaps use stdcxx as the C++ runtime library).

o Use BREXX or Regina to provide REXX internally to CMS as per z/VM (prototype demonstrated in May 2009).

o Provide the equivalent of mingw (or maybe LE/370) to CMS applications by putting the C runtime library in shared memory, allowing small executables (prototype demonstrated in May 2009).

o Use native CMS macros instead of OS macros for the CMS port of PDPCLIB (achieved May 2009).

o Port the 31-bit version of RPF to MVS/380 and use ATL memory to allow editing of large files.

o Modify CP so that it is responsible for remembering which applications need the 31-bit mode restored, and remove this logic from Hercules/380.

o Memory protection for different address spaces accessing ATL memory (multiple solutions to this, some relatively easy, some requiring extensive OS modifications, some already in existence).

o Add memmgr to the GETMAIN intercepts, allowing an application to do multiple GETMAINs for ATL memory.

o CICS programming interface provided via KICKS.

o Port SQLite to GCCMVS now that Jason Winter has ported it to JCC.

o Getting GCCMVS/GCCMU running as a native application under MUSIC/SP, completing Dave Edwards's (RIP) project.

o Adding TCP/IP to MVS and VM (Jason Winter has a solution of sorts to this, but it hasn't yet been integrated).

o A prelinker for GCC-generated code to allow long names and reentrancy.

o Enhancements to the CMS editor.

o Better cleanup on MVS of ATL memory for when a program terminates without doing a FREEMAIN.

o DFDSS-compatible backup (restore is already available).

GCC PORT HISTORY
----------------

The first i370 code generator for GCC was written in 1988-1989 by Jan Stein, targeting Amdahl's UTS. It was distributed to others to use as a base. In 1992 Dave Pitts picked that up, made modifications to it, and arranged with Richard Stallman to get it into the official GCC base, which happened in 1993 in GCC 2.5.

Unfortunately, GCC itself was far from being C90-compliant, which would have made it easy to port to the mainframe (or any other) environment. Considering the fact that objectively all it did was read in a bunch of text files (C code) and produce another text file (the assembler code) - at least with the "-S" option specified - it should have been possible to have written it in a C90-compliant manner.
One of the big problems was that the GCC coders had made assumptions that they were running on an ASCII host. To solve this problem meant going into the internals of the compiler to find out where that had been done and make the code generic. This work was largely done by Dave Pitts, and by 1998 GCC 2.8.1 had an OS/390 EBCDIC port. Also in 1998, Linas Vepstas (with assistance from Dan Lepore and a machine courtesy of Melinda Varian) started making large-scale changes to the i370 target in support of an effort to port Linux to S/370.

In April 1998, Paul Edwards, shortly before he started working on a real MVS (called OS/390 at that point) system, had dusted off his 1997 MVS port of PDPCLIB, then contacted the GCC maintainers to ask about making modifications for MVS. He was unaware of the other two activities (Linas and Dave were in communication with each other though), and the GCC maintainer apparently didn't know either, so work was done on 2.7.2 and later 2.8.1 to try to make it C90-compliant, with a simple compilation procedure and a single executable, using Borland C++ and Watcom C++ on OS/2 (a deliberately alien platform), ready to be ported to MVS. The maintainers weren't too thrilled about changes being made to make gcc a single executable, but some of the other changes were accepted. Replacement Unix functions were written and the gcc executable was able to be compiled and linked (using AD/Cycle C/370) and display its usage. However, when doing a real compile, it went into a loop that required in-depth knowledge of the GCC application to resolve, so the effort was aborted at that point. In March 1999 the laptop with this work on it was stolen, so any GCC changes that the maintainers hadn't accepted were lost. However, the Unix I/O replacement functions had been backed up. In addition, the concept of converting GCC into a single, C90-compliant executable had come close to being proven.

Apparently encountering difficulty getting i370 mods into mainstream GCC, Dave had been adding his i370 mods to different versions of GCC since 1998 and maintaining them separately. Linas managed to get some, but not all, of his work into the GCC baseline (these additional changes made in 1999 would end up being lost from the active development stream until 2009).

At around this time (1999) another development had been taking place - the introduction of Hercules, which allowed the S/370, S/390 etc hardware to be emulated, thus allowing hobbyists to run old versions of MVS (which were public domain). So access to a mainframe ceased to be problematic, especially with the introduction of packaged systems like Tur(n)key from Volker Bandke.

By late 2002, Dave was up to version 3.2 of GCC, working under z/OS with USS (Posix support). Paul made initial contact with Dave in November 2002 to inquire about the technical plausibility of a port to non-USS MVS. One year later, in November 2003, Paul Edwards, working with Phil Roberts, picked up this version with a view to getting it working natively on MVS 3.8j. The problems that Dave identified in any attempt to port to MVS 3.8j were that the size of the main part of the compiler (cc1) was 16 MB on OS/390, that the way the gcc driver loaded cpp, cc1 etc would need to be emulated somehow, and that a scheme would be needed to map Unix-style file names into MVS datasets.
Not mentioned were the fact that the compiler had never been used to attempt to compile itself, which would have revealed that it was riddled with bugs, and the fact that it was riddled with non-C90 functions, plus other non-C90 things like relying on long function names being unique. However, as you can probably guess, there were solutions to all these problems.

First, the 16 MB executable. PDPCLIB is quite small, possibly because it doesn't support VSAM files, Posix and many other nice-to-have features. It did however have the ability to process text files, which is all that was required for the GCC application. While optimization wasn't switched on until years later, the entire optimized executable was eventually found to be just 3 MB (it was 4 MB unoptimized). MVS 3.8j gave about 9 MB of address space, and if abnormally stripped down, could provide upwards of 10 MB. This proved to be sufficient for most normal use. Abnormal use - such as recompiling GCC itself at full optimization - was not possible though.

GCC is split up into multiple components, with a small "gcc" executable invoking the other executables in turn. However, this is fairly strange in that most of the code is in cc1 anyway, so there's not a lot to be gained. And the price is everything channelled via a "system" call, or fork/exec calls - which are all inherently non-portable. The solution here was to mask out all that channelling code and instead get gcc to call cc1 etc as normal function calls to provide a single large gcc executable. This in turn meant that the function names needed to be unique across all the executables, so duplicate functions needed to be found and then renamed with a #define.

The mapping of the include filenames was initially done by renaming them to 8-character unique names and changing the corresponding source code. The path searching for include files was nullified and replaced with DD lookups for INCLUDE and SYSINCL (the latter for system headers). Later on the "remap" facility was unearthed and all the renames in the source code were able to be reversed out. The includes for non-standard headers (fcntl.h, sys/stat.h etc) were #ifdef'ed out. These header files generally pointed to function calls which also didn't exist in C90. The simplest solution to this problem was to create a mini-Posix library where open() is defined in terms of fopen(). Some functions were made to return a reasonable value for most common use. Anything abnormal needed a code change to get rid of the call that wasn't needed in the first place in a text-processing program.

One of the bugs hit early on was the fact that the compiler was converting static functions into 8-character assembler identifiers according to their actual name, which meant that they needed to be unique (in their first 8 characters) within the source file. When the dust settled, there were about 3000 functions that had to be #defined to new names, about half of them static (C90 requires static names to be significant to well more than 8 characters, so it was the MVS port of GCC that was at fault). To make matters worse, the code was initially generating CSECTs for each function name. The IBM linker is designed to just take the largest CSECT of the same name and silently use that instead of reporting a name clash. The code generator was changed to use unnamed CSECTs and use ENTRY statements instead, to ensure external identifiers were unique and clashes detected.
Years later, the static bug was fixed and a tool developed to search out duplicates in the generated assembler so as to only keep those names that needed to be kept (ie external name clashes, which ended up being about 1300). Although the GNU config.h is annoying in that they don't provide a C90 version by default, and instead one needs to be constructed manually, it does have the advantage that all the remaps were able to be done in there and get picked up across the entire source base. While those other problems were time-consuming to resolve, they were nonetheless straightforward. It was the bugs that were the biggest obstacle. Without someone familiar with the compiler internals, it was sometimes necessary to hack the generated assembler. By March 2004 (after having started in November 2003), GCC 3.2 was a native MVS application able to recompile itself (at least on a 31-bit machine) and version 1.0 was released. The most serious problem was with floating point - the native compiler was generating bad floating point values and the workaround was to generate the value "1.0" all the time instead. This didn't cause a problem for the recompilation of GCC itself because it apparently didn't use floating point for anything important. However, it meant that PDPCLIB had some defects due to this kludge that it wouldn't normally have had. Meanwhile, mainstream GCC was about to release the 3.4.x series, the last that would include the i370 target, as for the 4.x series they had decided to unceremoniously dump it! The GCC maintainers aren't MVS/EBCDIC users themselves (the S/390 port is Linux/ASCII), so it is a struggle to refit the EBCDIC support for each release as it is either screwed with or dumped or the changes aren't accepted in the first place. So it always took a long time for the MVS version to come out, waiting on Dave Pitts to get the USS version working on the next release first. At this point (April 2004), Dave Wade picked up the MVS port to try to get it working on VM/CMS with a view to enabling BREXX to be ported. He succeeded in doing this, plus fixing the floating point bug, plus other bugs and unimplemented functionality in PDPCLIB and in January 2006, version 2.0 was officially released. At around this time, Dave Pitts had independently moved his changes up to version 3.2.3, so the GCCMVS changes were reapplied on top of that. So version 1.0 of 3.2.3 was released in January 2006 also. Version 2.0 followed a short while later (March 2006) mainly to enable building on Unix with later versions of GCC. Version 3.0 was released in August 2007 and significantly progressed the mainframe-ness of the compiler. The prologue/epilogue assembler macros were created rather than being done with a separate program. Include files could be concatenated. It was fully 31-bit on z/OS (instead of being restricted to RMODE 24 due to the way the RDJFCB macro had been coded). Remap was made to work. The generated files (mainly generated from the machine description) were able to be generated on the MVS system. Optimization (-O2) was switched on, taking the executable size from 4 MB to 3 MB, although some code workarounds were needed to bypass optimizer bugs. Aliases for PDPCLIB modules were provided to enable automatic linking. Another code generator bug fix was applied. Also, on VM, it was now possible to get GCC to recompile itself unoptimized - or to create a hybrid where most of it was optimized, but a few modules were still unoptimized. 
This state of affairs was probably made possible earlier when GCC had been modified to stop invoking setjmp() all the time which consumed a lot of memory (saving the stack), due to an overzealous implementation that would later be changed. Regardless, this was the point at which it was now possible to have a purely mainframe compiler able to recompile itself on a freely available mainframe operating system. Even the MVS version could theoretically be generated from VM/370. This was never tried as it was academic and was soon replaced by an alternative and superior advance. Up until this point, Paul Edwards, due to his very old and very flakey PC, had never dared attempt to install Hercules to see GCC running for himself. Instead, all work had been done via email as he sent code to Phil Roberts and Phil sent back dumps, traces and on the odd occasion, the result of a successful run. If Phil hadn't done this, everything would probably have been delayed by 4 years. By November 2007 Paul had purchased a new laptop and had Hercules running TK3 (rather than TK3SU1 as that required Hercules 3.05 which wasn't working due to another problem). It was then discovered that there wasn't enough region in TK3 out of the box to compile many of the source files that had previously been set as compilable in 24-bit. Previously an elaborate scheme had been set up such that the JCL had "dummy" compiles where instead of doing a real compile, the old assembler (from the PC) was simply copied. On a 31-bit system, those dummy compiles were then globally changed to real compiles. The problem was that there was no good figure to use for available memory. Multiple attempts were made to find a "lowest common denominator", but even the same machine produced different results on multiple runs. By the 17th November 2007 the region had been lowered yet again, this time from 8500k to 8400k, but there was no end in sight to this problem. We were trying to get too much out of the 24-bit system and it was simply the wrong solution. This is why 31-bit systems exist and it was time to upgrade. On 14th November 2007 Paul had initiated a general query to find out the best way to force through essentially once-off 31-bit compiles on the 24-bit MVS, with a bit of "trickery" (the phrase actually used upon success was "Paul managed to ram a square peg in a round hole with a sledge hammer"). There was a wide variety of opinions and suggestions, and on the 20th November 2007 an S/380 [in practice, but still displaying S/370] machine was able to enter 31-bit mode and stay there with no complaint from MVS. On the 21st November the S/380 test program was able to write to ATL memory, although it wasn't until the 22nd November that this was realised due to confusion over the so-called "crap byte" that BALR inserts (and BASR doesn't). By 7th December, 2007 GCC had been compiled end-to-end (ie reproducing itself) on the S/380 eliminating any remaining doubt about whether it was technically possible or not. Version 4.0 was released in December 2007 and a heap manager (memmgr) was added which provided support for the newly created MVS/380. In addition, the PC and mainframe were producing identical assembler thanks to the -Os option being used plus some other minor code changes plus another code generation problem being fixed. This showed that there were no code generator bugs that had introduced a bug into the GCC executable itself. 
Later it was discovered (by Dave Edwards when he was doing work on MUSIC/SP) that -O2 causes different code (both forms valid) to be generated depending on the exact representation of floating point values. -Os does not appear to be sensitive to that. Ideally code shouldn't be written that is sensitive to that, but no-one knew where it was happening. Prior to Dave's discovery, it was assumed that one of the code generation bug fixes, or the generally random nature of those code changes, was responsible for the identical code. -Os had been switched on for an entirely different reason (ie an apparently incorrect claim that it produced significantly faster code than -O2 on MVS).

Version 5.0 was released in March 2008 and the last major standards violation - requiring statics to be unique in the first 8 characters - was lifted, as it was discovered what needed to be changed so that static functions could be renamed to unique, generated names.

Version 6.0 was released in January 2009 along with version 1.00 of PDPCLIB as (after 15 years) it became C90-compliant, at least on the mainframe (as far as it was known) - with the exception that there were still a lot of known compiler bugs which no-one involved knew how to fix. So finally there was a free (even for commercial use) C90-compliant (although given the known bugs, it would be best to give this "beta" status) environment on the mainframe. The VM build procedure was totally revamped, and techniques were developed to allow traditional automatic linking. Plus it became a totally mainframe product, as BISON and SED were provided on the mainframe so that nothing at all came from the PC. The 31-bit GCC executables produced for both MVS/380 and VM/380 were made available, unchanged, for z/OS and z/VM users. The z/OS deliverable was made available as an XMIT, while the z/VM deliverable was provided as an AWS tape. Also, output was switched to move mode rather than locate mode by default, which made debugging-via-printf much easier after an abend.

The availability of a C compiler allowed a variety of other C products to be ported, and these were all bundled with MVS/380 0.9 and VM/380 0.9. They were bison, brexx, bwbasic, diffutils, flex, m4, patch, sed and minizip (zlib). The changes to brexx and bwbasic to support MVS and CMS were incorporated into the base product.

In February 2009, Linas was contacted to see if he was interested in fixing some of the remaining compiler bugs (about 7 serious ones preventing code from being compiled), and it was discovered that some of his code changes were not even in the current version of the compiler. Paul merged in the remaining code changes, except for one, which ironically was the only change that fixed one of the 7 bugs, but as a side-effect created an even more serious bug in other code! However, this change allowed experimentation to find out what change was required to circumvent the problem in question.

In April 2009, Paul had reached the point with Dave Pitts's unreleased 3.4.6 mods of being able to produce a single executable under Windows using gcc from Cygwin. This was a precursor for getting it to work on MVS with PDPCLIB.

May 2009 saw two more advances. Robert O'Hara had produced GCCLIB, a native CMS runtime, which could be made resident, enabling small applications to be developed. Meanwhile, Linas passed on sufficient knowledge to Paul Edwards to enable him to fix GCC compiler bugs. This allowed the 15 or thereabouts known compiler bugs to be fixed at long last, producing a far more robust compiler.
June 2009 saw the release of GCC 3.2.3 MVS 7.0, the first in-practice C90-compliant release. This was the point where GCC on MVS came of age, and the point at which C became a lingua franca for computers, with every major (or at least, more widely used than DOS/VSE) platform now speaking the language for no additional monetary cost. Paul Edwards, July 15, 2009