Boot loaders
The Atari Lynx has a strict boot process that assumes the cartridge contains an encrypted header. The decryption process performed by Mikey restores the unencrypted code and data to 0x0200 and finally performs a JMP to that address. The code that is executed is referred to as a “boot loader”, as it loads and boots the program on the rest of the cartridge. The loader must play by the rules of the boot process, which are baked into Mikey and are the same for all Lynx consoles.
Atari two-stage boot loader
Atari used a boot loader that was part of the Handy development kit created by Epyx. The implementation complements the decryption and de-obfuscation process by Mikey during booting with a checksum validation. The checksum can detect any changes in the cartridge’s contents after the header. The header itself is already protected as it will not be decryptable to the original code if any change is made to it. The full decryption process during the boot sequence is explained in the corresponding chapter on startup in more detail.
The games that were published by Atari all had a similar boot loader consisting of two stages, most likely because the code and data would not fit within the 5-block limit of a single frame. They devised a clever mechanism to split the loading of the loader into two encrypted frames. The two stages work in conjunction, with the first using some parts of the second to read files from the cartridge before handing over control to the second stage.
The earliest versions of the Atari boot loader were 512 bytes total and had two 5-block frames of 256 encrypted bytes each for both stages.
Games with first stage of 5 blocks
Only the first published games have a first frame consisting of 5 blocks.
- Blue Lightning
- Blue Lightning Demo
- California Games
- Chips Challenge
- Electrocop
- Gates of Zendocon*
- Gauntlet 3
Note: Gates of Zendocon has a different first and second stage when compared to all the other games listed.
Later versions required only 410 bytes for the header: a 3-block frame (154 bytes) for the first stage and a 5-block frame (256 bytes) for the second. The chapter on encryption covers blocks and frames more extensively. All variations of the boot loaders use two stages.
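The frame sizes follow directly from the 51-byte block size plus one leading byte per frame. A small C sketch with hypothetical helper names reproduces the numbers for both header variants.

```c
#include <stddef.h>

/* Each encrypted block is 51 bytes; every frame adds one leading byte. */
enum { BLOCK_SIZE = 51 };

/* Hypothetical helper: size in bytes of a frame with the given number of blocks. */
static size_t frame_size(size_t blocks)
{
    return 1 + blocks * BLOCK_SIZE;
}

/* Hypothetical helper: total header size of a two-stage loader. */
static size_t header_size(size_t stage1_blocks, size_t stage2_blocks)
{
    return frame_size(stage1_blocks) + frame_size(stage2_blocks);
}
```

With these helpers, frame_size(3) is 154, frame_size(5) is 256, header_size(3, 5) is the 410 bytes of the later loaders and header_size(5, 5) the 512 bytes of the earliest ones.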
First stage of Epyx boot loader
All Atari games that have a first-stage frame of three blocks share the same logic. The first stage is responsible for loading the second stage and the boot screen in preparation for the lengthy operations that the second stage will execute. The first stage performs the following steps:
- Clear display palette
- Load and decrypt second stage
- Initialize Suzy’s sprite engine
- Load and display title screen
- Set colors in palette (yes, after displaying it first)
- Initialize substitution box for hashing algorithm
- Start hashing in second frame
The last step effectively passes control of the boot loader to the second stage. In addition to these listed steps the first stage also contains a copy of the first two entries of the cartridge’s directory structure and a unique 16-byte hash algorithm checksum value.
The first stage differs between games only because it includes these two copied directory entries and the calculated checksum. The size of the ROM is not relevant in the first stage and has no effect on its code or data. Although this game-specific data accounts for only 14 + 16 = 30 bytes (two 7-byte directory entries and the 16-byte checksum), the obfuscation and encryption make all bytes of the encrypted frame completely different.
Details for first stage
The first stage starts with an odd branch to the location directly after the branch itself.
BRA .10 ; self mod
.10
JSR ClearPalette
STZ ptr
inc ptr+1 ; assumes BASE2_ORG is BASE_ORG+$100
LDA #RESTLESS ; Make sure ROM is powered
STA IODAT
JMP RSA_LOAD
bootContinue
;-- Read the sprite display file
ldy #6
JSR ReadFile
The reason for this branch is not obvious at first. The first stage is loaded by a call to the function RSA_LOAD, which resides at $FE40 in Mikey’s boot ROM. It decrypts and de-obfuscates the stage by reading it from the cartridge and storing it at the address held in the zero-page WORD variable ptr. After decryption, the boot ROM always jumps to $0200.
The second stage must also be decrypted, and the only way to do so is by using RSA_LOAD. The challenge is that after decryption the boot ROM jumps to BASE_ORG at $0200 again. The self-modifying branch allows the RSA_LOAD routine to be used anyway. The first time the boot ROM jumps to $0200, after decrypting stage 1, the branch falls through to the next instruction. That code clears the palette and sets the destination address for the second decryption. It assumes that the second stage is loaded at BASE_ORG + $0100 = $0300.
The ClearPalette function not only clears the palette to all zeros, giving the characteristic black startup screen, but also modifies the branch to go to bootContinue.
#IFDEF REAL_VERSION
;TEMP ;-- Set up the return branch
LDA #bootContinue-BASE_ORG-2
STA BASE_ORG+1 ; Self-modify
#ENDIF ; of #IFDEF REAL_VERSION
As a result, the second time Mikey jumps to $0200 after RSA_LOAD, the first stage skips the part that clears the palette and sets the destination address for the second stage. This prevents the boot process from ending up in an endless loop.
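The value that ClearPalette stores at BASE_ORG+1 is ordinary 65C02 relative-branch arithmetic. The C sketch below assumes BASE_ORG = $0200; branch_operand and branch_target are hypothetical helpers, and any bootContinue address passed to them is illustrative. An offset byte of zero is the fall-through to .10 on the first pass.

```c
#include <stdint.h>

enum { BASE_ORG = 0x0200 };

/* A 65C02 BRA is two bytes: opcode at BASE_ORG, signed offset at BASE_ORG+1.
 * The offset is relative to the next instruction at BASE_ORG + 2. */
static uint8_t branch_operand(uint16_t target)
{
    return (uint8_t)(target - BASE_ORG - 2);     /* bootContinue-BASE_ORG-2 */
}

/* Address the branch lands on for a given offset byte. */
static uint16_t branch_target(uint8_t operand)
{
    return (uint16_t)(BASE_ORG + 2 + (int8_t)operand);
}
```

Before ClearPalette runs, branch_target(0) yields $0202, the label .10; after the patch, the same BRA lands on whatever address was passed to branch_operand.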
With the second stage now loaded at $0300, the first stage can use routines located there. In particular, it uses the ReadFile function to read files from the cartridge, based on the two directory entries embedded in the first stage.
These embedded entries lack the flag byte, so each is only 7 instead of 8 bytes.
file01dir
.BY FILE0PAGE
.WO FILE0OFFSET,FILE0ADDRESS,FILE0SIZE
.BY FILE1PAGE
.WO FILE1OFFSET,FILE1ADDRESS,FILE1SIZE
They are embedded during compilation by including a file romdir.i that was created by the buildchk tool during the header encryption process. This tool reads an unencrypted ROM, scans for the first two directory entries and writes the relevant information (without the flag) into the include file romdir.i.
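Dropping the flag byte is a simple copy. The sketch below shows what such a conversion could look like in C. It assumes the usual layout of an 8-byte Lynx directory entry (start page, offset, flag, destination address, size, with little-endian words), and strip_flag is a hypothetical name, not a function taken from buildchk.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout of an 8-byte directory entry on the cartridge:
 *   [0] start page, [1..2] offset, [3] flag, [4..5] destination, [6..7] size.
 * The copy embedded in the first stage keeps everything except the flag. */
enum { ROMDIR_ENTRY_SIZE = 8, R_ENTRY_SIZE = 7, FLAG_INDEX = 3 };

/* Hypothetical helper: convert one cartridge entry to the 7-byte form. */
static void strip_flag(const uint8_t in[ROMDIR_ENTRY_SIZE], uint8_t out[R_ENTRY_SIZE])
{
    memcpy(out, in, FLAG_INDEX);                           /* page and offset */
    memcpy(out + FLAG_INDEX, in + FLAG_INDEX + 1,
           ROMDIR_ENTRY_SIZE - FLAG_INDEX - 1);            /* destination and size */
}
```

The resulting 7 bytes line up with the .BY FILE0PAGE and .WO FILE0OFFSET,FILE0ADDRESS,FILE0SIZE layout of file01dir.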
The assumption is that these directory entries are for two specific files:
- A sprite for the title screen preceded with 32 bytes for the color palette
- The main entry point for the program
The first stage references the FILE0ADDRESS to set the color values of the palette and initialize a sprite control block (SCB) structure for Suzy to display.
SuzyValues
.BYTE >{FILE0ADDRESS+32}
.BYTE <{FILE0ADDRESS+32}
The file01dir table is used by a subroutine called copydir to load either the first or the second entry.
copydir
ldx #7
.00 lda file01dir,y
sta directory-1,x
dey
dex
bne .00
rts
The copydir routine expects the Y register to hold the offset of the last byte of the entry to load. It is called by ReadFile, located in the second stage, and copies the entry to a known working location, directory, in zero-page memory. ReadFile uses the values in the directory entry extensively by referencing directory.
There are two calls to ReadFile to load the respective files, with the offsets #6 and #R_ENTRY_SIZE*2-1.
Note
The original source code used #6 where #R_ENTRY_SIZE-1 would have been more consistent.
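The index arithmetic of copydir is easier to follow in a direct C translation. This sketch assumes R_ENTRY_SIZE is 7, as implied by the note, and that file01dir holds the two 7-byte entries back to back; Y = 6 then selects the first entry and Y = R_ENTRY_SIZE*2-1 = 13 the second.

```c
#include <stdint.h>

enum { R_ENTRY_SIZE = 7 };

/* Direct translation of copydir: X runs from 7 down to 1 and Y from the
 * given index downwards, so the entry ending at file01dir[y] ends up in
 * directory[0..6] ("sta directory-1,x"). */
static void copydir(const uint8_t *file01dir, uint8_t y, uint8_t directory[R_ENTRY_SIZE])
{
    for (uint8_t x = R_ENTRY_SIZE; x != 0; x--, y--)
        directory[x - 1] = file01dir[y];
}
```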
After the title screen file has been read, Suzy’s sprite engine is initialized with specific values chosen by the loader. They define the base address for video memory, the location of the sprite chain to display and the offsets for rendering:
| Register | Value | Comment |
|---|---|---|
| SPRGO | 0x01 | |
| SPRSYS | 0x20 | Set NO_COLLIDE |
| VIDBAS | 0x0400 | Referred to as DISPLAY_ORG in source code |
| SUZYBUSEN | 0x01 | |
| HOFF | 0x0000 | Only sets low byte HOFFL, which nulls HOFFH automatically |
| VOFF | 0x0000 | Same as HOFF |
| SCBNEXTH | >{FILE0ADDRESS+32} | Skips first 32 bytes of color palette values |
| SCBNEXTL | <{FILE0ADDRESS+32} | |
| HSIZOFFL | $7F | Almost any value $0000-$00ff will do |
| HSIZOFFH | $7F | Idem |
The video base address is initialized to $0400, right after the second stage of the loader. The video memory occupies $0400 up to (but not including) $23E0. The loader requires that any SCB loaded in the first file is located after $2400, as per the instructions for creating a valid cartridge.
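The end address follows from the display geometry of 160 × 102 pixels at 4 bits per pixel. A short C calculation with hypothetical helper names confirms that the frame buffer ends at $23E0 and therefore stays below $2400.

```c
/* Display geometry of the Lynx: 160 x 102 pixels, 4 bits per pixel. */
enum { SCREEN_W = 160, SCREEN_H = 102, BITS_PER_PIXEL = 4, DISPLAY_ORG = 0x0400 };

/* Hypothetical helper: size of one frame buffer in bytes. */
static unsigned display_bytes(void)
{
    return SCREEN_W * SCREEN_H * BITS_PER_PIXEL / 8;    /* 8160 = $1FE0 */
}

/* Hypothetical helper: first address after the frame buffer. */
static unsigned display_end(void)
{
    return DISPLAY_ORG + display_bytes();               /* $23E0 */
}
```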
The first file with the load screen holds 32 bytes to populate the color palette, followed by the SCB data.
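The 32 palette bytes map onto two banks of 16 Mikey color registers. The sketch below assumes that GREEN0 to GREENF ($FDA0-$FDAF) are immediately followed by BLUERED0 to BLUEREDF ($FDB0-$FDBF), so a copy loop with LDY #31 and STA GREEN0,Y fills both banks. It also assumes each GREEN register holds green in its low nibble and each BLUERED register holds blue in the high nibble and red in the low nibble. Rgb4 and palette_entry are hypothetical names.

```c
#include <stdint.h>

/* One 12-bit color, 4 bits per component. */
typedef struct { uint8_t r, g, b; } Rgb4;

/* Hypothetical helper: decode palette entry `index` (0-15) from the first
 * 32 bytes of the title screen file (16 GREEN bytes, then 16 BLUERED bytes). */
static Rgb4 palette_entry(const uint8_t file[32], unsigned index)
{
    Rgb4 c;
    c.g = file[index] & 0x0F;           /* GREENx: ----GGGG */
    c.b = file[16 + index] >> 4;        /* BLUEREDx: BBBB---- */
    c.r = file[16 + index] & 0x0F;      /* BLUEREDx: ----RRRR */
    return c;
}
```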
Fun fact
The game “Todd’s Adventures in Slimeworld” is the only game with a slightly different first 3-block stage. A tiny portion of the code copies the palette and sets the display address in a different order. All other cartridges perform these two operations in the reverse order:
;-- Alright, finally display this picture
; STZ DISPADRL ; this is true from MIKEY ROM
LDA #>DISPLAY_ORG
STA DISPADRH
;-- Copy the palette to the hardware registers
LDY #31
.50 LDA FILE0ADDRESS,Y
STA GREEN0,Y
DEY
BPL .50
It is not that different, but it shows an evolution in the source code used for the first stage.
The final part of stage 1 is the initialization of the substitution box. This simply fills the 256-byte sbox array with the values 0 to 255.
;-- Initialize sbox with values from 0 to 255
; LDX #0 ; X = zero from above
.0 TXA
STA sbox,X
INX
BNE .0
The handover to the second stage happens through a call to MakeSbox, which continues the creation of the substitution box and then executes the remaining logic of the second stage.
Second stage of Epyx loader
The second stage verifies the checksum of the cartridge content. During the calculation the user can dismiss the boot screen by pressing any button or using the directional pad. If the calculated checksum matches the value embedded in the first stage, the boot loader finishes by loading the second file and jumping to its load address, where execution of the main program begins. If the checksum does not match, the cartridge content differs from the original, which implies either a deliberate change of content or a malfunctioning cartridge. In that case the boot loader stays in an infinite loop and essentially freezes.
Second stage details
The checksum is calculated by creating a hash using the substitution box. The routine loads the cartridge content in 256-byte blocks and computes part of the hash value from each block. The computed hash value is stored in zero-page memory starting at $0001.
While the hash value is calculated page by page, the JOYSTICK hardware register is checked for a non-zero value, which indicates a joystick direction or a button press.
lda JOYSTICK ; clear screen if user pressed stick
beq .f1
jsr ClearPalette
.f1
When JOYSTICK is not #$00, the branch is not taken and ClearPalette is called, which blanks the screen by setting all colors to black.
Finally, when the hash value has been calculated, it is compared to the embedded value at checkstring. If it does not match, the fail instruction branches to itself indefinitely.
LDX #RESULTLENGTH-1 ; Check final result to see
check LDA buffer0,X ; if it checks out
CMP checkstring,X ; If value isn't equal
fail BNE fail ; loop here for a really long time
DEX
BPL check
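The check loop can be modelled in C as a comparison that runs from the last byte down to the first. This is only a sketch: RESULTLENGTH is assumed to be 16, matching the 16-byte checksum, and where hash_matches (a hypothetical name) returns false, the real loader simply hangs.

```c
#include <stdbool.h>
#include <stdint.h>

enum { RESULTLENGTH = 16 };

/* Compare the computed hash with the embedded checkstring, last byte first,
 * like LDX #RESULTLENGTH-1 / DEX / BPL check. */
static bool hash_matches(const uint8_t buffer0[RESULTLENGTH],
                         const uint8_t checkstring[RESULTLENGTH])
{
    for (int x = RESULTLENGTH - 1; x >= 0; x--)
        if (buffer0[x] != checkstring[x])
            return false;               /* the loader executes "fail BNE fail" */
    return true;
}
```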
When the values match, the code continues, loads the file described by the second directory entry and jumps to its load address. The second file is the main entry point of the program.
* Hash checks out OK!
* Clear the display to a shade of grey, then load and execute file 1
JSR ClearPalette
ldy #R_ENTRY_SIZE*2-1
JSR ReadFile
JMP (file01dir+R_ENTRY_SIZE+R_DEST)
There are three variations of the second-stage boot loader for the original Atari Lynx games (besides the aforementioned games with a 512-byte header). The main difference is the ROM page size for 128KB, 256KB and 512KB cartridges.
| ROM size (KB) | Page size (bytes) | High byte of page size |
|---|---|---|
| 128 | 512 | 2 |
| 256 | 1024 | 4 |
| 512 | 2048 | 8 |
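The table implies that every cartridge size has exactly 256 pages, so a page number always fits in a single byte. A minimal C sketch of that relationship follows; cart_address is a hypothetical helper, not a routine from the original source.

```c
#include <stdint.h>

/* Every ROM size in the table has 256 pages, so the page size is the ROM
 * size divided by 256 and a page number fits in one byte. */
static uint32_t rom_page_size(uint32_t rom_size)
{
    return rom_size / 256;
}

/* High byte of the page size: 2, 4 or 8. */
static uint8_t page_size_high_byte(uint32_t rom_size)
{
    return (uint8_t)(rom_page_size(rom_size) >> 8);
}

/* Hypothetical helper: linear byte address of an offset within a page. */
static uint32_t cart_address(uint32_t rom_size, uint8_t page, uint16_t offset)
{
    return (uint32_t)page * rom_page_size(rom_size) + offset;
}
```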
The routine setCartAddress uses the high byte of ROMPAGESIZE to set the correct page. Similarly, the LoadAndHash routine has special logic to differentiate between 128KB cartridges on one hand and 256KB and 512KB cartridges on the other. The code includes some #IF statements that add code segments depending on whether ROMPAGESIZE is larger than 2, so 128KB cartridges essentially get different code.
Although there are three different second-stage loader frames, each of them is found in all games of the corresponding ROM size that use the loader with 3- and 5-block frames. Since the code is the same, its encrypted version is also the same. Put another way, the second-stage loader of any Atari game of a given ROM size consists of the same bytes.
Flaw in checksum
The calculation of the checksum, both in the buildchk tool and in the loader, starts after the header plus some additional bytes. The source code for the algorithm computes the start position as follows:
RSA_PAGE_LENGTH = 51;
j = ((RSA_PAGE_LENGTH * (3 + 5) - 1) + (RomSize / 4096)) & (~((RomSize / 4096) - 1));
The RomSize can be 131072, 262144 or 524288 bytes. Given these numbers and the RSA_PAGE_LENGTH of 51 the starting positions of the checksum are:
| ROM size | Starting position (decimal) | Starting position (hex) |
|---|---|---|
| 128KB | 416 | 0x1a0 |
| 256KB | 448 | 0x1c0 |
| 512KB | 512 | 0x200 |
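The expression rounds 408 bytes (3 + 5 blocks of 51 bytes, without the two leading frame bytes) up to a multiple of RomSize / 4096, which is 32, 64 or 128 bytes. A small C sketch reproduces the table and counts how many bytes after the 410-byte header fall outside the checksum; unchecked_bytes is a hypothetical helper.

```c
enum { RSA_PAGE_LENGTH = 51, HEADER_SIZE = 410 };

/* Same expression as buildchk: round 8 * 51 = 408 up to a multiple of
 * RomSize / 4096 (32, 64 or 128 bytes). */
static long checksum_start(long rom_size)
{
    long align = rom_size / 4096;
    return ((RSA_PAGE_LENGTH * (3 + 5) - 1) + align) & ~(align - 1);
}

/* Hypothetical helper: bytes after the header that are neither
 * encrypted nor covered by the checksum. */
static long unchecked_bytes(long rom_size)
{
    return checksum_start(rom_size) - HEADER_SIZE;
}
```

This gives 6, 38 and 102 unchecked bytes for 128KB, 256KB and 512KB cartridges. For a 256KB cartridge that is room for four complete 8-byte directory entries.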
This means that the bytes between the end of the 410-byte encrypted header and the starting position, where the directory entries are located, are neither checksummed nor encrypted, so they could be changed without detection. Granted, the first two directory entries are not really used by the first loader stage, which carries its own embedded copies. Still, it is unclear why the starting position was calculated this way.
Developer support
The source code also includes a way to build the boot loader code just to calculate the checksum. The compilation is influenced by three definitions: DEBUG, BUILDCHECK and REAL_VERSION. You can set the first two explicitly by enabling (uncommenting) the equates shown below.
;DEBUG .EQ 1
;BUILDCHECK .EQ 1
#IFNDEF DEBUG
#IFNDEF BUILDCHECK
REAL_VERSION .EQ 1
#ENDIF
#ENDIF
The third one, REAL_VERSION, is inferred: it is defined for a build that is neither a debug build nor a build check.
After the code that calculates the checksum, conditional compilation determines whether the checksum is actually verified. Depending on whether the equate BUILDCHECK is set, a developer compiles boot.src into either a real boot loader or one for debugging purposes.
terminate
;-- OK, hash string created.
;-- If we're building the checkstring, then break here
;-- else check the hash with the required value
#IFDEF BUILDCHECK
.BY $13
BRK
#ELSE
LDX #RESULTLENGTH-1 ; Check final result to see
When BUILDCHECK is defined, both a $13 byte and a BRK instruction are inserted into the terminate section. The $13 value is a trigger for Howard to enter break mode and hand control to the Amiga running Howdebug. Similarly, the BRK instruction causes a software interrupt that can be picked up by Pinky and Mandebug to transition into the halt state.
After the debugger has attached, the developer can inspect memory at $0001 and $0285, where the 16 bytes of the computed hash value and of the embedded value are located. This allowed the programmer to see the actual computed checksum and to verify and compare it manually.
You can create the BUILDCHECK version of the boot loader by compiling using:
asm +DBUILDCHECK=1 boot.src
Alternatively, setting the DEBUG flag (but not BUILDCHECK) also creates a single-stage loader, but one that does check the hash value.
Instead of the two files boot.bin and boot2.bin, a single file boot.bin is created that can be loaded directly into either Howard or Pinky. This boot.bin file should not be encrypted by the tool chain.
Hacked loader
There is a header in circulation that works with any ROM that expects a 410-byte encrypted loader. This hacked header was created from an adjusted version of the original boot loader source code. It leaves most of the implementation intact, except that it uses the two directory entries right after the header instead of the two embedded entries. It also still calculates the hash value for the checksum, but does not compare it to the value included in the first stage of the loader.
The available hacked loader is intended for 256KB ROM files, which are the most common. Using the modified source code you can also create a version for 128KB and 512KB cartridges.
The modified loader has changes in both the first and second stage. The first stage has a couple of modifications, most notably the copydir routine. The original routine copies the embedded file01dir directory entry into zero-page memory, which is used by the ReadFile routine.
0262: copydir
0262:A2 07 ldx #7
0264:B9 41 02 .00 lda file01dir,y
0267:95 35 sta directory-1,x
0269:88 dey
026A:CA dex
026B:D0 F7 bne .00
026D:60 rts
The changed version reads the two directory entries from the cartridge itself. After the second stage of the loader has been read, the cartridge read position is left directly after the header, at byte 410 ($19A) from the start of the cartridge’s content. This is where the directory entries are assumed to be located. The routine sequentially reads 16 more bytes and copies them into the expanded directory area.
0262: copydir
0262:A0 10 ldy #2*ROMDIR_ENTRY_SIZE ; Read two directory entries from cartridge
0264:AD B2 FC .00 lda RCART_0
0267:95 36 sta directory,x
0269:E8 inx
026A:88 dey
026B:D0 F7 bne .00
026D:60 rts
The modified copydir no longer requires the Y register to be set to the index of the last byte. Instead, the X register is initialized to 0x00 to indicate the first entry in zero-page memory. It is set just before the jump to subroutine ReadFile, which calls copydir at its beginning. The boot process continues as normal, with Suzy being initialized and the boot screen displayed.
The second stage of the loader starts with the unaltered substitution box algorithm and calculates the checksum as usual. There are some changes in reading through the entire contents of the cartridge, because the directory entries used now are exactly like the original 8-byte entries on the cartridge, whereas the original implementation uses embedded copies of only 7 bytes. The ReadDataLoop routine has some trivial changes to account for this difference.
At the end of the routine, the section named terminate has some bigger changes.
The terminate section no longer compares the computed hash with the embedded checksum. Instead, the loop now copies the second directory entry in zero-page memory over the first one. This allows the ReadFile routine to use the directory address again without copying the entry, which is why the subroutine call goes to ReadFile+3, skipping the call to copydir at the start of ReadFile. All other instructions are replaced with NOP (no-operation) instructions to keep the size of the second stage the same. Finally, the code jumps to the destination address of the main program, relying on the second, unchanged directory entry in zero-page memory.
03DD:A2 08 LDX #ROMDIR_ENTRY_SIZE ; Copy second directory over first
03DF:B5 3D .5 LDA directory+ROMDIR_ENTRY_SIZE,X ; if it checks out
03E1:95 35 STA directory,X ; If value isn't equal
03E3:CA DEX
03E4:D0 F9 BNE .5 ; loop here for a really long time
03E6:EA NOP
03E7:EA NOP
03E8:EA NOP
03E9:20 4F 02 JSR ClearPalette
03EC:EA NOP
03ED:EA NOP
03EE:20 03 03 JSR ReadFile+3
03F1:6C 41 00 JMP (directory+ROMDIR_ENTRY_SIZE+ROMDIR_DEST)
Micro loader
The Atari Lynx community created their own minimal boot loader after the discovery of the encryption keys and the algorithm to obfuscate and encrypt. It is a two-stage loader that does only the bare essentials to get a game loaded. The first frame is only 1 block of 51 bytes, and the second stage is 151 bytes long and unencrypted. The first stage assumes that, after it has been loaded and decrypted, loading can continue right after the header. From there the loader arrives at the first directory entry in the list of files. These files are the same as for programs created with the Handy development kit.
You can recognize the micro loader by its signature first frame of 52 bytes: the $ff byte indicating a single block, followed by the 51 bytes of the encrypted block.
$ff, $b6, $bb, $82, $d5, $9f, $48, $cf, $23, $37, $8e, $07, $38, $f5, $b6, $30,
$d6, $2f, $12, $29, $9f, $43, $5b, $2e, $f5, $66, $5c, $db, $93, $1a, $78, $55,
$5e, $c9, $0d, $72, $1b, $e9, $d8, $4d, $2f, $e4, $95, $c0, $4f, $7f, $1b, $66,
$8b, $a7, $fc, $21
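A tool can recognize the micro loader by comparing the start of the cartridge data with these 52 bytes. The C sketch below assumes the buffer starts at the encrypted header, that is, any file-format header such as that of an .lnx image has already been skipped; uses_micro_loader is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First 52 bytes of a cartridge that boots with the micro loader. */
static const uint8_t micro_loader_signature[52] = {
    0xff, 0xb6, 0xbb, 0x82, 0xd5, 0x9f, 0x48, 0xcf, 0x23, 0x37, 0x8e, 0x07, 0x38, 0xf5, 0xb6, 0x30,
    0xd6, 0x2f, 0x12, 0x29, 0x9f, 0x43, 0x5b, 0x2e, 0xf5, 0x66, 0x5c, 0xdb, 0x93, 0x1a, 0x78, 0x55,
    0x5e, 0xc9, 0x0d, 0x72, 0x1b, 0xe9, 0xd8, 0x4d, 0x2f, 0xe4, 0x95, 0xc0, 0x4f, 0x7f, 0x1b, 0x66,
    0x8b, 0xa7, 0xfc, 0x21
};

/* True if the cartridge data starts with the micro loader's first frame. */
static bool uses_micro_loader(const uint8_t *rom, size_t len)
{
    return len >= sizeof micro_loader_signature &&
           memcmp(rom, micro_loader_signature, sizeof micro_loader_signature) == 0;
}
```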
The micro loader performs the following functions:
- Force Mikey address range to be RAM memory
- Clear palette to all black
- Set ComLynx port to open collector
- Configure AUDIN to output and low value
- Read second stage to $FB68 and jump to it
The second stage is exactly 151 bytes long and is loaded at $FB68, occupying $FB68 through $FBFE, just before the start of the hardware registers for Suzy at $FC00. The second stage reads the next 8 bytes, which form the directory entry of the main executable. It uses the information in the entry to skip to the right location in the cartridge and load the indicated number of bytes into the specified destination. Finally, it performs a JMP to the load address and starts the main program.
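What the second stage does with the directory entry can be sketched in C. The sketch assumes the 8-byte entry layout of start page, offset, flag, destination and size with little-endian words, and a caller-supplied page size. DirEntry, parse_entry and load_file are hypothetical names, and the real loader streams bytes from the cartridge rather than indexing a ROM image.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout of the 8-byte directory entry (little-endian words). */
typedef struct {
    uint8_t  page;
    uint16_t offset;
    uint8_t  flag;
    uint16_t dest;
    uint16_t size;
} DirEntry;

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static DirEntry parse_entry(const uint8_t raw[8])
{
    DirEntry e = { raw[0], le16(raw + 1), raw[3], le16(raw + 4), le16(raw + 6) };
    return e;
}

/* Copy the file from page * page_size + offset in the ROM image to its
 * destination in a 64KB RAM image (assumes dest + size <= 65536) and
 * return the address the loader would JMP to. */
static uint16_t load_file(const uint8_t *rom, uint32_t page_size,
                          const DirEntry *e, uint8_t ram[65536])
{
    uint32_t src = (uint32_t)e->page * page_size + e->offset;
    memcpy(ram + e->dest, rom + src, e->size);
    return e->dest;
}
```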
The source of the micro loader is available in the cc65 GitHub repository as bootldr.s.
The cc65 tool chain uses this boot loader by default in most projects and examples. It is included automatically when the __BOOTLDR__ symbol is imported in the configuration. More details are available in the chapter on directories.
Early boot loaders
The first 6 original games and the Blue Lightning Demo cartridge were all 128KB in size.
- Blue Lightning 4042f5bb0d7ed8143401b867c08d5636
- Blue Lightning Demo 1c16b2c53468805acbcd6f46f8a2c1af
- California Games 73c2570bf45e37e52596982849748a9e
- Chips Challenge 9c4ee705b3093e41e2743be695d94715
- Electrocop 59791b2bd296c9f20ec4896913bfdac6
- Gates of Zendocon* 7ad54c0ba888ea619f43635a3fab4dd5 ???
- Gauntlet 3 0bedd65cbe42a07ed2ae9686003158ab
These games have a boot loader of two 5-block frames. The code is similar to the later boot loader with 3- and 5-block frames found in all other Atari releases.
The main differences are evolutionary. The older boot loaders are less efficient, and the code was later refactored to be smaller. Some optimizations were made, but most notably the older loaders are less secure.
The earliest boot loaders read the directory entry directly from the cartridge right after the encrypted header.
LDA #1
JSR PrepareShiftRegister
JSR copydir
LDA directory+ROMDIR_DEST
CLC
ADC #32
STA SCBNEXTL
LDA directory+ROMDIR_DEST+1
ADC #0
STA SCBNEXTH
It prepares the cartridge to be read from the second page, at offset $0200, and copies all eight bytes of the directory entry. The destination address is used to set the SCB address for the title screen after adding 32 bytes to skip the color palette values.
The fact that the destination address for the title screen was read from unencrypted data opened a security loophole that was successfully exploited. It allowed Bastian Schick and Lars Baumstark to modify the address in the entry and overwrite code in the loader to skip checksumming.
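The LDA/CLC/ADC sequence in the early loader listing is a 16-bit addition carried out one byte at a time on the 8-bit 65C02. The C sketch below models it; scb_next is a hypothetical name, and the carry that the CPU keeps in its carry flag is made explicit.

```c
#include <stdint.h>

/* Add 32 to the 16-bit destination address byte by byte:
 * CLC / ADC #32 on the low byte, then ADC #0 on the high byte. */
static void scb_next(uint8_t dest_lo, uint8_t dest_hi,
                     uint8_t *scbnextl, uint8_t *scbnexth)
{
    unsigned sum = dest_lo + 32u;               /* CLC; ADC #32 */
    *scbnextl = (uint8_t)sum;                   /* STA SCBNEXTL */
    unsigned carry = sum >> 8;                  /* carry flag */
    *scbnexth = (uint8_t)(dest_hi + carry);     /* ADC #0; STA SCBNEXTH */
}
```

For a destination of $24F0, the low byte wraps to $10 and the carry moves the high byte to $25, giving $2510.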