BBC BASIC Programmers' Reference

Three (or more) pass assembly

by Richard Russell, September 2021

The x86 CPU has some instructions with alternative encodings of different lengths. One of those instructions is 'push constant', which has these two alternative forms when the constant is in the range -128 to +127 (I'll use the value &55 as an example):

  68 55 00 00 00    push &55  ; long form
  6A 55             push &55  ; short form

These instructions have identical effects, but one is 5 bytes long and the other 2 bytes. Like most assemblers, the BBC BASIC assembler tries to be 'helpful' by choosing the shorter form, so in this case it will use the '6A 55' encoding. But for this to work reliably the length of the instruction must not change between the first and second passes of the assembler.

If it does change (because the value is in the range -128 to +127 on one pass, but not on the other) the result is what is called a 'phase error': any label following the instruction ends up at a different address on each pass. Here is an example in which just such a situation arises:

DEF PROCassemble
LOCAL P%, pass%, test%
DIM code% 100
FOR pass% = 0 TO 1
  P% = code%
  [OPT pass% * 3
  push test%
  .test% db "Hello"
  ]
NEXT pass%
ENDPROC

If you run this code it produces something like this output:

09A31DB4                                OPT pass% * 3
09A31DB4 68 B6 1D A3 09                 push test%
09A31DB9 48 65 6C 6C 6F       .test%    db "Hello"

But wait, this is wrong! The address test% is &09A31DB9 but the value pushed in the previous instruction is &09A31DB6. BBC BASIC is broken! I want my money back!

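No, it's not broken. On the first pass the value of test% is zero, because it's declared as LOCAL, so the assembler encodes the 'push 0' using the short form, 6A 00, and the label test% is set just two bytes beyond the start of the code (&09A31DB6 in this run). On the second pass the value of test% is that address, which is no longer in the range -128 to +127, so the assembler uses the long encoding: 68 B6 1D A3 09. That is three bytes longer, so the label moves to &09A31DB9, but the operand has already been assembled using the old value. The resulting phase error causes the code emitted by the assembler to be invalid.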
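
The arithmetic behind this phase error can be traced without running the assembler at all. The following is only an illustrative sketch: FNpushlen and pushed% are hypothetical names of my own, and the helper merely mimics the short/long choice described earlier; it is not how the assembler is implemented internally. The start address is the one from the listing above.

code% = &09A31DB4 : REM start address taken from the listing above
test% = 0         : REM a LOCAL variable starts out as zero
FOR pass% = 0 TO 1
  pushed% = test% : REM operand value the push is assembled with
  test% = code% + FNpushlen(pushed%) : REM the label follows the push
  PRINT "Pass "; pass%; ": operand &"; ~pushed%; ", label at &"; ~test%
NEXT pass%
END

REM Hypothetical helper: length in bytes of 'push constant'
DEF FNpushlen(value%)
IF value% >= -128 AND value% <= 127 THEN = 2
= 5

On the first pass the operand is zero and the label lands at &09A31DB6; on the second pass the operand is &09A31DB6 but the label lands at &09A31DB9, which is exactly the mismatch seen in the listing.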

The way 'real' assemblers deal with this situation is to run not just two passes but multiple passes, until everything stabilises. In the (rare) situation where it never stabilises, or fails to do so within an arbitrary maximum number of passes, they report an unresolvable phase error.

So one way to 'fix' the problem in this specific case is to use more than two passes. It would be possible to do as 'proper' assemblers do, using a checksum or CRC of the emitted code to detect when it has 'stabilised' and running passes until it has, but in this simple case three passes are sufficient:

DEF PROCassemble
LOCAL P%, pass%, test%
DIM code% 100
FOR pass% = 0 TO 2
  P% = code%
  [OPT pass% - (pass% <> 0)
  push test%
  .test% db "Hello"
  ]
NEXT pass%
ENDPROC

This produces (for example):

09A31DBF                                OPT pass% - (pass% <> 0)
09A31DBF 68 C4 1D A3 09                 push test%
09A31DC4 48 65 6C 6C 6F       .test%    db "Hello"

Note that the OPT value is 0 on the first pass, 2 on the second pass and 3 (to display the listing) on the third pass. If you don't want a listing, specify an OPT of 2 on the third pass too:

  [OPT pass% - (pass% = 1)
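
For completeness, here is a rough sketch of the more general 'run until stable' approach mentioned earlier, using a simple checksum of the emitted bytes. Treat it as a starting point rather than a definitive implementation: FNchecksum, the variables sum%, last%, stable% and tries%, the limit of 10 passes and the error number 100 are all arbitrary choices of mine, and a simple checksum could in principle fail to notice a change.

DEF PROCassemble
LOCAL P%, pass%, test%, sum%, last%, stable%, tries%
DIM code% 100
last% = -1 : REM can never equal a real checksum (0 to &FFFFFF)
pass% = 0  : REM OPT 0 on the first pass, OPT 2 thereafter
REPEAT
  P% = code%
  [OPT pass%
  push test%
  .test% db "Hello"
  ]
  sum% = FNchecksum(code%, P%)
  stable% = (sum% = last%) : REM same bytes as the previous pass?
  last% = sum%
  pass% = 2
  tries% += 1
UNTIL stable% OR tries% >= 10
IF NOT stable% THEN ERROR 100, "Unresolvable phase error"
ENDPROC

REM Hypothetical helper: 24-bit checksum of bytes first% to limit%-1
DEF FNchecksum(first%, limit%)
LOCAL I%, sum%
sum% = limit% - first% : REM include the length in the checksum
FOR I% = first% TO limit% - 1
  sum% = (sum% * 31 + ?I%) AND &FFFFFF
NEXT I%
= sum%

The loop stops when two consecutive passes emit the same checksum, so stability is only detected one pass after the code actually stops changing. Because last% starts at -1, at least one pass with OPT 2 (errors reported) always takes place.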

A simpler solution will often be to change the order in which things are declared to avoid the phase error in the first place, for example:

DEF PROCassemble
LOCAL P%, code%, pass%, test%
DIM code% 100
FOR pass% = 0 TO 1
  P% = code%
  [OPT pass% * 3
  .test% db "Hello"
  .start% push test%
  ]
NEXT pass%
ENDPROC

In fact the original code would have worked correctly had test% not been made LOCAL. Then the variable would have been undefined on the first pass (assuming it's not used elsewhere in the program) and the assembler would have defaulted to using the long form for the push instruction.
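
As an illustration, that variant is simply the original two-pass code with test% removed from the LOCAL statement. This sketch relies on the assumption stated above: test% must not already exist anywhere else in the program when PROCassemble is first called.

DEF PROCassemble
LOCAL P%, pass% : REM test% is deliberately not LOCAL here
DIM code% 100
FOR pass% = 0 TO 1
  P% = code%
  [OPT pass% * 3
  push test%
  .test% db "Hello"
  ]
NEXT pass%
ENDPROC

The price is that test% becomes a global variable, so this approach is only safe when you can be sure the name is not used for anything else.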