Converting 40-bit floats to 64-bit floats

by Richard Russell, August 2014

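The assembly language routine below converts a 40-bit floating-point value, held in the registers cl (exponent) and edx (sign and mantissa), into a 64-bit floating-point ('double') value, returned in the registers ecx (most-significant 32 bits) and edx (least-significant 32 bits):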

      movzx   ecx,cl                  ;zero-extend exponent
      add     ecx,895                 ;adjust exponent
      rol     edx,1                   ;move sign to LSB
      shld    ecx,edx,21              ;align exponent
      shl     edx,20                  ;align mantissa
      btr     edx,20                  ;get sign to carry
      rcr     ecx,1                   ;insert sign 
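
For readers who find the bit manipulation easier to follow in a high-level language, here is a minimal C sketch of the same transformation. It is a model derived only from the instructions above, not part of the original routine. The function name float40_to_double and its parameter names are hypothetical. It assumes that double is an IEEE 754 binary64 value with the same byte order as uint64_t on the host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the assembly routine above.
   'exponent' plays the role of cl; 'mantissa' plays the role of edx,
   with the sign in bit 31 and 31 fraction bits below it. */
static double float40_to_double(uint8_t exponent, uint32_t mantissa)
{
    uint64_t sign = (uint64_t)(mantissa >> 31);          /* sign bit of edx */
    uint64_t frac = (uint64_t)(mantissa & 0x7FFFFFFFu);  /* 31 fraction bits */
    uint64_t exp  = (uint64_t)exponent + 895u;           /* same as: add ecx,895 */

    /* Layout of the result (ecx is the high half, edx the low half):
       sign -> bit 63, exponent -> bits 62..52, fraction -> bits 51..21,
       and bits 20..0 are zero. */
    uint64_t bits = (sign << 63) | (exp << 52) | (frac << 21);

    /* Assumption: the host stores a double as IEEE 754 binary64. */
    double result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

Under this model, an exponent byte of 0x80 with a zero mantissa gives 1.0, because 128 + 895 = 1023 is the biased exponent of 1.0 in a double. Like the assembly routine, the sketch makes no attempt to handle variants.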

Note that this routine does not handle variants (i.e. a 40-bit value that contains an integer rather than a float). To avoid writing extra code for that case, you can convert a variant into a float in BASIC by multiplying it by 1.0:

      var *= 1.0
converting_2040-bit_20floats_20to_2064-bit_20floats.txt · Last modified: 2018/03/31 13:19 (external edit)