tokeniser

Differences

This shows you the differences between two versions of the page.

tokeniser [2018/03/31 13:19]
127.0.0.1 external edit
tokeniser [2018/04/17 19:04] (current)
tbest3112 Added syntax highlighting
Line 2: Line 2:
  
 //by JGH, June 2006//\\ \\  BBC BASIC programs are tokenised: BASIC keywords are stored as one-byte values. This makes programs more compact and faster to execute.\\ \\  A tokenised line can easily be detokenised, or expanded, because each token value maps to exactly one keyword string. For example, code similar to the following would expand a tokenised line:\\ 
 +<code bb4w>
         quote%=FALSE
         REPEAT
Line 8: Line 9:
           addr%=addr%+1
         UNTIL ?addr%=13
 +</code>
 Tokenising, however, is more fiddly. Keywords can be abbreviated on entry, and character sequences are tokenised only in certain parts of the line. For instance, in the following line:\\ 
 +<code bb4w>
         ON NOON GOTO 1,2
 +</code>
 the first 'ON' is the token ON, but the second 'ON' is part of the variable name 'NOON' and must be left untokenised.\\ \\  The **EVAL** function tokenises the supplied string and evaluates it as an expression. Usefully, the tokenised string can then be retrieved from where BASIC has stored it.\\ \\ 
 ==== In Windows BASIC: ====
 +<code bb4w>
         B%=EVAL("0:"+A$)
         token$=$(!332+2)
 +</code>
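 \\  To check what was produced, the bytes of the tokenised string can be listed in hexadecimal. This is only an illustrative sketch: it assumes ''token$'' was set by the code above, and the loop variable ''I%'' is not part of the original routine.\\ 
 <code bb4w>
         REM Print each byte of token$ as a hex value, separated by spaces.
         REM The IF guard skips the loop for an empty string, because a
         REM FOR loop always runs its body at least once.
         IF LEN(token$) THEN
           FOR I%=1 TO LEN(token$)
             PRINT ;~ASC(MID$(token$,I%));" ";
           NEXT
         ENDIF
         PRINT
 </code>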
 This code may fail if an event interrupt (e.g. ON TIME) occurs between the two statements. To avoid this, use the following alternative, which (in //BBC BASIC for Windows// version 6 only) does not allow an intervening interrupt:\\ 
 +<code bb4w>
         IF EVAL("1:"+A$) token$=$(!332+2)
 +</code>
 The input and output share the same memory buffer. This is fine as long as tokenising shortens the code, which is almost always the case, but it can cause a crash if tokenising lengthens the code. That happens only in exceptional circumstances, such as the following code:\\ 
 +<code bb4w>
         ON A% GOTO 10,20,30,40,50
 +</code>
 The tokenising process encodes the line numbers in a special internal format, which increases the overall length from 25 to 31 bytes. To reduce the chance of this causing a crash, the tokenising routine can be adapted as follows:\\ 
 +<code bb4w>
         IF EVAL("1RECTANGLE:"+A$) token$=$(!332+3)
 +</code>
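 \\  The change in length from 25 to 31 bytes can be checked by hand. The sketch below assumes that each line number is stored in the classic Acorn inline form, a &8D byte followed by three encoded bytes, and that ON and GOTO each tokenise to one byte. The function name ''FNlinenum'' is made up for illustration and is not part of any library.\\ 
 <code bb4w>
         REM Example use: show the four bytes for line 10 in hex
         A$=FNlinenum(10)
         FOR I%=1 TO LEN(A$) : PRINT ;~ASC(MID$(A$,I%));" "; : NEXT
         PRINT
         END
         :
         REM Sketch: encode line number N% (0-65535) as &8D plus three bytes.
         REM Assumes the classic Acorn encoding; every encoded byte lies in
         REM the range &40-&7F, so none can be mistaken for CR (13).
         DEF FNlinenum(N%) : LOCAL L%,H%,T%
         L%=N% AND &FF : REM low byte of the line number
         H%=(N% DIV 256) AND &FF : REM high byte of the line number
         T%=(((L% AND &C0) DIV 4) OR ((H% AND &C0) DIV 16)) EOR &54
         =CHR$(&8D)+CHR$(T%)+CHR$((L% AND &3F) OR &40)+CHR$((H% AND &3F) OR &40)
 </code>
 Under those assumptions, each two-digit line number grows from 2 to 4 bytes (+10 for the five of them), while ON shrinks from 2 bytes to 1 and GOTO from 4 bytes to 1 (-4 in total), so the 25-character line becomes 25+10-4 = 31 bytes.\\ 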
 \\ 
 ==== In ARM BASIC: ====
 +<code bb4w>
         SYS "XOS_GenerateError",0,STRING$(255,"*") TO ,A%
         B%=EVAL("0:"+A$)
         token$=$(A%-14)
 +</code>
 \\ 
 ==== In 6502 BASIC: ====
 +<code bb4w>
         A%=EVAL("0:"+A$)
         token$=$((!4 AND &FFFF)-LENA$-1)
 +</code>
 \\  Prefixing the code you want to tokenise with "0:" lets you pass it to **EVAL** without provoking a Syntax error. You can then extract the tokenised code from memory, provided you do so immediately after calling **EVAL**.\\ \\  This can be written as functions as follows:\\ 
 +<code bb4w>
         DEF FNTokenise_Win(A$):LOCAL A%,B%
         WHILE LEFT$(A$,1)=" ":A$=MID$(A$,2):ENDWHILE
Line 40: Line 57:
         DEF FNTokenise_65(A$):LOCAL A%
         A%=EVAL("0:"+A$):=$((!4 AND &FFFF)-LENA$-1)
 +</code>
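 \\  One possible way of calling these functions is sketched below, using the Windows version. The statement being tokenised and the variable names ''src$'' and ''tok$'' are chosen purely for illustration.\\ 
 <code bb4w>
         REM Tokenise one line of text and compare the lengths before and after
         src$="IF A%>0 THEN PRINT ""YES"""
         tok$=FNTokenise_Win(src$)
         PRINT "Text: ";LEN(src$);" bytes, tokenised: ";LEN(tok$);" bytes"
 </code>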
 \\  These functions appear in full in the 'Tokenise' BASIC library at [[http://mdfs.net/System/Library/BLib|mdfs.net]].\\ \\  A text file can then be tokenised using the following code:\\ 
 +<code bb4w>
       in%=OPENIN(text$)
       out%=OPENOUT(basic$)
Line 54: Line 73:
       CLOSE#out%:out%=0
       CLOSE#in%:in%=0
 +</code>
 \\ 
 ==== Notes ====
  Acorn BBC BASIC programs are stored slightly differently. See [[/Format|Format]] and the relevant pages on [[http://beebwiki.jonripley.com/|Acorn-specific sites]] for details.\\ \\  This technique may fail if the tokenised code is //longer// than the original text, which can happen if it contains an **ON GOTO** or **ON GOSUB** statement. This problem can be mitigated to some extent as follows (for Windows BASIC):\\ 
 +<code bb4w>
         B%=EVAL("0OTHERWISE:"+A$)
         token$=$(!332+3)
 +</code>
 \\ 
 ==== See also ====
tokeniser.txt · Last modified: 2018/04/17 19:04 by tbest3112