Crunching BASIC programs

From BeebWiki

Latest revision as of 15:30, 15 August 2018

To fit more code into the BBC Micro's limited memory, BASIC programs can be crunched (in modern terms, minified). Crunching reduces the size of a program without changing its meaning, but in the process makes it almost unreadable; it is a form of obfuscation.

Crunching can be done by a utility program that compacts a program in memory, or more recently by sending a text listing to be compacted on another computer and reading it back in. Automated crunchers tend to concentrate on the following areas that yield big and easy savings:

  • Removing redundant characters
  • Joining lines together (a multi-byte line header is replaced by a colon)
  • Shortening variable names
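
Line joining can be automated when the program is crunched as a text listing. The Python sketch below is illustrative only. join_lines is a hypothetical helper, not part of crunch.pl or any ROM. It assumes the listing has already been parsed into (line number, code) pairs. It also assumes the caller has collected every line number used by GOTO, GOSUB, RESTORE and similar statements, because those lines must keep their numbers.

The sketch applies the joining restrictions listed under Tedious below using a deliberately crude substring test. That test refuses some safe joins rather than risk an unsafe one. Refusing to extend a DATA line is an extra assumption of this sketch.

```python
import re

MAX_LINE = 240  # longest line that can be typed in, including the line number

# Lines containing any of these must not have code appended to them.
# This is a plain substring match, so it also fires inside strings or
# names; that only makes the sketch more cautious, never less.
NO_APPEND = re.compile(r"DATA|ELSE|IF|ON|REM|\*")


def join_lines(lines: list[tuple[int, str]], targets: set[int]) -> list[tuple[int, str]]:
    """Merge each line into the previous one where the rules allow it.

    `lines` holds (line number, code) pairs in program order.
    `targets` holds line numbers that other statements refer to.
    """
    out: list[tuple[int, str]] = []
    for number, code in lines:
        if out:
            prev_number, prev_code = out[-1]
            merged = prev_code + ":" + code
            can_join = (
                number not in targets                      # keep jump targets
                and not code.startswith(("DATA", "DEF"))   # must start a line
                and not NO_APPEND.search(prev_code)
                and len(str(prev_number)) + len(merged) <= MAX_LINE
            )
            if can_join:
                out[-1] = (prev_number, merged)
                continue
        out.append((number, code))
    return out
```

For example, joining 10 A=1, 20 B=2, 30 IF A PRINT and 40 C=3, with line 40 as a jump target, leaves two lines: 10 A=1:B=2:IF A PRINT and 40 C=3.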

Utilities can remove spaces that were needed when the program was typed in but are redundant once the code has been tokenised, so they may produce a program that cannot be typed back in from its listing.

As BASIC is an interpreted language, crunching delivers an overall time saving as well. The suggestions below are targeted at saving space more than time. See Chapter 32 of the B+ User Guide for tips on increasing the speed of a program.

Suggestions

The following suggestions do not apply to characters within a string constant or a *command. Correctness is not guaranteed!

Trivial

  • Delete empty lines.
  • Delete leading and trailing spaces. A space between the line number and the code is stored in memory but is optional. E.g.
    1030 ENDPROC
    could be replaced by
    1030ENDPROC
    LISTO 1 makes LIST reinsert the space when listing.
  • Replace multiple spaces with a single space.
  • Delete leading and trailing colons.
  • Replace multiple colons with a single colon.
  • Delete REM program comments and \ assembler comments.
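
The Trivial rules are easy to automate when crunching a text listing on another computer. The Python function below is a minimal sketch, not the crunch.pl implementation; crunch_trivial is a hypothetical name. It makes the following assumptions:

  • It handles one numbered line per call.
  • It copies string constants untouched.
  • It keeps everything after a DATA keyword or a *command verbatim.
  • It does not handle assembler \ comments.

A result consisting only of a line number marks an empty line. Such a line can be deleted only if no GOTO, GOSUB or RESTORE refers to it.

```python
import re


def _is_name_char(ch: str) -> bool:
    # Characters that can continue a BASIC identifier (simplified).
    return ch.isalnum() or ch == "_"


def crunch_trivial(line: str) -> str:
    """Apply the 'Trivial' rules to one listing line such as '1030 ENDPROC'."""
    m = re.fullmatch(r"\s*(\d+)\s*(.*)", line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a numbered line: {line!r}")
    number, body = m.groups()   # spaces after the line number are dropped here
    code, tail = "", ""
    i = 0
    while i < len(body):
        ch = body[i]
        last = code.rstrip(" ")
        if ch == '"':
            # Copy a string constant untouched, including its closing quote.
            end = body.find('"', i + 1)
            end = len(body) if end < 0 else end + 1
            code += body[i:end]
            i = end
        elif body.startswith("REM", i) and not (code and _is_name_char(code[-1])):
            break  # drop the comment and everything after it
        elif (ch == "*" or body.startswith("DATA", i)) and (
            last == "" or last.endswith((":", "THEN", "ELSE"))
        ):
            tail = body[i:]  # *command or DATA: keep the rest of the line verbatim
            break
        elif ch == " ":
            if not code.endswith(" "):
                code += " "  # collapse a run of spaces to one
            i += 1
        elif ch == ":":
            if last and not last.endswith(":"):
                code += ":"  # collapse runs of colons and skip leading ones
            i += 1
        else:
            code += ch
            i += 1
    # Strip leading/trailing spaces and colons, but keep whatever
    # separates the code from a verbatim tail.
    code = code.lstrip(" :") if tail else code.strip(" :")
    return number + code + tail
```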

Easy

  • Eliminate the keyword LET.
  • Delete the keyword THEN except before a system variable assignment, unary operator, function return statement, *command or implied-GOTO line number.
  • NEXT statements don't have to name the control variables. One NEXT statement can terminate several FOR loops, using commas.
    NEXT X%:NEXT Y%
    can be replaced with
    NEXT,
    If the program breaks when the control variable is removed, the FOR...NEXT loops are mis-nested!
  • Functions with a single argument, except RND, don't need brackets around the argument. E.g.
    PRINT CHR$letter%, INKEY100, STR$~code%
  • The result of a numeric function can be discarded with IF rather than by assigning it to a dummy variable:
    IFGET
    This is only safe at the end of a line: any statements after it on the same line would run only if the result were non-zero.

Moderate

  • Replace CHR$ calls that produce Teletext control codes with the characters themselves, typed inline in strings using SHIFT/CTRL and the function keys. The listing cannot be printed and re-typed after this.
  • Express very large or very small real constants in scientific format:
    G=6.673E-11
  • A zero before a decimal point can be eliminated.
  • Express integer constants ≥ +1,000,000 as hexadecimal and the rest as decimal.
  • VDU sequences may be shorter with some byte constants combined into word constants using semicolons. In particular 0; replaces 0,0,
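
The constant and VDU rules above can be checked mechanically. The Python sketch below uses three hypothetical helper names and, as an assumption, handles only non-negative constants:

  • int_literal compares decimal with &-prefixed hex. Over the integer range, hex is never shorter below 1,000,000 and never longer from there up, which matches the rule above.
  • real_literal drops a leading zero and compares the result with scientific notation.
  • vdu_args uses dynamic programming over the bytes to be sent. At each position it picks either a byte item or a word item (sent low byte first), keeping the byte form on a tie.

```python
from decimal import Decimal


def int_literal(n: int) -> str:
    """Shorter of decimal and &hex for a non-negative integer constant."""
    if not 0 <= n < 2**31:
        raise ValueError("sketch handles 0 to 2147483647 only")
    dec, hexa = str(n), "&" + format(n, "X")
    return hexa if len(hexa) < len(dec) else dec  # decimal wins ties


def real_literal(text: str) -> str:
    """Shortest spelling of a non-negative real constant given as decimal text."""
    d = Decimal(text).normalize()
    if d < 0:
        raise ValueError("sketch handles non-negative constants only")
    plain = format(d, "f")
    if plain.startswith("0."):
        plain = plain[1:]  # a zero before the decimal point is redundant
    _, digits, exponent = d.as_tuple()
    mantissa = str(digits[0])
    if len(digits) > 1:
        mantissa += "." + "".join(map(str, digits[1:]))
    sci = f"{mantissa}E{exponent + len(digits) - 1}"
    return sci if len(sci) < len(plain) else plain


def vdu_args(data: list[int]) -> str:
    """Shortest VDU argument list that sends the bytes in `data`."""
    n = len(data)
    best = [""] * (n + 1)  # best[i] = shortest text for data[i:]
    for i in range(n - 1, -1, -1):
        sep = "," if i + 1 < n else ""  # a final byte needs no comma
        choice = f"{data[i]}{sep}{best[i + 1]}"
        if i + 1 < n:
            # A word constant followed by ';' sends its low byte, then its high byte.
            word = f"{data[i] + 256 * data[i + 1]};{best[i + 2]}"
            if len(word) < len(choice):
                choice = word
        best[i] = choice
    return best[0]
```

For example, the bytes 23,1,0,0,0,0,0,0,0,0 come out as 23,1;0;0;0;0 (12 characters). The usual 23,1,0;0;0;0; takes 13 characters.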

Tedious

  • Delete spaces:
    • After characters "#$%'()*+,-./:;<=>[\]^{|}~
      • Preserve spaces between string constants so that they do not merge.
    • Before characters !"#&'()*+,-/:;<=>?@[\]^{|}~
      • Again preserve spaces between string constants.
    • Between numbers and other code, but not between two numbers.
    • After keywords, but not after real variable names.
      • Preserve the space in END ELSE, ERR OR, GET $, INKEY $, MOD E, OPT <keyword>, <assembler mnemonic> <keyword> and TO P if the listing is to be typed in.
    • Automated crunchers can remove all spaces immediately before and after tokenised keywords.
  • Replace long variable, procedure and function names with shorter ones.
    • Preserve the variable type; don't replace integers with reals or vice versa as rounding errors may result.
    • Use the resident integer variables A% to Z% for speed, but otherwise it is best to use names with a lowercase character to avoid collisions with keywords.
    • Automated crunchers should use all one-character names first, then two-character names with the first character 'varying fastest', and so on.
    • @% is reserved if the program PRINTs variables.
    • A%, C%, X% and Y% are reserved if CALL or USR appear in the program (6502 BASIC).
    • O% and P% are reserved if the program contains assembly language.
    • A, X, Y, a, x and y are reserved inside assembly language segments as they are register indicators, not variable names (6502 BASIC).
    • E should be reserved, or used carefully together with whitespace removal, as it may form the exponent of a preceding numeric constant.
  • Replace multiple lines with fewer multi-statement lines.
    • The longest line that can be typed in is 240 characters including the line number.
    • Remember DATA and DEF must be at the beginning of a line.
    • DATA 1 (newline) DATA 2 becomes DATA 1,2.
    • Don't add to the end of a line containing ELSE, IF, ON, REM or a *command.
    • Keywords [, ELSE, REPEAT and THEN, and DEF... statements don't need a colon between them and the next statement.
  • DATA strings don't need double quotes unless they contain double quotes, commas or leading spaces.
  • When crunching a text listing, keywords can be abbreviated to their minimum forms. See Chapter 48 of the B+ User Guide.
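
Two of the Tedious rules can be sketched in Python. Both function names are hypothetical.

delete_spaces applies only the two punctuation lists above, not the number or keyword rules. It keeps a single space between adjacent string constants so that they do not merge. It assumes its input contains no REM, DATA or *command.

short_names generates replacement names in the order described: one-character names first, then two-character names, with the first character varying fastest. It uses lowercase letters only, which is an assumed alphabet. Reserved names such as a, x and y are passed in through exclude, and the caller appends % or $ to preserve each variable's type.

```python
from itertools import count, product

# Character sets copied from the "Delete spaces" rules above.
AFTER = set("\"#$%'()*+,-./:;<=>[\\]^{|}~")
BEFORE = set("!\"#&'()*+,-/:;<=>?@[\\]^{|}~")


def delete_spaces(code: str) -> str:
    """Drop spaces next to the listed punctuation, outside string constants."""
    out = []
    in_string = False
    i = 0
    while i < len(code):
        ch = code[i]
        if ch == '"':
            in_string = not in_string
        elif ch == " " and not in_string:
            j = i
            while j < len(code) and code[j] == " ":
                j += 1  # treat a run of spaces as one
            prev = out[-1] if out else ""
            nxt = code[j] if j < len(code) else ""
            between_strings = prev == '"' and nxt == '"'
            droppable = not prev or not nxt or prev in AFTER or nxt in BEFORE
            if between_strings or not droppable:
                out.append(" ")
            i = j
            continue
        out.append(ch)
        i += 1
    return "".join(out)


def short_names(alphabet: str = "abcdefghijklmnopqrstuvwxyz", exclude=frozenset()):
    """Yield 1-character names, then 2-character names, and so on,
    with the first character varying fastest."""
    for length in count(1):
        # product() varies its last position fastest, so reverse each tuple.
        for combo in product(alphabet, repeat=length):
            name = "".join(reversed(combo))
            if name not in exclude:
                yield name
```

With exclude={"a", "x", "y"}, short_names starts b, c, d, ... w, z, then continues aa, ba, ca, and so on.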

Difficult

  • Use operator precedence rules to find redundant brackets in expressions and remove them.
  • Use intermediate variables to reduce the number of repeated sub-expressions.
  • Refactor repeated segments of code into a function, procedure or subroutine.
  • Find other ways of storing data in the program besides DATA; see Data without DATA.

References

Based on crunch.pl, packaged with EDOSPAT.

-- beardo 19:12, 11 October 2007 (BST)