Crunching BASIC programs

From BeebWiki

Latest revision as of 15:30, 15 August 2018

To fit more code into the BBC Micro's limited memory, BASIC programs can be crunched (in modern terms, minified). Crunching reduces the size of a program without changing its meaning, but in the process makes it almost unreadable -- it is a form of obfuscation.

Crunching can be done by a utility program that compacts a program in memory, or more recently by sending a text listing to be compacted on another computer and reading it back in. Automated crunchers tend to concentrate on the following areas that yield big and easy savings:

  • Removing redundant characters
  • Joining lines together (a multi-byte line header is replaced by a colon)
  • Shortening variable names.

Utilities are able to remove spaces that were needed during keying but are redundant once the code has been tokenised, so they may produce a program that cannot be typed back in from a listing.

As BASIC is an interpreted language, crunching delivers an overall time saving as well. The suggestions below are targeted at saving space more than time. See Chapter 32 of the B+ User Guide for tips on increasing the speed of a program.

Suggestions

The following does not apply to the characters within a string constant or a *command. Correctness is not guaranteed!

Trivial

  • Delete empty lines.
  • Delete leading and trailing spaces. A space between the line number and the code is stored in memory but is optional. E.g.
    1030 ENDPROC
    could be replaced by
    1030ENDPROC
    LISTO 1 makes LIST reinsert the space when listing.
  • Replace multiple spaces with a single space.
  • Delete leading and trailing colons.
  • Replace multiple colons with a single colon.
  • Delete REM program comments and \ assembler comments.
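
The Trivial rules can be sketched in Python for one line of a text listing. This is an illustration only, not the method used by crunch.pl: the function name crunch_trivial is hypothetical, and the sketch assumes the line contains no *command, DATA statement or assembler (so \ comments are not handled). String constants are copied unchanged.

```python
import re


def crunch_trivial(line):
    """Apply the Trivial rules to one listing line.

    Returns the crunched line, or None if nothing is left to keep.
    Assumes no *commands, DATA statements or assembler on the line.
    """
    match = re.match(r" *(\d+)(.*)", line)
    if match is None:
        return None  # blank or unnumbered line
    number, body = match.groups()
    # Even-indexed pieces are code, odd-indexed pieces are string constants.
    pieces = re.split(r'("[^"]*"?)', body)
    kept = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            kept.append(piece)  # string constant: copy verbatim
            continue
        # Treat REM as a comment unless it sits inside a longer name.
        comment = re.search(r"(?<![A-Za-z0-9_])REM", piece)
        if comment:
            piece = piece[:comment.start()]
        piece = re.sub(r" {2,}", " ", piece)      # multiple spaces -> one
        piece = re.sub(r":(?: *:)+", ":", piece)  # multiple colons -> one
        kept.append(piece)
        if comment:
            break  # everything after REM is comment text
    kept[0] = kept[0].lstrip(" :")    # leading spaces and colons
    kept[-1] = kept[-1].rstrip(" :")  # trailing spaces and colons
    body = "".join(kept)
    return number + body if body else None
```

For example, crunch_trivial('1030 ENDPROC') gives '1030ENDPROC', and a line holding only a REM comment gives None so that the caller can drop it.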

Easy

  • Eliminate the keyword LET.
  • Delete the keyword THEN except before a system variable assignment, unary operator, function return statement, *command or implied-GOTO line number.
  • NEXT statements don't have to name the control variables. One NEXT statement can terminate several FOR loops, using commas.
    NEXT X%:NEXT Y%
    can be replaced with
    NEXT,
    If the program breaks when the control variable is removed, the FOR...NEXT loops are mis-nested!
  • Functions with a single argument, except RND, don't need brackets around the argument. E.g.
    PRINT CHR$letter%, INKEY100, STR$~code%
  • The result of a numeric function can be discarded with IF rather than assigning to a dummy variable:
    IFGET
    This only saves space at the end of a line.
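
The NEXT rule can be sketched in Python as follows. The function name merge_next is hypothetical, and the sketch assumes the line has already been split into a list of statements and that the FOR...NEXT loops are correctly nested. Each run of NEXT statements is replaced by a single NEXT with one comma per extra loop.

```python
def merge_next(statements):
    """Merge runs of NEXT statements and drop their control variables.

    statements: list of statement strings from one line, in order.
    Assumes correctly nested FOR...NEXT loops.
    """
    merged = []
    loops = 0  # loops closed by the current run of NEXT statements

    def flush():
        nonlocal loops
        if loops:
            merged.append("NEXT" + "," * (loops - 1))
            loops = 0

    for statement in statements:
        text = statement.strip()
        if text.startswith("NEXT"):
            # NEXT X%,Y% closes two loops: one per comma, plus one.
            loops += text.count(",") + 1
        else:
            flush()
            merged.append(statement)
    flush()
    return merged
```

For example, merge_next(["NEXT X%", "NEXT Y%"]) gives ["NEXT,"].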

Moderate

  • Replace Teletext CHR$ functions with inline characters in strings, using SHIFT/CTRL and the function keys. The listing cannot be printed and re-typed after this.
  • Express very large or very small real constants in scientific format:
    G=6.673E-11
  • A zero before a decimal point can be eliminated.
  • Express integer constants ≥ +1,000,000 as hexadecimal and the rest as decimal.
  • VDU sequences may be shorter with some byte constants combined into word constants using semicolons. In particular 0; replaces 0,0,
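
The integer constant and VDU rules can be sketched in Python. Both function names are hypothetical. int_literal applies the 1,000,000 threshold directly. vdu_statement assumes every value is a byte from 0 to 255 and chooses, pair by pair, whether a comma-separated byte or a semicolon-terminated word is shorter to write; that search is this sketch's own approach, not something taken from crunch.pl.

```python
def int_literal(value):
    """Spell an integer constant: hex from 1,000,000 up, decimal below."""
    if value >= 1_000_000:
        return "&" + format(value, "X")  # BBC BASIC hex constants start with &
    return str(value)


def vdu_statement(byte_values):
    """Return a short VDU statement sending byte_values (each 0-255) in order."""

    def typed_length(text):
        return len(text.rstrip(","))  # a trailing comma is never emitted

    n = len(byte_values)
    best = [""] * (n + 1)  # best[i] is the shortest encoding of byte_values[i:]
    for i in range(n - 1, -1, -1):
        # Option 1: send this byte on its own, followed by a comma.
        options = [f"{byte_values[i]}," + best[i + 1]]
        if i + 1 < n:
            # Option 2: send this byte and the next as one word, low byte first.
            word = byte_values[i] + 256 * byte_values[i + 1]
            options.append(f"{word};" + best[i + 2])
        best[i] = min(options, key=typed_length)
    return "VDU" + best[0].rstrip(",")
```

For example, int_literal(1000000) gives "&F4240", and vdu_statement([0, 0]) gives "VDU0;".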

Tedious

  • Delete spaces:
    • After characters "#$%'()*+,-./:;<=>[\]^{|}~
      • Preserve spaces between string constants so that they do not merge.
    • Before characters !"#&'()*+,-/:;<=>?@[\]^{|}~
      • Again preserve spaces between string constants.
    • Between numbers and other code, but not between two numbers.
    • After keywords, but not after real variable names.
      • Preserve the space in END ELSE, ERR OR, GET $, INKEY $, MOD E, OPT <keyword>, <assembler mnemonic> <keyword> and TO P if the listing is to be typed in.
    • Automated crunchers can remove all spaces immediately before and after tokenised keywords.
  • Replace long variable, procedure and function names with shorter ones.
    • Preserve the variable type; don't replace integers with reals or vice versa as rounding errors may result.
    • Use the resident integers A% to Z% for speed, but otherwise it is best to use names with a lowercase character to avoid collisions with keywords.
    • Automated crunchers should use all one-character names first, then two-character names with the first character 'varying fastest', and so on.
    • @% is reserved if the program PRINTs variables.
    • A%, C%, X% and Y% are reserved if CALL or USR appear in the program (6502 BASIC).
    • O% and P% are reserved if the program contains assembly language.
    • A, X, Y, a, x and y are reserved inside assembly language segments as they are register indicators, not variable names (6502 BASIC).
    • E should be reserved, or used carefully together with whitespace removal as it may form the exponent of a preceding <num-const>.
  • Replace multiple lines with fewer multi-statement lines.
    • The longest line that can be typed in is 240 characters including the line number.
    • Remember DATA and DEF must be at the beginning of a line.
    • DATA 1 (newline) DATA 2 becomes DATA 1,2.
    • Don't add to the end of a line containing ELSE, IF, ON, REM or a *command.
    • Keywords [, ELSE, REPEAT and THEN, and DEF... statements don't need a colon between them and the next statement.
  • DATA strings don't need double quotes unless they contain double quotes, commas or leading spaces.
  • When crunching a text listing, keywords can be abbreviated to their minimum forms. See Chapter 48 of the B+ User Guide.
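
Two of the Tedious rules can be sketched in Python. All names here are hypothetical. short_names hands out lowercase names in the order described above, skipping any the caller reserves. join_lines merges (line number, body) pairs under the 240-character limit and the DATA/DEF and ELSE/IF/ON/REM/*command restrictions. The sketch is deliberately cautious: it refuses to extend a line if a blocking keyword appears anywhere outside a string, it also refuses to extend a DATA line (an assumption of this sketch), and it assumes that no line is the target of a GOTO, GOSUB or RESTORE.

```python
import re
from itertools import count, product
from string import ascii_lowercase


def short_names(reserved=()):
    """Yield a, b, ..., z, aa, ba, ca, ... skipping reserved names."""
    for length in count(1):
        for combo in product(ascii_lowercase, repeat=length):
            name = "".join(reversed(combo))  # first character varies fastest
            if name not in reserved:
                yield name


BLOCKERS = ("ELSE", "IF", "ON", "REM", "DATA")


def can_extend(body):
    """True if another statement may be appended to this line body."""
    code = re.sub(r'"[^"]*"?', '""', body)  # ignore string constants
    if code.startswith("*") or ":*" in code:
        return False  # a *command runs to the end of the line
    return not any(word in code for word in BLOCKERS)


def join_lines(lines, limit=240):
    """Join (number, body) pairs into fewer multi-statement lines."""
    joined = []
    for number, body in lines:
        if joined:
            prev_number, prev_body = joined[-1]
            candidate = prev_body + ":" + body
            fits = len(str(prev_number)) + len(candidate) <= limit
            starts_line = body.startswith(("DATA", "DEF"))
            if fits and can_extend(prev_body) and not starts_line:
                joined[-1] = (prev_number, candidate)
                continue
        joined.append((number, body))
    return joined
```

A caller working on assembly language segments might pass reserved={"a", "x", "y"} to short_names; adjacent DATA lines are left alone here rather than merged into one DATA statement.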

Difficult

  • Use operator precedence rules to find redundant brackets in expressions and remove them.
  • Use intermediate variables to reduce the number of repeated sub-expressions.
  • Refactor repeated segments of code into a function, procedure or subroutine.
  • Find other ways of storing data in the program besides DATA; see Data without DATA.

References

Based on crunch.pl, packaged with EDOSPAT (http://regregex.bbcmicro.net/#prog.edospat).

-- beardo 19:12, 11 October 2007 (BST)