Crunching BASIC programs

From BeebWiki

To fit more code into the BBC Micro's limited memory, BASIC programs can be crunched (in modern terms, minified). Crunching reduces the size of a program without changing its meaning, but in the process it makes the program almost unreadable; it is a form of obfuscation.

Crunching can be done by a utility program that compacts a program in memory or, more recently, by sending a text listing to another computer to be compacted and reading the result back in. Automated crunchers tend to concentrate on the following areas, which yield large and easy savings:

  • Removing redundant characters
  • Joining lines together (each line's four-byte header is replaced by a one-byte colon)
  • Shortening variable names

Utilities working in memory can also remove spaces that were needed when the program was typed in but became redundant once the code was tokenised. As a result they may produce a program that cannot be typed back in from its listing.

Because BASIC is interpreted, crunching usually saves some execution time as well. The suggestions below are aimed at saving space rather than time; see Chapter 32 of the B+ User Guide for tips on increasing the speed of a program.

Suggestions

The suggestions below do not apply to characters within string constants or *commands. Correctness is not guaranteed!

Trivial

  • Delete empty lines.
  • Delete leading and trailing spaces. A space between the line number and the code is stored in memory and is optional. E.g.
    1030 ENDPROC
    could be replaced by
    1030ENDPROC
    LISTO 1 makes LIST reinsert the space when listing.
  • Replace multiple spaces with a single space.
  • Delete leading and trailing colons.
  • Replace multiple colons with a single colon.
  • Delete REM program comments and \ assembler comments.
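As a rough illustration of how a cruncher running on another computer might apply the space and colon clean-ups above to a text listing, here is a small Python sketch. It is not taken from any existing cruncher, and the function name tidy_line is hypothetical. It works on one line of program text without its line number and assumes the line holds no REM, DATA or *command, because their text must be left alone. It only changes text outside string constants, which it finds by splitting on double quotes.

```python
import re


def tidy_line(text: str) -> str:
    """Apply the 'Trivial' space and colon clean-ups to one line of BASIC text.

    Hypothetical helper. Assumes no line number and no REM, DATA or *command,
    whose text must be kept exactly as written.
    """
    # Splitting on double quotes puts text outside string constants at even
    # indexes. A doubled quote ("") inside a string yields an empty part
    # between the two quotes, so this parity still holds.
    parts = text.split('"')
    for k in range(0, len(parts), 2):
        parts[k] = re.sub(r" {2,}", " ", parts[k])    # multiple spaces -> one space
        parts[k] = re.sub(r":[ :]*:", ":", parts[k])  # run of colons -> one colon
    # Leading and trailing spaces and colons are never needed.
    return '"'.join(parts).strip(" :")
```

For example, tidy_line('  X=1 :: :  PRINT "A  ::  B"  : ') returns 'X=1 : PRINT "A  ::  B"'. The spaces and colons inside the string constant are left as they were.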

Easy

  • Eliminate the keyword LET.
  • Delete the keyword THEN except before a system variable assignment, unary operator, function return statement, *command or implied-GOTO line number.
  • NEXT statements don't have to name the control variables. One NEXT statement can terminate several FOR loops, using commas.
    NEXT X%:NEXT Y%
    can be replaced with
    NEXT,
    If the program breaks when the control variable is removed, the FOR...NEXT loops are mis-nested!
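  • Functions with a single argument, except RND, don't need brackets around the argument. Without brackets the function applies only to the first item of an expression, so SQR 4+5 means SQR(4)+5. E.g.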
    PRINT CHR$letter%, INKEY100, STR$~code%
  • The result of a numeric function can be discarded with IF rather than by assigning it to a dummy variable:
    IFGET
    This is only useful at the end of a line, because any statements after it on the same line would become conditional on the result.

Moderate

  • Replace CHR$ calls that produce Teletext control codes with the characters themselves, typed inline in strings using SHIFT/CTRL and the function keys. The listing cannot be printed and re-typed after this.
  • Express very large or very small real constants in scientific format:
    G=6.673E-11
  • A zero before a decimal point can be eliminated, e.g. 0.5 becomes .5
  • Express integer constants of 1,000,000 or more in hexadecimal (&F4240 is shorter than 1000000) and the rest in decimal.
  • VDU sequences may be shorter if pairs of byte constants are combined into 16-bit word constants, each followed by a semicolon and sent low byte first. In particular, 0; replaces 0,0,
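The integer and VDU rules above are easy to mechanise. The Python sketch below uses two hypothetical helpers. shortest_int picks whichever of the decimal and &hex spellings of a non-negative integer is shorter, keeping decimal on a tie. pack_vdu uses dynamic programming to find the shortest VDU argument list for a sequence of byte values. At each position it chooses between sending a single byte (followed by a comma) and sending a little-endian word (followed by a semicolon). It assumes every value is in the range 0 to 255 and does not attempt any other VDU shorthand.

```python
from functools import lru_cache


def shortest_int(n: int) -> str:
    """Shorter of the decimal and &hex spellings of a non-negative integer constant."""
    dec = str(n)
    hexa = "&%X" % n  # BBC BASIC hex constants are written with a leading &
    return hexa if len(hexa) < len(dec) else dec  # on a tie keep readable decimal


def pack_vdu(values) -> str:
    """Shortest VDU argument list sending the given bytes (each 0-255).

    Each byte is either sent alone ("b," or just "b" at the very end) or paired
    with the next one as a word "w;", where w = low + 256 * high (low byte first).
    """
    vals = tuple(values)
    if any(not 0 <= v <= 255 for v in vals):
        raise ValueError("VDU byte values must be in the range 0-255")
    n = len(vals)

    @lru_cache(maxsize=None)
    def best(i: int) -> str:
        # Shortest text that sends vals[i:].
        if i == n:
            return ""
        byte = str(vals[i]) + ("," if i + 1 < n else "")
        options = [byte + best(i + 1)]
        if i + 1 < n:
            word = vals[i] + 256 * vals[i + 1]
            options.append(str(word) + ";" + best(i + 2))
        return min(options, key=len)  # first option wins ties, so bytes are preferred

    return best(0)
```

For instance, pack_vdu([0, 0]) returns 0;, which matches the rule above. Because the search ignores what each byte means, it may pair bytes that belong to different parameters. For 23,1 followed by eight zeros it returns 23,1;0;0;0;0, which sends the same ten bytes but is even harder to read.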

Tedious

  • Delete spaces:
    • After characters "#$%'()*+,-./:;<=>[\]^{|}~
      • Preserve spaces between string constants so that they do not merge.
    • Before characters !"#&'()*+,-/:;<=>?@[\]^{|}~
      • Again preserve spaces between string constants.
    • Between numbers and other code, but not between two numbers.
    • After keywords, but not after real variable names.
      • Preserve the space in END ELSE, ERR OR, GET $, INKEY $, MOD E, OPT <keyword>, <assembler mnemonic> <keyword> and TO P if the listing is to be typed in.
    • Automated crunchers working on a tokenised program can remove all spaces immediately before and after keyword tokens.
  • Replace long variable, procedure and function names with shorter ones.
    • Preserve the variable type; don't replace integers with reals or vice versa, as rounding errors may result.
    • Use the resident integers A% to Z% for speed, but otherwise it is best to use names with a lowercase character to avoid collisions with keywords.
    • Automated crunchers should use all one-character names first, then two-character names with the first character 'varying fastest', and so on.
    • @% is reserved if the program PRINTs variables.
    • A%, C%, X% and Y% are reserved if CALL or USR appear in the program (6502 BASIC).
    • O% and P% are reserved if the program contains assembly language.
    • A, X, Y, a, x and y are reserved inside assembly language segments as they are register indicators, not variable names (6502 BASIC).
    • E should be reserved, or used with care when spaces are removed, because it may be read as the exponent of a preceding numeric constant.
  • Replace multiple lines with fewer multi-statement lines.
    • The longest line that can be typed in is 240 characters including the line number.
    • Remember DATA and DEF must be at the beginning of a line.
    • DATA 1 (newline) DATA 2 becomes DATA 1,2.
    • Don't add to the end of a line containing ELSE, IF, ON, REM or a *command.
    • Keywords [, ELSE, REPEAT and THEN, and DEF... statements don't need a colon between them and the next statement.
  • DATA strings don't need double quotes unless they contain double quotes, commas or leading spaces.
  • When crunching a text listing, keywords can be abbreviated to their minimum forms. See Chapter 48 of the B+ User Guide.
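Two of the tedious steps above can also be sketched in Python. crunch_spaces applies only the punctuation rules: it deletes spaces after the first set of characters and before the second, keeps one space between adjacent string constants, and drops leading and trailing spaces. It leaves the keyword and number rules alone. Like the earlier rules, it assumes the statement contains no REM, DATA or *command text. short_names generates replacement names in the order suggested above, with the first character varying fastest. Both names are hypothetical. Restricting the alphabet to lowercase letters is an assumption, based on the advice to avoid collisions with keywords.

```python
import string
from itertools import count, product

# Characters after which (AFTER) or before which (BEFORE) a space can be deleted,
# copied from the lists above.
AFTER = set('"#$%\'()*+,-./:;<=>[\\]^{|}~')
BEFORE = set('!"#&\'()*+,-/:;<=>?@[\\]^{|}~')


def crunch_spaces(stmt: str) -> str:
    """Delete spaces next to punctuation, outside string constants."""
    out = []
    in_string = False
    i, n = 0, len(stmt)
    while i < n:
        c = stmt[i]
        if c == '"':
            in_string = not in_string  # "" inside a string toggles twice: no net change
            out.append(c)
            i += 1
        elif c == " " and not in_string:
            j = i
            while j < n and stmt[j] == " ":  # measure the whole run of spaces
                j += 1
            prev = stmt[i - 1] if i > 0 else None
            nxt = stmt[j] if j < n else None
            if prev is None or nxt is None:
                pass  # leading or trailing spaces: drop
            elif prev == '"' and nxt == '"':
                out.append(" ")  # keep adjacent string constants apart
            elif prev in AFTER or nxt in BEFORE:
                pass  # redundant next to punctuation: drop
            else:
                out.append(" ")  # may separate keywords, names or numbers: keep one
            i = j
        else:
            out.append(c)
            i += 1
    return "".join(out)


def short_names(alphabet=string.ascii_lowercase, reserved=frozenset()):
    """Yield a, b, ..., z, aa, ba, ..., za, ab, ... (first character varies fastest)."""
    for length in count(1):
        # product() varies its last position fastest, so reverse each tuple.
        for combo in product(alphabet, repeat=length):
            name = "".join(reversed(combo))
            if name not in reserved:
                yield name
```

For example, crunch_spaces('PRINT "A" "B" , X% + 1') gives 'PRINT"A" "B",X%+1'. A cruncher would take names from short_names separately for each variable type, since a, a% and a$ are distinct variables. If the program contains 6502 assembly language, it would also pass a, x and y in reserved.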
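Line joining and DATA quoting can be sketched in the same way. The hypothetical join_lines appends each line to the previous one with a colon, and merges consecutive DATA lines with a comma. It refuses a join if the new line starts with DATA or DEF, if the previous line contains ELSE, IF, ON, REM or a *, or if the result would exceed 240 characters including the line number. As a conservative assumption not listed above, it also refuses to extend a DATA line with other statements. The keyword test is a crude substring search that assumes keywords are written in full. It can therefore refuse a safe join, but it does not miss any of those listed words. The function also takes a set of line numbers that other code refers to (GOTO, GOSUB, RESTORE and so on). This is an extra assumption not mentioned above, since such lines must keep their own numbers. The equally hypothetical data_list quotes a DATA item only where the rule above requires it, and additionally quotes empty items and items with trailing spaces to stay on the safe side. It assumes that a doubled quote inside a quoted DATA string stands for one quote, as it does in string constants.

```python
MAX_LINE = 240  # longest line that can be typed in, including the line number

# Text that stops anything being appended to a line. DATA is not in the list
# above; it is included here as a conservative assumption.
NO_APPEND = ("ELSE", "IF", "ON", "REM", "*", "DATA")


def join_lines(lines, targets=frozenset()):
    """Merge (number, text) lines into fewer multi-statement lines.

    targets: line numbers that other code refers to (GOTO, GOSUB, RESTORE, ...);
    such lines keep their own number. Keywords are assumed to be unabbreviated.
    """
    out = []
    for num, text in lines:
        if out and num not in targets:
            prev_num, prev_text = out[-1]
            if prev_text.startswith("DATA") and text.startswith("DATA"):
                joined = prev_text + "," + text[4:].lstrip()  # DATA 1 + DATA 2 -> DATA 1,2
            elif (not text.startswith(("DATA", "DEF"))  # these must begin a line
                  and not any(word in prev_text for word in NO_APPEND)):
                joined = prev_text + ":" + text
            else:
                joined = None
            if joined is not None and len(str(prev_num)) + len(joined) <= MAX_LINE:
                out[-1] = (prev_num, joined)
                continue
        out.append((num, text))
    return out


def data_list(values):
    """Comma-separated items to follow DATA, quoting only where needed."""
    items = []
    for v in values:
        needs_quotes = (
            '"' in v or "," in v or v.startswith(" ")
            or v == "" or v.endswith(" ")  # conservative extras
        )
        items.append('"' + v.replace('"', '""') + '"' if needs_quotes else v)
    return ",".join(items)
```

For example, data_list(["Hello", "a,b"]) gives Hello,"a,b", ready to follow the DATA keyword.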

Difficult

  • Use operator precedence rules to find redundant brackets in expressions and remove them.
  • Use intermediate variables to reduce the number of repeated sub-expressions.
  • Refactor repeated segments of code into a function, procedure or subroutine.
  • Find other ways of storing data in the program besides DATA; see Data without DATA.

References

Based on crunch.pl, packaged with EDOSPAT.

-- beardo 19:12, 11 October 2007 (BST)