Crunching BASIC programs
Latest revision as of 15:30, 15 August 2018
To fit more code into the BBC Micro's limited memory, BASIC programs can be crunched (in modern terms, minified). Crunching reduces the size of a program without changing its meaning, but in the process makes it almost unreadable; it is a form of obfuscation.
Crunching can be done by a utility program that compacts a program in memory, or more recently by sending a text listing to be compacted on another computer and reading it back in. Automated crunchers tend to concentrate on the following areas that yield big and easy savings:
- Removing redundant characters
- Joining lines together (a multi-byte line header is replaced by a colon)
- Shortening variable names.
Utilities are able to remove spaces that were needed during keying, but are redundant now that the code has been tokenised. So they may produce a program that can't be typed back in from a listing.
As BASIC is an interpreted language, crunching delivers an overall time saving as well. The suggestions below are targeted at saving space more than time. See Chapter 32 of the B+ User Guide for tips on increasing the speed of a program.
Suggestions
The following does not apply to the characters within a string constant or a *command. Correctness is not guaranteed!
Trivial
- Delete empty lines.
- Delete leading and trailing spaces. A space between the line number and the code is stored in memory, and is optional. E.g.
  1030 ENDPROC
  could be replaced by
  1030ENDPROC
  LISTO 1 makes LIST reinsert the space when listing.
- Replace multiple spaces with a single space.
- Delete leading and trailing colons.
- Replace multiple colons with a single colon.
- Delete REM program comments and \ assembler comments.
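For a cruncher that works on a text listing on another computer, the trivial steps above might be sketched as follows. This is only an illustration: the function name `tidy_line` is hypothetical, it assumes every string constant is closed with a double quote, and it passes through unchanged any line that might hold a *command, a REM comment or DATA rather than trying to parse it.

```python
import re

def tidy_line(line):
    """Apply the trivial steps to one numbered listing line.

    Returns the tidied line, or None if no code is left on it.
    """
    match = re.match(r"\s*(\d+)(.*)$", line)
    if match is None:
        return None  # blank or unnumbered line: nothing to keep
    number, code = match.groups()
    if re.search(r"\*|REM|DATA", code):
        # Might be a *command, comment or DATA (or just a multiplication):
        # this sketch plays safe and leaves the code untouched.
        return number + code
    # Leading and trailing spaces and colons; this also drops the
    # optional space between the line number and the code.
    code = code.strip(" :")
    # Odd-numbered pieces are string constants and are left alone.
    pieces = re.split(r'("[^"]*")', code)
    for i in range(0, len(pieces), 2):
        pieces[i] = re.sub(r" {2,}", " ", pieces[i])      # runs of spaces
        pieces[i] = re.sub(r":(?: ?:)+", ":", pieces[i])  # runs of colons
    code = "".join(pieces)
    return number + code if code else None
```

The guard is deliberately over-cautious: a line containing a multiplication is also skipped, which costs a few bytes but never alters a *command.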
Easy
- Eliminate the keyword LET.
- Delete the keyword THEN except before a system variable assignment, unary operator, function return statement, *command or implied-GOTO line number.
- NEXT statements don't have to name the control variables. One NEXT statement can terminate several FOR loops, using commas.
  NEXT X%:NEXT Y%
  can be replaced with
  NEXT,
  If the program breaks when the control variable is removed, the FOR...NEXT loops are mis-nested!
- Functions with a single argument, except RND, don't need brackets around the argument. E.g.
  PRINT CHR$letter%, INKEY100, STR$~code%
- The result of a numeric function can be discarded with IF rather than assigning to a dummy variable:
  IFGET
  This only saves space at the end of a line.
Moderate
- Replace Teletext CHR$ functions with inline characters in strings, using SHIFT/CTRL and the function keys. The listing cannot be printed and re-typed after this.
- Express very large or very small real constants in scientific format:
  G=6.673E-11
- A zero before a decimal point can be eliminated.
- Express integer constants ≥ +1,000,000 as hexadecimal and the rest as decimal.
- VDU sequences may be shorter with some byte constants combined into word constants using semicolons. In particular 0; replaces 0,0,
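The hexadecimal rule above can be checked with a small helper. This is a sketch under one assumption: that the cost of an integer constant is simply the number of characters in its listing form. The name `shortest_int_literal` is hypothetical, and negative values are left in decimal.

```python
def shortest_int_literal(value):
    """Return the shorter of the decimal and &-prefixed hex spellings.

    Ties go to decimal, which is easier to read in a listing.
    """
    decimal = str(value)
    if value < 0:
        return decimal  # negative values are left in decimal in this sketch
    hexadecimal = "&" + format(value, "X")
    return hexadecimal if len(hexadecimal) < len(decimal) else decimal
```

Under this measure 999999 and &F423F tie at six characters, while 1000000 shrinks from seven characters to six as &F4240, which is consistent with the threshold given in the list.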
Tedious
- Delete spaces:
  - After characters "#$%'()*+,-./:;<=>[\]^{|}~
    - Preserve spaces between string constants so that they do not merge.
  - Before characters !"#&'()*+,-/:;<=>?@[\]^{|}~
    - Again preserve spaces between string constants.
  - Between numbers and other code, but not between two numbers.
  - After keywords, but not after real variable names.
    - Preserve the space in END ELSE, ERR OR, GET $, INKEY $, MOD E, OPT <keyword>, <assembler mnemonic> <keyword> and TO P if the listing is to be typed in.
  - Automated crunchers can remove all spaces immediately before and after tokenised keywords.
- Replace long variable, procedure and function names with shorter ones.
  - Preserve the variable type; don't replace integers with reals or vice versa as rounding errors may result.
  - Use the resident integers A% to Z% for speed, but otherwise it is best to use names with a lowercase character to avoid collisions with keywords.
  - Automated crunchers should use all one-character names first, then two character names with the first character 'varying fastest', and so on.
  - @% is reserved if the program PRINTs variables.
  - A%, C%, X% and Y% are reserved if CALL or USR appear in the program (6502 BASIC).
  - O% and P% are reserved if the program contains assembly language.
  - A, X, Y, a, x and y are reserved inside assembly language segments as they are register indicators, not variable names (6502 BASIC).
  - E should be reserved, or used carefully together with whitespace removal as it may form the exponent of a preceding <num-const>.
- Replace multiple lines with fewer multi-statement lines.
  - The longest line that can be typed in is 240 characters including the line number.
  - Remember DATA and DEF must be at the beginning of a line.
  - DATA 1 (newline) DATA 2 becomes DATA 1,2.
  - Don't add to the end of a line containing ELSE, IF, ON, REM or a *command.
  - Keywords [, ELSE, REPEAT and THEN, and DEF... statements don't need a colon between them and the next statement.
- DATA strings don't need double quotes unless they contain double quotes, commas or leading spaces.
- When crunching a text listing, keywords can be abbreviated to their minimum forms. See Chapter 48 of the B+ User Guide.
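The naming order suggested for automated crunchers (all one-character names first, then two-character names with the first character varying fastest) could be generated along these lines. This is a hedged sketch: the generator name `short_names`, the lowercase-only alphabet and the `reserved` parameter are assumptions made for illustration, and a real cruncher would also append the type suffix (% or $) of the name being replaced.

```python
from itertools import count, islice, product
import string

def short_names(alphabet=string.ascii_lowercase, reserved=frozenset()):
    """Yield candidate names, shortest first, skipping reserved ones.

    Within each length the first character varies fastest:
    a, b, ..., z, aa, ba, ca, ..., za, ab, bb, ...
    """
    for length in count(1):
        # product() varies the last position fastest, so each tuple is
        # reversed to make the first character the fastest-varying one.
        for combo in product(alphabet, repeat=length):
            name = "".join(reversed(combo))
            if name not in reserved:
                yield name
```

For example, `list(islice(short_names("ab"), 6))` gives `['a', 'b', 'aa', 'ba', 'ab', 'bb']`. Passing `reserved={"a", "x", "y"}` would keep the generator away from the register indicators when the program contains assembly language.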
Difficult
- Use operator precedence rules to find redundant brackets in expressions and remove them.
- Use intermediate variables to reduce the number of repeated sub-expressions.
- Refactor repeated segments of code into a function, procedure or subroutine.
- Find other ways of storing data in the program besides DATA; see Data without DATA.
References
Based on crunch.pl, packaged with EDOSPAT (http://regregex.bbcmicro.net/#prog.edospat).
-- beardo 19:12, 11 October 2007 (BST)