Acorn cassette format

From BeebWiki

The Acorn cassette format is a proprietary format developed by Acorn Computers for storing data on audio cassette tape. There is a 300 bits-per-second CUTS code, supported by the System series, Models A, B, B+ and Master, and a 1200 bps code similar to Basicode, which is found on these machines and also the Electron, Atom and most commercial cassettes. Aspects of the format are shared with the ROM Filing System.

Thomas Harte has developed the UEF file format to hold accurate copies of the Acorn, CUTS and Basicode signals on a cassette. Most popular emulators can open UEF files and serve the contents to the virtual machine as though it were playing the original cassette, but thanks to the structure of UEF, the emulators can also employ a 'speed hack' to load standard files many times faster than originally possible.

The description below is best illustrated by an example UEF file.

Physical layer

Hardware

The audio signal is presented to the Acorn computer on a 7 pin DIN connection. A standard 5 pin 180° DIN plug will fit the computer's socket, but the motor control provided by the computer will not be available.

The signal levels are nominally compatible with the LINE IN/OUT sockets on most cassette decks. Where there are no such sockets, the input hardware will tolerate the PHONES output of most portable cassette players, provided that the volume control is kept low — not more than 10-20% of full volume. Many inexpensive decks can make strong, albeit distorted, recordings through the MIC socket.

The physical layer for ROMs and PHROMs consists of the JEDEC pinout and interface, and the interface to the TMS 5220 speech synthesiser respectively.

Signal

The signal may be in one of three states: zero, one or no carrier. Breaks in the carrier are detected and used to reset the cassette loading firmware.

Zero is represented by a sinusoidal wave at 1200 Hz. The exact frequency is nominally 16 000 000 / 13 312, or 1201.9 Hz, but tape decks vary in speed so small differences are tolerated. One cycle (at 1200 baud) or four cycles (at 300 baud) represent one zero bit.

Binary one is represented by a sinusoidal wave at 2400 Hz. The nominal frequency is closer to 2403.8 Hz. Two cycles (at 1200 baud) or eight cycles (at 300 baud) represent a one bit. An odd number of 2400 Hz cycles can and does occur.
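
The mapping from bits to tones described above can be sketched in Python as follows. This is only an illustration: the names ZERO_HZ, ONE_HZ, bit_to_tone and bit_duration are invented for the example and do not come from any Acorn firmware or tool.

```python
from fractions import Fraction

# Nominal tone frequencies from the text: 16 MHz / 13 312, and twice that.
ZERO_HZ = Fraction(16_000_000, 13_312)   # about 1201.9 Hz
ONE_HZ = 2 * ZERO_HZ                     # about 2403.8 Hz


def bit_to_tone(bit, baud=1200):
    """Return (frequency in Hz, number of cycles) for one bit."""
    if baud not in (1200, 300):
        raise ValueError("baud must be 1200 or 300")
    repeat = 1200 // baud                # 1 at 1200 baud, 4 at 300 baud
    if bit:
        return ONE_HZ, 2 * repeat        # two or eight cycles of 2400 Hz
    return ZERO_HZ, 1 * repeat           # one or four cycles of 1200 Hz


def bit_duration(bit, baud=1200):
    """Return the nominal length of one bit in seconds, as a Fraction."""
    freq, cycles = bit_to_tone(bit, baud)
    return cycles / freq
```

Because a one bit uses twice as many cycles at twice the frequency, both bit values occupy the same time on tape.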

To allow the tape deck circuitry to settle, each data stream is preceded by 5.1 seconds of 2400 Hz leader tone. This is reduced to 1.1 seconds if the recording computer has paused in the middle of a file, or 0.9 seconds between data blocks recorded in one go.

At the end of the stream is a 5.3 second, 2400 Hz trailer tone. This is reduced to 0.2 seconds when pausing in the middle of a file (giving at least 1.3 seconds' delay between data blocks). The timings are derived from VSYNC interrupts so they vary between recordings.

Data layer

Data is recorded asynchronously on the tape in 8-bit bytes. Each byte consists of one start bit (a zero), 8 data bits lowest first, and one stop bit (a one). On the Atom there are one-and-a-half stop bits (i.e. three 2400 Hz cycles). By contrast, Basicode uses two stop bits.
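
As a rough sketch of this framing, the following Python turns a byte into its sequence of bits and back. The function names frame_byte and unframe_bits are hypothetical, and only whole stop bits are modelled, so the Atom's one-and-a-half stop bits are not represented.

```python
def frame_byte(value, stop_bits=1):
    """Frame one byte: start bit (0), 8 data bits lowest first, stop bit(s) (1)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    data = [(value >> i) & 1 for i in range(8)]   # bit 0 goes out first
    return [0] + data + [1] * stop_bits


def unframe_bits(bits):
    """Recover a byte from one framed unit; raise ValueError on bad framing."""
    if len(bits) < 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))
```

Passing stop_bits=2 gives the Basicode framing mentioned above.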

To alleviate problems with previous versions, BBC Micro MOS 1.20 and above inject a 'dummy byte' at the start of each leader tone, as well as when the RECORD then RETURN prompt is answered. The carrier usually starts with four 2400 Hz cycles, then a byte of value &AA as described above, and continues with 1.1 or 5.1 seconds of 2400 Hz tone. The Electron does not insert dummy bytes.

If a carriage return has been inserted into the keyboard buffer, for instance with

OSCLI "FX 138,0,13":*SAVE OBJcode 3000 +4F00

then the prompt dummy byte will cut off the leader dummy byte creating an irregular pattern. By selecting *OPT 1,0 the prompt is not printed and only the leader dummy byte is recorded.

For paged ROMs, data transfer is initialised by service call &D and bytes are read by service call &E (q.v.) Bytes are fetched from PHROMs with the TMS 5220's READ command.

Device address

This layer also comprises a specification of how to fill a four byte field to address locations within the media. This is used to interpret the 'address of next file' in the data block header, so that the filing system can quickly search for files by name.

On cassette there is no random access so the field is reserved, and is conventionally filled with four zero bytes.

For paged ROMs this is the address of the start of the next file within the paged ROM space, so the field contains a little-endian 32 bit address, in the range &00008000 to &0000BFFF. The upper 16 bits are ignored.
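
A minimal Python sketch of decoding this field for a paged ROM follows. The function name rom_next_file_address is made up for the example, and rejecting addresses outside &8000 to &BFFF is a choice made here, not behaviour taken from the MOS.

```python
import struct


def rom_next_file_address(field):
    """Decode the four-byte 'address of next file' field of a paged ROM block."""
    if len(field) != 4:
        raise ValueError("the address field is exactly four bytes")
    (raw,) = struct.unpack("<I", field)   # little-endian 32-bit value
    address = raw & 0xFFFF                # the upper 16 bits are ignored
    if not 0x8000 <= address <= 0xBFFF:
        raise ValueError("address lies outside the paged ROM space")
    return address
```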

For PHROMs the field also contains a little-endian 32 bit address within the present PHROM, in the range &00000000 to &00003FFF. The bytes of the address field are in READ BYTE order so that the MOS itself may load the address into the speech processor to skip to the next file. (The only RFS-related address in READ AND BRANCH format is the offset to the start of the data stream, stored at fixed offsets &3E..3F in the PHROM header; even then, MOS 1.20 emulates the read and branch in firmware instead of sending this command to the speech processor.)

Block layer

At this level the filing system deals with complete bytes.

System Series

The System machines store the whole file in a single block, with no checksums. A System series block consists of the following sequence of bytes:

  • Address of first byte, two bytes, high byte first.
  • 1 + address of last byte, two bytes, high byte first.
  • Data, lowest address first.
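
The layout above can be read back with a few lines of Python. This is a sketch only: parse_system_block is a hypothetical name, and the input is assumed to be the already-decoded byte stream of one block.

```python
def parse_system_block(raw):
    """Split a System series block into (start address, data bytes)."""
    if len(raw) < 4:
        raise ValueError("block is shorter than its two address fields")
    start = int.from_bytes(raw[0:2], "big")   # address of first byte
    end = int.from_bytes(raw[2:4], "big")     # 1 + address of last byte
    length = end - start
    if length < 0:
        raise ValueError("end address precedes start address")
    data = raw[4:4 + length]
    if len(data) != length:
        raise ValueError("block is truncated")
    return start, data
```

Since the format carries no checksum, the length comparison is the only consistency check available.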

Atom

The Atom supports the System series format (as a 'nameless file') and its own format. The data consists of files separated by inter-file gaps and appropriate leaders but no trailers, and further subdivided into blocks separated by inter-block gaps and leaders. Atom files are in the 300 baud rate format and the 8N1 data format. All bytes are followed by an extra short wave after the stop bit except the checksum. An Atom block consists of a header and data separated by carrier tone. The following sequence of bytes describes the header:

  • Four synchronisation bytes (&2A).
  • File name (one to thirteen characters).
  • One end of file name marker byte (&0D).
  • Block flag, one byte.
  • Block number, two bytes, high byte first.
  • Data block length − 1, one byte.
  • Execution address, two bytes, high byte first.
  • Load address, two bytes, high byte first.

The block flag is set as follows:

  • Bit 7 is set if this block is not the last block of a file.
  • Bit 6 is set if the block contains data. If clear then the 'data block length' field is invalid.
  • Bit 5 is set if this block is not the first block of a file.
  • Bits 4 to 0 are undefined. Normally their contents are:
    • in the first block, bits 15 to 11 of the end address (=1 + the last address saved in the file);
    • in subsequent blocks, bits 6 to 2 of the previous block flag.
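
The Atom block flag can be unpacked as in this Python sketch. The function name and dictionary keys are invented for illustration; note that bits 7 and 5 are set when the block is NOT the last or first block, so they are inverted here.

```python
def decode_atom_flag(flag):
    """Interpret the defined bits of an Atom block flag byte."""
    return {
        "last_block": not (flag & 0x80),    # bit 7 set means more blocks follow
        "has_data": bool(flag & 0x40),      # bit 6 set means the length field is valid
        "first_block": not (flag & 0x20),   # bit 5 set means an earlier block exists
        "undefined_bits": flag & 0x1F,      # bits 4 to 0, passed through unchanged
    }
```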

Next is carrier tone with a varying length around 1200 waves.

The data block follows with a trailing checksum.

  • Data, 1 to 256 bytes.
  • Checksum, one byte.

The checksum is a simple sum (modulo 256) of all bytes from the start of the header block to the end of the data block.
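
As an illustration, the Python below assembles an Atom header and computes the checksum over the header and data. It assumes that 'the start of the header block' includes the four synchronisation bytes, which the text does not state explicitly; build_atom_block is a hypothetical helper, not part of any existing tool.

```python
def build_atom_block(name, flag, block_no, exec_addr, load_addr, data):
    """Return (header bytes, data bytes, checksum) for one Atom block."""
    if not 1 <= len(name) <= 13:
        raise ValueError("file name must be 1 to 13 characters")
    if not 1 <= len(data) <= 256:
        raise ValueError("data must be 1 to 256 bytes")
    header = (
        b"\x2A" * 4                     # four synchronisation bytes
        + name
        + b"\x0D"                       # end of file name marker
        + bytes([flag])
        + block_no.to_bytes(2, "big")   # high byte first
        + bytes([len(data) - 1])        # data block length minus one
        + exec_addr.to_bytes(2, "big")
        + load_addr.to_bytes(2, "big")
    )
    # Assumption: the sum covers the sync bytes as well as the rest of the header.
    checksum = sum(header + data) & 0xFF
    return header, data, checksum
```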

An inter-block gap of about two seconds follows the checksum.

BBC Micro, Electron and Master

The Cassette Filing System shares its block format and most of the handler code with the ROM Filing System.

On cassettes the data consists of files separated by carrier breaks and the appropriate leaders and trailers, and further subdivided into blocks separated by inter-block gaps. In paged ROMs and PHROMs the data consists of files logically divided into blocks and chained together without gaps.

A block consists of the following sequence of bytes, as listed in the B+ User Guide, page 369:

  • One synchronisation byte (&2A).
  • The block header:
    • File name (one to ten characters).
    • One end of file name marker byte (&00).
    • Load address of file, four bytes, low byte first.
    • Execution address of file, four bytes, low byte first.
    • Block number, two bytes, low byte first.
    • Data block length, two bytes, low byte first.
    • Block flag, one byte.
    • Address of next file, four bytes. See Data layer above.
  • CRC on header starting from the file name, two bytes.
  • Data, number of bytes as stated in the data block length field.
  • CRC on data, two bytes. Omitted if data block length = 0.
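
The byte sequence above can be picked apart with the following Python sketch. It assumes the input starts at the synchronisation byte and has already been decoded into bytes; parse_cfs_header and its dictionary keys are hypothetical names. The stored header CRC is returned but not verified here.

```python
import struct


def parse_cfs_header(raw):
    """Parse one block header; return (fields, offset of the first data byte)."""
    if raw[0] != 0x2A:
        raise ValueError("missing synchronisation byte")
    end = raw.index(0, 1)                 # position of the &00 name terminator
    name = raw[1:end]
    if not 1 <= len(name) <= 10:
        raise ValueError("file name must be 1 to 10 characters")
    # Little-endian, unpadded: load, exec, block number, length, flag, next file.
    load, exec_addr, block_no, length, flag, next_file = struct.unpack_from(
        "<IIHHBI", raw, end + 1
    )
    pos = end + 1 + 17                    # the fixed fields occupy 17 bytes
    header_crc = int.from_bytes(raw[pos:pos + 2], "big")
    fields = {
        "name": name, "load": load, "exec": exec_addr, "block_no": block_no,
        "length": length, "flag": flag, "next_file": next_file,
        "header_crc": header_crc,
    }
    return fields, pos + 2
```

The data, if any, occupies the next 'length' bytes from the returned offset, followed by its own two-byte CRC.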

The file name may contain any character(s) including spaces, but not NUL (&00). If it contains non-printable characters then the file can only be loaded with, for instance, CHAIN "". This is not sensible for ROMs and PHROMs so they are restricted to printable characters.

The blocks of a file must be numbered sequentially in ascending order starting from &0000.

The data block can be any length, but by making it 256 bytes or less it can be loaded safely into the system buffers and the file read as a sequential file. For compatibility all but the last block must be 256 bytes and the last block must be 256 bytes or less.

The block length can be zero. This happens when saving an empty file, or when CLOSE#ing a file just after a full block has been written.

The block flag is constructed as follows:

  • Bit 7 is set if this block is the last block of a file.
  • Bit 6 is set if the block is empty.
  • Bit 0 is set if the file is Locked, meaning that it can only be loaded with */ or *RUN.
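
For illustration, the three defined bits can be tested in Python as below. The function name and keys are invented for the example; bits not listed above are simply ignored.

```python
def decode_cfs_flag(flag):
    """Interpret the documented bits of a CFS/RFS block flag byte."""
    return {
        "last_block": bool(flag & 0x80),   # bit 7
        "empty": bool(flag & 0x40),        # bit 6
        "locked": bool(flag & 0x01),       # bit 0
    }
```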

The CRC is a cyclic redundancy check, a confirmation value generated from the preceding data so that corruption of the data can be detected with a high likelihood of success. Both the header CRC and the data CRC are stored high byte first. The CRC algorithm is defined according to the following Rocksoft™ Model record:

   Name   : "XMODEM"
   Width  : 16
   Poly   : 1021
   Init   : 0000
   RefIn  : False
   RefOut : False
   XorOut : 0000
   Check  : 31C3

There is also a description of the algorithm in the B+ User Guide, and a Web page containing example machine code.
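
A straightforward bit-by-bit rendering of the model record in Python is sketched below. It is not the MOS routine or the machine code mentioned above, just a direct implementation of the listed parameters; the name crc16_xmodem is chosen for the example.

```python
def crc16_xmodem(data, crc=0x0000):
    """CRC-16 with polynomial &1021, initial value 0, no reflection, no final XOR."""
    for byte in data:
        crc ^= byte << 8                  # bring the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def crc_bytes(data):
    """Return the CRC as it is stored in a block: two bytes, high byte first."""
    return crc16_xmodem(data).to_bytes(2, "big")
```

Running it over the ASCII string "123456789" gives &31C3, matching the Check value in the record. For a header, the input would be the bytes from the file name up to and including the address of next file.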

For ROMs and PHROMs, there are two more block types defined. They are identified by their first byte. The first of these is a continuation block:

  • One continuation byte (&23).
  • Data, number of bytes as stated in the most recent data block length field.
  • CRC on data, two bytes. Omitted if data block length = 0.

Continuation blocks are used for all but the first and last blocks of a file, to save the space that would be taken up by a header. The continuation block implies that the entire header would be the same, except that the block number is one greater than the previous block.

The second is an 'end of ROM' block, consisting of one byte:

  • One end-of-ROM marker (&2B).

The end-of-ROM marker causes the filing system to move on to the next device. The final file's 'next file address' field points to this marker. It is required in all ROMs and PHROMs.

Phase shift

Because of the properties of magnetic induction, recording the signal to cassette tape and playing it back differentiates it; in the case of data signals this is seen as a phase shift. The computer may record a signal changing from high to low frequency in the trough of a wave, but on playback the change may appear as the wave crosses zero volts. On a commercial cassette that has been copied from a master tape, the frequencies change at the peak of the wave, and the waveform has become triangular.

Phase shift matters because computers are notoriously sensitive to it, although it cannot be detected by ear. The BBC Micro and Master are less prone thanks to the serial ULA; the Atom and Electron, feeding the signal straight into a digital port, seem 'deaf' to some cassettes but not others, and yet tape-to-tape copies of the bad cassettes often work fine.[1]

Phase shift is measured in degrees. A device usually adds 90° (differentiation) or 180° (polarity reversal) at a time. The cassette, the tape deck, the amplifier and even the PC sound card can all contribute varying amounts of phase shift, so anyone wanting to find the shift factor of their system must calibrate against a known and widely-published title. At the moment there is no agreed title, so those digitising tapes just note their equipment model numbers, and report the phase as they see it. (See documentation for MakeUEF).

Copy protection

Unlike on most other computers, the signal and data layers of the Acorn protocol are fixed in hardware. The serial ULA sets the modulation frequencies, and the ACIA restricts the format of each bit and byte. So there are no Speedlock-style schemes to offer copy protection or faster loading times. (Only the System series and the Atom were capable of them, and no schemes are known to exist.) The only data-layer hack was to add parity or an extra stop bit (Pace's Fortress), and the data stream could still be recovered with fairly simple code.

Most of the effort went into changing the block layer. Acorn provided a trivial content protection scheme with the MOS; the Locked bit prevented users loading the file into memory and then having control of the machine to peruse it. It was easily defeated by getting an interrupt routine to clear the Locked bit before it was checked.

Later games deliberately recorded non-standard block headers to stop the CFS loading the main program; a normal, Locked loader file carried its own interrupt routine, to massage the CFS's state as the main file's block headers were read in. Examples include 3D Grand Prix, Arkanoid, Eagle's Wing and Stairway To Hell.

Pace's Fortress took control of the serial subsystem and loaded the game data itself, from one long stream that changed parity halfway through. This title, along with Icon's Caveman Capers, fell foul of an extension to the Serial Processor's interface, making the games compatible only with the Ferranti ULA. Skirmish by Go-Dax and many Ultimate games featured stream loading; other titles had non-standard streams divided into blocks by very short lengths of carrier tone.

See also the list of protected titles.

See also

CRC-16 calculation code.

References

  1. Stairway to Hell Forums post by retro_junkie, 26th April 2007.

-- beardo 17:20, 23 January 2009 (UTC)