INF file format

From BeebWiki

Revision as of 00:04, 21 October 2022

The INF file convention is a technique used in the Acorn computing community to store and exchange the metadata attached to files in Acorn MOS and RISC OS.

An INF file is a short, formatted text file accompanying a data file (as a sidecar file) which contains attributes that Acorn MOS and RISC OS need to interpret the data file properly, but which are not part of the file's data. Use of INF files allows the attributes to be saved and copied by applications and operating systems that only handle plain files.

The INF convention is informal, and arose in the mid-1990s as Acorn computer users began to exchange files on the internet, largely with and through PCs that did not recognise Acorn metadata. Several variants of the format have emerged, all compatible with each other in their most basic form. Each variant encodes the three most significant attributes of an Acorn file (the native filename, the load address and the execution address), plus its particular choice of other attributes in its own format.

INF files are used exclusively with files for the BBC Micro, Acorn Electron and Acorn Master series computers. Though they can potentially apply to RISC OS files as well, they are cumbersome for the task, and RISC OS applications adopt other techniques to exchange metadata over cosmopolitan channels.

Acorn operating systems embed the metadata natively within the filing system, and do not support metadata in INF files. All handling is done by user applications on import to or export from the Acorn system.

Basic format

At its simplest, the INF convention consists of a file containing one line of text, in the same directory as a binary file; the name of the text file equals the name of the binary file, plus .inf. The directory may be a directory on a disc, or one in an archive file such as Zip. Both filenames obey the conventions of the host file system.

The line of text contains three fields, separated by a space or spaces:

  • a string giving the name of the file in Acorn DFS or ADFS format
  • one to eight hexadecimal digits giving the load address where the start of the binary data should be loaded
  • one to eight hexadecimal digits giving the execution address which the CPU should call after loading, if commanded.

A trailing newline is optional.

Here is an example:

   $.DCONV 1900 801F

This means that a file named $.DCONV in Acorn DFS has a load address of &1900 and an execution address of &801F. (This file happens not to be executable but can be recognised as a BASIC I program.)
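A minimal Python sketch of reading this basic form follows. It assumes the Acorn filename contains no spaces and that the INF text has already been read from the sidecar file; the function name `parse_inf` is hypothetical, and any fields after the execution address are simply ignored.

```python
import re

# One to eight hex digits, with no "&" or "0x" prefix.
_HEX_ADDR = re.compile(r"[0-9A-Fa-f]{1,8}")


def parse_inf(text: str) -> tuple[str, int, int]:
    """Return (acorn_name, load, exec) from the first three INF fields.

    Assumes the Acorn filename contains no spaces; later fields are ignored.
    """
    fields = text.split()  # splits on runs of spaces and drops a trailing newline
    if len(fields) < 3:
        raise ValueError("expected a filename, a load address and an execution address")
    name, load, exec_addr = fields[:3]
    for addr in (load, exec_addr):
        if not _HEX_ADDR.fullmatch(addr):
            raise ValueError(f"{addr!r} is not one to eight hex digits")
    return name, int(load, 16), int(exec_addr, 16)
```

For example, `parse_inf("$.DCONV 1900 801F\n")` returns `("$.DCONV", 0x1900, 0x801F)`.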

Considerations

Acorn DFS and ADFS interpret directories differently. The directory specifier (up to and including .) is typical, but optional.

The above description appears to allow ADFS pathnames to be specified, but this is not expected in practice and the ADFS directory tree should instead be replicated in the host file system. The INF file should then omit the directory specifier, just in case.

Load and execution addresses are 32-bit quantities; however, Acorn DFS truncates them to 18 bits for storage on disc. On retrieval it expands the addresses to 32 bits in the OSFILE control block, but its *INFO command prints only 24 bits – six hex digits. Some INF file generators copy the 24-bit style; all readers tolerate 32-bit input.

When reading a DFS disc, each 18-bit address should be expanded according to bits 17 and 16: if both bits are set, the address maps to &FFFF0000..&FFFFFFFF; otherwise it lies in &00000..&2FFFF and is used unchanged.
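The mapping can be sketched in Python as below. The function names are hypothetical; the truncating direction simply keeps the low 18 bits, which is what storing only 18 bits on disc implies.

```python
def expand_dfs_address(addr18: int) -> int:
    """Expand an 18-bit DFS catalogue address to 32 bits."""
    if not 0 <= addr18 <= 0x3FFFF:
        raise ValueError("DFS stores only 18 bits")
    if (addr18 & 0x30000) == 0x30000:
        # Bits 17 and 16 both set: map into &FFFF0000..&FFFFFFFF.
        return 0xFFFF0000 | (addr18 & 0xFFFF)
    return addr18  # &00000..&2FFFF is used unchanged


def truncate_dfs_address(addr32: int) -> int:
    """Keep the low 18 bits, as stored in the DFS catalogue."""
    return addr32 & 0x3FFFF
```

A round trip preserves both kinds of address: `expand_dfs_address(truncate_dfs_address(0xFFFF1900))` gives back `0xFFFF1900`, and `0x1900` stays `0x1900`.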

Variants

bbcim

In 1995 W.H.Scholten introduced bbcim, a tool to add or extract files in a DFS disc image. In its manual page he documented a set of extension fields which bbcim implements. The latest version of the archive format is 0.83, dated to 1997. One example is given as:

   $.ELITE FF0E00 FF8023 Locked CRC=XXXX NEXT  ELITEdata

and the general format is:

   {TAPE <tfs_filename> | <dfs_filename>} <load address> <exec address> [<file length>] ['Locked'] CRC=<xxxx> [NEXT <next filename>]

The extension fields operate as follows:

  • TAPE, if present, indicates that the file originates from a cassette, and its name should be used verbatim (up to 10 characters).
  • Locked (or L) indicates that the file is locked against alteration and deletion (on cassette: against loading without execution). Omitted if the file is unlocked.
  • CRC=XXXX (with XXXX replaced by four hex digits) records the cyclic redundancy check value of the file data, as used on Acorn cassettes and in XMODEM. Optional.
  • NEXT ELITEdata specifies that the host file named ELITEdata (and its accompanying ELITEdata.inf) contains the file following this file on the cassette tape. After $.ELITE is loaded, a CHAIN "", LOAD "" or *RUN "" command should find and load ELITEdata (under its BBC name listed in ELITEdata.inf.) A host filename is given as filenames may be duplicated on a tape. Optional with cassettes and unnecessary with discs.
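The XMODEM CRC is the 16-bit CRC with polynomial &1021 and initial value 0, processed most significant bit first with no final inversion. The sketch below computes it bit by bit. It assumes that the value covers the whole file data in one pass and that bbcim writes it as four uppercase hex digits; both are assumptions worth checking against bbcim's own output.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16, polynomial &1021, initial value 0, MSB first, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8  # bring the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def crc_field(data: bytes) -> str:
    """Format the value as a CRC=XXXX field (uppercase hex assumed)."""
    return f"CRC={crc16_xmodem(data):04X}"
```

`crc_field(b"123456789")` gives `"CRC=31C3"`, the standard check value for this CRC. Python's `binascii.crc_hqx(data, 0)` computes the same CRC and can serve as a faster alternative.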

If the file is to be read in as if from cassette, a $. at the start of the filename is stripped unless the TAPE keyword is present. Files named TAPE should be listed as $.TAPE or TAPE TAPE.

A hexadecimal field giving the length of the file in bytes may follow the execution address, but bbcim ignores it.

A list of characters is recommended for exclusion from host filenames: \ / : * ? " < > | . ` '
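A sketch of deriving host filenames from an Acorn name by replacing the recommended exclusions follows. Substituting `_` is a hypothetical choice made for illustration, not bbcim's documented behaviour, and distinct Acorn names can collide after substitution, so a real tool would need to check for that.

```python
# Characters recommended for exclusion from host filenames.
EXCLUDED = set('\\/:*?"<>|.`\'')


def host_names(acorn_name: str, substitute: str = "_") -> tuple[str, str]:
    """Return (data_file, inf_file) host names for an Acorn filename.

    The substitute character is a hypothetical choice, not bbcim's scheme.
    """
    safe = "".join(substitute if ch in EXCLUDED else ch for ch in acorn_name)
    return safe, safe + ".inf"
```

For example, `host_names("$.ELITE")` gives `("$_ELITE", "$_ELITE.inf")`.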

Xfer

Mark de Weger introduced Xfer sometime before 1999. Angus Duggan ported the transfer program to C and distributed it with a format specification which he had apparently modified from de Weger's original. The latest version is 1.1, dated to 4 November 1999.

This specification indirectly acknowledges the bbcim archive format and then goes on to define a strict subset of allowed INF files, for processing convenience. It first depicts the 'standard' format as

   [<filename>] <load address> <exec address> [<length>] [<lock>] [<crc>] [<next>] [*<boot>*]

(The bbcim specification does not clearly affirm that the filename is optional.) The stricter 'specialised standard format' is displayed thus:

   <filename> <load address> <exec address> <length> <lock> [*<boot>*] [<attrs>] [<type>]

The first five attributes are mandatory. The new attributes have meanings as follows:

  • OPT4=x (replacing x with a digit 0..3) is optional when <filename> equals !BOOT or $.!BOOT, and forbidden otherwise; it sets the boot option of the disc to which the file is written.
  • ATTR=XXXXXXXX (replacing XXXXXXXX with one to eight hex digits) reports the attribute doubleword from the OSFILE control block, representing permitted actions on the file. The meaning of the doubleword is filing system dependent and overrides <lock>.
  • TYPE=X reports the object type returned by OSFILE: X is replaced by 1 for a file or 2 for a directory.

Further limitations on existing attributes apply:

  • L is a forbidden <lock> value; only Locked and the empty string are allowed.

Xfer in C up to version 5.1 enforces the restrictions, refusing to transfer file data unless the correct length is specified.
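The following sketch checks a line against the mandatory part of the 'specialised standard format'. It is not Xfer's code: the function name is hypothetical, it returns a list of problems rather than refusing a transfer, and it does not validate the optional OPT4, ATTR or TYPE fields.

```python
import re

_HEX = re.compile(r"[0-9A-Fa-f]{1,8}")


def xfer_problems(inf_line: str, data: bytes) -> list[str]:
    """List the reasons an INF line fails the strict Xfer rules."""
    fields = inf_line.split()
    if len(fields) < 4:
        return ["filename, load, exec and length are all mandatory"]
    problems = []
    for field in fields[1:4]:  # load, exec and length are hex fields
        if not _HEX.fullmatch(field):
            problems.append(f"{field!r} is not one to eight hex digits")
    if not problems and int(fields[3], 16) != len(data):
        problems.append("length field does not match the data file")
    if len(fields) > 4 and fields[4] == "L":
        problems.append("'L' is not allowed; write 'Locked'")
    return problems
```

An empty list means the line passes these checks; for example `xfer_problems("$.DCONV 1900 801F 3 Locked", b"abc")` returns `[]`.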

Harston

J.G.Harston introduced an expanded set of positional (nameless) hexadecimal fields to record the essential attributes of modern or networked filing systems, viz.:

   filename  load  exec  length  access  modification_date  modification_time
   creation_date  creation_time  user_account  auxilary_account

By way of example:

   filename FFFF1900 FFFF8023 00001273 33 7B23 123106 7B20 112708 0100 0040

All fields from the third (exec) onwards are optional.
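Because the fields are positional, a reader can pair the tokens with the field names in order and stop when the line runs out. The sketch below does that. It assumes filenames contain neither spaces nor `=`, sets named fields such as CRC=XXXX aside, and keeps the access field as text because it may be the letter L. The names `FIELD_NAMES` and `parse_harston` are hypothetical.

```python
FIELD_NAMES = (
    "filename", "load", "exec", "length", "access",
    "modification_date", "modification_time",
    "creation_date", "creation_time", "user_account", "auxilary_account",
)


def parse_harston(line: str) -> tuple[dict, dict]:
    """Split a Harston-style INF line into positional and named fields."""
    tokens = line.split()
    named = dict(t.split("=", 1) for t in tokens if "=" in t)  # e.g. CRC, BOOT
    positional = [t for t in tokens if "=" not in t]
    if len(positional) < 2:
        raise ValueError("a filename and a load address are required")
    record = {}
    for key, value in zip(FIELD_NAMES, positional):
        # The access field may be the letter L, so keep it (and the name) as text.
        record[key] = value if key in ("filename", "access") else int(value, 16)
    return record, named
```

Applied to the example line above, this yields a load address of `0xFFFF1900`, a length of `0x1273` and an access field of `"33"`.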

Date and time fields are filing system specific. The access field is not, and in the special case of Acorn DFS must be converted to the RISC OS convention. L is accepted as a synonym of 19 (&19: owner read, locked, public read).
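Reading the access byte with the RISC OS bit assignments (bit 0 owner read, bit 1 owner write, bit 3 locked, bit 4 public read, bit 5 public write) shows why L stands for &19. The sketch below accepts the L synonym and renders the byte in a RISC OS-like `LWR/wr` style; the function names and the exact rendering are assumptions for illustration.

```python
# RISC OS access bits as (bit number, letter): owner flags, then public flags.
OWNER_BITS = ((3, "L"), (1, "W"), (0, "R"))
PUBLIC_BITS = ((5, "w"), (4, "r"))


def parse_access(field: str) -> int:
    """Read an access field; the letter L is a synonym for &19."""
    return 0x19 if field == "L" else int(field, 16)


def describe_access(access: int) -> str:
    """Render an access byte as owner/public letters."""
    owner = "".join(ch for bit, ch in OWNER_BITS if (access >> bit) & 1)
    public = "".join(ch for bit, ch in PUBLIC_BITS if (access >> bit) & 1)
    return f"{owner}/{public}"
```

With this sketch, `describe_access(parse_access("L"))` gives `"LR/r"`, and the example's access value `33` gives `"WR/wr"`.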

His specification also acknowledges the named fields CRC=XXXX, identical to the one in bbcim, and BOOT=n, serving the same function as OPT4 in Xfer. BOOT must follow CRC, if present. The host filename is assumed to take priority over the BBC filename.

Other techniques

Acorn attributes may be preserved in more practical ways which RISC OS applications prefer:

Comma syntax

When transferring a file to a system that supports long file names but not Acorn attributes, the attributes may be appended to the filename as a punctuated string of hex digits. This convention appears in SparkFS, written by David Pilling in 1992, but its origin is uncertain.

Acorn MOS files are renamed as follows:

   <filename>,<load>-<exec>

where <load> and <exec> each consist of one to eight hex digits.
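Encoding and decoding this form can be sketched as below. The regular expression splits at the last comma, so Acorn names that themselves contain commas survive. Writing the addresses in uppercase without leading zeros is an assumption, since SparkFS's exact output is not described here.

```python
import re

# Greedy name, then ",load-exec" at the very end of the host filename.
_MOS = re.compile(r"(?P<name>.*),(?P<load>[0-9A-Fa-f]{1,8})-(?P<exec>[0-9A-Fa-f]{1,8})")


def encode_mos(name: str, load: int, exec_addr: int) -> str:
    """Append ,load-exec to an Acorn filename (uppercase, unpadded: assumed)."""
    return f"{name},{load:X}-{exec_addr:X}"


def decode_mos(host_name: str):
    """Return (name, load, exec), or None if the suffix is absent."""
    m = _MOS.fullmatch(host_name)
    if m is None:
        return None
    return m["name"], int(m["load"], 16), int(m["exec"], 16)
```

For example, `encode_mos("DCONV", 0x1900, 0x801F)` gives `"DCONV,1900-801F"`, and `decode_mos` reverses it.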

RISC OS files are transformed this way:

   <filename>,<filetype>

where <filetype> carries three (or four) hex digits.
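A similar sketch for the RISC OS form follows. It accepts three or four hex digits, as the text allows, and writes at least three lowercase digits; the lowercase choice and the function names are assumptions. Nothing in the name itself proves that a transform took place: `NOTES,add` decodes as a file of type &ADD.

```python
import re

_TYPED = re.compile(r"(?P<name>.*),(?P<type>[0-9A-Fa-f]{3,4})")


def encode_typed(name: str, filetype: int) -> str:
    """Append ,ttt to a RISC OS filename (lowercase hex assumed)."""
    return f"{name},{filetype:03x}"


def decode_typed(host_name: str):
    """Return (name, filetype), or None if no ,ttt suffix is present."""
    m = _TYPED.fullmatch(host_name)
    if m is None:
        return None
    return m["name"], int(m["type"], 16)
```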

As Acorn filenames may themselves contain commas, and as some of the shorter compositions are also valid filenames, it must be signalled out-of-band whether the names have been transformed or not. The transform is assumed in the following cases:

  • in Tar archives
  • in directory trees served to a RISC OS machine by HostFS

application/riscos MIME type

When transmitting files over the internet (including by electronic mail) the attributes may be embedded as parameters in the reserved application/riscos media type declaration sent in the message header. Support for these parameters is far from universal.

A full specification looks like this (in email messages and HTTP responses):

   Content-Type: application/riscos; name="<filename>,<filetype>";
       type="<typename>"; load="<load>"; exec="<exec>"; access="<access>"

<filetype> duplicates information already carried by <load> and <exec>, as the latter encode the filetype along with the file's modification time in RISC OS format. <typename> is a human-readable gloss to aid in choosing the correct application.
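The packing can be unpicked as sketched below, assuming the usual RISC OS layout: a load address of the form &FFFtttdd carries the 12-bit filetype ttt and the top byte dd of a 40-bit timestamp whose low 32 bits are the execution address, counted in centiseconds from 1 January 1900. Treating that count as UTC is an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Start of the RISC OS centisecond count; treating it as UTC is an assumption.
RISCOS_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def decode_stamp(load: int, exec_addr: int):
    """Return (filetype, timestamp) for a typed file, or None otherwise."""
    if (load >> 20) != 0xFFF:
        return None  # not &FFFtttdd: an ordinary load/exec address pair
    filetype = (load >> 8) & 0xFFF
    centiseconds = ((load & 0xFF) << 32) | exec_addr
    return filetype, RISCOS_EPOCH + timedelta(milliseconds=10 * centiseconds)
```

One caveat follows from the layout: an Acorn MOS address such as the &FFFF1900 in the Harston example also begins with &FFF, so this test alone cannot tell a MOS file from a typed one.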

In-protocol metadata

Arc and Zip archives created by Acorn-aware archivers embed the attributes in metadata stored with each member, as opposed to a separate INF member. The metadata is skipped when extracting the files on non-Acorn systems.

Xfer and Xfer in C communicate attributes in their custom protocol while copying files to and from the BBC Micro. The client software on the PC stores the attributes in its variant INF format files as mentioned above.

Regregex (talk) 02:11, 11 March 2022 (CET)