High density floppy disc access

From BeebWiki
 
[[Category:Hardware]]
[[File:High density screenshot.jpg|360px|right|Screenshot of a 510 KB catalogue]]
It is possible to read, write and format floppy discs on the BBC Micro in
high density (that is, at 500 kHz bandwidth in MFM) with suitable hardware
and a specially coded floppy disc controller driver. Besides DMA, which
would require extensive modifications to the machine, there are many
possible approaches to achieving the higher throughput, including polling,
conventional ISRs and a self-interrupting ISR proposed by Tom Seddon. The
concept has been developed and proved with conventional ISRs and a
supporting busy loop, driving a slightly modified disc interface from 1984.

==Hardware==
[[File:High density modified board.jpg|1120px|left|An interface modified for high density]]
Prior to the 1984 introduction of the 'high density' 5¼-inch floppy drive
in the IBM PC-AT, the same signal format was in use on 8-inch floppies in
the IBM System/34 from 1978. Some mid-range controllers such as the WD279x
series supported this format and were also adopted in early third-party disc
interface boards for the BBC Micro. Supplying the clock frequency specified
for 8-inch operation is the only modification needed for such boards;
allowing frequency selection to permit continued single and double density …

Later controllers designed exclusively for double density have been found
good for high density work when overclocked. The Ajax controller fitted to
the Atari STE is an apparently unmodified WD1772 rated at 16 MHz.
Experimenters have had promising though imperfect results doubling the clock
rate to a standard WD1770[http://www.stairwaytohell.com/sthforums/viewtopic.php?p=14418#p14418].

There is no foreseeable reason why other controller families designed for
high density should not be usable.

===Timing===
[[File:NMI_timing_diagram.png|thumb|678px|right|Diagram of worst case NMI timings. At a conservative rate of 15 µs per interrupt, loading to a 1 MHz address fails, shown by crosses.]]
The non-maskable interrupt (NMI) service routines have been carefully
studied to ensure they meet the floppy drive controller's timing
requirements at high speed, which they do in all but one occasional
case. There are two constraints from the FDC:
# the ISR must become re-entrant within 15 − ε µs of the interrupt;
# the ISR must service the FDC within 11.5 µs of the interrupt.

The first time interval is nominally 16 µs but allowance is made for jitter
and fast disc drives. The ε (epsilon) means that in the worst case the NMIs will
just miss one decision point and just catch the next.
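
The nominal 16 µs figure follows from the data rate. As a sketch, assuming one data request (and so one NMI) per byte, the usual nominal rates of 500, 250 and 125 kbit/s for high density MFM, double density MFM and single density FM, and the BBC Micro's 2 MHz CPU clock, the byte interval works out as follows. The function names are illustrative only.

```python
"""Nominal interval between FDC data requests for a given data rate.

A sketch: assumes one DRQ (and hence one NMI) per byte, a 2 MHz CPU
clock, and the usual nominal data rates for each recording density.
"""

CPU_HZ = 2_000_000  # BBC Micro 6502 clock

# Nominal data rates in bits per second (assumed standard values).
DATA_RATES = {
    "high (MFM)": 500_000,
    "double (MFM)": 250_000,
    "single (FM)": 125_000,
}


def byte_interval_us(bits_per_second: int) -> float:
    """Microseconds between successive bytes at the given data rate."""
    return 8 * 1_000_000 / bits_per_second


def byte_interval_cycles(bits_per_second: int) -> float:
    """The same interval expressed in 2 MHz CPU cycles."""
    return byte_interval_us(bits_per_second) * CPU_HZ / 1_000_000


if __name__ == "__main__":
    for name, rate in DATA_RATES.items():
        print(f"{name:13s} {byte_interval_us(rate):5.1f} us "
              f"= {byte_interval_cycles(rate):5.1f} cycles")
```

This gives 32, 64 and 128 cycles per byte; the slightly lower figures of 30, 60 and 120 used in the general-case checklist leave room for jitter and fast drives.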
 
There are three further constraints from the 6502 CPU and the BBC Micro's
clock system:

# 6502 instructions are atomic, and instruction processing may continue for up to 4.5 µs from the onset of an NMI. (This is two clock cycles plus one 7-cycle instruction, i.e. 9 cycles at 2 MHz; see below.)
# After this instruction, the 6502 executes an NMI sequence for 3.5 µs (7 cycles).
# A clock cycle that accesses a 1 MHz memory address is [[Cycle stretching|extended]] to 1 µs, or 1.5 µs if out of sync with an underlying 1 MHz monotonic clock (1MHzE). Once the CPU is synchronised with this clock, the length of later extended cycles can be predicted by machine code analysis.
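
The first two constraints add up to a worst-case delay before the ISR's first instruction, and the third can be modelled by rounding each 1 MHz access up to an edge of 1MHzE. The sketch below is a simplified model, not a circuit-accurate one: it assumes time is counted in 2 MHz cycles (0.5 µs "ticks"), that 1MHzE edges fall on even ticks, and that a 1 MHz access must last at least two ticks and finish on such an edge. The function names are hypothetical.

```python
"""Worst-case NMI entry time and a simple 1 MHz cycle-stretch model.

Times are counted in 'ticks' of one 2 MHz CPU cycle (0.5 us).
Assumptions: 1MHzE edges fall on even ticks; a 1 MHz access lasts at
least 2 ticks and must end on such an edge.
"""

TICK_US = 0.5  # one 2 MHz cycle


def worst_case_entry_ticks(lead_cycles: int = 2,
                           longest_instruction: int = 7,
                           nmi_sequence: int = 7) -> int:
    """Ticks from NMI onset until the ISR's first opcode fetch."""
    return lead_cycles + longest_instruction + nmi_sequence


def stretched_ticks(start_tick: int) -> int:
    """Length of one 1 MHz access: 2 ticks if in sync, 3 if not."""
    return 2 if start_tick % 2 == 0 else 3


def run_cycles(kinds: str, start_tick: int = 0) -> int:
    """Ticks taken by a cycle sequence: 'f' = 2 MHz cycle, 's' = 1 MHz access."""
    t = start_tick
    for kind in kinds:
        t += 1 if kind == "f" else stretched_ticks(t)
    return t - start_tick


if __name__ == "__main__":
    entry_us = worst_case_entry_ticks() * TICK_US
    print("ISR starts after", entry_us, "us")                 # 8.0
    print("left before 11.5 us:", 11.5 - entry_us, "us")     # 3.5
    print("in sync:", run_cycles("s", 0) * TICK_US, "us")     # 1.0
    print("out of sync:", run_cycles("s", 1) * TICK_US, "us")  # 1.5
    print("two, out of sync:", run_cycles("ss", 1) * TICK_US, "us")  # 2.5
```

In the last case the first, out-of-sync access costs 1.5 µs but leaves the CPU aligned with 1MHzE, so the second costs exactly 1 µs; under this model, that is the predictability the third constraint refers to.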
  
 
It has been found impossible to meet all these constraints, unless we … interrupts disabled.

We can exploit the atomic nature of 6502 instructions to buy time. The ISR
need only be re-entrant, not complete, by the next interrupt; so as
long as the final state-changing instruction is in progress when an NMI
arrives, the behaviour will be correct. The time this instruction takes to
complete has already been budgeted for as the 'previous' instruction of the
next interrupt, and the stack will not overflow as long as the CPU gets
partway through RTI, on average.
  
According to some documentation, e.g. [http://www.nvg.org/bbc/doc/6502.txt 64doc] by Sonninen ''et al.'', to cause an
NMI sequence in place of the next instruction, the NMI must occur before the
last cycle of the current instruction. This is confirmed by traces<ref name="6502org-interrupt-bug"/> of 6502
hardware, and is caused by the 6502 sampling the NMI input only on certain
cycles (usually the last) of each instruction. As we just missed the
decision point in the worst case, the FDC service deadline comes 11 µs
after the start of that next instruction. In all current cases this deadline
is met. If 64doc were incorrect then the timings would still hold true; all
the instruction rectangles in the diagram would just move 1 clock cycle to the left, so
that each ISR gets 0.5 µs more time to service the floppy drive controller.
  
The minimum time between Tube data channel accesses is 11.5 µs, which can be
increased with NOPs to 14.5 µs.
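
Assuming each NOP costs 2 cycles (1 µs) at 2 MHz and is placed between the two accesses, the difference between the two figures corresponds to three NOPs. The hypothetical helper below does the same arithmetic for other targets.

```python
"""How many NOPs stretch the gap between Tube data-channel accesses.

A sketch assuming each NOP costs 2 cycles (1 us) at 2 MHz and sits
between the two accesses.
"""
import math

NOP_US = 1.0  # NOP = 2 cycles of 0.5 us


def nops_needed(current_gap_us: float, target_gap_us: float) -> int:
    """Smallest number of NOPs that brings the gap up to the target."""
    if target_gap_us <= current_gap_us:
        return 0
    return math.ceil((target_gap_us - current_gap_us) / NOP_US)


if __name__ == "__main__":
    n = nops_needed(11.5, 14.5)
    print(n, "NOPs ->", 11.5 + n * NOP_US, "us")
```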
  
 
====Reading from disc to I/O memory====
[[File:Nmi rdio 16us check.png|thumb|690px|right|Loading slow memory is stable at 16 µs per byte.]]
A failure mode has been discovered when reading to 1 MHz memory areas under
the above timings. As <tt>STA absolute,X</tt> instructions always take 5
cycles, the ISR may overflow the stack when traversing 1 MHz pages such as
[[FRED]] and [[JIM]]. Also, if a 1 MHz address at the end of a page is met
within a sector, the next few bytes may be written to the wrong page.

In such cases failure can be avoided as long as NMIs arrive at the nominal
interval of 16 µs (see diagram); note that drives with crystal-controlled
synchronous motors can stay very close to this figure. 2 MHz memory can still be
successfully loaded at 15 µs per byte.
  
 
====6502 interrupt bug====
The 6502 was
discovered in 2010<ref name="6502org-interrupt-bug">[http://forum.6502.org/viewtopic.php?p=11464#11464 6502.org forum post] by Hias Reichl, 3 September 2010</ref><ref name="nesdev-interrupt-bug">[http://nesdev.parodius.com/bbs/viewtopic.php?p=63094#63094 Nesdev forum post] by blargg, 18 June 2010</ref> to defer its
response to IRQs or NMIs occurring just before the last cycle of a taken
branch to the same page. In that case the branch completes and the
instruction at the destination is executed before the interrupt sequence is
begun. The effect (as explained by Nesdev user ''blargg'') is that the
branch adds one cycle to the maximum interrupt latency of the destination
instruction.

This is in addition to the usual one cycle of latency between sampling the
NMI input and fetching the next opcode; here the sample is taken on the
penultimate cycle instead of the last, making two cycles of latency in
total.
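
Whether a given branch can trigger this extra cycle can be checked mechanically from its address and offset. A minimal sketch using standard NMOS 6502 branch timings (2 cycles not taken, 3 taken within a page, 4 taken across a page boundary); "same page" compares the address after the 2-byte branch with the target. The function names are illustrative, and the example uses the BEQ at &0D1C from the busy loop listing on this page.

```python
"""6502 relative-branch timing and the same-page interrupt quirk.

Sketch using standard NMOS 6502 timings. A branch is 2 bytes long;
'same page' compares the address after the branch with the target.
"""


def branch_target(branch_addr: int, offset: int) -> int:
    """Destination of a branch at branch_addr with 8-bit signed offset."""
    next_pc = (branch_addr + 2) & 0xFFFF
    if offset >= 0x80:                  # convert to signed
        offset -= 0x100
    return (next_pc + offset) & 0xFFFF


def branch_cycles(branch_addr: int, offset: int, taken: bool) -> int:
    """2 if not taken, 3 if taken within a page, 4 if taken across pages."""
    if not taken:
        return 2
    next_pc = (branch_addr + 2) & 0xFFFF
    same_page = (next_pc >> 8) == (branch_target(branch_addr, offset) >> 8)
    return 3 if same_page else 4


def adds_interrupt_latency(branch_addr: int, offset: int, taken: bool) -> bool:
    """True in the taken, same-page case, where the destination
    instruction gets one extra cycle of interrupt latency."""
    return branch_cycles(branch_addr, offset, taken) == 3


if __name__ == "__main__":
    # BEQ at &0D1C, offset &F7, from the busy loop listing: back to &0D15.
    print(hex(branch_target(0x0D1C, 0xF7)))            # 0xd15
    print(branch_cycles(0x0D1C, 0xF7, True))           # 3
    print(adds_interrupt_latency(0x0D1C, 0xF7, True))  # True
```

So the instruction at &0D15 (the LDA at the top of the wait loop) is the destination of a taken same-page branch, and under this rule its worst-case interrupt latency grows by one cycle.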
  
 
As long as all instructions in the busy loop, which are the target of …

====Calculating in the general case====
Those who wish to develop their own high density system can check whether the
worst case timings can be met, as follows. A useful ISR is assumed to have
at least these elements: a fetch, an indexed store, a register increment
instruction and a conditional branch for incrementing a pointer.

# Determine the minimum expected interval between NMIs, in clock cycles. The datasheet will list a nominal value, but allow some room for jitter and drive speed variations. 30 is a good number for high density, 60 for double, 120 for single.
# Calculate the maximum non-re-entrant service time. Start with 20 cycles: 7 for the NMI sequence, 4 for LDA, 5 for STA,X, 2 for INX and 2 for BNE.
# Add the number of cycles in the longest running instruction in your busy loop. If you have interrupts enabled, you should take this to be 7.
# If any of the longest running instructions are the destination of a branch from the same page, add 1 cycle.
# If your FDC is a WD2791 or WD2795, add 2 cycles.
# If your FDC is on the 1 MHz bus, add 2 cycles.
# If users will be saving or loading data into 1 MHz areas and expecting it to work, add 3 cycles.
# If the total is less than the NMI interval in cycles, the test passes. The ISR will make it to the INC instruction to cross a page boundary, and not overflow the stack the rest of the time.

The experimental system has a score of 31 and so only guarantees the nominal 32 cycle (16 µs) rate; see above.
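
The checklist translates directly into a small function. This is a sketch: the field names are hypothetical, and the example board's figures are invented for illustration, since the individual contributions behind the experimental system's score of 31 are not broken down here.

```python
"""Worst-case NMI budget check, following the numbered steps above.

A sketch: field names are hypothetical; the cycle figures are those
given in the checklist.
"""
from dataclasses import dataclass

BASE_ISR_CYCLES = 7 + 4 + 5 + 2 + 2  # NMI sequence, LDA, STA abs,X, INX, BNE


@dataclass
class System:
    nmi_interval: int                        # step 1: min cycles between NMIs
    longest_busy_instruction: int            # step 3: use 7 if IRQs enabled
    longest_is_branch_target: bool = False   # step 4
    wd2791_or_wd2795: bool = False           # step 5
    fdc_on_1mhz_bus: bool = False            # step 6
    supports_1mhz_transfers: bool = False    # step 7


def score(s: System) -> int:
    """Steps 2 to 7: total worst-case cycles before the ISR is safe."""
    total = BASE_ISR_CYCLES + s.longest_busy_instruction
    total += 1 if s.longest_is_branch_target else 0
    total += 2 if s.wd2791_or_wd2795 else 0
    total += 2 if s.fdc_on_1mhz_bus else 0
    total += 3 if s.supports_1mhz_transfers else 0
    return total


def passes(s: System) -> bool:
    """Step 8: the total must be strictly less than the NMI interval."""
    return score(s) < s.nmi_interval


if __name__ == "__main__":
    # Hypothetical board: 5-cycle longest busy-loop instruction, WD2791,
    # transfers to 1 MHz areas supported, conservative 30-cycle interval.
    board = System(nmi_interval=30, longest_busy_instruction=5,
                   wd2791_or_wd2795=True, supports_1mhz_transfers=True)
    print(score(board), passes(board))   # 30 False
```

With these invented figures the board scores 30, which fails the conservative 30-cycle interval but would pass at the nominal 32 cycles, the same situation the text describes for the experimental system.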
  
 
===Restrictions===
The small amount of time available means many of the usual features of a
disc ISR must be left out. There is not enough time to count the bytes and
discard them after a certain number have been transferred. Thus all
transfers are rounded up to whole sectors. This matters most in OSWORD
&7F, which can no longer emulate the 8271's Read ID command faithfully, and
less so in OSFILE, where some memory beyond end-of-file will be overwritten,
though in most cases this is not a significant problem. If multiple-sector
transfer commands (%10x1xxxx) are to be used, the busy loop would have to
count the sectors and issue a Force Interrupt after the appropriate number, …
 
Nor is there time to test the FDC status register and determine whether this
is a data request (DRQ) or an interrupt request (INTRQ) signalling that the
command has terminated. In most WD1770-based boards these two lines are
both connected to the CPU's NMI input, but in combination with high speed
ISRs this will cause dropped bytes when writing to disc and extra bytes when
reading. The INTRQ line will need to be disconnected if using these ISRs, which
should not, but may, affect the hardware's compatibility with standard filing
system ROMs.
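
The whole-sector rounding described at the start of this section determines how much memory past the requested length gets overwritten on a load. A sketch, assuming 256-byte sectors (the usual size on BBC Micro DFS formats; a high density format may use another size, which the sector_size parameter allows for). The helper names are hypothetical.

```python
"""Bytes overwritten past the end of a transfer rounded up to whole sectors.

Sketch assuming 256-byte sectors by default.
"""


def sectors_needed(length: int, sector_size: int = 256) -> int:
    """Whole sectors needed to cover length bytes (ceiling division)."""
    return -(-length // sector_size)


def overrun_bytes(length: int, sector_size: int = 256) -> int:
    """Extra bytes written beyond length when whole sectors are transferred."""
    return sectors_needed(length, sector_size) * sector_size - length


if __name__ == "__main__":
    for n in (256, 257, 1000):
        print(n, "bytes:", sectors_needed(n), "sectors,",
              overrun_bytes(n), "bytes overwritten past the end")
```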
  
The ISR cannot afford to save the registers it uses. These then become
''volatile'' in the main thread, and the ''Advanced User Guide'' (AUG) warns on p.296: "If they are
modified, the main program will suddenly find garbage in its registers in
the middle of some important processing. It is probable that a total system
‘crash’ would result from this." Therefore all disc operations must be
confined within the busy loop with maskable interrupts disabled. As the
keyboard will not be read, commands cannot be 'typed ahead' while a file is
loading; and the busy loop cannot use the volatile registers except for …
  
 
On the other hand, the ISR can save state in the volatile registers; they
then become out of bounds to the busy loop altogether. For instance, X can
become a running index register, so that the ISR only ever needs to increment
the high address byte of a fetch or store instruction, and then only when X
wraps to zero. The low byte of the address should be kept at &00, so that fetches
do not take an extra cycle to cross a page boundary.
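
The reason for keeping the low byte at &00 shows up in the standard NMOS 6502 cycle counts: an indexed load such as LDA abs,X takes 4 cycles, plus 1 if adding X to the low byte carries into the next page, while an indexed store such as STA abs,X always takes 5. A small illustration, with hypothetical helper names:

```python
"""Cycle cost of 6502 absolute,X accesses (standard NMOS timings).

LDA abs,X: 4 cycles, plus 1 if base + X crosses into the next page.
STA abs,X: always 5 cycles, whatever the index.
"""

STA_ABS_X_CYCLES = 5


def lda_abs_x_cycles(base: int, x: int) -> int:
    """Cycles for LDA base,X with an 8-bit index x."""
    crosses_page = (base & 0xFF) + x > 0xFF
    return 5 if crosses_page else 4


if __name__ == "__main__":
    # Low byte &00: no value of X can cross a page, so never 5 cycles.
    print(max(lda_abs_x_cycles(0x3000, x) for x in range(256)))       # 4
    # Low byte &80: the upper half of X values pay the extra cycle.
    print(sum(lda_abs_x_cycles(0x3080, x) == 5 for x in range(256)))  # 128
```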
  
 
===Byte-in-hand===
With respect to write operations, to help meet the FDC's service time (11.5 µs),
each byte can be prefetched so that the ISR can send it to the
controller first, before fetching the next byte. The byte rests 'in
hand' in one of the volatile registers between interrupts; the main routine
or busy loop prefetches the first byte of each request before sending the
first command.
  
This means that one extra byte is fetched and discarded per request. When
saving I/O memory there is a mild risk of a side effect when the area to be
saved ends near a memory-mapped register (though consider also the issue of
whole-sector transfers above). The same applies to the Tube, where this is the
only issue when OSWORD &7F is the exclusive user of the data channel.

However, if the channel owner (e.g. OSGBPB) calls OSWORD &7F to fetch bytes
from the Tube on its behalf, then the owner will find that some data has been
dropped and subsequent bytes are out of sequence. In such a case OSGBPB
should ideally be structured so that the channel is reopened after OSWORD
&7F is called, as EDOSpat now does; however, it may initially be easier to
revert the ISR to the conventional, fetch-before-store form in the singular
case of saving from the Tube. (The above does not apply to OSGBPB saving
the file buffers in I/O memory.)
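
The byte-in-hand scheme, and the one extra fetch per request it causes, can be modelled in a few lines. This is a sketch of the idea behind the nmi_wrio routine listed on this page, not the real ISR: memory is a Python bytes object, each nmi() call stands for one data request, and the attribute a plays the part of the accumulator holding the byte in hand. The class name is hypothetical.

```python
"""Byte-in-hand write: a model of the prefetching scheme.

Sketch only: memory is a bytes object, each nmi() call stands for one
DRQ, and self.a stands for the register holding the byte in hand.
"""


class ByteInHandWriter:
    def __init__(self, memory: bytes, start: int):
        self.memory = memory
        self.index = start
        self.fetches = 0
        self.sent = bytearray()
        self.a = self._fetch()          # main routine prefetches first byte

    def _fetch(self) -> int:
        self.fetches += 1
        return self.memory[self.index]

    def nmi(self) -> None:
        self.sent.append(self.a)        # send the byte in hand first...
        self.index += 1
        self.a = self._fetch()          # ...then fetch the next one


if __name__ == "__main__":
    mem = bytes(range(8)) + b"\xEE"     # &EE lies just past the request
    w = ByteInHandWriter(mem, 0)
    for _ in range(8):                  # one request of 8 bytes
        w.nmi()
    print(list(w.sent), w.fetches)      # [0, 1, 2, 3, 4, 5, 6, 7] 9
```

Eight bytes reach the controller but nine are fetched; the last fetch reads the byte just beyond the area being saved, which is where the side effect on a memory-mapped register or the Tube data channel would come from.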
  
====Busy loop====
The main thread is confined to this loop while a disc operation is in progress.
               \ Based on NMI disc op routine from EDOS 0.4 by Alan Williams
               \ Target is the Opus WD2791 interface.
   
               \ On entry A=ROM slot to access
               \         X=Value as required by ISR
               \         Y=FDC command
  0D10          .edospat_disc_op
  0D10 85 F4    STA mos_romsel_copy
  0D12 8D 30 FE STA bbc_romsel
   
               \ Loop while b7 and b5 both clear.
               \ WD2791 drops some commands otherwise.
  0D15          .edospat_disc_op_wait
  0D15 AD 80 FE LDA fdc_base+0
  0D18 49 5F    EOR # st_sense%EOR disc_op_eor%
  0D1A 29 A0    AND # disc_op_and%
  0D1C          OPT ps%
  0D1C F0 F7    BEQ dest%
  0D1E          OPT FNbndrq( ps%, disc_op_bne%, edospat_disc_op_wait)
   
               \ Disable interrupts
  0D1E 08      PHP
  0D1F 78      SEI
   
               \ First address, or load-immediate instructions pasted here.
               \ The address may be in another ROM slot, hence loaded here.
  0D20          .edospat_disc_op_addr
  0D20 AD 00 0E LDA nmi_form_bytes+0
  0D23 EA      NOP
   
               \ Send command. A and X are now out of bounds
               \ Then wait 50 us for status register to settle
  0D24 8C 80 FE STY fdc_base+0
  0D27 A0 14    LDY #20
  0D29          .edospat_disc_op_settle
  0D29 88      DEY
  0D2A D0 FD    BNE edospat_disc_op_settle
   
               \ Loop until controller indicates ready.
  0D2C          .edospat_disc_op_loop
  0D2C AC 80 FE LDY fdc_base+0
  0D2F          OPT ps%
  …
  0D31          OPT FNbnrdy( ps%, st_sense%, edospat_disc_op_loop)
   
               \ Loop until controller indicates not busy also.
  0D31          .edospat_disc_op_test_busy
  0D31 84 A0    STY edos_disc_op_temp
  0D33 46 A0    LSR edos_disc_op_temp
  …
  0D37          OPT FNbbusy( ps%, st_sense%, edospat_disc_op_loop)
   
               \ Save ISR's A and X for the next call.
  0D37          .edospat_disc_op_exit
  0D37 8E 21 0D STX edospat_disc_op_addr+1
  0D3A 8D 23 0D STA edospat_disc_op_addr+3
  0D3D A9 A2    LDA #&A2  \ =LDX immediate
  0D3F 8D 20 0D STA edospat_disc_op_addr
  0D42 A9 A9    LDA #&A9  \ =LDA immediate
  0D44 8D 22 0D STA edospat_disc_op_addr+2
   
               \ Our state is saved, restore interrupt state
  0D47 28      PLP
   
               \ Page the EDOS ROM back in
  0D48 AD 60 0D LDA edos_disc_op_rom
  0D4B          .edospat_disc_op_switch_rom
  0D4B 85 F4    STA mos_romsel_copy
  0D4D 8D 30 FE STA bbc_romsel
   
               \ Restore X and Y to values on entry
  0D50 98      TYA
  0D51 A6 A1    LDX edos_idcount
  0D53 AC 63 0D LDY edos_disc_op_cmd
   
               \ Present controller status in A and set flags
  0D56 49 FF    EOR # st_sense%
   
               \ Return to OSWORD &7F routine
  …
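
The exit path of the busy loop listing patches its own code: the four bytes at edospat_disc_op_addr (&0D20), assembled as LDA nmi_form_bytes+0 and NOP, are overwritten with LDX #x and LDA #a, so that the next call reloads the ISR's X and A. A sketch with a tiny disassembler that knows only the four opcodes involved (&AD, &EA, &A2, &A9); the function names and register values are hypothetical.

```python
"""Effect of the busy loop's exit code on the bytes at &0D20.

Sketch: a disassembler for just the four opcodes involved, plus a
function mirroring STX addr+1 / STA addr+3 / LDA #&A2 : STA addr /
LDA #&A9 : STA addr+2.
"""

OPCODES = {0xAD: ("LDA", 3), 0xEA: ("NOP", 1), 0xA2: ("LDX", 2), 0xA9: ("LDA", 2)}


def disassemble(code: bytes) -> list:
    out, i = [], 0
    while i < len(code):
        name, length = OPCODES[code[i]]
        if length == 1:
            out.append(name)
        elif length == 2:
            out.append(f"{name} #&{code[i + 1]:02X}")
        else:                                   # absolute: little-endian
            out.append(f"{name} &{code[i + 2]:02X}{code[i + 1]:02X}")
        i += length
    return out


def save_registers(code: bytearray, a: int, x: int) -> None:
    code[1] = x       # STX edospat_disc_op_addr+1
    code[3] = a       # STA edospat_disc_op_addr+3
    code[0] = 0xA2    # LDX immediate
    code[2] = 0xA9    # LDA immediate


if __name__ == "__main__":
    code = bytearray([0xAD, 0x00, 0x0E, 0xEA])  # as assembled at &0D20
    print(disassemble(code))                    # ['LDA &0E00', 'NOP']
    save_registers(code, a=0x5A, x=0x34)
    print(disassemble(code))                    # ['LDX #&34', 'LDA #&5A']
```

Because the replacement is also four bytes long, the following STY fdc_base+0 at &0D24 is not disturbed.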
  
 
====Interrupt service routines====
               \ Read from disc to I/O memory
   
               \ On entry X=low byte of destination address
               \         ?&0D07=high byte
  0D00          .nmi_rdio
  0D00 AD 83 FE LDA fdc_base+3
  0D03 49 FF    EOR # sense%
  0D05          .nmi_rdio_addr
  0D05 9D 00 0D STA mos_nmi AND &FF00,X
  0D08 E8      INX
  0D09 D0 03    BNE nmi_rdio_exit
  0D0B EE 07 0D INC nmi_rdio_addr+2
  0D0E          .nmi_rdio_exit
  0D0E 40      RTI
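
For comparison with the budgets in the Timing section, the two paths through nmi_rdio can be tallied with standard NMOS 6502 cycle counts. This sketch ignores cycle stretching for 1 MHz destinations and any extra time the FDC access itself may need, so the figures are nominal lower bounds rather than measurements.

```python
"""Nominal cycle counts for the two paths through nmi_rdio.

Sketch using standard NMOS 6502 timings at 2 MHz; ignores cycle
stretching and any extra FDC access time.
"""

NMI_SEQUENCE = 7
COMMON = [("LDA abs", 4), ("EOR #", 2), ("STA abs,X", 5), ("INX", 2)]
SAME_PAGE = [("BNE taken", 3), ("RTI", 6)]
PAGE_END = [("BNE not taken", 2), ("INC abs", 6), ("RTI", 6)]


def path_cycles(tail) -> int:
    """NMI sequence + shared instructions + the path-specific tail."""
    return NMI_SEQUENCE + sum(c for _, c in COMMON) + sum(c for _, c in tail)


if __name__ == "__main__":
    for name, tail in (("within page", SAME_PAGE), ("end of page", PAGE_END)):
        cycles = path_cycles(tail)
        print(f"{name}: {cycles} cycles = {cycles / 2} us")
```

The common path (29 cycles, 14.5 µs) fits inside the nominal 32-cycle interval, while the end-of-page path (34 cycles, 17 µs) does not, which is consistent with the Timing section's point that the ISR need only become re-entrant, not complete, before the next NMI.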
  
               \ Write from I/O memory to disc
   
               \ On entry A=contents of source address
               \         X=low byte of source address
               \         ?&0D0D=high byte
  0D00          .nmi_wrio
  0D00 49 FF    EOR # sense%
  0D02 8D 83 FE STA fdc_base+3
  0D05 E8      INX
  0D06 D0 03    BNE nmi_wrio_addr
  0D08 EE 0D 0D INC nmi_wrio_addr+2
  0D0B          .nmi_wrio_addr
  0D0B BD 00 0D LDA mos_nmi AND &FF00,X
  0D0E          .nmi_wrio_exit
  0D0E 40      RTI

               \ Read from disc to coprocessor
   
               \ No entry conditions
  0D00          .nmi_rdtu
  0D00 AD 83 FE LDA fdc_base+3
  0D03 49 FF    EOR # sense%
  0D05 8D E5 FE STA tube_host_fifo_3
  0D08 40      RTI

               \ Write from coprocessor to disc
   
               \ On entry A=byte to be written
  0D00          .nmi_wrtu
  0D00 49 FF    EOR # sense%
  0D02 8D 83 FE STA fdc_base+3
  0D05 AD E5 FE LDA tube_host_fifo_3
  0D08 40      RTI

               \ Read sector IDs to coprocessor
   
               \ On entry X=bytes remaining (initially X=4)
  0D00          .nmi_idtu
  0D00 AD 83 FE LDA fdc_base+3
  0D03 49 FF    EOR # sense%
  0D05 CA      DEX
  0D06 30 03    BMI nmi_idtu_exit
  0D08 8D E5 FE STA tube_host_fifo_3
  0D0B          .nmi_idtu_exit
  0D0B 40      RTI

               \ Verify disc
   
               \ No entry conditions
  0D00          . nmi_veri
+
  0D00          .nmi_veri
 
  0D00 2C 83 FE BIT  fdc_base+3
 
  0D00 2C 83 FE BIT  fdc_base+3
 
  0D03 40      RTI
 
  0D03 40      RTI
  
               \Format disc from RLE table in pages &0E..&0F
+
               \ Format disc from RLE table in pages &0E..&0F
 
   
 
   
               \On entry A=current byte from table (pre-inverted)
+
               \ On entry A=current byte from table (pre-inverted)
               \         X=current table index (initially 0)
+
               \         X=current table index (initially 0)
  0D00          . nmi_form
+
  0D00          .nmi_form
 
  0D00 8D 83 FE STA fdc_base+3
 
  0D00 8D 83 FE STA fdc_base+3
 
  0D03 DE 00 0F DEC nmi_form_counts,X
 
  0D03 DE 00 0F DEC nmi_form_counts,X
 
  0D06 D0 04    BNE nmi_form_exit
 
  0D06 D0 04    BNE nmi_form_exit
 
  0D08 E8      INX
 
  0D08 E8      INX
  0D09          . nmi_form_addr
+
  0D09          .nmi_form_addr
 
  0D09 BD 00 0E LDA nmi_form_bytes,X
 
  0D09 BD 00 0E LDA nmi_form_bytes,X
  0D0C          . nmi_form_exit
+
  0D0C          .nmi_form_exit
 
  0D0C 40      RTI
 
  0D0C 40      RTI
  
Line 371: Line 369:
 
The absence of NMI sequences and RTI instructions frees a small amount of
 
The absence of NMI sequences and RTI instructions frees a small amount of
 
time to implement more features, such as measuring and cutting off the
 
time to implement more features, such as measuring and cutting off the
transfer. With this approach the DRQ pin would have to be disconnected but
+
transfer. With this approach the DRQ pin would have to be disconnected but
 
INTRQ could remain attached (and a suitable ISR supplied) if desired.
 
INTRQ could remain attached (and a suitable ISR supplied) if desired.
  
Line 377: Line 375:
 
A negative-going edge on pin 38 of the 6502 CPU sets the overflow flag in
 
A negative-going edge on pin 38 of the 6502 CPU sets the overflow flag in
 
the status register.  A BVC instruction branching to itself produces a loop
 
the status register.  A BVC instruction branching to itself produces a loop
that exits within 3 clock cycles of lowering this pin. By connecting DRQ to
+
that exits within 3 clock cycles of lowering this pin. By connecting DRQ to
 
this pin through an inverter the CPU can execute a loop transferring a byte
 
this pin through an inverter the CPU can execute a loop transferring a byte
to the FDC every 8 µs or slightly less. This is fast enough to consider
+
to the FDC every 8 µs or slightly less. This is fast enough to consider
octal density, also known as extended density (ED) or the 2.8 MB format. In
+
octal density, also known as extended density (ED) or the 2.8 MB format. In
 
that case all addresses touched must be at 2 MHz and no page boundaries can
 
that case all addresses touched must be at 2 MHz and no page boundaries can
be crossed, limiting sectors to 128 or 256 bytes. This time the INTRQ
+
be crossed, limiting sectors to 128 or 256 bytes. This time the INTRQ
 
signal, similarly inverted, will be needed to raise an NMI to break out of
 
signal, similarly inverted, will be needed to raise an NMI to break out of
 
the loop.
 
the loop.
  
         \On entry X=low byte of address - 1
+
         \ On entry X=low byte of address - 1
         \         (or &FF if sector size = 256 bytes)
+
         \         (or &FF if sector size = 256 bytes)
         \         user=(address - 0) AND &FF00
+
         \         user=(address - 0) AND &FF00
 
   
 
   
 
  .disc_op
 
  .disc_op
Line 414: Line 412:
  
 
This is the fastest and most resilient polling loop possible, limited by the
 
This is the fastest and most resilient polling loop possible, limited by the
range of relative branching. It supports any mean request interval longer
+
range of relative branching. It supports any mean request interval longer
than 7.88 µs: in the worst case twelve requests 6½ to 8½ µs apart can
+
than 7.88 µs: in the worst case twelve requests to 8½ µs apart can
 
cause the code to reach the penultimate or last instruction and loop back in
 
cause the code to reach the penultimate or last instruction and loop back in
189 cycles. Typical request sequences with less jitter will be able to have
+
189 cycles. Typical request sequences with less jitter will be able to have
a mean interval closer to the absolute limit of 7.54 µs.
+
a mean interval closer to the absolute limit of 7.54 µs.
  
 
==Self-interrupting ISR==
 
==Self-interrupting ISR==
A method with potential when DRQ cannot be disconnected. The concept, due
+
A method with potential when DRQ cannot be disconnected. The concept, due
 
to Tom Seddon, is that NMIs occur in bursts and the ISR is repeated almost
 
to Tom Seddon, is that NMIs occur in bursts and the ISR is repeated almost
 
back-to-back; so it may as well be made long enough to interrupt itself,
 
back-to-back; so it may as well be made long enough to interrupt itself,
freeing the time taken by RTI to implement more features. As the busy loop
+
freeing the time taken by RTI to implement more features. As the busy loop
 
can be suspended entirely during a burst and the ISR itself is re-entrant
 
can be suspended entirely during a burst and the ISR itself is re-entrant
 
after a critical time, there is no need to stack and unstack a working set
 
after a critical time, there is no need to stack and unstack a working set
 
for every interrupt and we are pretty much back to polling, with the CPU
 
for every interrupt and we are pretty much back to polling, with the CPU
doing the branch for us. From Tom's
+
doing the branch for us. From Tom's
 
[http://mdfs.net/Archive/BBCMicro/2005/10/30/005559.htm Mailing List post]:
 
[http://mdfs.net/Archive/BBCMicro/2005/10/30/005559.htm Mailing List post]:
  
 
  The NMI routine would go a bit like this:
 
  The NMI routine would go a bit like this:
 
   
 
   
        \ +7 (7) -- NMI overhead
+
                            \ +7 (7) -- NMI overhead
  &D00    LDA FDC_DATA \ +6 (13)
+
  &D00    LDA FDC_DATA       \ +6 (13)
         STA (&C0),Y \ +6 (19)
+
         STA (&C0),Y         \ +6 (19)
         INY \ +2 (21)
+
         INY                 \ +2 (21)
 
         BNE NO_BUMP:INC &C1 \ +7 (worst case) 28 <-- point A
 
         BNE NO_BUMP:INC &C1 \ +7 (worst case) 28 <-- point A
         .NO_BUMP TXS \ +2 (30)
+
         .NO_BUMP TXS       \ +2 (30)
         NOP \ +2 (32)
+
         NOP                 \ +2 (32)
         NOP:NOP:NOP \ +6 (cater for best-case timings)
+
         NOP:NOP:NOP         \ +6 (cater for best-case timings)
         NOP \ the lucky NOP
+
         NOP                 \ the lucky NOP
 
         JMP READ_END
 
         JMP READ_END
  
 
However, pushing the PC and status register is an unwanted side effect of
 
However, pushing the PC and status register is an unwanted side effect of
 
receiving an NMI, and the ISR must reset the stack pointer with TXS, in
 
receiving an NMI, and the ISR must reset the stack pointer with TXS, in
place of the RTI. This saves four cycles at the tail end but gives us no
+
place of the RTI. This saves four cycles at the tail end but gives us no
more free registers in the ISR. The real advantage is that the ISR can wait
+
more free registers in the ISR. The real advantage is that the ISR can wait
 
for the next call with a string of NOPs, eliminating up to 5 cycles' latency
 
for the next call with a string of NOPs, eliminating up to 5 cycles' latency
at the head end. Care should be taken to assemble enough NOPs to cover the
+
at the head end. Care should be taken to assemble enough NOPs to cover the
single density data rate (one NMI every 64 µs). Also the registers remain
+
single density data rate (one NMI every 64 µs). Also the registers remain
 
stable in the busy loop, letting us reduce the maximum length of
 
stable in the busy loop, letting us reduce the maximum length of
instructions there. For instance bit 0 of the status register can be polled
+
instructions there. For instance bit 0 of the status register can be polled
 
with:
 
with:
  
Line 471: Line 469:
  
 
Bear in mind that in case of errors such as <tt>Sector not found</tt>, no
 
Bear in mind that in case of errors such as <tt>Sector not found</tt>, no
DRQs may be issued at all. It remains to be seen how the extra time and
+
DRQs may be issued at all. It remains to be seen how the extra time and
 
freedom can be applied.
 
freedom can be applied.
  
Line 480: Line 478:
  
 
==Other ways to use HD discs==
 
==Other ways to use HD discs==
Martin Barr reports that commodity 3½-inch high density discs work reliably
+
Martin Barr reports that commodity -inch high density discs work reliably
when formatted at 500 kHz in FM. Often, high density discs cannot hold
+
when formatted at 500 kHz in FM. Often, high density discs cannot hold
 
single- or double-density data for any length of time, even if the high
 
single- or double-density data for any length of time, even if the high
density hole in the jacket is covered. With true high density being a third
+
density hole in the jacket is covered. With true high density being a third
 
format this fourth, non-typical signal type was originally used on 8-inch
 
format this fourth, non-typical signal type was originally used on 8-inch
 
floppies by the IBM 3740.
 
floppies by the IBM 3740.

Latest revision as of 19:37, 6 October 2019

Screenshot of a 510 KB catalogue

It is possible to read, write and format floppy discs on the BBC Micro in high density (that is, at 500 kHz bandwidth in MFM) with suitable hardware and a specially coded floppy disc controller driver. Besides DMA, which would require extensive modifications to the machine, there are many possible approaches to achieving the higher throughput, including polling, conventional ISRs and a self-interrupting ISR proposed by Tom Seddon. The concept has been developed and proved with conventional ISRs and a supporting busy loop, driving a slightly modified disc interface from 1984. Naturally, all of the techniques mentioned here require a high density disc and disc drive.

Hardware

An interface modified for high density

Prior to the 1983 introduction of the 'high density' 5¼-inch floppy drive in the IBM PC-AT, the same signal format was in use on 8-inch floppies in the IBM System/34 from 1978. Some mid-range controllers such as the WD279x series supported this format and were also adopted in early third-party disc interface boards for the BBC Micro. Supplying the clock frequency specified for 8-inch operation is the only modification needed for such boards; allowing frequency selection to permit continued single and double density operation is a useful extra.

Later controllers designed exclusively for double density have been found good for high density work when overclocked. The Ajax controller fitted to the Atari STE is an apparently unmodified WD1772 rated at 16 MHz. Experimenters have had promising though imperfect results doubling the clock rate of a standard WD1770[1]. There is no foreseeable reason why other controller families designed for high density should not be usable.

Balanced-stack ISRs

In this instance balanced-stack means that the ISR is expected to complete before the next interrupt, on average, and so does not manipulate the stack pointer to guard against overflow.

Timing

Diagram of worst case NMI timings. At a conservative rate of 15 µs per interrupt, loading to a 1 MHz address fails, shown by crosses.

The non-maskable interrupt (NMI) service routines have been carefully studied to ensure they meet the floppy drive controller's timing requirements at high speed, which they do in all but one occasional case. There are two constraints from the FDC:

  1. the ISR must become re-entrant within 15 − ε µs of the interrupt;
  2. the ISR must service the FDC within 11.5 µs of the interrupt.

The first time interval is nominally 16 µs but allowance is made for jitter and fast disc drives. The ε (epsilon) means that in the worst case the NMIs will just miss one decision point and just catch the next. There are three further constraints from the 6502 CPU and the BBC Micro's clock system:

  1. 6502 instructions are atomic, and instruction processing may continue for up to 4.5 µs from the onset of an NMI. (This is composed of two clock cycles and one 7-cycle instruction; see below.)
  2. After this instruction, the 6502 executes an NMI sequence for 3.5 µs.
  3. A clock cycle that accesses a 1 MHz memory address is extended to 1 µs, or 1.5 µs if out-of-sync with an underlying 1 MHz monotonic clock (1MHzE). Once the CPU is synchronised with this clock the length of later extended cycles can be predicted by machine code analysis.

It has been found impossible to meet all these constraints, unless we ensure that no instruction longer than 6 cycles is executing when the NMI occurs, which can be arranged by carefully coding a busy loop with interrupts disabled.

We can exploit the atomic nature of 6502 instructions to buy time. The ISR need only be re-entrant, not complete, by the next interrupt; so as long as the final state-changing instruction is in progress when an NMI arrives, the behaviour will be correct. The time this instruction takes to complete has already been budgeted for as the 'previous' instruction of the next interrupt, and the stack will not overflow as long as the CPU gets partway through RTI, on average.

According to some documentation, e.g. 64doc by Sonninen et al., to cause an NMI sequence in place of the next instruction the NMI must occur before the last cycle of the current instruction. This is confirmed by traces[1] of 6502 hardware, and is caused by the 6502 sampling the NMI input only on certain cycles (usually the last) of each instruction. As we just missed the decision point in the worst case, the FDC service deadline comes 11 µs after the start of that next instruction. Currently, every ISR meets this deadline. If 64doc were incorrect, the timings would still hold; all the instruction rectangles would just move 1 clock cycle to the left, giving each ISR 0.5 µs more time to service the floppy drive controller.
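The FDC service deadline can be checked per ISR with a little arithmetic. The Python sketch below is illustrative only and rests on stated assumptions: the FDC data register is accessed at 2 MHz (0.5 µs per cycle); the worst-case preamble is the 4.5 µs instruction overrun plus the 3.5 µs NMI sequence from the constraints above; and an absolute-mode LDA, STA or BIT touches its operand on its final cycle. The per-ISR cycle counts are read off the listings in the Code section, and the WD2791 and 1 MHz bus adjustments are ignored.

```python
CYCLE_US = 0.5             # one 2 MHz CPU cycle
PREAMBLE_US = 4.5 + 3.5    # worst-case instruction overrun + NMI sequence
DEADLINE_US = 11.5         # FDC must be serviced within this time

# Cycles from the first ISR instruction to the end of the instruction
# that touches fdc_base+3, taken from the ISR listings.
CYCLES_TO_FDC = {
    "nmi_rdio": 4,         # LDA abs
    "nmi_wrio": 2 + 4,     # EOR #, STA abs
    "nmi_rdtu": 4,         # LDA abs
    "nmi_wrtu": 2 + 4,     # EOR #, STA abs
    "nmi_idtu": 4,         # LDA abs
    "nmi_veri": 4,         # BIT abs
    "nmi_form": 4,         # STA abs (byte is pre-inverted)
}

def service_time_us(isr):
    """Worst-case time from the NMI to the FDC access, in microseconds."""
    return PREAMBLE_US + CYCLES_TO_FDC[isr] * CYCLE_US

for name in CYCLES_TO_FDC:
    t = service_time_us(name)
    print(f"{name}: {t:.1f} us {'ok' if t <= DEADLINE_US else 'LATE'}")
```

Under these assumptions the write routines are the tightest, reaching the FDC 11 µs after the interrupt; a fetch-before-store write (an extra 4-cycle load ahead of the EOR and STA) would by the same arithmetic arrive at 13 µs and miss the deadline.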

The minimum time between Tube data channel accesses is 11.5 µs, which can be increased with NOPs to 14.5 µs.

Reading from disc to I/O memory

Loading slow memory is stable at 16 µs per byte.

A failure mode has been discovered when reading to 1 MHz memory areas under the above timings. As STA absolute,X instructions always take 5 cycles, the ISR may overflow the stack when traversing 1 MHz pages such as FRED and JIM. Also if a 1 MHz address at the end of a page is met within a sector, the next few bytes may be written to the wrong page.

In such cases failure can be avoided as long as NMIs arrive at the nominal interval of 16 µs (see diagram); note that drives with crystal controlled synchronous motors can stay very close to this figure. 2 MHz memory can still be successfully loaded at 15 µs per byte.

6502 interrupt bug

The 6502 has recently been discovered[1][2] to defer its response to IRQs or NMIs occurring just before the last cycle of a taken branch to the same page. In that case the branch completes and the instruction at the destination is executed before the interrupt sequence is begun. The effect (as explained by Nesdev user blargg) is that the branch adds one cycle to the maximum interrupt latency of the destination instruction.

This is in addition to the usual one cycle of latency between sampling the NMI input and fetching the next opcode; here the sample is taken on the penultimate cycle instead of the last, making two cycles of latency in total.

As long as every instruction in the busy loop that is the target of a branch from the same page is shorter in duration than the longest instruction in the loop, there is no impact on the worst-case interrupt latency of the system.
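The rule above can be applied mechanically to a busy loop. This Python sketch treats each same-page branch target as one cycle longer and reports whether that raises the loop's worst case. The two example loops are hypothetical and are not taken from the EDOSPAT listing.

```python
def effective_longest(busy_loop):
    """busy_loop: list of (name, cycles, is_same_page_branch_target).

    A taken branch to the same page adds one cycle of interrupt latency
    to its destination, so such destinations count one cycle longer.
    """
    return max(cycles + (1 if target else 0) for _, cycles, target in busy_loop)

def bug_penalty(busy_loop):
    """Extra worst-case latency (0 or 1 cycle) caused by the branch quirk."""
    plain = max(cycles for _, cycles, _ in busy_loop)
    return effective_longest(busy_loop) - plain

# Hypothetical loops: in the first the branch target (4 cycles) is shorter
# than the longest instruction (5); in the second the target is the longest.
safe = [("LDY abs", 4, True), ("STY zp", 3, False), ("LSR zp", 5, False), ("BCC", 3, False)]
risky = [("LSR zp", 5, True), ("LDY abs", 4, False), ("BCC", 3, False)]

print(bug_penalty(safe), bug_penalty(risky))
```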

Calculating in the general case

Those who wish to develop their own high density system can check if the worst case timings can be met, as follows. A useful ISR is assumed to have at least these elements: a fetch, an indexed store, a register increment instruction and a conditional branch for incrementing a pointer.

  1. Determine the minimum expected interval between NMIs, in clock cycles. The datasheet will list a nominal value but allow some room for jitter and drive speed variations. 30 is a good number for high density, 60 for double, 120 for single.
  2. Calculate the maximum non-re-entrant service time. Start with 20 cycles: 7 for the NMI sequence, 4 for LDA, 5 for STA,X, 2 for INX and 2 for BNE.
  3. Add the number of cycles in the longest running instruction in your busy loop. If you have interrupts enabled, you should take this to be 7.
  4. If any of the longest running instructions are the destination of a branch from the same page, add 1 cycle.
  5. If your FDC is a WD2791 or WD2795, add 2 cycles.
  6. If your FDC is on the 1 MHz bus, add 2 cycles.
  7. If users will be saving or loading data into 1 MHz areas and expecting it to work, add 3 cycles.
  8. If the total is less than the NMI interval in cycles, the test passes. The ISR will make it to the INC instruction to cross a page boundary, and not overflow the stack the rest of the time.
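The checklist can be expressed as a small Python function. This is a sketch of the steps above and nothing more; the parameter names are made up for illustration, and the inputs (longest busy-loop instruction, controller type and so on) must come from your own design.

```python
def checklist_score(longest_busy_insn,
                    longest_is_same_page_target=False,
                    wd2791_or_wd2795=False,
                    fdc_on_1mhz_bus=False,
                    allow_1mhz_transfers=False):
    """Maximum non-re-entrant service time in cycles, per steps 2 to 7."""
    total = 7 + 4 + 5 + 2 + 2          # NMI sequence, LDA, STA abs,X, INX, BNE
    total += longest_busy_insn         # take this as 7 if interrupts are enabled
    if longest_is_same_page_target:
        total += 1
    if wd2791_or_wd2795:
        total += 2
    if fdc_on_1mhz_bus:
        total += 2
    if allow_1mhz_transfers:
        total += 3
    return total

def passes(total, min_nmi_interval):
    """Step 8: the total must be less than the NMI interval in cycles."""
    return total < min_nmi_interval

print(checklist_score(7), passes(checklist_score(7), 30))
```

For example, a score of 31 fails against the suggested 30-cycle interval but passes at the nominal 32 cycles.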

The experimental system has a score of 31 and so only guarantees the nominal 32 cycle (16 µs) rate; see above.

Restrictions

The small amount of time available means many of the usual features of a disc ISR must be left out. There is not enough time to count the bytes and discard them after a certain number have been transferred. Thus all transfers are rounded up to whole sectors; this is most critical in OSWORD &7F which can no longer emulate the 8271's Read ID command faithfully, and less so in OSFILE where some memory beyond end-of-file will be overwritten, though in most cases this is not a significant problem. If multiple sector transfer commands (%10x1xxxx) are to be used, the busy loop would have to count the sectors and issue a Force Interrupt after the appropriate number, but the main routine can issue a chain of single sector transfers instead.
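The cost of rounding up to whole sectors is easy to estimate. The Python sketch below assumes 256-byte sectors (as in Acorn DFS); `single_sector_chain` is a hypothetical helper that simply lists consecutive logical sector numbers, one per single-sector command, and leaves out the mapping to track and sector.

```python
def whole_sector_transfer(length, sector_size=256):
    """Sectors actually transferred, and bytes written beyond the request."""
    sectors = -(-length // sector_size)      # ceiling division
    overrun = sectors * sector_size - length
    return sectors, overrun

def single_sector_chain(first_sector, length, sector_size=256):
    """Hypothetical plan: one single-sector command per logical sector,
    instead of a multiple-sector command cut short by Force Interrupt."""
    sectors, _ = whole_sector_transfer(length, sector_size)
    return [first_sector + i for i in range(sectors)]

print(whole_sector_transfer(1000))   # a 1000-byte file
```

A 1000-byte file loads as four sectors, overwriting 24 bytes beyond its end.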

Nor is there time to test the FDC status register and determine whether this is a data request (DRQ) or an interrupt request (INTRQ) signalling that the command has terminated. In most WD1770-based boards these two lines are both connected to the CPU's NMI input, but in combination with high speed ISRs this will cause dropped bytes when writing to disc and extra bytes when reading. The INTRQ line will need to be disconnected if using ISRs, which should not, but may, affect the hardware's compatibility with standard filing system ROMs.

The ISR cannot afford to save the registers it uses. These then become volatile in the main thread and the AUG warns us on p.296, "If they are modified, the main program will suddenly find garbage in its registers in the middle of some important processing. It is probable that a total system 'crash' would result from this." Therefore all disc operations must be confined within the busy loop with maskable interrupts disabled. As the keyboard will not be read, commands cannot be 'typed ahead' while a file is loading; and the busy loop cannot use the volatile registers except for their side effects, such as setting flags.

On the other hand, the ISR can save state in the volatile registers; they now become out-of-bounds to the busy loop altogether. For instance, X can become a running index register so that only the high address byte of a fetch or store instruction would need to be incremented in any interrupt period. The low byte of the address should be kept at &00, so that fetches do not take an extra cycle to cross a page boundary.
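As an illustration of X as a running index, this Python sketch models the addressing of `nmi_rdio` in the Code section: X carries the low byte, the high byte lives in the operand of the ISR's STA (whose own low byte stays at &00), and the operand is incremented only when X wraps. It models addresses only, not timing.

```python
def rdio_addresses(start, count):
    """Destination addresses written by successive nmi_rdio calls."""
    x, high = start & 0xFF, (start >> 8) & 0xFF
    out = []
    for _ in range(count):
        out.append((high << 8) | x)       # STA (high*256),X; base low byte is &00
        x = (x + 1) & 0xFF                # INX
        if x == 0:                        # BNE not taken: page boundary
            high = (high + 1) & 0xFF      # INC nmi_rdio_addr+2
    return out

print([hex(a) for a in rdio_addresses(0x19FE, 4)])
```

Because the base low byte is &00, base + X never carries into the high byte, so indexed fetches such as `LDA abs,X` never pay the extra page-crossing cycle.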

Byte-in-hand

With respect to write operations, to help meet the FDC's service time (11.5 µs) each byte can be prefetched so that the ISR can send it to the controller first thing before fetching the next byte. The byte rests 'in hand' in one of the volatile registers between interrupts; the main routine or busy loop prefetches the first byte of each request before sending the first command.

This means that one extra byte is fetched and discarded per request. When saving I/O memory there is a mild risk of a side effect when the area to be saved ends near a memory-mapped register (though consider also the issue of whole-sector transfers above). Likewise with the Tube, and this is the only issue there when OSWORD &7F is the exclusive user of the data channel.

However if the channel owner (e.g. OSGBPB) calls OSWORD &7F to fetch bytes from the Tube on its behalf, then the owner will find some data has been dropped and subsequent bytes will be out of sequence. In such a case OSGBPB should ideally be structured so that the channel is reopened after OSWORD &7F is called, as EDOSpat now does; however it may initially be easier to revert the ISR to the conventional, fetch-before-store form in the singular case of saving from the Tube. (The above does not apply to OSGBPB saving the file buffers in I/O memory.)
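The extra fetch can be seen in a minimal Python model of the byte-in-hand write, loosely following `nmi_wrio`. The sense inversion is left out, and `source` stands in for memory or the Tube data channel; both simplifications are assumptions of the sketch.

```python
def byte_in_hand_write(source, start, nbytes):
    """Model nmi_wrio: returns (bytes sent to the FDC, addresses fetched).

    The busy loop prefetches source[start] before issuing the command;
    each NMI then sends the byte in hand first and fetches the next one.
    """
    fetched = [start]
    in_hand = source[start]          # prefetch by the busy loop
    sent = []
    for i in range(nbytes):          # one NMI per byte the FDC requests
        sent.append(in_hand)         # STA fdc_base+3 comes first
        nxt = start + i + 1
        fetched.append(nxt)          # LDA ...,X at the end of the ISR
        in_hand = source[nxt]
    return sent, fetched

sent, fetched = byte_in_hand_write(list(range(16)), 2, 4)
print(sent, fetched)
```

Four bytes are sent but five are fetched: the fifth fetch is the discarded byte described above, which on the Tube means one byte consumed from the channel that never reaches the disc.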

Code

Tested code fragments adapted from the assembly output of EDOSPAT 5.10.

Busy loop

The main thread is confined to this loop while a disc operation is in progress.

              \ Based on NMI disc op routine from EDOS 0.4 by Alan Williams
              \ Target is the Opus WD2791 interface.

              \ On entry A=ROM slot to access
              \          X=Value as required by ISR
              \          Y=FDC command
0D10          .edospat_disc_op
0D10 85 F4    STA mos_romsel_copy
0D12 8D 30 FE STA bbc_romsel

              \ Loop while b7 and b5 both clear.
              \ WD2791 drops some commands otherwise.
0D15          .edospat_disc_op_wait
0D15 AD 80 FE LDA fdc_base+0
0D18 49 5F    EOR # st_sense%EOR disc_op_eor%
0D1A 29 A0    AND # disc_op_and%
0D1C          OPT ps%
0D1C F0 F7    BEQ dest%
0D1E          OPT FNbndrq( ps%, disc_op_bne%, edospat_disc_op_wait)

              \ Disable interrupts
0D1E 08       PHP
0D1F 78       SEI

              \ First address, or load-immediate instructions pasted here.
              \ The address may be in another ROM slot, hence loaded here.
0D20          .edospat_disc_op_addr
0D20 AD 00 0E LDA nmi_form_bytes+0
0D23 EA       NOP

              \ Send command. A and X are now out of bounds
              \ Then wait 50 us for status register to settle
0D24 8C 80 FE STY fdc_base+0
0D27 A0 14    LDY #20
0D29          .edospat_disc_op_settle
0D29 88       DEY
0D2A D0 FD    BNE edospat_disc_op_settle

              \ Loop until controller indicates ready.
0D2C          .edospat_disc_op_loop
0D2C AC 80 FE LDY fdc_base+0
0D2F          OPT ps%
0D2F 10 FB    BPL dest%
0D31          OPT FNbnrdy( ps%, st_sense%, edospat_disc_op_loop)

              \ Loop until controller indicates not busy also.
0D31          .edospat_disc_op_test_busy
0D31 84 A0    STY edos_disc_op_temp
0D33 46 A0    LSR edos_disc_op_temp
0D35          OPT ps%
0D35 90 F5    BCC dest%
0D37          OPT FNbbusy( ps%, st_sense%, edospat_disc_op_loop)

              \ Save ISR's A and X for the next call.
0D37          .edospat_disc_op_exit
0D37 8E 21 0D STX edospat_disc_op_addr+1
0D3A 8D 23 0D STA edospat_disc_op_addr+3
0D3D A9 A2    LDA #&A2   \ =LDX immediate
0D3F 8D 20 0D STA edospat_disc_op_addr
0D42 A9 A9    LDA #&A9   \ =LDA immediate
0D44 8D 22 0D STA edospat_disc_op_addr+2

              \ Our state is saved, restore interrupt state
0D47 28       PLP

              \ Page the EDOS ROM back in
0D48 AD 60 0D LDA edos_disc_op_rom
0D4B          .edospat_disc_op_switch_rom
0D4B 85 F4    STA mos_romsel_copy
0D4D 8D 30 FE STA bbc_romsel

              \ Restore X and Y to values on entry
0D50 98       TYA
0D51 A6 A1    LDX edos_idcount
0D53 AC 63 0D LDY edos_disc_op_cmd

              \ Present controller status in A and set flags
0D56 49 FF    EOR # st_sense%

              \ Return to OSWORD &7F routine
0D58 60       RTS
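Two details of the busy loop are modelled in this Python sketch. The first is the exit path that saves the ISR's A and X by overwriting the four bytes at `edospat_disc_op_addr` (`LDA abs : NOP`) with `LDX #x : LDA #a`, keeping the same length so the following `STY fdc_base+0` does not move. The second is the cycle count of the `LDY #20 : DEY : BNE` settling delay, assuming the branch stays within one page, as it does at &0D29..&0D2A. The byte values in the usage example are arbitrary.

```python
LDX_IMM = 0xA2   # opcode for LDX #n
LDA_IMM = 0xA9   # opcode for LDA #n

def patch_disc_op_addr(code, offset, a, x):
    """Overwrite the 4 bytes at edospat_disc_op_addr with LDX #x : LDA #a."""
    code[offset:offset + 4] = bytes([LDX_IMM, x & 0xFF, LDA_IMM, a & 0xFF])
    return code

def settle_cycles(n=20):
    """Cycles for LDY #n plus the DEY : BNE loop (same-page branch)."""
    assert 1 <= n <= 255
    # LDY #n, n DEYs, (n-1) taken BNEs, one untaken BNE
    return 2 + n * 2 + (n - 1) * 3 + 2

code = bytearray([0xAD, 0x00, 0x0E, 0xEA])   # LDA nmi_form_bytes+0 : NOP
print(patch_disc_op_addr(code, 0, a=0x4E, x=0x05).hex(), settle_cycles() * 0.5)
```

At 2 MHz the delay comes to 101 cycles, about 50.5 µs, consistent with the 50 µs wait in the listing's comment.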

Interrupt service routines

              \ Read from disc to I/O memory

              \ On entry X=low byte of destination address
              \          ?&0D07=high byte
0D00          .nmi_rdio
0D00 AD 83 FE LDA fdc_base+3
0D03 49 FF    EOR # sense%
0D05          .nmi_rdio_addr
0D05 9D 00 0D STA mos_nmi AND &FF00,X
0D08 E8       INX
0D09 D0 03    BNE nmi_rdio_exit
0D0B EE 07 0D INC nmi_rdio_addr+2
0D0E          .nmi_rdio_exit
0D0E 40       RTI

              \ Write from I/O memory to disc

              \ On entry A=contents of source address
              \          X=low byte of source address
              \          ?&0D0D=high byte
0D00          .nmi_wrio
0D00 49 FF    EOR # sense%
0D02 8D 83 FE STA fdc_base+3
0D05 E8       INX
0D06 D0 03    BNE nmi_wrio_addr
0D08 EE 0D 0D INC nmi_wrio_addr+2
0D0B          .nmi_wrio_addr
0D0B BD 00 0D LDA mos_nmi AND &FF00,X
0D0E          .nmi_wrio_exit
0D0E 40       RTI

              \ Read from disc to coprocessor

              \ No entry conditions
0D00          .nmi_rdtu
0D00 AD 83 FE LDA fdc_base+3
0D03 49 FF    EOR # sense%
0D05 8D E5 FE STA tube_host_fifo_3
0D08 40       RTI

              \ Write from coprocessor to disc

              \ On entry A=byte to be written
0D00          .nmi_wrtu
0D00 49 FF    EOR # sense%
0D02 8D 83 FE STA fdc_base+3
0D05 AD E5 FE LDA tube_host_fifo_3
0D08 40       RTI

              \ Read sector IDs to coprocessor

              \ On entry X=bytes remaining (initially X=4)
0D00          .nmi_idtu
0D00 AD 83 FE LDA fdc_base+3
0D03 49 FF    EOR # sense%
0D05 CA       DEX
0D06 30 03    BMI nmi_idtu_exit
0D08 8D E5 FE STA tube_host_fifo_3
0D0B          .nmi_idtu_exit
0D0B 40       RTI

              \ Verify disc

              \ No entry conditions
0D00          .nmi_veri
0D00 2C 83 FE BIT  fdc_base+3
0D03 40       RTI

              \ Format disc from RLE table in pages &0E..&0F

              \ On entry A=current byte from table (pre-inverted)
              \          X=current table index (initially 0)
0D00          .nmi_form
0D00 8D 83 FE STA fdc_base+3
0D03 DE 00 0F DEC nmi_form_counts,X
0D06 D0 04    BNE nmi_form_exit
0D08 E8       INX
0D09          .nmi_form_addr
0D09 BD 00 0E LDA nmi_form_bytes,X
0D0C          .nmi_form_exit
0D0C 40       RTI
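The format ISR reads a run-length table: byte values in page &0E and run counts in page &0F, indexed by X. This Python sketch builds such tables and replays `nmi_form` over them. Because `DEC` only moves on when the count reaches zero, a stored 0 stands for a run of 256, and longer runs are split. The `invert` flag is an assumption standing in for the interface's `sense%` inversion, since the ISR writes table bytes without an EOR. The run values in the usage example are illustrative only, not a real track layout.

```python
def rle_tables(runs, invert=False):
    """Build (byte table, count table) from (value, length) runs."""
    byte_tab, count_tab = [], []
    for value, length in runs:
        while length > 0:
            chunk = min(length, 256)
            byte_tab.append((value ^ 0xFF) if invert else value)
            count_tab.append(chunk & 0xFF)       # a run of 256 is stored as 0
            length -= chunk
    if len(byte_tab) > 256:
        raise ValueError("RLE table does not fit in one page")
    return byte_tab, count_tab

def simulate_nmi_form(byte_tab, count_tab, nmis):
    """Bytes that nmi_form would write to the FDC over `nmis` NMIs."""
    counts = list(count_tab)
    x = 0
    a = byte_tab[0]                      # preloaded by the busy loop
    written = []
    for _ in range(nmis):
        written.append(a)                        # STA fdc_base+3
        counts[x] = (counts[x] - 1) & 0xFF       # DEC nmi_form_counts,X
        if counts[x] == 0:                       # BNE falls through
            x = (x + 1) & 0xFF                   # INX
            # LDA nmi_form_bytes,X (None models running off the table)
            a = byte_tab[x] if x < len(byte_tab) else None
    return written

bt, ct = rle_tables([(0x4E, 300), (0x00, 12)])
print(ct, len(simulate_nmi_form(bt, ct, 312)))
```

Running off the end of the table is harmless in practice, as the controller stops requesting bytes at the end of the track.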

Polling

Actively testing the FDC's status register to determine when a byte is ready is an equally valid method to sustain high speed data throughput. Polling-based transfers would require a tight loop with interrupts disabled, which we have shown is the minimum requirement with stack-balanced ISRs. The absence of NMI sequences and RTI instructions frees a small amount of time to implement more features, such as measuring and cutting off the transfer. With this approach the DRQ pin would have to be disconnected but INTRQ could remain attached (and a suitable ISR supplied) if desired.

Using SO

A negative-going edge on pin 38 of the 6502 CPU sets the overflow flag in the status register. A BVC instruction branching to itself produces a loop that exits within 3 clock cycles of lowering this pin. By connecting DRQ to this pin through an inverter the CPU can execute a loop transferring a byte to the FDC every 8 µs or slightly less. This is fast enough to consider octal density, also known as extended density (ED) or the 2.8 MB format. In that case all addresses touched must be at 2 MHz and no page boundaries can be crossed, limiting sectors to 128 or 256 bytes. This time the INTRQ signal, similarly inverted, will be needed to raise an NMI to break out of the loop.

        \ On entry X=low byte of address - 1
        \          (or &FF if sector size = 256 bytes)
        \          user=(address - 0) AND &FF00

.disc_op
        SEI
        CLV
.disc_op_loop
        BVC disc_op_loop
.disc_op_service
repeat(11) {
        LDA fdc_data
        CLV
        INX
        STA user,X
        BVC disc_op_loop
}
        LDA fdc_data
        CLV
        INX
        STA user,X
        BVS disc_op_service
        BVS disc_op_service
        BVS disc_op_service
        BVC disc_op_loop
        BVS disc_op_service

This is the fastest and most resilient polling loop possible, limited by the range of relative branching. It supports any mean request interval longer than 7.88 µs: in the worst case twelve requests 6½ to 8½ µs apart can cause the code to reach the penultimate or last instruction and loop back in 189 cycles. Typical request sequences with less jitter will be able to have a mean interval closer to the absolute limit of 7.54 µs.
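The 7.54 µs and 7.88 µs figures can be reproduced by counting cycles over the twelve unrolled copies. This Python sketch gives one reading of the two extremes. It assumes every address is at 2 MHz and that no branch or `STA user,X` crosses a page, and the worst-case path in the comments is an interpretation rather than a traced result.

```python
CYCLE_US = 0.5                            # 2 MHz
LDA_ABS, CLV, INX, STA_ABS_X = 4, 2, 2, 5
NOT_TAKEN, TAKEN = 2, 3                   # branch cycles, same page
CHUNK = LDA_ABS + CLV + INX + STA_ABS_X   # 13 cycles per byte
REQUESTS = 12                             # 11 repeated copies plus the last

def best_case_cycles():
    # V is already set at every test: 11 untaken BVCs, then the first BVS
    # after the 12th copy is taken straight back to disc_op_service.
    return 11 * (CHUNK + NOT_TAKEN) + CHUNK + TAKEN

def worst_case_cycles():
    # V is set only after the three BVS tests: BVC falls through and the
    # final BVS is taken. (BVC taken, then falling through the BVC at
    # disc_op_loop, also totals 3 + 2 = 5 cycles.)
    return 11 * (CHUNK + NOT_TAKEN) + CHUNK + 4 * NOT_TAKEN + TAKEN

def mean_interval_us(cycles, requests=REQUESTS):
    return cycles * CYCLE_US / requests

print(best_case_cycles(), mean_interval_us(best_case_cycles()))
print(worst_case_cycles(), mean_interval_us(worst_case_cycles()))
```

The best case gives 181 cycles for twelve bytes (about 7.54 µs each) and the worst case 189 cycles (7.875 µs each), matching the limits quoted above.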

Self-interrupting ISR

A method with potential when DRQ cannot be disconnected. The concept, due to Tom Seddon, is that NMIs occur in bursts and the ISR is repeated almost back-to-back; so it may as well be made long enough to interrupt itself, freeing the time taken by RTI to implement more features. As the busy loop can be suspended entirely during a burst and the ISR itself is re-entrant after a critical time, there is no need to stack and unstack a working set for every interrupt and we are pretty much back to polling, with the CPU doing the branch for us. From Tom's Mailing List post:

The NMI routine would go a bit like this:

                            \ +7 (7) -- NMI overhead
&D00    LDA FDC_DATA        \ +6 (13)
        STA (&C0),Y         \ +6 (19)
        INY                 \ +2 (21)
        BNE NO_BUMP:INC &C1 \ +7 (worst case) 28 <-- point A
        .NO_BUMP TXS        \ +2 (30)
        NOP                 \ +2 (32)
        NOP:NOP:NOP         \ +6 (cater for best-case timings)
        NOP                 \ the lucky NOP
        JMP READ_END
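Summing the per-instruction figures in the post reproduces the running totals in brackets. The final total of 32 cycles matches one byte time at the high density data rate: 500 kbit/s is one byte every 16 µs, or 32 cycles at 2 MHz. The Python below is a sketch that copies the post's own cycle figures, including its +6 for LDA FDC_DATA.

`min_nops` is a hypothetical helper. It gives a rough, jitter-free lower bound on the NOPs needed before the next NMI. The NOP string is taken to start at cycle 26, on the short path where BNE is taken (3 cycles instead of 7 for BNE plus INC).

```python
"""Tally of the cycle comments in the quoted NMI routine (a sketch).

Figures are copied from the post, not measured.  CPU clock assumed 2 MHz.
"""

CPU_HZ = 2_000_000

ISR_WORST = [                      # (instruction, cycles) from the post
    ("NMI overhead", 7),
    ("LDA FDC_DATA", 6),
    ("STA (&C0),Y", 6),
    ("INY", 2),
    ("BNE NO_BUMP:INC &C1", 7),    # worst case: branch not taken + INC
    ("TXS", 2),
    ("NOP", 2),
]


def running_totals(steps):
    """Cumulative cycle count after each instruction."""
    totals, t = [], 0
    for _, cycles in steps:
        t += cycles
        totals.append(t)
    return totals


def byte_interval_cycles(data_bits_per_s):
    """CPU cycles between data bytes at a given data rate."""
    return CPU_HZ * 8 // data_bits_per_s


def min_nops(nmi_interval, nop_start):
    """Hypothetical helper: NOPs (2 cycles each) needed so the routine is
    still in the NOP string when the next NMI arrives.  Ignores jitter."""
    gap = max(0, nmi_interval - nop_start)
    return (gap + 1) // 2          # round up to whole NOPs


if __name__ == "__main__":
    print(running_totals(ISR_WORST))
    short_path = 7 + 6 + 6 + 2 + 3 + 2     # BNE taken, then TXS
    for name, rate in (("SD", 125_000), ("DD", 250_000), ("HD", 500_000)):
        interval = byte_interval_cycles(rate)
        print(name, interval, "cycles per byte,",
              min_nops(interval, short_path), "NOPs at least")
```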

However, pushing the PC and status register is an unwanted side effect of receiving an NMI, so the ISR must reset the stack pointer with TXS in place of RTI. This saves four cycles at the tail end (TXS takes 2 cycles, RTI takes 6) but frees no extra registers in the ISR. The real advantage is that the ISR can wait for the next call in a string of NOPs. This eliminates up to 5 cycles of latency at the head end, since a 2-cycle NOP replaces an instruction of up to 7 cycles. Care should be taken to assemble enough NOPs to cover the single density data rate (one NMI every 64 µs). Also, the registers remain stable in the busy loop, letting us reduce the maximum length of instructions there. For instance, bit 0 of the status register (the busy flag) can be polled with:

.test_busy
        LDA #&01        \2 (3) cycles
        BIT fdc_status  \4
        BNE test_busy   \3

rather than

.test_busy
        LDY fdc_status  \4 (5) cycles
        STY temp        \3
        LSR temp        \5
        BCS test_busy   \3

(A taken branch adds one cycle of interrupt latency to the instruction at its destination. The bracketed figure in each loop's first comment is that instruction's resulting latency.)
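The bracketed figures can be reproduced with the simplified latency model stated in the text. Under that model, an NMI waits for the current instruction to finish, and a taken branch adds one cycle to the instruction at its destination. The Python sketch below applies that model only, since real 6502 interrupt timing has further quirks. `worst_latency` is a name made up for this example.

```python
"""Worst-case NMI latency of the two busy loops (simplified model).

Model (from the text): an NMI must wait for the current instruction to
finish; a taken branch adds one cycle to the instruction it lands on.
"""


def worst_latency(loop):
    """Return (worst latency in the loop, latency at the branch target).

    `loop` lists (mnemonic, cycles); the last entry is the taken branch
    back to the first entry.
    """
    latency = [cycles for _, cycles in loop]
    latency[0] += 1                  # first instruction is the branch target
    return max(latency), latency[0]


BIT_LOOP = [("LDA #&01", 2), ("BIT fdc_status", 4), ("BNE test_busy", 3)]
LSR_LOOP = [("LDY fdc_status", 4), ("STY temp", 3),
            ("LSR temp", 5), ("BCS test_busy", 3)]

if __name__ == "__main__":
    print("BIT loop:", worst_latency(BIT_LOOP))   # worst 4, target 3
    print("LSR loop:", worst_latency(LSR_LOOP))   # worst 5, target 5
```

On this model the BIT loop's worst case is 4 cycles, against 5 for the LSR loop.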

Bear in mind that in case of errors such as Sector not found, no DRQs may be issued at all. It remains to be seen how the extra time and freedom can be applied.

Applications

Other ways to use HD discs

Martin Barr reports that commodity 3½-inch high density discs work reliably when formatted at 500 kHz in FM. By contrast, high density discs often cannot hold single- or double-density data for any length of time, even if the high density hole in the jacket is covered. Counting true high density (500 kHz MFM) as a third format, this fourth, atypical signal type was originally used on 8-inch floppies by the IBM 3740.

With the WDC x7xx family and the Intel 8271, obtaining this type is a simple matter of doubling the clock frequency while keeping the ~DDEN input high, where present. Existing ISRs capable of double density service can be reused, as the data rate is the same. There are two options for the sector format. A double-density format such as ADFS, DDOS or Watford DDFS can be employed with few changes. Alternatively, the single density Acorn DFS format occupies only half of each track and so needs only half a revolution per track, increasing access speed. Among the changes needed for the double-density filing systems, the disc formatting utility must write single density preambles before each sector header and data area.
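The claims that 500 kHz FM matches the double density data rate, and that an Acorn DFS track then takes half a revolution, follow from simple arithmetic. This Python sketch assumes a 300 rpm drive. It also assumes that FM carries half the data rate of MFM at the same frequency figure, which is consistent with this article's 500 kHz high density MFM and its 64 µs single density byte time. The helper names are illustrative.

```python
"""Arithmetic for 500 kHz FM on high density media (a sketch).

Assumptions: 300 rpm spindle; at a given frequency figure FM carries
half the data rate of MFM (250 kHz FM = 125 kbit/s, i.e. 64 us a byte).
"""

RPM = 300


def data_rate(freq_hz, mfm):
    """Data bits per second for a bandwidth figure in FM or MFM."""
    return freq_hz if mfm else freq_hz // 2


def bytes_per_rev(bits_per_s):
    """Raw bytes passing the head in one revolution."""
    return bits_per_s * 60 // (RPM * 8)


def byte_time_us(bits_per_s):
    """Microseconds per data byte."""
    return 8_000_000 // bits_per_s


SD = data_rate(250_000, mfm=False)       # Acorn DFS single density
DD = data_rate(250_000, mfm=True)        # ADFS / DDOS / Watford DDFS
FM500 = data_rate(500_000, mfm=False)    # the "fourth" format

if __name__ == "__main__":
    print("FM500 byte time:", byte_time_us(FM500), "us; DD:",
          byte_time_us(DD), "us")
    # Fraction of a revolution that a full SD track's worth of bytes
    # occupies once the clock is doubled:
    print(bytes_per_rev(SD) / bytes_per_rev(FM500))
```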

See also

References

  1. 6502.org forum post by Hias Reichl, 3 September 2010
  2. Nesdev forum post by blargg, 18 June 2010

beardo 16:39, 8 October 2010 (UTC)