Difference between revisions of "High density floppy disc access"
m (Added category.) |
(→Balanced-stack ISRs: updated explanation of NMI latency) |
||
(4 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
[[Category:Hardware]] | [[Category:Hardware]] | ||
− | [[File:High density screenshot.jpg| | + | [[File:High density screenshot.jpg|360px|right|Screenshot of a 510 KB catalogue]] |
It is possible to read, write and format floppy discs on the BBC Micro in | It is possible to read, write and format floppy discs on the BBC Micro in | ||
high density (that is, at 500 kHz bandwidth in MFM) with suitable hardware | high density (that is, at 500 kHz bandwidth in MFM) with suitable hardware | ||
− | and a specially coded floppy disc controller driver. | + | and a specially coded floppy disc controller driver. Besides DMA, which |
would require extensive modifications to the machine, there are many | would require extensive modifications to the machine, there are many | ||
possible approaches to achieving the higher throughput including polling, | possible approaches to achieving the higher throughput including polling, | ||
− | conventional ISRs and a self-interrupting ISR proposed by Tom Seddon. | + | conventional ISRs and a self-interrupting ISR proposed by Tom Seddon. The |
concept has been developed and proved with conventional ISRs and a | concept has been developed and proved with conventional ISRs and a | ||
supporting busy loop, driving a slightly modified disc interface from 1984. | supporting busy loop, driving a slightly modified disc interface from 1984. | ||
Line 13: | Line 13: | ||
==Hardware== | ==Hardware== | ||
− | [[File:High density modified board.jpg| | + | [[File:High density modified board.jpg|1120px|left|An interface modified for high density]] |
− | Prior to the 1983 introduction of the 'high density' | + | Prior to the 1983 introduction of the 'high density' 5 ¼-inch floppy drive |
in the IBM PC-AT, the same signal format was in use on 8-inch floppies in | in the IBM PC-AT, the same signal format was in use on 8-inch floppies in | ||
− | the IBM System/34 from 1978. | + | the IBM System/34 from 1978. Some mid-range controllers such as the WD279x |
series supported this format and were also adopted in early third-party disc | series supported this format and were also adopted in early third-party disc | ||
− | interface boards for the BBC Micro. | + | interface boards for the BBC Micro. Supplying the clock frequency specified |
for 8-inch operation is the only modification needed for such boards; | for 8-inch operation is the only modification needed for such boards; | ||
allowing frequency selection to permit continued single and double density | allowing frequency selection to permit continued single and double density | ||
Line 24: | Line 24: | ||
Later controllers designed exclusively for double density have been found | Later controllers designed exclusively for double density have been found | ||
− | good for high density work when overclocked. | + | good for high density work when overclocked. The Ajax controller fitted to |
the Atari STE is an apparently unmodified WD1772 rated at 16 MHz. | the Atari STE is an apparently unmodified WD1772 rated at 16 MHz. | ||
Experimenters have had promising though imperfect results doubling the clock | Experimenters have had promising though imperfect results doubling the clock | ||
− | rate to a standard | + | rate to a standard WD1770[http://www.stairwaytohell.com/sthforums/viewtopic.php?p=14418#p14418]. |
− | WD1770[http://www.stairwaytohell.com/sthforums/viewtopic.php?p=14418#p14418]. | ||
There is no foreseeable reason why other controller families designed for | There is no foreseeable reason why other controller families designed for | ||
high density should not be usable. | high density should not be usable. | ||
Line 38: | Line 37: | ||
===Timing=== | ===Timing=== | ||
− | [[File:NMI_timing_diagram. | + | [[File:NMI_timing_diagram.png|thumb|678px|right|Diagram of worst case NMI timings. At a conservative rate of 15 µs per interrupt, loading to a 1 MHz address fails, shown by crosses.]] |
The non-maskable interrupt (NMI) service routines have been carefully | The non-maskable interrupt (NMI) service routines have been carefully | ||
studied to ensure they meet the floppy drive controller's timing | studied to ensure they meet the floppy drive controller's timing | ||
requirements at high speed, which they do in all but one occasional | requirements at high speed, which they do in all but one occasional | ||
case. There are two constraints from the FDC: | case. There are two constraints from the FDC: | ||
− | # the ISR must become re-entrant within 15 | + | # the ISR must become re-entrant within 15 − ε µs of the interrupt; |
− | # the ISR must service the FDC within 11.5 | + | # the ISR must service the FDC within 11.5 µs of the interrupt. |
− | The first time interval is nominally 16 | + | The first time interval is nominally 16 µs but allowance is made for jitter |
− | and fast disc drives. | + | and fast disc drives. The ε (epsilon) means that in the worst case the NMIs will |
just miss one decision point and just catch the next. | just miss one decision point and just catch the next. | ||
There are three further constraints from the 6502 CPU and the BBC Micro's | There are three further constraints from the 6502 CPU and the BBC Micro's | ||
clock system: | clock system: | ||
− | # 6502 instructions are atomic, and instruction processing may continue for up to 4.5 | + | # 6502 instructions are atomic, and instruction processing may continue for up to 4.5 µs from the onset of an NMI. (This is composed of two clock cycles and one 7-cycle instruction; see below.) |
− | # After this instruction, the 6502 executes an NMI sequence for 3.5 | + | # After this instruction, the 6502 executes an NMI sequence for 3.5 µs. |
− | # A clock cycle that accesses a 1 MHz memory address is [[Cycle stretching|extended]] to 1 | + | # A clock cycle that accesses a 1 MHz memory address is [[Cycle stretching|extended]] to 1 µs, or 1.5 µs if out-of-sync with an underlying 1 MHz monotonic clock (1MHzE). Once the CPU is synchronised with this clock the length of later extended cycles can be predicted by machine code analysis. |
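The worst-case figures in the two lists above can be cross-checked with simple subtraction. The Python sketch below assumes a 2 MHz CPU clock (0.5 µs per cycle, as implied by nine cycles taking 4.5 µs) and ignores 1 MHz cycle stretching. The names are illustrative only, and this is not the author's full analysis.

```python
CYCLE_US = 0.5  # assumed: one 2 MHz CPU cycle, implied by 9 cycles = 4.5 us


def to_cycles(us):
    """Convert a time in microseconds to whole 2 MHz clock cycles."""
    return round(us / CYCLE_US)


# Figures quoted in the constraint lists above.
LATENCY = to_cycles(4.5)            # two cycles plus one 7-cycle instruction
NMI_SEQUENCE = to_cycles(3.5)       # 6502 NMI sequence
SERVICE_DEADLINE = to_cycles(11.5)  # FDC must be serviced by this point

# Cycles left for the ISR to reach the FDC data register in the worst case.
service_slack = SERVICE_DEADLINE - LATENCY - NMI_SEQUENCE

print(LATENCY, NMI_SEQUENCE, SERVICE_DEADLINE, service_slack)
```

On these assumptions the ISR has seven cycles from its first instruction in which to read or write the FDC data register.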
It has been found impossible to meet all these constraints, unless we | It has been found impossible to meet all these constraints, unless we | ||
Line 61: | Line 60: | ||
interrupts disabled. | interrupts disabled. | ||
− | We can exploit the atomic nature of 6502 instructions to buy time. | + | We can exploit the atomic nature of 6502 instructions to buy time. The ISR |
need only be re-entrant, not complete, by the next interrupt; so as | need only be re-entrant, not complete, by the next interrupt; so as | ||
long as the final state-changing instruction is in progress when an NMI | long as the final state-changing instruction is in progress when an NMI | ||
− | arrives, the behaviour will be correct. | + | arrives, the behaviour will be correct. The time this instruction takes to |
complete has already been budgeted-for as the 'previous' instruction of the | complete has already been budgeted-for as the 'previous' instruction of the | ||
next interrupt, and the stack will not overflow as long as the CPU gets | next interrupt, and the stack will not overflow as long as the CPU gets | ||
partway through RTI, on average. | partway through RTI, on average. | ||
− | According to some documentation, e.g. [http://www.nvg.org/bbc/doc/6502.txt 64doc] | + | According to some documentation, e.g. [http://www.nvg.org/bbc/doc/6502.txt 64doc] by Sonninen ''et al.'', to cause an |
− | by Sonninen et al., to cause an NMI sequence in place of the next | + | NMI sequence in place of the next instruction the NMI must occur before the |
− | instruction the NMI must occur before the last cycle of the current | + | last cycle of the current instruction. This is confirmed by traces<ref name="6502org-interrupt-bug"/> of 6502 |
− | instruction. | + | hardware, and caused by the 6502 sampling the NMI input only on certain |
− | of 6502 hardware, | + | cycles (usually the last) of each instruction. As we just missed the |
− | cycles | + | decision point in the worst case, the FDC service deadline comes 11 µs |
− | missed the decision point in the worst case, the FDC service deadline comes | + | after the start of that next instruction. In all cases currently, we are |
− | + | clear. If 64doc were incorrect then the timings would still hold true; all | |
− | + | the instruction rectangles would just move 1 clock cycle to the left such | |
− | + | that each ISR gets 0.5 µs more time to service the floppy drive controller. | |
− | |||
− | controller. | ||
− | The minimum time between Tube data channel accesses is 11.5 | + | The minimum time between Tube data channel accesses is 11.5 µs, which can be |
− | increased with NOPs to 14.5 | + | increased with NOPs to 14.5 µs. |
====Reading from disc to I/O memory==== | ====Reading from disc to I/O memory==== | ||
− | [[File:Nmi rdio 16us check. | + | [[File:Nmi rdio 16us check.png|thumb|690px|right|Loading slow memory is stable at 16 µs per byte.]] |
A failure mode has been discovered when reading to 1 MHz memory areas under | A failure mode has been discovered when reading to 1 MHz memory areas under | ||
− | the above timings. | + | the above timings. As <tt>STA absolute,X</tt> instructions always take 5 |
cycles, the ISR may overflow the stack when traversing 1 MHz pages such as | cycles, the ISR may overflow the stack when traversing 1 MHz pages such as | ||
− | [[FRED]] and [[JIM]]. | + | [[FRED]] and [[JIM]]. Also if a 1 MHz address at the end of a page is met |
within a sector, the next few bytes may be written to the wrong page. | within a sector, the next few bytes may be written to the wrong page. | ||
In such cases failure can be avoided as long as NMIs arrive at the nominal | In such cases failure can be avoided as long as NMIs arrive at the nominal | ||
− | interval of 16 | + | interval of 16 µs (see diagram); note that drives with crystal controlled |
− | motors can stay very close to this figure. | + | synchronous motors can stay very close to this figure. 2 MHz memory can still be |
− | successfully loaded at 15 | + | successfully loaded at 15 µs per byte. |
====6502 interrupt bug==== | ====6502 interrupt bug==== | ||
The 6502 has recently been | The 6502 has recently been | ||
− | discovered<ref name=" | + | discovered<ref name="6502org-interrupt-bug">[http://forum.6502.org/viewtopic.php?p=11464#11464 6502.org forum post] by Hias Reichl, 3 September 2010</ref><ref name="nesdev-interrupt-bug">[http://nesdev.parodius.com/bbs/viewtopic.php?p=63094#63094 Nesdev forum post] by blargg, 18 June 2010</ref> to defer its |
response to IRQs or NMIs occurring just before the last cycle of a taken | response to IRQs or NMIs occurring just before the last cycle of a taken | ||
− | branch to the same page. | + | branch to the same page. In that case the branch completes and the |
instruction at the destination is executed before the interrupt sequence is | instruction at the destination is executed before the interrupt sequence is | ||
− | begun. | + | begun. The effect (as explained by Nesdev user ''blargg'') is that the |
branch adds one cycle to the maximum interrupt latency of the destination | branch adds one cycle to the maximum interrupt latency of the destination | ||
instruction. | instruction. | ||
− | This is in addition to the one cycle of latency | + | This is in addition to the usual one cycle of latency between sampling the |
− | the | + | NMI input and fetching the next opcode; here the sample is taken on the |
− | + | penultimate cycle instead of the last, making two cycles of latency in | |
+ | total. | ||
As long as all instructions in the busy loop, which are the target of | As long as all instructions in the busy loop, which are the target of | ||
Line 119: | Line 117: | ||
====Calculating in the general case==== | ====Calculating in the general case==== | ||
Those who wish to develop their own high density system can check if the | Those who wish to develop their own high density system can check if the | ||
− | worst case timings can be met, as follows. | + | worst case timings can be met, as follows. A useful ISR is assumed to have |
at least these elements: a fetch, an indexed store, a register increment | at least these elements: a fetch, an indexed store, a register increment | ||
instruction and a conditional branch for incrementing a pointer. | instruction and a conditional branch for incrementing a pointer. | ||
− | # Determine the minimum expected interval between NMIs, in clock cycles. | + | # Determine the minimum expected interval between NMIs, in clock cycles. The datasheet will list a nominal value but allow some room for jitter and drive speed variations. 30 is a good number for high density, 60 for double, 120 for single. |
− | # Calculate the maximum non-re-entrant service time. | + | # Calculate the maximum non-re-entrant service time. Start with 20 cycles: 7 for the NMI sequence, 4 for LDA, 5 for STA,X, 2 for INX and 2 for BNE. |
− | # Add the number of cycles in the longest running instruction in your busy loop. | + | # Add the number of cycles in the longest running instruction in your busy loop. If you have interrupts enabled, you should take this to be 7. |
# If any of the longest running instructions are the destination of a branch from the same page, add 1 cycle. | # If any of the longest running instructions are the destination of a branch from the same page, add 1 cycle. | ||
# If your FDC is a WD2791 or WD2795, add 2 cycles. | # If your FDC is a WD2791 or WD2795, add 2 cycles. | ||
# If your FDC is on the 1 MHz bus, add 2 cycles. | # If your FDC is on the 1 MHz bus, add 2 cycles. | ||
# If users will be saving or loading data into 1 MHz areas and expecting it to work, add 3 cycles. | # If users will be saving or loading data into 1 MHz areas and expecting it to work, add 3 cycles. | ||
− | # If the total is less than the NMI interval in cycles, the test passes. | + | # If the total is less than the NMI interval in cycles, the test passes. The ISR will make it to the INC instruction to cross a page boundary, and not overflow the stack the rest of the time. |
− | The experimental system has a score of 31 and so only guarantees the nominal 32 cycle (16 | + | The experimental system has a score of 31 and so only guarantees the nominal 32 cycle (16 µs) rate; see above. |
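The checklist above can be expressed as a small calculator. This is a minimal Python sketch: the function and parameter names are hypothetical, and the example inputs are only one combination that reaches the quoted total of 31, since the article does not give the breakdown for the experimental system.

```python
def nmi_budget(longest_busy_loop_insn,
               branch_target_same_page,
               fdc_is_wd2791_or_wd2795,
               fdc_on_1mhz_bus,
               io_in_1mhz_areas):
    """Worst-case non-re-entrant time in CPU cycles, following the checklist."""
    # Step 2: NMI sequence, LDA, STA,X, INX, BNE
    total = 7 + 4 + 5 + 2 + 2
    # Step 3: longest instruction in the busy loop (7 if interrupts are enabled)
    total += longest_busy_loop_insn
    # Step 4: longest instruction is the target of a same-page branch
    if branch_target_same_page:
        total += 1
    # Step 5: WD2791 or WD2795 controller
    if fdc_is_wd2791_or_wd2795:
        total += 2
    # Step 6: FDC sits on the 1 MHz bus
    if fdc_on_1mhz_bus:
        total += 2
    # Step 7: transfers to or from 1 MHz memory areas must work
    if io_in_1mhz_areas:
        total += 3
    return total


def passes(total, nmi_interval_cycles):
    """Step 8: the total must be strictly less than the NMI interval."""
    return total < nmi_interval_cycles


# Illustrative inputs only (an assumed breakdown, not stated in the article).
score = nmi_budget(5, True, True, False, True)
print(score, passes(score, 30), passes(score, 32))
```

With these inputs the score fails the conservative 30-cycle interval from step 1 but passes at the nominal 32 cycles, which matches the statement that only the nominal rate is guaranteed.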
===Restrictions=== | ===Restrictions=== | ||
The small amount of time available means many of the usual features of a | The small amount of time available means many of the usual features of a | ||
disc ISR must be left out. There is not enough time to count the bytes and | disc ISR must be left out. There is not enough time to count the bytes and | ||
− | discard them after a certain number have been transferred. | + | discard them after a certain number have been transferred. Thus all |
transfers are rounded up to whole sectors; this is most critical in OSWORD | transfers are rounded up to whole sectors; this is most critical in OSWORD | ||
&7F which can no longer emulate the 8271's Read ID command faithfully, and | &7F which can no longer emulate the 8271's Read ID command faithfully, and | ||
less so in OSFILE where some memory beyond end-of-file will be overwritten, | less so in OSFILE where some memory beyond end-of-file will be overwritten, | ||
− | though in most cases this is not a significant problem. | + | though in most cases this is not a significant problem. If multiple sector |
transfer commands (%10x1xxxx) are to be used, the busy loop would have to | transfer commands (%10x1xxxx) are to be used, the busy loop would have to | ||
count the sectors and issue a Force Interrupt after the appropriate number, | count the sectors and issue a Force Interrupt after the appropriate number, | ||
Line 147: | Line 145: | ||
Nor is there time to test the FDC status register and determine whether this | Nor is there time to test the FDC status register and determine whether this | ||
is a data request (DRQ) or an interrupt request (INTRQ) signalling that the | is a data request (DRQ) or an interrupt request (INTRQ) signalling that the | ||
− | command has terminated. | + | command has terminated. In most WD1770-based boards these two lines are |
both connected to the CPU's NMI input, but in combination with high speed | both connected to the CPU's NMI input, but in combination with high speed | ||
ISRs this will cause dropped bytes when writing to disc and extra bytes when | ISRs this will cause dropped bytes when writing to disc and extra bytes when | ||
− | reading. | + | reading. The INTRQ line will need to be disconnected if using ISRs, which |
should not but may affect the hardware's compatibility with standard filing | should not but may affect the hardware's compatibility with standard filing | ||
system ROMs. | system ROMs. | ||
− | The ISR cannot afford to save the registers it uses. | + | The ISR cannot afford to save the registers it uses. These then become |
''volatile'' in the main thread and the AUG warns us on p.296, "If they are | ''volatile'' in the main thread and the AUG warns us on p.296, "If they are | ||
modified, the main program will suddenly find garbage in its registers in | modified, the main program will suddenly find garbage in its registers in | ||
− | the middle of some important processing. | + | the middle of some important processing. It is probable that a total system |
− | + | 'crash' would result from this." Therefore all disc operations must be |
− | confined within the busy loop with maskable interrupts disabled. | + | confined within the busy loop with maskable interrupts disabled. As the |
keyboard will not be read, commands cannot be 'typed ahead' while a file is | keyboard will not be read, commands cannot be 'typed ahead' while a file is | ||
loading; and the busy loop cannot use the volatile registers except for | loading; and the busy loop cannot use the volatile registers except for | ||
Line 165: | Line 163: | ||
On the other hand, the ISR can save state in the volatile registers; they | On the other hand, the ISR can save state in the volatile registers; they | ||
− | now become out-of-bounds to the busy loop altogether. | + | now become out-of-bounds to the busy loop altogether. For instance, X can |
become a running index register so that only the high address byte of a | become a running index register so that only the high address byte of a | ||
fetch or store instruction would need to be incremented in any interrupt | fetch or store instruction would need to be incremented in any interrupt | ||
− | period. | + | period. The low byte of the address should be kept at &00, so that fetches |
do not take an extra cycle to cross a page boundary. | do not take an extra cycle to cross a page boundary. | ||
===Byte-in-hand=== | ===Byte-in-hand=== | ||
With respect to write operations, to help meet the FDC's service time (11.5 | With respect to write operations, to help meet the FDC's service time (11.5 | ||
− | + | µs) each byte can be prefetched so that the ISR can send it to the | |
− | controller first thing before fetching the next byte. | + | controller first thing before fetching the next byte. The byte rests 'in |
hand' in one of the volatile registers between interrupts; the main routine | hand' in one of the volatile registers between interrupts; the main routine | ||
or busy loop prefetches the first byte of each request before sending the | or busy loop prefetches the first byte of each request before sending the | ||
first command. | first command. | ||
− | This means that one extra byte is fetched and discarded per request. | + | This means that one extra byte is fetched and discarded per request. When |
saving I/O memory there is a mild risk of a side effect when the area to be | saving I/O memory there is a mild risk of a side effect when the area to be | ||
saved ends near a memory-mapped register (though consider also the issue of | saved ends near a memory-mapped register (though consider also the issue of | ||
− | whole-sector transfers above. | + | whole-sector transfers above). Likewise with the Tube, and this is the only |
issue there when OSWORD &7F is the exclusive user of the data channel. | issue there when OSWORD &7F is the exclusive user of the data channel. | ||
However if the channel owner (e.g. OSGBPB) calls OSWORD &7F to fetch bytes | However if the channel owner (e.g. OSGBPB) calls OSWORD &7F to fetch bytes | ||
from the Tube on its behalf, then the owner will find some data has been | from the Tube on its behalf, then the owner will find some data has been | ||
− | dropped and subsequent bytes will be out of sequence. | + | dropped and subsequent bytes will be out of sequence. In such a case OSGBPB |
should ideally be structured so that the channel is reopened after OSWORD | should ideally be structured so that the channel is reopened after OSWORD | ||
&7F is called, as EDOSpat now does; however it may initially be easier to | &7F is called, as EDOSpat now does; however it may initially be easier to | ||
revert the ISR to the conventional, fetch-before-store form in the singular | revert the ISR to the conventional, fetch-before-store form in the singular | ||
− | case of saving from the Tube. | + | case of saving from the Tube. (The above does not apply to OSGBPB saving |
the file buffers in I/O memory.) | the file buffers in I/O memory.) | ||
Line 199: | Line 197: | ||
====Busy loop==== | ====Busy loop==== | ||
The main thread is confined to this loop while a disc operation is in progress. | The main thread is confined to this loop while a disc operation is in progress. | ||
− | \Based on NMI disc op routine from EDOS 0.4 by Alan Williams | + | \ Based on NMI disc op routine from EDOS 0.4 by Alan Williams |
− | \Target is the Opus WD2791 interface. | + | \ Target is the Opus WD2791 interface. |
− | \On entry A=ROM slot to access | + | \ On entry A=ROM slot to access |
− | \ | + | \ X=Value as required by ISR |
− | \ | + | \ Y=FDC command |
− | 0D10 . edospat_disc_op | + | 0D10 .edospat_disc_op |
0D10 85 F4 STA mos_romsel_copy | 0D10 85 F4 STA mos_romsel_copy | ||
0D12 8D 30 FE STA bbc_romsel | 0D12 8D 30 FE STA bbc_romsel | ||
− | \Loop while b7 and b5 both clear. | + | \ Loop while b7 and b5 both clear. |
− | \WD2791 drops some commands otherwise. | + | \ WD2791 drops some commands otherwise. |
− | 0D15 . edospat_disc_op_wait | + | 0D15 .edospat_disc_op_wait |
0D15 AD 80 FE LDA fdc_base+0 | 0D15 AD 80 FE LDA fdc_base+0 | ||
− | 0D18 49 5F EOR# st_sense%EOR disc_op_eor% | + | 0D18 49 5F EOR # st_sense%EOR disc_op_eor% |
− | 0D1A 29 A0 AND# disc_op_and% | + | 0D1A 29 A0 AND # disc_op_and% |
0D1C OPT ps% | 0D1C OPT ps% | ||
0D1C F0 F7 BEQ dest% | 0D1C F0 F7 BEQ dest% | ||
0D1E OPT FNbndrq( ps%, disc_op_bne%, edospat_disc_op_wait) | 0D1E OPT FNbndrq( ps%, disc_op_bne%, edospat_disc_op_wait) | ||
− | \Disable interrupts | + | \ Disable interrupts |
0D1E 08 PHP | 0D1E 08 PHP | ||
0D1F 78 SEI | 0D1F 78 SEI | ||
− | \First address, or load-immediate instructions pasted here. | + | \ First address, or load-immediate instructions pasted here. |
− | \The address may be in another ROM slot, hence loaded here. | + | \ The address may be in another ROM slot, hence loaded here. |
− | 0D20 . edospat_disc_op_addr | + | 0D20 .edospat_disc_op_addr |
0D20 AD 00 0E LDA nmi_form_bytes+0 | 0D20 AD 00 0E LDA nmi_form_bytes+0 | ||
0D23 EA NOP | 0D23 EA NOP | ||
− | \Send command. | + | \ Send command. A and X are now out of bounds |
− | \Then wait 50 us for status register to settle | + | \ Then wait 50 us for status register to settle |
0D24 8C 80 FE STY fdc_base+0 | 0D24 8C 80 FE STY fdc_base+0 | ||
− | 0D27 A0 14 LDY#20 | + | 0D27 A0 14 LDY #20 |
− | 0D29 . edospat_disc_op_settle | + | 0D29 .edospat_disc_op_settle |
0D29 88 DEY | 0D29 88 DEY | ||
0D2A D0 FD BNE edospat_disc_op_settle | 0D2A D0 FD BNE edospat_disc_op_settle | ||
− | \Loop until controller indicates ready. | + | \ Loop until controller indicates ready. |
− | 0D2C . edospat_disc_op_loop | + | 0D2C .edospat_disc_op_loop |
0D2C AC 80 FE LDY fdc_base+0 | 0D2C AC 80 FE LDY fdc_base+0 | ||
0D2F OPT ps% | 0D2F OPT ps% | ||
Line 244: | Line 242: | ||
0D31 OPT FNbnrdy( ps%, st_sense%, edospat_disc_op_loop) | 0D31 OPT FNbnrdy( ps%, st_sense%, edospat_disc_op_loop) | ||
− | \Loop until controller indicates not busy also. | + | \ Loop until controller indicates not busy also. |
− | 0D31 . edospat_disc_op_test_busy | + | 0D31 .edospat_disc_op_test_busy |
0D31 84 A0 STY edos_disc_op_temp | 0D31 84 A0 STY edos_disc_op_temp | ||
0D33 46 A0 LSR edos_disc_op_temp | 0D33 46 A0 LSR edos_disc_op_temp | ||
Line 252: | Line 250: | ||
0D37 OPT FNbbusy( ps%, st_sense%, edospat_disc_op_loop) | 0D37 OPT FNbbusy( ps%, st_sense%, edospat_disc_op_loop) | ||
− | \Save ISR's A and X for the next call. | + | \ Save ISR's A and X for the next call. |
− | 0D37 . edospat_disc_op_exit | + | 0D37 .edospat_disc_op_exit |
0D37 8E 21 0D STX edospat_disc_op_addr+1 | 0D37 8E 21 0D STX edospat_disc_op_addr+1 | ||
0D3A 8D 23 0D STA edospat_disc_op_addr+3 | 0D3A 8D 23 0D STA edospat_disc_op_addr+3 | ||
− | 0D3D A9 A2 LDA#&A2 \=LDX immediate | + | 0D3D A9 A2 LDA #&A2 \ =LDX immediate |
0D3F 8D 20 0D STA edospat_disc_op_addr | 0D3F 8D 20 0D STA edospat_disc_op_addr | ||
− | 0D42 A9 A9 LDA#&A9 \=LDA immediate | + | 0D42 A9 A9 LDA #&A9 \ =LDA immediate |
0D44 8D 22 0D STA edospat_disc_op_addr+2 | 0D44 8D 22 0D STA edospat_disc_op_addr+2 | ||
− | \Our state is saved, restore interrupt state | + | \ Our state is saved, restore interrupt state |
0D47 28 PLP | 0D47 28 PLP | ||
− | \Page the EDOS ROM back in | + | \ Page the EDOS ROM back in |
0D48 AD 60 0D LDA edos_disc_op_rom | 0D48 AD 60 0D LDA edos_disc_op_rom | ||
− | 0D4B . edospat_disc_op_switch_rom | + | 0D4B .edospat_disc_op_switch_rom |
0D4B 85 F4 STA mos_romsel_copy | 0D4B 85 F4 STA mos_romsel_copy | ||
0D4D 8D 30 FE STA bbc_romsel | 0D4D 8D 30 FE STA bbc_romsel | ||
− | \Restore X and Y to values on entry | + | \ Restore X and Y to values on entry |
0D50 98 TYA | 0D50 98 TYA | ||
0D51 A6 A1 LDX edos_idcount | 0D51 A6 A1 LDX edos_idcount | ||
0D53 AC 63 0D LDY edos_disc_op_cmd | 0D53 AC 63 0D LDY edos_disc_op_cmd | ||
− | \Present controller status in A and set flags | + | \ Present controller status in A and set flags |
− | 0D56 49 FF EOR# st_sense% | + | 0D56 49 FF EOR # st_sense% |
\return to OSWORD &7F routine | \return to OSWORD &7F routine | ||
Line 282: | Line 280: | ||
====Interrupt service routines==== | ====Interrupt service routines==== | ||
− | \Read from disc to I/O memory | + | \ Read from disc to I/O memory |
− | \On entry X=low byte of destination address | + | \ On entry X=low byte of destination address |
− | \ | + | \ ?&0D07=high byte |
− | 0D00 . nmi_rdio | + | 0D00 .nmi_rdio |
0D00 AD 83 FE LDA fdc_base+3 | 0D00 AD 83 FE LDA fdc_base+3 | ||
− | 0D03 49 FF EOR# sense% | + | 0D03 49 FF EOR # sense% |
− | 0D05 . nmi_rdio_addr | + | 0D05 .nmi_rdio_addr |
− | 0D05 9D 00 0D STA mos_nmi AND&FF00,X | + | 0D05 9D 00 0D STA mos_nmi AND &FF00,X |
0D08 E8 INX | 0D08 E8 INX | ||
0D09 D0 03 BNE nmi_rdio_exit | 0D09 D0 03 BNE nmi_rdio_exit | ||
0D0B EE 07 0D INC nmi_rdio_addr+2 | 0D0B EE 07 0D INC nmi_rdio_addr+2 | ||
− | 0D0E . nmi_rdio_exit | + | 0D0E .nmi_rdio_exit |
0D0E 40 RTI | 0D0E 40 RTI | ||
− | \Write from I/O memory to disc | + | \ Write from I/O memory to disc |
− | \On entry A=contents of source address | + | \ On entry A=contents of source address |
− | \ | + | \ X=low byte of source address |
− | \ | + | \ ?&0D0D=high byte |
− | 0D00 . nmi_wrio | + | 0D00 .nmi_wrio |
− | 0D00 49 FF EOR# sense% | + | 0D00 49 FF EOR # sense% |
0D02 8D 83 FE STA fdc_base+3 | 0D02 8D 83 FE STA fdc_base+3 | ||
0D05 E8 INX | 0D05 E8 INX | ||
0D06 D0 03 BNE nmi_wrio_addr | 0D06 D0 03 BNE nmi_wrio_addr | ||
0D08 EE 0D 0D INC nmi_wrio_addr+2 | 0D08 EE 0D 0D INC nmi_wrio_addr+2 | ||
− | 0D0B . nmi_wrio_addr | + | 0D0B .nmi_wrio_addr |
− | 0D0B BD 00 0D LDA mos_nmi AND&FF00,X | + | 0D0B BD 00 0D LDA mos_nmi AND &FF00,X |
− | 0D0E . nmi_wrio_exit | + | 0D0E .nmi_wrio_exit |
0D0E 40 RTI | 0D0E 40 RTI | ||
− | \Read from disc to coprocessor | + | \ Read from disc to coprocessor |
− | \No entry conditions | + | \ No entry conditions |
− | 0D00 . nmi_rdtu | + | 0D00 .nmi_rdtu |
0D00 AD 83 FE LDA fdc_base+3 | 0D00 AD 83 FE LDA fdc_base+3 | ||
− | 0D03 49 FF EOR# sense% | + | 0D03 49 FF EOR # sense% |
0D05 8D E5 FE STA tube_host_fifo_3 | 0D05 8D E5 FE STA tube_host_fifo_3 | ||
0D08 40 RTI | 0D08 40 RTI | ||
− | \Write from coprocessor to disc | + | \ Write from coprocessor to disc |
− | \On entry A=byte to be written | + | \ On entry A=byte to be written |
− | 0D00 . nmi_wrtu | + | 0D00 .nmi_wrtu |
− | 0D00 49 FF EOR# sense% | + | 0D00 49 FF EOR # sense% |
0D02 8D 83 FE STA fdc_base+3 | 0D02 8D 83 FE STA fdc_base+3 | ||
0D05 AD E5 FE LDA tube_host_fifo_3 | 0D05 AD E5 FE LDA tube_host_fifo_3 | ||
0D08 40 RTI | 0D08 40 RTI | ||
− | \Read sector IDs to coprocessor | + | \ Read sector IDs to coprocessor |
− | \On entry X=bytes remaining (initially X=4) | + | \ On entry X=bytes remaining (initially X=4) |
− | 0D00 . nmi_idtu | + | 0D00 .nmi_idtu |
0D00 AD 83 FE LDA fdc_base+3 | 0D00 AD 83 FE LDA fdc_base+3 | ||
− | 0D03 49 FF EOR# sense% | + | 0D03 49 FF EOR # sense% |
0D05 CA DEX | 0D05 CA DEX | ||
0D06 30 03 BMI nmi_idtu_exit | 0D06 30 03 BMI nmi_idtu_exit | ||
0D08 8D E5 FE STA tube_host_fifo_3 | 0D08 8D E5 FE STA tube_host_fifo_3 | ||
− | 0D0B . nmi_idtu_exit | + | 0D0B .nmi_idtu_exit |
0D0B 40 RTI | 0D0B 40 RTI | ||
− | \Verify disc | + | \ Verify disc |
− | \No entry conditions | + | \ No entry conditions |
− | 0D00 . nmi_veri | + | 0D00 .nmi_veri |
0D00 2C 83 FE BIT fdc_base+3 | 0D00 2C 83 FE BIT fdc_base+3 | ||
Latest revision as of 19:37, 6 October 2019
It is possible to read, write and format floppy discs on the BBC Micro in high density (that is, at 500 kHz bandwidth in MFM) with suitable hardware and a specially coded floppy disc controller driver. Besides DMA, which would require extensive modifications to the machine, there are many possible approaches to achieving the higher throughput including polling, conventional ISRs and a self-interrupting ISR proposed by Tom Seddon. The concept has been developed and proved with conventional ISRs and a supporting busy loop, driving a slightly modified disc interface from 1984. Naturally any of the techniques mentioned here require a high density disc and disc drive.
Hardware
Prior to the 1983 introduction of the 'high density' 5¼-inch floppy drive in the IBM PC-AT, the same signal format was in use on 8-inch floppies in the IBM System/34 from 1978. Some mid-range controllers such as the WD279x series supported this format and were also adopted in early third-party disc interface boards for the BBC Micro. Supplying the clock frequency specified for 8-inch operation is the only modification needed for such boards; allowing frequency selection to permit continued single and double density operation is a useful extra.
Later controllers designed exclusively for double density have been found good for high density work when overclocked. The Ajax controller fitted to the Atari STE is an apparently unmodified WD1772 rated at 16 MHz. Experimenters have had promising though imperfect results doubling the clock rate to a standard WD1770[1]. There is no foreseeable reason why other controller families designed for high density should not be usable.
Balanced-stack ISRs
In this instance balanced-stack means that the ISR expects, on average, to complete before the next interrupt arrives, and so does not manipulate the stack to stave off overflow.
Timing
The non-maskable interrupt (NMI) service routines have been carefully studied to ensure they meet the floppy drive controller's timing requirements at high speed, which they do in all but one occasional case. There are two constraints from the FDC:
- the ISR must become re-entrant within 15 − ε µs of the interrupt;
- the ISR must service the FDC within 11.5 µs of the interrupt.
The first time interval is nominally 16 µs but allowance is made for jitter and fast disc drives. The ε (epsilon) means that in the worst case the NMIs will just miss one decision point and just catch the next. There are three further constraints from the 6502 CPU and the BBC Micro's clock system:
- 6502 instructions are atomic, and instruction processing may continue for up to 4.5 µs from the onset of an NMI. (This is composed of two clock cycles and one 7-cycle instruction; see below.)
- After this instruction, the 6502 executes an NMI sequence for 3.5 µs.
- A clock cycle that accesses a 1 MHz memory address is extended to 1 µs, or 1.5 µs if out-of-sync with an underlying 1 MHz monotonic clock (1MHzE). Once the CPU is synchronised with this clock the length of later extended cycles can be predicted by machine code analysis.
It has been found impossible to meet all these constraints, unless we ensure that no instruction longer than 6 cycles is executing when the NMI occurs, which can be arranged by carefully coding a busy loop with interrupts disabled.
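The arithmetic behind these figures can be laid out as a short Python sketch. It assumes a 2 MHz CPU clock (0.5 µs per cycle) with no 1 MHz cycle stretching, and the function names are illustrative only. It shows roughly how much of the 11.5 µs service window is left once the instruction in progress and the NMI sequence have run.

```python
CYCLE_US = 0.5              # one 2 MHz clock cycle, in microseconds (assumed)
NMI_SEQUENCE_CYCLES = 7     # the 3.5 us NMI sequence quoted above
SERVICE_DEADLINE_US = 11.5  # FDC service constraint quoted above


def worst_latency_us(longest_instruction_cycles):
    """Two clock cycles plus the instruction in progress, as in the list above."""
    return (2 + longest_instruction_cycles) * CYCLE_US


def service_slack_cycles(longest_instruction_cycles):
    """Cycles left for the ISR to reach its FDC access (hypothetical helper)."""
    used_us = (worst_latency_us(longest_instruction_cycles)
               + NMI_SEQUENCE_CYCLES * CYCLE_US)
    return (SERVICE_DEADLINE_US - used_us) / CYCLE_US


print(worst_latency_us(7))        # 4.5
print(service_slack_cycles(7))    # 7.0
print(service_slack_cycles(6))    # 8.0
```

Under these assumptions, capping the busy loop at 6-cycle instructions gains one cycle of slack over the general 7-cycle case. Any stretched 1 MHz access would reduce the margin further.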
We can exploit the atomic nature of 6502 instructions to buy time. The ISR need only be re-entrant, not complete, by the next interrupt; so as long as the final state-changing instruction is in progress when an NMI arrives, the behaviour will be correct. The time this instruction takes to complete has already been budgeted-for as the 'previous' instruction of the next interrupt, and the stack will not overflow as long as the CPU gets partway through RTI, on average.
According to some documentation, e.g. 64doc by Sonninen et al., to cause an NMI sequence in place of the next instruction the NMI must occur before the last cycle of the current instruction. This is confirmed by traces[1] of 6502 hardware, and caused by the 6502 sampling the NMI input only on certain cycles (usually the last) of each instruction. As we just missed the decision point in the worst case, the FDC service deadline comes 11 µs after the start of that next instruction. In all cases currently, we are clear. If 64doc were incorrect then the timings would still hold true; all the instruction rectangles would just move 1 clock cycle to the left such that each ISR gets 0.5 µs more time to service the floppy drive controller.
The minimum time between Tube data channel accesses is 11.5 µs, which can be increased with NOPs to 14.5 µs.
Reading from disc to I/O memory
A failure mode has been discovered when reading to 1 MHz memory areas under the above timings. As STA absolute,X instructions always take 5 cycles, the ISR may overflow the stack when traversing 1 MHz pages such as FRED and JIM. Also if a 1 MHz address at the end of a page is met within a sector, the next few bytes may be written to the wrong page.
In such cases failure can be avoided as long as NMIs arrive at the nominal interval of 16 µs (see diagram); note that drives with crystal controlled synchronous motors can stay very close to this figure. 2 MHz memory can still be successfully loaded at 15 µs per byte.
6502 interrupt bug
The 6502 has recently been discovered[1][2] to defer its response to IRQs or NMIs occurring just before the last cycle of a taken branch to the same page. In that case the branch completes and the instruction at the destination is executed before the interrupt sequence is begun. The effect (as explained by Nesdev user blargg) is that the branch adds one cycle to the maximum interrupt latency of the destination instruction.
This is in addition to the usual one cycle of latency between sampling the NMI input and fetching the next opcode; here the sample is taken on the penultimate cycle instead of the last, making two cycles of latency in total.
As long as all instructions in the busy loop, which are the target of branches from the same page, are shorter in duration than the longest instruction in the loop, then there is no impact on the worst-case interrupt latency of the system.
Calculating in the general case
Those who wish to develop their own high density system can check if the worst case timings can be met, as follows. A useful ISR is assumed to have at least these elements: a fetch, an indexed store, a register increment instruction and a conditional branch for incrementing a pointer.
- Determine the minimum expected interval between NMIs, in clock cycles. The datasheet will list a nominal value but allow some room for jitter and drive speed variations. 30 is a good number for high density, 60 for double, 120 for single.
- Calculate the maximum non-re-entrant service time. Start with 20 cycles: 7 for the NMI sequence, 4 for LDA, 5 for STA,X, 2 for INX and 2 for BNE.
- Add the number of cycles in the longest running instruction in your busy loop. If you have interrupts enabled, you should take this to be 7.
- If any of the longest running instructions are the destination of a branch from the same page, add 1 cycle.
- If your FDC is a WD2791 or WD2795, add 2 cycles.
- If your FDC is on the 1 MHz bus, add 2 cycles.
- If users will be saving or loading data into 1 MHz areas and expecting it to work, add 3 cycles.
- If the total is less than the NMI interval in cycles, the test passes. The ISR will make it to the INC instruction to cross a page boundary, and not overflow the stack the rest of the time.
The experimental system has a score of 31 and so only guarantees the nominal 32 cycle (16 µs) rate; see above.
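The checklist above can be expressed as a small Python sketch. The function and parameter names are hypothetical, and the example inputs are only one combination that happens to total 31; the text does not give the experimental system's actual breakdown.

```python
def service_score(longest_busy_instruction, branch_target=False,
                  wd2791_or_wd2795=False, fdc_on_1mhz_bus=False,
                  io_memory_transfers=False):
    """Worst-case non-re-entrant service time in cycles, per the checklist."""
    score = 20  # 7 NMI sequence + 4 LDA + 5 STA,X + 2 INX + 2 BNE
    score += longest_busy_instruction  # use 7 if interrupts are enabled
    if branch_target:
        score += 1  # longest instruction is a same-page branch destination
    if wd2791_or_wd2795:
        score += 2
    if fdc_on_1mhz_bus:
        score += 2
    if io_memory_transfers:
        score += 3  # users load or save 1 MHz areas
    return score


def passes(score, nmi_interval_cycles):
    """The test passes when the total is less than the NMI interval."""
    return score < nmi_interval_cycles


example = service_score(6, wd2791_or_wd2795=True, io_memory_transfers=True)
print(example)               # 31
print(passes(example, 32))   # True: nominal 16 us interval
print(passes(example, 30))   # False: 15 us interval with allowance for jitter
```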
Restrictions
The small amount of time available means many of the usual features of a disc ISR must be left out. There is not enough time to count the bytes and discard them after a certain number have been transferred. Thus all transfers are rounded up to whole sectors; this is most critical in OSWORD &7F which can no longer emulate the 8271's Read ID command faithfully, and less so in OSFILE where some memory beyond end-of-file will be overwritten, though in most cases this is not a significant problem. If multiple sector transfer commands (%10x1xxxx) are to be used, the busy loop would have to count the sectors and issue a Force Interrupt after the appropriate number, but the main routine can issue a chain of single sector transfers instead.
Nor is there time to test the FDC status register and determine whether this is a data request (DRQ) or an interrupt request (INTRQ) signalling that the command has terminated. In most WD1770-based boards these two lines are both connected to the CPU's NMI input, but in combination with high speed ISRs this will cause dropped bytes when writing to disc and extra bytes when reading. The INTRQ line will need to be disconnected if using these ISRs; this should not, but may, affect the hardware's compatibility with standard filing system ROMs.
The ISR cannot afford to save the registers it uses. These then become volatile in the main thread and the AUG warns us on p.296, "If they are modified, the main program will suddenly find garbage in its registers in the middle of some important processing. It is probable that a total system 'crash' would result from this." Therefore all disc operations must be confined within the busy loop with maskable interrupts disabled. As the keyboard will not be read, commands cannot be 'typed ahead' while a file is loading; and the busy loop cannot use the volatile registers except for their side effects, such as setting flags.
On the other hand, the ISR can save state in the volatile registers; they now become out-of-bounds to the busy loop altogether. For instance, X can become a running index register so that only the high address byte of a fetch or store instruction would need to be incremented in any interrupt period. The low byte of the address should be kept at &00, so that fetches do not take an extra cycle to cross a page boundary.
Byte-in-hand
With respect to write operations, to help meet the FDC's service time (11.5 µs) each byte can be prefetched so that the ISR can send it to the controller first thing before fetching the next byte. The byte rests 'in hand' in one of the volatile registers between interrupts; the main routine or busy loop prefetches the first byte of each request before sending the first command.
This means that one extra byte is fetched and discarded per request. When saving I/O memory there is a mild risk of a side effect when the area to be saved ends near a memory-mapped register (though consider also the issue of whole-sector transfers above). Likewise with the Tube, and this is the only issue there when OSWORD &7F is the exclusive user of the data channel.
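The byte-in-hand scheme can be modelled in a few lines of Python. This is a behavioural sketch only, with hypothetical names; it is not the EDOSPAT code. It shows the prefetch by the main routine and the resulting extra fetch per request.

```python
def write_with_byte_in_hand(source, byte_count):
    """Model a write of byte_count bytes; returns (bytes sent, fetches made)."""
    fetches = 0
    sent = []

    # Main routine or busy loop: prefetch the first byte before the command.
    in_hand = source[0]
    fetches += 1
    index = 1

    for _ in range(byte_count):      # one iteration per NMI
        sent.append(in_hand)         # send the byte in hand first thing
        in_hand = source[index]      # then fetch the next byte at leisure
        fetches += 1
        index += 1

    return sent, fetches


memory = list(range(10))             # stand-in for the area being saved
sent, fetches = write_with_byte_in_hand(memory, 8)
print(sent)       # [0, 1, 2, 3, 4, 5, 6, 7]
print(fetches)    # 9: one more fetch than bytes written
```

The final fetch reads one location beyond the data actually written, which is where the side effect near memory-mapped registers or the Tube FIFO comes from.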
However if the channel owner (e.g. OSGBPB) calls OSWORD &7F to fetch bytes from the Tube on its behalf, then the owner will find some data has been dropped and subsequent bytes will be out of sequence. In such a case OSGBPB should ideally be structured so that the channel is reopened after OSWORD &7F is called, as EDOSpat now does; however it may initially be easier to revert the ISR to the conventional, fetch-before-store form in the singular case of saving from the Tube. (The above does not apply to OSGBPB saving the file buffers in I/O memory.)
Code
Tested code fragments adapted from the assembly output of EDOSPAT 5.10.
Busy loop
The main thread is confined to this loop while a disc operation is in progress.
\ Based on NMI disc op routine from EDOS 0.4 by Alan Williams
\ Target is the Opus WD2791 interface.
\ On entry A=ROM slot to access
\          X=Value as required by ISR
\          Y=FDC command
0D10          .edospat_disc_op
0D10 85 F4    STA mos_romsel_copy
0D12 8D 30 FE STA bbc_romsel
\ Loop while b7 and b5 both clear.
\ WD2791 drops some commands otherwise.
0D15          .edospat_disc_op_wait
0D15 AD 80 FE LDA fdc_base+0
0D18 49 5F    EOR # st_sense%EOR disc_op_eor%
0D1A 29 A0    AND # disc_op_and%
0D1C          OPT ps%
0D1C F0 F7    BEQ dest%
0D1E          OPT FNbndrq( ps%, disc_op_bne%, edospat_disc_op_wait)
\ Disable interrupts
0D1E 08       PHP
0D1F 78       SEI
\ First address, or load-immediate instructions pasted here.
\ The address may be in another ROM slot, hence loaded here.
0D20          .edospat_disc_op_addr
0D20 AD 00 0E LDA nmi_form_bytes+0
0D23 EA       NOP
\ Send command. A and X are now out of bounds
\ Then wait 50 us for status register to settle
0D24 8C 80 FE STY fdc_base+0
0D27 A0 14    LDY #20
0D29          .edospat_disc_op_settle
0D29 88       DEY
0D2A D0 FD    BNE edospat_disc_op_settle
\ Loop until controller indicates ready.
0D2C          .edospat_disc_op_loop
0D2C AC 80 FE LDY fdc_base+0
0D2F          OPT ps%
0D2F 10 FB    BPL dest%
0D31          OPT FNbnrdy( ps%, st_sense%, edospat_disc_op_loop)
\ Loop until controller indicates not busy also.
0D31          .edospat_disc_op_test_busy
0D31 84 A0    STY edos_disc_op_temp
0D33 46 A0    LSR edos_disc_op_temp
0D35          OPT ps%
0D35 90 F5    BCC dest%
0D37          OPT FNbbusy( ps%, st_sense%, edospat_disc_op_loop)
\ Save ISR's A and X for the next call.
0D37          .edospat_disc_op_exit
0D37 8E 21 0D STX edospat_disc_op_addr+1
0D3A 8D 23 0D STA edospat_disc_op_addr+3
0D3D A9 A2    LDA #&A2 \ =LDX immediate
0D3F 8D 20 0D STA edospat_disc_op_addr
0D42 A9 A9    LDA #&A9 \ =LDA immediate
0D44 8D 22 0D STA edospat_disc_op_addr+2
\ Our state is saved, restore interrupt state
0D47 28       PLP
\ Page the EDOS ROM back in
0D48 AD 60 0D LDA edos_disc_op_rom
0D4B          .edospat_disc_op_switch_rom
0D4B 85 F4    STA mos_romsel_copy
0D4D 8D 30 FE STA bbc_romsel
\ Restore X and Y to values on entry
0D50 98       TYA
0D51 A6 A1    LDX edos_idcount
0D53 AC 63 0D LDY edos_disc_op_cmd
\ Present controller status in A and set flags
0D56 49 FF    EOR # st_sense%
\ Return to OSWORD &7F routine
0D58 60       RTS
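As a check on the "wait 50 us" comment in the busy loop, the settle delay can be counted in Python. This is a sketch using standard 6502 cycle counts and assuming a 2 MHz clock and a same-page branch; the function name is hypothetical.

```python
def settle_delay_cycles(count):
    """Cycles taken by LDY #count followed by a DEY / BNE countdown loop."""
    ldy_immediate = 2
    looping_passes = (count - 1) * (2 + 3)   # DEY + BNE taken
    final_pass = 2 + 2                       # DEY + BNE not taken
    return ldy_immediate + looping_passes + final_pass


cycles = settle_delay_cycles(20)
print(cycles)          # 101
print(cycles * 0.5)    # 50.5 (microseconds at 2 MHz)
```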
Interrupt service routines
\ Read from disc to I/O memory
\ On entry X=low byte of destination address
\          ?&0D07=high byte
0D00          .nmi_rdio
0D00 AD 83 FE LDA fdc_base+3
0D03 49 FF    EOR # sense%
0D05          .nmi_rdio_addr
0D05 9D 00 0D STA mos_nmi AND &FF00,X
0D08 E8       INX
0D09 D0 03    BNE nmi_rdio_exit
0D0B EE 07 0D INC nmi_rdio_addr+2
0D0E          .nmi_rdio_exit
0D0E 40       RTI

\ Write from I/O memory to disc
\ On entry A=contents of source address
\          X=low byte of source address
\          ?&0D0D=high byte
0D00          .nmi_wrio
0D00 49 FF    EOR # sense%
0D02 8D 83 FE STA fdc_base+3
0D05 E8       INX
0D06 D0 03    BNE nmi_wrio_addr
0D08 EE 0D 0D INC nmi_wrio_addr+2
0D0B          .nmi_wrio_addr
0D0B BD 00 0D LDA mos_nmi AND &FF00,X
0D0E          .nmi_wrio_exit
0D0E 40       RTI

\ Read from disc to coprocessor
\ No entry conditions
0D00          .nmi_rdtu
0D00 AD 83 FE LDA fdc_base+3
0D03 49 FF    EOR # sense%
0D05 8D E5 FE STA tube_host_fifo_3
0D08 40       RTI

\ Write from coprocessor to disc
\ On entry A=byte to be written
0D00          .nmi_wrtu
0D00 49 FF    EOR # sense%
0D02 8D 83 FE STA fdc_base+3
0D05 AD E5 FE LDA tube_host_fifo_3
0D08 40       RTI

\ Read sector IDs to coprocessor
\ On entry X=bytes remaining (initially X=4)
0D00          .nmi_idtu
0D00 AD 83 FE LDA fdc_base+3
0D03 49 FF    EOR # sense%
0D05 CA       DEX
0D06 30 03    BMI nmi_idtu_exit
0D08 8D E5 FE STA tube_host_fifo_3
0D0B          .nmi_idtu_exit
0D0B 40       RTI

\ Verify disc
\ No entry conditions
0D00          .nmi_veri
0D00 2C 83 FE BIT fdc_base+3
0D03 40       RTI

\ Format disc from RLE table in pages &0E..&0F
\ On entry A=current byte from table (pre-inverted)
\          X=current table index (initially 0)
0D00          .nmi_form
0D00 8D 83 FE STA fdc_base+3
0D03 DE 00 0F DEC nmi_form_counts,X
0D06 D0 04    BNE nmi_form_exit
0D08 E8       INX
0D09          .nmi_form_addr
0D09 BD 00 0E LDA nmi_form_bytes,X
0D0C          .nmi_form_exit
0D0C 40       RTI
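The run-length scheme used by the format ISR can be modelled in Python to see which bytes reach the controller. This is a behavioural sketch only: the table contents are made up for illustration, pre-inversion of the data is ignored, and the names are hypothetical.

```python
def run_format_isr(bytes_table, counts_table, nmi_count):
    """Model nmi_form: one byte is sent to the FDC per NMI."""
    counts = list(counts_table)      # DEC modifies the counts table in place
    a = bytes_table[0]               # on entry A = current byte from table
    x = 0                            # on entry X = current table index
    written = []

    for _ in range(nmi_count):
        written.append(a)                     # STA fdc_base+3
        counts[x] = (counts[x] - 1) & 0xFF    # DEC nmi_form_counts,X
        if counts[x] == 0:                    # BNE nmi_form_exit not taken
            x = (x + 1) & 0xFF                # INX
            a = bytes_table[x]                # LDA nmi_form_bytes,X
    return written


# Illustrative table: three runs, plus a trailing entry that is loaded
# but never written within the six interrupts simulated here.
track = run_format_isr([0x4E, 0x00, 0xA1, 0xFF], [3, 2, 1, 1], 6)
print([hex(b) for b in track])
# ['0x4e', '0x4e', '0x4e', '0x0', '0x0', '0xa1']
```

Because the count is decremented before it is tested, a count byte of 0 would behave as a run of 256 in this model.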
Polling
Actively testing the FDC's status register to determine when a byte is ready is an equally valid method to sustain high speed data throughput. Polling-based transfers would require a tight loop with interrupts disabled, which we have shown is the minimum requirement with stack-balanced ISRs. The absence of NMI sequences and RTI instructions frees a small amount of time to implement more features, such as measuring and cutting off the transfer. With this approach the DRQ pin would have to be disconnected but INTRQ could remain attached (and a suitable ISR supplied) if desired.
Using SO
A negative-going edge on pin 38 of the 6502 CPU sets the overflow flag in the status register. A BVC instruction branching to itself produces a loop that exits within 3 clock cycles of lowering this pin. By connecting DRQ to this pin through an inverter the CPU can execute a loop transferring a byte to the FDC every 8 µs or slightly less. This is fast enough to consider octal density, also known as extended density (ED) or the 2.8 MB format. In that case all addresses touched must be at 2 MHz and no page boundaries can be crossed, limiting sectors to 128 or 256 bytes. This time the INTRQ signal, similarly inverted, will be needed to raise an NMI to break out of the loop.
\ On entry X=low byte of address - 1
\          (or &FF if sector size = 256 bytes)
\          user=(address - 0) AND &FF00
.disc_op
SEI
CLV
.disc_op_loop
BVC disc_op_loop
.disc_op_service
repeat(11) {
LDA fdc_data
CLV
INX
STA user,X
BVC disc_op_loop
}
LDA fdc_data
CLV
INX
STA user,X
BVS disc_op_service
BVS disc_op_service
BVS disc_op_service
BVC disc_op_loop
BVS disc_op_service
This is the fastest and most resilient polling loop possible, limited by the range of relative branching. It supports any mean request interval longer than 7.88 µs: in the worst case twelve requests 6½ to 8½ µs apart can cause the code to reach the penultimate or last instruction and loop back in 189 cycles. Typical request sequences with less jitter will be able to have a mean interval closer to the absolute limit of 7.54 µs.
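The two intervals quoted above can be reproduced from standard 6502 cycle counts. The Python sketch below is one reading of how the figures arise, not a derivation taken from the source; it assumes a 2 MHz clock, no stretched cycles and no page-crossing branches.

```python
CYCLE_US = 0.5                # assumed 2 MHz clock
LDA_ABS, CLV, INX, STA_ABS_X = 4, 2, 2, 5
NOT_TAKEN, TAKEN = 2, 3       # branch costs within one page
BLOCKS = 12                   # 11 repeated service blocks plus the final one

service = LDA_ABS + CLV + INX + STA_ABS_X    # 13 cycles per byte moved

# Best case: a request is always pending, so each BVC falls through and
# the first BVS after the twelfth block is taken.
best = (BLOCKS - 1) * (service + NOT_TAKEN) + service + TAKEN

# Worst case: the code runs on to the last instruction, passing three
# untaken BVS and an untaken BVC before the final BVS is taken.
worst = ((BLOCKS - 1) * (service + NOT_TAKEN) + service
         + 3 * NOT_TAKEN + NOT_TAKEN + TAKEN)

print(best, worst)                        # 181 189
print(best * CYCLE_US / BLOCKS)           # about 7.54 us per byte
print(worst * CYCLE_US / BLOCKS)          # 7.875 us per byte
```

Exiting through the penultimate instruction instead (BVC taken, then the BVC at disc_op_loop falling through) also comes to 189 cycles under the same assumptions.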
Self-interrupting ISR
This method has potential when DRQ cannot be disconnected. The concept, due to Tom Seddon, is that NMIs occur in bursts and the ISR is repeated almost back-to-back; so it may as well be made long enough to interrupt itself, freeing the time taken by RTI to implement more features. As the busy loop can be suspended entirely during a burst and the ISR itself is re-entrant after a critical time, there is no need to stack and unstack a working set for every interrupt and we are pretty much back to polling, with the CPU doing the branch for us. From Tom's Mailing List post:
The NMI routine would go a bit like this:
                             \ +7 (7) -- NMI overhead
&D00     LDA FDC_DATA        \ +6 (13)
         STA (&C0),Y         \ +6 (19)
         INY                 \ +2 (21)
         BNE NO_BUMP:INC &C1 \ +7 (worst case) 28 <-- point A
.NO_BUMP TXS                 \ +2 (30)
         NOP                 \ +2 (32)
         NOP:NOP:NOP         \ +6 (cater for best-case timings)
         NOP                 \ the lucky NOP
         JMP READ_END
However, pushing the PC and status register is an unwanted side effect of receiving an NMI, and the ISR must reset the stack pointer with TXS, in place of the RTI. This saves four cycles at the tail end but gives us no more free registers in the ISR. The real advantage is that the ISR can wait for the next call with a string of NOPs, eliminating up to 5 cycles' latency at the head end. Care should be taken to assemble enough NOPs to cover the single density data rate (one NMI every 64 µs). Also the registers remain stable in the busy loop, letting us reduce the maximum length of instructions there. For instance bit 0 of the status register can be polled with:
.test_busy LDA #&01       \2 (3) cycles
           BIT fdc_status \4
           BNE test_busy  \3
rather than
.test_busy LDY fdc_status \4 (5) cycles
           STY temp       \3
           LSR temp       \5
           BCS test_busy  \3
(A taken branch adds one cycle of interrupt latency to the destination instruction.)
Bear in mind that in case of errors such as Sector not found, no DRQs may be issued at all. It remains to be seen how the extra time and freedom can be applied.
Applications
Other ways to use HD discs
Martin Barr reports that commodity 3½-inch high density discs work reliably when formatted at 500 kHz in FM. Often, high density discs cannot hold single- or double-density data for any length of time, even if the high density hole in the jacket is covered. Counting true high density as a third format, this fourth, non-typical signal type was originally used on 8-inch floppies by the IBM 3740.
With the WDC x7xx family and the Intel 8271, obtaining this type is a simple matter of doubling the clock frequency while keeping the ~DDEN input high, if present. Existing ISRs capable of double density service can be reused as the data rate is the same. A double-density sector format such as ADFS, DDOS or Watford DDFS can be employed with few changes. Alternatively, single density Acorn DFS can be used; it fills only half of each track and so needs less media rotation per track, increasing access speed. One modification that must be made to the double-density filing systems is that the disc formatting utility must prepare single density preambles to each sector header and data area.
See also
- 'High density disks' thread on STH
References
- ↑ 1.0 1.1 6502.org forum post by Hias Reichl, 3 September 2010
- ↑ Nesdev forum post by blargg, 18 June 2010
beardo 16:39, 8 October 2010 (UTC)