14th of April – NMI debugging continued…

I want to document my current search, mainly for myself. But as I document it anyway – I might as well “make it open” – perhaps others can use it…
Yesterday I had a crash with the new “faulty resistent” Vectorblade. This means I took back all changes, and Vectorblade is in the same state, than about a week ago.

The taking back was more demanding than one would anticipate, since I could not just go back to my old sources – I already changed too much – so in the long run it was easier to just take all “faulty” stuff out manually.

For the faulty handling to be inserted, I had to “free” quite a lot of memory – and I want to put that “new” memory to good use.

Since I am running out of options – I am going back to my NMI handling.
This time – because I have more space available – I chose to “do it right”.

a) Hardware:

I glued the damn switch to the card – and soldered everything “permanent” – the lose wires were – well lose!

b) Software
I wrote an NMI handler, that can

display stack address
display the bank where the NMI happened
display all registers
display the state of VIA
browse/display memory and give out “hex” values of memory locations (button 1+2)

I had to keep it short due to memory restrictions – but it still took about 400 bytes.

This is the source:

                    direct   $ff 
 if  NMI_HANDLER = 1 
                    struct   NMI_struct 
                    ds       ORG_STACK, 2 
                    ds       ORG_BANK, 1 
                    ds       ORG_VIA_PB, 1 
                    ds       ORG_VIA_PA, 1 
                    ds       ORG_VIA_DB, 1 
                    ds       ORG_VIA_DA, 1 
                    ds       ORG_VIA_ACR, 1 
                    ds       ORG_VIA_CNTL, 1 
                    ds       ORG_VIA_IF, 1 
                    ds       ORG_VIA_IE, 1 
                    end struct 

                    struct   NMI_Stack_struct 
                    ds       ORG_CC, 1 
                    ds       ORG_D, 2 
                    ds       ORG_DP, 1 
                    ds       ORG_X, 2 
                    ds       ORG_Y, 2 
                    ds       ORG_U, 2 
                    ds       ORG_PC, 2 
                    end struct 

NMI_RAM             =        playershotobject_list 
NMI_STACK           =        enemyobject_list_end 
                    bss      

                    org      NMI_RAM 
nmi_struct          ds       NMI_struct 
nmi_ypos            ds       1 
nmi_xpos            ds       1 
nmi_print_buffer    ds       7 
nmi_adress_mon      ds       2 
nmi_tmp             ds       1 
                    code     


NMI_HANDLER_FUNCTION 
NMI_HANDLER_STACK_DUMMY 
                    cmps     #NMI_HANDLER_STACK_DUMMY-12  ; minus stored regs length 
                    beq      bouncy_bouncy 
                    sts      nmi_struct + ORG_STACK       ; game_stack_ptr ;game stack save upon first entry 
bouncy_bouncy 
                    lds      #NMI_HANDLER_STACK_DUMMY 
                    ldx      #60000 
bounce_delay 
                    leax     -1,x 
                    bne      bounce_delay 
                    ldy      #NMI_RAM 
                    lda      VIA_DDR_b                    ; if $40 set than bit 1 switch is 0 (this is bit 0 of switch) 
                    ldb      VIA_int_flags                ; if $80 set than bit 1 switch is 0 (this is bit 1 of switch) 
                    sta      ORG_VIA_DB,y 
                    stb      ORG_VIA_IF,y 
; bank in "plain" sight
                    clr      ORG_BANK, y 
                    bita     #$40 
                    bne      noBit0                       ; if != 0, bit is set, than jump (bit 0 = 0) 
                    inc      ORG_BANK, y                  ; bs switch bit 0 = 1 
noBit0 
                    bitb     #$80 
                    bne      noBit1                       ; if != 0, bit is set, than jump (bit 1 = 0) 
                    inc      ORG_BANK, y                  ; bs switch bit 1 = 1 
                    inc      ORG_BANK, y 
noBit1                                                    ;        bit as in "bankswitch bit" 00, 01, 10, 11 (banks) 
                    ldd      VIA_port_b 
                    std      ORG_VIA_PB,y 
                    lda      VIA_DDR_a 
                    sta      ORG_VIA_DA,y 
                    ldd      VIA_aux_cntl 
                    std      ORG_VIA_ACR,y 
                    lda      VIA_int_enable 
                    sta      ORG_VIA_IE,y 
                    lds      #NMI_STACK                   ; stack frame for nmi handler 
                    ldd      #$c800 
                    std      nmi_adress_mon 
                    lda      #$30 
                    sta      Vec_Text_Width 
                    CLR      Vec_Music_Flag               ; no music is playing ->0 (is placed in rottis! 
                    JSR      Init_Music_Buf               ; shadow regs 
                    JSR      Do_Sound                     ; ROM function that does the sound playing, here used to clear all regs 
                    jsr      Init_VIA 
nmi_msg 
                    JSR      Wait_Recal                   ; Vectrex BIOS recalibration 
                    JSR      Intensity_5F                 ; Sets the i 
                    JSR      Read_Btns 
                    ldu      #nmi_output_start 
                    ldy      #nmi_struct 
                    bsr      outPutFrame 
                    ldy      ORG_STACK,y 
                    bsr      outPutFrame 
                    lda      #16 
                    sta      nmi_tmp 
                    ldd      nmi_adress_mon 
                    ldu      #nmi_print_buffer 
                    jsr      dhex_to_uascii 
                    ldd      # ':'*256 + $80
                    std      ,u+ 
                    ldd      #$b080 
                    std      nmi_ypos 
                    ldu      #nmi_print_buffer 
                    jsr      Print_Str_d 
                    ldd      #$a080-16 
                    std      nmi_ypos 
                    ldy      nmi_adress_mon 
monLoop 
                    lda      ,y+ 
                    ldu      #nmi_print_buffer 
                    bsr      ahex_to_uascii 
                    ldd      # ' '*256 + $80
                    std      ,u 
                    ldd      nmi_ypos 
                    addd     #16 
                    std      nmi_ypos 
                    ldu      #nmi_print_buffer 
                    jsr      Print_Str_d 
                    dec      nmi_tmp 
                    bne      monLoop 
                    lda      Vec_Btn_State 
                    bita     #1 
                    beq      noPlus_1 
                    ldd      nmi_adress_mon 
                    addd     #16 
                    std      nmi_adress_mon 
noPlus_1 
                    lda      Vec_Btn_State 
                    bita     #2 
                    beq      noMinus_1 
                    ldd      nmi_adress_mon 
                    subd     #16 
                    std      nmi_adress_mon 
noMinus_1 
                    jmp      nmi_msg 

outPutFrame 
outputContinue 
                    lda      ,u+ 
                    beq      nmi_msg1_done 
                    deca     
                    beq      nmi_do_1byte 
                    jsr      out_2 
                    bra      outputContinue 

nmi_do_1byte 
                    jsr      out_1 
                    bra      outputContinue 

nmi_msg1_done 
                    rts      
AHEX_TOUASCII       macro    
                    pshs     a 
                    lsra     
                    lsra     
                    lsra     
                    lsra     
                    adda     # '0'
                    cmpa     # '9'
                    ble      ok1\? 
                    adda     #( 'A'-'0'-10)
ok1\? 
                    sta      ,u+ 
                    lda      ,s 
                    anda     #$f 
                    adda     # '0'
                    cmpa     # '9'
                    ble      ok2\? 
                    adda     #( 'A'-'0'-10)
ok2\? 
                    sta      ,u+ 
                    leas     1,s 
                    endm     

BHEX_TOUASCII       macro    
                    pshs     b 
                    lsrb     
                    lsrb     
                    lsrb     
                    lsrb     
                    addb     # '0'
                    cmpb     # '9'
                    ble      ok3\? 
                    addb     #( 'A'-'0'-10)
ok3\? 
                    stb      ,u+ 
                    ldb      ,s 
                    andb     #$f 
                    addb     # '0'
                    cmpb     # '9'
                    ble      ok4\? 
                    addb     #( 'A'-'0'-10)
ok4\? 
                    stb      ,u+ 
                    leas     1,s 
                    endm     

DHEX_TOUASCII       macro    
                    AHEX_TOUASCII  
                    BHEX_TOUASCII  
                    endm     

ahex_to_uascii 
                    AHEX_TOUASCII  
                    rts      

out_1 
                    ldd      ,u++ 
                    pshs     d 
                    jsr      Print_Str_d 
                    lda      ,u+ 
                    pshs     u 
                    ldu      #nmi_print_buffer 
                    lda      a,y 
                    bsr      ahex_to_uascii 
                    ldd      # ' '*256 + $80
                    sta      ,u+ 
                    bra      entryOut1 

out_2 
                    ldd      ,u++ 
                    pshs     d 
                    jsr      Print_Str_d 
                    lda      ,u+ 
                    pshs     u 
                    ldu      #nmi_print_buffer 
                    ldd      a,y 
                    bsr      dhex_to_uascii 
                    ldd      # ' '*256 + $80
entryOut1 
                    std      ,u++ 
                    ldd      2,s 
                    addd     #$30 
                    ldu      #nmi_print_buffer 
                    jsr      Print_Str_d 
                    puls     u 
                    puls     d,pc 
dhex_to_uascii 
                    DHEX_TOUASCII  
                    rts      

; format:
; XX byte count or exit 1,2 or 0
; y,x pos
; String
; pointer to data that is output (offset to Y reg)
nmi_output_start 
                    db       2, $70, $a0, "S :", $80, ORG_STACK
                    db       1, $60, $a0, "BS:", $80, ORG_BANK
                    db       1, $40, $10, "PB:", $80, ORG_VIA_PB
                    db       1, $30, $10, "PA:", $80, ORG_VIA_PA
                    db       1, $20, $10, "DB:", $80, ORG_VIA_DB
                    db       1, $10, $10, "DA:", $80, ORG_VIA_DA
                    db       1, $00, $10, "AC:", $80, ORG_VIA_ACR
                    db       1, $f0, $10, "CN:", $80, ORG_VIA_CNTL
                    db       1, $e0, $10, "IF:", $80, ORG_VIA_IF
                    db       1, $d0, $10, "IE:", $80, ORG_VIA_IE
                    db       0 
nmi_output_start2 
                    db       1, $40, $a0, "CC:", $80, ORG_CC
                    db       2, $30, $a0, "D :", $80, ORG_D
                    db       1, $20, $a0, "DP:", $80, ORG_DP
                    db       2, $10, $a0, "X :", $80, ORG_X
                    db       2, $00, $a0, "Y :", $80, ORG_Y
                    db       2, $f0, $a0, "U :", $80, ORG_U
                    db       2, $e0, $a0, "PC:", $80, ORG_PC
                    db       0 
 endif

(The debounce code was Thomas idea – and it works good!)

I played again a couple of hours VB today, to provoke the crash.

First crash that happened – the NMI did not respond. WHAT THE F***!
(corrupt NMI handler address?)

Than I had the game “reset” during play – this never happened before – new bug?

Than finally – I had a crash, where the NMI worked!

The NMI was executed in Bank 3, with PC at: $8429 – which well – first looks sort of “ok”.

All VIA registers look “normal” or at least not suspicous.
AC = $98 is default, CN = $CE means the beam can move. Data direction registers are ok (DB = $9f in this case means (together with the 7bit of IF not set) -> bank 3)… etc

What is slightly suspicious is Stack = $cac4:

The “location” seems to suggest, that S is somewhere “in” an enemy object – that is not so strange, since within the behaviour routines, I reference the enemy data with the stack pointer. What IS strange though, is that I ALLWAYS leave the stack pointer at position 4 of a enemy structure – in above “enemyobjectc” – this should be $cabb.

The stack position is one, that should NOT happen within my program!

CORRECTION!
The stack is “ok” – I didn’t account in my thoughts (at 1:30 in the night), that the output stack pointer als INCLUDES the 12 bytes stored by the NMI. If I take that into account, the stack actually should read $cad0, which is the propper stack address for an entry to enemyObjectD!.

But which would also suggest, that we actually were able to “reach” enemyObjectD…

Since we are “obviously” near the enemy handling – I chose to investigate further.

Enemies are kept in memory with a “list” structure.
The head of the list is located at $c888 (enemylist_objects_head) – with the NMI handler “browsing” – I investigated that and it pointed to “enemyobjectc” – which is ok, the list addresses get garbeled during execution so it does not start with object 1.
That enemyobject is located at $CACC – again I browsed to that location:

The last 4 bytes shown are the first 4 bytes of enemyobjectc.

This is the corresponding part of the enemyStructure:

So in the above image you see, that enemyC has the position of y=$6d and x= $91 – and that its “behaviour” starts at address $8429.

Wait – WHAT?

$8429 is NOT a behaviour routine of mine!

$8429 is right in the middle of a “shot”-“collision handling” routine.
And since registers X/Y are certainly not set to the right value – the four marked lines constitute an endless loop!

That endless loop, is right where I pressed my NMI button. Somehow the ~~“stack”~~ and the RAM of my poor enemyC had been corrupted!

In conclusion:

the crash still only occurs on the live system not the emulator
the crash seems to have to do with memory (RAM) corruption
those two taken together mean – I still do not have a clue!
What kind of RAM corruption can happen, that is not emulated?
Usually RAM corruption is a a typical coding bug!

to be continued…

Tagged on: Debug, vectorblade

By Malban | April 14, 2020 | Uncategorized |

31 thoughts on “14th of April – NMI debugging continued…”

Phillip Eaton April 15, 2020 at 4:39 pm

Stupid question… Have you tried using totally different hardware, to decide if it’s a hardware or software problem? Maybe it’s a faulty RAM?Also, not sure it’s useful but have you considered utilising the NoICE debugger? The licence for 6809 is free. https://www.noicedebugger.com/index.html
You’d need a serial port, but you could use the one on a VecFever, if you can make that part of your test setup. Or add an FT232 or (better) FT245R to get an I/O port.
1. Malban Post authorApril 15, 2020 at 5:34 pm
  
  The problem occurs on at least 3 different Vectrex.
  
  I haven’t tried any external debugger yet – I will consider the options, when I run out of my own :-)…
  1. Phillip Eaton April 16, 2020 at 12:18 am
    
    With a different cartridge and flashrom? Maybe the Zif socket is dodgy?
    1. Malban Post authorApril 16, 2020 at 7:00 am
      
      yep, different cards… 🙂
Graham Toal April 17, 2020 at 9:15 am

This has to be immensely frustrating for you. I really sympathize.

Any chance that the reason it works in the emulator is because the emulator isn’t 100% accurate with some obscure corner case. The sort of thing like pushing a 2-byte value at FFFF where it wraps around. Not that specific case but that kind of thing – something where the emulator takes a shortcut because ‘no-one would ever do that so it’s not worth checking for’ but eventually someone actually does do it…?

I don’t suppose you can get your hands on a logic analyser/data logger and record the full execution in a cyclic buffer using hardware external to the CPU? Then find a way to stop recording when the crash happens before the old trace is cycled out of the buffer? I can’t help but think that if you can’t replicate the fault in an emulator, then you’re going to need hardware support to finally track this down. The NMI button was a clever idea and a good start but I have a feeling this is really going to need something like a logic analyzer or a bus sniffer of some description. Maybe you could program a bare-metal PI to sniff the bus and log everything on bus clock transitions?

We’re all rooting for you!

G
Graham Toal April 17, 2020 at 9:29 am

So if the problem is that some data in RAM is being overwritten, it might be possible to come up with hardware support that can trap when that ram address is being written to. You would need some mechanism to signal when writes were legitimate and when they were unexpected – some sort of hardware semaphore that you would set before a legit write and unset afterwards. Is that at all feasible?

Do you have any free RAM? If you do, this might work: put unused bytes of ram between objects as a tombstone/land mine. If they are ever written to, you’ve experienced corruption and have a chance of either debugging it or – with hardware support – examining a trace of the most recent instruction execution.

Aha! You can get that trace with Thomas’s help! Since the Vecfever is supplying the ROM data, have the Vecfever log to a cyclic buffer the address whenever an instruction fetch is made (ie a read from ROM within a certain range). It doesn’t have to be a huge buffer in the vecfever RAM, just large enough that you can see back to what caused the corruption at the point the corruption is detected. You could even do the corruption detection in software at some safe point (like on WaitRecal) and know that at most 30,000 cycles have been executed since the corruption happened. Then if Thomas has enough ram to store 30,000 addresses in a cyclic buffer, you can be sure that the history is sufficient.

Do you think that might work and could Thomas be able to support it?

Graham
1. Malban Post authorApril 17, 2020 at 12:01 pm
  
  Thx for your suggestions. I will keep that in the back of my mind. Although at the moment have exactly 0 bytes RAM free.
  If all else fails… I need to make room and try “desperate” measures, ehm, I mean – even more desperate measures…
  1. Graham Toal April 17, 2020 at 4:08 pm
    
    I’ve just now suggested adding data logging to the Vextreme cart – even if not useful or in time for your project, it’ll be good to have in the future. https://github.com/technobly/vextreme/issues/51 – the logging part is definitely worth having, it’s the triggering mechanism that is problematic for using it to debug your program. If you can think of any other way of detecting the memory corruption…
    1. Malban Post authorApril 17, 2020 at 5:23 pm
      
      No, no current brilliant ideas :-).
      Other than providing a list of allowed values, or min/max…
Phillip Eaton April 17, 2020 at 8:29 pm

How often does it crash? Does it crash in attract mode/self play mode? If you cut the whole thing down to fit into 32k and use a standard cart & ROM, would it still crash? I would always suspect custom hardware 🙂 Are you still using the spare I/O line? (PB6?) Maybe you could make a cart with a couple of latches to extend address space instead of using the spare I/O line, might make it a bit more robust.
1. Malban Post authorApril 17, 2020 at 8:35 pm
  
  Like I said, it seems to be not predictable. Sometimes after minutes, sometimes after dozen hours. (Minutes very seldom!)
  No, can’t cut it. There is no really custom hardware. Its just a 256kb flash and 2 lines for bankswitching, PB6 and IRQ.
Graham Toal April 18, 2020 at 2:30 am

Have you only run this on your own carts or can you run it on a vecfever and has it shown the bug on both platforms?

If it only happens on your own cartridges, is this an area worth looking at?:

“The 6809 actually latches the address bus and R/W at a particular time after the falling edge of E. A real 6809 is combinatorial, but the latching minimizes the transition effect of one value to another. This actually isn’t guaranteed in the MC6809 datasheet, but some designs depend on it. (Depending on it is somewhat dangerous. You might very rarely see a problem, but during a transition you might have an instant of ‘write’ to an address other than what is intended – even with the latching.) The correct design strategy is to only depend on the Address bus, the Data bus, and the R/W signal (as well as signals such as BS and BA) when they’re guaranteed to be valid (if E and Q are both low, they’re in a state of transition).”
1. Malban Post authorApril 18, 2020 at 8:10 am
  
  The crash happens on both versions!
2. EF April 20, 2020 at 12:30 pm
  
  Old VIA depend on this behavior.
  WDC VIA fixed it.
gauze April 25, 2020 at 1:24 am

I recall there was a sound routine I was working on that worked OK in VIDE but not on real hardware, it corrupted RAM because I was ignorantly writing to some bits that were listed as being “ignored” but really were “undefined” and caused undefined behavior which sounds similar to your issue here. is there anywhere you are writing to a mapped memory location that only uses less than 8 bits of a byte and you are somehow writing more into it?
1. Malban Post authorApril 25, 2020 at 5:09 pm
  
  Hm – can’t think of any “dubious” regions, other the once where VIA and RAM are shadowed…
  Any hints, of what that might have been?
  
  I seem to remember that at one stage you sent me something that was not working 100% in Vide… wasn’t that Joystick related?
  1. gauze May 26, 2020 at 12:21 pm
    
    it affected the joystick but was from doing soemthing stupid with a sound routine I think? I have to dig man sorry 🙁
    1. Malban Post authorMay 26, 2020 at 1:52 pm
      
      No harm done. You did send it to me – I remember, I just also put it somewhere save :-).
Phillip Eaton May 2, 2020 at 10:00 pm

In case you’re not a member, there’s a guy posting in the Facebook group “6502 CPU Family” an in depth look at the 6522, a few logic analyzer screen dumps and commentary on the signals, e.g. PB6. I haven’t read it all yet, but looks quite interesting.
1. Malban Post authorMay 3, 2020 at 8:47 am
  
  I am not a member… but heading over to it now – thx for the tip!
  1. Graham Toal May 14, 2020 at 2:11 pm
    
    Is this relevant? tip #2 in http://forum.6502.org/viewtopic.php?p=2310 – analog problems with the interrupt pin…
    1. Malban Post authorMay 14, 2020 at 2:32 pm
      
      Probably not… I have some timing buffer, when using the IRQ.
Graham Toal May 14, 2020 at 2:23 pm

Something I found extremely helpful when I first wrote tailgunner was to record all inputs at the point where they were read by the game, and replay them with cycle-perfect accuracy on re-runs of the game. If it is not a truly random occurrence, but just one that is hard to duplicate, this should make it possible to duplicate. Unfortunately you will need some sort of hardware support to save those input values.
1. Malban Post authorMay 14, 2020 at 2:32 pm
  
  If you look at the recent posts in the Vectorblade Forum…
  This is exactly what I am doing right now!
  1. Graham Toal May 14, 2020 at 3:24 pm
    
    > If you look at the recent posts in the Vectorblade Forum…
    > This is exactly what I am doing right now!
    
    Ha! Yes those posts were after I had read that page, I’m currently working backwards (on page 14 now, there’s a lot to read) Peer is right, run-length encoding of button changes is the most efficient storage. But you need to either store the entire game, or a save state plus all changes since the save-state.
Graham Toal May 14, 2020 at 3:18 pm

One more off-the-wall idea… would there be any benefit in running on the Pitrex, with the version of the emulator that accesses the VIA? If it didn’t fail I don’t think that would tell you anything but if it did fail it would point strongly at the VIA.
1. Malban Post authorMay 14, 2020 at 3:45 pm
  
  Doubt it – emulation is not so good – it would not be comparable …
Graham Toal May 14, 2020 at 3:33 pm

Last comment for today: is peepholing turned on? For example could a “ldd” followed by “sta” and “stb” be turning one of them into a CLR if the constant for ldd was 0x00nn ? Do you have enough rom space to compile with peepholing disabled?
1. Malban Post authorMay 14, 2020 at 3:44 pm
  
  For ASM peep holing is never active.
Phillip Eaton November 13, 2020 at 12:49 am

Hi Malban, I have bolted an FT245R UART onto my Vectrex. It mostly works, but for reading incoming bytes, it has as interrupt line that needs to be connected to the CPU e.g. on NMI or IRQ, with a handler. Looking at the above code, I can’t make out where you setup the interrupt, it looks like it is only the code that runs when you trigger the interrupt. The Vectrex interrupt handlers are in ROM and jump to CBF2-CBFB, where there are 3 bytes for each interrupt. I’m guessing those 3 bytes are for a JMP with a handler address? I also looked for any interrupt hander setup code in the BIOS code and, apart from the hardware vectors, there isn’t any. Any chance you could share your interrupt handler setup code for the above?

Thanks in advance!
1. Malban Post authorNovember 16, 2020 at 9:56 am
  
  Contacted you on Discord…

Vectrex Blog

and home of Vectrex Integrated Development Environment

14th of April – NMI debugging continued…

31 thoughts on “14th of April – NMI debugging continued…”