Baremetal “Pipeline”

21.01.2020
I guess I just called it pipeline without knowing what it might imply.

I have a tiny little problem here… I am trying to implement 3 things at once:

a) the „pipeline“ concept
b) getting rid of T1 timers
c) completely new set of drawing routines

b) + c) go together…

a) „Pipeline“
I want to implement a two-stage concept:

  • i) all drawing code of the game/emulator is collected in an array – mostly not yet preprocessed
  • ii) after the game round is done – nothing is drawn yet, but all possible drawings are collected
  • iii) then I can examine the complete drawing code and build a preprocessed* drawing array, which a short switch … case routine can then handle very fast

preprocessed*
Putting zeros where needed, bundling lines into consecutive lists etc.
Especially (at a later stage) this phase can be influenced by „hints“ from the emulator
(after analyzing the drawing output)…
E.g. knowing when text is printed … or, when there is too much information on the screen
(e.g. the cockpit in StarWars), what can be left out etc.
It might also be that the emulator can tell us that vectors 100-176 are the same as „last round“; then all preprocessing is already done and we can just copy the old vector array.

The preprocessed array can also be kept over rounds to redraw a screen completely… without needing another emulation loop.
And so on…
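
A minimal sketch of the collect / preprocess / execute idea, just to make the stages concrete. All names here (Cmd, CmdList, collect_line, preprocess, execute) are hypothetical and not the actual PiTrex code, and the only preprocessing shown is "put a zero where a line does not continue the previous one":

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only, not the PiTrex API. */
typedef enum { CMD_ZERO, CMD_MOVE, CMD_DRAW } CmdType;

typedef struct {
    CmdType type;
    int x0, y0, x1, y1;   /* CMD_DRAW: start/end; CMD_MOVE: target in x1,y1 */
} Cmd;

#define MAX_CMDS 1024

typedef struct {
    Cmd items[MAX_CMDS];
    size_t count;
} CmdList;

/* Stage 1: the game/emulator only records what it wants drawn. */
static int collect_line(CmdList *raw, int x0, int y0, int x1, int y1)
{
    if (raw->count >= MAX_CMDS) return 0;
    raw->items[raw->count++] = (Cmd){ CMD_DRAW, x0, y0, x1, y1 };
    return 1;
}

/* Stage 2: after the round, build the preprocessed list. Lines that start
   where the previous one ended are bundled; otherwise zero and move first. */
static void preprocess(const CmdList *raw, CmdList *out)
{
    int have_pos = 0, cx = 0, cy = 0;
    out->count = 0;
    for (size_t i = 0; i < raw->count; i++) {
        const Cmd *c = &raw->items[i];
        if (out->count + 3 > MAX_CMDS) return;   /* out of room: stop early */
        if (!have_pos || c->x0 != cx || c->y0 != cy) {
            out->items[out->count++] = (Cmd){ CMD_ZERO, 0, 0, 0, 0 };
            out->items[out->count++] = (Cmd){ CMD_MOVE, 0, 0, c->x0, c->y0 };
        }
        out->items[out->count++] = *c;
        cx = c->x1; cy = c->y1; have_pos = 1;
    }
}

/* Final step: a short switch/case loop. It only counts here; the real
   thing would talk to the VIA in each case. */
static size_t execute(const CmdList *list, size_t *zeros)
{
    size_t drawn = 0;
    *zeros = 0;
    for (size_t i = 0; i < list->count; i++) {
        switch (list->items[i].type) {
        case CMD_ZERO: (*zeros)++; break;   /* would zero the beam */
        case CMD_MOVE: break;               /* would move with beam off */
        case CMD_DRAW: drawn++; break;      /* would draw with beam on */
        }
    }
    return drawn;
}
```

Because the preprocessed list is a plain array, keeping it around for a redraw, or copying part of it from the last round, comes for free in this layout.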

In principle a) is finished. (but only a „pass through“ preprocessing…)

Actually b + c is also finished – but in a heavy debugging phase.
The output as of now looks about ten times worse than the old routines. Only positive thing is, since these are completely new routines… you can switch back and forth on the fly via the command line :-).

I am not 100% sure the new routines will ever be as good as the old ones – it is very VERY fiddlesticky…
All timing (except game round Hz) is done using the Pi timers… So T1 as of now is completely free.
But I sort of have a hard time to keep everything stable…

On the other hand I am sometimes like a Terrier – … I won’t let go till I have it… (or give up)

——
21.01.2020 (later in the day)

Regarding b) and c) I give up and throw in the towel.
It’s the same thing as with the raster routines:


If I do the timing with PiTrex timers I have that uncertain „CLK+” while accessing the Vectrex VIA… and the result when drawing vectors is the same as when drawing raster: it leads to shaking in the image. Each line starts with „zero”, each string is printed in one „go”, and the further right we are, the more „CLK+” accumulate and the worse the shaking gets.
… and I can’t think of any way around it.

(… and btw the latch function of the PiTrex – while very clever and well thought through… sadly doesn’t help at all, since (for analog or whatever reasons) there must be a fixed delay of several cycles before writing the value… so doing it as soon as possible after the interrupt is also wrong)

I don’t know if it would be „feasible” – but the only way I can think of, out of the box…
If the PiTrex itself implemented a Vectrex „cycle counter”, I could ask the Pi how many Vectrex cycles have passed since the last access…

Feature for Pi V2?
Anyways… I will keep to the old routines… which is a pity – it seemed (if they had worked fine) that the new routines would have been 20-30% faster.


22.01.2020

So after hours of fiddling… I again have new drawing routines (using T1 again), which make use of the pipeline.
The pipeline as of now still does not do any sensible preprocessing…
At least something „stupid” I will add next, so that games are playable again.

My test object up until now was the PiTrex title screen with all the text.
The title screen is in fact about 30% faster – and probably a bit nicer. I still can think of some more options to optimize the drawing… but for now I am glad the routines are working.

Naturally I want the routines to be speedy – that means the user has to intervene and tell the PiTrex how it looks best. I still have to build such a calibration screen. As of now I use the command line interface.


With the available options I get very good results on 3 of my Vectrex machines:
a) cranky
b) normal
c) noBuzz
But each of them needs somewhat different option values. This should be a one-time thing though (at least in the final version).

At the moment the important configuration items are:

  • setMaxStrength  -> set maximum strength for optimal scales
    This one is only necessary for „cranky” – and this one is the one I still want to optimize. If there are apparently „stray” vectors (or vectors that are vertically longer than they are supposed to be) in a vector cluster, then this might be fiddled with. The lower the value, the slower the drawing. The lower the value, the better the cranky output though. A good Vectrex can output with a speed of 127! (my noBuzz e.g.)
  • setT1OffDelay -> after a draw – how many cycles till light is switched off
    This I will also optimize further later on; it appears that the higher the scale, the longer the delay must be. I must find a way to calculate the higher values from one single given value. If this is set too high, the vector lists have bright spots in between different vectors; if too low, then there are gaps between vectors.
  • setScaleStrength -> set correction value to scale/strength conversion
    This one was „unexpected” – I was surprised there is a difference between Vectrex machines. Mine appear to use values between 2 and 4. If a vector list whose parts should form a „straight” line appears to have vertical offsets (down: value too low, up: value too high)… then this might be an option to change. (my noBuzz uses 4, my cranky one 2, and my normal one actually 3).

ATM I am sort of happy with the results. I sort of fear to try the new routines on an actual emulator. Who knows what faults will then appear :-(.


22.01.2020 (5 minutes later)

A short test with tailgunner.
It runs fairly well.
It shakes like hell – but that was to be expected, since I didn’t preprocess correctly.

I just inserted a dummy „force ZEROING” every 30 vectors… which I know is stupid… but that could be done in 1 minute.
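
Such a dummy pass could look roughly like the sketch below. ZERO_MARK and insert_forced_zeroing are hypothetical names, and the vector list is reduced to plain non-negative ids so the example stays small:

```c
#include <assert.h>
#include <stddef.h>

#define ZERO_MARK (-1)   /* hypothetical marker: "zero the beam here" */

/* Copies vector ids from in[] to out[] and inserts ZERO_MARK before the
   first vector and again before every further block of `interval` vectors.
   Returns the number of entries written, or 0 if out[] is too small. */
static size_t insert_forced_zeroing(const int *in, size_t n,
                                    int *out, size_t out_cap,
                                    size_t interval)
{
    size_t w = 0;
    assert(interval > 0);
    for (size_t i = 0; i < n; i++) {
        if (i % interval == 0) {
            if (w >= out_cap) return 0;
            out[w++] = ZERO_MARK;
        }
        if (w >= out_cap) return 0;
        out[w++] = in[i];
    }
    return w;
}
```

With an interval of 30, a list of 65 vectors gets three zeroings. It is "stupid" because the zeroing lands wherever the count says, not where the picture would need it.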
Tailgunner also appears to be running about 30% faster. So running two frames is no problem.

Having 3 ships plus shield (the maximum number of display items) still leaves about 25% idle time, even when running at 76Hz.


22.01.2020 (a few hours later)
More thoughts on “why pure PiTrex timing might not work…”

I am very much not a hardware guy. All the Vectrex stuff I do is via experiments and observation.
So in my explanations I may be just plainly wrong, because I interpret observations falsely.

Fact is I could not shake off the shaking 🙂

As “timer” routines I lately use almost exclusively the cycle counter of the ARM, which is accurate to more or less 1 nanosecond.

With all waits I DID try to synchronize with the vectrex as:

  • I always wait “vectrex” cycles, the delay is implemented as:
      delayVectrexCycles (X) -> delayARMCycles(X*666);
  • I gave the “vsync()” a whole new meaning
    – as in “vectrex Sync”,   in vsync I waited (tried different “targets”) for e.g. the RDY signal to become low/hi….
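
A minimal sketch of the delay conversion from the first point. The factor 666 is the one from above; it fits a Vectrex CPU clock of 1.5 MHz (about 666.7 ns per cycle) if the ARM counter ticks roughly once per nanosecond, which is an assumption here. The counter is passed in as a callback (CycleReader, hypothetical) so that the example runs without Pi hardware; sim_counter just stands in for the real ARM cycle counter:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 666 ARM counter ticks per Vectrex cycle (factor from the post). */
#define ARM_CYCLES_PER_VECTREX_CYCLE 666u

static uint32_t vectrex_to_arm_cycles(uint32_t vectrex_cycles)
{
    return vectrex_cycles * ARM_CYCLES_PER_VECTREX_CYCLE;
}

/* Hypothetical hook: returns the current free-running cycle count. */
typedef uint32_t (*CycleReader)(void);

/* Busy-waits until at least the requested number of Vectrex cycles has
   passed. Unsigned subtraction keeps this correct across counter wrap. */
static void delay_vectrex_cycles(uint32_t vectrex_cycles, CycleReader now)
{
    uint32_t start = now();
    uint32_t wait = vectrex_to_arm_cycles(vectrex_cycles);
    while ((uint32_t)(now() - start) < wait) {
        /* spin */
    }
}

/* Simulated counter for trying this off-target: +100 ticks per read. */
static uint32_t sim_ticks;
static uint32_t sim_counter(void)
{
    sim_ticks += 100u;
    return sim_ticks;
}
```

Note that the factor is rounded down (666 instead of 666.67), so even a perfect ARM-side wait drifts slightly against the real Vectrex clock.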

Why are these small delays (less than a Vectrex cycle) causing “shakes”?

Well, a typical character consists of, say, 5 lines. A “word” has, say, 8 letters. That makes 40 opportunities to collect (several – see below) “extra” fractions. Per string.
If these 40 opportunities result in a “Vectrex cycle delay” of 2-3 Vectrex cycles it WILL shake!
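
To put a number on it, here is the arithmetic as a small sketch. It assumes just one extra fraction per line (in reality there are several) and the same 666 ARM counter ticks per Vectrex cycle as above; the function names are made up for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 666 ARM counter ticks per Vectrex cycle. */
#define ARM_CYCLES_PER_VECTREX_CYCLE 666u

/* Number of lines in one string, i.e. chances to pick up extra delay. */
static uint32_t opportunities_per_string(uint32_t lines_per_char,
                                         uint32_t chars)
{
    return lines_per_char * chars;
}

/* Average extra delay per line (in ARM ticks, rounded down) that adds up
   to the given total drift in Vectrex cycles over the whole string. */
static uint32_t avg_extra_arm_cycles(uint32_t total_vectrex_cycles,
                                     uint32_t opportunities)
{
    assert(opportunities > 0);
    return total_vectrex_cycles * ARM_CYCLES_PER_VECTREX_CYCLE
           / opportunities;
}
```

For 5 lines times 8 letters this gives 40 opportunities, and a drift of 2 to 3 Vectrex cycles needs only about 33 to 49 ARM ticks per line on average, i.e. roughly 5 to 7.5% of a single Vectrex cycle each.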

Thing is – due to the fact that I want the drawing to be as efficient as possible – I also draw all vectors as “fast” as possible. And drawing fast (fast as in beam speed) naturally means (small) timing issues have a greater effect (than they would have when drawing slow).

I tried using the PiTrex inbuilt latch function to be as exact as possible. But that does not work well… because…

The latching does not help at all!

More on that…
(observation – I have no rational explanation)

Drawing a Vector with T1 goes like this:

  • put y,x values to the vectrex
  • Set T1, start T1 (ramping starts with the timer)
  • Switch light on
  • … T1 counts down to zero (ramping stops when timer reaches zero)
  • Switch light off

This sounds easy and nice and straight forward.
The REALITY is:

  • put y value to VIA
    wait a bit YY (length: depending on VIA – or long enough to fit all VIA)
  • put x value to VIA
    wait a bit XX (length: depending on VIA – or long enough to fit all VIA)
  • Set T1 and start the timer
    … after TT cycles ramping starts
  • switch the beam on
    … after BB cycles the beam is lit
  • Wait for timer to expire, timer reaches Zero
    … after RR cycles the ramping stops
  • Switch the beam off
    … after OO cycles the light is actually off

Values of:
YY, XX, TT, BB, RR, OO might slightly differ for each vectrex.

Classic values of my vectri are:
YY between 4-10
XX between 0-2
BB between 0-4
RR between 15-20
OO: don’t know
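
As a toy model of the end of a line (my observations turned into code, not a description of the real hardware): after T1 expires, ramping stops RR cycles later, and the beam goes dark after the software's off delay plus OO. AnalogDelays, LineEnd and classify_line_end are hypothetical names; off_delay plays the role of the setT1OffDelay value. TT and OO are not measured above, so any values used for them are placeholders:

```c
#include <assert.h>

/* Delays in Vectrex cycles, named as in the list above. */
typedef struct {
    int yy, xx, tt, bb, rr, oo;
} AnalogDelays;

typedef enum { END_CLEAN, END_GAP, END_BRIGHT_DOT } LineEnd;

/* off_delay: cycles the software waits after T1 expiry before it writes
   "beam off". Both times below are measured from the T1 expiry. */
static LineEnd classify_line_end(const AnalogDelays *d, int off_delay)
{
    int ramp_stops = d->rr;              /* beam stops moving here */
    int beam_dark  = off_delay + d->oo;  /* light is actually off here */

    if (beam_dark < ramp_stops) return END_GAP;        /* dark too early */
    if (beam_dark > ramp_stops) return END_BRIGHT_DOT; /* lit while parked */
    return END_CLEAN;
}
```

With RR = 18 (and OO set to a placeholder 0), switching off 1 or 2 cycles after the timer expires gives a gap, which matches what I see with the latch; waiting far too long gives the bright dot.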

The timing of the BLANK (beam on/off) also seems to depend on whether
you use the CNTL register or the SHIFT register (which is utterly stupid – I know!).
In my current routines I use the CNTL register to manually control the BEAM. If you use the SHIFT register the difference between “RR” and “OO” (timing wise) is less significant – but still in the 6-10 cycle range.

You can “forget” all that and write functions that always draw “right”.
But then you have to live with either lines that have “bright dots” at the end – or
gaps “in between” lines.

Using the latch register after a timer IRQ “ensures” in the above example, that “RR” is only 1 or two cycles -> all vectors will have gaps at the end.

Again:
I cannot explain why that is so. But all my observations seem to indicate that it IS.
(see also: http://vide.malban.de/20th-of-february-something-always-comes-up)

See also various BIOS routines (using the SHIFT register, for blank control) one example “draw line”.
(look for !!!!)

Draw_Line_d     STA     <VIA_port_a     ;Send Y to A/D
                CLR     <VIA_port_b     ;Enable mux
                LEAX    2,X             ;Point to next coordinate pair
!!!!            NOP                     ;Wait a moment
                INC     <VIA_port_b     ;Disable mux
                STB     <VIA_port_a     ;Send X to A/D
                LDD     #$FF00          ;Shift reg=$FF (solid line), T1H=0
LF3ED:          STA     <VIA_shift_reg  ;Put pattern in shift register
                STB     <VIA_t1_cnt_hi  ;Set T1H (scale factor?)
                LDD     #$0040          ;B-reg = T1 interrupt bit
LF3F4:          BITB    <VIA_int_flags  ;Wait for T1 to time out
                BEQ     LF3F4
!!!!            NOP                     ;Wait a moment more
                STA     <VIA_shift_reg  ;Clear shift register (blank output)
                LDA     $C823           ;Decrement line count
                DECA
                BPL     Draw_VL_a       ;Go back for more points
                JMP     Check0Ref       ;Reset zero reference if necessary

(It is also NO coincidence that between
“CLR     <VIA_port_b”
and
“INC     <VIA_port_b”
 there are two additional instructions that do not strictly belong there… The two accesses to VIA PORT B, which switch the “target” of the MUX, are about 8-10 cycles “away” from each other (I have no cycle chart here atm); this gives the Y-integrator input time to “settle”.)

In conclusion.
I may put it in “Vide” terms…
There is a difference between the VIA lines/states and the states of the analog hardware.

In Vide e.g. bit 7 (the eighth bit) of VIA_Port_B is called “via_pb7”.
The state of the vector hardware that controls ramping is called “RAMP”.
Often those two “overlap” – but when “switching” the state they do not… then there is a time lapse between the two…
(and also for other pieces of the analog hardware)

VIA_STATE != VECTOR_ENGINE_STATE
