Long time – no blog. People watching this site “closely” or inspecting youtube now and then will have noticed that I was not vectrex lazy the last couple of weeks. But there was nothing really all that specific going on.
Theoretically my “sale” is still ongoing. I also updated the sales page, but I have not put it online yet – there is always something that comes in between.
I have had my ups and downs with this thing. But I still rather like it, especially the baremetal side of it. Officially I have stepped out of the project – but unofficially I guess (at least for the time being) I am one of the most active people around it :-). While Kevin and Graham like to focus on the Raspbian side – I have committed myself to the baremetal side. My last “official” stuff I “released” at: https://groups.io/g/pitrex-dev/topics
Even to the extent that I no longer try to maintain “my” code in such a way that it is compatible with the Raspbian side. I really hate these #ifdef #else … statements lingering in the code. You never know what is defined and what is not, and you can’t “fluently” read the code. But it is so “crossplatform” and so well “maintainable”…
This is not out of ill intent… but I really like it when I can decipher code I am writing and reading.
Anyway… why this blog today? Actually – it is not per se for you my readers, it is for me. I must again document some stuff I am currently doing, lest I forget it in the future. I have to put down my thoughts and my resources in a central location, so I can (if I have to) find them again. So – here we go.
I have written about this before. I don’t know if this is a new thing or not. But “interleaved emulation” is something I thought of alone in my dark hours. If other people have done so before… all the better :-).
The thing is (I have also written this somewhere before)… the PiTrex setup is at a slight disadvantage compared to other so-called “smart cards”, like VecFever, VeXtreme, and the new Jason card with no name using a Teensy.
They can “multitask” – sort of. They can do powerful computations on the “smart” processor, and let the 6809 do all the work of drawing on the Vectrex. We (as in PiTrex programmers, ok – not a large we… ) cannot do that. The PiTrex “halts” the 6809 and must draw everything on its own.
Also the PiZero has only one core, so there is no parallelism here either.
Until now everything we did was sequential, usually something like:
- emulate something
- output everything
- go to top and start again
Depending on the thing you emulate (or “process” as native application) and have to draw, the time distribution is something like:
10% emulation / application
90% vector output
The major limiting factor is usually the output to the vectrex. Let us be optimistic and say the vectrex could at most output 500 vectors at 50Hz – meaning during the 1/50th of a second that we have in our round… we do nothing else but pump vectors as fast as we can to the vectrex.
The vector output routines in the PiTrex interface are (IMHO) pretty fast, and I doubt they can be made much faster. The last bit of speed one can squeeze out of them is individual for each vectrex and depends on whether it is “cranky”, needs calibration, zeroes a bit faster, needs less time for the integrators to settle etc etc etc.
For all of this there are by now configuration items, so you CAN set up your PiTrex output to be as fast as possible for your individual vectrex.
Talking about the above mentioned 500… that individual squeezing will probably give/take you +- 10 vectors.
There are no “big jumps” being made anymore. All there is left is to squeeze the last possible bit of optimization out of the system. Here is where the “interleaved emulation” steps in.
The thing is, when drawing “500 vectors” – there is much time within the “drawing time” that is unused, as in we are waiting till the beam reaches a certain location. With our standardized 500 vectors, let us assume there will be at least 10 ZEROings and thus also at least 10 positionings.
Positioning tends to be done with a rather large scale factor. Zeroing usually also takes some cycles to finish. And most often there are a couple of vectors which are not just tiny – but also take a bit of time to be drawn.
Let us postulate:
- ten moves with scale factor $80 (128 cycles each): 1280
- 10 zeros, using 30 cycles each: 300
- 20 non “small” vectors, using a scale of $50 (80 cycles each): 1600
- sum: 3180
This is already more than 10% of the cycles that we have for one “round” (1.5MHz / 50Hz = 30000 vectrex cycles per round). The obvious thought now – how can we use this idle time to our advantage?
Actually I want to let the program/emulator “run” in that time. I want to interleave the emulation with the output. If we can successfully pull that off, then we have “won” the last possible 10%, and we can output vectors 100% of the (vectrex) time. If we can pull that off, we can be certain that other devices will not be “faster” than we are (provided we really do have good output routines 🙂 ).
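To double check the numbers above, here is a small sketch of the arithmetic. It assumes (as the list does) that a scale factor of N costs roughly N vectrex cycles; the function names are made up for this example.

```c
#include <stdint.h>

#define VECTREX_HZ        1500000u                     /* 1.5 MHz */
#define REFRESH_HZ        50u
#define CYCLES_PER_ROUND  (VECTREX_HZ / REFRESH_HZ)    /* 30000 cycles */

/* Idle cycles in one round: each kind of wait is count * cycles per item. */
static uint32_t idle_cycles(uint32_t moves, uint32_t move_scale,
                            uint32_t zeros, uint32_t zero_cycles,
                            uint32_t long_vectors, uint32_t vector_scale)
{
    return moves * move_scale
         + zeros * zero_cycles
         + long_vectors * vector_scale;
}

/* Idle share of one round in tenths of a percent (106 means 10.6%). */
static uint32_t idle_share_permille(uint32_t idle)
{
    return idle * 1000u / CYCLES_PER_ROUND;
}
```

With the postulated values, idle_cycles(10, 0x80, 10, 30, 20, 0x50) gives 3180, which is 10.6% of the 30000 cycles of one round.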
Actually – instead of the term “interleaved emulation” – you might as well call it cooperative multitasking or cooperative “threads”.
First try – calling by “contract” – “hey, you… function… do not use more than 4 vectrex cycles!!!”
My first thought to implement this was to divide the “emulation” into single emulation calls. Each of these calls must ensure not to take longer than XXX time. I did a few measurements – and even with the simplest of emulation it was rather unpredictable.
- about 90% of the calls used less than 1/2 of a vectrex cycle
- 9.9% used between 1 and 4 vectrex cycles
- but there were also calls every now and then that took 40-60 vectrex cycles
This major unreliability made it very difficult to implement a usable system. After a couple of hours, I gave that up.
Second try – interrupts
I wanted to avoid interrupts – but after the not so successful first try, I thought I should give them a try. Interrupts, as the name suggests, “interrupt” the current program flow, do something completely different – and then return to the location they interrupted – if all goes well, the interrupted program does not notice the intervention…
The sort of thing you must use for our “interleaved emulation” is a timer interrupt.
a) fantastically exact
The ARM offers a cycle counter, which is increased with every processor cycle.
(see: www.raspberrypi.org/forums and ARM1176JZF-S Technical Reference Manual page 170)
This is fantastic in exactness, since one cycle equals one nanosecond! But you cannot use this as an interrupt source.
b) unreliable – ARM peripherals timer (based on SP804)
Sounds “ok” from what one reads, but experimenting with it yields rather unreliable results. Also this one depends on the system timer. It can generate interrupts, but I have not been able to set “small” values and get reliable enough results.
c) System Timer
This is a 1MHz timer, with four different compare channels, which can be used for interrupts.
The setup of the interrupt and the system timer is surprisingly sparsely documented. The “best” documentation is found by reading the code at: blinker07.c and matyukevich: timer.html.
(This is 1000 times less exact than a)!)
Vectrex runs at 1.5MHz
PiZero runs at 1000MHz
One vectrex cycle is about 666 nanoseconds
One PiZero cycle is 1 nanosecond
The interrupt “overhead” time of the system timer (setting up, saving states – restoring states, the processor actually recognizing an interrupt…) seems to be between 200-800 nanoseconds.
So for a “sensible” interrupt timing, the idle time of the vectrex should be at least 3-4 vectrex cycles (4*666 = 2664 nanoseconds); everything less results in immensely wasted time for little result.
I use the “system timer” of the ARM; this one runs at 1MHz and is able to fire interrupts. To set up an interrupt, you get the current system time, add to it the time you want to be “called” at, save it to the compare register… and wait (ok, you also have to enable it… at various places).
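To make the “3-4 vectrex cycles” rule of thumb a bit more tangible, here is a tiny sketch that subtracts the interrupt overhead from a given wait. The function name is made up, and the overhead values are just the measured range from above.

```c
#include <stdint.h>

#define VECTREX_CYCLE_NS 666u   /* 1.5 MHz, rounded down as in the text */

/* Nanoseconds left for normal space once the interrupt overhead is paid. */
static uint32_t usable_ns(uint32_t wait_cycles, uint32_t overhead_ns)
{
    uint32_t total = wait_cycles * VECTREX_CYCLE_NS;
    return total > overhead_ns ? total - overhead_ns : 0u;
}
```

With the worst case overhead of 800 nanoseconds, a wait of one vectrex cycle leaves nothing at all, while a wait of four cycles leaves 1864 nanoseconds.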
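A minimal sketch of the “get time, add, save to compare register” step. This is not the actual PiTrex code. The struct mirrors the register order of the BCM2835 system timer (CS, CLO, CHI, C0-C3), which on the PiZero should sit at 0x20003000; the function name is invented, and the enabling “at various places” (interrupt controller, CPU interrupt flag) is left out.

```c
#include <stdint.h>

/* Register order of the BCM2835 system timer block. */
typedef struct {
    volatile uint32_t cs;    /* control/status: match flags, write 1 to clear */
    volatile uint32_t clo;   /* free running counter, low 32 bits, 1 MHz */
    volatile uint32_t chi;   /* free running counter, high 32 bits */
    volatile uint32_t c[4];  /* compare registers of channels 0..3 */
} SysTimerRegs;

/* Arm compare channel `ch` to match `delay_us` microseconds from now. */
static void systimer_arm(SysTimerRegs *t, unsigned ch, uint32_t delay_us)
{
    t->cs = 1u << ch;              /* acknowledge an old match on this channel */
    t->c[ch] = t->clo + delay_us;  /* unsigned add wraps just like the counter */
}
```

Passing the register block as a pointer makes it possible to try the logic on a desktop with a plain struct; on the real hardware the pointer would be the fixed peripheral address. As far as I know, channels 0 and 2 are used by the GPU, so channel 1 or 3 is the one to pick.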
Now… interrupting… WHAT do we interrupt, and with what?
In principle we want to run two programs at the same time:
- the emulator
- a vectrex display loop
One of them will have to run as the main program that gets interrupted (we will call that “normal space”), the other one will run as the interrupt program (and we will call that “hyper space”).
As mentioned tons of times before… with vectrex it is bad (VERY BAD!) to interrupt it during the time it accesses the VIA. This must be avoided, otherwise there will be garbage on the screen. So it follows that the vectrex display loop will be in hyper space (which is the interrupter) while the emulator will reside in normal space (the interrupted).
There are a couple of different things we have to keep in the back of our minds:
- global and static variables are shared
- the stacks of normal and hyper space are different
- the hyper space program cannot rely on the “stack” while giving control to “normal space”; the stack will be changed by the next interrupt, since the “normal space” registers will be saved on that stack
- “C” is built around function calls and returns – we have to circumvent that while in hyper space, because “logically” we also interrupt the hyper space while executing normal space programs (we actually want to use a busy wait loop to execute another program, and then continue where we gave control over to the “normal space”)
(this is what in other multitasking environments a “scheduler” would sort out for us)
- it should still be “easy” to use…
Some code, this sets up the interrupt handling:
This is the actual interrupt handler:
Where “handleVectrexOutput()”, is the function that does handle (as the name suggests) ALL vectrex output, as in:
- get joystick/button information
- do sound output
- draw all vectors
The “normal” functions for these tasks will not function anymore, when the variable “isIRQMode” is 1 – they are just ignored!
(for the time being this provides compatibility)
The handleVectrexOutput() returns to the interrupt handler when a busy wait loop is pending inside “it”. The interrupt handler can then decide whether the wait is “long enough” to return to normal space, or whether it should just “return” straight into handleVectrexOutput() again (in an endless loop).
The busy loop(s) inside the handleVectrexOutput() are ARM cycle counter based and independent of the timer. They wait till a certain number of cycles have passed, but 1000 times more exact than the timer. As long as the timer interrupt hands control back before the COMPLETE wait time is up… the cycle exact timing in the handleVectrexOutput() will be 100% correct!
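A sketch of that decision, as I would write it down for a desktop test. Everything here is invented for illustration: the threshold of 4 vectrex cycles is an assumption, and the output routine is replaced by a function pointer that simply reports the length of the next pending wait.

```c
#include <stdint.h>

#define MIN_YIELD_CYCLES 4u   /* assumption: shortest wait worth a switch */

/* Hypothetical stand-in for the output routine: runs until the next
 * busy wait and returns its length in vectrex cycles. */
typedef uint32_t (*output_step)(void);

/* Keeps calling the output routine until a wait is long enough, then
 * returns the delay (in 1 MHz timer ticks) after which the interrupt
 * should fire again. */
static uint32_t run_hyper_space(output_step step)
{
    for (;;) {
        uint32_t wait = step();
        if (wait >= MIN_YIELD_CYCLES)
            return wait * 2u / 3u;  /* 1.5 MHz cycles to 1 MHz ticks, rounded down */
        /* too short: go straight back into the output routine */
    }
}

/* Scripted waits for trying the logic without a vectrex attached. */
static const uint32_t demo_waits[] = {1, 2, 30, 3, 128};
static unsigned demo_pos;
static uint32_t demo_step(void) { return demo_waits[demo_pos++ % 5u]; }
```

Rounding the tick count down is on purpose: the interrupt should come back a little early, so the output routine can finish the wait itself.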
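The reason this works is that the wait is measured against a start value and not counted down. A minimal sketch, with the cycle counter hidden behind a function pointer (a made up interface, so the logic can be tried with a fake counter on a desktop):

```c
#include <stdint.h>

/* Hypothetical interface: returns the current ARM cycle counter value. */
typedef uint32_t (*cycle_reader)(void);

/* Spin until `cycles` ARM cycles have passed since `start`. The unsigned
 * subtraction stays correct when the 32 bit counter wraps around. */
static void wait_until_elapsed(cycle_reader now, uint32_t start, uint32_t cycles)
{
    while ((uint32_t)(now() - start) < cycles) {
        /* busy wait */
    }
}

/* Fake counter for desktop tests: advances 7 "cycles" per read. */
static uint32_t fake_cycles;
static uint32_t read_fake_cycles(void) { return fake_cycles += 7u; }
```

It does not matter what happened between taking `start` and entering the loop. If normal space ran in between and control came back in time, the loop simply finishes the remainder.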
A part of the handleVectrexOutput():
The first entry assumes a WaitRecal is pending. Since reading T2_lo clears the interrupt flag – and we might need that – it is checked first. If the interrupt flag is set, no waiting is needed, and we continue directly to our waitRecal (written down in actual code).
If the vectrex timer T2 has not run out yet, we calculate the vectrex cycles it still has left (variable t2).
(and do a safety check again, that it has not expired in the last two instruction lines)
Then we “wait” for t2 vectrex cycles – this is what the “MAIN_TAKEOVER(t2)” does.
It is a bit “tricky”; what it boils down to is:
- it generates a unique label
- the void *ptr is set to that label’s address
- it returns to the interrupt handler, which decides whether to go to normal space or not
- if the handleVectrexOutput() is called again later… its first lines check whether ptr is zero; if it is not, the program jumps to that location… and continues executing…
- reset the ptr to zero again
- and continue with the waitRecal, which now should be due soon
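The steps above can be sketched with the “labels as values” extension of GCC/Clang (&&label and goto *ptr), which is not standard C. This is NOT the real MAIN_TAKEOVER macro, just a minimal version of the same idea; all names are invented and the actual drawing is replaced by a counter.

```c
#include <stdint.h>

#define TAKEOVER_CAT2(a, b) a##b
#define TAKEOVER_CAT(a, b)  TAKEOVER_CAT2(a, b)

/* Store the address of a unique label, report the wait and return.
 * The next call jumps to that label and carries on from there. */
#define TAKEOVER(cycles)                                  \
    do {                                                  \
        resume_at = &&TAKEOVER_CAT(resume_, __LINE__);    \
        return (cycles);                                  \
        TAKEOVER_CAT(resume_, __LINE__): ;                \
        resume_at = 0;                                    \
    } while (0)

static void *resume_at;     /* 0 means: start at the top */
static int   frames_drawn;  /* state must be static, locals do not survive the return */

/* Returns the vectrex cycles to wait, or 0 when the frame is done.
 * noinline: keep a single copy, so the stored label address stays valid. */
__attribute__((noinline))
static uint32_t handle_output_sketch(uint32_t t2_remaining)
{
    if (resume_at)
        goto *resume_at;         /* continue where we gave control away */

    if (t2_remaining > 0)
        TAKEOVER(t2_remaining);  /* hand the wait to the interrupt handler */

    frames_drawn++;              /* stands in for waitRecal and the drawing */
    return 0;
}
```

A first call with handle_output_sketch(12) returns 12 and leaves resume_at set; the next call jumps to the label, clears resume_at and “draws” the frame. Since every use of the macro sits on its own line, __LINE__ gives each use its own label.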
All of the above is implemented and working.
The principle of “interleaved emulation” is working… but this is just the first step. I have only implemented the “waitRecal” as a tryout… next I must implement the complete pipeline execution with all its different cases and waits. But I am pretty confident now that this is feasible.
Thanks for “listening”.