4th of January – Spidex & optimization

I was asked for some optimization hints for a “C” project. I made a video of some of my findings. See link below.

Some additional points:

  • “dp_” only really works for VIA IO ports (and BIOS RAM locations), since these are defined by Peers “C” system
  • the player-shot is a “Star” which consists of 12 vectors. One such shot takes about 1300 cycles to draw – and it is really fast and not discernable. If it were a “rotating triangle”, I don’t think the player would notice – but would probably save at least 2000 cycles for 3 displayed shots
  • other characters also have “many” vectors. the bee e.g. has nearly 30 vectors, which on a real machine are also barely discernable. Half the size would probably suffice.
  • for the small sprites – try to use the vectorlists which do not need to “sync”. This can be done by either setting the sync max in vecci, building shorter lists – or just DELETE the sync entries. Each sync costs about 200 cycles. Good taste and manual optimizations come in handy here :-).
  • Look at the vide profiler:
  • here one can see that these tiny little sprites (player + enemies) 7 at the moment use 10872 cycles to be drawn (first in list) – which is MORE than the complete web. These sprites have just to many vectors.
  • It seems the “hit_object” is not the huge cycle waster – 14 tests only need 2187 cylces.
  • If strengths of vectors are really big and scale is really low (<6?) -> the resulting list will look “broken” – this you have to test on a real vectrex. There can be function written especially for those lists – but I have at the moment (I think) not made them public in/with “C”
Tagged on: ,

19 thoughts on “4th of January – Spidex & optimization

  1. Peer

    Hey, great to see that our C development environment is actually being used by someone outside 🙂 Would love to learn more about the project. Let me know if I can be of any additional help .Cheers, Peer

  2. Alexander

    Hello,
    I actually followed your previous advice and focused on optimizing the collision detection, now there is an experimental branch that implements the “Spatial Partition” Optimization Pattern: (using a 8×8 cell grid) http://gameprogrammingpatterns.com/spatial-partition.html

    Unfortunately this comes with a memory usage penalty, but if I am able to optimize the vectorlists further as you describe above, there could be hopefully be more objects on the screen at the same time (if memory allows, currently the compilation says: iRam size: 0, uRam size: 488, PF: 0) When I reach above 600 the game won’t start in the emulator anymore.

    Opportunity:
    I am also thinking about doing a cell based drawing algorithm in order to avoid having to move to a random location on the screen before drawing each object. Instead all the objects that are within the same cell can be drawn at the same time, with minimal movement between each object. Then move on to the next cell if it has any objects in it. Now most of the cells are actually empty since there are so few objects on the screen at the same time. Since I still have to walk trough the whole cell grid when checking for collisions, the drawing can be done on the run anyway.

    1. Malban Post author

      If I can I’ll gladly try to help out further…

      (BTW the comments are moderated – so they are displayed only if I approve)

      1. Malban

        The RAM problem is strange – you should have about 1024-128-Stack as RAM and unless you have done quite recursive functions that should be more than 600 – if you want you can send me a branch with that – and I can explore further ( Sunday at earliest though)

  3. Peer

    Regarding the “RAM problem”: Where do those numbers (iRam / uRam) come from? If they are derived from my original C compilation script, then those numbers can be inaccurate, if the RAM variables are not compile-time initialized (this mechanism for deriving the RAM size was the best I could come up with using gcc6809). So you might in fact be using more RAM than is shown.

    Also, C syntax can be quite tricky when using the “const” qualifier. The compiler will allocate data (i.e. variables) qualified as constant to the ROM section. However, if you are using things like “an array of constant pointers to constant arrays of pointers to constant integers” (and I am using such things a lot in my own C code), then the correct way to write this in C is a bit ugly and prone to errors. I do not know if you are using such constructs, and I also do not know anything about your C coding abilities, but this might be another reason why things unintentionally go to RAM instead of ROM (without being correctly indicated by the compile script).

    Let me know if you want me to take a look at this and I’d be happy to help.

    Cheers,
    Peer

    1. Malban Post author

      The RAM values are derived from the MAP file that is produced by the linker. That *should* be accurate – how else should it build a correct binary?

      1. Peer

        Ok, then the values should indeed be accurate. My original script did not inspect the MAP file (as I did not find a simple way to do this in a windows batch script) but used the plain file size of the *_ram.bin file instead (and this is inaccurate if non-initialized non-constant global variables are used).

    2. Alexander

      I do use arrays of pointers to other const arrays. However I learned that declaring them as “const signed char* const data[]” (note the second const) reduced the iRAM usage to zero. Now I only have a non-zero value for uRAM, which I assume is the uninitialised RAM, e.g. the game data structures. However, the last value PF=? printed during compilation, I have no idea what that is?

      I do assume that my program uses some stack memory also. I use a kind of manual object oriented approach in C where the object initialization functions are called in several layers, sometimes with a lot of arguments passed between them.

      I can share the sources if you contact me via e-mail (leputa6 (at) hotmail (dot) com)

      1. Malban Post author

        You are correct.
        PF – is a Vide internal thing and describes the number of “Peephole corrections” Vide does for your program. 0 Means gcc works 100% correct! (strange!).

        However…

        Hi just tried it, and I changed the header you described to:

        #ifndef _GRID_H
        #define _GRID_H

        #define GRID_NUM_CELLS 16
        #define GRID_CELL_SIZE 32

        #define GRID_NUM_CELLS_SHIFT 3
        #define GRID_CELL_SIZE_SHIFT 5

        Than the RAM used is 872 – and with that value I debugged, that the initialization of the GRID actually overwrites the stack.
        The maximum available RAM is 896 (1024-128) – and that does not include the stack, so at this point in that function the stack usage is actually 26 bytes and as such overwritten -> CRASH.

        Which values did you use for „688“?

        1. Alexander

          You have to change the other defines accordingly, e.g. GRID_CELL_SIZE to: 16, and also both the shift values to: 4 This gives a 16×16 grid with each cell containing 16×16 units.
          I currently use a 8×8 grid with each cell containing 32×32 units.

          If the shift values are not set correctly, there will probably be accessed outside the matrix.

          Maybe I don’t remember the memory figure correctly?

        2. Alexander

          I tried it using the following values, and the game uses 872 bytes of uRAM and crashes:
          #define GRID_NUM_CELLS 16 // was 8
          #define GRID_CELL_SIZE 16 // was 32

          #define GRID_NUM_CELLS_SHIFT 4 //was 3
          #define GRID_CELL_SIZE_SHIFT 4 //was 5

          So the best option would be a 8×8 grid, with each cell covering 32×32 units. In total 64 cells to process.

          1. Malban Post author

            Whatever you need to do to keep RAM memory available for the stack.
            The grid 16*16 are word pointers so the size of that is 512byte – which is quite huge if you only have 872 bytes available over all.

            If you can reduce RAM usage of some other stuff, you might be able to pull it of though…

  4. Alexander

    Some thoughts about gcc optimization:

    Currently I am working on a grid based collision detection algorithm for the game. It uses a 2-dimensional array of cells. Each cell consist of 32×32 units, and the grid is 8×8 cells.

    I thought about optimizing the grid processing by using a 1-dimensional array instead. Also I tried to manually unroll the two level nested for loops which are processing the grid cells. However, it turned out that the initial approach, using a 2-dimensional array, and nested for loops consumed slightly less cycles. So gcc probably did a better job optimizing the original algorithm, which is also good because the original algorithm is much more straightforward…

  5. phoboz

    Just a short comment to the video:
    In general very educative, the main task is to really focus on the vectors being drawn in an efficient way.
    Meaning with large strength and a small scale factor, e.g. with higher voltage applied to the deflection coil, less time needs to be wasted by the CPU to wait for the electron beam to reach the target.

    Regarding the Tempest clone I started (tempex) I put that project on ice after I realized that there was already a quite good Tempest clone for Vectrex. It is called Tsunami, and I have not been able to complete all the levels of that game yet (just make sure you get the latest version) I probably played about 10 levels. I don’t know how many there are in total, but probably as much as you can squeeze in 32 kBytes? The only weakness is the sound, which is very minimal.

  6. phoboz

    The grid based collision detection mechanism was a dead end, since there are so few objects on the screen at the same time. It took more cycles to walk trough all the cells, rather that testing everything using brute force.

    The other option to not check for example all player shots every frame was not so good either for this game, since the shots are moving so fast, it was really recognizable, and a lot of enemies was missed that looked like they where supposed to be hit.

    However, I did optimize the player shot by drawing a rotating rectangle, this gave some effect. I also further tried to reduce the number of vectors in the enemy shapes. The largest one (the bug) is still over 20, but it doesn’t look good with less.

    There is still some room for optimization, and I will explore it when possible.
    Now the game runs between 28000, and 41000 cycles each update.
    I will try it on a real Vectrex to see how it feels quite soon.

    Currently I am focusing on the game-play, I have added some more enemy types and a new type of enemy movement. More things are happening in the 3:rd wave, and some new, randomly moving enemies has increased the difficulty a little bit. I have also learned to create shapes that use less vectors. I am using the method Malban describes in the video for all the shapes now.

    There are also a few things left to add to the game engine. For example the egg containing the bug should be possible to push of the net before it hatches, and the exploding enemies shall also kill other enemies within the explosion range. I hope I can achieve these 2 things without consuming to many additional cycles.

    1. Malban Post author

      Hi,
      great that you are continuing. Reduction of once finished sprites is always “difficult”. I know.
      When you have reached your “beta test” phase – I’ll gladly (if you allow) look again at your source. Perhaps I discover something.
      As long as those “high cycle” numbers are peaks only – they don’t matter too much.

      If they are more or less constant – we might look closer at them.
      But I totally agree with you to concentrate first on game play – and think about optimization later.

  7. phoboz

    I tried the game for the first time on a real Vectrex yesterday. I noticed that movements up and down on joystick #1 caused shots being accidentally fired in different directions. In the game code the fire direction are either read from joystick #2, or read from joystick #1 when button 4 is held. I only have one joystick connected, so I wold assume that there should be no readings from the second joystick?
    Does anyone know what might trigger these readings from the second joystick?
    Note that sideways movements does not cause shots being fired accidentally, only up and down movement does this.
    It also seems that the accidental fire direction does not seem to be in the same direction as joystick #1 is moved, e.g. shots are being fired to the sides when moving up and down.

    1. Malban Post author

      Yes, you are right there is such a bug with Vectrex. (which Vide as of yet does not emulate – sorry!)

      I have experienced the behaviour you mentioned. It was also recently mentioned on “Atari Age” (see Review from Vectrexer):

      Normally a single joystick connection does not allow for the controller to port 1 to be centered. This is the reason the Madtronix Vectrex controller port “Center Dongle” terminator was created for use in controller port 2 to improve the accuracy for single port analog game

      http://atariage.com/forums/topic/286847-minivex-newly-produced-controllers-for-vectrex-atari/

      “Simple” solution is – make an option screen where the player can chose the input method, and only get the analog values from port 1 (counting from 0 upwards 🙂 ), when player DECIDES to play with two joysticks…

Leave a Reply to Malban Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.