Vectrex Aklabeth (4)

A few words as an intro to my optimizations. As long as the game runs well and is bug free, I do not really care much about how the source looks. I don’t care if it is half “C”, half macro, inlined or assembler.

In the end result, a binary for the Vectrex, you do NOT see a difference! Therefore, to reach my goal of getting the program to a state where it displays well and plays “well”, I use all the “tricks” and dirty coding I can think of.

From now on I will just postulate that the code we have running so far is “bug free”, and we do everything we can to get it running faster.

This WILL at some stages result in dirty coding, ugly sources, internal knowledge and most probably some changes to the UI!

World display

84316 cycles!

Worst things first. Or was it world things first? Anyways… with the first “basic” display routine, the world displays in the worst case in (lucky number 42) OVER 80000 cycles.

Our goal is to reduce that to under 30000 cycles.

And while you might think this is an impossible, stupid goal, I invite you to look at the second image, which shows the same place displayed in exactly 28499 cycles. So it IS possible!

Two things are most noticeable:

a) the complete map is a little bit smaller. Yes, I use a slightly reduced scale. This was not really on purpose, but I had to reduce the scale to fit the “changed” tiles to the correct positions. Anyway, the reduced “scale” accounts for exactly 308 cycles, so while it can be seen, it is not a “good” cycle reduction scheme.

b) only “one” string line is displayed instead of two
Ok, let’s go into this little thing first. This change/optimization consists of two parts. First to display only one line, second optimize the one line.

28499 cycles!

Getting rid of one line isn’t that hard: you just do not print it any more. The problem is, the player wants to have the information that you just omitted anyway, and he sometimes actually NEEDS the information!
What I did is to display “immediate” status information (like “go north”, “the path is blocked”…) in a “timed” manner. Immediate status information is displayed for one second and is then replaced with the usual status line.

If the player didn’t get the information, he can press button two to redisplay the immediate information.
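The timed display can be sketched roughly as follows. This is a host-side illustration, not the game's code: `show_message()`, `on_button2()`, `line_to_display()` and `FRAMES_PER_SECOND` are hypothetical names, and it assumes the display routine is called once per frame at about 50 frames per second. Only `messageBuffer` and `messageTime` are names that appear in the real routine further down.

```c
#include <string.h>

#define FRAMES_PER_SECOND 50   /* assumption: one display call per frame */
#define MESSAGE_MAX 40         /* assumption: longest message we keep */

static char messageBuffer[MESSAGE_MAX + 1];
static const char *statusLine = "FOOD 12345  HP 12345  GOLD 12345";
static unsigned char messageTime = 0;  /* frames the message stays visible */

/* Store a new immediate message and start its one-second timer. */
void show_message(const char *text)
{
    strncpy(messageBuffer, text, MESSAGE_MAX);
    messageBuffer[MESSAGE_MAX] = '\0';
    messageTime = FRAMES_PER_SECOND;
}

/* Button two: show the last message again for another second. */
void on_button2(void)
{
    messageTime = FRAMES_PER_SECOND;
}

/* Called once per frame: returns the single line that gets printed. */
const char *line_to_display(void)
{
    if (messageTime > 0) {
        messageTime--;
        return messageBuffer;
    }
    return statusLine;
}
```

The point of the design is that only one string is ever printed per frame; the message simply takes over the status line's slot while its timer runs.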

Yes, I know, this is not perfect… but as said, we have to use some dirty tricks sometimes. The old status information needs 12718 cycles to be displayed. This is an outrageous amount of cycles and just HAD to be reduced!

I did one other thing to the string – I optimized the display routine. Remember what it looked like?
The status line was printed with three different print statements, and the status information numbers were inserted “on the fly”:





void HWStatus(unsigned long Food, unsigned long HP, unsigned long Gold)
{
	Reset0Int();
	_fs("Food %", ltoa(Food));
	Print_Str_d(-128+5, -128, stringBuffer40);
	Reset0Int();
	_fs("HP %", ltoa(HP));
	Print_Str_d(-128+5, -20, stringBuffer40);
	Reset0Int();
	_fs("Gold %", ltoa(Gold));
	Print_Str_d(-128+5, 80, stringBuffer40);
	Print_Str_d(-128+5+6, -128, (void* const) messageBuffer);
	CLS;
}

We replaced this with the following:

char statusString[] = "FOOD 12345  HP 12345  GOLD 12345\x80";
void HWStatus()
{
	dp_VIA_t1_cnt_lo = MESSAGE_SCALE; // scale
	if (messageTime>0)
	{
		messageTime--;
		Print_Str_d(127, -128, (void* const) messageBuffer);
	}
	else
	{
		Print_Str_d(127, -128, statusString);
	}
}

a) We have a “complete” string line in RAM so we use only one “print”.

b) The status numbers are not inserted every single time the string is displayed. The string is updated only when the corresponding status CHANGES, with the following macros:

#define UPDATE_FOOD(f) ltoaP((f), statusString+5)
#define UPDATE_HP(h) ltoaP((h), statusString+15)
#define UPDATE_GOLD(g) ltoaP((g), statusString+27)
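`ltoaP` itself is not shown in this post. A minimal sketch of what such an in-place writer could look like, assuming it always writes exactly five right-aligned characters padded with spaces and never touches the rest of the string; `put_number5` is a hypothetical stand-in, not the real routine:

```c
/* Hypothetical stand-in for ltoaP: writes exactly 5 characters at dest.
 * Values above 99999 lose their leading digits in this sketch. */
void put_number5(unsigned long value, char *dest)
{
    int i;
    for (i = 4; i >= 0; i--) {
        if (value > 0 || i == 4) {
            dest[i] = (char)('0' + value % 10);  /* lowest digit first */
            value /= 10;
        } else {
            dest[i] = ' ';                       /* pad on the left */
        }
    }
}

/* Same layout as in the post: digits start at offsets 5, 15 and 27. */
char statusString[] = "FOOD 12345  HP 12345  GOLD 12345\x80";

#define UPDATE_FOOD(f) put_number5((f), statusString + 5)
#define UPDATE_HP(h)   put_number5((h), statusString + 15)
#define UPDATE_GOLD(g) put_number5((g), statusString + 27)
```

Because the field width is fixed, the labels and the `\x80` terminator stay where they are and the display routine can print the buffer as is.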

Overall, these string changes “save” more than 8600 cycles, which is dazzling!

While this is “cool” – in the grand scheme of things we are at about 75000 cycles – which is still more than sluggishly slow.

We have to insert a “game changer” otherwise this is not doable!

Game changer – “C”

The original world draw loop:

void WORLDDrawOnce(PLAYER *p,WORLDMAP *m,int ShowAsMap)
{
	int x1,y1;
	unsigned     int x,y;
	unsigned int Grid,w,h;
	RECT r;
	
	Grid = 7;                            // Number of cells in grid
	
	if (MAINSuper() == 0) Grid = 3;        // Standard Aklabeth
	if (ShowAsMap) Grid = m->MapSize+1;    // Displaying as a map ?
	
	w = (unsigned int) (((unsigned int)255) / (unsigned int)Grid);
	h = (unsigned int) (((unsigned int)255) / (unsigned int)Grid);    // Get grid sizes
	for (x = 0;x < Grid;x++)            // For all grid cells
	{
		for (y = 0;y < Grid;y++)
		{
			x1 = (int)p->World.x-((int)Grid)/2+(int)x;    // Which cell ?
			y1 = (int)p->World.y+((int)Grid)/2-(int)y;

			if (ShowAsMap)                // If map, not centred around us
				x1 = (int)x,y1 = (int)Grid-1-(int)y;
			unsigned char toDraw = WORLDRead(m,x1,y1);

			if (toDraw != WT_SPACE)
			{
				DRAWSetRect(&r,(int)(x*w),(int)(y*h),(int)(x*w+w-1),(int)(y*h+h-1));// Work out the drawing rect
				DRAWTile(&r,toDraw);

				if (x1 == (int)p->World.x && y1 == (int)p->World.y)    // Draw us if we're there
					DRAWTile(&r,(unsigned char)WT_PLAYER);
			}
			else
			{
				if (x1 == (int)p->World.x && y1 == (int)p->World.y)    // Draw us if we're there
				{
					DRAWSetRect(&r,(int)(x*w),(int)(y*h),(int)(x*w+w-1),(int)(y*h+h-1));// Work out the drawing rect
					DRAWTile(&r,(unsigned char)WT_PLAYER);
				}
			}
		}
	}
}

With:

unsigned char WORLDRead(WORLDMAP *w, int x, int y)
{
	if (x < 0 || y < 0) return WT_SPACE;// WT_MOUNTAIN;
	if ((unsigned int)x > w->MapSize) return WT_SPACE;
	if ((unsigned int)y > w->MapSize) return WT_SPACE;
	return w->Map[x][y];
}

To optimize, it is good to first look at the generated code.

e.g. the “simple”: “w->Map[x][y]” generates assembler code like:

	ldb	14,s	;, y1
	sex		;extendqihi2: R:b -> R:d	;,
	std	5,s	;,
	ldb	13,s	;, x1
	sex		;extendqihi2: R:b -> R:d	;,
	std	3,s	;,
	aslb	;
	rola	;
	addd	3,s; addhi3,3	; tmp79,
	std	1,s	; tmp79,
	aslb	;
	rola	;
	aslb	;
	rola	;
	aslb	;
	rola	;
	subd	1,s	;subhi: R:d -= 1,s	; tmp82,
	leay	,x	;,
	leax	d,x	; tmp83, tmp82, tmp2
	exg	d,x	;, tmp84
	addd	5,s; addhi3,3	;,
	exg	d,x	;, tmp84
	ldb	1,x	;, <variable>.Map
	stb	17,s	;, toDraw

Which is about 100 cycles (each loop!).
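Reading the shifts and adds in that listing, the compiler appears to multiply x1 by 21 (the row length of a map with indices 0 to 20) without a multiply instruction: it forms 3·x1, shifts that left three times to get 24·x1, and subtracts 3·x1 again. The same arithmetic written out in C, purely as an illustration (`row_offset_shift_add` and `map_index` are hypothetical names, not part of the game):

```c
/* x * 21 computed the way the listing does it: (3x << 3) - 3x */
int row_offset_shift_add(int x)
{
    int x3  = (x << 1) + x;   /* aslb/rola, then addd  -> 3x  */
    int x24 = x3 << 3;        /* three times aslb/rola -> 24x */
    return x24 - x3;          /* subd                  -> 21x */
}

/* Byte offset of Map[x][y] inside a 21 x 21 array of chars. */
int map_index(int x, int y)
{
    return row_offset_shift_add(x) + y;
}
```

All of this is redone for every single cell, which is why replacing the indexed access with a running pointer pays off.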

This can be vastly optimized. We create a char pointer outside the loop and do not reference the map with “coordinates”, but with a pointer.

After each inner “loop” the pointer is stepped by 1 (downwards in the function below, since y1 counts down), and after each outer loop it is moved on to the next map row (by 21+GRID, which also undoes the GRID single steps). The actual char can then be read with a simple “*toDraw_p” (you will see in the resulting function).
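A small host-side sketch of the idea, assuming a map stored as 21 x 21 bytes and a GRID of 3 (both values taken from the code in this post). `checksum_indexed` and `checksum_pointer` are hypothetical helpers that visit the same cells in the same order, once by coordinates and once by pointer; bounds handling is left out, so the window must lie inside the map.

```c
#define MAP_DIM 21   /* cells per map row, indices 0..20 */
#define GRID 3       /* window size, as in standard Aklabeth */

/* Window walk with a full index calculation on every access. */
unsigned checksum_indexed(const unsigned char *map, int x0, int y0)
{
    unsigned sum = 0;
    for (int x = 0; x < GRID; x++)
        for (int y = 0; y < GRID; y++)
            sum = sum * 31 + map[(x0 + x) * MAP_DIM + (y0 - y)];
    return sum;
}

/* Same walk: the address is computed once, then only pointer steps. */
unsigned checksum_pointer(const unsigned char *map, int x0, int y0)
{
    unsigned sum = 0;
    const unsigned char *p = map + x0 * MAP_DIM + y0;
    for (int x = 0; x < GRID; x++) {
        for (int y = 0; y < GRID; y++)
            sum = sum * 31 + *p--;   /* step one cell down */
        p += MAP_DIM + GRID;         /* undo GRID steps, go to next row */
    }
    return sum;
}
```

The checksum depends on the order of the cells, so equal results mean both loops really read the same cells in the same sequence.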

The position of the player does not have to be tested each loop; we can do that completely outside of the loop! Moreover, the player position NEVER moves! It is always the center of the screen, so there is no need to test at all!

The complete reference to some “rectangle” to draw to – can be thrown out, since only the x1,y1 coordinates are used anyway.

The out of bounds checks from WORLDRead() can be done inside the loop and actually “in front” of the inner loop, so the inner loop does not have to run at all if the outer is OOB already.

The outer loop can be adjusted BEFORE execution to reflect OOB status.

The grid need not be a variable (in our case) – so we can work with constants.

The loop over the gridwidth – which actually is later used to calculate the actual screen position can be done DIRECTLY using screen coordinates – no need to calculate them at all!
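To illustrate the difference, here is a sketch with a grid of 3 and a cell width of 85 (255/3, following the original `w = 255 / Grid`). The real value of `_W_` is not shown in this post, and the -128 origin is taken from the optimized loop, so treat the numbers as assumptions; `positions_by_index` and `positions_by_step` are hypothetical names.

```c
#define GRID 3
#define CELL_W (255 / GRID)   /* 85, assumed cell width */

/* Original style: loop over cell numbers, multiply for each position. */
void positions_by_index(int out[GRID])
{
    for (int cell = 0; cell < GRID; cell++)
        out[cell] = cell * CELL_W - 128;
}

/* Optimized style: the loop variable already is the screen coordinate. */
void positions_by_step(int out[GRID])
{
    int n = 0;
    for (int x = -128; x < GRID * CELL_W - 128; x += CELL_W)
        out[n++] = x;
}
```

Both produce the same three positions, but the second version needs one addition per cell instead of a multiplication, which matters on a CPU where every multiply costs extra cycles.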

Due to optimizations explained later, the MoveTo which is originally done inside “DRAWTile()” can be moved into the loop.

The resulting optimized function then is:

void WORLDDrawOnce(PLAYER *p,WORLDMAP *m,int ShowAsMap)
{
	int x1; 
	int y1;
	int x=-128,y;
	unsigned char *toDraw_p;
	
	x1 = (int)p->World.x-(GRID/2);    // Which cell ?
	y1 = (int)p->World.y+(GRID/2);
	
	while (x1<0)
	{
		x1++;
		x+=_W_;
	}
    
	toDraw_p = (unsigned char *)( &m->Map[x1][y1]);
	
	for (;x < GRID*_W_-128;x+=_W_)            // For all grid cells
	{
		if (x1>20) break;
		for (y=-128;y < GRID*_H_-128;y+=_H_)
		{
			if ((y1<0) || (y1>20)) {y1--;toDraw_p--; continue;}
			if (*toDraw_p != WT_SPACE)
			{
				
				dp_VIA_t1_cnt_lo = 0x4d; // scale
				
				Moveto_d(y,x);
				y1--;
				DRAWTile(*toDraw_p--);
				dp_VIA_cntl = (unsigned int)0xcc;            // enable zero, enable all blank
			}
			else
			{
				y1--;
				toDraw_p--;
			}
		}
		y1+=GRID;
		toDraw_p+=21+GRID;
		x1++;
	}
	// player is always right in the middle!

	VIA_t1_cnt_lo = 0x4d; // scale

	Moveto_d(y1,x1);
	DRAWTile((unsigned char)WT_PLAYER);
}

The result: we saved over 10000 cycles, just by adjusting the “C” code after some analysis of what actually needs to be done, and by looking at the assembler code to see how array accesses are compiled!

Now we are at roughly 65000 cycles worst case, which is still earth-shatteringly slow, even though we have already optimized away about 20000 cycles.

Ok – it seems we do not only need a game changer – we need a WORLD CHANGER (pun intended 🙂 ).

to be continued…
