Baremetal bootloader (3)

Gee, today was a waste…
Reading about Graham's problems, I chose to investigate further.
Layout of NOOBS:

4 Partitions.
First partition is FAT 32 – RECOVERY.
Seems the recovery partition is really booted first and then switches to „boot”, which is where the kernel is loaded from.

Layout of „standard” raspbian:

Two partitions, first is the boot partition. Normally I work with a Raspbian system. I only had NOOBS under test once, when I tried dual booting.
Thing is, the FAT filesystem I use in baremetal has no support for multiple partitions (I learned today). It always mounts the first partition. Nothing else. Ever.
For some hours today I tried convincing it to mount or show other partitions.
Although it seems to say it supports multiple volumes and partitions, it really doesn’t. And I won’t change that!
I emailed other baremetal people – and they also don’t have any „freely” available code that handles more than one partition. (The „big” projects, like „circle”, don’t either.)
So – NOOBS will not be supported, other than by the workaround Graham figured out today (duplicating the files).
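
For reference, locating the other partitions on the card is the easy half of the problem. The sketch below reads the four primary entries from sector 0. It is only an illustration: `mbr_part_t`, `rd_le32` and `mbr_parse` are names made up here and are not part of the FAT code mentioned above, and it assumes a classic MBR (no GPT, and logical partitions inside an extended partition are not followed).

```c
#include <stdint.h>

/* One primary partition as described by a classic MBR entry. */
typedef struct {
    uint8_t  type;       /* partition type byte, 0 means unused slot */
    uint32_t first_lba;  /* first sector of the partition */
    uint32_t sectors;    /* partition length in sectors */
} mbr_part_t;

/* Read a little-endian 32-bit value without alignment assumptions. */
uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the four primary entries of a 512-byte MBR sector.
 * Returns the number of used entries, or -1 if the 0x55 0xAA
 * signature at the end of the sector is missing. */
int mbr_parse(const uint8_t sector[512], mbr_part_t out[4])
{
    int used = 0;

    if (sector[510] != 0x55 || sector[511] != 0xAA)
        return -1;

    for (int i = 0; i < 4; i++) {
        const uint8_t *e = sector + 446 + 16 * i;  /* table starts at 0x1BE */
        out[i].type      = e[4];
        out[i].first_lba = rd_le32(e + 8);
        out[i].sectors   = rd_le32(e + 12);
        if (out[i].type != 0)
            used++;
    }
    return used;
}
```

The hard half remains: the filesystem layer would still have to accept a starting sector other than that of the first partition, which is exactly what the FAT code here does not do.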
After that I thought…
Hey – perhaps that was the reason I could not load a linux.img (kernel). I programmed a multi boot system again – and tried loading the kernel.

And it got stuck … the MMU seemed to crash once I crossed the page boundary (0x100000) … after some heavy debugging (all you ever have is rebooting and looking at the serial output)…

I discovered that the MMU initialization routine I use puts its page table at a „non standard” memory location. Usually, so as not to disturb anyone, the page table is located at 0x4000.
I had assumed that was the case here, and I was wrong. It was placed somewhere around 0x44000 – and thus was overwritten by the kernel. This was tedious to debug – but a good discovery anyway, since any baremetal program we write that is larger than 1MB would have led to the same failure.
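
A small sketch of the arithmetic behind this, under stated assumptions: the table is a plain ARM first-level table of 1 MB sections (4096 entries of 4 bytes, so 16 KB), and it sits at exactly 0x44000 (the text above only says "somewhere around"). The function names are made up for illustration. One plausible reading of the crash is that every 1 MB section has its own descriptor, so stepping past 0x100000 makes the MMU fetch a descriptor it has not used before, straight from the overwritten table.

```c
#include <stdint.h>

#define SECTION_SHIFT   20u                 /* 1 MB sections: 2^20 bytes each */
#define L1_ENTRIES      4096u               /* 4096 sections cover 4 GB */
#define L1_TABLE_BYTES  (L1_ENTRIES * 4u)   /* 16 KB of descriptors */

/* Address of the first-level descriptor that covers virtual address va. */
uint32_t l1_descriptor_addr(uint32_t ttb_base, uint32_t va)
{
    return ttb_base + (va >> SECTION_SHIFT) * 4u;
}

/* First byte after the table. */
uint32_t l1_table_end(uint32_t ttb_base)
{
    return ttb_base + L1_TABLE_BYTES;
}
```

With these numbers, everything below 0x100000 is covered by the descriptor at 0x44000 and the next megabyte by the one at 0x44004. A table at the conventional 0x4000 ends at 0x8000, right below the usual kernel load address, while a table at 0x44000 sits in the middle of anything large loaded at 0x8000.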

Anyway – 2 hours gone by – and I could load the linux kernel.
But it just did nothing. NOTHING! No blinking LED, no loading… it appears to be utterly dead!
I tried getting the Pi as close to the original boot setup as I could – but to no avail.
I tried getting more information – and wanted to examine whether some exception was thrown… but my exception code was also overwritten by the loader (well – naturally, the kernel must be loaded to 0x8000).
Then I implemented new exception handlers in the upper memory region (where my loader is located). Setting the vector table to new memory (which should have been as simple as x = y;) was quite a challenge, which took me the better part of 4 hours.
First I ran into what I took for a gcc bug – gcc actually generated the assembler instruction „UDF” (undefined) – which promptly threw the relevant exception.
To circumvent that I used inline assembler code – but that was a very bad idea – because gcc thought „hey assembler? -> I can do better!!!” and better it did – so that everything was messed up!

After some more reading I discovered the qualifier „__asm__ __volatile__” – which prevents gcc from optimizing the assembler commands to its liking!
And voila! I can now set new exception vectors.
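
A minimal sketch of one way to relocate the vectors, not necessarily what my loader does. The names `build_vector_table` and `set_vector_base` are made up for illustration. It assumes a core that has the Vector Base Address Register (the ARM1176 in the Pi 1 / Zero does), low vectors selected, and a table aligned to 32 bytes. With caches enabled, the freshly written table would most likely also need cache maintenance, which is left out here.

```c
#include <stdint.h>

#define ARM_LDR_PC_PC_0x18  0xE59FF018u  /* encoding of: ldr pc, [pc, #0x18] */

/* Fill 16 words: 8 identical instructions followed by 8 handler addresses.
 * In ARM state pc reads as "current instruction + 8", so each instruction
 * loads pc from the word 0x20 bytes after itself, i.e. its own slot in
 * the address list that follows the 8 instructions. */
void build_vector_table(volatile uint32_t table[16], const uint32_t handlers[8])
{
    for (int i = 0; i < 8; i++) {
        table[i]     = ARM_LDR_PC_PC_0x18;
        table[8 + i] = handlers[i];
    }
}

#if defined(__arm__)
/* Point the core at the relocated table via CP15 c12, c0, 0 (VBAR).
 * "volatile" keeps gcc from dropping or moving the statement, and the
 * "memory" clobber keeps the table stores above ahead of it. */
static inline void set_vector_base(uint32_t base)
{
    __asm__ __volatile__("mcr p15, 0, %0, c12, c0, 0" : : "r"(base) : "memory");
}
#endif
```

Building the table with ordinary C stores and keeping the inline assembler down to the single instruction that C cannot express leaves gcc very little to rearrange.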

But… the kernel loading does not throw any exception – it just sits there quietly and does nothing after „starting” it.
I cannot think of any more ways to debug (except maybe get a raspberry emulator and trace what happens).
But really – I would rather get on with making the output better than do this fiddly stuff. So for now I call it quits on that subject.

I have been at my computer for nearly 10 hours today – and have nothing but „doesn’t work” to show for it… wasted day :-).
Well, off I go to play one or two games and then to bed…

I so thought I would have a breakthrough tonight.
There can not be much wrong anymore…
I posted a question in the baremetal forum – hoping for help…

I can boot Raspbian from my menu!
It is 02:30 and I am very tired – but also very happy!


The following is true for the Linux kernel image I use, which I downloaded as part of a full Raspberry Pi installation from the official site (non-NOOBS) on January 10th, 2020.
I have not altered the cmdline.txt or config.txt in my setup.

The file kernel.img (Linux) is position independent – you can load it to any address you like.
The only restriction is not to overwrite the Device Tree blob and, if you have the MMU active, the page table.

The kernel.img starts with a decompression header, which is written position independent and which unpacks the “final” Linux kernel to a place it deems safe. You have to pass the mentioned r0, r1 and r2 registers on to the decompressor, so that it in turn can later pass the information on to the kernel itself.
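
The hand-over can also be sketched in plain C, as an alternative to inline assembler. This is an illustration under assumptions, not my loader's code: `kernel_entry_t`, `boot_args_t`, `start_kernel_image` and `recording_entry` are hypothetical names. It relies on the ARM procedure call standard, which passes the first three integer arguments in r0, r1 and r2, and on the ARM Linux boot convention (r0 = 0, r1 = machine type, r2 = address of the ATAGs or device tree blob).

```c
#include <stdint.h>

/* Entry point of the image as seen from C: three register arguments. */
typedef void (*kernel_entry_t)(uint32_t r0, uint32_t r1, uint32_t r2);

/* The three values to hand over, saved early at loader startup. */
typedef struct {
    uint32_t r0, r1, r2;
} boot_args_t;

/* Call the image. On ARM the arguments travel in r0-r2, which is where
 * the decompressor looks for them. A real kernel never returns. */
void start_kernel_image(kernel_entry_t entry, const boot_args_t *args)
{
    entry(args->r0, args->r1, args->r2);
}

/* Host-side stand-in for an entry point: it only records what it was
 * given, so the hand-over can be dry-run on a PC. */
boot_args_t seen_args;

void recording_entry(uint32_t r0, uint32_t r1, uint32_t r2)
{
    seen_args.r0 = r0;
    seen_args.r1 = r1;
    seen_args.r2 = r2;
}
```

On the Pi the call would look something like `start_kernel_image((kernel_entry_t)0x8000, &saved);`, with `saved` holding the values that were stored at startup.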

I have found many references on the net saying that you have to start the kernel from a “clean” setup (no MMU, caches flushed, default 700 MHz etc.).
In my “final” version I leave all that enabled when I start the “decompressor”, and it works like a charm. It might be that the kernel itself needs these default settings; if so, then the decompressor takes care of that.

Why didn’t my previous version run?

The answer is twofold:
a) The version of sources I posted in my first posting had a “bug”:


  /* BUGGY version: "str" writes r0-r2 OUT to memory instead of loading them */
  __asm__ __volatile__(
      "mov r5, #0x0080   \n\t"
      "str r0, [r5]      \n\t"
      "mov r5, #0x0084   \n\t"
      "str r1, [r5]      \n\t"
      "mov r5, #0x0088   \n\t"
      "str r2, [r5]      \n\t"
      "ldr pc, = 0x8000   \n\t"
  );

This – of course – does not set the registers r0 – r2 to the correct values; it STORES the contents of the registers to memory. It should read:

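  /* Reload the saved boot values into r0-r2, then jump to the image */
  __asm__ __volatile__(
      "mov r5, #0x0080   \n\t"
      "ldr r0, [r5]      \n\t"   /* r0 <- word saved at 0x80 */
      "mov r5, #0x0084   \n\t"
      "ldr r1, [r5]      \n\t"   /* r1 <- word saved at 0x84 */
      "mov r5, #0x0088   \n\t"
      "ldr r2, [r5]      \n\t"   /* r2 <- word saved at 0x88 */
      "ldr pc, = 0x8000   \n\t"  /* enter the image, never returns */
  );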

This was a simple copy/paste error.

If I had not made that stupid copy/paste error (late night work…), everything I had done would have worked out of the box.

Now… during my debugging I had the kernel.img as a binary include with my startup files.
And in that debugging code I HAD the above code correct! Nonetheless, after jumping to my main program, the loader again did not work…
But this was an altogether different bug:

The second “bug” I had (which was quite difficult to find) was this:
I use parts of the library.

And one of the first things I do, is to print out the current speed settings of the ARM. For that I use the library function:

int32_t arm_clock = lib_bcm2835_vc_get_clock_rate(BCM2835_VC_CLOCK_ID_ARM); 

Within that function the mailbox is called, and the mailbox is passed a structure, which it fills.
That structure is at a memory address defined somewhere in some header… which turned out to be:

#define MEM_COHERENT_REGION		0x400000	///< 

And that address lies right WITHIN the region occupied by the binary-included kernel – so the mailbox write corrupted the image, and the decompression resulted in an error.
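
A check like the following, run once before unpacking, would have flagged this. It is a sketch under assumptions: only the `MEM_COHERENT_REGION` value is taken from the header quoted above, while `MAILBOX_BUFFER_BYTES` is a made-up size and `mem_range_t`, `ranges_overlap` and `blob_hit_by_mailbox` are hypothetical names.

```c
#include <stdint.h>

#define MEM_COHERENT_REGION   0x400000u  /* value quoted from the library header */
#define MAILBOX_BUFFER_BYTES  0x1000u    /* hypothetical size, illustration only */

/* Half-open address range [start, end). */
typedef struct {
    uint32_t start;
    uint32_t end;
} mem_range_t;

/* Two half-open ranges overlap if each one starts before the other ends. */
int ranges_overlap(mem_range_t a, mem_range_t b)
{
    return a.start < b.end && b.start < a.end;
}

/* Returns 1 if a blob of blob_size bytes at blob_start shares memory
 * with the fixed buffer that the mailbox call writes into. */
int blob_hit_by_mailbox(uint32_t blob_start, uint32_t blob_size)
{
    mem_range_t blob = { blob_start, blob_start + blob_size };
    mem_range_t mbox = { MEM_COHERENT_REGION,
                         MEM_COHERENT_REGION + MAILBOX_BUFFER_BYTES };
    return ranges_overlap(blob, mbox);
}
```

On the target, the start and size of the included kernel could come from linker symbols around the binary include, which are not shown here.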

So finally:
I made a stupid copy/paste mistake in my first code. During my search I added another piece of buggy code (well – “buggy” is a harsh word here; I only use the library and do not know it by heart, so the fixed “MEM_COHERENT_REGION” memory region was not really in front of my eyes…).

But now it works and I can load the original kernel and it boots up.

More about PiTrex boot can be read on:
Pitrex Wiki (bottom of the page: Raspberry baremetal -> PiTrex)
