Build and run a boot-loader
Principles of computing and operating systems.
What to expect here.
If you’re a curious guy like me, you’ve probably wondered how an operating system works. Here, I’ll share some research and practical experiments I’ve done to understand computing and operating systems better. By the end, you will have created your own bootable program that runs in an emulator such as QEMU or in a virtual machine application such as VirtualBox.
Important note
This article does not try to explain everything about boot loaders and their complexities. The example is just a starting point, based on the x86 architecture. It may be difficult reading for many people, since it requires basic knowledge of microprocessors and computer programming.
What is a bootloader?
In simple words, a boot loader is a small piece of software that is loaded into a computer’s working memory when the machine starts, and whose job is to load the operating system.
In more detail, when you press the power button on a computer, a firmware called BIOS (Basic Input Output System) kicks in. It initializes and tests the hardware and then looks for a boot loader on the available media, that is, a USB drive, hard drive, CD drive, etc. The BIOS goes through the media it finds in sequence, checking each one for a unique value, the so-called boot signature. When it finds a sector carrying this signature (a boot record), the BIOS loads it into the computer’s memory and gives it control: the processor starts executing instructions at address 0x7C00. Remember this memory address; it is important when building the boot loader.
Working inside the first sector, with only 512 bytes
During initialization, the BIOS looks in the first sector of each bootable device for the signature mentioned before. This unique value is 0xAA55, and it must occupy the last two bytes of the first sector. Although the master boot record (MBR) has 512 bytes, we cannot use all of them for code. We need to subtract the 64-byte partition table, the 2-byte signature and, in the modern MBR layout, 6 bytes reserved for an optional disk signature, so only 440 bytes remain. That is not much space, but you can write code that loads more data from other sectors into memory and so works around the limit.
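To make the arithmetic concrete, here is a small Python sketch of the sector layout. It assumes the modern MBR format, in which bytes 440 to 445 hold a 4-byte disk signature plus 2 reserved bytes. MBR_REGIONS and check_layout are illustrative names, not part of any tool. A sector without a partition table, such as the floppy image used later in this article, can use all 510 bytes before the signature.

```python
# Modern MBR layout of the first 512-byte sector (offsets in bytes).
SECTOR_SIZE = 512

MBR_REGIONS = [
    # (name, offset, size)
    ("bootstrap code", 0, 440),
    ("disk signature + reserved", 440, 6),
    ("partition table (4 x 16 bytes)", 446, 64),
    ("boot signature 0x55 0xAA", 510, 2),
]

def check_layout(regions, sector_size=SECTOR_SIZE):
    """Verify the regions are contiguous and exactly fill one sector."""
    expected_offset = 0
    for name, offset, size in regions:
        if offset != expected_offset:
            raise ValueError(f"gap or overlap before {name!r} at offset {offset}")
        expected_offset += size
    if expected_offset != sector_size:
        raise ValueError(f"regions cover {expected_offset} bytes, not {sector_size}")
    return True

if __name__ == "__main__":
    check_layout(MBR_REGIONS)
    for name, offset, size in MBR_REGIONS:
        print(f"0x{offset:03X}-0x{offset + size - 1:03X}  {size:3d} bytes  {name}")
    # 512 minus partition table, disk signature area and boot signature
    code_space = SECTOR_SIZE - 64 - 6 - 2
    print("bytes left for code:", code_space)  # 440
```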
The initialization steps in a simplified way
- The BIOS initializes the computer and its peripherals;
- The BIOS searches for bootable devices;
- When the BIOS finds the signature 0xAA55 in a device’s MBR (master boot record), it loads that sector into memory at address 0x7C00 and gives control to this entry point, that is, the processor starts executing instructions from address 0x7C00.
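The simplified steps above can be modeled in a few lines of Python. This is a toy, not a description of how real firmware is written. The device names and the is_bootable and bios_boot helpers are hypothetical, and "memory" is just a bytearray.

```python
BOOT_SIGNATURE = b"\x55\xAA"   # 0xAA55 as it appears at offsets 510-511
LOAD_ADDRESS = 0x7C00          # where the BIOS copies the boot sector

def is_bootable(sector: bytes) -> bool:
    """A sector counts as bootable if it is 512 bytes and ends with 0x55 0xAA."""
    return len(sector) == 512 and sector[510:512] == BOOT_SIGNATURE

def bios_boot(devices, memory: bytearray):
    """Scan devices in order, copy the first bootable sector to 0x7C00 and
    return (device name, entry point)."""
    for name, first_sector in devices:
        if is_bootable(first_sector):
            memory[LOAD_ADDRESS:LOAD_ADDRESS + 512] = first_sector
            return name, LOAD_ADDRESS  # the CPU would now start executing here
    raise RuntimeError("no bootable device found")

if __name__ == "__main__":
    ram = bytearray(1024 * 1024)            # 1 MiB real-mode address space
    blank = bytes(512)                      # no signature: skipped
    boot = bytes(510) + BOOT_SIGNATURE      # minimal "bootable" sector
    devices = [("usb", blank), ("floppy", boot), ("hdd", boot)]
    print(bios_boot(devices, ram))          # ('floppy', 31744)
```

Note that 31744 is simply 0x7C00 in decimal.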
Let’s start coding
As you can imagine, this is assembly language, and it must be assembled into machine code with an assembler, as you can see in the next block of code. Note that 512 in hexadecimal notation is 0x200. The last two bytes therefore sit at offsets 0x1FE and 0x1FF, and they hold 0x55 and 0xAA, the reverse of the 0xAA55 written in the assembly code above. This is related to the byte-ordering convention known as endianness. A big-endian system would store the 16-bit value 0xAA55 with AA at address 0x1FE and 55 at address 0x1FF. A little-endian system such as x86 stores it the other way round, with 55 at address 0x1FE and AA at 0x1FF, which is exactly what the machine code shows.
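To make the byte order concrete, here is a Python sketch that builds a 512-byte sector the same way: code, then zero padding up to offset 0x1FE, then the signature stored little-endian. build_boot_sector is a hypothetical helper. The two-byte payload (the machine code for cli, hlt) is only a placeholder, not the article's program.

```python
import struct

SIGNATURE = 0xAA55

def build_boot_sector(code: bytes) -> bytes:
    """Pad `code` with zeros up to offset 510 (0x1FE), then append the 16-bit
    signature in little-endian order, as an x86 assembler stores a word."""
    if len(code) > 510:
        raise ValueError(f"{len(code)} bytes of code; at most 510 fit")
    return code + bytes(510 - len(code)) + struct.pack("<H", SIGNATURE)

if __name__ == "__main__":
    sector = build_boot_sector(b"\xFA\xF4")   # placeholder code: cli, hlt
    print(len(sector), hex(len(sector)))      # 512 0x200
    print(f"0x1FE: {sector[0x1FE]:02X}, 0x1FF: {sector[0x1FF]:02X}")  # 0x1FE: 55, 0x1FF: AA
    # The same value laid out big-endian would be reversed.
    print(struct.pack(">H", SIGNATURE).hex(" "))  # aa 55
```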
How this code works
I will explain this code line by line in case you are not familiar with assembly language.
1-) Specifying the target processor mode: the BITS directive tells NASM whether to generate code designed to run on a processor operating in 16-bit, 32-bit, or 64-bit mode. The syntax is BITS XX, where XX is 16, 32, or 64. When the BIOS jumps to the boot sector, the processor is in 16-bit real mode, so a boot loader like this one uses BITS 16.
2-) Specifying the program origin: the ORG directive sets the origin address, the address at which NASM assumes the program begins once it is loaded into memory. Because the BIOS loads the boot sector at 0x7C00, that is the origin to use. When this code is translated to machine code, NASM computes the address of every label relative to this origin. With the flat binary output format (-f bin) used later, there is no separate linker step.
3-) This is just a label. A label names a memory position you can refer to, and it is used together with jump instructions to control the program’s flow; this idea will make more sense in the next lines.
Before explaining the fourth line, we need to describe the concept of registers:
A processor register is a quickly accessible location available to a computer’s processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions and may be read-only or write-only. In computer architecture, registers are typically addressed by mechanisms other than main memory but may, in some cases, be assigned a memory address. This definition was extracted from Wikipedia.
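In this program, the registers doing the work are SI (16 bits), AH and AL. AH and AL are not independent registers: they are the high and low bytes of the 16-bit AX register. The Python sketch below shows that relationship. split_ax and join_ax are illustrative names, not real APIs.

```python
def split_ax(ax: int):
    """AX is 16 bits wide; AH is its high byte and AL its low byte."""
    return (ax >> 8) & 0xFF, ax & 0xFF

def join_ax(ah: int, al: int) -> int:
    """Rebuild AX from its two 8-bit halves."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)

if __name__ == "__main__":
    # AH = 0x0E (BIOS teletype function), AL = 'H' (0x48)
    ax = join_ax(0x0E, ord("H"))
    print(hex(ax))           # 0xe48
    ah, al = split_ax(ax)
    print(hex(ah), chr(al))  # 0xe H
```

This is why the program can set AH to 0x0E once, before the loop. LODSB writes only AL, so AH keeps its value on every pass.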
4-) Assigning data with the MOV instruction, which moves data. In this case, we are moving the memory address of the label message into the SI register, so that SI points to the text “Hey! This code is my boot loader operating.”. If you look at the image below, you will see that this text is stored at position 0x7C10 in the machine code.
5-) We will use the BIOS video services to display the text on the screen, so we first select how we want them to work. Moving the byte 0x0E into register AH selects the “teletype output” function, which prints the character in AL and advances the cursor.
6-) Another label, which lets us control the execution flow; later we will jump back to it to create a loop.
7-) LODSB loads a byte from the source operand, the memory address held in SI, into the AL register. Remember the fourth line, in which SI was set to the address of the text. This instruction now reads the character stored at address 0x7C10. Notice that this behaves like an array, and we are pointing to its first position, which contains the character ‘H’, as we can see in figure 03 below. The text is presented iteratively, one character per pass through the loop. The second character is not shown in the snapshot extracted from the IDA program; it is 0x65, which in ASCII represents the character ‘e’.
8-) Performing a boolean OR between AL and itself (AL | AL). At first glance this does not seem to make any sense: OR-ing a bit with itself gives the same bit ([1 | 1 = 1], [0 | 0 = 0]), so AL does not change. But it does make sense, because the result is zero only if AL is zero, and OR records this in the processor flags. In the next line, you are going to understand why this is necessary.
9-) Jump to the halt label (line 12) if the result of the last OR operation is zero. On the first pass, AL holds [0x48 = ‘H’], loaded by the LODSB instruction on line 7, so the program will not jump to the halt label. (0x48 OR 0x48) = 0x48, which is not zero, so execution continues with the next instruction. It is important to say that JZ is not tied to the OR instruction. It reads another register, called FLAGS. The OR instruction writes its result to AL and sets the zero flag (ZF) in FLAGS when that result is zero, and JZ jumps only when ZF is set.
10-) Invoking a BIOS interrupt: the instruction INT 0x10 displays the value of AL on the screen. Remember line 5, where we set AH to 0x0E. This combination of AH = 0x0E and the character in AL tells the BIOS to print that character.
11-) Jump to the .loop label without any condition; it is like a GOTO statement in a high-level language.
12-) We are back at line 7, and LODSB runs again. After each byte is transferred from memory into the AL register, SI is incremented, so the second time it points to address 0x7C11 = [0x65 ‘e’], and the character ‘e’ is printed. This loop runs until LODSB reads address 0x7C3B = [0x00], the zero byte that terminates the text. The OR then sets the zero flag, and the JZ on line 9 sends the flow to the halt label.
13-) Here, we finish our journey. CLI disables (maskable) interrupts and HLT halts the processor, so execution stops here.
14-) At the seventeenth line you see an instruction that pads the binary with zeros up to a length of 510 bytes. After that, the boot signature 0xAA55 is added, completing the 512-byte sector.
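To tie steps 1 to 14 together, here is a Python model of the program. It is a sketch rather than an emulator, and it rests on my own assumptions, not on details confirmed by the original listing. The listing is reconstructed from the step descriptions above. The instruction sizes are the usual NASM encodings, assuming NASM picks the 2-byte short forms of JZ and JMP. "start" is a hypothetical name for the label in step 3, and DS is assumed to be 0. Under those assumptions, the model reproduces the addresses quoted in the text: the message at 0x7C10 and its terminating zero at 0x7C3B.

```python
ORG = 0x7C00  # step 2: the BIOS loads the boot sector here
MESSAGE = b"Hey! This code is my boot loader operating.\x00"

# (source line, size in bytes); directives and labels emit no bytes.
LISTING = [
    ("BITS 16", 0), ("ORG 0x7C00", 0),
    ("start:", 0),             # hypothetical label name (step 3)
    ("mov si, message", 3),    # BE imm16
    ("mov ah, 0x0E", 2),       # B4 imm8
    (".loop:", 0),
    ("lodsb", 1),              # AC
    ("or al, al", 2),          # 08 C0
    ("jz halt", 2),            # 74 rel8 (short form)
    ("int 0x10", 2),           # CD 10
    ("jmp .loop", 2),          # EB rel8 (short form)
    ("halt:", 0),
    ("cli", 1),                # FA
    ("hlt", 1),                # F4
    ("message:", 0),
]

def label_addresses(listing, origin):
    """Assign each label its absolute address, as NASM does relative to ORG."""
    addresses, offset = {}, 0
    for text, size in listing:
        if text.endswith(":"):
            addresses[text[:-1]] = origin + offset
        offset += size
    return addresses

def run_print_loop(memory, si):
    """Steps 7-12: repeat LODSB / OR AL,AL / JZ halt / INT 0x10 / JMP .loop.
    Assumes DS = 0, so the byte at DS:SI is simply memory[si].
    Returns the text printed by the BIOS and the final value of SI."""
    screen = []
    while True:                      # jmp .loop
        al = memory[si]              # lodsb: AL <- byte at [SI] ...
        si = (si + 1) & 0xFFFF       # ... then SI <- SI + 1 (16-bit register)
        zf = (al | al) == 0          # or al, al: ZF set when AL is zero
        if zf:
            break                    # jz halt
        screen.append(chr(al))       # int 0x10 with AH = 0x0E prints AL
    return "".join(screen), si

if __name__ == "__main__":
    labels = label_addresses(LISTING, ORG)
    msg = labels["message"]
    print(hex(msg))                     # 0x7c10
    ram = bytearray(0x10000)            # first 64 KiB of memory
    ram[msg:msg + len(MESSAGE)] = MESSAGE
    print(hex(msg + len(MESSAGE) - 1))  # 0x7c3b, the terminating zero
    text, si = run_print_loop(ram, msg)
    print(text)                         # Hey! This code is my boot loader operating.
    print(hex(si))                      # 0x7c3c
```

Changing the text moves the terminating zero but not the start of the message, because the 16 bytes of code before it stay the same.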
Let’s build and run
In the first step, make sure you have installed the NASM assembler and QEMU on your computer, using your favorite package manager or by downloading them from the internet. QEMU is a machine emulator that can run our boot sector as a virtual PC.
On Debian or Ubuntu Linux, you can type in a terminal:
sudo apt-get install nasm qemu-system-x86
On a Mac, you can use Homebrew:
brew install nasm qemu
After the first step, create a file with the assembly code presented in the Code 01 block. Let’s name this file boot.asm and then run NASM:
nasm -f bin boot.asm -o boot.bin
This produces the binary file your virtual machine will boot. Let’s run it in QEMU:
qemu-system-x86_64 -fda boot.bin
You should see the following screen:
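If QEMU instead reports that the disk is not bootable, the usual suspects are the size of boot.bin and its last two bytes. The following Python sketch checks both. check_boot_image is a hypothetical helper, and the script looks for boot.bin in the current directory unless you pass another path as the first argument.

```python
import sys
from pathlib import Path

def check_boot_image(path) -> list:
    """Return a list of problems found in a boot sector image (empty if it looks OK)."""
    data = Path(path).read_bytes()
    problems = []
    if len(data) < 512:
        problems.append(f"only {len(data)} bytes; a boot sector needs 512")
    elif data[510:512] != b"\x55\xAA":
        problems.append(f"bytes 510-511 are {data[510:512].hex(' ')}, expected 55 aa")
    return problems

if __name__ == "__main__":
    image = Path(sys.argv[1] if len(sys.argv) > 1 else "boot.bin")
    if not image.exists():
        print(f"{image} not found; run the nasm command first")
    else:
        issues = check_boot_image(image)
        print("looks bootable" if not issues else "\n".join(issues))
```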
Running it in VirtualBox
First, create an empty 1.44 MB virtual floppy disk image:
dd if=/dev/zero of=floppy.img bs=1024 count=1440
Then write the binary into its first sector, without truncating the rest of the image:
dd if=boot.bin of=floppy.img conv=notrunc
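If you prefer to do this step in Python, here is a sketch that builds the floppy image. It assumes the standard 1.44 MB capacity (1,474,560 bytes), with boot.bin as the first sector and zeros everywhere else. make_floppy_image is an illustrative name, and the script only writes floppy.img if boot.bin exists in the current directory.

```python
from pathlib import Path

FLOPPY_SIZE = 1440 * 1024   # 1,474,560 bytes: a standard 1.44 MB floppy

def make_floppy_image(boot_sector: bytes) -> bytes:
    """Return a floppy image whose first sector is `boot_sector` and whose
    remaining bytes are all zero."""
    if len(boot_sector) != 512:
        raise ValueError("boot sector must be exactly 512 bytes")
    return boot_sector + bytes(FLOPPY_SIZE - 512)

if __name__ == "__main__":
    src, dst = Path("boot.bin"), Path("floppy.img")
    if src.exists():
        dst.write_bytes(make_floppy_image(src.read_bytes()))
        print(f"wrote {dst} ({dst.stat().st_size} bytes)")
    else:
        print("boot.bin not found; assemble it with nasm first")
```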
Now you can create a VirtualBox machine, attach floppy.img as its floppy disk, and boot from it.
I was not able to explain many things here for the sake of brevity. If this is your first time with this kind of content, many questions have probably arisen in your head. That is OK; this is not an easy subject, and I hope this article can serve as a starting point for further study. A book I recommend is Operating Systems Design and Implementation by Andrew S. Tanenbaum, a useful reference for understanding many principles of computing and operating systems.