Kapow - Making a Game in Real-Mode x86 Assembly

Why Real-Mode Assembly?
The Hard Parts
- The Addressing Problem
- Bootlegged Effective Addressing
The Toolchain
Graphics: Mode 13h
The Asset Pipeline Problem
- PCX → BMP → Binary
The Bootloader
Animations
What I Learned

Kapow is a remake of Kaboom! (1981) for the Atari 2600. It runs on an x86 machine without an operating system, exclusively in 16-bit real mode. I made it as a bonus assignment for CSE1400 at TU Delft. The source code is available on GitHub.

Why Real-Mode Assembly?

The assignment called for a simple game in assembly. Most students wrote Linux CLI programs in x86. I decided to go a bit further and write something that would boot directly from disk, no operating system required. This meant working in Real Mode, the state an x86 CPU starts up in, where it behaves like a 1978 Intel 8086 with a much higher clock speed: 16-bit registers, one megabyte of addressable memory, and none of the conveniences of a modern operating system.

Was this a good idea? Probably not. Was it educational? Absolutely.

The Hard Parts

Working in real-mode with a modern assembler presented some unexpected challenges. NASM is designed for 32 and 64 bit code, and its 16 bit support has some rough edges.

The Addressing Problem

You might notice that in Real Mode we can reach a whole megabyte of RAM, but we are using 16-bit registers to address it. How is it possible to address one million bytes if 16 bits can only represent 65,536 distinct values? The answer lies in memory segmentation: a 16-bit segment register is shifted left by 4 bits and added to the 16-bit offset, giving a 20-bit physical address. Each segment value selects a 64-kilobyte window of memory, and consecutive segment values start 16 bytes apart.
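The address calculation can be sketched in C. This is an illustration rather than code from Kapow, and the function name is made up for the example.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
 * The result is masked to 20 bits, matching the 8086's 20 address lines
 * (later CPUs with the A20 line enabled do not wrap around). */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

Note that many segment:offset pairs name the same byte: 0x07C0:0x0000 and 0x0000:0x7C00 both resolve to physical address 0x7C00.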

The main problem with using NASM in 16 bit code starts when you try to modify the value at a label:

var: dw 0x123
mov ax, [var]
mov bx, 20
mul bx          ; mul takes no immediate operand; dx:ax = ax * bx
mov [var], ax

Normally, you would expect this to work. However, when dealing with 16 bit code NASM refuses to do this. Upon further digging, it seemed possible to use seg to load the segment of the label, but this requires Microsoft COM files. In order to preserve the debugging advantages of using a modern ELF format, I had to find another solution.

Bootlegged Effective Addressing

The workaround was to define an area in memory where variables would be stored. This was done in game/var_locs.asm. A base offset is defined (0xC000), and subsequently the location of every single variable used by the game is defined manually. This also necessitates the initialization of each memory location in the respective file where the variables are used.

The same approach applies to sprites. The easiest solution would be:

sprite: incbin "./mysprite.bin"
mov ax, seg sprite
mov es, ax
mov ax, sprite
call my_draw_routine

But due to the aforementioned problems, this is not possible. Instead, sprite locations are stored in game/sprite_constants.asm, and the sprites themselves are loaded into a separate segment. To make this work, the code segment is padded to exactly 64 kilobytes:

times 0x10000 - ($-$$) db 0

This way we know exactly where the sprites will be—in the next segment after the code.

The Toolchain

With these constraints in mind, the toolchain ended up being:

- NASM: Assembler to transform source into binary.
- QEMU: x86 emulator for running and debugging.
- DOSBox: DOS emulator for running Deluxe Paint.

Why Deluxe Paint? That requires explaining the graphics situation.

Graphics: Mode 13h

In the late 80's and early 90's, the most common graphics hardware on a PC was the VGA Chip. To this day, practically every GPU implements its functionality. The most popular mode was Mode 13h, and it is trivial to set up:

mov ax, 0x13
int 0x10

Mode 13h gets its name from the value put into ax: 13 hexadecimal.

Drawing to the Screen

Once set, the screen can be written to as a linear buffer at segment 0xA000 (physical address 0xA0000). The screen is 320x200 pixels, which is 64,000 bytes and fills almost an entire 64-kilobyte segment. Each byte represents a single pixel, with the value being an index into the color palette.

Since this is a linear framebuffer, drawing a pixel at coordinates (x, y) is straightforward:

offset = x + y * 320
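The same formula, sketched in C against an ordinary byte buffer instead of video memory. The names and the bounds check are choices made for this illustration, not part of Kapow.

```c
#include <stddef.h>
#include <stdint.h>

enum { SCREEN_WIDTH = 320, SCREEN_HEIGHT = 200 };

/* Byte offset of pixel (x, y) in a linear 320x200, one-byte-per-pixel buffer. */
size_t pixel_offset(unsigned x, unsigned y)
{
    return (size_t)y * SCREEN_WIDTH + x;
}

/* Writes a palette index into a caller-supplied 64000-byte buffer.
 * Out-of-range coordinates are ignored (a choice made for this sketch). */
void put_pixel_buf(uint8_t *buffer, unsigned x, unsigned y, uint8_t color)
{
    if (x < SCREEN_WIDTH && y < SCREEN_HEIGHT)
        buffer[pixel_offset(x, y)] = color;
}
```

The last pixel, (319, 199), lands at offset 63999, so every valid offset fits in a 16-bit register.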

A simple pixel-drawing function:

; Draws a pixel on the screen
; al - color
; bx - x-coordinate
; cx - y-coordinate
put_pixel:
        push bp
        mov bp, sp

        push ax
        mov ax, 320
        mul cx
        add bx, ax
        pop ax

        mov dx, 0xA000
        mov gs, dx
        mov [gs:bx], al

        mov sp, bp
        pop bp
        ret

The Palette

With one byte per pixel, we only have 256 possible values. If we used direct RGB, we would have fewer than 3 bits per component (8 bits split three ways), which is not great for artwork. Instead, Mode 13h uses indexed color: each byte is an index into a 256-entry palette, where each entry specifies an RGB color.

Setting a palette entry involves direct communication with the VGA card:

; Sets an index in a palette to a color
; ah - index
; bl - red (0-63)
; bh - green (0-63)
; cl - blue (0-63)
set_palette_index:
        push bp
        mov bp, sp

        mov al, 0xFF
        mov dx, 0x3C6           ; PEL Mask Register
        out dx, al

        mov al, ah
        mov dx, 0x3C8           ; PEL Address Write Mode Register
        out dx, al

        mov dx, 0x3C9           ; PEL Data Register
        mov al, bl
        out dx, al
        mov al, bh
        out dx, al
        mov al, cl
        out dx, al

        mov sp, bp
        pop bp
        ret

The VGA card only uses the lower 6 bits of each color component, hence values 0-63.
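Colors from modern tools usually come as 8 bits per component, so they need scaling before being sent to the card. A minimal sketch of one common way to do that, dropping the two low bits; the function name is hypothetical and Kapow may handle this differently.

```c
#include <stdint.h>

/* Converts an 8-bit colour component (0-255) to the 6-bit range (0-63)
 * used by the VGA palette, by dropping the two least significant bits. */
uint8_t to_vga_component(uint8_t value8)
{
    return (uint8_t)(value8 >> 2);
}
```

With this mapping, full-intensity 255 becomes 63 and mid-grey 128 becomes 32.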

VSync and Double Buffering

Writing directly to the framebuffer with moving elements causes severe screen tearing. The solution is double buffering: draw into a separate memory segment (0x7000, physical address 0x70000), then copy it to the framebuffer during the vertical blanking period.

Before copying, we wait for the right moment:

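        mov dx, 0x3DA          ; VGA Input Status Register 1

vt_set:                        ; if a retrace is in progress, wait for it to end
        in al, dx
        and al, 8              ; bit 3 = vertical retrace
        jnz vt_set

vt_clr:                        ; wait for the next retrace to start
        in al, dx
        and al, 8
        jz vt_clr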
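The two loops are easier to follow when modeled away from the hardware. The sketch below replaces the port read with a callback and ships a scripted status source so the logic can be tried on any machine. All names here are hypothetical, and the poll counter exists only so the behaviour can be inspected.

```c
#include <stdint.h>

#define VRETRACE_BIT 0x08u  /* bit 3 of VGA Input Status Register 1 */

/* Hypothetical callback standing in for `in al, dx`. */
typedef uint8_t (*read_status_fn)(void);

/* Returns the number of status reads that were needed. */
unsigned wait_for_retrace_start(read_status_fn read_status)
{
    unsigned polls = 0;

    /* Phase 1: if a retrace is already in progress, wait for it to end. */
    do { polls++; } while (read_status() & VRETRACE_BIT);

    /* Phase 2: wait for the next retrace to begin. */
    do { polls++; } while (!(read_status() & VRETRACE_BIT));

    return polls;
}

/* Scripted stand-in for the port: retrace active for two reads,
 * inactive for three, then active again. */
static const uint8_t script[] = { 8, 8, 0, 0, 0, 8 };
static unsigned script_pos = 0;

uint8_t scripted_status(void)
{
    uint8_t value = script[script_pos];
    if (script_pos + 1 < sizeof script)
        script_pos++;
    return value;
}
```

Waiting for the start of a fresh retrace, rather than just checking that one is active, gives the copy the full blanking interval to work with.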

Then copy the entire buffer:

copy_buffer:
        push bp
        mov bp, sp

        mov cx, 0x7000
        mov ds, cx
        xor si, si             ;ds:si = source

        mov cx, 0xa000
        mov es, cx
        xor di, di             ;es:di = destination

        mov cx, 32000          ;32k words to fill the screen
        mov dx, 0x3da          ; VGA status register

        rep movsw

        mov sp, bp
        pop bp
        ret

The Asset Pipeline Problem

Sprite creation is normally straightforward: draw in GIMP, export, load. But Mode 13h has some complications:

- No libraries means writing your own image format parsers.
- 256 indexed colors is not trivial to work with in modern tools.
- Mode 13h does not have square pixels. 320x200 is stretched to 4:3 aspect ratio. What you see in GIMP is not what appears on screen.

The solution was to use Deluxe Paint, a sprite editor from the early 90's that runs in Mode 13h itself. The pixels display correctly, and the program is specifically designed for this graphics mode.

There is still a problem: the most modern format Deluxe Paint supports is PCX.

PCX → BMP → Binary

PCX is RLE encoded, which is annoying to decompress. The Makefile uses ImageMagick to convert to uncompressed BMP:

convert -compress none assets/$(1).PCX out/assets/$(1).bmp
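For comparison, this is roughly the decoding step that the conversion sidesteps. It is a sketch of PCX's run-length scheme only, not code from Kapow: it handles the raw scan-line data and ignores the 128-byte PCX header and the palette. The function name is made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes PCX run-length data from `in` into `out`.
 * A byte with its two top bits set is a run marker: the low 6 bits give
 * the repeat count and the next byte is the value. Any other byte is a
 * literal pixel. Returns the number of bytes written to `out`. */
size_t pcx_rle_decode(const uint8_t *in, size_t in_len,
                      uint8_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;
    while (i < in_len && o < out_cap) {
        uint8_t b = in[i++];
        if ((b & 0xC0) == 0xC0) {
            if (i >= in_len)
                break;                      /* truncated run */
            uint8_t count = b & 0x3F;
            uint8_t value = in[i++];
            while (count-- > 0 && o < out_cap)
                out[o++] = value;
        } else {
            out[o++] = b;
        }
    }
    return o;
}
```

Doing this in 16-bit assembly at load time is possible, but moving the work into the build step keeps the game code smaller.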

BMP is simpler, but still has issues: it contains header and palette data we do not need, and pixels are stored bottom-to-top. A small C program (src/assets/conv_asset.c) converts BMP to raw binary with top-to-bottom pixel order:

struct BMPHeader head;
fread(&head, sizeof(head), 1, source);
skip_bytes(head.offset-sizeof(head), source);

char out[height][width];

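// Read bottom-to-top (BMP format).
// Note: BMP rows are padded to a multiple of 4 bytes; this loop
// assumes width is already a multiple of 4.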
for(int i = height-1; i >= 0; i--){
  for(int z = 0; z < width; z++){
    out[i][z] = fgetc(source);
  }
}

// Write top-to-bottom (our format)
for(int i = 0; i < height; i++){
  for(int z = 0; z < width; z++){
    fputc(out[i][z], output);
  }
}
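BMP stores each row padded to a multiple of 4 bytes, which matters for sprites whose width is not a multiple of 4. A sketch of a row flip that accounts for that padding, for 8-bit images held entirely in memory. The function names are hypothetical and this is not the code in conv_asset.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes per stored row of an 8-bit BMP: rows are padded to a multiple of 4. */
size_t bmp_row_stride(size_t width)
{
    return (width + 3) & ~(size_t)3;
}

/* Copies bottom-up, padded BMP pixel data into a top-down, unpadded buffer.
 * `bmp` holds height * stride bytes; `out` must hold height * width bytes. */
void bmp_to_top_down(const uint8_t *bmp, uint8_t *out,
                     size_t width, size_t height)
{
    size_t stride = bmp_row_stride(width);
    for (size_t row = 0; row < height; row++) {
        const uint8_t *src = bmp + (height - 1 - row) * stride;
        memcpy(out + row * width, src, width);
    }
}
```

A 5-pixel-wide sprite, for instance, occupies 8 bytes per row in the file, and the 3 padding bytes must not end up in the output.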

Finally, the binary files are concatenated onto the disk image:

cat out/assets/$(1).bin >> out/HD.img

The Bootloader

When making a bootable program, the Bootloader is the first code to run. The BIOS loads the first 512 bytes from the storage device into memory at 0x7c00, then jumps to it.

Kapow's bootloader (bootup/booter.asm) does the following:

Load the game code from disk into memory:

mov ax, game_start
mov es, ax              ; segment address to copy to
mov ah, 0x2             ; read sectors from drive
mov al, 128             ; amount of sectors to read
mov ch, 0               ; cylinder
mov dh, 0               ; head
mov cl, 2               ; sector
mov dl, 0x80            ; disk
mov bx, 0               ; address to copy to
int 0x13

Set up the timer using the PIT to fire approximately 100 times per second:

mov al,00110100b          ;channel 0, lobyte/hibyte, rate generator
out 0x43, al

mov ax,PIT0_reload        ;ax = 16 bit reload value
out 0x40,al               ;Set low byte of PIT reload value
mov al,ah
out 0x40,al               ;Set high byte of PIT reload value
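The definition of PIT0_reload is not shown here. A sketch of how such a value can be derived, assuming the standard PIT input clock of about 1.193182 MHz; the function name is hypothetical.

```c
#include <stdint.h>

#define PIT_INPUT_HZ 1193182u  /* PIT input clock, about 1.193182 MHz */

/* Rounded reload value for a target interrupt rate. Meaningful for rates
 * from 19 Hz up to the input clock; slower rates do not fit in the
 * 16-bit counter. */
uint16_t pit_reload_for(uint32_t target_hz)
{
    return (uint16_t)((PIT_INPUT_HZ + target_hz / 2) / target_hz);
}
```

For 100 Hz this gives 11932. Because the input clock is not an exact multiple of 100, the resulting rate is very slightly below 100 Hz, which is why the tick rate is only approximate.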

Hook the timer by writing to the IVT. Vector 0x1C is the user timer tick, which the BIOS timer handler calls on every tick, so the game loop runs once per tick:

mov ax, game_start
mov [cs:0x1c*4+2], ax   ; segment of the handler (the IVT lives at 0000:0000, so this relies on cs = 0)
mov ax, game_tirq
mov [cs:0x1c*4], ax     ; offset of the handler
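The layout of an IVT entry can be sketched in C with a byte array standing in for the first kilobyte of memory. This is a model for illustration only; the names are hypothetical and nothing here touches real hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Each interrupt vector occupies 4 bytes in the IVT: a 16-bit offset
 * followed by a 16-bit segment, both little-endian. */
size_t ivt_entry_address(uint8_t vector)
{
    return (size_t)vector * 4;
}

/* Writes a handler address into `ivt`, a 1024-byte array modelling
 * physical addresses 0x000 to 0x3FF. */
void ivt_set(uint8_t *ivt, uint8_t vector, uint16_t segment, uint16_t offset)
{
    size_t base = ivt_entry_address(vector);
    ivt[base + 0] = (uint8_t)(offset & 0xFF);
    ivt[base + 1] = (uint8_t)(offset >> 8);
    ivt[base + 2] = (uint8_t)(segment & 0xFF);
    ivt[base + 3] = (uint8_t)(segment >> 8);
}
```

Vector 0x1C therefore sits at address 0x70, with the offset word first and the segment word at 0x72.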

Animations

The only real animation in Kapow is bomb explosions, of which there can be many on screen simultaneously. They are stored in a circular queue.

The Data Structure

Each explosion has a position and an animation state (a fixed-point integer representing the current frame):

%define explosion_state bomb_y+2*number_of_bombs
%define explosion_x explosion_state+2*number_of_explosions
%define explosion_y explosion_x+2*number_of_explosions
%define explosion_start_index explosion_y+2*number_of_explosions
%define explosion_end_index explosion_start_index+1
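The rendering code shifts the state right by 8 bits to get the frame number, which suggests an 8.8 fixed-point layout: the high byte is the frame and the low byte is the progress within it. A sketch under that assumption, with hypothetical function names:

```c
#include <stdint.h>

/* Whole-frame part of an 8.8 fixed-point animation state. */
uint16_t frame_index(uint16_t state)
{
    return state >> 8;
}

/* Byte offset of a frame inside a sheet of equally sized sprites
 * stored back to back. */
uint32_t frame_offset(uint16_t state, uint16_t width, uint16_t height)
{
    return (uint32_t)frame_index(state) * width * height;
}
```

Adding a small increment to the state every tick advances the animation at a fraction of a frame per tick without needing a separate counter.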

Creating an Explosion

When a bomb explodes, we add it to the queue. If the queue is full, the explosion is simply dropped:

; Starts an explosion animation where a bomb is
; ax - bomb index
explode_bomb:
        push bp
        mov bp, sp
        push si
        push di
        push bx

        mov si, ax

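        xor bx, bx
        mov bl, [cs:explosion_end_index]
        push bx                         ; remember the current end index
        inc bl                          ; bl = end + 1
        mov ax, 0
        cmp bl, number_of_explosions
        cmove bx, ax                    ; wrap around to 0 past the last slot
        cmp bl, [cs:explosion_start_index]
        pop bx                          ; restore end index (pop leaves flags intact)
        je expb_e                       ; end + 1 == start: queue full, drop it

        mov di, bx                      ; di = slot to fill
        inc byte [cs:explosion_end_index]

        cmp di, number_of_explosions-1
        jne expb_c
        mov byte [cs:explosion_end_index], 0    ; wrap the stored end index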

expb_c:
        shl di, 1
        shl si, 1
        mov ax, [cs:bomb_x+si]
        mov [cs:explosion_x+di], ax
        mov ax, [cs:bomb_y+si]
        mov [cs:explosion_y+di], ax
        mov word [cs:explosion_state+di], 0

expb_e:
        pop bx
        pop di
        pop si
        mov sp, bp
        pop bp
        ret
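The same queue logic reads more compactly in C. This is a sketch of the general technique, not a translation of Kapow's data layout: the struct, the function names and the capacity of 8 are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SLOTS 8  /* hypothetical capacity; Kapow's value is not shown */

struct explosion_queue {
    uint8_t start;               /* index of the oldest entry */
    uint8_t end;                 /* index of the next free slot */
    uint16_t x[QUEUE_SLOTS];
    uint16_t y[QUEUE_SLOTS];
    uint16_t state[QUEUE_SLOTS];
};

/* Appends an entry; returns false (dropping it) when the queue is full.
 * One slot always stays empty so that start == end means "empty". */
bool queue_push(struct explosion_queue *q, uint16_t x, uint16_t y)
{
    uint8_t next = (uint8_t)((q->end + 1) % QUEUE_SLOTS);
    if (next == q->start)
        return false;
    q->x[q->end] = x;
    q->y[q->end] = y;
    q->state[q->end] = 0;
    q->end = next;
    return true;
}

/* Removes the oldest entry; returns false if the queue is empty. */
bool queue_pop(struct explosion_queue *q)
{
    if (q->start == q->end)
        return false;
    q->start = (uint8_t)((q->start + 1) % QUEUE_SLOTS);
    return true;
}
```

Keeping one slot unused means a queue with N slots holds at most N - 1 explosions, but it avoids needing a separate element count to tell a full queue from an empty one.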

Updating and Rendering

Each tick, explosions advance through their frames. When an explosion completes, it is removed from the queue. The rendering code iterates through the queue and draws each explosion at the correct frame:

render_explosions:
        push bp
        mov bp, sp

        xor ax, ax
        mov al, [cs:explosion_start_index]
        xor di, di
        mov di, ax

        xor cx, cx
        mov cl, [cs:explosion_end_index]

r_ex_l:
        cmp di, cx
        je r_ex_e

        push cx
        push di

        shl di, 1

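        mov dx, bomb_width*bomb_height  ; bytes per animation frame
        mov ax, [cs:explosion_state+di]
        shr ax, 8                       ; high byte of the state = frame number
        mul dx                          ; ax = frame * bytes per frame (clobbers dx)
        mov dx, expls_loc
        add dx, ax                      ; dx = expls_loc + offset of this frame

        mov ax, [cs:explosion_y+di]
        shr ax, 4                       ; scale the stored y down by 16
        mov bx, [cs:explosion_x+di]
        mov ch, bomb_width
        mov cl, bomb_height
        call draw_sprite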

        pop di
        pop cx

        inc di
        cmp di, number_of_explosions
        jne r_ex_l
        mov di, 0
        jmp r_ex_l

r_ex_e:
        mov sp, bp
        pop bp
        ret

What I Learned

Writing a game without an operating system teaches you what operating systems actually do for you. Memory management, file loading, graphics drivers, input handling—all of it has to be built from scratch.

The biggest surprise was the tooling problems. Modern assemblers assume modern environments. Working around NASM's 16-bit limitations took more time than expected.

Would I do it again? For a university bonus assignment, probably not. But it was a good way to understand what happens between pressing the power button and seeing something on screen.

At the end of the course there was a competition where everyone voted on each other's games. Kapow won by popular vote, so I suppose the extra effort paid off.

The source code is on GitHub.