Binary encoding

April 29, 2017

Status: Initial draft

The virtual machine doesn't have a name yet, but to keep it short, we refer to it as the machine.

This document contains the specification of the binary encoding of instructions, immediate values, constants, registers, jump addresses, block ordering and global constants.

Content ordering

+----------+
| Header   |
+----------+
| Segments |
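+----------+

The header is structured as follows:

| Name            | Offset | Size                 | Description                         |
|-----------------|--------|----------------------|-------------------------------------|
| magic           | 0x00   | 4                    | ASCII-encoded string NICE           |
| entry_addr      | 0x04   | 4                    | Initial value of the ip register    |
| load_table_size | 0x08   | 4                    | Number of entries in the load table |
| load_table      | 0x0c   | load_table_size * 12 | Load table                          |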
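
As an illustration of the header layout, the following Python sketch reads the three fixed fields and slices out the raw load table. The function name `parse_header` and its error handling are hypothetical. The byte order is an assumption as well: this document does not state one, so the sketch defaults to little-endian and lets the caller override it.

```python
import struct

MAGIC = b"NICE"
ENTRY_SIZE = 12         # each load table entry occupies 12 bytes
LOAD_TABLE_START = 0x0C  # offset of load_table inside the header


def parse_header(data: bytes, byteorder: str = "<"):
    """Return (entry_addr, load_table_size, raw_load_table).

    byteorder is an ASSUMPTION ("<" = little-endian); the spec does not
    name the byte order of multi-byte fields.
    """
    if len(data) < LOAD_TABLE_START:
        raise ValueError("header is shorter than 12 bytes")
    # 4s = 4 raw bytes (magic), I = unsigned 32-bit integer
    magic, entry_addr, load_table_size = struct.unpack_from(
        byteorder + "4sII", data, 0
    )
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic!r}")
    table_end = LOAD_TABLE_START + load_table_size * ENTRY_SIZE
    if len(data) < table_end:
        raise ValueError("load table is truncated")
    return entry_addr, load_table_size, data[LOAD_TABLE_START:table_end]
```

A header with one (zeroed) load table entry could then be read with `parse_header(b"NICE" + struct.pack("<II", 0x100, 1) + bytes(12))`.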

Load table

The load table in the program's header section contains one or more entries. Each entry is structured as follows:

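| Name   | Offset | Size | Description                       |
|--------|--------|------|-----------------------------------|
| Offset | 0x0    | 4    | Start offset of section           |
| Size   | 0x4    | 4    | Size of section                   |
| Load   | 0x8    | 4    | Target offset in machine's memory |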
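
A minimal sketch of how the 12-byte entries could be decoded in Python. `LoadEntry` and `parse_load_table` are hypothetical names, and little-endian byte order is an assumption, since this document does not specify one.

```python
import struct
from typing import List, NamedTuple


class LoadEntry(NamedTuple):
    offset: int  # start offset of the section
    size: int    # size of the section in bytes
    load: int    # target offset in the machine's memory


def parse_load_table(table: bytes, byteorder: str = "<") -> List[LoadEntry]:
    """Split a raw load table into entries of three unsigned 32-bit fields.

    byteorder is an ASSUMPTION ("<" = little-endian).
    """
    if len(table) % 12 != 0:
        raise ValueError("load table length is not a multiple of 12")
    return [
        LoadEntry(*fields)
        for fields in struct.iter_unpack(byteorder + "III", table)
    ]
```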

Given the following load table:

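| ID | Offset     | Size       | Load       |
|----|------------|------------|------------|
| 0  | 0x00000000 | 0x000002aa | 0x00337a00 |
| 1  | 0x000002aa | 0x00337a00 | 0x00000000 |
| 2  | 0x00337caa | 0x00000050 | 0x00000050 |
| 3  | 0x00337cfa | 0x00000100 | 0x00700000 |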

The executable would consist of four segments.

+-----------+ <- 0x00000000
| Segment 0 |
+-----------+ <- 0x000002aa
| Segment 1 |
+-----------+ <- 0x00337caa
| Segment 2 |
+-----------+ <- 0x00337cfa
| Segment 3 |
+-----------+ <- 0x00337dfa

This is how they would ultimately be laid out in memory:

+-------------+ <- 0x00000000
| Segment 1   |
|             |
+-------------+ <- 0x00000050
| Segment 2   | <----------------- Because the load address of Segment 2 is set to 0x00000050,
+-------------+ <- 0x000000a0      it overlaps Segment 1 which was loaded here before.
| Segment 1   |
|             |
|             |
|             |
|             |
|             |
|             |
+-------------+ <- 0x00337a00
| Segment 0   |
|             |
|             |
+-------------+ <- 0x00337caa
|             |
|             |
|             |
|             |
|             |
|             |
|     ...     |
| Empty space |
|     ...     |
|             |
|             |
|             |
|             |
|             |
|             |
+-------------+ <- 0x00700000
| Segment 3   |
+-------------+ <- 0x00700100
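
The layout above can be reproduced with a short Python sketch that copies the segments into a zeroed memory in table order, so that later entries overwrite earlier ones. `load_segments` is a hypothetical helper. Two assumptions are made: segment offsets are counted from the start of the segment data (as in the first diagram), and the memory is exactly large enough to hold Segment 3.

```python
from typing import Iterable, Tuple


def load_segments(
    image: bytes, table: Iterable[Tuple[int, int, int]], memory_size: int
) -> bytearray:
    """Copy each (offset, size, load) entry from image into zeroed memory.

    Entries are processed in table order, so a later segment overwrites
    any earlier segment it overlaps.
    """
    memory = bytearray(memory_size)
    for offset, size, load in table:
        if offset + size > len(image):
            raise ValueError("segment extends past the end of the image")
        if load + size > memory_size:
            raise ValueError("segment does not fit into memory")
        memory[load:load + size] = image[offset:offset + size]
    return memory


# The example load table: (offset, size, load).
TABLE = [
    (0x00000000, 0x000002AA, 0x00337A00),
    (0x000002AA, 0x00337A00, 0x00000000),
    (0x00337CAA, 0x00000050, 0x00000050),
    (0x00337CFA, 0x00000100, 0x00700000),
]

# Fill segment N with the byte value N + 1 so its origin stays visible.
IMAGE = b"".join(bytes([i + 1]) * size for i, (_, size, _) in enumerate(TABLE))
MEMORY = load_segments(IMAGE, TABLE, 0x00700100)
```

With this marker scheme, `MEMORY[0x50]` holds Segment 2's marker (3), while `MEMORY[0x4f]` and `MEMORY[0xa0]` still hold Segment 1's marker (2).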

Registers

Registers are represented as 8-bit values. The two most significant bits encode the mode; the remaining six bits are the register code.

   +- Register code
   |
   v
00 000000
^
|
+- Mode

Register modes define which part of the register is being accessed.

00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
01: 00000000 00000000 00000000 00000000
10: 00000000 00000000
11: 00000000
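
The register byte can be taken apart with a shift and a mask. The Python sketch below assumes that the mode occupies the two most significant bits, as drawn in the diagram, and reads the access widths (64, 32, 16 and 8 bits) off the mode listing above. The names `decode_register` and `encode_register` are hypothetical.

```python
# Access width in bits for each mode, read off the mode listing.
MODE_WIDTH_BITS = {0b00: 64, 0b01: 32, 0b10: 16, 0b11: 8}


def decode_register(byte: int):
    """Split a register byte into (mode, code, width_in_bits).

    ASSUMPTION: the mode is stored in the two most significant bits.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("register byte must fit into 8 bits")
    mode = byte >> 6    # upper 2 bits
    code = byte & 0x3F  # lower 6 bits
    return mode, code, MODE_WIDTH_BITS[mode]


def encode_register(mode: int, code: int) -> int:
    """Combine a 2-bit mode and a 6-bit register code into one byte."""
    if not 0 <= mode <= 0b11:
        raise ValueError("mode must fit into 2 bits")
    if not 0 <= code <= 0x3F:
        raise ValueError("register code must fit into 6 bits")
    return (mode << 6) | code
```

For example, `decode_register(0b01000011)` yields mode `0b01`, register code 3 and a width of 32 bits.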

Instructions

Instructions are represented as 8-bit values.

+- Opcode
|
v
00000000

Size specifiers

Size specifiers are encoded as unsigned dword values and denote a number of bytes in an instruction argument.

Addresses

Addresses are encoded as unsigned dword values.

Offsets

Relative offsets are encoded as signed dword values.
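
The difference between the unsigned encoding (size specifiers, addresses) and the signed encoding (relative offsets) can be sketched in Python with the `struct` module. Two assumptions are made here: a dword is 32 bits wide, and values are stored little-endian. This document states neither explicitly. The function names are hypothetical.

```python
import struct


def encode_unsigned_dword(value: int) -> bytes:
    """Encode a size specifier or address (range 0 .. 2**32 - 1).

    ASSUMPTION: 32-bit, little-endian.
    """
    return struct.pack("<I", value)


def decode_unsigned_dword(data: bytes) -> int:
    return struct.unpack("<I", data)[0]


def encode_offset(value: int) -> bytes:
    """Encode a relative offset (range -2**31 .. 2**31 - 1).

    ASSUMPTION: 32-bit two's complement, little-endian.
    """
    return struct.pack("<i", value)


def decode_offset(data: bytes) -> int:
    return struct.unpack("<i", data)[0]
```

The same four bytes decode differently depending on the field type: `b"\xff\xff\xff\xff"` is `0xFFFFFFFF` as an address but `-1` as a relative offset.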