Bare Bones Emulator

From CPUDev Wiki

This tutorial shows how to write a simple CPU emulator for the Bare Bones ISA. We will use C as the programming language.

To test the emulator, we will also write a ROM image that will contain code to read a byte from a predefined memory address, increment it and write it to the TTY device (emulator's stdout).

Programming Environment

We will use gcc as the compiler (available in most Linux distros). We will use -Wall -Wextra and possibly -Wshadow for warnings, as these are the minimum warning flags that should be passed to gcc.

Programming Practices

Return values

By our convention, most functions return 1 on success and -1 on failure. The only exception is main(), whose return value becomes the exit status of the process: there, success is 0 and failure is 1.

Error reporting

By our convention, an error is reported in the most deeply nested function that detects it. That function knows the most about what went wrong, so reporting there avoids losing information about the error.
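To make the two conventions concrete, here is a minimal sketch. The functions check_even(), check_pair() and exit_status() are made up for this illustration and are not part of the emulator: the innermost function prints the message, the caller only propagates the failure, and the result is finally mapped to what main() would return.

```c
#include <stdio.h>

/* Hypothetical leaf function: it detects the problem, so it reports it. */
int check_even(int value)
{
	if (value % 2 != 0)
	{
		fprintf(stderr, "error: %d is not even\n", value);
		return -1;
	}
	return 1;
}

/* Hypothetical caller: it only propagates the failure, without a second message. */
int check_pair(int a, int b)
{
	if (check_even(a) < 0)
	{
		return -1;
	}
	if (check_even(b) < 0)
	{
		return -1;
	}
	return 1;
}

/* What main() would return for a given result: 0 on success, 1 on failure. */
int exit_status(int result)
{
	return (result < 0) ? 1 : 0;
}
```

With these definitions, check_pair(2, 3) prints a single message (from check_even()) and returns -1, which exit_status() turns into 1.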

Bitfields vs. Masks and Shifts

The C standard does not specify the order of bitfields in a variable, which makes them mostly useless here. So we will resort to using masks and shifts.
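As a small sketch of the mask-and-shift approach, assuming the 16-bit instruction word with 4-bit fields that is used later in this tutorial, a helper could look like the following. The name nibble() is our own invention for this example.

```c
#include <stdint.h>

/* Extract the 4-bit field whose least significant bit is at position `shift`. */
uint8_t nibble(uint16_t word, unsigned shift)
{
	/* Shift the field down to bit 0, then mask away everything above it. */
	return (uint8_t)((word >> shift) & 0xF);
}
```

For the word 0x5223, nibble(0x5223, 12) gives 0x5, nibble(0x5223, 8) gives 0x2, nibble(0x5223, 4) gives 0x2 and nibble(0x5223, 0) gives 0x3, regardless of how the compiler would have laid out a bitfield.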

Implementation

Let's start implementing then!

Emulator Code

The first lines contain the boring #include stuff.

#include <sys/stat.h> /* Need stat() */

#include <errno.h> /* Need errno */
#include <stddef.h> /* Need size_t */
#include <stdint.h> /* Need uintN_t */
#include <stdio.h> /* Need much stdio stuff */
#include <string.h> /* Need strerror() */

The next lines contain various variables and statically allocated arrays:

  • regs: We have defined an array of 16 elements, each representing the value of an 8-bit register.
  • ip: We have defined an 8-bit variable, representing the instruction pointer. Note that its initial value is 0xC0 since we have defined in the Bare Bones ISA that execution starts from 0xC0.
  • ram: We have defined an array of 0x80 elements, each representing a byte in the RAM device. Ironically, this device will not be used by the emulated code provided later in this tutorial.
  • rom: We have defined an array of 0x40 elements, each representing a byte in the ROM device. (TODO: Should it be const?)

uint8_t regs[16]; /* CPU registers */
uint8_t ip = 0xC0; /* CPU instruction pointer */

uint8_t ram[0x80]; /* Statically allocated RAM memory */
uint8_t rom[0x40]; /* Statically allocated ROM memory */

Then we have the memread() function. Notice that reading from 0x80 (the write-only TTY) causes an error, as does reading from nonexistent memory (0x81 to 0xBF).

int memread(uint8_t addr, uint8_t* data)
{
	if (addr < 0x80)
	{
		*data = ram[addr];
		return 1;
	}
	if (addr == 0x80)
	{
		fprintf(stderr, "error: Invalid memory read from write-only address 0x%02hhX\n", addr);
		errno = EINVAL;
		return -1;
	}
	if (addr >= 0xC0)
	{
		*data = rom[addr - 0xC0];
		return 1;
	}
	fprintf(stderr, "error: Invalid memory read from invalid address 0x%02hhX\n", addr);
	errno = EINVAL;
	return -1;
}

Next comes the memwrite() function. Notice that writing to 0x80 actually writes to the emulator's stdout and that writing to the ROM (0xC0 and above) causes an error. Writing to nonexistent memory (0x81 to 0xBF) also causes an error.

int memwrite(uint8_t addr, uint8_t data)
{
	if (addr < 0x80)
	{
		ram[addr] = data;
		return 1;
	}
	if (addr == 0x80)
	{
		if (fputc(data, stdout) != EOF)
		{
			return 1;
		}
		fprintf(stderr, "error: stdout: %s\n", strerror(errno));
		return -1;
	}
	if (addr >= 0xC0)
	{
		fprintf(stderr, "error: Invalid memory write to read-only address 0x%02hhX\n", addr);
		errno = EINVAL;
		return -1;
	}
	fprintf(stderr, "error: Invalid memory write to invalid address 0x%02hhX\n", addr);
	errno = EINVAL;
	return -1;
}
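Both memread() and memwrite() walk through the same memory map. As a sketch only, that map can be written down once as a classification function. The enum region and the function classify() are hypothetical names that do not appear in the emulator itself; the address ranges are the ones the two functions above test for.

```c
#include <stdint.h>

/* Hypothetical names for the regions of the 8-bit address space. */
enum region
{
	REGION_RAM,      /* 0x00..0x7F: readable and writable */
	REGION_TTY,      /* 0x80: write-only */
	REGION_UNMAPPED, /* 0x81..0xBF: no device */
	REGION_ROM       /* 0xC0..0xFF: read-only */
};

/* Map an address to its region, in the same order as memread() tests it. */
enum region classify(uint8_t addr)
{
	if (addr < 0x80)
	{
		return REGION_RAM;
	}
	if (addr == 0x80)
	{
		return REGION_TTY;
	}
	if (addr >= 0xC0)
	{
		return REGION_ROM;
	}
	return REGION_UNMAPPED;
}
```

Whether such a helper is worth having is a matter of taste; for a map this small, the explicit if chains above are just as readable.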

The init() function only has to take care of loading the ROM image into the ROM array. However, to understand the details, a moderate knowledge of the C/POSIX standard library will be required. The procedure is outlined as follows:

  • The ROM image file is opened.
  • Its size is determined.
  • If it's bigger than 64 bytes:
    • A warning is printed and only the first 64 bytes of the image are read into the ROM array (as there is no more room in the ROM array).
  • Otherwise:
    • The full image is read into the ROM array (as the image fits into the ROM array).
  • The ROM image file is closed.

int init(void)
{
	/* Currently we only need to initialise contents of ROM */
	FILE* fp = fopen("rom.img", "rb"); /* "b": the image is raw bytes, not text */
	if (!fp)
	{
		fprintf(stderr, "error: rom.img: %s\n", strerror(errno));
		return -1;
	}
	struct stat st;
	if (stat("rom.img", &st) < 0)
	{
		fprintf(stderr, "error: rom.img: %s\n", strerror(errno));
		fclose(fp);
		return -1;
	}
	size_t size = st.st_size;
	if (st.st_size > 64)
	{
		fprintf(stderr, "warning: rom.img larger than 64 bytes\n");
		size = 64;
	}
	if (fread(rom, 1, size, fp) < size)
	{
		fprintf(stderr, "error: rom.img: %s\n", strerror(errno));
		fclose(fp);
		return -1;
	}
	fclose(fp);
	return 1;
}
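If stat() is not available, a similar loader can be sketched with the C standard library alone: ask fread() for at most as many bytes as the buffer holds and let it stop at the end of the file. The helper load_image() below is hypothetical. Unlike init(), it returns the number of bytes read instead of 1, and it does not warn about oversized images.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read at most `cap` bytes of `path` into `buf`.
 * Returns the number of bytes read, or -1 on failure. */
long load_image(const char* path, uint8_t* buf, size_t cap)
{
	FILE* fp = fopen(path, "rb");
	if (!fp)
	{
		perror(path);
		return -1;
	}
	/* A short count is not an error by itself: the file may simply be small. */
	size_t n = fread(buf, 1, cap, fp);
	if (ferror(fp))
	{
		fprintf(stderr, "error: %s: read failed\n", path);
		fclose(fp);
		return -1;
	}
	fclose(fp);
	return (long)n;
}
```

In the emulator, a call such as load_image("rom.img", rom, sizeof rom) would fill the ROM array.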

The emu() function is the central function of the emulator. It's also more verbose. For these two reasons, it will be explained more thoroughly than other functions.

int emu(void)
{

We start with an infinite loop. Inside this loop we will fetch, decode and execute instructions.

	while (1)
	{

Let's read the current instruction. Remembering that each instruction occupies two bytes, we read the two bytes separately and we reconstruct them into a 16-bit word (as big endian).

		uint8_t ins_high;
		uint8_t ins_low;
		if (memread(ip + 0, &ins_high) < 0)
		{
			return -1;
		}
		if (memread(ip + 1, &ins_low) < 0)
		{
			return -1;
		}
		uint16_t ins = (ins_high << 8) | (ins_low << 0);
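As an aside, the reconstruction step can be isolated into a tiny function to see what it does to concrete bytes. The name be16() is made up for this sketch. On the usual platforms where int is 32 bits wide, the uint8_t operand is promoted to int before the shift, so no bits are lost.

```c
#include <stdint.h>

/* Combine two bytes into a 16-bit word, high byte first (big endian). */
uint16_t be16(uint8_t high, uint8_t low)
{
	/* `high` is promoted to int before shifting, so bits 15:8 survive. */
	return (uint16_t)((high << 8) | low);
}
```

For example, be16(0x40, 0xCE) yields 0x40CE: the byte at the lower address ends up in the upper half of the word.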

Let's isolate the opcode. As the opcode takes bits 15:12, all we need to do is to mask out the rest of the bits and right-shift the masked value by 12 bits (note that the masking is actually redundant here, since the shift alone discards the lower bits, but we still do it for clarity).

		uint8_t opcode = (ins & 0xF000) >> 12;

Let's handle HLT. Nothing special apart from breaking from the infinite loop.

		if (opcode == 0x1)
		{
			/* HLT */
			break;
		}

Let's handle LDA. We isolate the DEST register index (bits 11:8) and the SRC register index (bits 7:4). We read a byte from the memory address contained in SRC register into the DEST register.

		if (opcode == 0x2)
		{
			/* LDA */
			uint8_t dest = (ins & 0x0F00) >> 8;
			uint8_t src = (ins & 0x00F0) >> 4;
			if (memread(regs[src], &regs[dest]) < 0)
			{
				return -1;
			}
		}

Let's handle STA. We isolate the DEST register index (bits 11:8) and the SRC register index (bits 7:4). We write a byte from the SRC register into the memory address contained in the DEST register.

		if (opcode == 0x3)
		{
			/* STA */
			uint8_t dest = (ins & 0x0F00) >> 8;
			uint8_t src = (ins & 0x00F0) >> 4;
			if (memwrite(regs[dest], regs[src]) < 0)
			{
				return -1;
			}
		}

Let's handle LDI. We isolate the DEST register index (bits 11:8) and the SRC immediate (bits 7:0). We write the value of the SRC immediate into the DEST register.

		if (opcode == 0x4)
		{
			/* LDI */
			uint8_t dest = (ins & 0x0F00) >> 8;
			uint8_t src = (ins & 0x00FF) >> 0;
			regs[dest] = src;
		}

Let's handle ADD. We isolate the DEST register index (bits 11:8), the SRC1 register index (bits 7:4) and the SRC2 register index (bits 3:0). Then we add together the values of the SRC1 and SRC2 registers and we write the result into the DEST register.

		if (opcode == 0x5)
		{
			/* ADD */
			uint8_t dest = (ins & 0x0F00) >> 8;
			uint8_t src1 = (ins & 0x00F0) >> 4;
			uint8_t src2 = (ins & 0x000F) >> 0;
			regs[dest] = regs[src1] + regs[src2];
		}
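One detail of the ADD code above: the registers are uint8_t, so storing the sum keeps only its low 8 bits. The sketch below isolates that behaviour in a made-up function add8(). Note that this describes what the C code does; whether the Bare Bones ISA intends wraparound is not stated in this tutorial.

```c
#include <stdint.h>

/* Add two 8-bit values the way the ADD handler stores its result. */
uint8_t add8(uint8_t a, uint8_t b)
{
	/* a + b is computed as int; converting to uint8_t keeps the low 8 bits. */
	return (uint8_t)(a + b);
}
```

So add8(0x41, 1) is 0x42, while add8(0xFF, 1) wraps around to 0x00.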

Let's handle an invalid opcode. We just error out.

		if (opcode >= 0x6)
		{
			/* INVAL */
			fprintf(stderr, "error: invalid opcode at address 0x%02hhX\n", ip);
			errno = EINVAL;
			return -1;
		}

Here we increment the instruction pointer by two, so that it points to the next instruction.

		ip += 2;

And we finish the infinite loop body.

	}

And we finally return successfully.

	return 1;
}
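The chain of if statements could also be written as a switch on the opcode. The following is only a sketch of that alternative, not the emulator's code: step() and enum step_result are hypothetical names, the function works on a register array passed in by the caller, and LDA and STA are left out so that the example needs no memory devices. Every opcode it does not know is reported as invalid.

```c
#include <stdint.h>

/* Hypothetical result codes for one executed instruction. */
enum step_result
{
	STEP_OK,
	STEP_HALT,
	STEP_INVALID
};

/* Execute one register-only instruction `ins` on the 16 registers in `r`.
 * LDA and STA are deliberately omitted from this sketch. */
enum step_result step(uint8_t r[16], uint16_t ins)
{
	uint8_t dest = (ins & 0x0F00) >> 8;
	switch ((ins & 0xF000) >> 12)
	{
	case 0x1: /* HLT */
		return STEP_HALT;
	case 0x4: /* LDI: bits 7:0 are the immediate */
		r[dest] = ins & 0x00FF;
		return STEP_OK;
	case 0x5: /* ADD: bits 7:4 and 3:0 are the source registers */
		r[dest] = r[(ins & 0x00F0) >> 4] + r[ins & 0x000F];
		return STEP_OK;
	default: /* Not handled by this sketch */
		return STEP_INVALID;
	}
}
```

A caller would fetch an instruction, pass it to step(), and advance the instruction pointer by two as long as STEP_OK comes back.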

Lastly, we have the main() function.

int main(void)
{
	if (init() < 0)
	{
		return 1;
	}
	if (emu() < 0)
	{
		return 1;
	}
	return 0;
}

Let's put it all together now.

#include <sys/stat.h> /* Need stat() */

#include <errno.h> /* Need errno */
#include <stddef.h> /* Need size_t */
#include <stdint.h> /* Need uintN_t */
#include <stdio.h> /* Need much stdio stuff */
#include <string.h> /* Need strerror() */

uint8_t regs[16]; /* CPU registers */
uint8_t ip = 0xC0; /* CPU instruction pointer */

uint8_t ram[0x80]; /* Statically allocated RAM memory */
uint8_t rom[0x40]; /* Statically allocated ROM memory */

int memread(uint8_t addr, uint8_t* data)
{
	if (addr < 0x80)
	{
		*data = ram[addr];
		return 1;
	}
	if (addr == 0x80)
	{
		fprintf(stderr, "error: Invalid memory read from write-only address 0x%02hhX\n", addr);
		errno = EINVAL;
		return -1;
	}
	if (addr >= 0xC0)
	{
		*data = rom[addr - 0xC0];
		return 1;
	}
	fprintf(stderr, "error: Invalid memory read from invalid address 0x%02hhX\n", addr);
	errno = EINVAL;
	return -1;
}

int memwrite(uint8_t addr, uint8_t data)
{
	if (addr < 0x80)
	{
		ram[addr] = data;
		return 1;
	}
	if (addr == 0x80)
	{
		if (fputc(data, stdout) != EOF)
		{
			return 1;
		}
		fprintf(stderr, "error: stdout: %s\n", strerror(errno));
		return -1;
	}
	if (addr >= 0xC0)
	{
		fprintf(stderr, "error: Invalid memory write to read-only address 0x%02hhX\n", addr);
		errno = EINVAL;
		return -1;
	}
	fprintf(stderr, "error: Invalid memory write to invalid address 0x%02hhX\n", addr);
	errno = EINVAL;
	return -1;
}

int init(void)
{
	/* Currently we only need to initialise contents of ROM */
	FILE* fp = fopen("rom.img", "rb"); /* "b": the image is raw bytes, not text */
	if (!fp)
	{
		fprintf(stderr, "error: rom.img: %s\n", strerror(errno));
		return -1;
	}
	struct stat st;
	if (stat("rom.img", &st) < 0)
	{
		fprintf(stderr, "error: rom.img: %s\n", strerror(errno));
		fclose(fp);
		return -1;
	}
	size_t size = st.st_size;
	if (st.st_size > 64)
	{
		fprintf(stderr, "warning: rom.img larger than 64 bytes\n");
		size = 64;
	}
	if (fread(rom, 1, size, fp) < size)
	{
		fprintf(stderr, "error: rom.img: %s\n", strerror(errno));
		fclose(fp);
		return -1;
	}
	fclose(fp);
	return 1;
}

int emu(void)
{
	while (1)
	{
		uint8_t ins_high;
		uint8_t ins_low;
		if (memread(ip + 0, &ins_high) < 0)
		{
			return -1;
		}
		if (memread(ip + 1, &ins_low) < 0)
		{
			return -1;
		}
		uint16_t ins = (ins_high << 8) | (ins_low << 0);
		uint8_t opcode = (ins & 0xF000) >> 12;
		if (opcode == 0x1)
		{
			/* HLT */
			break;
		}
		if (opcode == 0x2)
		{
			/* LDA */
			uint8_t dest = (ins & 0x0F00) >> 8;
			uint8_t src = (ins & 0x00F0) >> 4;
			if (memread(regs[src], &regs[dest]) < 0)
			{
				return -1;
			}
		}
		if (opcode == 0x3)
		{
			/* STA */
			uint8_t dest = (ins & 0x0F00) >> 8;
			uint8_t src = (ins & 0x00F0) >> 4;
			if (memwrite(regs[dest], regs[src]) < 0)
			{
				return -1;
			}
		}
		if (opcode == 0x4)
		{
			/* LDI */
			uint8_t dest = (ins & 0x0F00) >> 8;
			uint8_t src = (ins & 0x00FF) >> 0;
			regs[dest] = src;
		}
		if (opcode == 0x5)
		{
			/* ADD */
			uint8_t dest = (ins & 0x0F00) >> 8;
			uint8_t src1 = (ins & 0x00F0) >> 4;
			uint8_t src2 = (ins & 0x000F) >> 0;
			regs[dest] = regs[src1] + regs[src2];
		}
		if (opcode >= 0x6)
		{
			/* INVAL */
			fprintf(stderr, "error: invalid opcode at address 0x%02hhX\n", ip);
			errno = EINVAL;
			return -1;
		}
		ip += 2;
	}
	return 1;
}

int main(void)
{
	if (init() < 0)
	{
		return 1;
	}
	if (emu() < 0)
	{
		return 1;
	}
	return 0;
}

Assuming our source is in emu.c and we want to have an executable named emu, we need the following command to compile the emulator:

gcc -Wall -Wextra -Wshadow -o emu emu.c

Let's run the emulator now with:

./emu

However, the following error message will be output:

error: rom.img: No such file or directory

We need to create the ROM image!

ROM Hex Code

Assuming an "opcode[ dest[, src[, src2]]]" syntax:

LDI R0, 0xCE      # Load the address of the predefined byte into R0
LDI R1, 0x80      # Load the address of the TTY into R1
LDA R2, R0        # Load the predefined byte into R2
LDI R3, 1         # Load the constant 1 into R3
ADD R2, R2, R3    # Increment R2
STA R1, R2        # Store the value from R2 into TTY
HLT               # Halt
DB 0x41           # Predefined byte 0x41='A'

If we assemble this by hand (we have no assembler written yet), we get:

LDI R0, 0xCE      # 0x4=LDI 0x0=R0   0xCE=0xCE
LDI R1, 0x80      # 0x4=LDI 0x1=R1   0x80=0x80
LDA R2, R0        # 0x2=LDA 0x2=R2   0x0=0x0   0xX=RESV
LDI R3, 1         # 0x4=LDI 0x3=R3   0x01=1
ADD R2, R2, R3    # 0x5=ADD 0x2=R2   0x2=R2    0x3=R3
STA R1, R2        # 0x3=STA 0x1=R1   0x2=R2    0xX=RESV
HLT               # 0x1=HLT 0xX=RESV 0xX=RESV  0xX=RESV
DB 0x41           # 0x41
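Hand assembly is easy to get wrong, so here is a sketch of two encoding helpers that pack the fields shown in the comments above. The names enc_ldi() and enc_reg() are our own, and unused (RESV) fields are assumed to be passed as 0.

```c
#include <stdint.h>

/* Encode LDI: opcode 0x4 in bits 15:12, DEST in bits 11:8, immediate in bits 7:0. */
uint16_t enc_ldi(uint8_t dest, uint8_t imm)
{
	return (uint16_t)((0x4 << 12) | ((dest & 0xF) << 8) | imm);
}

/* Encode a register-format instruction (HLT, LDA, STA, ADD):
 * opcode in bits 15:12, DEST in 11:8, SRC1 in 7:4, SRC2 in 3:0.
 * Pass 0 for fields the instruction does not use. */
uint16_t enc_reg(uint8_t opcode, uint8_t dest, uint8_t src1, uint8_t src2)
{
	return (uint16_t)(((opcode & 0xF) << 12) | ((dest & 0xF) << 8) |
	                  ((src1 & 0xF) << 4) | (src2 & 0xF));
}
```

For instance, enc_ldi(0, 0xCE) gives 0x40CE for "LDI R0, 0xCE", and enc_reg(0x5, 2, 2, 3) gives 0x5223 for "ADD R2, R2, R3". Each word is stored high byte first.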

So we enter the following bytes in a hex editor (assuming RESV bits are all 0):

0x40 0xCE 0x41 0x80 0x22 0x00 0x43 0x01 0x52 0x23 0x31 0x20 0x10 0x00 0x41

Then we save the file under the name rom.img.
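If no hex editor is at hand, the same file can be produced by a few lines of C. This is a minimal sketch: write_image() is a hypothetical helper, and the array simply repeats the 15 bytes listed above.

```c
#include <stdint.h>
#include <stdio.h>

/* The 15 bytes of the hand-assembled program, including the data byte. */
static const uint8_t image[] = {
	0x40, 0xCE, 0x41, 0x80, 0x22, 0x00, 0x43, 0x01,
	0x52, 0x23, 0x31, 0x20, 0x10, 0x00, 0x41
};

/* Write the image to `path`. Returns 1 on success and -1 on failure. */
int write_image(const char* path)
{
	FILE* fp = fopen(path, "wb");
	if (!fp)
	{
		perror(path);
		return -1;
	}
	size_t n = fwrite(image, 1, sizeof image, fp);
	/* fclose() flushes buffered data, so its result matters as well. */
	if (fclose(fp) != 0 || n != sizeof image)
	{
		fprintf(stderr, "error: %s: write failed\n", path);
		return -1;
	}
	return 1;
}
```

Calling write_image("rom.img") from a small main() creates the image in the current directory.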

Now, if we run the emulator in the same way as above, it will seem like it did nothing at first glance. However, there will be a 'B' character before the prompt for the next command. Yes, you guessed right, that's the 0x41 ('A') predefined byte incremented to 0x42 ('B').

Note that in some terminals there might be additional characters after the 'B' character, probably because the output is not terminated by '\n'. If in doubt, try redirecting stdout to a file on disk.

Next Steps

For starters, you might want to extend the ISA and play a bit with the emulator code. In the future, however, you might want to design a new ISA from the ground up. See Category:ISA Design Considerations for some ideas. Who knows, maybe you will even invent something new!