DOS Boot Sector

September 2010

I've been interested in writing an OS for a long time now. An OS consists of many components, one of the most fundamental being its boot mechanism. Had I been writing a production OS, I would have used a boot loader such as GNU GRUB or LILO. However, as a hobbyist I wanted to know exactly what my PC was doing during the boot process. I decided that a good way to start would be to study a simple operating system -- MS-DOS. An MS-DOS boot sector has a very simple job: load the first 3 sectors of IO.SYS into memory and jump to them.

After the BIOS completes its POST, an IBM PC compatible computer reads the first 512 B sector of the boot disk into memory at location 0x07C00 and begins executing it. The last 2 B of the boot sector must hold the 16-bit little-endian value 0xAA55 (the byte 0x55 at offset 510 followed by 0xAA at offset 511); this value is known as the boot signature. This leaves 510 B for code.
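
To make the byte order of the signature concrete, here is a minimal host-side sketch in C. The function name has_boot_signature and the constant SECTOR_SIZE are hypothetical names chosen for illustration; they are not part of the boot sector itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SECTOR_SIZE = 512 };

/* Returns 1 if the 512-byte sector ends with the boot signature.
 * The 16-bit value 0xAA55 is stored little-endian, so byte 510 holds 0x55
 * and byte 511 holds 0xAA. */
int has_boot_signature(const uint8_t sector[SECTOR_SIZE]) {
  assert(sector != NULL);
  return sector[SECTOR_SIZE - 2] == 0x55 && sector[SECTOR_SIZE - 1] == 0xAA;
}
```

A disk image tool could call this on the first sector of an image before trying to boot it; swapping the two bytes makes the check fail.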

MS-DOS expects the disk to be formatted with the FAT file system. The first 3 B of the boot sector are expected to contain a jump instruction, followed by an 8 B OEM name and a 51 B data structure known as the BIOS parameter block. This finally leaves us with 448 B for code (510 - 3 - 8 - 51). Had I been writing a production DOS boot sector, I would have written the code in assembly language under such extreme constraints. However, as a philocalist and masochist I felt compelled to write legible code and decided to use C.

Figure: Free and reserved bytes in an MS-DOS boot sector (1 B per square)

The BIOS parameter block contains important information about the layout of the filesystem. Here is a table describing its layout:

Length (B)  Name
2           Bytes per sector
1           Sectors per cluster
2           Number of reserved sectors
1           Number of file allocation tables
2           Number of root entries
2           Number of sectors (if < 65 536)
1           Media descriptor
2           Sectors per file allocation table
2           Sectors per track
2           Number of heads
4           Number of hidden sectors
4           Number of sectors (if ≥ 65 536)
1           Disk drive index
1           Reserved
1           Volume signature
4           Volume ID
11          Volume label
8           Volume type
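
The byte budget can be checked mechanically. The following is a small sketch that sums the field lengths from the table above; the names BPB_FIELD_LENGTHS, bpb_field_offset, bpb_size, and code_bytes are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Field lengths in bytes, in on-disk order, copied from the table above. */
static const unsigned char BPB_FIELD_LENGTHS[] = {
  2, 1, 2, 1, 2, 2, 1, 2, 2, 2, 4, 4, 1, 1, 1, 4, 11, 8
};

enum {
  SECTOR_BYTES    = 512,
  JUMP_BYTES      = 3,
  OEM_NAME_BYTES  = 8,
  SIGNATURE_BYTES = 2,
  BPB_FIELD_COUNT = sizeof(BPB_FIELD_LENGTHS) / sizeof(BPB_FIELD_LENGTHS[0])
};

/* Offset of BPB field `index` from the start of the boot sector.
 * Passing BPB_FIELD_COUNT yields the offset of the first code byte. */
size_t bpb_field_offset(size_t index) {
  size_t offset = JUMP_BYTES + OEM_NAME_BYTES;
  size_t i;
  assert(index <= (size_t)BPB_FIELD_COUNT);
  for (i = 0; i < index; ++i)
    offset += BPB_FIELD_LENGTHS[i];
  return offset;
}

/* Total size of the BIOS parameter block. */
size_t bpb_size(void) {
  return bpb_field_offset(BPB_FIELD_COUNT) - (JUMP_BYTES + OEM_NAME_BYTES);
}

/* Bytes left for code between the end of the BPB and the boot signature. */
size_t code_bytes(void) {
  return SECTOR_BYTES - SIGNATURE_BYTES - bpb_field_offset(BPB_FIELD_COUNT);
}
```

With these lengths, bpb_size() comes to 51 and code_bytes() to 448, and the disk drive index (field 12, counting from zero) lands at offset 36 from the start of the sector.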

The CPU will be in real mode when the boot sector is loaded. This means code runs with 16-bit operands and addresses by default and can address only 1 MiB of memory. The first 640 KiB are available to our program while the remaining 384 KiB are used for assorted system-specific purposes. These memory areas are known as conventional memory and the upper memory area, respectively.
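
The 1 MiB limit follows from how real mode forms addresses: a 16-bit segment is shifted left by 4 bits and added to a 16-bit offset. A minimal sketch of that arithmetic, with real_mode_linear as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Conventional memory ends at 640 KiB; the real-mode address space at 1 MiB. */
static_assert(0xA0000 == 640 * 1024, "conventional memory ends at 640 KiB");
static_assert(0x100000 == 1024 * 1024, "real mode addresses 1 MiB");

/* Real-mode address translation: linear = segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset) {
  return ((uint32_t)segment << 4) + offset;
}
```

Different segment:offset pairs can name the same byte. For instance, 0x0000:0x7C00 and 0x07C0:0x0000 both refer to linear address 0x07C00, where the boot sector is loaded.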

Some parts of conventional memory are reserved by the system. The first 1 024 B are used for the interrupt vector table and the next 256 B are used for the BIOS data area. Also, recall that the boot sector occupies the 512 B in [0x07C00, 0x07E00). We can safely use 29.75 KiB in [0x00500, 0x07C00) and 480.5 KiB in [0x07E00, 0x80000) for a total of 510.25 KiB. There are also 128 KiB in [0x80000, 0xA0000), but some systems consume part of this region for the extended BIOS data area.
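
The region sizes quoted above are easy to re-derive. Here is a small sketch; region_t, LOW_FREE, HIGH_FREE, and the two helper functions are names I made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A half-open address range [start, end). */
typedef struct {
  uint32_t start;
  uint32_t end;
} region_t;

static const region_t LOW_FREE  = {0x00500, 0x07C00};  /* below the boot sector */
static const region_t HIGH_FREE = {0x07E00, 0x80000};  /* above the boot sector */

/* Size of a region in bytes. */
uint32_t region_bytes(region_t r) {
  assert(r.end >= r.start);
  return r.end - r.start;
}

/* Size of a region in KiB. */
double region_kib(region_t r) {
  return region_bytes(r) / 1024.0;
}
```

region_kib(LOW_FREE) gives 29.75 and region_kib(HIGH_FREE) gives 480.5, matching the 510.25 KiB total.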

Figure: Free, partial, and reserved bytes in conventional memory (1 KiB per square)

In my boot sector implementation, I use 5 B in [0x07E00, 0x07E05) to store the number of sectors per track reported by the BIOS (1 B) and the logical block address being read (4 B), which is first that of the root directory and then that of IO.SYS. I use the 29.75 KiB in [0x00500, 0x07C00) to hold the root directory. Each root directory entry is 32 B, meaning that IO.SYS must be one of the first 952 entries (30 464 B / 32 B). (MS-DOS 4.0 expects IO.SYS to be the first record in the root directory.) Here is a table describing the layout of each root directory entry:

Length (B)  Name
8           Filename
3           Extension
1           Attributes
1           Reserved
1           Creation time (fine resolution, in 10 ms units)
2           Creation time
2           Creation date
2           Last access date
2           Reserved
2           Last modified time
2           Last modified date
2           Cluster offset
4           File size in bytes
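
The filename and extension fields are stored without a dot, in upper case, and padded with spaces, so IO.SYS appears on disk as the 11 bytes "IO      SYS". The sketch below shows one way to build that form on a host machine and compare it with an entry; to_fat_name and fat_name_matches are hypothetical helpers, and long or unusual names are handled only by truncation.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Converts a name such as "io.sys" into the 11-byte on-disk form
 * "IO      SYS": 8 bytes of base name and 3 bytes of extension, each
 * upper-cased and padded with spaces. Overlong parts are truncated. */
void to_fat_name(const char *name, char out[11]) {
  size_t i = 0, o = 0;
  assert(name != NULL && out != NULL);
  memset(out, ' ', 11);
  /* Copy up to 8 characters of the base name. */
  while (name[i] != '\0' && name[i] != '.' && o < 8)
    out[o++] = (char)toupper((unsigned char)name[i++]);
  /* Skip whatever is left of an overlong base name. */
  while (name[i] != '\0' && name[i] != '.')
    ++i;
  if (name[i] == '.') {
    ++i;
    o = 8;
    /* Copy up to 3 characters of the extension. */
    while (name[i] != '\0' && o < 11)
      out[o++] = (char)toupper((unsigned char)name[i++]);
  }
}

/* Returns 1 if the first 11 bytes of a directory entry match `name`. */
int fat_name_matches(const char entry_name[11], const char *name) {
  char want[11];
  to_fat_name(name, want);
  return memcmp(entry_name, want, 11) == 0;
}
```

The boot sector itself cannot afford this much code, which is why it compares against a constant that is already in on-disk form.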

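Dates are 16-bit, little-endian values stored in the following format: YYYYYYYMMMMDDDDD, where the 7-bit year is an offset from 1980. Timestamps are 16-bit, little-endian values stored in the following format: HHHHHMMMMMMSSSSS, where the 5-bit seconds field counts 2-second intervals.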
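
As an illustration of these bit layouts, here is a short decoding sketch. The type and function names are hypothetical, and the raw values are assumed to have already been read into host byte order.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { unsigned year, month, day; } fat_date_t;
typedef struct { unsigned hour, minute, second; } fat_time_t;

/* Decodes YYYYYYYMMMMDDDDD. The year field is an offset from 1980. */
fat_date_t decode_fat_date(uint16_t raw) {
  fat_date_t d;
  d.year  = 1980u + (unsigned)(raw >> 9);
  d.month = (raw >> 5) & 0x0F;
  d.day   = raw & 0x1F;
  return d;
}

/* Decodes HHHHHMMMMMMSSSSS. The seconds field counts 2-second intervals. */
fat_time_t decode_fat_time(uint16_t raw) {
  fat_time_t t;
  t.hour   = raw >> 11;
  t.minute = (raw >> 5) & 0x3F;
  t.second = (raw & 0x1Fu) * 2;
  return t;
}
```

For example, the date value 0x3D2F decodes to 15 September 2010 and the time value 0x6CB5 decodes to 13:37:42.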

Once IO.SYS is found, I load its first 3 sectors to 0x00700, overwriting part of the root directory buffer, which is no longer needed at that point. I expect these 3 sectors to be contiguous on disk. This leaves 512 B in [0x00500, 0x00700) free for IO.SYS to store a copy of the boot sector later on.
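
Finding the root directory and the start of IO.SYS comes down to arithmetic on the BIOS parameter block. The following host-side sketch shows that arithmetic. The struct fat_geometry_t and the function names are hypothetical, the addresses are relative to the start of the volume (hidden sectors are ignored), and the root directory size is rounded up to whole sectors, which is an assumption on my part.

```c
#include <assert.h>
#include <stdint.h>

/* The subset of BIOS parameter block fields needed to locate a file. */
typedef struct {
  uint16_t bytes_per_sector;
  uint8_t  sectors_per_cluster;
  uint16_t reserved_sectors;
  uint8_t  fats;
  uint16_t root_entries;
  uint16_t sectors_per_fat;
} fat_geometry_t;

/* The root directory follows the reserved sectors and all copies of the FAT. */
uint32_t root_dir_lba(const fat_geometry_t *g) {
  return g->reserved_sectors + (uint32_t)g->fats * g->sectors_per_fat;
}

/* Number of sectors occupied by the root directory (32 B per entry). */
uint32_t root_dir_sectors(const fat_geometry_t *g) {
  assert(g->bytes_per_sector > 0);
  return ((uint32_t)g->root_entries * 32 + g->bytes_per_sector - 1) /
         g->bytes_per_sector;
}

/* First sector of a data cluster. Cluster numbering starts at 2. */
uint32_t cluster_lba(const fat_geometry_t *g, uint16_t cluster) {
  assert(cluster >= 2);
  return root_dir_lba(g) + root_dir_sectors(g) +
         (uint32_t)(cluster - 2) * g->sectors_per_cluster;
}
```

With made-up values of 512 B sectors, 4 sectors per cluster, 1 reserved sector, 2 FATs of 41 sectors each, and 512 root entries, the root directory starts at LBA 83, spans 32 sectors, and cluster 2 starts at LBA 115.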

Compiling the code into a raw binary with 16-bit opcodes became my next challenge. I was pleased to find that this is possible with GCC and binutils with a little bit of magic. First, I had to add the .code16gcc assembler directive to my C code so that the compiler's output is assembled to run in 16-bit mode. I also had to write a custom linker script. The script instructs ld to construct a raw binary with a code segment, a read-only data segment, and a boot signature. It also sets the load address so that the code is linked to run at the correct memory offset.

The source code is released under the MIT license and is also available on GitHub at github.com/kjiwa/x86-boot-sector-c.

/*
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Software.
 */

/**
 * An MS-DOS boot sector that reads the hard disk drive's root directory,
 * searches for IO.SYS, loads the first 3 sectors into memory, and executes it.
 */

// Tell GCC to emit 16-bit opcodes.
asm (".code16gcc");

// The first 3 bytes are reserved for a JMP instruction that we use to jump to
// the _start() method. The next 59 bytes are reserved for the OEM name and the
// BIOS parameter block. We set drive_index to be 0x80 to indicate we are
// booting from a hard disk drive.
asm ("jmp _start");
asm (".space 0x0021");
asm (".byte 0x80");
asm (".space 0x001c");

/**
 * Standard type definitions for the x86 in real mode.
 */
typedef char           s8_t;
typedef unsigned char  u8_t;
typedef unsigned short u16_t;
typedef unsigned long  u32_t;

/**
 * The boot sector.
 * @param _a                  3 reserved bytes used for a JMP instruction.
 * @param name                The OEM name of the volume.
 * @param bytes_per_sector    The number of bytes per sector.
 * @param sectors_per_cluster The number of sectors per cluster.
 * @param reserved_sectors    The number of reserved sectors.
 * @param fats                The number of file allocation tables.
 * @param root_entries        The number of entries in the root directory.
 * @param total_sectors       The number of hard disk sectors. If zero, use the
 *                                value in total_sectors2.
 * @param media_descriptor    The media descriptor.
 * @param sectors_per_fat     The number of sectors per file allocation table.
 * @param sectors_per_track   The number of sectors per track.
 * @param heads               The number of hard disk heads.
 * @param hidden_sectors      The number of hidden sectors.
 * @param total_sectors2      The number of hard disk sectors.
 * @param drive_index         The drive index.
 * @param _b                  Reserved.
 * @param signature           The extended boot signature.
 * @param id                  The volume ID.
 * @param label               The partition volume label.
 * @param type                The file system type.
 * @param _c                  Code to be executed.
 * @param sig                 The boot signature. Always 0xaa55.
 */
typedef struct {
  s8_t  _a[3];
  s8_t  name[8];
  u16_t bytes_per_sector;
  u8_t  sectors_per_cluster;
  u16_t reserved_sectors;
  u8_t  fats;
  u16_t root_entries;
  u16_t total_sectors;
  u8_t  media_descriptor;
  u16_t sectors_per_fat;
  u16_t sectors_per_track;
  u16_t heads;
  u32_t hidden_sectors;
  u32_t total_sectors2;
  u8_t  drive_index;
  u8_t  _b;
  u8_t  signature;
  u32_t id;
  s8_t  label[11];
  s8_t  type[8];
  u8_t  _c[448];
  u16_t sig;
} __attribute__ ((packed)) boot_t;

/**
 * Hard disk drive information.
 * @param sectors The number of sectors per track, as reported by the BIOS.
 * @param lba     The LBA of the block to read from the hard disk drive.
 */
typedef struct {
  u8_t  sectors;
  u32_t lba;
} __attribute__ ((packed)) FILE;

/**
 * A root directory entry.
 * @param filename           The file name.
 * @param extension          The file extension.
 * @param attributes         File attributes.
 * @param _a                 Reserved.
 * @param create_time_us     The fine-resolution part of the creation time, in
 *                               10 ms units.
 * @param create_time        The creation time.
 * @param create_date        The creation date.
 * @param last_access_date   The date the file was last accessed.
 * @param _b                 Reserved.
 * @param last_modified_time The time the file was last modified.
 * @param last_modified_date The date the file was last modified.
 * @param cluster            The cluster containing the start of the file.
 * @param size               The file size in bytes.
 */
typedef struct {
  s8_t  filename[8];
  s8_t  extension[3];
  u8_t  attributes;
  u8_t  _a;
  u8_t  create_time_us;
  u16_t create_time;
  u16_t create_date;
  u16_t last_access_date;
  u8_t  _b[2];
  u16_t last_modified_time;
  u16_t last_modified_date;
  u16_t cluster;
  u32_t size;
} __attribute__ ((packed)) entry_t;

/**
 * A pointer to the boot sector. After the BIOS POST, the boot sector is loaded
 * into memory at 0x7c00.
 */
boot_t const* _bs = (boot_t*) 0x7c00;

/**
 * A pointer to the hard disk drive information. We use 0x7e00, the first block
 * of available memory next to the boot sector.
 */
FILE* _disk = (FILE*) 0x7e00;

/**
 * A constant containing the name of the binary used to initialize the system
 * and device drivers. This is typically IO.SYS or IBMBIO.COM.
 */
s8_t const* _io_bin = "IO      SYS";

/**
 * A pointer to a general-purpose buffer in memory. Used to store the root
 * directory entries and the first 3 sectors of IO.SYS.
 */
u8_t* _buffer;

/**
 * A variable used by the read() method to indicate how many sectors to read
 * from the hard disk into memory.
 */
u8_t _size;

/**
 * A pointer to the root directory entry currently being read.
 */
entry_t const* _entry;

/**
 * Checks if the current root directory entry has the name IO.SYS by performing
 * a string comparison over the 11-byte filename and extension.
 * @return 0 if the current root directory entry has the name IO.SYS, a
 *     negative value if the entry name is less than IO.SYS, or a positive
 *     value if the entry name is greater than IO.SYS.
 */
s8_t iosyscmp() {
  u16_t i;
  for (i = 0;
       i < 10 && ((s8_t*) _entry)[i] && ((s8_t*) _entry)[i] == _io_bin[i];
       ++i);
  return ((s8_t*) _entry)[i] - _io_bin[i];
}

/**
 * Reads sectors starting at the LBA stored in _disk->lba from the hard disk
 * into memory at _buffer. Uses the value stored in _size as the number of
 * sectors to read.
 * @return 0
 */
u16_t read() {
  // convert the LBA into CHS values
  u32_t t = _bs->heads * _disk->sectors;
  u16_t c = _disk->lba / t;
  u16_t h = (_disk->lba % t) / _disk->sectors;

  // pack CX for INT 13h: CH holds the low 8 bits of the cylinder and CL holds
  // the 1-based sector number
  c <<= 8;
  c |= ((_disk->lba % t) % _disk->sectors) + 1;

  // read _size sectors from the hard disk into memory at _buffer
  asm ("int $0x13" : : "a"(0x0200 | _size), "b"(_buffer), "c"(c), "d"((h << 8) | 0x0080));
  return 0;
}

/**
 * Reads drive information, searches for IO.SYS, loads the first 3 sectors into
 * memory, and begins executing it.
 * @return 0
 */
u16_t _start() {
  // read hard disk drive information into memory at 0x7e00; the low 6 bits of
  // CL hold the number of sectors per track
  asm ("int $0x13" : "=c"(_disk->sectors) : "a"(0x0800), "d"(0x80) : "bx");
  _disk->sectors &= 0b00111111;

  // read the root directory into memory at 0x500
  _buffer = (u8_t*) 0x0500;
  _disk->lba = _bs->reserved_sectors + (_bs->fats * _bs->sectors_per_fat);
  _size = _bs->root_entries * sizeof(entry_t) / _bs->bytes_per_sector;
  read();

  // iterate over root directory entries and look for IO.SYS
  for (_entry = (entry_t*) _buffer; ; ++_entry)
    if (iosyscmp() == 0) {
      // load the first 3 sectors of IO.SYS into memory at 0x0700
      _buffer = (u8_t*) 0x0700;
      _disk->lba += _size + (_entry->cluster - 2) * _bs->sectors_per_cluster;
      _size = 3;
      read();

      // execute IO.SYS
      asm ("jmpw %0, %1" : : "g"(0x0000), "g"(0x0700));
    }

  return 0;
}
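
The LBA-to-CHS conversion in read() is easier to experiment with on a host machine. The sketch below repeats the same arithmetic as a stand-alone function. The names chs_t, lba_to_chs, and pack_cx are hypothetical. Unlike the boot sector, pack_cx also places cylinder bits 8 and 9 into the top two bits of CL, as the INT 13h read service expects for cylinders above 255.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  uint16_t cylinder;
  uint8_t  head;
  uint8_t  sector;  /* 1-based */
} chs_t;

/* Converts a logical block address into cylinder/head/sector values. */
chs_t lba_to_chs(uint32_t lba, uint16_t heads, uint8_t sectors_per_track) {
  chs_t r;
  uint32_t per_cylinder;
  assert(heads > 0 && sectors_per_track > 0);
  per_cylinder = (uint32_t)heads * sectors_per_track;
  r.cylinder = (uint16_t)(lba / per_cylinder);
  r.head     = (uint8_t)((lba % per_cylinder) / sectors_per_track);
  r.sector   = (uint8_t)((lba % per_cylinder) % sectors_per_track + 1);
  return r;
}

/* Packs CX for INT 13h AH=02h: CH = low 8 bits of the cylinder,
 * CL bits 0-5 = sector, CL bits 6-7 = cylinder bits 8-9. */
uint16_t pack_cx(chs_t chs) {
  return (uint16_t)(((chs.cylinder & 0xFF) << 8) |
                    ((chs.cylinder >> 2) & 0xC0) |
                    (chs.sector & 0x3F));
}
```

Assuming a disk with 16 heads and 63 sectors per track, LBA 0 maps to cylinder 0, head 0, sector 1, and LBA 1008 is the first sector of cylinder 1.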