

1BRC - Attempt #1: The baseline


I won't be going into detail on how the benchmark or the run code works. You can check out the details in the source code zip file at the bottom of this page. Instead, I want to focus on the actual attempt code. This first attempt serves as a baseline against which I will measure all other implementations. Should an attempt ever be slower than the baseline, it can be considered an immediate failure.


File Reading


The first step is to open the 12.8 GiB measurements file and read data out of it. Whenever I write file parsers I usually just read the entire file from storage into memory. This is very convenient because I can parse the whole file as a single chunk of contiguous memory and not worry about broken-up sequences of bytes. But this is only a good idea if the files remain relatively small. For an Assembler or a Compiler the source files are usually fairly small, so the cost of holding entire files in memory is no big deal for modern computers. However, this is 12.8 GiB of data. My machine sure has enough RAM to hold the whole file, but I don't think copying this amount upfront is a good idea. Instead, I will be reading the file in chunks and processing it one chunk at a time.

Whenever I hear about reading files, the term Memory-mapped file gets thrown around quite a bit. The operating system basically provides access to a resource, in my case the measurements file, as if that resource was already loaded into memory. So from a programming perspective all I get is a base address to the beginning of the resource, and I can iterate over it byte-by-byte, just like I would do with a simple array. The operating system handles the reading and copying of the data from storage into memory under the hood, which means I do not have to read and manage chunks myself.

On Windows, I first have to open a file handle to the measurements file. This can be done with CreateFile.

void *handle = CreateFile(
    StringToCString(filename), GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, 0, 0);
if (handle == INVALID_HANDLE_VALUE) {
    FatalWindowsError(GetLastError());
}
Opening the measurements file.

Then I can get the size of the file via the file handle.

LARGE_INTEGER file_size;
if (GetFileSizeEx(handle, &file_size) == 0) {
    FatalWindowsError(GetLastError());
}
Retrieving the file size.

Then I have to create a file mapping by calling CreateFileMapping. This will return a new handle for the mapping itself.

void *mapping = CreateFileMapping(handle, 0, PAGE_READONLY, 0, 0, 0);
if (mapping == INVALID_HANDLE_VALUE || mapping == 0) {
    FatalWindowsError(GetLastError());
}
Creating the file mapping.

I cannot use the mapping handle to just start reading bytes from the file. Instead, I have to create a view into the mapping. Think of the view as a window into the file. It can start at any position within the file, and it can be of any size I wish to read. This makes it possible to create multiple views into different regions of the file. However, in this attempt I am just going to create a single view that covers the whole contents of the file. For more information see the Windows API documentation on file mapping.

u8 *base = (u8 *)MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, file_size.QuadPart);
if (base == 0) {
    FatalWindowsError(GetLastError());
}
Creating a view into the mapped file.

I can now iterate over the base pointer and read the contents of the file byte-by-byte without any extra work. Neat!

file_map file = OpenFileMap(g_measurements_filename);

u64 start = file.buffer.position;
for (;file.buffer.position < file.buffer.capacity; file.buffer.position++) {
    // Parse file...
}
Reading the file one byte at a time.


Hashing


The measurements file contains multiple lines of weather station name and temperature value pairs. The challenge is to calculate the min, mean, and max for each weather station. This means the same weather station can appear with multiple temperature values in the measurements file. This is a classic use case for a hash map in which the weather station name is used as the key to a temperature data structure. However, C does not have a default hash map implementation, which means I had to create my own.

Instead of a generic hash map implementation I decided to just create a bespoke one. I went for an open addressing table to keep it simple. The hash map buckets also serve as the temperature data structure.

typedef struct HASH_MAP_BUCKET hash_map_bucket;
struct HASH_MAP_BUCKET {
    string key;
    u32 hash;
    
    u64 count;
    f64 average;
    f64 min;
    f64 max;
};
The hash map bucket data structure.

The hash map itself is just a pointer to the first bucket, and an integer containing the number of buckets. The map is allocated with the help of a memory arena.
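
The map struct itself is not spelled out in this post, but based on how it is used in the listings here it boils down to something like this sketch:

typedef struct HASH_MAP hash_map;
struct HASH_MAP {
    hash_map_bucket *base; // Pointer to the first bucket in the array
    u32 capacity;          // Number of buckets in the array
};
A sketch of the hash map structure, reconstructed from how it is used.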

hash_map HashMap(memory_arena *arena, u32 capacity) {
    hash_map result = {0};
    result.capacity = capacity;
    result.base = GetMemoryArenaArray(arena, hash_map_bucket, result.capacity);
    memset(result.base, 0, sizeof(*result.base) * result.capacity);
    return result;
}
Creating the hash map with the help of an arena.

I went with a default bucket count of 512 because there are a little over 400 unique weather station names and I like rounding up to powers of two. I have not looked at how many collisions are actually happening in this attempt. That is something I will potentially look at in a later attempt.

I am using DJB2 from Dan Bernstein to calculate the hashes from a given string value.

u32 ComputeHash(u8 *name, u32 length) {
    u8 *end = name + length;
    u32 hash = 5381;

    while (name != end) {
        hash = ((hash << 5) + hash) + *(name++);
    }

    return hash;
}
The DJB2 hashing function.

Inserting a value works by calculating the hash of the string key and deriving a slot index from it, which determines the bucket into which the value should be stored.

u32 hash = ComputeHash(key.base, key.length);
u32 slot = hash % map->capacity;
Computing the bucket slot to use based on the hash.

If the bucket already contains a different string key then I advance to the next slot (slot + 1) and check if it is empty. I repeat this step until I find an empty slot or until I reach the end of the bucket array.

for (u32 i = from; i < to; i++) {
    hash_map_bucket *bucket = &map->base[i];

    if (bucket->key.length == 0) {
        // Slot is empty so insert...
    } else if (bucket->hash == hash && IsStringEqual(key, bucket->key)) {
        // Update the item...
    }

    // Not empty or already occupied by something else... continue search
}
Searching for an empty bucket slot.

If I reach the end of the bucket array and still have not found an unoccupied slot, I have to wrap around and search from the first bucket up to the initially calculated slot. You can check out the source code to see how the full implementation does this.
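
As a hedged illustration of the idea (not necessarily how the attempt implements it), the forward search and the wrap-around can be folded into a single probe loop that wraps via a modulo:

// Illustration only: probe every slot at most once, starting at the
// computed slot and wrapping back to the start of the bucket array.
hash_map_bucket *found = 0;
for (u32 probe = 0; probe < map->capacity; probe++) {
    u32 i = (slot + probe) % map->capacity;
    hash_map_bucket *bucket = &map->base[i];

    if (bucket->key.length == 0 ||
        (bucket->hash == hash && IsStringEqual(key, bucket->key))) {
        found = bucket; // Either an empty slot or the matching key
        break;
    }
}
A sketch of the probe loop with the wrap-around folded in.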

I have no code to handle the scenario of running out of room in the hash map, because this simply cannot happen in this little test. The number of unique keys is known and fixed, and I have more buckets available than there are unique values. Handling this scenario would not be particularly difficult, though: all that needs to be done is creating a new hash map with more buckets and rehashing all values over to the new map. But I won't bother with that for this test.
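
For completeness, here is a hedged sketch of what that growth step could look like. GrowHashMap and FindEmptySlot are hypothetical helpers for illustration and are not part of the attempt; FindEmptySlot would probe the new map with the same scheme shown above.

// Sketch only: grow the map by allocating a bigger one and moving every
// occupied bucket over, recomputing its slot for the new capacity.
hash_map GrowHashMap(memory_arena *arena, hash_map *old_map) {
    hash_map new_map = HashMap(arena, old_map->capacity * 2);
    for (u32 i = 0; i < old_map->capacity; i++) {
        hash_map_bucket *bucket = &old_map->base[i];
        if (bucket->key.length == 0) {
            continue; // Empty slot, nothing to move over
        }
        u32 slot = bucket->hash % new_map.capacity;
        hash_map_bucket *dest = FindEmptySlot(&new_map, slot); // Hypothetical helper
        *dest = *bucket; // Copy the whole bucket, running stats included
    }
    return new_map;
}
A sketch of growing and rehashing the map (not part of this attempt).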


Parsing floats


The next component I needed was something to parse a string representation of a floating-point value into an actual float. Since I am not using the C standard library I had to write my own little function to do just that. The format of the floats is fairly straightforward, so it wasn't that difficult. Each temperature value can have an optional negative sign, and has only one digit after the decimal point.

f64 StringToFloat64(string value) {
    b32 is_negative = value.base[0] == '-';

    f64 result = 0.0;
    u32 digits = 0;
    for (u32 i = is_negative; i < value.length; i++) {
        u8 potential_digit = (value.base[i] - '0');
        b32 is_digit = potential_digit < 10;
        if (is_digit) {
            digits = (digits * 10) + potential_digit;
        } else if (value.base[i] == '.') {
            i += 1; // We know we have just 1 digit after the decimal point so read just that
            u8 fraction = (value.base[i] - '0');
            result = ((f64)digits) + (0.1 * fraction);
            break; // Break out of the loop
        }
    }

    return (is_negative ? -result : result); // Apply the sign detected at the start
}
Converting a string to a 64-bit floating point value.


The parsing loop


With all the sub-components done, the parsing loop is rather straightforward. First, check if the current byte is the semicolon that separates the weather station name from the temperature value.

if (file.buffer.base[file.buffer.position] == ';') {
    // ...
}
Checking for a separator character.

Once the separator is found I can create a string that contains the weather station name.

string name = String(&file.buffer.base[start], (u32)(file.buffer.position - start));
Creating the weather station string.

Then I am trying to find the line ending in a sub-loop.

file.buffer.position += 1;
start = file.buffer.position;
for (; file.buffer.position < file.buffer.capacity; file.buffer.position++) {
    if (file.buffer.base[file.buffer.position] == '\n') {
        // ...
    }
}
Finding the line-ending.

Once the line ending is found I can create a string of the temperature and convert it to a 64-bit floating-point value.

string temp_str = String(&file.buffer.base[start], (u32)(file.buffer.position - start));
f64 temperature = StringToFloat64(temp_str);
Parsing the temperature.

And all that remains is adding the temperature value to the hash map.

HashMapInsertValue(&map, name, temperature);
Inserting the temperature into the hash map.
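
HashMapInsertValue is where the statistics are accumulated. The exact code is in the download, but the per-bucket update it has to perform boils down to a running count, min, max, and mean, roughly along these lines (UpdateBucket is just a name for this sketch, not the verbatim implementation):

// Sketch of updating a bucket with one temperature reading. Keeping a
// running mean avoids having to store every individual value.
void UpdateBucket(hash_map_bucket *bucket, f64 temperature) {
    if (bucket->count == 0) {
        bucket->min = temperature;
        bucket->max = temperature;
    } else {
        if (temperature < bucket->min) bucket->min = temperature;
        if (temperature > bucket->max) bucket->max = temperature;
    }
    bucket->count += 1;
    // Incremental mean: new_avg = old_avg + (x - old_avg) / n
    bucket->average += (temperature - bucket->average) / (f64)bucket->count;
}
A sketch of the per-bucket update performed during an insert.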

Then I repeat the same process for the remaining lines in the file. That is it! I have omitted a few minor details here and there, and I will not go into depth on how I print out the values to the terminal. That is trivial and you can check out the full source code in the zip file at the bottom of this page.


Conclusion


All that remains now is to run the benchmark and check how long it takes to parse 1 billion rows of temperature data, in a single thread, on my machine. I am running this on an AMD Ryzen 7 7700 with a base clock of 3.8 GHz, on Windows 11 Home 64-bit - 23H2 - 22631.3880, with 32 GB of RAM.

On average this attempt takes about 27.440652 seconds to parse 1 billion rows. This serves as my baseline, and now I can start the fun part - trying to speed things up!


Download