Daniel Drywa

Reverse Word Pairs

My take on a interview programming test

Sometime in 2014 when I was actively looking for a Job I had to do a programming challenge for a well known gaming company, which for obvious reason I can not name. My solution got me into the next interview stage which means it was good enough for them to go further with the interview. But as things go I ended up at Push Technology Ltd instead and I’ve been happily working there since. However the whole code for my solution has been on a private GitHub repository and today I’ve been wondering: Why not make it public so that others can learn from it? Or maybe even extend it with their own solution. So here we go! I present to you my solution to a reverse word pair finder.

Disclaimer

All measurements taken are not in any way scientifically. I have no fancy graphs to prove anything nor did I conduct extensive research to support my claims. This is simply a showcase of how I got to my solution for a programming challenge of a job interview. Nothing more, nothing less.

The Challenge

The challenge was seemingly simple. Find all reverse word pairs within a huge text file and put them grouped together in a separate result text file. Let’s say the input text file would look something like this:

pink abcd kli akka dcba odb ilk knipla bdo

The resulting text file after the algorithm has finished would have to look something like this:

abcd dcba
kli ilk
odb bdo

The solution had to be written in C++ and should be able to compile on Linux and Windows. It also had to be fast.

The Process

The first thing I had to figure out was how to compile the same code on different platforms. Luckily I worked on a multi platform project at that time so CMake was the obvious choice for me. Once the CMake project was set up I had to work on my solution. But I had to answer two obvious questions first:

  • How would I know if my solution is the fastest?
  • How would I compare it to other solutions?

The answer to both of these questions was some form of simple attempt manager that allowed me to run my attempts and measure their run time. After analysing the run time of every attempt I would then select the fastest and use it as my main solution to the challenge. The attempt manager would therefore have to be able to register new attempts and run them all at once.

class CAttemptManager final {
    // ...
    void AddAttempt( attempt_t &&attempt );
    void RunAttempts() const;
    // ...
};

A single attempt would need to be able to run a specific piece of code on a given file and report back the result. So a virtual class seemed the sensible choice. Every attempt I would create would then inherit IAttempt and report back the result as a struct of type sAttemptResult.

class IAttempt {
    // ...
    virtual void Run( const std::string &filename, sAttemptResult &result ) = 0;
    // ...
};

The result struct didn’t have to be complicated. All it needed was the ability to report back the time it took and the word pairs it found. It looked something like this:

typedef std::unordered_map< std::string, std::string > wordPairs_t;

struct sTimeDuration {
    std::chrono::nanoseconds    nanoseconds;
    std::chrono::microseconds   microseconds;
    std::chrono::milliseconds   milliseconds;
    std::chrono::seconds        seconds;
};

struct sAttemptResult {
    sTimeDuration   algorithmDuration;
    sTimeDuration   readingDuration;
    sTimeDuration   completeDuration;
    wordPairs_t     pairs;
};

The std::chrono namespace is a timing library in the C++11 standard. It contains all necessary types and methods to measure time. To make time measurement between attempts easier I created an attempt clock. This clock would be able to start and end time measurement and return the duration back to me.

class CAttemptClock {
    typedef std::chrono::steady_clock       internal_clock_t;
    typedef internal_clock_t::time_point    time_point_t;

    time_point_t    start       = {};
    time_point_t    end         = {};

    // ...
    void Start();
    void End();
    sTimeDuration GetDuration() const;
    // ...
};

My timer of choice was the std::chrono::steady_clock. This clock represents a monotonic clock that is not affected by any interval changes of the underlying implementation and therefore best suitable for measuring intervals. In retrospect I should have used std::chrono::high_resolution_clock instead to use the shortest ticks available on the current system.

This was my whole attempt framework. Now the only thing left to do was implementing the different attempts I had in mind, measure their time, and pick the one that seemed the fastest as my final solution.

The Attempts

I had 4 attempts in total which I wanted to test out:

  • Read a word from file, put it in the same group as other words with the same length an find it’s reverse pair in that group. Repeat with the next word in the file. All done with the help of the STL algorithms.
  • Group words with the same length first. Try to find pairs in these groups afterwards without the usage of the STL.
  • Same as the second attempt but with the help of the STL in the find process.
  • Same as the first attempt but with a different data structure. (std::map instead of std::unordered_map)

Here are the timing results for all the attempts:

  1. 290ms on average.
  2. 15s on average. My custom find algorithm was not very good.
  3. 1s on average.
  4. 350ms on average.

The key point to take away from this: It is much faster to search for pairs while still in the process of reading from the file instead of doing it in a post process. Duh!

One more thing

This is pretty much all there is to it. I had a full day to come up with a solution and I ended up sending them all my attempts. I hope this is useful for anyone that has to go through a programming test as well. At least it gives you a glimpse of what some of the Professional Gaming Companies out there expect from a Junior Developer.

You can find this whole project on GitHub. I would be happy for any pull requests that contain faster solutions or maybe even a better way of measuring the time. Have fun!


27 February 2016