samedi 23 février 2013

Random thoughts - Space vs. Time

Last week, Olivier Croisier resurrected an old coding challenge in his blog, the Coder's Breakfast. As geeky as every challenge can be, I immediately started to think about potential solutions. This led me to an amusing discussion with a Scala guy. This post in a follow-up of that discussion.

Problem

The discussion was like :
Him : "Pff, it's easy, do a Stream"
Me : "Err, yeah but you would have to generate all numbers from 0 to max"
Him : "Of course not ! You filter ! I can do it in one line right now"

I was quite surprised by his arguments, and wanted to explain that I was not looking for brievty but for optimization. So I told him that if our discussion were for Java, it would have been like :
"Pff, easy, do an ArrayList
- [same answer]
- Not all, just use Guava !"

My point was that a data structure cannot be an answer to a complex problem, we have to visualize the whole problem, not just by the first implementation that comes to our minds.

This is commonly known as a Space/Time tradeoff (or a CPU/Memory tradeoff) : if you write a very small program, such as a one-llner, then chances are that this space gaiin will have a time cost (wasted CPU time). On the other hand, if you craft a program that do not waste any CPU cycle, then chances are that it will take more space (either in memory or by the number of LoC).

Note that I perfectly know that in Scala and Clojure, Streams and seqs are lazy, where ArrayList is not. It was just a troll.


Some algorithm

For me, this challenge is about "What you are going to do" instead of "How you are going to do it". Brute-force solutions are different in their implementation details, but follow the same algorithm :
- For each number n taken between [0; max]
--- If n fulfills a test, keep it

In those one-liners wars, we do not think about algorithm, and we should feel bad about it, especially in a coding challenge. Such challenges are the only place where we can seek premature optimization, for the beauty of the exercise.

So let's go back to the roots of programming : algorithms.
The code we have to create can be splitted in the following components :
Generator -> Filter -> Aggregator
We know that the upper bound will be a power of 10, and won't be above 10^10.


First implementation

The most basic implementations does this :
- Generate all numbers in [1; max] that fulfill a given test
- Iterate over every element starting at index 1 an compute the difference between the element and the previous one
- Keep the highest difference
- Count the number of results

Considering we have numbers and solutions, our algorithm would generate numbers in O(max).


Second implementation

But we can achieve a better solution : let's generate every number by picking digits in a list of "allowed digits".

To calculate every non-repeating-digits number of length x,
- For each digit d taken in [1;9]
--- Find all numbers of length x that starts with d and that are composed by every digits from [0; d[ and ]d; 9]
To find all numbers of length x that starts with d and that are composed by someDigits
- For each digit i taken in someDigits
--- Add d at the position x of all numbers of length x-1 that starts with i and that are composed by someDigits without i

The main goal of this algorithm is to have no "Filter" component. We perfectly know the range of acceptable digits, so we can just generate every combination by restricting this range to a smaller subset. By using recursion, the program may be harder to reason about but also simpler to test and faster to run.

A Java implementation of this algorithm is available here : https://github.com/pingtimeout/challenges/tree/master/non-repeating-numbers-java

A Clojure one should follow in a few days.


Wrap-Up

My Java implementation is way longer than a Scala one-liner, however it is also faster from an algorithmic standpoint.

However I do not mean that it is is better implementation.

In a real project, I would probably prefer writing a simple (ineffective) solution in 10 minutes than a very optimised one that would take hours or even days. But keep in mind this Space vs. Time tradeoff. There is no such thing as a perfect solution, there are only tradeoffs and we must know them to make better decisions.

Note that I won't write here about the wall-clock standpoint of those solutions since it would make me create microbenchmarks, which is something one should not do.

Final thoughts : Thanks Olivier for this challenge ! It was a fun exercise =]