Functional FizzBuzz

There was a discussion that was resurrected on programmers.stackexchange yesterday talking about FizzBuzz. First things first: the only thing I’d accept on FizzBuzz is minor syntax errors. It appears from the aforementioned discussion that I might be holding a minority opinion. Given that, I started to wonder what FizzBuzz would look like if written in an idiomatic functional style (while not necessarily purely functional) and if my functional FizzBuzz would convince someone else that I was a competent functional developer, even if I don’t consider myself one.

I’m coming off a vacation, and have decided to blog more in the coming year, so I decided to settle into 2014 and combine the two. True to FizzBuzz form I wrote a solution in F# without testing it. I did use an IDE, though. Either way, this is what I came up with:

let numbers =
    seq { 1 .. 100 }
    |> Seq.mapi (fun i n -> (i, n % 3 = 0, n % 5 = 0))
    |> Seq.map (fun t ->
    match t with
        | (_, true, true) ->"FizzBuzz"
        | (_, true, false) -> "Fizz"
        | (_, false, true) -> "Buzz"
        | (n, false, false) -> n.ToString()
        )

numbers
    |> Seq.map (fun s -> printfn "%A" s)
    |> ignore

Clearly, it’s wrong. There’s no output whatsoever. Things aren’t looking good for me.

I quickly realize that numbers is lazily evaluated, and when I Seq.map it to a function that writes to the console (ie. no return value), without actually accessing any value in the collection, the whole Seq.map is being optimized out. Ok use Seq.iter over numbers.

So I run it again and find that the output starts at zero and all of the Fizz/Buzz statements are off by 1, but otherwise appear correct. Seeing the output on screen, it appear obvious: I don’t need the index from Seq.mapi… I already have it from the sequence. Not only is Seq.mapi redundant here, it’s also zero based when my sequence starts at 1. A normal Seq.map is clearly better.

With that done, I ran it again and found that it works as desired. The result is thus:

let numbers =
    seq { 1 .. 100 }
    |> Seq.map (fun n -> (n, n % 3 = 0, n % 5 = 0))
    |> Seq.map (fun t ->
    match t with
        | (_, true, true) -> "FizzBuzz"
        | (_, true, false) -> "Fizz"
        | (_, false, true) -> "Buzz"
        | (i, _, _) -> i.ToString()
        )

numbers
    |> Seq.iter (fun s -> printfn "%A" s)
    |> ignore

Note: the first/redundant map was left for (my) clarity. It can be removed and replaced by changing the match to: match (n, n % 3 = 0, n % 5 = 0)

That's it; total time was 10 minutes. As I said, I clearly wouldn’t hire me as a developer in a functional language, but would someone else?

Hill Climbing Variations

Many algorithms have variations for a multitude of reasons and Hill Climbing is no different. Last time I presented the most basic hill climbing algorithm and implementation. There are some known flaws with that algorithm and some known improvements to it as well. Here are 3 of the most common or useful variations.

Steepest Ascent

If you recall, in the basic hill climbing algorithm, you look at the neighbors of a solution and choose the first one that improves on the current solution and climb to it. In Steepest Ascent, the idea is to choose the route that ascends the fastest similar to a best-first search approach. That is, every time you take a step, you take the largest step you can.

In a problem like we had last time, where we flipped a bit in our string, we’ll never have a situation where we would have variable magnitude ascents. To see why, start with the bit string 001. The possible neighbors of 001 are 000, 011 and 101. 000 is less fit, and the remainder are equally as fit. This can be extrapolated and shown to be true in the case of all candidate solutions.

Never-the-less, given our code, it can be modified to perform steepest ascent hill climbing. In order to do so, we need some code that will be able to find a neighbor solution that has the biggest increase in fitness score over the current fitness score. To do that, we’ll modify the Optimize function.

public async Task Optimize(TimeSpan timeout)
{
    DateTimeOffset start = DateTimeOffset.Now;
    T current = this.RandomSolution();
    C currentFitness = this.Fitness(current);

    return await Task.Factory.StartNew(() =>
    {
        do
        {
            T steepestAscentNeighbour = default(T);
            C steepestAscentFitness = default(C);

            foreach (var n in this.Neighbours(current))
            {
                steepestAscentFitness = this.Fitness(n);
                if (Comparer.Default.Compare(steepestAscentFitness, currentFitness) >= 0)
                {
                    current = n;
                    currentFitness = steepestAscentFitness;
                }
            }

            if (steepestAscentNeighbour != null && Comparer.Default.Compare(currentFitness, steepestAscentFitness) >= 0)
                current = steepestAscentNeighbour;

        } while (DateTimeOffset.Now - start < timeout);

        return current;
    });
}

Stochastic Hill Climbing

There are times where the set of neighbor solutions is too large, or for whatever reason it’s impractical to iterate through them all when evaluating neighbor solutions. The stochastic variation attempts to solve this problem, by randomly selecting neighbor solutions instead of iterating through all of them.

To use our example from last time, we need a different Neighbour method that will generate random neighbors.

public static IEnumerable RandomNeighbours(BitArray barray)
{
    do
    {
        var n = new BitArray(barray);
        var i = ThreadLocalRandom.Next(0, barray.Length);

        n[i] = !n[i];

        yield return n;
    }
    while (true);
}

… except that this is clearly an infinitely generating sequence that will also, likely, generate duplicates. I was tempted to write a small shuffler, but an entire blog series could be written about those (seriously, go check it out).

It's a matter of personal taste that I like generating infinite sequences. In this case, the caller decides to cap the number of neighbors generated to 10 when instantiating the hill climber.

var search = new HillClimber<BitArray, int>(
    () => RandomBitArray(10),
    (b) => RandomNeighbours(b).Take(10),
    Fitness);

Random Restart

If you recall the metaphor from last time, this is basically the scenario where you start climbing a hill, reach the top, but it’s not the tallest hill in the solution landscape. That is to say: normal hill climbing suffers from being only a local optimization algorithm. To solve this scenario, a surprisingly simple solution is found: start again somewhere else.

The idea is simple, each time you reach a peak, or wander on a plateau for a long time, then compare the fitness score at the current location with the current (global) best and store it if it’s better. Then restart from a new random location. After a predetermined number of iterations, or time, you've finished.

Our problem from last time doesn't have any local maxima (visually, it should look like something along the lines of a stepped cone), so Random Restart doesn't do us a whole lot of good. That’s the upside of us writing a generic algorithm, so that we can still implement the algorithm, even when we don’t necessarily have a problem to solve at the present time.

The first thing we need, is to figure out when we've reached a peak. That’s fairly simple, we’ll keep a count of how many times we didn't find a good neighbor. After that is a whole lot of book keeping. I'm not terribly happy with the following code, but it does do what we need it to do.

public async Task Optimize(TimeSpan timeout)
{
    DateTimeOffset start = DateTimeOffset.Now;

    T current = this.RandomSolution();
    C currentFitness = this.Fitness(current);
    T globalBest = current;
    C globalBestFitness = currentFitness;

    int iterationsAtLevel = 0;

    return await Task.Factory.StartNew(() =>
    {
        do
        {
            var firstGoodNeighbour =
                this.Neighbours(current)
                    .SkipWhile(n => Comparer.Default.Compare(this.Fitness(n), currentFitness) < 0)
                    .FirstOrDefault();

            if (firstGoodNeighbour != null)
            {
                if (Comparer.Default.Compare(currentFitness, this.Fitness(firstGoodNeighbour)) < 0)                 {                     iterationsAtLevel = 0;                 }                 else                 {                     iterationsAtLevel++;                 }                 current = firstGoodNeighbour;                 currentFitness = this.Fitness(firstGoodNeighbour);             }             else             {                 iterationsAtLevel++;             }             if (iterationsAtLevel > this.PeakThreshold)
            {
                if (Comparer.Default.Compare(currentFitness, globalBestFitness) < 0)
                {
                    globalBest = current;
                    globalBestFitness = currentFitness;
                }

                current = this.RandomSolution();
                currentFitness = this.Fitness(current);
                iterationsAtLevel = 0;
            }
        } while (DateTimeOffset.Now - start < timeout);

        return current;
    });
}

As you can see, that's a fair bit more verbose than before. I may have to revisit it in the future and clean it up a bit. What's being done here is that we need to store the best global solution (and its fitness, to avoid some excessive re-calculation) as well as the current solution. Then, when we finally find a suitable neighbor we have to check to see if it's better than our global best. If it is, we update the global best.

Side note: we discussed it last time, but it's important to discuss again;in order to be able to traverse plateaus in the solution landscape, we need to accept neighbor solutions that are as fit as the current solution. In Random Restart, though, being as fit isn't enough to reset the clock on how long we've been roaming on a peak.

No matter what, we have to check to see if we've been on a peak for too long. How long? We check the PeakThreshold.

We're obviously not done there. The Random Restart variation is a particularly invasive graft on the Hill Climbing algorithm. We need to update our constructor(s) to take a value for PeakThreshold.

public int PeakThreshold { get; set; }

public HillClimber(Func randomSolution, Func<T, IEnumerable> neighbours, Func<T, C> fitness)
    : this(randomSolution, neighbours, fitness, 5)
{
}

public HillClimber(Func randomSolution, Func<T, IEnumerable> neighbours, Func<T, C> fitness, int peakThreshold)
{
    this.RandomSolution = randomSolution;
    this.Neighbours = neighbours;
    this.Fitness = fitness;
    this.PeakThreshold = peakThreshold;
}

And we're basically done. Updating the call to create the HillClimber to include a value for the PeakThreshold (will vary greatly depending on the characteristics of the solution landscape) and we've finished.

Conclusion

Outside of the changes listed here, for each variation, the rest of the algorithm is the same. If there’s interest or anyone is having particular difficulty, I’ll create a github repository to provide the source or something along those lines (I’m using a private Team Foundation Service at the moment).

I had promised some more examples and the equivalent F# implementations. I'd still like to do that, because it's difficult to see the benefits at runtime using the original example provided, but it'll have to be in a future post. There's some other algorithms that I'm itching to touch on before that.

To wrap up, what have we looked at? Three variations to the original Hill Climbing algorithm.

  • Steepest Ascent: always take the biggest step up
  • Stochastic: Choose random neighbors, and pick the first good one
  • Random Restart: when you reach the top, start over somewhere else

Bonus points for the eager: Random Restart can be combined with both of the two variations…

Hill Climbing

When it comes to optimization, there’s a class of algorithms called Hill Climbing. Wikipedia can tell you much about the details, but I find that information is often lost in details, so I’m going to try to spell it out in more straightforward terms, with easy examples, a real example and with luck we’ll also come upon a generic solution that can be re-used afterward.

So you have a problem that requires you to optimize (or minimize) something. To apply Hill Climbing, you need to know what a solution looks like, how to generate one, how to figure out how good it is, and how to find its neighbors. That can be a pretty tall order. Using more concise terms, respectively, you need a solution encoding, a solution generator, a fitness function and a neighbor function.

So let’s get started

Before designing a solution to a problem, we need a problem. We’ll use the “ones problem”:

Given a string of 1s and 0s, maximize the number of ones in the string.

In order to solve this problem, we need to be able to get a seed value. It suits us to randomly generate a string of 1s and 0s.

public static BitArray RandomBitArray(int length)
{
    BitArray b = new BitArray(length, false);

    for (int i = 0; i < length; i++)
         b[i] = Convert.ToBoolean(ThreadLocalRandom.Next(0, 2));

    return b;
}

Note here the use of the ThreadLocalRandom, a handy class originally written by Jon Skeet.

After this, we need to be able to know what a solution’s neighbors are. This is not always easy because it’s not always obvious what constitutes a neighbor. In our problem, a neighbor is any string that differs by 1 bit. So, given a BitArray, all of its neighbors can be found by flipping each bit.

public static IEnumerable<BitArray> Neighbours(BitArray barray)
{
    for (int i = 0; i < barray.Length; i++)
    {
        var n = new BitArray(barray);
        n[i] = !n[i];

        yield return n;
    }
}

At this point, we only have 1 component left. We need a way to find out how good a solution is. That is, we need a way to score a solution. This is a pretty common task and some, or most, algorithms call this a fitness function. It’s helpful to generate some random solutions in your problem domain, and compare them yourself to figure out what a good scoring function would be. In our cause, we can just take a count of the 1s in our BitArray like so:

public static int Fitness(BitArray barray)
{
    return barray.ToEnumerable().Count(b => b);
}

public static IEnumerable<bool> ToEnumerable(this BitArray barray)
{
    for (int i = 0; i < barray.Length; i++)
        yield return barray[i];
}

That makes every piece of the puzzle, except the algorithm itself! I've described it already, so I present the algorithm below pulled together into a single object:

public class HillClimber<T, C>
{
    public Func<T> RandomSolution { get; set; }
    public Func<T, IEnumerable<T>> Neighbours { get; set; }
    public Func<T, C> Fitness { get; set; }

    public HillClimber(Func<T> randomSolution, Func<T, IEnumerable<T>> neighbours, Func<T, C> fitness)
    {
        this.RandomSolution = randomSolution;
        this.Neighbours = neighbours;
        this.Fitness = fitness;
    }

    public async Task<T> Optimize(TimeSpan timeout)
    {
        DateTimeOffset start = DateTimeOffset.Now;
        T current = this.RandomSolution();

        return await Task.Factory.StartNew(() =>
            {
                do
                {
                    var firstGoodNeighbour =
                        this.Neighbours(current)
                            .SkipWhile(n => Comparer<C>.Default.Compare(this.Fitness(n), this.Fitness(current)) < 0)
                            .FirstOrDefault();

                    if (firstGoodNeighbour != null)
                        current = firstGoodNeighbour;

                } while (DateTimeOffset.Now - start < timeout);

                return current;
         });
     }
}

The class has 2 generic types:

  • T, the encoding which represents a solution (BitArray, in this case)
  • C, the type that represents a solution's fitness score (int, in this case)

The constructor takes our 3 functions:

  • A random solution generator
  • A neighbor function
  • A fitness function

The Optimize function uses RandomSolution to generate a random seed, it then goes through the current solution’s neighbors and uses the fitness function to find the first neighbor that has the same or better fitness.1 If such a solution is found, the current solution is updated to be the neighbor, and we repeat until a timeout has been reached.

For kicks, I’ve also made it asynchronous. Finding the solution to our original problem is fairly straightforward:

var search = new HillClimber<BitArray, int>(() => RandomBitArray(10), Neighbours, Fitness);
var optimized = await search.Optimize(TimeSpan.FromSeconds(1));

The optimized value return is 1111111111, as is expected.

That's all for today; coming soon I'll go over...

  • F# implementation of hill climbing
  • Variations on the basic hill climbing algorithm
  • Real world example(s)

1 The reason for accepting neighbours that are just as fit is so that the hill climbing algorithm can traverse plateaus.