Saturday, July 03, 2010

Quick Find Algorithm (from Sedgewick) in C# using TDD

In "Algorithms in Java", by Robert Sedgewick, available on Amazon.com here, chapter 1 explores a connectivity algorithm in increasingly complex iterations.

The goal of the connectivity process is as follows:
  1. be able to add pairs of connections as integers (2-3, 5-1, 8-7).
  2. The connections are transitive, so adding 2-3, 3-5, and 5-7 menas that 2-7 is also connected.
  3. Once a transitive connection exists, adding the direct connection (2-7) is redundant and therefore ignored.
MY goal in working through these algorithms is to use Test-Driven Development in C# to solve and evolve them.

Starting from the initial description of the connectivity process, I created a series of tests against a utility class that added connectivity pairs. Each time, before adding them, the utility class checks to see whether a transitive connection can be established across the already existing pairs in an in-memory array. This worked, and all tests were passing (for a very simple case.)

However, in reading further, I found that Sedgewick points out that while this is the typical first approach taken, it has significant problems:
"First, the number of pairs might be sufficiently large to preclude our saving them all in memory
in practical applications. Second, and more to the point, no simple method immediately suggests itself for determining whether two objects are connected from the set of all the connections, even if we could save them all!"
(The book is excellent and I recommend reading it directly to get the full account.)

From here, Sedgewick works through a series of iterative algorithms, each coming closer to the complete solutions.This approach dovetails nicely with the principle in Test-Driven Development of:
"Start by building the simplest thing that could possibly work."
For example, in the TDD Calculator kata, this might mean that you solve your first test by returning 0 (hard-coded) when an empty string is passed.

In the case of this connectivity algorithm, first pass, the Quick find approach is the simplest thing that could possibly work:
For each pair you add, (for exampe: 4-9) OVERWRITE the value on one side to be the same value as the other side.
For example, if you started with an array like this:

new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

By the time you were done (all connections established), where the last input pair were, let's say, 7-1, the array would look like this:

new int[] { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };


Does this solve every connectivity issue? No. But neither does returning 0 in Calculator kata; it is simply the
first, simplest thing that works.

Having read Sedgewick's explanation for Quick Find, I decided to start by building the Quick Find algorithm as a series of unit tests. As you read through the unit tests in order (here on github), they document the progression through variations until all Quick Find connectivity add requirements are satisfied.

In the QuickFinder class itself, I would normally refactor without preserving previous iterations. However, for
documentation purposes, to trace the steps that got us here, I have kept (commented out) the previous iterations.
NOTE   To see how the tests failed/evolved across versions, try uncommenting an earlier iteration.
As I proceed further into Ch.1 of Sedgewick, additional iterations of the algorithms will be introduced. Time-permitting, my goal is to created each of these algorithms using TDD as well, and post them to github.

No comments: