Algorithmic Fun with Ruby Hashes

All programmers use them. You might call them Maps, Dictionaries, Hash Maps, or Associative Arrays. In the Ruby world, we call them Hashes and they’re pretty awesome. A Hash is, simply put, a collection of key-value pairs. It is similar to an Array, except that it is indexed via arbitrary keys of any object type instead of an incremental integer. If you take a look at the Hash API, you’ll see that it offers an impressive range of functionality. In this article, we won’t be looking at the Hash API as such, but rather at using Hashes to model, abstract, and solve common logic problems.

Truth Tables and Capturing the Algorithm

Many people use Hashes purely as simple data lookup tables or as named parameters to methods. This is all fine, but we can do much more by leveraging Hash’s inherent flexibility and powerful features to implement logic and complex algorithms. Let’s look at an interviewer’s favorite, the FizzBuzz test. It goes like this:

“Write a program that prints the numbers from 1 to 100. For multiples of 3, print “Fizz” instead of the number and for multiples of 5, print “Buzz”. For numbers which are multiples of both 3 and 5, print “FizzBuzz”.”

We have two conditions here which affect the outcome: (n % 3 == 0) and (n % 5 == 0). As soon as we grok that, solving the problem is straightforward:

def conditional(n)
  if (n % 3 == 0) && (n % 5 != 0)
    'Fizz'
  elsif (n % 3 != 0) && (n % 5 == 0)
    'Buzz'
  elsif (n % 3 == 0) && (n % 5 == 0)
    'FizzBuzz'
  else n
  end
end

To run FizzBuzz for a sequence of 1 to 100 we do:

(1..100).each {|n| puts conditional(n)}

That was a piece of cake! But where do Hashes come into it? Well, we can capture this conditional logic as a Hash. To get a clue on how to do that, write down the truth table for FizzBuzz:

(n % 3 == 0)	(n % 5 == 0)	outcome
true	false	‘Fizz’
false	true	‘Buzz’
true	true	‘FizzBuzz’
false	false	number

Observe that the outcome column consists of discrete, non-nil values for any given input. That would make a good key for our Hash. Let’s rewrite our truth table to suit a Hash:

Key	Value
‘Fizz’	(n % 3 == 0) && (n % 5 != 0)
‘Buzz’	(n % 3 != 0) && (n % 5 == 0)
‘FizzBuzz’	(n % 3 == 0) && (n % 5 == 0)
number	(n % 3 != 0) && (n % 5 != 0)

Look Ma, No If Statements!

Now, it’s easy to model our Hash on the truth table:

def declarative(n)
  h = {
    'Fizz' => (n % 3 == 0) && (n % 5 != 0),
    'Buzz' => (n % 3 != 0) && (n % 5 == 0),
    'FizzBuzz' => (n % 3 == 0) && (n % 5 == 0),
    n => (n % 3 != 0) && (n % 5 != 0)
  }
  h.key(true)
end

Here, we’re using a Hash to implement some declarative-style programming. We’re not telling our app how to calculate the outcome for each number, we just declare what conditions determine the keys and let the data structure do the lifting. Here’s the sweet thing: as the two conditions used to determine the keys are mutually exclusive, there will only be one true value in the Hash for any number input. This means that the correct key for the input will have the value true and all others will be false, so the method returns the only key with a value of true for the given input.

puts declarative(2) #=> 2
puts declarative(3) #=> Fizz
puts declarative(5) #=> Buzz
puts declarative(7) #=> 7
puts declarative(15)#=> FizzBuzz

To run FizzBuzz declaratively for a sequence of 1 to 100:

(1..100).each {|n| puts declarative(n)}

I don’t know about you, but this way appeals to me much more than the multi-conditional approach. I find it more elegant and tidy than the if..elsif way. Unfortunately, elegance comes at a price. As you can see from the benchmarks, the declarative hash approach is much, much slower than the imperative conditional statement.

Can we reach a happy compromise between elegance and performance? Why, yes we can!

Memoization

Memoization is simply the technique of storing computed or retrieved data for future use. Also known as caching to you and I. Hashes are pretty good for storing things. Let’s have another go at FizzBuzz, memoization-style:

memoization ||= Hash.new do |hash,key|
  hash[key] = case
  when (key % 3 == 0) && (key % 5 != 0)
    'Fizz'
  when (key % 3 != 0) && (key % 5 == 0)
    'Buzz'
  when (key % 3 == 0) && (key % 5 == 0)
    'FizzBuzz'
  else key
  end
end

We are creating a Hash that is initialized to the right FizzBuzz value for each numerical key passed to it. Run it for a range up to 100:

(1..100).each  { |n| memoization[n] }

The memoization hash has been created and has mapped each key from 1 to 100 to its FizzBuzz value. Next time we want the FizzBuzz value of a number in this range, a simple Hash#fetch will provide it.

Memoization has some very practical applications, other than solving FizzBuzz-style problems. We all use the web, so here’s how we could implement some raw and rugged response caching using a Hash:

require 'net/http'
http = Hash.new{|hash,key| hash[key] = Net::HTTP.get_response(URI(key)).body }
puts http['http://www.google.com'] # make actual request
puts http['http://www.google.com'] # get cached value

Now, let’s take a look at performance. Here are the benchmarks (on my machine – yours will obviously differ) when running each of our FizzBuzz solutions twice for a range of 1..1000000.

user	system	total	real
conditional	0.230000	0.000000	0.230000
conditional	0.240000	0.000000	0.240000
declarative	0.990000	0.000000	0.990000
declarative	0.980000	0.000000	0.980000
memoization	1.040000	0.020000	1.060000
memoization	0.370000	0.000000	0.370000

The first time the memoization code is run, it’s very slow. Ruby has to build up the Hash, as well as execute the conditional logic. But the 2nd time, and each time thereafter, its execution is dramatically quicker. That’s because the Hash already contains all values for that range, so it just has to look them up whenever we ask it instead of computing them all over again.

You may notice that, although quick for subsequent runs, the memoization technique is still not quite as fast as the if..elsif approach. This is because we don’t get the full benefit of caching by applying it to simple, linear algorithms like FizzBuzz. To make the most of memoization, we need to use it on more complex, polynomial or exponential-order algorithmic logic.

Recursive Hashes

Recursion is a typical domain for fast growth algorithms. Let’s look at a classic example of recursion: the Factorial problem. The usual recursive implementation goes like this:

def factorial(x)
  x == 1 ? 1 : x * factorial(x-1)
end

We can memoize it with a hash:

factorial_hash ||= Hash.new do |hash,key|
  hash[key] =  key * hash[key-1]
end
# initialize special cases
fib_hash[1] = 1; fib_hash[2] = 2

Here, we are initializing a Hash in such a way as to force the initialization code to be executed n times for an input of n. The first time we run, say, factorial_hash[5] the code will run 5 times, creating a Hash for the keys 2-5. Then, each time we need to get the factorial value of a number up to 5, the Hash will just do a simple lookup and fetch the right value.

Run each technique twice to get the factorial of 3000:

factorial(3000) 
factorial(3000) 
factorial_hash[3000]  
factorial_hash[3000]

Here are the timings:

user	real
factorial	0.005829
factorial	0.006082
factorial_hash	0.011202
factorial_hash	0.000002

As before, the first time the hash runs takes longer than the non-hashed equivalent, but subsequent runs are now blazingly fast.

Let’s try something even more complex: the Fibonacci sequence. Here’s the standard approach:

def fibonacci(n)
  n <= 2 ? 1 : fibonacci(n - 1) + fibonacci(n - 2)
end

and here’s our caching Hash approach:

fib_hash ||= Hash.new do |hash,key|
  hash[key] = hash[key-1] + hash[key-2]
end
# initialize special cases
fib_hash[1] = 1; fib_hash[2] = 1

Running each approach twice to get the Fibonacci value at 35 gives the following timings:

fibonacci(35) 
fibonacci(35) 
fib_hash[35]  
fib_hash[35]

user	real
fibonacci	0.987518
fibonacci	0.981804
fibonacci_hash	0.000041
fibonacci_hash	0.000002

Now even the first run of the Hash approach is significantly faster than the non-cached recursive method. Subsequent runs put the non-cached method to shame! The more complex the problem, the greater the benefits obtained with memoization.

But every silver lining has a cloud. Using Hashes recursively adds more data onto our app’s Stack Frame, which means that recursive Hash approach has much lower threshold for Stack Overflow type errors (that’s the error, not the website). So, for algorithms with a high range of input values, recursive Hashes are probably not the best solution.

Conclusion

Hashes are more than just a way to pass options to methods. Used properly, they can help make our code more elegant, faster or sometimes both at the same time. However, we need to be aware of their shortcomings and know how and when to use them effectively.

Code presented in this article can be found in the following gists:

Frequently Asked Questions about Ruby Hashes

What is a Ruby Hash and how does it work?

A Ruby Hash is a collection of key-value pairs, similar to an array, but indexing is done via arbitrary keys of any object type, not an integer index. The order in which you traverse a hash by either key or value may seem arbitrary and will generally not be in the insertion order. Hashes have many uses, such as to store data that can be quickly looked up by key.

How do I create a new Ruby Hash?

Creating a new Ruby Hash is straightforward. You can create an empty hash by calling Hash.new or by using the literal constructor {}. To create a hash with initial values, you can send a list of key-value pairs enclosed in braces, with the key and value separated by =>.

How can I access the values in a Ruby Hash?

You can access the values in a Ruby Hash by referring to its key. For example, if you have a hash named ‘h’ with a key ‘k’, you can get its value by calling h[k]. If the specified key is not found, it returns nil.

How can I add elements to a Ruby Hash?

You can add elements to a Ruby Hash by using the square bracket syntax. For example, if you have a hash named ‘h’, you can add a key-value pair by calling h[key] = value.

How can I remove elements from a Ruby Hash?

You can remove elements from a Ruby Hash by using the delete method. For example, if you have a hash named ‘h’ with a key ‘k’, you can remove the key-value pair by calling h.delete(k).

How can I iterate over a Ruby Hash?

You can iterate over a Ruby Hash by using the each method, which passes each key-value pair to a block of code. For example, if you have a hash named ‘h’, you can iterate over it with h.each { |key, value| … }.

How can I check if a Ruby Hash contains a certain key or value?

You can check if a Ruby Hash contains a certain key by using the has_key? method, and you can check if it contains a certain value by using the has_value? method.

How can I merge two Ruby Hashes?

You can merge two Ruby Hashes by using the merge method. This method returns a new hash that combines the contents of the two hashes. If there is a conflict, it uses the value from the hash that was passed as an argument.

How can I convert a Ruby Hash to an Array and vice versa?

You can convert a Ruby Hash to an Array by using the to_a method, which returns an array of arrays, each containing a key-value pair. You can convert an Array to a Hash by using the to_h method, which expects an array of arrays, each containing a key-value pair.

How can I sort a Ruby Hash?

You can sort a Ruby Hash by using the sort method, which returns an array of arrays, each containing a key-value pair. The array is sorted by key, but you can pass a block of code to sort by value or by any other criteria.