Functional Programming: Pure Functions

This is the second part of a two-part series on functional programming in Ruby. In the previous article we explored immutable values; now we'll look at the other side of functional programming: pure, composable functions.

Pure Functions

A pure function is a function where the return value is only determined by its input values, without observable side effects. This is how functions in math work: Math.cos(x) will, for the same value of x, always return the same result. Computing it does not change x. It does not write to log files, do network requests, ask for user input, or change program state. It’s a coffee grinder: beans go in, powder comes out, end of story.

When a function performs any other “action”, apart from calculating its return value, the function is impure. It follows that a function which calls an impure function is impure as well. Impurity is contagious.

A given invocation of a pure function can always be replaced by its result. There’s no difference between Math.cos(Math::PI) and -1; we can always replace the first with the second. This property is called referential transparency.

Keep State Local

A pure function can only access what you pass it, so it’s easy to see its dependencies. We don’t always write functions like this. When a function accesses some other program state, such as an instance or global variable, it is no longer pure.

Take global variables as an example. These are typically considered a bad idea, and for good reason. When parts of a program start interacting through globals, it makes their communication invisible. There are dependencies that, on the surface, are hard to spot. They cause maintenance nightmares. The programmer needs to mentally keep track of how things are related and orchestrate everything just right. Small changes in one place can cause seemingly unrelated code to fail.
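
As a small sketch (the names here are purely illustrative), compare a method that silently reads a global with one that takes everything it needs as arguments:

# Impure: the result depends on hidden global state.
$tax_rate = 0.25

def total_with_global_tax(amount)
  amount * (1 + $tax_rate)
end

# Pure: every dependency is passed in explicitly.
def total_with_tax(amount, tax_rate)
  amount * (1 + tax_rate)
end

total_with_tax(100, 0.25) # => 125.0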

Pure Methods

In Ruby we don’t usually talk about functions. Instead, we have objects with methods, but the difference is small. When you call a method on an object, it’s as if the object is passed to the function as the argument self. It’s another value the function can rely on to compute its result.

Take the upcase method of String:

str = "ukulele"
str.upcase # => "UKULELE"
str        # => "ukulele"

It returns an uppercase copy of the string; the original remains untouched. upcase didn't do anything else, such as write to a log file or read mouse input. upcase is pure. The same can't be said of upcase!

str = "ukulele"
str.upcase! # => "UKULELE"
str         # => "UKULELE"

Ruby adds the bang to signal that this function is destructive. After you call it, the original string is gone, replaced by the new version. upcase! is not pure.

Benefits

Pure functions go hand in hand with immutable values (see the previous article). Together they lead to declarative programs, describing how inputs relate to outputs, without spelling out the steps to get from A to B. This can simplify systems and, in the face of concurrency, referential transparency is a godsend.

Reproducible Results

When functions are pure and values are easy to inspect and create, then every function call can be reproduced in isolation. The impact this has on testing and debugging is hard to overstate.

To write a test, you declare the values that will act as arguments, pass them to the function, and verify the output. There is no context to set up, no current account, request, or user. There are no side effects to mock or stub. Instantiate a representative set of inputs, and validate the outputs. Testing doesn’t get more straightforward than this.
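
As a minimal sketch, assuming a hypothetical pure discounted_price function, a Minitest case needs nothing but inputs and expected outputs:

require 'minitest/autorun'

# A hypothetical pure function: its output depends only on its arguments.
def discounted_price(price, discount)
  price * (1 - discount)
end

class DiscountedPriceTest < Minitest::Test
  def test_applies_discount
    assert_equal 75.0, discounted_price(100.0, 0.25)
  end

  def test_zero_discount_returns_the_price
    assert_equal 100.0, discounted_price(100.0, 0)
  end
end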

Parallelization

Pure functions can always be parallelized. Distribute the input values over a number of threads, and collect the results. Here’s a naive version of a parallel map method:

module Enumerable
  def pmap(cores = 4, &block)
    [].tap do |results|
      # Split the collection into one slice per core and map each slice
      # in its own thread. NB: the order of the results is not guaranteed
      # to match the order of the input.
      each_slice((count.to_f / cores).ceil).map do |slice|
        Thread.new do
          slice.each do |item|
            results << block.call(item)
          end
        end
      end.map(&:join) # wait for every thread to finish
    end
  end
end

Now let’s simulate some expensive computation:

def report_time
  t = Time.now
  yield
  puts Time.now-t
end

report_time {
  100.times.map {|x| sleep(0.1); x*x }
}
# 10.014289725

report_time {
  100.times.pmap {|x| sleep(0.1); x*x }
}
# 2.504685127

The version with #map took 10 seconds to complete; the parallel version took only 2.5 seconds. But we can only swap out #map for #pmap if we know that the function being called is pure.

Memoization

Because pure functions are referentially transparent, we only need to compute their output once for given inputs. Caching and reusing the result of a computation is called memoization, and can only be done safely with pure functions.
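
A minimal sketch of hand-rolled memoization, using a recursive Fibonacci function as a stand-in for an expensive pure computation:

# Caching is safe only because fib is pure: same input, same output, no side effects.
FIB_CACHE = {}

def fib(n)
  FIB_CACHE[n] ||= n < 2 ? n : fib(n - 1) + fib(n - 2)
end

fib(100) # computed once; later calls with the same argument hit the cache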

Laziness

A variation on the same theme: we only ever need to compute the result of a pure function once, but what if we could avoid the computation entirely? Invoking a pure function means specifying a dependency: this output value depends on these input values. But what if you never use the output value? Because the function cannot cause side effects, it doesn't matter whether it is called or not, so a smart system can be lazy and optimize the call away.

Some languages, like Haskell, are built entirely on lazy evaluation: only the values needed to achieve side effects are computed; the rest are ignored. Ruby's evaluation strategy is called strict evaluation: each expression is evaluated completely before its result can be used in another expression. This is unfortunate, but with a little imagination we can build our own opt-in laziness.

class Lazy < BasicObject
  def initialize(&blk)
    @blk = blk # the deferred computation
  end

  # Forward any message to the resolved value.
  def method_missing(name, *args, &blk)
    _resolve.send(name, *args, &blk)
  end

  def respond_to?(name)
    _resolve.respond_to?(name)
  end

  # Run the block at most once and cache its result.
  def _resolve
    @resolved ||= @blk.call
  end
end

def lazy(&blk)
  Lazy.new(&blk)
end

Now we can wrap potentially costly computations in lazy {}:

def mul(a, b, c)
  a * b
end

a = lazy { sleep(0.5) ; 5 }
b = lazy { sleep(0.5) ; 7 }
c = lazy { sleep(3)   ; 9 }

mul(a, b, c)
# => 35

The calls to sleep simulate some CPU-intensive task. The final result pops up after about a second. Even though it would take 3 seconds to compute c, because the value is never used we don’t have to incur that cost.

Refactoring to Functional

There is a catch, though. Much of what our programs do (interacting with databases, serving network requests, writing to log files) is inherently side-effectful. Our programs are processes that deal with inputs and generate outputs over time; they are not mathematical functions. There are ways to get the best of both worlds, however.

One fruitful approach is to separate the pure, functional, value-based core of your application from an outer, imperative shell. Take a command-line application that needs to parse its command-line arguments:

require 'optparse'

def parse_cli_options
  opts = OptionParser.new do |opts|
    opts.banner = 'cli_tool [options] infile outfile'
    opts.on('--version', 'Print version') do |name|
      $stderr.puts VERSION
      exit 0
    end.on('--help', 'Display help') do
      $stderr.puts opts
      exit 0
    end
  end

  opts.parse!(ARGV)
  if ARGV.length != 2
    $stderr.puts "Wrong number of arguments"
    $stderr.puts opts
    exit 1
  end

  opts
end

This is about as far away from a pure function as you can get. It does all of the following:

  • Writes directly to $stderr
  • Calls Kernel.exit
  • Relies on the global ARGV
  • Mutates ARGV

How would you go about writing tests for such a monstrosity? It's close to impossible. To make it a pure function, we need to ask ourselves what needs to go in and what should come out. As input, this function clearly needs access to the command line arguments. As output, it needs to tell us:

  • Was the parsing successful?
  • If not, what’s the error message?
  • What exit code should the process use?

def parse_cli_options(argv)
  opts = OptionParser.new do |opts|
    opts.banner = 'cli_tool [options] infile outfile'
    opts.on('--version', 'Print version') do
      return { message: VERSION }
    end.on('--help', 'Display help') do
      return { message: opts.to_s }
    end
  end

  filenames = opts.parse(argv)
  if filenames.length != 2
    return {
      message: ["Wrong number of arguments!", opts].join("\n"),
      exit_code: 1
    }
  end
  { filenames: filenames }
end

Now we have a pure function that's very easy to test, and we can wrap it in an "imperative shell".

def run
  result = parse_cli_options(ARGV)
  perform(*result[:filenames])  if result.key?(:filenames)
  $stderr.puts result[:message] if result.key?(:message)
  Kernel.exit(result.fetch(:exit_code, 0))
end

Keeping the core strictly functional is necessary, since a single impure function would contaminate any function that calls it. Notice how we turned some side effects, such as exiting the process, into an intermediate value representing that side effect. You can valuefy anything this way, even error conditions or database operations, reaping the benefits of functional programming.
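
As a sketch of what that can look like (Mailer and the method names below are hypothetical stand-ins), the functional core describes a side effect as plain data, and only the imperative shell executes it:

# Pure core: builds a value describing the email to send.
def welcome_email(user)
  { to: user[:email], subject: 'Welcome!', body: "Hello #{user[:name]}" }
end

# Imperative shell: the only place the side effect actually happens.
def deliver(email)
  Mailer.send_message(email[:to], email[:subject], email[:body]) # Mailer is hypothetical
end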

Functional programming is a big subject, and one that not all Rubyists understand. After these two articles, you should have a good foundation for making your own code more functional. Try it out, and see where the journey leads you.

Frequently Asked Questions (FAQs) about Pure Functions in Functional Programming

What Makes a Function Pure in Functional Programming?

A function in functional programming is considered pure if it meets two main criteria. First, it should always produce the same output given the same input. This means that no matter how many times you call the function with the same arguments, the result will always be the same. Second, a pure function should not cause any side effects. Side effects refer to any changes in the state of the program or observable interaction with the outside world, such as modifying a global variable or performing I/O operations. Pure functions only depend on the input provided and do not alter any external state.

Why are Pure Functions Important in Functional Programming?

Pure functions are a fundamental concept in functional programming because they provide several benefits. They make the code easier to reason about since the output solely depends on the input, and there are no side effects to consider. This predictability makes the code more maintainable and easier to debug. Pure functions also enable powerful programming techniques such as memoization, where previous results are cached and reused, leading to performance improvements. Moreover, they are highly testable and can be easily composed to create more complex functions.

How are Pure Functions Different from Impure Functions?

The main difference between pure and impure functions lies in their interaction with external state and their predictability. Pure functions do not interact with external state and always produce the same output for the same input. On the other hand, impure functions may depend on or modify external state, making their output unpredictable for the same input. This unpredictability can make impure functions harder to test and debug.

Can Pure Functions Have Local Side Effects?

A pure function cannot have observable side effects, but it can freely create and modify local variables while computing its result. Because those changes never escape the function’s own scope, they do not affect external state or other functions, so the function remains pure. What matters is that the output is determined solely by the inputs and that nothing outside the function is modified.
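
A small illustration (the names are illustrative): the method below mutates a local accumulator, yet remains pure because that mutation is never visible from the outside:

def sum_of_squares(numbers)
  total = 0
  numbers.each { |n| total += n * n } # local mutation only
  total
end

sum_of_squares([1, 2, 3]) # => 14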

How Do Pure Functions Benefit Testing and Debugging?

Pure functions make testing and debugging easier due to their predictability. Since a pure function’s output is solely determined by its input, you can easily predict what the output should be for a given input. This makes it easier to write test cases and verify the correctness of the function. Moreover, since pure functions do not have side effects, you don’t need to worry about the function altering the state of your program in unexpected ways, making debugging simpler.

Can Pure Functions Use Global Variables?

Pure functions should not use global variables. Using global variables would make the function dependent on external state, which goes against the principle of pure functions. A pure function’s output should only depend on its input, and it should not cause any side effects, including modifying global variables.

How Do Pure Functions Enable Function Composition?

Pure functions can be easily composed to create more complex functions. Since the output of a pure function is solely determined by its input, you can use the output of one pure function as the input to another. This allows you to build complex functionality by composing simple, pure functions, making your code more modular and easier to understand.
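
In Ruby this kind of composition can be written directly with Proc#>> (available since Ruby 2.6); the lambdas below are purely illustrative:

add_tax  = ->(amount) { amount * 1.25 }
to_cents = ->(amount) { (amount * 100).round }

price_in_cents = add_tax >> to_cents # apply add_tax first, then to_cents

price_in_cents.call(10.0) # => 1250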

Are All Functions in Functional Programming Pure?

Not all functions in functional programming are pure. While pure functions are a fundamental concept in functional programming, it’s possible to have impure functions in a functional programming language. However, the use of pure functions is encouraged because of the benefits they provide, such as predictability, ease of testing, and the ability to compose functions.

How Do Pure Functions Improve Performance?

Pure functions can improve performance through a technique called memoization. Since a pure function always produces the same output for the same input, you can cache the result of the function call and reuse it when the function is called again with the same arguments. This can significantly improve performance, especially for computationally intensive functions.

Can Pure Functions Have Parameters?

Yes, pure functions can have parameters. In fact, a pure function’s output is determined solely by its input parameters. However, a pure function should not modify the values of its parameters, as this would be considered a side effect. Instead, it should return a new value based on the input parameters.
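
For example (illustrative names), the pure version returns a new array instead of mutating the one it was given:

# Impure: map! changes the argument in place.
def normalize!(names)
  names.map!(&:downcase)
end

# Pure: the argument is left untouched; a new array is returned.
def normalize(names)
  names.map(&:downcase)
end

list = ["Ann", "Bob"]
normalize(list) # => ["ann", "bob"]
list            # => ["Ann", "Bob"]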

Arne Brasseur

Arne is a developer, public speaker, programming coach, author and open-source contributor. With a passion for both human and programming languages, he has spent the last decade teaching and exploring the finer points of Ruby, LISP, Haskell, Mandarin Chinese, and others. When not hopping conferences or Pacific islands, he can be found at his home base in Berlin, brewing fine teas while pondering the future of open-source and the web.
