Golden Master Testing: More Refactoring, More Understanding

Share this article

Robot of the metal parts on a dark grungy background

After a tedious slog through views and controllers, we finally reached the part of refactoring that becomes an adventure: Grasping at straws. Second guessing everything. Obsessively running the tests. Nurturing the delicate hope that our new-found understanding is, indeed, insight and not just another failed assumption.

In the previous article, we dissected the first line of the compute method, tracing that line back carefully and meticulously. We discovered that the charts are concerned with dates, most likely on some sort of weekly basis, for the past year.

Next we will untangle the code that processes the raw data into the correct format for the chart to consume.

Once Upon a Time

module Stats
  class RunningTimeData
    include Rails.application.routes.url_helpers

    attr_reader :title, :dates, :key, :today,
                :y_legend, :y2_legend, :x_legend, :values,
                :links, :values_2, :x_labels, :y_max

    def initialize(title, timestamps, key, now)
      @title = title
      @dates = timestamps.map(&:to_date)
      @key = key
      @today = now.to_date
    end

    def compute
      @actions_running_per_week_array = convert_to_weeks_from_today_array

      # cut off chart at 52 weeks = one year
      @count = [52, total_weeks_of_data].min

      # convert to new array to hold max @cut_off elems + 1 for sum of actions after @cut_off
      @actions_running_time_array = cut_off_array_with_sum(@actions_running_per_week_array, @count)
      @max_actions = @actions_running_time_array.max

      # get percentage done cumulative
      @cumulative_percent_done = convert_to_cumulative_array(@actions_running_time_array, dates.count )

      @url_labels = Array.new(@count){ |i| url_for(:controller => 'stats', :action => 'show_selected_actions_from_chart', :index => i, :id=> @key, :only_path => true) }
      @url_labels[@count]=url_for(:controller => 'stats', :action => 'show_selected_actions_from_chart', :index => @count, :id=> "#{@key}_end", :only_path => true)

      @time_labels         = Array.new(@count){ |i| "#{i}-#{i+1}" }
      @time_labels[0] = "< 1"
      @time_labels[@count] = "> #{@count}"

      # Normalize all the variable names
      @y_legend = I18n.t('stats.running_time_legend.actions')
      @y2_legend = I18n.t('stats.running_time_legend.percentage')
      @x_legend = I18n.t('stats.running_time_legend.weeks')
      @values = @actions_running_time_array.join(",")
      @links = @url_labels.join(",")
      @values_2 = @cumulative_percent_done.join(",")
      @x_labels = @time_labels.join(",")
      # add one to @max for people who have no actions completed yet.
      # OpenFlashChart cannot handle y_max=0
      @y_max = 1+@max_actions+@max_actions/10
    end

    def total_weeks_of_data
      weeks_since(dates.last)
    end

    def weeks_since(date)
      days_since(date) / 7
    end

    def days_since(date)
      (today - date).to_i
    end

    def convert_to_weeks_from_today_array
      convert_to_array(dates, total_weeks_of_data+1) {|date|
        [weeks_since(date)]
      }
    end

    # uses the supplied block to determine array of indexes in hash
    # the block should return an array of indexes each is added to the hash and summed
    def convert_to_array(records, upper_bound)
      a = Array.new(upper_bound, 0)
      records.each { |r| (yield r).each { |i| a[i] += 1 if a[i] } }
      a
    end

    # returns a new array containing all elems of array up to cut_off and
    # adds the sum of the rest of array to the last elem
    def cut_off_array_with_sum(array, cut_off)
      # +1 to hold sum of rest
      a = Array.new(cut_off+1){|i| array[i]||0}
      # add rest of array to last elem
      a[cut_off] += array.inject(:+) - a.inject(:+)
      return a
    end

    def convert_to_cumulative_array(array, max)
      # calculate fractions
      a = Array.new(array.size){|i| array[i]*100.0/max}
      # make cumulative
      1.upto(array.size-1){ |i| a[i] += a[i-1] }
      return a
    end
  end
end

And so the story begins, with a variable named @actions_running_per_week_array.

Actions running is something that we’ve seen before, and it can be translated into unfinished TODOs.

The thing is, we don’t care that they’re TODOs. We extracted the created_at timestamps from the TODOs, coerced them into dates, and that’s all the chart is interested in. Simple datapoints.

It also seems unnecessary to encode the type name into the name.

def datapoints_per_week
  convert_to_weeks_from_today_array(dates, total_weeks_of_data+1)
end

convert_to_weeks_from_today_array takes two arguments, both of which are available in the entire class. There is no need to pass either of them. Taking them off leaves us with a spurious abstraction:

def datapoints_per_week
  convert_to_weeks_from_today_array
end

Inlining the method results in a vague sense of déjà vu.

def datapoints_per_week
  convert_to_array(dates, total_weeks_of_data + 1) {|date|
    [weeks_since(date)]
  }
end

The entire method body is a call to another method, which takes arguments that are available to the whole class.

Inline it!

Step back for a moment and think about the methods that we just inlined. It started out like this:

def datapoints_per_week
  convert_to_weeks_from_today_array
end

def convert_to_weeks_from_today_array
  convert_to_array
end

def convert_to_array
  # several lines of complicated logic
end

If a method definition contains a single line, and that line is just another method call, then the two method names better be worth it.

Contrast the above with the following method, which we defined in the course of the previous refactoring.

def total_weeks_of_data
  weeks_since(dates.last)
end

In the latter example both names are meaningful. They’re at different levels of abstraction, and together they tell a good story.

Discovering Datapoints

Let’s poke at datapoints_per_week with our test_stuff test, to figure out how the data is structured.

def test_stuff
  now = Time.utc(2014, 1, 2, 3, 4, 5)
  timestamps = [
    now - 5.minutes,
    now - 1.day,
    now - 7.days,
    now - 8.days,
    now - 8.days,
    now - 30.days,
  ]
  stats = Stats::RunningTimeData.new("title", timestamps, "key", now)
  assert_equal "something", stats.datapoints_per_week
end

The spread of data here goes from just a few minutes ago, to a month ago. I’m particularly interested in figuring out what happens in a week without any data.

The failure looks like this:

Expected: "something"
  Actual: [2, 3, 0, 1]

The index of the array represents the number of weeks ago, and the value is the number of datapoints that occurred that week.

Here’s how the array of datapoints is produced:

def datapoints_per_week
  a = Array.new(total_weeks_of_data+1, 0)
  dates.each {|date|
    [weeks_since(date)].each {|i|
      a[i] += 1 if a[i]
    }
  }
  a
end

Shield your eyes, children, we’ve got nested iteration!

The inner loop iterates over an array of one element. That seems a bit pointless. All we need is an index:

dates.each {|date|
  i = weeks_since(date)
  a[i] += 1 if a[i]
}

There is no way in which weeks_since will ever return nil or false, so we can delete the postfix if. While we’re at it, let’s rename a to frequencies.

def datapoints_per_week
  frequencies = Array.new(total_weeks_of_data + 1, 0)
  dates.each {|date|
    frequencies[weeks_since(date)] += 1
  }
  frequencies
end

The datapoints have been clustered by week. Next it looks like we’re going to truncate it.

# cut off chart at 52 weeks = one year
@count = [52, total_weeks_of_data].min

The comment suggests that we only want to show data back to a certain date. That kind of makes sense, but then you might wonder why the SQL query didn’t simply limit the data based on some cutoff date.

Perhaps there’s more to it.

If we look at how @count is used, it gets even more confusing.

# convert to new array to hold max @cut_off elems + 1
# for sum of actions after @cut_off
@actions_running_time_array = cut_off_array_with_sum(datapoints_per_week, @count)

Assuming that @cut_off actually means @count, this comment contradicts the previous one.

Either it’s cutting the data off at a year, or it’s summing up the dates outside the cutoff as a single entry. I don’t see how it could be both.

We need to poke at it to see what’s actually going on.

def test_stuff
  chart = Stats::RunningTimeData.new("title", [], "key", Time.now)
  per_week = [1, 0, 3, 4, 5]
  assert_equal [], chart.cut_off_array_with_sum(per_week, 1)
  assert_equal [], chart.cut_off_array_with_sum(per_week, 2)
  assert_equal [], chart.cut_off_array_with_sum(per_week, 3)
  assert_equal [], chart.cut_off_array_with_sum(per_week, 4)
  assert_equal [], chart.cut_off_array_with_sum(per_week, 5)
  assert_equal [], chart.cut_off_array_with_sum(per_week, 6)
end

One at a time, the failures fill in the blanks.

# datapoints_per_week = [1, 0, 3, 4, 5]
[1, 12] # cut off at 1
[1, 0, 12] # cut off at 2
[1, 0, 3, 9] # cut off at 3
[1, 0, 3, 4, 5] # cut off at 4
[1, 0, 3, 4, 5, 0] # cut off at 5
[1, 0, 3, 4, 5, 0, 0] # cut off at 6

The cutoff method doesn’t discard data, it batches all the overflow into a single slot. Moreover, it also fills in any empty slots within the desired time period with 0.

@count seems like an overly generic name. What it really tells us, is exactly how many weeks of data will be displayed in the chart, regardless of how many weeks of data are available in the raw data.

Renaming count to total_weeks_in_chart, and actions_running_time_array to datapoints_per_week_in_chart tells a slightly more coherent story. This is verbose, but until we have teased out all of the concepts, it’s going to be hard to pick a better name.

def datapoints_per_week_in_chart
  cut_off_array_with_sum(datapoints_per_week, total_weeks_in_chart)
end

Notice the familiar pattern: The arguments are globally available, and the method definition consists of a single call to another method.

Inlining cut_off_array_with_sum gives us:

def datapoints_per_week_in_chart
  a = Array.new(total_weeks_in_chart + 1) { |i| datapoints_per_week[i]||0 }
  a[total_weeks_in_chart] += datapoints_per_week.inject(:+) - a.inject(:+)
  a
end

This looks a bit scary, but can be simplified somewhat:

def datapoints_per_week_in_chart
  frequencies = Array.new(total_weeks_in_chart) {|i|
    datapoints_per_week[i].to_i
  }
  frequencies << datapoints_per_week.inject(:+) - frequencies.inject(:+)
end

Moving on, the code is introducing a completely new term: cumulative_percent_done.

I’m not really sure that done is the word that we are looking for. We’re dealing with unfinished TODOs, not completed TODOs. And even so, we don’t care.

Let’s rename it to cumulative_percentages:

def cumulative_percentages
  convert_to_cumulative_array(datapoints_per_week, dates.count)
end

As usual, we can inline the contained method:

def cumulative_percentages
  a = Array.new(datapoints_per_week_in_chart.size) {|i|
    datapoints_per_week_in_chart[i]*100.0/dates.count
  }
  1.upto(timestamp_counts.size-1) {|i| a[i] += a[i-1]}
  a
end

That looks more than a little frightning. It’s doing a lot.

It would be helpful to name the concepts, so let’s extract some methods.

The first part is creating an array of percentages by week.

def percentages_per_week
  Array.new(datapoints_per_week_in_chart.size) {|i|
    datapoints_per_week_in_chart[i]*100.0/dates.count
  }
end

Think about what this is doing.

  1. It is creating a new array that is the same length as an existing array, and
  2. It is deriving each value in the array from the corresponding value in the original array.

Another word for this is map.

def percentages_per_week
  datapoints_per_week_in_chart.map {|count|
    count * 100.0 / dates.count
  }
end

This method is doing quite a bit, as well. Naming the block cleans it up considerably.

def percentages_per_week
  datapoints_per_week_in_chart.map(&percentage)
end

def percentage
  Proc.new {|count| count * 100.0 / dates.count}
end

The fully refactored cumulative_percentages logic looks like this:

def cumulative_percentages
  running_total = 0
  percentages_per_week.map {|percentage|
    running_total += percentage
  }
end

That’s not nearly as frightening as it seemed just a moment ago.

Cleanup

The rest of compute can be blown apart with simple extractions. The resulting code can be seen below.

It is even longer than the original, but the concepts, originally hidden behind cryptic names at odd levels of abstraction, have now been brought to the surface. Are the names right? Probably not. This was not a refactoring to create a good arrangement of the code, it was a refactoring to discover what is there.

Inline, inline, extract, inline, extract, extract.

This is the rhythm of refactoring when codebase has not yet found the right abstractions.

The result of this refactoring is understanding, not good code.

For that, we’re going to take one final pass at this. We will ask the question if this were two things, what would they be? and out of the ashes if this refactoring, a tiny, cohesive abstraction will appear.


module Stats
  class RunningTimeData
    include Rails.application.routes.url_helpers

    attr_reader :title, :dates, :key, :today

    def initialize(title, timestamps, key, now)
      @title = title
      @dates = timestamps.map(&:to_date)
      @key = key
      @today = now.to_date
    end

    def compute
      # FIXME: delete call from controllers
    end

    def y_legend
      I18n.t('stats.running_time_legend.actions')
    end

    def y2_legend
      I18n.t('stats.running_time_legend.percentage')
    end

    def x_legend
      I18n.t('stats.running_time_legend.weeks')
    end

    def values
      datapoints_per_week_in_chart.join(",")
    end

    def links
      url_labels.join(",")
    end

    def values_2
      cumulative_percentages.join(",")
    end

    def x_labels
      time_labels.join(",")
    end

    def y_max
      # add one to @max for people who have no actions completed yet.
      # OpenFlashChart cannot handle y_max=0
      1 + datapoints_per_week_in_chart.max + datapoints_per_week_in_chart.max/10
    end

    private

    def url_labels
      urls = Array.new(total_weeks_in_chart) {|i|
        url(i, key)
      }
      urls << url(total_weeks_in_chart, "#{key}_end")
    end

    def url(index, id)
      options = {
        :controller => 'stats',
        :action => 'show_selected_actions_from_chart',
        :index => index,
        :id=> id,
        :only_path => true
      }
      url_for(options)
    end

    def time_labels
      labels = Array.new(total_weeks_in_chart) {|i|
        "#{i}-#{i+1}"
      }
      labels[0] = "< 1"
      labels[total_weeks_in_chart] = "> #{total_weeks_in_chart}"
      labels
    end

    def total_weeks_in_chart
      [52, total_weeks_of_data].min
    end

    def total_weeks_of_data
      weeks_since(dates.last)
    end

    def weeks_since(date)
      (today - date).to_i / 7
    end

    def cumulative_percentages
      running_total = 0
      percentages_per_week.map {|count| running_total += count}
    end

    def percentages_per_week
      datapoints_per_week_in_chart.map(&percentage)
    end

    def percentage
      Proc.new {|count| (count * 100.0 / dates.count)}
    end

    def datapoints_per_week_in_chart
      frequencies = Array.new(total_weeks_in_chart) {|i|
        datapoints_per_week[i].to_i
      }
      frequencies << datapoints_per_week.inject(:+) - frequencies.inject(:+)
    end

    def datapoints_per_week
      frequencies = Array.new(total_weeks_of_data + 1, 0)
      dates.each {|date|
        frequencies[weeks_since(date)] += 1
      }
      frequencies
    end
  end
end
Katrina OwenKatrina Owen
View Author

Katrina is a developer working primarily in Ruby and Go, and formerly of Splice, Turing School of Software and Design and Bengler. She is the creator of exercism.io, an open source application that developers use to level up their programming skills. She is the co-author of 99 Bottles of OOP.

GlennG
Share this article
Read Next
Get the freshest news and resources for developers, designers and digital creators in your inbox each week