Photo by Daniele Levis Pelusi on Unsplash

Counting Sort WTF

Thu Mar 18 2021

In preparation for this weekend’s Kick Start Hackathon held by Google, I started going through last year’s questions. The very first one I came to gave me difficulty. Given a set budget amount and a list of house prices, calculate the maximum number of houses you can buy. The logic seemed simple enough, and I quickly had a solution worked out in Ruby.

The difficulty came with the speed requirement: no more than 15 seconds per test set. When you think in the realm of realism, that is plenty of time. Kick Start, however, never sticks to just "realistic" data sets. With a possible $100,000 to work with, house prices $1000 and below (wouldn’t that be nice!), and a possible 100,000 houses to choose from, the test sets get quite large. Large enough that the typical O(n log n) time complexity of Ruby’s built-in sort method times out and the answer fails.

After much Google-fu, I came across the counting sort. After much more, I found an explanation that didn’t require a Ph.D. in high-level mathematics to understand. Here is my attempt at re-explaining it in a much more vernacular style:

Let’s say you have an array: arr = [1, 5, 3, 5, 2, 1] that you need to sort. The first thing you need to do is count how many of each value you have, and match it to an index that corresponds to the value being counted.

We’ll start with an array with the same maximum index value as the maximum value found in arr, initialized to 0 for each value:

max_value = arr.max
counts = [0] * (max_value + 1)

Then we count each value in arr and set the value at the matching index in counts to that count:

arr.each {|val| counts[val] += 1}

counts now equals [0, 2, 1, 1, 0, 2]. This translates to: "The sorted array will have 0 zeroes, then 2 ones, then 1 two, then 1 three, then 0 fours, then 2 fives."

Now we use counts to re-factor counts on top of itself. The new version will hold values that match the number of values that precede that index value in the original arr. I’ll explain that better below:

num_items_before = 0
counts.each_with_index{|count, index|
  counts[index] = num_items_before
  num_items_before += count
}

This changes counts to [0, 2, 3, 4, 4, 6]. This translates to: "fill up the sorted array with zeroes until before index 0 (which doesn’t exist, so 0 places), then fill with ones until before index 2 (two places), then fill with twos until before index 3 (one place), then fill with threes until before index 4 (one place), then fill with fours until before index 4 (same as before, so 0 places), then fill with fives until before index 6 (the end of the array, so 2 places)". This seems strange, but these extra steps prevent having to re-iterate through arr multiple times, saving time in exchange for the extra space needed for counts. Also, re-factoring counts on top of itself saves some additional space needed for a third array.

Finally, we create the sorted array from the new counts and return it:

sorted_array = [0] * arr.length
arr.each{|val|
  sorted_array[counts[val]] = val
  counts[val] -= 1
}

The decrementing of counts[val] allows for multiples of each value.

This returns [1, 1, 2, 3, 5, 5]. The best part is that this whole thing runs in O(n) time complexity and O(n) space complexity.

Happy coding!