Why Ruby hashes with default values are dangerous

Ruby hashes are simple yet powerful data structures that every Ruby developer uses about ten times a day. Setting default values with the Hash.new constructor feels intuitive and makes developers’ lives even easier. But overusing this language feature can lead to surprises down the line, and to lots of fun debugging.

My journey into Ruby hashes and their default values started with the following code (this is the simplified version):

values_by_type = Hash.new(){[]}
some_values_list.each do |value|
  type = 1
  # some logic is omitted
  unless values_by_type[type].include?(value)
    values_by_type[type] << value
    # put something to database
  end
end
# some usage of values_by_type hash
p values_by_type[1]

Of course, this piece of code isn’t great, but we’ll focus on resolving the main issues, not rewriting the whole thing. Our goal is to understand why it isn’t working properly.

This code saves everything in the database correctly, but the resulting hash is empty. An experienced (or just attentive) Ruby developer will quickly spot the issue: changing Hash.new(){[]} to Hash.new([]) makes the final lookup, values_by_type[1], return the collected values. But why?
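To make the behaviour easy to reproduce, here is a self-contained sketch of the same loop. The collect method and the database array are stand-ins I made up for the omitted logic and the real persistence layer; only the two Hash.new forms come from the code above.

```ruby
# Hypothetical stand-in for the original loop: `database` replaces the
# real persistence layer and is returned so it can be inspected.
def collect(values_by_type, values)
  database = []
  values.each do |value|
    type = 1
    unless values_by_type[type].include?(value)
      values_by_type[type] << value
      database << value
    end
  end
  database
end

# Block that returns a fresh array on every lookup and never stores it.
block_hash = Hash.new { [] }
saved = collect(block_hash, %w[a b c])
p saved           # => ["a", "b", "c"]
p block_hash[1]   # => []
p block_hash      # => {}

# One shared default array for every missing key.
shared_hash = Hash.new([])
collect(shared_hash, %w[a b c])
p shared_hash[1]  # => ["a", "b", "c"]
p shared_hash[2]  # => ["a", "b", "c"]
p shared_hash     # => {}
```

Note that even in the second variant the hash itself stays empty: the values live in the default object, not under key 1.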

We’ll dive deeper into the Hash.new method later; for now, let’s just look at a couple of examples that are easy to repeat in irb.

# Example 2. hash with array as a default value
> h2 = Hash.new([]) # => {}
> h2[1] # => []
> h2[2] # => []
> h2[1] << 'x' # => ["x"]
> h2[1] # => ["x"]
> h2 # => {}
# Example 3. hash with block provided
> h3 = Hash.new(){ [] } # => {}
> h3[1] # => []
> h3[1] << 'x' # => ["x"]
> h3[1] # => []
> h3 # => {}
> h3[1] = ['x'] # => ["x"]
> h3[1] # => ["x"]
> h3 # => {1=>["x"]}

It looks a bit confusing, but let’s try to understand the logic behind it.

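First, let’s explain why the third example can’t work (and doesn’t make sense): its block builds a new array on every lookup and never stores it. The correct usage of Hash.new with a block assigns the value to the key inside the block, as in Hash.new { |hash, key| hash[key] = [] }.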
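A minimal sketch of that block form in action. The name h4 is my own choice, continuing the numbering of the examples above.

```ruby
# The block receives the hash and the missing key, and stores the new
# array under that key before returning it.
h4 = Hash.new { |hash, key| hash[key] = [] }

h4[1] << 'x'
p h4[1]    # => ["x"]
p h4.keys  # => [1]

# Merely reading a missing key also creates an entry for it.
p h4[2]    # => []
p h4.keys  # => [1, 2]
```

The last two lines show a side effect worth keeping in mind: with this form, a plain read of an unknown key adds that key to the hash.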

This example will help us understand the issue.

Another hint: if you haven’t left irb yet, ask h2 for a key you have never touched, for example h2[3].
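A sketch of such a check, recreating h2 from Example 2 so that it runs on its own. Which calls to try is my own pick; any untouched key would do.

```ruby
h2 = Hash.new([])
h2[1] << 'x'

p h2[3]                  # => ["x"]
p h2.default             # => ["x"]
# Every missing key returns the very same object.
p h2[1].equal?(h2[2])    # => true
# fetch ignores the default, so it shows that key 1 was never stored.
p h2.fetch(1, :missing)  # => :missing
```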

Hmm, the pieces of the puzzle are starting to fit together.

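Inside MRI, a Ruby hash boils down to two parts that matter here: the ST table, which holds the key-value pairs that were actually stored, and the IFNONE slot, which holds the default.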

A default value passed to the hash constructor, via either argument or block, is saved in the IFNONE structure. The only difference between the default value and the default block is that the RHASH_PROC_DEFAULT flag is only set for the block.
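The single slot is visible from plain Ruby through Hash#default and Hash#default_proc. The sketch below only uses those public methods; the variable names are mine.

```ruby
with_object = Hash.new([])
with_block  = Hash.new { |hash, key| hash[key] = [] }

p with_object.default            # => []
p with_object.default_proc       # => nil
p with_block.default             # => nil
p with_block.default_proc.class  # => Proc

# Setting a default object replaces the default block: a hash has one
# or the other, never both.
reset = Hash.new { |hash, key| hash[key] = [] }
reset.default = 0
p reset.default_proc             # => nil
p reset[:anything]               # => 0
```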

When you’re trying to get the value from the hash, it invokes code that looks something like this (originally written in C, not Ruby):

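# Pseudocode; get and st_table are illustrative names, not MRI's.
def get(key)
  # An entry stored in the ST table always wins.
  return st_table[key] if st_table.key?(key)
  get_default(key)
end

def get_default(key)
  if ifnone && RHASH_PROC_DEFAULT
    ifnone.call(self, key) # default block: called on every miss
  else
    ifnone # default object: the same instance every time
  end
end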
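To play with this logic, here is a runnable toy model in plain Ruby. ToyHash, @table, @ifnone and @proc_default are names I invented to mirror the description above; this is a sketch of the idea, not how MRI is implemented.

```ruby
class ToyHash
  def initialize(default = nil, &block)
    @table = {}                      # stands in for the ST table
    @ifnone = block || default       # one slot for either kind of default
    @proc_default = !block.nil?      # stands in for RHASH_PROC_DEFAULT
  end

  def [](key)
    return @table[key] if @table.key?(key)
    get_default(key)
  end

  def []=(key, value)
    @table[key] = value
  end

  def keys
    @table.keys
  end

  private

  def get_default(key)
    if @ifnone && @proc_default
      @ifnone.call(self, key)
    else
      @ifnone
    end
  end
end

shared = ToyHash.new([])
shared[1] << 'x'      # mutates the object in @ifnone, not @table
p shared[2]           # => ["x"]
p shared.keys         # => []

assigning = ToyHash.new { |hash, key| hash[key] = [] }
assigning[1] << 'x'   # the block stored the array before << ran
p assigning[1]        # => ["x"]
p assigning.keys      # => [1]
```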

Returning to our examples: modifying a value in the hash with a default array, h2[1] << 'x', didn’t update any entry in the ST table; it mutated the default object shared by the whole hash. Asking the hash for any other key that is not present in the ST table returns the same default object, already modified.

In my opinion, this is exactly the point — the hash returns the default object, not the default value. And as we know, objects in Ruby are modifiable.

The only question left is why modifying a value in a hash with a default number, h1[1] += 1, didn’t change the result for h1[2]. I think the answer is pretty obvious, isn’t it?
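In case it isn’t obvious: h1[1] += 1 is shorthand for h1[1] = h1[1] + 1, so it reads the default, builds a new object and stores it under the key, leaving the default untouched. A short sketch, assuming h1 was created with 0 as its default; h5 is a name I added to show the same effect with an array.

```ruby
h1 = Hash.new(0)
h1[1] += 1        # same as: h1[1] = h1[1] + 1
p h1[1]           # => 1
p h1[2]           # => 0
p h1.keys         # => [1]
p h1.default      # => 0

# The same assignment form is safe even with an array default,
# because + returns a new array instead of mutating the shared one.
h5 = Hash.new([])
h5[1] += ['x']
p h5[1]           # => ["x"]
p h5[2]           # => []
p h5.keys         # => [1]
```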

Conclusion

I don’t expect this small article about the hashes’ inner structure to be extremely useful, but we can probably extract the following advice from it:

  1. Try to use the simplest possible values for hash defaults.
  2. Use the Hash.new { |hash, key| hash[key] = … } form. It’s the clearest and most customizable way to set a default value.
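Both pieces of advice can be sketched together. The grouping task and the variable names are made up for illustration, and freezing the default is an extra guard of my own, not something the advice above requires.

```ruby
# Advice 2: each key gets its own array, stored on first access.
words = %w[apple avocado banana]
by_letter = Hash.new { |hash, key| hash[key] = [] }
words.each { |word| by_letter[word[0]] << word }
p by_letter.keys  # => ["a", "b"]
p by_letter['a']  # => ["apple", "avocado"]
p by_letter['b']  # => ["banana"]

# Extra guard: a frozen default turns an accidental mutation of the
# shared object into an error instead of a silent bug.
guarded = Hash.new([].freeze)
mutation_blocked = false
begin
  guarded[:x] << 1
rescue RuntimeError  # FrozenError is a RuntimeError subclass on Ruby 2.5+
  mutation_blocked = true
end
p mutation_blocked  # => true
p guarded.default   # => []
```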

Of course, everything written here is covered in the Ruby documentation, but who really reads it attentively? And isn’t it much more fun to discover this stuff deep within Ruby internals?
