Learn With Me: Elixir - Pattern Matching (#22)

Pattern matching is a mechanism in Elixir that controls how data are bound to variables, and it can also provide a form of control flow. There are many places in Elixir where you can provide a pattern, and any data that matches that pattern is then handed to the code associated with it.

Pattern matching at its most basic resembles destructuring in JavaScript, but it is much more than that. Pattern matching is one of those fundamental concepts that Elixir is built around, and I haven't seen an equivalent to it in any other language I know. Combining Elixir function clauses with pattern matching makes for a powerful tool, although I'm only beginning to understand how to use it.

C# 7 has a simple version of destructuring (called "deconstruction") that superficially resembles Elixir pattern matching, but it only works on tuples and on types that implement a Deconstruct method. C# 7 also has a form of pattern matching that can be used in "if" and "switch" statements, but that's quite a bit different from the Elixir concept of pattern matching.

A lot of Elixir revolves around pattern matching, so it's critical to understand. Fortunately, although I had never seen pattern matching before, it didn't take a long time to figure out how it works. However, I suspect it will take a lot of practice before I fully master it.

It's difficult to convey pattern matching without showing any examples, so let's look into the simplest example of pattern matching: assignments.

Pattern Matching and Assignments

The = operator in Elixir doesn't assign in the same sense as in other languages; it's actually called the match operator. Instead of assigning, it pattern matches the right side of the = operator to the left side. This means that it attempts to match the data on the right side of the = operator to the pattern on the left side.

Let's look at a very simple example of an assignment.

iex> x = 4
4

In this assignment statement, Elixir attempts to match what's on the right side to the pattern on the left side. In this case, that's easy. The pattern is a single variable, so Elixir matches the integer 4 with the variable x: they correspond to each other. So the variable x is bound to the value 4.

This is pattern matching at its most basic.

The reverse doesn't work if the variable hasn't been bound yet. An unbound variable on the right side will result in an error.

iex> 4 = x
** (CompileError) iex:1: undefined function x/0

This is because Elixir uses the values of everything on the right side of the "=" operator and tries to match them with the pattern on the left side. The number 4 is a very simple pattern, but x has no value to match against because it was never bound.

In order to make this pattern match successful, we need to bind x to the value 4 beforehand. The variable x then exists and is bound to data. The pattern matches because the literal 4 matches a variable bound to the value 4.

iex> x = 4
4
iex> 4 = x
4

No binding occurs in this case because the variable is on the right side of the "=". When a variable is on the right-hand side, only its value is used. Variables on the left side, on the other hand, do not have their values examined: they are bound to whatever they are matched to. We can see this in another example.

iex> x = 4
4
iex> 4 = x
4
iex> 5 = x
** (MatchError) no match of right hand side value: 4

The statement 4 = x matches because x is 4, but nothing further happens. When we attempt to match using 5 = x, 5 does not match 4, so we get a match error. The variable x is not bound to 5 because it is on the right side of the operator.
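The same rules apply outside IEx. Here is a minimal sketch, meant to be run as an Elixir script, that rescues the MatchError raised by 5 = x to show that x keeps its value; the result variable is just an illustrative name.

```elixir
x = 4

# The literal 4 matches the value bound to x, so this succeeds and binds nothing.
4 = x

# 5 does not match 4, so this raises a MatchError; we rescue it to look at it.
result =
  try do
    5 = x
  rescue
    error in MatchError -> {:no_match, error.term}
  end

IO.inspect(result)

# x was never rebound, because it only ever appeared on the right side.
IO.inspect(x)
```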

That matching was really trivial, so let's go with a more involved pattern matching example.

iex> x
** (CompileError) iex:1: undefined function x/0

iex> {x, 5} = {10, 5}
{10, 5}
iex> x
10

When this code begins, the variable x does not exist (I restarted IEx just prior to doing this to ensure that it did not exist). Then we make an assignment in the form {x, 5} = {10, 5}. Elixir sees that the tuple {10, 5} matches the pattern {x, 5}: both sides are tuples of size 2, and their second elements are both 5.

Since x is on the left side of the assignment, it is bound to the matching data on the right side of the assignment, which is 10. So x is bound to 10 because they match.

Now watch what happens when the two tuples don't match. I'll restart IEx so that x is no longer bound to anything.

iex> x
** (CompileError) iex:1: undefined function x/0

iex> {x, 5} = {10, 2}
** (MatchError) no match of right hand side value: {10, 2}

iex> x
** (CompileError) iex:1: undefined function x/0

When we start off, x is unbound. We attempt to do an assignment, but there is no match. {x, 5} does not match {10, 2} because 2 is not equal to 5. So we get a MatchError instead. Afterward, x is still unbound: no binding took place.

It's a bit like algebra in a programming language. I'm hoping that pattern matching is starting to become clear to you now. It looks a lot like destructuring in JavaScript so far, but it goes beyond that, as we'll see when we use pattern matching more and more.

More Pattern Matching Examples

I'm going to toss out a few more pattern matching examples so that you can get a better sense of how it works.

We can use pattern matching to extract all the values in a tuple, which is the exact equivalent of destructuring in JavaScript.

iex> some_tuple = {5, 6}
{5, 6}
iex> {x, y} = some_tuple
{5, 6}
iex> x
5
iex> y
6

The pattern {x, y} matches the tuple on the right, which also has two values, so x is bound to 5 and y is bound to 6.
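Tuple destructuring is also the usual way to unpack a function that returns more than one value. As one small example, Enum.min_max/1 returns a {min, max} tuple, which a two-element tuple pattern splits apart:

```elixir
# Enum.min_max/1 returns a two-element tuple; the pattern binds each element.
{smallest, largest} = Enum.min_max([5, 6, 2, 9])

IO.inspect(smallest)
IO.inspect(largest)
```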

Here are some examples of failed matching.

iex> [x, y] = {5, 6}
** (MatchError) no match of right hand side value: {5, 6}

iex> [x, y] = [5, 6]
[5, 6]
iex> {x, y, z} = {5, 6}
** (MatchError) no match of right hand side value: {5, 6}

iex> {x, y, z} = {5, 6, 7}
{5, 6, 7}

In the first example, [x, y] fails to match {5, 6} because the pattern matches a list, but the data is in a tuple. They are different data types, so there is no match. When I change the data to be a list, then there is a pattern match.

In the third example, {x, y, z} fails to match {5, 6} because that pattern matches a tuple with three elements. The tuple we are matching it against has only two elements. However, that same pattern will match the tuple {5, 6, 7}, with x being bound to 5, y being bound to 6, and z being bound to 7.
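If you only want to check whether data fits a pattern without risking a MatchError, Elixir's match?/2 macro returns true or false instead. A small sketch reusing the patterns above (the keyword-list labels are only illustrative):

```elixir
# match?/2 answers "does this data fit this pattern?" without raising.
# A list pattern never matches a tuple, and tuple patterns must have the same size.
checks = [
  list_vs_tuple: match?([_, _], {5, 6}),
  list_vs_list: match?([_, _], [5, 6]),
  three_vs_two: match?({_, _, _}, {5, 6}),
  three_vs_three: match?({_, _, _}, {5, 6, 7})
]

IO.inspect(checks)
```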

The Pin Operator

The pin operator ^ pulls out the value of a variable on the left side of the "=" operator so it can be used in pattern matching, and the pinned variable is not rebound. Remember that the values of variables on the left side of the "=" operator are usually ignored and those variables are rebound. The pin operator flips that around so that the variable on the left side is never rebound and its value takes part in the pattern matching.

Let's see an example without the pin operator.

iex> name = "Bob"
"Bob"
iex> {name, favorite_color} = {"Sam", "blue"}
{"Sam", "blue"}
iex> name
"Sam"
iex> favorite_color
"blue"

In this example name started off being bound to "Bob", but the pattern matching causes name to be bound to "Sam". The value "blue" is then bound to favorite_color. What if we don't want to rebind the variable name? Well, we can use the pin operator ^ to tell Elixir that we want to use the value of the variable for pattern matching, and that it shouldn't be rebound.

Here's the same example with the pin operator:

iex> name = "Bob"
"Bob"
iex> {^name, favorite_color} = {"Sam", "blue"}
** (MatchError) no match of right hand side value: {"Sam", "blue"}

This example fails to match because the value of name, "Bob", does not match "Sam" on the right. The pin operator ^ caused the pattern matching to examine the value of the name variable and use that in the pattern matching.

So {^name, favorite_color} = {"Sam", "blue"} is the equivalent of {"Bob", favorite_color} = {"Sam", "blue"}, which clearly does not match.

Here's an example that will match.

iex> name = "Sam"
"Sam"
iex> {^name, favorite_color} = {"Sam", "blue"}
{"Sam", "blue"}
iex> name
"Sam"
iex> favorite_color
"blue"

The names match and favorite_color will be bound to "blue". The name variable will remain untouched because of the pin operator.
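The pin operator works anywhere else a pattern appears, too. This sketch borrows a case expression, which tries each pattern in turn and runs the code for the first one that matches; the Favorites module and color_for/2 are hypothetical names for illustration.

```elixir
defmodule Favorites do
  # Hypothetical helper: returns the favorite color only when the record
  # belongs to `name`. The pinned ^name is compared, never rebound.
  def color_for(name, record) do
    case record do
      {^name, favorite_color} -> {:ok, favorite_color}
      _ -> :no_match
    end
  end
end

IO.inspect(Favorites.color_for("Sam", {"Sam", "blue"}))
IO.inspect(Favorites.color_for("Bob", {"Sam", "blue"}))
```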

Wildcards

An underscore _ in a pattern matches any value and is called the wildcard character. A value that matches _ is not bound to anything because there is no variable in that part of the pattern, which makes the wildcard useful whenever you don't care about the value in that part of the pattern.

Let's say I want the third item in a three-item tuple, but I'm not interested in the other two items. That's a great use case for wildcards.

iex> {_, _, third_item} = {10, 12, 20}
{10, 12, 20}
iex> third_item
20

The pattern represents a tuple of size 3 and it matches a tuple of size 3. However, only the third item in that tuple will be bound because that's the only part of the pattern with a variable. The other two items will match the wildcard characters, but won't be used for anything.
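Wildcards are especially handy when the pattern is the argument of an anonymous function, because each element passed in is matched against it. A minimal sketch with made-up data:

```elixir
readings = [{10, 12, 20}, {1, 2, 3}, {7, 8, 9}]

# Each tuple is matched against {_, _, third_item}; only the third element is bound.
third_items = Enum.map(readings, fn {_, _, third_item} -> third_item end)

IO.inspect(third_items)
```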

String Matching

We can do a very basic form of string matching by using the concatenation operator <> in a pattern.

iex> "Stuff: " <> stuff_contents = "Stuff: Ookaboo and the little dog too"
"Stuff: Ookaboo and the little dog too"
iex> stuff_contents
"Ookaboo and the little dog too"

I was only interested in the text that followed "Stuff: ", so I constructed a pattern that would extract that text and bind it to stuff_contents.

As far as I know, that's all there is to string matching. If you want anything more sophisticated, regular expressions would be a better fit, but regular expressions can't be used in Elixir patterns. You can only do regular expression matching via the functions in the Regex module.
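Here is a sketch contrasting the two approaches. StuffParser and its functions are hypothetical, and the regular expression is just one possible way to express the same prefix; note that it tolerates a missing space where the <> pattern does not.

```elixir
defmodule StuffParser do
  # Hypothetical helper: matches a fixed "Stuff: " prefix with <> in a pattern.
  def contents(line) do
    case line do
      "Stuff: " <> stuff_contents -> {:ok, stuff_contents}
      _ -> :error
    end
  end

  # For more flexible matching, Regex.run/3 is called as an ordinary function.
  def contents_with_regex(line) do
    case Regex.run(~r/^Stuff:\s*(.*)$/, line, capture: :all_but_first) do
      [stuff_contents] -> {:ok, stuff_contents}
      nil -> :error
    end
  end
end

IO.inspect(StuffParser.contents("Stuff: Ookaboo and the little dog too"))
IO.inspect(StuffParser.contents("Stuff:Ookaboo"))
IO.inspect(StuffParser.contents_with_regex("Stuff:Ookaboo"))
```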

Pattern Matching For Various Data Types

Let's continue to explore how pattern matching works for various data types.

Tuple Pattern Matching

iex> {_, 4, result} = {7, 4, :ok}
{7, 4, :ok}
iex> result
:ok
iex> {first, second, third} = {1, 2, 3}
{1, 2, 3}
iex> first
1
iex> second
2
iex> third
3

List Pattern Matching

iex> [first, second, first] = [10, 6, 10]
[10, 6, 10]
iex> [first, second, first] = [4, 6, 8]
** (MatchError) no match of right hand side value: [4, 6, 8]

iex> [_, second, _] = [8, 12, 10]
'\b\f\n'
iex> second
12
iex> ["peanut", second, "almond"] = ["peanut", "macadamia", "almond"]
["peanut", "macadamia", "almond"]
iex> second
"macadamia"

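I did something new in that first example. The pattern [first, second, first] means that the same value must be found at positions 1 and 3 in the list. In the first example, that's the value 10. In the second example, I used the same pattern, but the values at positions 1 and 3 were different, so there was no match. (The odd-looking '\b\f\n' in the third example is still the list [8, 12, 10]: IEx displays a list of integers as a charlist when every integer is the code of a printable ASCII character or an escape such as \n.)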
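The repeated-variable rule also works inside a case expression, which turns it into a check rather than an assertion that can fail. A sketch with a hypothetical Ends module:

```elixir
defmodule Ends do
  # Hypothetical helper: a variable repeated in a pattern must match the same
  # value in both positions, so [first, _, first] only fits lists with equal ends.
  def same_ends?(list) do
    case list do
      [first, _, first] -> true
      _ -> false
    end
  end
end

IO.inspect(Ends.same_ends?([10, 6, 10]))
IO.inspect(Ends.same_ends?([4, 6, 8]))
```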

Since lists have heads and tails, you can do head and tail matching. This is a very important part of list matching.

iex> [head | tail] = [4, 5, 6]
[4, 5, 6]
iex> head
4
iex> tail
[5, 6]

In this example, the element 4 matches head, and the rest of the list, [5, 6], matches tail.
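Head and tail matching is what makes recursive list processing so natural. A minimal sketch, assuming multiple function clauses (Elixir picks the first clause whose pattern matches the argument); MyList.sum/1 is a hypothetical name, and in real code Enum.sum/1 already does this job.

```elixir
defmodule MyList do
  # Hypothetical example: an empty list sums to 0; otherwise split the list
  # into head and tail and add the head to the sum of the tail.
  def sum([]), do: 0
  def sum([head | tail]), do: head + sum(tail)
end

IO.inspect(MyList.sum([4, 5, 6]))
```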

List patterns will also allow us to retrieve the first N elements of the list and then put the rest in the tail, like so:

iex> [_, second | rest_of_list] = [4, 5, 6]
[4, 5, 6]
iex> second
5
iex> rest_of_list
[6]

The first element of the list matches the wildcard character, the second element matches the second variable, and the rest of the list (the tail) matches the rest_of_list variable.

Map Pattern Matching

iex> %{first: first, second: second} = %{first: 4, second: 9}
%{first: 4, second: 9}
iex> first
4
iex> second
9
iex> %{first: first, second: 9} = %{first: 4, second: 9}
%{first: 4, second: 9}
iex> %{first: first, second: second} = %{first: 5}
** (MatchError) no match of right hand side value: %{first: 5}

iex> %{first: first} = %{first: 4, second: 9}
%{first: 4, second: 9}

If we specify a pattern that exactly matches all the keys in the map, we can bind some or all of the values to variables. If we have a pattern with a key that isn't in the map, there will be no match.

We don't have to specify all the keys in a map pattern. We can also specify a subset of those keys. We can specify a pattern that only has one key (the last example) and it will match any map that has that key, even if it also has other keys. We can also do the same thing with two keys. This is great if the map you're matching against has a lot of keys.
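Because a map pattern only needs the keys it mentions, a small pattern can pull one field out of maps that carry different extra keys. A sketch with made-up data:

```elixir
people = [
  %{name: "Bob", age: 32, city: "Denver"},
  %{name: "Sam", age: 27}
]

# %{name: name} matches any map with a :name key, whatever other keys it has.
names = Enum.map(people, fn %{name: name} -> name end)

IO.inspect(names)
```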

We can use an empty map pattern to match any map.

iex> %{} = %{first: 6, second: 9, third: 12}
%{first: 6, second: 9, third: 12}

We can use underscores as values to just match keys.

iex> %{name: _, age: _} = %{name: "Bob", age: 32}
%{age: 32, name: "Bob"}

The pattern in the example matches any map that has the keys :name and :age regardless of what the values are.

Although values can be bound during pattern matching, keys cannot.

iex> %{key: "Bob"} = %{name: "Bob"}
** (MatchError) no match of right hand side value: %{name: "Bob"}

This fails because the pattern will only match a map with a key of :key. It will not cause the :name key to be bound to a key variable. That cannot be done. You can enumerate the key-value pairs to find the key whose value is "Bob", but you cannot do that with pattern matching. Maps are dictionaries and dictionaries involve key to value mapping, not value to key mapping.

However, we can use the pin operator in a map pattern so that the key comes from a variable.

iex> key = :name
:name
iex> %{^key => "Bob"} = %{name: "Bob"}
%{name: "Bob"}

We have to use arrow syntax in this case, but it does allow us to pattern match a key that isn't known until runtime.
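Pinning works for values as well as keys, so both can come from variables. A sketch with a hypothetical has_entry?/3 helper:

```elixir
defmodule Lookup do
  # Hypothetical helper: does `map` hold `value` under `key`?
  # Both are pinned, so their current values are used in the pattern.
  def has_entry?(map, key, value) do
    case map do
      %{^key => ^value} -> true
      _ -> false
    end
  end
end

IO.inspect(Lookup.has_entry?(%{name: "Bob"}, :name, "Bob"))
IO.inspect(Lookup.has_entry?(%{name: "Bob"}, :name, "Sam"))
IO.inspect(Lookup.has_entry?(%{name: "Bob"}, :age, "Bob"))
```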

In most of my examples, the right side is a literal, but production code will often have variables on the right side of an assignment.

iex> value_map = %{first: 6, second: 9, third: 12}
%{first: 6, second: 9, third: 12}
iex> %{third: number} = value_map
%{first: 6, second: 9, third: 12}
iex> number
12

You may have noticed that a lot of the pattern matching statements I've shown so far don't actually bind anything, and as a result, are pretty useless. The patterns match, but nothing else happens. This is true with regard to the assignment operator, but pattern matching can be used for flow control as well. We'll get into that in a future post where I'll go over how pattern matching is used with functions.

Combining the Patterns

We can also combine all these data type patterns to get much more complex matching.

iex> user_name = "Bob"
"Bob"
iex> %{name: ^user_name, furniture: [_, "chair" | _], favorite_colors: {"red", _, _}} = %{name: "Bob", furniture: ["table", "chair", "bed", "couch"], favorite_colors: {"red", "blue", "yellow"}}
%{
  favorite_colors: {"red", "blue", "yellow"},
  furniture: ["table", "chair", "bed", "couch"],
  name: "Bob"
}
iex> user_name
"Bob"

This example matches a map with a user name of "Bob" (the pin operator ensures that the value of the "user_name" variable is used and that "user_name" is not rebound), whose "furniture" list has "chair" as its second item, and whose "favorite_colors" tuple has exactly three items, the first of which is "red". It doesn't matter exactly how many items the "furniture" list has, because the pattern uses the pipe character: the list just has to have at least two items.
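The same combined pattern can be wrapped in a reusable check. This is only a sketch: Furniture.matches?/2 is a hypothetical helper, and the second call shows that a one-item furniture list fails the "at least two items" requirement, while a two-item list passes because the tail can be empty.

```elixir
defmodule Furniture do
  # Hypothetical helper reusing the combined pattern: the pinned name,
  # "chair" as the second list item, and a 3-tuple starting with "red".
  def matches?(user, user_name) do
    case user do
      %{name: ^user_name, furniture: [_, "chair" | _], favorite_colors: {"red", _, _}} -> true
      _ -> false
    end
  end
end

bob = %{name: "Bob", furniture: ["table", "chair"], favorite_colors: {"red", "blue", "yellow"}}

IO.inspect(Furniture.matches?(bob, "Bob"))
IO.inspect(Furniture.matches?(%{bob | furniture: ["chair"]}, "Bob"))
```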

This is a particularly complex example of pattern matching, so take your time to look at it and understand it. Most code will probably not be this complex, but you can create very large patterns if you choose to.

Using Pattern Matching

Pattern matching is commonly used to pull information out of tuples returned from functions. It's common in Elixir, for example, to return a tuple whose first element is an atom that indicates what kind of result it is (:ok or :error) and the second element is the data being returned.

For example, {:ok, contents} = File.read("data.txt") will pattern match if the file read operation succeeded. If it fails, File.read/1 returns {:error, reason} instead, the match fails, and we get a MatchError.
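A runnable sketch of both outcomes. The file names are made up, and the script writes a scratch file into the system temp directory so that the successful read has something to read:

```elixir
# Write a scratch file so the example has something to read.
path = Path.join(System.tmp_dir!(), "pattern_matching_demo.txt")
File.write!(path, "hello")

# On success, File.read/1 returns {:ok, contents}, so this match binds contents.
{:ok, contents} = File.read(path)
IO.inspect(contents)

# A missing file gives {:error, reason} instead; :enoent means "no such file".
missing = Path.join(System.tmp_dir!(), "does_not_exist_demo.txt")
{:error, reason} = File.read(missing)
IO.inspect(reason)

# Clean up the scratch file.
File.rm!(path)
```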

It wasn't clear to me what happens after the failed match. Where does the error condition get handled? So I had to track down some example code. I found some in the Elixir IO module documentation. I love that Elixir documentation!

It turns out that a "case" statement is used. I haven't learned about the case statement yet at this point, but it's obvious from the example how it works.

case File.read("data.txt") do
  {:ok, body} ->
    # do something with the `body`
    IO.puts(body)
  {:error, reason} ->
    # handle the error caused by `reason`
    IO.puts("Could not read file: #{inspect(reason)}")
end

I'm fairly certain that the case statement does pattern matching and runs some code depending on the pattern. In this example there is a success and a failure pattern and a place to put the code that handles each situation.

While reading the documentation, I also became aware of another version of File.read/1 called "File.read!/1". As you may recall from when I talked about function naming conventions, the "!" in a function's name means that it raises an exception when it fails.

So "File.read!/1" will raise an exception if there's an error. I know absolutely nothing about Elixir exceptions other than that they exist, but I'm going to assume that they work similarly to exceptions in every other language I'm familiar with. In that case, the exception would be caught and handled at a higher level.

It's quite nice that we can choose between functions that return a result when there's an error and functions that raise an exception when there's an error. I'm sure that both versions can be useful.
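A minimal sketch of the raising version, assuming a made-up file name that doesn't exist. File.read!/1 raises a File.Error, and its reason field holds the same atom that the tuple-returning version would have put in {:error, reason}:

```elixir
missing = Path.join(System.tmp_dir!(), "does_not_exist_demo.txt")

# File.read!/1 returns the contents directly, or raises File.Error on failure.
outcome =
  try do
    File.read!(missing)
  rescue
    error in File.Error -> {:rescued, error.reason}
  end

IO.inspect(outcome)
```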

But Wait! There's More!

Yes, that's right. We've just covered pattern matching and how it applies to assignments. Pattern matching is much more than destructuring on steroids. Pattern matching is also used for other things, such as determining which function clause should be called. I'll cover how pattern matching applies to functions in the next post. Pattern matching is one of the core concepts in Elixir, and Elixir would be a completely different language without it.