Learn With Me: Elixir - Character lists (#15)
There's another string-like construct in Elixir called a character list. It's not really a string, but a standard Elixir list that contains numbers representing characters.
A character list literal uses single quotes instead of double quotes.
iex> string = "Speedy Taco"
"Speedy Taco"
iex> char_list = 'Speedy Taco'
'Speedy Taco'
Whereas strings are actually UTF-8-encoded binaries, character lists are lists of numbers, where each number represents a Unicode code point. Elixir will automatically convert the characters in a character list literal into the equivalent numbers. Likewise, when Elixir sees a list of numbers, and all those numbers represent printable characters, it will display the list as a character list literal instead.
For example, the character list 'Bob' is actually [66, 111, 98]. IEx just chooses to display that list of numbers as a character list literal instead, because all those numbers correspond to printable characters.
iex> [66, 111, 98]
'Bob'
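To convince myself that a character list really is just a list, here is a small sketch that compares the literal with the plain list of integers. The variable names are my own, and I've used the ~c"..." sigil form of the literal because it should behave the same as 'Bob' across Elixir versions.

```elixir
# A character list literal and a plain list of integers are the same value.
char_list = ~c"Bob"        # equivalent to the single-quoted literal 'Bob'
numbers = [66, 111, 98]

IO.inspect(char_list == numbers, label: "same value")      # same value: true
IO.inspect(is_list(char_list), label: "is a list")         # is a list: true
IO.inspect(is_binary(char_list), label: "is a binary")     # is a binary: false
IO.inspect(is_binary("Bob"), label: "string is a binary")  # string is a binary: true
```

The last two lines show the real difference: the double-quoted string is a binary, while the character list is not.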
If we take an existing character list and concatenate a [0] to it, we'll see the true values of the character list. This works because IEx will only display a list of numbers as characters if every number in the list corresponds to a printable character. A 0 does not correspond to a printable character, so IEx will display the list as a list of numbers instead.
iex> char_list = 'Speedy Taco'
'Speedy Taco'
iex> char_list ++ [0]
[83, 112, 101, 101, 100, 121, 32, 84, 97, 99, 111, 0]
As we can see, 'Speedy Taco' is actually stored as [83, 112, 101, 101, 100, 121, 32, 84, 97, 99, 111], with each number corresponding to a character. It's just a list of integers, but IEx displays it as a character list literal because all the numbers correspond to printable characters.
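Appending a [0] works, but it changes the list. A gentler way to see the numbers, as far as I can tell, is the charlists: :as_lists option that inspect/2 and IO.inspect/2 accept. This is only a sketch; the variable names are mine.

```elixir
char_list = ~c"Speedy Taco"

# Ask inspect/2 to render the list as numbers, without modifying the list.
as_numbers = inspect(char_list, charlists: :as_lists)
IO.puts(as_numbers)
# prints: [83, 112, 101, 101, 100, 121, 32, 84, 97, 99, 111]

# IO.inspect/2 takes the same option and returns the original list unchanged.
same_list = IO.inspect(char_list, charlists: :as_lists)
```

I believe the same option can be set for a whole IEx session with IEx.configure(inspect: [charlists: :as_lists]), though I haven't leaned on that much.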
This is why you will sometimes be working with a list of numbers and IEx will display the list as characters instead. It just happens that those numbers all correspond to printable characters. This confused me immensely when I first encountered it. A list of numbers I had created appeared in IEx as characters, and I thought I had done something wrong. It wasn't until I investigated why this happened that I came across the concept of character lists.
This behavior seems to be limited to the basic ASCII character set (Basic Latin in Unicode), since if I add a number that's a Unicode code point, but outside the Basic Latin code points, the list is once again displayed as a list of numbers.
For example, [66, 111, 345, 98] is displayed by the interpreter as a list of numbers. The code point 345 corresponds to the "ř" character in Unicode, but that's outside the Basic Latin group. I guess the interpreter is just trying to be helpful when displaying lists containing Basic Latin code points. It's a little annoying when you are working with lists of numbers, though.
iex> [66, 111, 98]
'Bob'
iex> [66, 111, 345, 98]
[66, 111, 345, 98]
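The "printable ASCII only" rule can be checked directly. My understanding is that List.ascii_printable?/1 is the same kind of check the inspection code relies on, but treat that as an assumption on my part; the sketch below only shows that the function agrees with what IEx displayed above.

```elixir
bob = [66, 111, 98]
with_zero = bob ++ [0]
with_r_caron = [66, 111, 345, 98]

# Only the list made up entirely of printable ASCII code points passes.
bob_printable = List.ascii_printable?(bob)
zero_printable = List.ascii_printable?(with_zero)
caron_printable = List.ascii_printable?(with_r_caron)

IO.inspect(bob_printable, label: "bob")                  # bob: true
IO.inspect(zero_printable, label: "with 0")              # with 0: false
IO.inspect(caron_printable, label: "with code point 345") # with code point 345: false
```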
Like me, you are probably wondering what character lists are used for and why strings aren't good enough. You'd be right to wonder about that. I certainly did when I was learning about character lists. To me, they appeared to be a less-awesome version of strings.
It turns out that pretty much the only reason Elixir has character lists is to interact with Erlang. Erlang stores strings as character lists, and since Elixir can interact with Erlang libraries, you need to be able to easily pass strings to Erlang functions and receive strings from Erlang functions. I'm sure there are a few other uses of character lists, like iterating over code points in a string, but interacting with Erlang code is the reason character lists exist.
I haven't seen any code that interacts with Erlang yet, but from what I've read so far, the typical workflow seems to be to work with Elixir strings and then convert the string to a character list before passing it to an Erlang function. Then, if any character list is returned, you convert it right back to an Elixir string.
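Here is a rough sketch of that round trip, using two Erlang built-ins I'm confident about: :erlang.list_to_integer/1, which expects a character list, and :erlang.integer_to_list/2, which returns one. The conversion functions it relies on, String.to_charlist/1 and List.to_string/1, are covered next. The values are just examples I picked.

```elixir
# Elixir string -> character list -> Erlang function
number =
  "1234"
  |> String.to_charlist()
  |> :erlang.list_to_integer()

IO.inspect(number)  # 1234

# Erlang function -> character list -> Elixir string
hex =
  255
  |> :erlang.integer_to_list(16)
  |> List.to_string()

IO.inspect(hex)  # "FF"
```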
There are of course functions to convert between character lists and strings.
iex> String.to_charlist("Hello World")
'Hello World'
iex> List.to_string('Hello World')
"Hello World"
iex> List.to_string([3, 353, 1024])
<<3, 197, 161, 208, 128>>
It looks like List.to_string/1 converts any list of code points into the UTF-8 representation, but either IEx or my console is unable to display this particular binary as a string, probably because the 3 represents an unprintable character. The code point 353 is encoded as two bytes (197, 161) in UTF-8 and the code point 1024 is encoded as (208, 128).
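A quick way to test that guess is String.printable?/1, along with the ::utf8 binary modifier to encode each code point on its own. This is a sketch with my own variable names, not anything from the original session.

```elixir
raw = List.to_string([3, 353, 1024])

# The result is still a valid UTF-8 binary; it just isn't printable.
IO.inspect(raw == <<3, 197, 161, 208, 128>>, label: "expected bytes")  # expected bytes: true
IO.inspect(String.valid?(raw), label: "valid UTF-8")                   # valid UTF-8: true
IO.inspect(String.printable?(raw), label: "printable")                 # printable: false

# Each code point encoded individually matches the bytes shown above.
s_caron = <<353::utf8>>   # two bytes: 197, 161
ie_grave = <<1024::utf8>> # two bytes: 208, 128

# Converting back gives the original code points.
round_trip = String.to_charlist(raw)
IO.inspect(round_trip, label: "round trip")  # round trip: [3, 353, 1024]
```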
In fact, if we leave off the 3, we do get a printable string when we do the conversion.
iex> List.to_string([353, 1024])
"šЀ"
Character list literals can also be specified by using a sigil, which begins with a tilde (~) character. So 'Hello World' can be written as ~c(Hello World), where ~c() indicates a character list. I'm not sure why a sigil would be used instead of '' notation, but it is available. Perhaps it makes it easier to put single quote characters in the character list.
iex> char_list = ~c(Hello World)
'Hello World'
iex> char_list = ~c(A 'character list' with 'single quotes')
'A \'character list\' with \'single quotes\''
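One thing worth knowing: as far as I'm aware, newer Elixir releases (1.15 onward) print character lists using the sigil form, like ~c"Hello World", so the output above may look different depending on your version. The sketch below shows that the sigil accepts different delimiters and supports interpolation, just like a double-quoted string. The name variable is only there for illustration.

```elixir
# Different delimiters produce the same character list.
parens = ~c(Hello World)
quotes = ~c"Hello World"
IO.inspect(parens == quotes, label: "same list")  # same list: true

# Single quotes need no escaping inside the sigil.
with_quotes = ~c(A 'character list' with 'single quotes')
as_string = List.to_string(with_quotes)
IO.puts(as_string)  # prints: A 'character list' with 'single quotes'

# The lowercase ~c sigil supports interpolation.
name = "Taco"
interpolated = ~c"Speedy #{name}"
IO.inspect(interpolated == ~c"Speedy Taco", label: "interpolated")  # interpolated: true
```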
Elixir allows you to retrieve a character's code point value by using the ? before a character, like so: ?c gives you the code point number that corresponds to the letter "c", and ?T results in the code point value for "T".
iex> ?c
99
iex> ?T
84
iex> [?T, ?c]
'Tc'
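Where ? seems most useful is in comparing or transforming the numbers in a character list without memorizing code point values. Here is a minimal sketch, assuming plain ASCII input; the upcase function is a hypothetical helper I wrote for the example, not something from the standard library.

```elixir
# Upcase ASCII letters by shifting code points; everything else is left alone.
upcase = fn char_list ->
  Enum.map(char_list, fn
    c when c in ?a..?z -> c - ?a + ?A
    c -> c
  end)
end

shouted = upcase.(~c"speedy taco")
IO.inspect(shouted == ~c"SPEEDY TACO", label: "upcased")  # upcased: true

# Keep only the digit characters from a character list.
digits = Enum.filter(~c"a1b22c", &(&1 in ?0..?9))
IO.inspect(List.to_string(digits), label: "digits")  # digits: "122"
```

For real code, String.upcase/1 on a proper string is the better choice; this only illustrates that the ? values are ordinary integers.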
From what I've seen of character lists so far, I imagine that I will rarely use them in practice, and pretty much only when making calls to Erlang libraries.