Learn With Me: Elixir - Binaries (#8)

We covered binaries briefly in the post discussing the data types in Elixir, but here's where we go into detail.

As previously mentioned, binaries store binary data either as a collection of bytes (binary) or as a collection of bits (bitstring). I suspect they are both pretty much the same thing under the surface, but a binary makes it simpler to deal with data that has the byte as the smallest element.

A binary literal can be represented with angle brackets <<>>, with each number inside giving the value of a single byte.

iex> <<1, 0, 4, 18>>
<<1, 0, 4, 18>>

So the binary literal <<1, 0, 4, 18>> represents 4 bytes with the values 0x01, 0x00, 0x04, and 0x12.

Since a byte can only store values from 0 to 255, byte values over 255 make no sense. I tried specifying a value over 255 anyway, and found that the value rolls over to 0 and counts up from there like an odometer. Elixir keeps only the lowest 8 bits of the value, which is the same as taking the value modulo 256.

iex> <<1, 2, 255, 256>>
<<1, 2, 255, 0>>
iex> <<1, 2, 255, 260>>
<<1, 2, 255, 4>>
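Here's a small sketch of that rollover rule. It is written as a standalone .exs script (run with `elixir file.exs`) rather than an IEx session, and the values are arbitrary picks.

```elixir
# Each value goes into a default 8-bit integer segment,
# so only its lowest 8 bits are kept.
values = [256, 260, 300, 511]

for value <- values do
  truncated = <<value>>
  # rem/2 gives the remainder after dividing by 256, i.e. the low 8 bits
  expected = <<rem(value, 256)>>
  IO.inspect({value, truncated == expected})
end
```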

Byte values can be specified using decimal, octal (0o), or hexadecimal (0x) notation. Here's an example.

iex> <<1, 0, 0o04, 0x12>>
<<1, 0, 4, 18>>

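IEx displays binary values using decimal byte values, no matter which notation was used to write them. The exception is a binary whose bytes form printable UTF-8 text, which IEx displays as a string instead.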
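A quick sketch that compares the notations directly, written as a standalone script. It also uses 0b, Elixir's base-2 integer notation, which isn't mentioned above, and shows the printable-text case (IO.inspect formats values the same way IEx does).

```elixir
decimal = <<1, 0, 4, 18>>
mixed = <<1, 0, 0o04, 0x12>>
# 0b is Elixir's binary (base 2) integer notation
with_binary = <<0b1, 0b0, 0b100, 0b10010>>

IO.inspect(decimal == mixed)        # true
IO.inspect(decimal == with_binary)  # true

# 72 and 105 are the UTF-8 (and ASCII) codes for "H" and "i",
# so inspect shows this binary as a string rather than as numbers
IO.inspect(<<72, 105>>)             # "Hi"
```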

Binaries can be concatenated using the <> operator.

iex> <<3, 12>> <> <<4, 8>>
<<3, 12, 4, 8>>
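A short sketch showing that <> works on any binaries, including strings, and that it can also appear in a match to split off a known prefix. The strings are arbitrary examples.

```elixir
bytes = <<3, 12>> <> <<4, 8>>
IO.inspect(bytes)       # <<3, 12, 4, 8>>

# Strings are binaries too, so they can be mixed with byte literals.
# 33 is the byte value of "!".
greeting = "Hi" <> <<33>>
IO.inspect(greeting)    # "Hi!"

# <> also works in a match, splitting off a known prefix
"Hi" <> rest = "Hi there"
IO.inspect(rest)        # " there"
```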

The byte_size function can be used to retrieve the number of bytes in a binary value.

iex> byte_size(<<1, 0, 4, 18>>)
4
iex> byte_size(<<1, 2, 128, 96, 255, 4>>)
6
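The related bit_size function counts bits instead of bytes. Here's a sketch comparing the two. Note that byte_size rounds up when the number of bits isn't a multiple of 8, which matters for the bitstrings covered next.

```elixir
data = <<1, 2, 128, 96, 255, 4>>
IO.inspect(byte_size(data))   # 6
IO.inspect(bit_size(data))    # 48, i.e. 6 bytes * 8 bits

# A 3-bit value still needs one whole byte to hold it
small = <<5::size(3)>>
IO.inspect(bit_size(small))   # 3
IO.inspect(byte_size(small))  # 1
```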

Variables can also be used to specify byte values in a binary literal.

iex> byte1 = 34
34
iex> byte2 = 154
154
iex> binary = <<byte1, byte2, 87>>
<<34, 154, 87>>
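Variables also work in the other direction: a binary literal on the left side of = pattern matches and pulls the byte values out. A small sketch using the same bytes as above:

```elixir
binary = <<34, 154, 87>>

# Match each byte of a 3-byte binary into its own variable
<<first, second, third>> = binary
IO.inspect({first, second, third})   # {34, 154, 87}

# Match the first byte and keep the remaining bytes as a binary
<<head, rest::binary>> = binary
IO.inspect(head)                     # 34
IO.inspect(rest)                     # <<154, 87>>
```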

Bitstrings

Bitstrings are very similar, except that they work with collections of bits instead of bytes. A bitstring literal looks almost the same. Whole bytes can be specified with decimal (or octal or hexadecimal) numbers, and sequences of bits can be specified using the ::size operator. So <<2::size(2)>> gives you the bitstring 10, and <<5::size(3)>> gives you the bitstring 101. A larger number can specify a longer sequence of bits: <<1138::size(11)>> is the 11-bit sequence 10001110010.

Let's look at what bitstring literals look like in IEx.

iex> <<2::size(2)>>
<<2::size(2)>>
iex> <<5::size(3)>>
<<5::size(3)>>
iex> <<1138::size(11)>>
<<142, 2::size(3)>>

Notice the last example, <<1138::size(11)>>. It is displayed by IEx as <<142, 2::size(3)>>. Those two values are the same, just displayed in different ways. 142 is the equivalent of 10001110, which is the first 8 bits, and 2::size(3) is the equivalent of 010, which is the 3 remaining bits.
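To double-check the arithmetic, here's a sketch that compares the two forms, counts the bits, and splits them back apart with a match. Integer.digits/2 is a standard Elixir function that lists a number's digits in a given base.

```elixir
bits = <<1138::size(11)>>

# The same 11 bits, written the way IEx displays them
IO.inspect(bits == <<142, 2::size(3)>>)   # true
IO.inspect(bit_size(bits))                 # 11

# The base-2 digits of 1138, most significant first
IO.inspect(Integer.digits(1138, 2))        # [1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0]

# Split the bits: the first 8 as one integer, the last 3 as another
<<high::size(8), low::size(3)>> = bits
IO.inspect({high, low})                    # {142, 2}
```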

So we can see that IEx will always attempt to display the byte values for every chunk of 8 bits and then display the remaining bits using the ::size notation.

The ::size option makes a big difference in what the bits look like. If you specify a larger size than the value needs, the leading bits are set to 0. So <<5::size(5)>> gives you the bitstring 00101, whereas <<5::size(3)>> gives you the bitstring 101.

If you specify a smaller size than the value needs, the most significant bits are chopped off. So <<5::size(2)>> gives you the bitstring 01, since the leading 1 is chopped off. IEx displays the result as <<1::size(2)>>, because that's the equivalent of 01.
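A sketch of both cases, comparing each result against an explicitly written bitstring and then reading the chopped value back out as an integer:

```elixir
# 5 needs 3 bits (101); a 5-bit segment pads with leading zeros: 00101
padded = <<5::size(5)>>
IO.inspect(padded == <<0::size(2), 5::size(3)>>)  # true

# A 2-bit segment keeps only the low 2 bits of 101, which is 01
chopped = <<5::size(2)>>
IO.inspect(chopped == <<1::size(2)>>)              # true

# Reading the 2 bits back as an unsigned integer gives 1
<<value::size(2)>> = chopped
IO.inspect(value)                                   # 1
```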

Binaries are a subtype of bitstrings, so any binary is also a bitstring. A bitstring is only considered a binary when its number of bits is divisible by 8. This is true even if we use the ::size option to specify it. So <<4::size(8)>> and <<4::size(16)>> are binaries as well as bitstrings, but <<4::size(7)>> is just a bitstring.

Elixir provides the is_binary and is_bitstring functions for checking which kind of value we have.

This bitstring is not a binary, because its number of bits (11) is not divisible by 8:

iex> is_binary(<<142, 2::size(3)>>)
false
iex> is_bitstring(<<142, 2::size(3)>>)
true

All binaries are also bitstrings.

iex> is_bitstring(<<142, 2>>)
true
iex> is_binary(<<142, 2>>)
true

Bitstrings whose number of bits is divisible by 8 are also binaries, but the ones whose number of bits isn't are just bitstrings.

iex> <<4::size(8)>>
<<4>>
iex> is_binary(<<4::size(8)>>)
true
iex> is_bitstring(<<4::size(8)>>)
true
iex> is_binary(<<4::size(16)>>)
true
iex> is_bitstring(<<4::size(16)>>)
true
iex> is_binary(<<4::size(7)>>)
false
iex> is_bitstring(<<4::size(7)>>)
true
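Put another way, is_binary is true exactly when bit_size is a multiple of 8. A quick sketch that checks this rule against the examples above:

```elixir
samples = [<<4::size(7)>>, <<4::size(8)>>, <<4::size(16)>>, <<142, 2::size(3)>>]

for bs <- samples do
  divisible_by_8 = rem(bit_size(bs), 8) == 0
  # Prints {number of bits, is_binary result, divisible-by-8 result}
  IO.inspect({bit_size(bs), is_binary(bs), divisible_by_8})
end
```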

Since a string is actually a binary, strings are both binaries and bitstrings.

iex> is_binary("This is a string")
true
iex> is_bitstring("This is a string")
true
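One consequence of strings being binaries is that byte_size counts UTF-8 bytes, not characters. A sketch, using the \u00e9 escape for "é" so the example doesn't depend on the source file's encoding; String.length counts characters (graphemes) instead:

```elixir
# "café" written with an escape for the accented e
word = "caf\u00e9"

IO.inspect(is_binary(word))      # true
IO.inspect(byte_size(word))      # 5, because "é" takes 2 bytes in UTF-8
IO.inspect(String.length(word))  # 4 characters
```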

Other data types are neither binaries nor bitstrings.

iex> is_binary(1.2)
false
iex> is_bitstring(1.2)
false
iex> is_binary(5)
false
iex> is_bitstring(5)
false
iex> is_binary(:atom)
false
iex> is_bitstring(:atom)
false
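These checks can also be used as guards. As a sketch, here's a hypothetical BitsDemo.classify helper (not part of Elixir, invented for illustration) that sorts a value into one of three groups:

```elixir
defmodule BitsDemo do
  # Hypothetical helper for illustration.
  # Clause order matters: every binary is also a bitstring,
  # so the more specific is_binary check must come first.
  def classify(value) when is_binary(value), do: :binary
  def classify(value) when is_bitstring(value), do: :bitstring
  def classify(_value), do: :other
end

Enum.each(["This is a string", <<4::size(7)>>, 1.2, 5, :atom], fn v ->
  IO.inspect({v, BitsDemo.classify(v)})
end)
```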

I generally consider binaries and bitstrings to be two aspects of the same concept: they're just different ways of expressing a blob of binary bits.

The ::size operator is called a segment option, and there are many more segment options that can be used to specify a blob of bits.

The ::float segment option encodes a number as a floating-point binary (an integer is converted to a float first). By default, the result is a 64-bit IEEE 754 double-precision value in big-endian byte order, so floats take up 8 bytes.

iex> <<4.3::float>>
<<64, 17, 51, 51, 51, 51, 51, 51>>
iex> <<3::float>>
<<64, 8, 0, 0, 0, 0, 0, 0>>
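Going the other way, matching with ::float decodes the bytes back into a number. This sketch also tries two other segment options, little (reverses the byte order) and size(32) (a single-precision float):

```elixir
encoded = <<4.3::float>>
IO.inspect(byte_size(encoded))    # 8 (64 bits by default)

# Matching with ::float decodes the 8 bytes back into the same float
<<decoded::float>> = encoded
IO.inspect(decoded)               # 4.3

# The default byte order is big-endian; ::little reverses the bytes
IO.inspect(<<3::float-little>>)   # <<0, 0, 0, 0, 0, 0, 8, 64>>

# A 32-bit (single-precision) float takes only 4 bytes
IO.inspect(byte_size(<<4.3::float-size(32)>>))   # 4
```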

The ::utf8 segment option encodes its value as UTF-8 bytes. Since an Elixir string is already a sequence of UTF-8 bytes, this doesn't do much for standard Elixir strings. It's more useful with an integer Unicode code point: <<233::utf8>> encodes é (code point 233) as the two bytes <<195, 169>>. A string is only accepted with ::utf8 when it's a literal; otherwise the option expects an integer code point.

iex> <<"Bob"::utf8>>
"Bob"

The ::utf16 segment option encodes its value as UTF-16 bytes, big-endian by default. The result contains zero bytes, so it isn't printable as UTF-8 text, and IEx displays it as a binary literal. As with ::utf8, a string is only accepted when it's a literal; otherwise the option expects an integer code point.

iex> <<"Bob"::utf16>>
<<0, 66, 0, 111, 0, 98>>
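In pattern matches, ::utf8 and ::utf16 read one whole encoded character and give back its integer code point. A sketch of both, plus converting whole binaries with Erlang's :unicode.characters_to_binary function (part of Erlang's standard library, callable from Elixir):

```elixir
# 233 is the Unicode code point for "é" (U+00E9); UTF-8 encodes it as 2 bytes
e_acute = <<233::utf8>>
IO.inspect(e_acute == <<195, 169>>)   # true
IO.inspect(e_acute)                   # "é"

# In a match, ::utf8 reads one whole UTF-8 character back as its code point
<<code_point::utf8, rest::binary>> = e_acute <> "t"
IO.inspect({code_point, rest})        # {233, "t"}

# ::utf16 works the same way on UTF-16 data (big-endian by default)
utf16 = <<"Bob"::utf16>>
<<first::utf16, _more::binary>> = utf16
IO.inspect(first)                     # 66, the code point for "B"

# :unicode converts whole binaries between encodings
IO.inspect(:unicode.characters_to_binary(utf16, {:utf16, :big}))
# "Bob" (back to UTF-8)
IO.inspect(:unicode.characters_to_binary("Bob", :utf8, {:utf16, :little}))
# <<66, 0, 111, 0, 98, 0>>
```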

There are many more of these segment options, documented in the section on the <<>> bitstring constructor in the Kernel.SpecialForms documentation. Between them, they cover most things you'd want converted to a binary.

We can also embed strings in binary literals, and the string bytes will be added to the binary.

iex> <<3, 5, "Bob", 2, 10>>
<<3, 5, 66, 111, 98, 2, 10>>
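Embedded strings work in pattern matches too, which is how you'd pull fields out of a binary. This sketch assumes a hypothetical layout invented for illustration: two single-byte fields, a 3-byte name, and then whatever is left.

```elixir
packet = <<3, 5, "Bob", 2, 10>>

# Hypothetical layout: two header bytes, a 3-byte name, then the rest
<<a, b, name::binary-size(3), rest::binary>> = packet
IO.inspect({a, b})    # {3, 5}
IO.inspect(name)      # "Bob"
IO.inspect(rest)      # <<2, 10>>
```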

It took me a little while to understand Elixir binaries, bitstrings, and how they are represented, and the act of writing this post helped me understand them better. Trying things out in IEx helped too. When getting familiar with Elixir, I suggest playing around in IEx to figure out how things work.

Most Elixir code will probably not have to deal with binaries directly. They'll be there in the sense that strings are also binaries, but you will likely never need to treat them as binaries. Only those of you who specifically process some sort of binary data (images, audio, video, etc.) in your Elixir code will end up delving into the details of binaries. That kind of work also brings in issues like endianness, which segment options such as big and little control, but I won't be diving that far in at this point.
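For a taste of what endianness looks like in practice, here's a sketch using the big and little segment options on a 16-bit integer:

```elixir
# The same 16-bit integer laid out in the two byte orders
IO.inspect(<<1::size(16)-big>>)      # <<0, 1>>
IO.inspect(<<1::size(16)-little>>)   # <<1, 0>>

# Reading the same two bytes with each byte order gives different numbers
<<as_big::size(16)-big>> = <<1, 0>>
<<as_little::size(16)-little>> = <<1, 0>>
IO.inspect({as_big, as_little})      # {256, 1}
```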

I probably won't be doing much manipulation of binaries either, but I went into detail here because I find binary data to be pretty interesting. It probably goes back to my earlier days of programming, when I was writing C++ code that dealt with image data, screen buffers, different endianness, and different pixel color formats on different hardware. It's kind of a nice luxury not to have to worry about hardware details (or manual memory management) anymore, but having learned that sort of stuff gives me a lot more understanding of what is going on under the covers in the software libraries I'm using.