Learn With Me: Elixir - File I/O (#74)

I'm preparing myself to write a project that involves file I/O, which I've previously written in a couple other languages. I'll talk more about it in an upcoming section, but for now, I just need to learn how file I/O works in Elixir.

Although I'm going to be primarily focusing on file I/O, much of what I'll be learning here can be applied to other types of I/O in Elixir.

Creating Example Files

Before I get into examples, I'm going to create some files to use in the examples. The first file, "text.txt", is an ASCII/UTF-8 encoded text file with the following contents.

This is an example
of a multi-line
text file.

The second file, "blob.bin", is a binary file with the following contents. I'm showing the content using hexadecimal numbers with spaces in between them to improve readability.

24 12 04 A2 B8 45

Yep, it's only a 6-byte file, but that's a good size for demonstration purposes.

By the way, I don't have a hex editor installed at the moment, so I used HexEdit.it, an online hex editor, to create the binary file. It's a good online hex editor that I didn't know existed until I went looking for one.

The third file, "food.txt", is a text file with a list of foods.

broccoli
cabbage
curry
bibimbap
tacos

Each food is followed by a newline, so there is an empty line at the bottom of the file.

You can find these files in the "lwm 74 - File IO" folder in the code examples in the Learn With Me: Elixir repository on GitHub.

Reading and Writing an Entire File at Once

First of all, I'm going to cover the simplest way of reading and writing files: reading and writing the entire thing in one operation. This may not suffice for large amounts of data or more complex operations, but sometimes it's good enough.

Functions that do a complete file read and write operation are found in the File module, which contains lots of useful functions for working with files. We'll cover the File module in depth sometime in the future, but for now we'll just focus on the functions that are useful for the current topic.

File.read/1 and its close cousin, File.read!/1, read the entire contents of a file and return it as a binary. This will just be a bunch of bytes. If the text in the file is UTF-8 text (which most text is), then this binary will also be a valid string.

As with similar pairs of functions, the difference between File.read/1 and File.read!/1 is in how they report failure. The first version (without the "!") returns a tuple whose first element is a status code (:ok or :error) and whose second element is either the file contents or the error information. The second version (with the "!") returns the contents directly and raises an error if something goes wrong. If you expect the file to be readable, and your code can't continue without it, use the version with the "!" character. If the file may not be readable (probably because it doesn't exist) and your code is set up to handle that situation, use the version without the "!" character.

Let's try out File.read/1 and File.read!/1 on the example files.

iex> File.read("blob.bin")
{:ok, <<36, 18, 4, 162, 184, 69>>}
iex> File.read!("blob.bin")
<<36, 18, 4, 162, 184, 69>>
iex> File.read("text.txt")
{:ok, "This is an example\r\nof a multi-line\r\ntext file."}
iex> File.read!("text.txt")
"This is an example\r\nof a multi-line\r\ntext file."

I tried reading each file with both File.read/1 and File.read!/1. The first function returns a tuple with a status code and a binary, whereas the second function just returns the binary. The contents of the binary file were read correctly, although the bytes are displayed in iex in decimal notation rather than hexadecimal notation. When we read the text file, we also get a binary, but since that binary is a valid UTF-8 string, it is displayed as a string. Well, that was easy!

Now let's see what happens when I attempt to use File.read/1 and File.read!/1 to read a non-existent file.

iex> File.read("nonexistent.txt")
{:error, :enoent}
iex> File.read!("nonexistent.txt")
** (File.Error) could not read file "nonexistent.txt": no such file or directory
    (elixir) lib/file.ex:353: File.read!/1

With File.read/1, we get a tuple containing an :error atom and a reason (:enoent, meaning the file does not exist), and with File.read!/1, an error is raised.
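To make the difference concrete, here is a minimal sketch of how the tuple from File.read/1 might be handled with a case expression. The ReadExample module and its describe/1 function are names I made up for illustration. :file.format_error/1 is an Erlang function that turns an error atom such as :enoent into readable text.

```elixir
defmodule ReadExample do
  # Hypothetical helper: returns a short description of what happened
  # when trying to read the file at `path`.
  def describe(path) do
    case File.read(path) do
      {:ok, contents} ->
        "read #{byte_size(contents)} bytes"

      {:error, reason} ->
        # :file.format_error/1 returns a charlist, so convert it to a string
        message = List.to_string(:file.format_error(reason))
        "could not read #{path}: #{message}"
    end
  end
end

# Prints a "could not read ..." message as long as this file does not exist
IO.puts(ReadExample.describe("nonexistent.txt"))
```

The version without the "!" fits here because the code has something sensible to do when the read fails.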

Now that we know how to read a file, let's write to a file. For writing, I'm going to write a binary to a file using File.write/3 and File.write!/3. These two functions are very similar to the read functions, but they take two extra parameters. The second parameter is the binary that will be written to the file, and the third parameter is an (optional) list of options you can specify, such as appending to the file instead of overwriting, specifying the content encoding, and much more (see the documentation for details).

I'm going to create some text and write that to a file.

iex> File.write("write_text.txt", "The dog ate my dinner", [:write, {:encoding, :utf8}])
:ok
iex> File.write("write_text.txt", "The dog ate my dinner")
:ok
iex> File.write!("write_text.txt", "The dog ate my dinner")
:ok

In the first call, I explicitly indicated that I wanted to write with UTF-8 encoding, and in the second call, I left the options out. The result seems to be the same, so I wonder if those are the default options. The documentation does not say anything about what the default options are.

Here are the contents of the file the function wrote, which are exactly what I was expecting.

> cat write_text.txt
The dog ate my dinner

I can use the :append option to add some text to the end of the file.

iex> File.write("write_text.txt", "\nThe cat ate my dinner", [:append])
:ok

Here are the file contents now.

> cat write_text.txt
The dog ate my dinner
The cat ate my dinner

The text was appended to the end of the file just like I was expecting.

Both versions of the write function (with and without the "!" character) return :ok when the write operation is successful. Let's see what happens when it fails.

I set the file to read-only by running chmod 444 write_text.txt. Let's see what happens now.

iex> File.write("write_text.txt", "The dog ate my dinner")
{:error, :eacces}
iex> File.write!("write_text.txt", "The dog ate my dinner")
** (File.Error) could not write to file "write_text.txt": permission denied
    (elixir) lib/file.ex:1014: File.write!/3

As expected, the non-error-throwing function returned an :error and an atom that indicates there are file access issues. The error-throwing function threw an error indicating that permission was denied.

Reading and Writing Files Using the IO Module

While it is convenient to read and write an entire file at once, many things we may do will require many read and write operations. We may only want to read pieces of a file, perform many different operations on a file over time, or deal with very large files whose contents we don't want to put in memory all at once.

In this model, a file is opened using File.open/2 (or one of its close cousins), which creates a file device. That file device can then be passed to the functions in the IO module to do reading or writing. This provides the ability to read or write pieces of the file rather than the entire thing.

The IO module can actually read from or write to any device, not just files. So familiarizing yourself with the IO module can be useful, since you can use the same functions to work with other devices, including standard in and standard out, which are the default devices. A "device" here is a virtual device in the operating system, which can be associated with an actual hardware device in your computer, but can also be a software channel managed by the operating system that moves data around. In this case, a file device is something that represents the file and can serve as an input or output channel.
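Here is a small sketch of that idea: the same IO.puts/2 call writing to three different devices. The DeviceExample module and greet/1 function are hypothetical names, and the file is written to the system temp directory so the sketch stands alone. StringIO is an in-memory device from Elixir's standard library, which I'm using here only to show that the IO functions don't care what is behind the device.

```elixir
defmodule DeviceExample do
  # Hypothetical helper: writes a greeting to whatever device it is given
  def greet(device) do
    IO.puts(device, "Hello from the IO module")
  end
end

# Standard out is the :stdio device
DeviceExample.greet(:stdio)

# A file device returned by File.open!/2
path = Path.join(System.tmp_dir!(), "lwm_device_example.txt")
file = File.open!(path, [:write, :utf8])
DeviceExample.greet(file)
File.close(file)

# An in-memory device from StringIO works the same way
{:ok, string_device} = StringIO.open("")
DeviceExample.greet(string_device)
{_input, output} = StringIO.contents(string_device)
StringIO.close(string_device)
```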


Digression Time - Standard In and Standard Out

For those of you who aren't familiar with standard in (standard input) and standard out (standard output), a process in an operating system can read from a standard input channel and write to a standard output channel. For processes being directly run by a user, standard input tends to be the keyboard and standard output tends to be the text output in a console, but depending on how a process is started, they could be other things such as files, I/O from other processes, network traffic, and so forth.

The operating system does the job of configuring standard in and out and the process should just read from standard in and write to standard out without worrying about the exact source of the input and destination of the output.


Elixir IO functions typically expect either chardata, which is text, or iodata, which is binary data. Both are Elixir typespecs. A chardata is a string or a (possibly nested) list of strings and characters, which includes character lists. An iodata is a binary or a (possibly nested) list of binaries and bytes.
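A quick sketch of what values of each type can look like. The variable names are just for illustration. IO.chardata_to_string/1 and IO.iodata_to_binary/1 flatten each kind of data into a single binary, which makes it easier to see what the nested lists represent.

```elixir
# chardata: strings, characters (code points), and nested lists of them
chardata = ["Hello", ?\s, ["wor", "ld"], ?!]
text = IO.chardata_to_string(chardata)

# iodata: binaries, bytes (0..255), and nested lists of them
iodata = [<<36, 18>>, 4, [162, <<184>>], 69]
bytes = IO.iodata_to_binary(iodata)

IO.inspect(text)
IO.inspect(bytes)
```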

Functions in the IO module that expect binary data should only be given binary data and functions that expect text data should only be given text data. If not, the function will fail with an error or it will succeed, but your data will get mangled.

You may be wondering about strings and the role they play. Strings are text data, but they are stored in binaries. So they can be passed as binary data, correct? Wrong. The functions that expect text data will be aware of the string's UTF-8 encoding and take that into consideration when performing I/O operations. The functions that work with binary data are ignorant of text encodings and will apparently mangle UTF-8 strings under certain circumstances. I haven't found any details of what those circumstances are (and I am curious about the details), but I will heed the warnings and not feed strings to binary I/O functions.

Note that this binary vs text I/O separation is typical in most of the programming languages I've used. Text just gets treated differently than blobs of bytes, and it typically has its own dedicated I/O functionality that's encoding-aware.

It's time to show some examples. First, I'm going to open a text file in read mode, read some things from it, and then close the file.

iex> file = File.open!("text.txt", [:read, :utf8])
#PID<0.116.0>
iex> IO.read(file, :line)
"This is an example\n"
iex> IO.read(file, 5)
"of a "
iex> IO.read(file, 5)
"multi"
iex> IO.read(file, 5)
"-line"
iex> IO.read(file, 5)
"\ntex"
iex> IO.read(file, :line)
"t file."
iex> IO.read(file, :line)
:eof
iex> File.close(file)
:ok

When I open the file, I pass it the :read option to indicate that we are opening the file in read mode (not write mode) and the :utf8 option to indicate that this is a UTF-8-encoded text file. That will ensure that we can read the text from the file. Notice that File.open!/2 returns a device, and iex prints the value as a PID (a process ID). Since Elixir runs in an environment that is used to handling large numbers of processes, and I know that Elixir code will often split up parts of applications into multiple processes, it looks to me like devices (such as files) are handled in a separate process, where each device gets its own process. I suspect that file I/O is actually a wrapper around some lower-level interprocess communication, but that's just a guess on my part. I believe this would allow the file I/O operations (where the Erlang VM communicates with the underlying operating system) to not affect the process where our main code is running.

Once I have the file device, I can pass it to IO.read/2 to read in some text. If the second parameter is :line, it will read in the entire line, including the newline character. If the second parameter is a number, it will read that many characters from the file. When the end of the file is encountered, an :eof is returned. Once we've read everything, we should close the file with File.close/1.
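Since it's easy to forget the File.close/1 call, here is a sketch of an alternative: File.open!/3 takes a function, passes the device to it, closes the file when the function returns, and hands back the function's result. The file name and contents are made up for this sketch, and the file is created in the system temp directory so the code can run on its own.

```elixir
path = Path.join(System.tmp_dir!(), "lwm_open_fun_example.txt")
File.write!(path, "first line\nsecond line\n")

# The file is closed automatically once the anonymous function returns
{first, second, third} =
  File.open!(path, [:read, :utf8], fn file ->
    {IO.read(file, :line), IO.read(file, :line), IO.read(file, :line)}
  end)

IO.inspect({first, second, third})
```

The third read hits the end of the file, so it comes back as :eof rather than a string.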

Now let's try writing text to a file.

iex> file = File.open!("write_text.txt", [:write, :utf8])
#PID<0.126.0>
iex> IO.write(file, "Cat")
:ok
iex> IO.write(file, "Dog")
:ok
iex> IO.write(file, "\n")
:ok
iex> IO.write(file, "Mouse\n")
:ok
iex> File.close(file)
:ok

I opened the file in write mode, wrote some text to the file using IO.write/2, and then closed the file.

Here's what the final file looks like.

> cat write_text.txt
CatDog
Mouse

That's just what I was expecting.

I have no idea if it's possible to open a file in both read and write modes, as is possible in C++. There doesn't appear to be any way (at least not that I've seen so far) to move a pointer around the file, allowing you to move backwards and forwards through the contents of a file, which is possible in C++ file I/O. It appears that with the Elixir File and IO functions you can only move forward through a file. It wouldn't surprise me if there was some sort of lower-level Erlang file API, but the Elixir libraries don't appear to have that functionality.
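On that last guess: Erlang's :file module does appear to offer this. The following is a minimal sketch, assuming that passing both :read and :write to File.open!/2 opens the file for both, and that :file.position/2 accepts the device that File.open!/2 returns. The file name is made up, and the file lives in the system temp directory.

```elixir
path = Path.join(System.tmp_dir!(), "lwm_seek_example.bin")
File.write!(path, <<36, 18, 4, 162, 184, 69>>)

# Passing both :read and :write opens the file for both without truncating it
file = File.open!(path, [:read, :write, :binary])

first = IO.binread(file, 2)

# Jump to an absolute byte offset, counted from the start of the file
{:ok, 4} = :file.position(file, 4)
last = IO.binread(file, 2)

# Jump back to the beginning and overwrite the first byte
{:ok, 0} = :file.position(file, :bof)
:ok = IO.binwrite(file, <<255>>)

File.close(file)

IO.inspect({first, last, File.read!(path)})
```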

Let's try reading and writing binary files.

With a binary file, we can open the file in binary mode and use the IO.binread/2 and IO.binwrite/2 functions to read and write binary data.

iex> read_file = File.open!("blob.bin", [:read, :binary])
#PID<0.181.0>
iex> write_file = File.open!("blob_copy.bin", [:write, :binary])
#PID<0.183.0>
iex> bytes = IO.binread(read_file, 2)
<<36, 18>>
iex> IO.binwrite(write_file, bytes)
:ok
iex> bytes = IO.binread(read_file, 2)
<<4, 162>>
iex> IO.binwrite(write_file, bytes)
:ok
iex> bytes = IO.binread(read_file, 2)
<<184, 69>>
iex> IO.binwrite(write_file, bytes)
:ok
iex> bytes = IO.binread(read_file, 2)
:eof
iex> File.close(read_file)
:ok
iex> File.close(write_file)
:ok

This code opens "blob.bin" for reading and "blob_copy.bin" for writing. Then it reads in the bytes two at a time from the read file and writes them to the write file until IO.binread/2 returns :eof. I'm sure this could be converted into a recursive function to make it generic.
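Here is one way that recursive version might look. This is a sketch, not the code from the repository: the BlobCopy module, its copy/3 function, and the default chunk size of 2 are all names and values I picked for illustration. It uses temp files so it can run by itself.

```elixir
defmodule BlobCopy do
  # Hypothetical helper: copies source_path to dest_path, chunk_size bytes at a time
  def copy(source_path, dest_path, chunk_size \\ 2) do
    source = File.open!(source_path, [:read, :binary])
    dest = File.open!(dest_path, [:write, :binary])

    result = copy_chunks(source, dest, chunk_size)

    File.close(source)
    File.close(dest)
    result
  end

  # Reads one chunk, writes it, and recurses until :eof is returned
  defp copy_chunks(source, dest, chunk_size) do
    case IO.binread(source, chunk_size) do
      :eof ->
        :ok

      {:error, reason} ->
        {:error, reason}

      bytes ->
        IO.binwrite(dest, bytes)
        copy_chunks(source, dest, chunk_size)
    end
  end
end

source_path = Path.join(System.tmp_dir!(), "lwm_blob_source.bin")
dest_path = Path.join(System.tmp_dir!(), "lwm_blob_copy.bin")
File.write!(source_path, <<36, 18, 4, 162, 184, 69>>)

copy_result = BlobCopy.copy(source_path, dest_path)
IO.inspect(copy_result)
```

The recursion stops on :eof, the same condition I was checking by hand in the iex session above.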

In the end, I was left with two binary files, which I compared using a hex editor. The contents of both were exactly the same.

I noticed from the documentation that IO.binread/2 also has a :line option, where it will read in bytes line by line. I have no idea how a "line" is defined for binary data. The documentation does not go into details. I'll have to investigate that in the future.

When I was first reading in bytes from the "blob.bin" file, I accidentally used IO.read/2 instead of IO.binread/2, and I noticed a difference. I was reading the bytes 0x04 0xA2, which should be read as <<4, 162>> (the decimal equivalent). That's how IO.binread/2 reads those bytes. The IO.read/2 function, however, gave me <<4, 194, 162>>. I was trying to figure out how it could have produced <<194, 162>> instead of <<162>>, and I didn't figure it out until I was looking at a chart of UTF-8 code points and code units. The 0xA2 code point is encoded as 194 162 in UTF-8. The IO.read/2 function appeared to be interpreting the 0xA2 byte (which is not a valid UTF-8 sequence on its own) as a code point and converting it to the equivalent UTF-8 encoding. Weird.
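The same conversion can be reproduced without any file at all. This sketch assumes the device was treating each byte as a Latin-1 character, which is my reading of what happened rather than something I've confirmed. :unicode.characters_to_binary/3 is the Erlang function for converting between encodings.

```elixir
raw_bytes = <<4, 162>>

# Treat each byte as a Latin-1 character and re-encode the result as UTF-8
converted = :unicode.characters_to_binary(raw_bytes, :latin1, :utf8)

IO.inspect(converted)

# The raw bytes are not a valid UTF-8 string, but the converted ones are
IO.inspect({String.valid?(raw_bytes), String.valid?(converted)})
```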

So it definitely makes a difference which functions you use, even if it's all stored in binaries in the end.

Nom Nom Example Using IO Module Functions

This example involves reading in a list of foods from a text file and then writing each food to an output file, followed by the text "om nom nom", which I'll refer to as the "nom text".

So food.txt, the input file, looks like this:

broccoli
cabbage
curry
bibimbap
tacos

The output file would then look like this:

broccoli om nom nom
cabbage om nom nom
curry om nom nom
bibimbap om nom nom
tacos om nom nom

I implemented this example in Elixir using the IO module functions I talked about above. The resulting code is located in "nom_nom_io.exs".

defmodule NomNomIO do
  @nom_text "om nom nom"

  def nom(input_file_path, output_file_path) do
    #Open the files for reading and writing
    input_file = File.open!(input_file_path, [:read, :utf8])
    output_file = File.open!(output_file_path, [:write, :utf8])

    #Nomify the food, writing the results to the output file
    nomify_food(input_file, output_file)

    #Close the files
    File.close(input_file)
    File.close(output_file)
  end

  #Starts the process of reading food from the food file and
  #writing the food plus the nom text to the output file
  defp nomify_food(food_file, output_file) do
    #Read the next food from the food file and trim the whitespace
    food = read_next_food(food_file)

    nomify_food(food_file, output_file, food)
  end

  #Writes the food to the output file with the nom text and
  #reads the next food. This function will recursively call
  #itself until all the food has been read and written to
  #the output file
  defp nomify_food(_, _, nil) do
    :ok
  end
  defp nomify_food(food_file, output_file, food) do
    #Write the food and the nom text to the output file
    write_food(food, @nom_text, output_file)

    #Read the next food from the food file
    food = read_next_food(food_file)

    #Make a recursive call to read the next food
    nomify_food(food_file, output_file, food)
  end

  #Reads the next food from the food file
  defp read_next_food(food_file) do
    #Read the food from the file and trim the whitespace
    food_file
    |> IO.read(:line)
    |> trim_food()
  end

  #Writes the food and the nom text to the output file
  defp write_food(nil, _, _), do: :ok
  defp write_food(food, nom_text, output_file) do
    IO.write(output_file, food)
    IO.write(output_file, " ")
    IO.write(output_file, nom_text)
    IO.write(output_file, "\n")
  end

  #Trims the whitespace off of a food string, unless it's an
  #:eof (end of file) atom, which is passed back as a nil
  defp trim_food(:eof) do
    nil
  end
  defp trim_food(food) do
    String.trim(food)
  end
end

For me, this was a surprising amount of code for what is a relatively simple operation. I'll briefly go over the code.

The nom/2 function opens the input and output files, calls nomify_food/2 to do the work, and then closes the files at the end.

  def nom(input_file_path, output_file_path) do
    #Open the files for reading and writing
    input_file = File.open!(input_file_path, [:read, :utf8])
    output_file = File.open!(output_file_path, [:write, :utf8])

    #Nomify the food, writing the results to the output file
    nomify_food(input_file, output_file)

    #Close the files
    File.close(input_file)
    File.close(output_file)
  end

The nomify_food/2 function reads the initial value from the input file using read_next_food/1 and then passes that to nomify_food/3.

  defp nomify_food(food_file, output_file) do
    #Read the next food from the food file and trim the whitespace
    food = read_next_food(food_file)

    nomify_food(food_file, output_file, food)
  end

The nomify_food/3 function writes the food along with the nom text to the output file, reads the next value from the input file, and then calls itself recursively until it has read all the foods from the input file. With recursive calls, we always need to be aware of when it will stop calling itself. In this case, when the end of the input file is reached (:eof is read), read_next_food/1 will return a nil and nomify_food/3 will stop recursing.

  defp nomify_food(_, _, nil) do
    :ok
  end
  defp nomify_food(food_file, output_file, food) do
    #Write the food and the nom text to the output file
    write_food(food, @nom_text, output_file)

    #Read the next food from the food file
    food = read_next_food(food_file)

    #Make a recursive call to read the next food
    nomify_food(food_file, output_file, food)
  end

The read_next_food/1, write_food/3, and trim_food/1 functions take care of the small details. I wanted to keep each function from becoming too complex (and stay focused on doing one thing), so I put a lot of the code into separate functions.

To run the code, I loaded the NomNomIO module into iex and called the nom/2 function.

iex> c "nom_nom_io.exs"

[NomNomIO]
iex> NomNomIO.nom("food.txt", "nom_file.txt")
:ok

The resulting output file, "nom_file.txt", contains the expected result. You're welcome to get the examples from GitHub and run this yourself. Feel free to play around with the source code. I often learn a lot by doing that.

Reading and Writing Files Using File Streams

The functions in the IO module aren't the only way to work with files. Another way is to create file-based streams and then use the functions in the Stream module to manipulate the stream.

To create a file stream, all you need to do is use File.stream!/3. You pass options indicating whether this is a text or binary file, and you can specify whether to read a file line by line (most useful for a text file) or using a specified number of bytes (most useful for a binary file). There's no need to indicate whether the stream is for reading or writing: Elixir will figure that out depending on how you use it.

Let's see some code. We'll start out by reading the contents of the example text file, displaying it on the screen.

iex> file_stream = File.stream!("text.txt", [:utf8], :line)
%File.Stream{
  line_or_bytes: :line,
  modes: [{:encoding, :utf8}, :binary],
  path: "text.txt",
  raw: false
}
iex> file_stream |> Stream.each(&IO.puts/1) |> Stream.run()
This is an example

of a multi-line

text file.
:ok

The file stream is created, and each line is streamed to a call to Stream.each/2, where it is output to the screen. The call to Stream.run/1 causes the stream to be processed. The output has gaps between the lines because IO.puts/1 puts a newline at the end of each thing it displays. The file already contains newlines, so this causes the text to be displayed with gaps. I can fix that by trimming the newlines off the text coming from the file stream.

iex> file_stream |> Stream.map(&String.trim/1) |> Stream.each(&IO.puts/1) |> Stream.run()
This is an example
of a multi-line
text file.
:ok

Each line will pass through the pipeline, getting trimmed and then displayed.

Now let's try to display the contents of the binary file.

iex> file_stream = File.stream!("blob.bin", [:binary], 2)
%File.Stream{
  line_or_bytes: 2,
  modes: [:raw, :read_ahead, :binary, :binary],
  path: "blob.bin",
  raw: true
}
iex> file_stream |> Stream.each(&(IO.puts(inspect(&1)))) |> Stream.run()
<<36, 18>>
<<4, 162>>
<<184, 69>>
:ok

The stream reads the file in binary mode 2 bytes at a time, and I then display each group of 2 bytes on the screen.

When writing to a file using streams, you take a stream containing the contents to be written and direct it to the file stream using Stream.into/2. The contents of the stream will then be written to the file. Here's an example of creating a stream that is written to a file.

iex> file_stream = File.stream!("stream_write.txt", [:utf8])
%File.Stream{
  line_or_bytes: :line,
  modes: [{:encoding, :utf8}, :binary],
  path: "stream_write.txt",
  raw: false
}
iex> Stream.into(["This is some text", "More text goes here"], file_stream) |> Stream.run()
:ok
iex> file_stream = File.stream!("stream_write.txt", [:utf8, :append])
%File.Stream{
  line_or_bytes: :line,
  modes: [{:encoding, :utf8}, :append, :binary],
  path: "stream_write.txt",
  raw: false
}
iex> Stream.into(["\nHere's a second line"], file_stream) |> Stream.run()
:ok
iex> file_stream |> Stream.map(&String.trim/1) |> Stream.each(&IO.puts/1) |> Stream.run()
This is some textMore text goes here
Here's a second line
:ok

I first open a file called "stream_write.txt" for streaming. The stream doesn't actually know whether I'm going to read or write; that depends on how I use it. If I were to attempt to read from this non-existent file at this point, I'd get an error. Instead I wrote some text to the file, using Stream.into/2 to pipe the contents of an enumerable (in this case a list) to the file stream. This action causes the stream to write to the file. The two pieces of text are written to the file, one after another.

After that, I reopened the stream in append mode (which I also could have done the first time). Then I streamed a second line of text to the file, which was appended on the end of the file.

Finally, using the same stream, I read from the file using Stream functions to list the contents of the file.
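As a side note, a file stream can also be the target of Enum.into/2, which writes everything right away. Here's a minimal sketch of that variation, using a made-up file name in the system temp directory. Because the Enum functions are eager, no Stream.run/1 call is needed.

```elixir
path = Path.join(System.tmp_dir!(), "lwm_enum_into_example.txt")
lines = ["This is some text\n", "More text goes here\n"]

# Enum.into/2 is eager, so the lines are written as soon as it is called
Enum.into(lines, File.stream!(path))

contents = File.read!(path)
IO.inspect(contents)
```

Stream.into/2 is still the better fit when the source is itself a lazy stream, as in the pipelines above, since nothing is read or written until the stream is run.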

Ah yes, I do like streams.

Nom Nom Example Using File Streams

Now let's return to the Nom Nom example from earlier. I'm going to implement the same example using streams.

Here it is in nom_nom_stream.exs:

defmodule NomNomIO do
  @nom_text "om nom nom"

  def nom(input_file_path, output_file_path) do
    #Open file streams for the input and output files
    input_stream = File.stream!(input_file_path, [:utf8])
    output_stream = File.stream!(output_file_path, [:utf8])

    #Nomify the food, writing the results to the output file
    nomify_food(input_stream, output_stream)
  end

  #Starts the process of reading food from the input stream and
  #writing the food plus the nom text to the output stream
  defp nomify_food(input_stream, output_stream) do
    input_stream
    |> Stream.map(&String.trim/1)
    |> Stream.map(&concat_nom_text/1)
    |> Stream.map(&add_newline/1)
    |> Stream.into(output_stream)
    |> Stream.run()
  end

  #Concats the nom text to a food
  defp concat_nom_text(food) do
    food <> " " <> @nom_text
  end

  #Adds a newline to the end of a string
  defp add_newline(text) do
    text <> "\n"
  end
end

You may notice that this is significantly simpler than the example that used the IO functions. It was far easier to write too. I do like streams quite a lot.

The input stream reads each line from the food file, trims the newline off of each line, concats the nom text, concats a newline character, and then streams it to the output stream, which writes the results to a file.

Here's how I ran the code in iex.

iex> c "nom_nom_stream.exs"

[NomNomIO]
iex> NomNomIO.nom("food.txt", "nom_file.txt")
:ok

Here's the output.

broccoli om nom nom
cabbage om nom nom
curry om nom nom
bibimbap om nom nom
tacos om nom nom

To my great surprise, this worked perfectly the first time I ran it. The stream abstraction really simplified the code. Yay, streams!

Conclusion

If you've read the section on streams, you'll know that I love streams. So it should come as no surprise to you that I prefer file streams to using functions in the IO module. Depending on the situation you're in, you may find file stream operations more convenient or you may find the IO module functions more convenient. I don't yet have enough experience in Elixir file I/O to give more detailed guidance than that. I recommend trying to use both methods and see what works best for you.

I believe I now have enough knowledge and experience with file I/O to start working on a project that involves file I/O. I'll take a slight detour to cover the functions in the File module, and then I'll start working on the project.

If you want to do some extra reading, here are some links that I found useful when learning about files in Elixir.
- https://elixir-lang.org/getting-started/io-and-the-file-system.html
- https://joyofelixir.com/11-files/