Learn With Me: Elixir - ElixirLargeSort IntGen Project Part 1 (#76)

I've learned enough now to start working on the "ElixirLargeSort" project. This project is a fun exercise in sorting a very large number of integers while keeping only a subset of integers in memory at any particular time, and I've been using it as a way to practice or improve my skills in various programming languages.

I originally saw this listed somewhere as an interview question: "How do you sort a large number of integers without being able to load all of them into memory at once?" I quickly realized that the solution would be to sort chunks of integers and write them to files. Then I would merge multiple sorted integer files into a single sorted integer file, solving the problem without ever having them all in memory at once.
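The chunk-then-merge idea can be illustrated entirely in memory. The sketch below is a simplified, hypothetical illustration, not the IntSort implementation: chunks are plain lists instead of files, and the names ChunkMergeSketch and sort_in_chunks/2 are made up for this example.

```elixir
defmodule ChunkMergeSketch do
  # Sorts a list by sorting fixed-size chunks independently and then
  # merging the sorted chunks. In the real problem each sorted chunk
  # would be written to its own file; here the chunks are just lists.
  def sort_in_chunks(integers, chunk_size) when is_integer(chunk_size) and chunk_size > 0 do
    integers
    |> Enum.chunk_every(chunk_size)
    |> Enum.map(&Enum.sort/1)
    |> Enum.reduce([], &merge/2)
  end

  # Merges two already-sorted lists into one sorted list
  defp merge([], right), do: right
  defp merge(left, []), do: left

  defp merge([l | left_rest] = left, [r | right_rest] = right) do
    if l <= r do
      [l | merge(left_rest, right)]
    else
      [r | merge(left, right_rest)]
    end
  end
end
```

For example, ChunkMergeSketch.sort_in_chunks([5, 3, 9, 1, 7, 2, 8], 3) sorts the chunks [5, 3, 9], [1, 7, 2] and [8] separately and merges them into [1, 2, 3, 5, 7, 8, 9]. A file-based version would stream the merged output to disk instead of building a list.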

I was learning Node.js at the time, so I decided to implement the solution in Node.js to get some experience with it. The Node.js project is called NodeLargeSort, and it's on my GitHub page. It also uses Bacon.js, a functional reactive programming library that is very much like working with streams in Elixir.

Later on, I implemented the solution in C# in the LargeSortCSharp project, also on my GitHub page. I already knew C# really well, but I thought it would be interesting to get to know .NET Core better, so I implemented the solution using .NET Core 2.1. It was an interesting exercise, and I was able to compare it to the Node.js solution.

Now it's time to implement this solution in Elixir, so I created a project on my GitHub page called ElixirLargeSort. This project will accomplish the same thing that the other two projects did, but using Elixir. Like the other projects, I plan to create a full suite of unit tests to test the parts of the code.

Like the other projects, there are two runnable programs in ElixirLargeSort: IntGen and IntSort. IntGen is the simpler of the two: its job is to create an integer file of randomly-generated integers. This provides us with a collection of random integers to sort in chunks. The IntSort program will then read in the randomly-generated integers and sort them, producing a final sorted integer file.

When I talk about an "integer file" in the context of this project, I mean a text file with an integer on each line. Having each integer on its own line makes the file easy for code to read and easy for people to view in a text editor. The editor's line numbers also tell us at a glance how many integers are in an integer file.
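To make the format concrete, here is a minimal in-memory sketch of converting integers to and from integer-file text. IntegerFileFormatSketch, to_text/1 and from_text/1 are hypothetical names used only for illustration; they are not part of the project.

```elixir
defmodule IntegerFileFormatSketch do
  # Formats integers in the integer file layout: one integer per line,
  # with every line (including the last) terminated by a newline
  def to_text(integers) do
    Enum.map_join(integers, fn integer -> Integer.to_string(integer) <> "\n" end)
  end

  # Parses integer file text back into a list of integers, ignoring empty lines
  def from_text(text) do
    text
    |> String.split("\n", trim: true)
    |> Enum.map(&String.to_integer/1)
  end
end
```

With this layout, IntegerFileFormatSketch.to_text([3, -1, 42]) produces "3\n-1\n42\n": three lines for three integers, which is why an editor's line count doubles as an integer count.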

In this post, I'm going to walk you through the process of creating the IntGen tool. We'll work on IntSort later on.

Getting Started

First, I created the IntGen project (known to Elixir by the :int_gen atom) by running mix new int_gen. That created a nice project for me. Since Elixir is a functional language, I'm trying to create simpler, more composable functions, so I started work on a function that creates a random integer stream. Anyone who's been reading along knows that I love streams, so I'm going to try to use streams as much as I can (the Node.js implementation used Bacon.js streams).

So I created a module called IntGen, which will contain the main functions for the IntGen application. After I'm finished with this module, I'll put an interface layer (to the command line) on top of it. Back in lwm 43, where I talked about streams, I had already implemented a stream that generates random integers. So I'll just use that code here as well.

Here's the initial code for the IntGen module.

defmodule IntGen do
  @moduledoc """
  Contains the functionality for creating a file of random integers
  """

  @doc """
  Creates a stream that generates an endless number of random integers
  within integer_range (inclusive)

  ## Parameters

  - integer_range: the range of integers to be generated

  ## Returns

  A stream that will generate an endless number of random integers
  """
  @spec random_integer_stream(Range.t()) :: Enumerable.t()
  def random_integer_stream(integer_range) do
    Stream.repeatedly(fn -> Enum.random(integer_range) end)
  end
end

Most of the contents of this module are actually documentation. The code is quite small. The random_integer_stream/1 function takes a range that defines the range of the random integers to be generated and then returns a stream (created using Stream.repeatedly/1) that will generate those random integers. Note that this is an infinite stream, so it will generate as many random numbers as we need.

Once I finished with that code, I compiled it by running mix on the command line. Once it compiled, I loaded the application into iex by running iex -S mix. Then I tested the function manually to see how it works, piping the resulting stream into a list to see the randomly-generated integers.

> iex -S mix
Interactive Elixir (1.8.0) - press Ctrl+C to exit (type h() ENTER for help)
iex> IntGen.random_integer_stream(1..100)
#Function<54.117072283/2 in Stream.repeatedly/1>
iex> IntGen.random_integer_stream(1..100) |> Enum.take(10)
[3, 77, 31, 3, 73, 96, 91, 45, 3, 21]
iex> IntGen.random_integer_stream(-8..23) |> Enum.take(10)
[10, 22, 18, 21, 0, 10, 19, -7, -1, 2]
iex> IntGen.random_integer_stream(-10000..10000) |> Enum.take(10)
[6839, 2411, 8535, -1875, 8413, 3844, 2333, -2002, -3911, -2953]

It's important to use Enum.take/2 here because the stream is infinite. The first time I did this, I mistakenly used Enum.to_list/1 and was puzzled when it never finished, until I realized what I had done.
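The difference between the lazy Stream.take/2 and the eager Enum.take/2 is what makes an infinite stream safe to use. This small sketch builds the stream with Stream.repeatedly/1 directly, the same way random_integer_stream/1 does, so that it runs without the IntGen module:

```elixir
# An endless stream of random integers from 1 to 100
random_stream = Stream.repeatedly(fn -> Enum.random(1..100) end)

# Stream.take/2 is lazy: it only wraps the stream, and nothing is generated yet
first_five_lazy = Stream.take(random_stream, 5)

# Converting the bounded stream to a list pulls exactly 5 values
first_five = Enum.to_list(first_five_lazy)

# Enum.take/2 on the infinite stream is also safe, because it stops after 3 values
next_three = Enum.take(random_stream, 3)

# Enum.to_list(random_stream) would never return, because the stream never ends
```

The lazy form is the one that matters later: create_integer_file uses Stream.take/2 so that the integers are only generated as they are written.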

Manual tests appeared to be fine, so the next step was to write some unit tests for the function. Here are the tests in "test/int_gen_test.exs".

defmodule IntGenTest do
  use ExUnit.Case
  doctest IntGen

  @num_stream_elements 1000

  describe "random_integer_stream -" do
    test "Generates integers" do
      random_stream = IntGen.random_integer_stream(1..100)

      random_stream
      |> Enum.take(@num_stream_elements)
      |> test_integers()
    end

    test "Testing range with only positive numbers" do
      test_range(1, 100)
    end

    test "Testing range with positive and negative numbers" do
      test_range(-10, 10)
    end

    test "Testing range with negative numbers" do
      test_range(-87, -12)
    end

    test "Testing range that starts with 0" do
      test_range(0, 23)
    end

    test "Testing range that ends with 0" do
      test_range(-145, 0)
    end

    test "Testing range of size 2" do
      test_range(4, 5)
    end

    test "Testing range of size 1" do
      test_range(12, 12)
    end

    test "Testing range 0..0" do
      test_range(0, 0)
    end

    test "Testing descending range" do
      test_range(10, -2)
    end

    test "Testing large range" do
      test_range(-1_000_000_000, 1_000_000_000)
    end

    defp test_integers(enumerable) do
      Enum.each(enumerable, fn element -> assert is_integer(element) end)
    end

    defp test_range(min_value, max_value) do
      expected_range = min_value..max_value

      random_stream = IntGen.random_integer_stream(expected_range)

      # Assert that every generated integer falls within the range
      random_stream
      |> Enum.take(@num_stream_elements)
      |> Enum.each(fn integer -> assert integer in expected_range end)
    end
  end
end

I grouped all the tests for this function using describe. I tested that the stream generates only integers (and not some other type of data) and then tested several different ranges of values to verify that the stream still behaves as expected. While writing these tests, I saw that they would be mostly the same, the only variation being the range of integers being generated. So I created a function that contains the logic shared by all the tests, with the test variables passed in as parameters. All I had to do in each test was call test_range/2, passing the minimum and maximum values of the range I wanted to test. I typically structure my unit tests this way when I see a chance for reuse. It's a lot more maintainable and readable than copying and pasting code between tests.

As I was writing the tests, I wanted to see each test start off failing and then go from failing to passing as I implemented it. In the past, when I didn't do this, I occasionally found that I had forgotten to implement some tests, because they were passing simply because they contained no code. I call these sorts of tests false positives, so I like to insert something that deliberately makes them fail until they are implemented.

In Elixir, I used the flunk function to deliberately fail a test.

test "Testing large range" do
  flunk "This test has not yet been implemented"
end

When I finished the unit tests, they were all passing, so I had a working random integer stream. Now I needed some functionality to write the random integers in that stream to an integer file.

ElixirLargeSort.Shared

In past implementations of this project, I found that there was some common integer file functionality that was shared between IntGen and IntSort. I had no reason to expect this implementation to be any different. I then had to figure out how to share code between Elixir projects without having to publish a package to Hex.

Here's what I ended up doing. I created a separate project called ElixirLargeSort.Shared (located in the largesort_shared directory) that contains any functionality shared between IntGen and IntSort. This is where I will put the generic integer file functionality. For the IntGen project, I wanted a function that creates a file stream for an integer file, and then, in keeping with the spirit of functional composition, a separate function that takes an enumerable containing integers (which could be another stream) and writes those integers to a stream. The plan was to create the random integer stream and an integer file stream, and then write the integers from the random integer stream to the file stream.

I was able to link the two projects by adding the shared project (:largesort_shared) as a dependency in the IntGen mix.exs file and using the path: option to specify the directory of the dependency.

  defp deps do
    [
      {:largesort_shared, path: "../largesort_shared"},
    ]
  end

I also kept in mind that when doing unit testing, I want to mock out dependencies that produce side effects in order to make testing simpler. That requires decoupling the interface from the implementation and substituting a mock implementation during unit tests. In C#, I did that using interfaces and implementations, and in Elixir, I do that using behaviours and behaviour implementations. If you want to review those subjects, I talked about unit tests and mocking in lwm 65 and I talked about behaviours in lwm 71.

So here's the behaviour in "lib/integer_file_behavior.ex", which I named LargeSort.Shared.IntegerFileBehavior.

defmodule LargeSort.Shared.IntegerFileBehavior do
  @callback create_integer_file_stream(String.t()) :: Enumerable.t()
  @callback write_integers_to_stream(Enumerable.t(), Collectable.t()) :: Enumerable.t()
end

That defines the integer file functionality using typespecs. Ideally, I would specify a stream of integers in the typespec, but typespecs can't currently be that specific. I can define a list of integers in a typespec, but streams are enumerables, not lists. So the best I could do was to specify an enumerable (Enumerable.t()) for streams that will be read from and a collectable (Collectable.t()) for streams that will be written to. File streams (and IO streams) are both enumerables and collectables, so I can use either typespec depending on the primary role of the stream.
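Not every stream can play both roles, though. A quick way to check is to ask each protocol for an implementation with impl_for/1, which returns the implementing module or nil. This sketch uses an IO stream over an in-memory StringIO device as a stand-in for a file stream (File.Stream implements the same two protocols) and compares it with a lazy stream built by Stream.map/2:

```elixir
# An IO stream over an in-memory StringIO device, standing in for a file stream
{:ok, device} = StringIO.open("")
io_stream = IO.stream(device, :line)

# A lazy stream built with the Stream module
doubled = Stream.map([1, 2, 3], fn n -> n * 2 end)

# impl_for/1 returns the implementing module, or nil if there is none
io_readable? = Enumerable.impl_for(io_stream) != nil
io_writable? = Collectable.impl_for(io_stream) != nil
stream_readable? = Enumerable.impl_for(doubled) != nil
stream_writable? = Collectable.impl_for(doubled) != nil

{:ok, _contents} = StringIO.close(device)
```

The IO stream is both readable and writable, while the %Stream{} returned by Stream.map/2 is only Enumerable. That matches the typespecs: the enumerable parameter can be any stream, but the output parameter has to be something collectable, such as a file stream.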

Here's the IntegerFile module in "lib/integer_file.ex", which implements IntegerFileBehavior.

defmodule LargeSort.Shared.IntegerFile do
  alias LargeSort.Shared.IntegerFileBehavior
  @behaviour IntegerFileBehavior

  @moduledoc """
  Contains functionality for working with integer files
  """

  @doc """
  Creates a stream for an integer file that operates in line mode

  Any existing file will be overwritten.

  If something goes wrong when creating the file stream, this function
  will throw an exception.

  ## Parameters

   - path: the path of the file to be written to

  ## Returns

  A stream that can be used to read from or write to the file
  """
  @impl IntegerFileBehavior
  @spec create_integer_file_stream(String.t()) :: Enumerable.t()
  def create_integer_file_stream(path) do
    File.stream!(path, [:utf8], :line)
  end

  @doc """
  Writes an enumerable containing integers to a stream

  ## Parameters

  - enumerable: the enumerable whose integers are to be written to the file
  - out_stream: the stream to be written to. Actually, this doesn't necessarily
  have to be a stream. Any collectable will do.

  ## Returns

  A stream consisting of the write operations
  """
  @impl IntegerFileBehavior
  @spec write_integers_to_stream(Enumerable.t(), Collectable.t()) :: Enumerable.t()
  def write_integers_to_stream(enumerable, out_stream) do
    enumerable
    |> Stream.map(&Integer.to_string/1)
    |> Stream.map(&(&1 <> "\n"))
    |> Stream.into(out_stream)
  end
end

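The alias statement at the top of the module allows me to write just IntegerFileBehavior instead of the fully qualified module name. The @behaviour IntegerFileBehavior attribute declares that the module implements IntegerFileBehavior, so the compiler can warn me if a callback is missing, and each @impl IntegerFileBehavior marks a function as the implementation of one of its callbacks.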

Again, most of the content of the module is documentation. The code is relatively simple and brief, which is my goal. Notice that when writing integers to a stream, which will mostly be an integer file stream, I need to convert each integer to a string and then add a newline character to the end of each integer string before I write it to the stream. That's necessary to write to a text file in the proper format.
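Because write_integers_to_stream/2 accepts any collectable, its output can be observed without touching the disk. The following sketch assumes an in-memory StringIO device stands in for the file. WriteIntegersSketch and its write_to_string/1 helper are hypothetical names, and the pipeline is copied from the function above so the block runs on its own:

```elixir
defmodule WriteIntegersSketch do
  # Same pipeline as IntegerFile.write_integers_to_stream/2:
  # integer -> string -> string with a trailing newline -> collectable
  def write_integers_to_stream(enumerable, out_stream) do
    enumerable
    |> Stream.map(&Integer.to_string/1)
    |> Stream.map(&(&1 <> "\n"))
    |> Stream.into(out_stream)
  end

  # Runs the pipeline against an in-memory StringIO device and returns
  # everything that was written to it
  def write_to_string(integers) do
    {:ok, device} = StringIO.open("")
    out_stream = IO.stream(device, :line)

    # Stream.into/2 is lazy, so nothing is written until Stream.run/1
    integers
    |> write_integers_to_stream(out_stream)
    |> Stream.run()

    {:ok, {_input, output}} = StringIO.close(device)
    output
  end
end
```

For example, WriteIntegersSketch.write_to_string([1, -20, 300]) returns "1\n-20\n300\n", which is exactly the text that would end up in an integer file.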

I tested both of these functions using the unit tests in "test/integer_file_test.exs".

defmodule LargeSortShared.Test.IntegerFile do
  use ExUnit.Case
  doctest LargeSort.Shared.IntegerFile

  alias LargeSort.Shared.IntegerFile

  @test_integer_file_name "test_integer_file.txt"

  #Tests create_integer_file_stream
  describe "create_integer_file_stream -" do
    setup do
      on_exit(&delete_test_file/0)
    end

    test "Create integer file stream and write to it" do
      #Create the file stream and write some test data to the file stream
      test_data = 1..10
      file_stream = IntegerFile.create_integer_file_stream(@test_integer_file_name)

      file_stream
      |> write_data_to_stream(test_data)
      |> Stream.run()

      #Verify that the stream was created correctly by verifying the data that was
      #written to the file stream
      verify_integer_file(@test_integer_file_name, test_data)
    end

    test "Create integer file stream and read from it" do
      #Create the file stream and write some test data to the file stream
      test_data = 1..10
      file_stream = IntegerFile.create_integer_file_stream(@test_integer_file_name)

      file_stream
      |> write_data_to_stream(test_data)
      |> Stream.run()

      #Create a new file stream, which we will use to read from the file
      file_stream = IntegerFile.create_integer_file_stream(@test_integer_file_name)

      #Verify that the stream can read from the file correctly
      verify_integer_stream(file_stream, test_data)
    end
  end

  describe "write_integers_to_stream -" do
    setup do
      on_exit(&delete_test_file/0)
    end

    test "Write a small number of integers to a stream" do
      test_write_integers_to_stream(-100..100)
    end

    test "Write a large number of integers to a stream" do
      test_write_integers_to_stream(1..100_000)
    end

    test "Write a single integer to a stream" do
      test_write_integers_to_stream([0])
    end

    test "Write an empty enumerable to a stream" do
      test_write_integers_to_stream([])
    end

    defp test_write_integers_to_stream(test_data) do
      #Create the file stream
      file_stream = IntegerFile.create_integer_file_stream(@test_integer_file_name)

      #Write the test data to the file stream
      test_data
      |> IntegerFile.write_integers_to_stream(file_stream)
      |> Stream.run()

      #Verify that the data that was written to the file stream correctly
      verify_integer_file(@test_integer_file_name, test_data)
    end
  end

  #Deletes the test file
  defp delete_test_file() do
    File.rm!(@test_integer_file_name)
  end

  #Writes an enumerable containing integers to a stream
  defp write_data_to_stream(stream, data) do
    data
    |> Stream.map(&Integer.to_string/1)
    |> Stream.map(&(&1 <> "\n"))
    |> Stream.into(stream)
  end

  #Verifies that an integer file contains the expected contents
  defp verify_integer_file(path, expected_integers) do
    File.stream!(path, [:utf8], :line)
    |> verify_integer_stream(expected_integers)
  end

  #Verifies that a stream contains the expected contents
  defp verify_integer_stream(stream, expected_integers) do
    stream
    |> Stream.map(&String.trim/1)
    |> Stream.map(&String.to_integer/1)
    |> Stream.zip(expected_integers)
    |> Stream.each(&compare_integers/1)
    |> Stream.run()
  end

  #Compares the integers in a tuple to each other
  defp compare_integers({integer1, integer2}) do
    assert integer1 == integer2
  end
end

The typical test involves creating and writing to an integer file and then reading it again to verify that the function being tested worked correctly. The setup macro runs some code before each test. I inserted some code into setup to call the on_exit/2 function, which sets up a function to be called when each test is finished. I configured on_exit to delete the test file at the end of every test.

Creating the Integer File

Now that I had the generic integer file functionality implemented, I could create a function in the IntGen module that combines those functions to write integers to an integer file. So I created a function that takes in an integer stream and writes the first N integers to an integer file.

Implementing the Function

I was aware that this function would need to call the integer file functions in the LargeSort.Shared project, which gave me two options for testing. The first option was to test the entire thing, including the side effects of writing to the file. That option was not ideal, as it greatly increases the complexity of testing. The second option was to start using the dependency injection and mocking that I discussed in lwm 67, which takes more work to set up but reduces test complexity. I went with the second option. Not only is that the better way to unit test something that interfaces with an outside system (the file system in this case), but I definitely needed to practice what I preached. I needed some practical mocking experience with Elixir.

So the idea is that instead of calling the IntegerFile module directly, I call the same functions on the @integer_file placeholder. As I explained in lwm 67, this is a module attribute that is assigned a module name that differs depending on the environment. In a Dev or Prod scenario, @integer_file is assigned the IntegerFile module and it can call functions on that module. In a Test scenario, we can assign @integer_file to a mocked module that will perform whatever actions are needed for the test.

So here's what the @integer_file declaration looks like in the IntGen module.

@integer_file Application.get_env(:int_gen, :integer_file)

This assignment happens when the module is compiled, not at runtime. The value is read from the application configuration (config.exs plus the environment-specific file it imports), so it depends on which environment the application is being built for. This allows us to specify IntegerFile when the application is built for normal usage and a mock module when it is built for testing.
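For contrast, the same configuration key could be looked up at runtime instead of being baked into a module attribute. This is an alternative sketch, not what IntGen does: the :int_gen_sketch application name and both implementation modules are hypothetical, and each call pays for an application environment lookup in exchange for being able to swap implementations without recompiling. (Newer Elixir versions also provide Application.compile_env/3 for the compile-time style.)

```elixir
# Two hypothetical implementations of the same interface
defmodule RealIntegerFileSketch do
  def describe, do: "real file system"
end

defmodule FakeIntegerFileSketch do
  def describe, do: "in-memory fake"
end

defmodule ConfigDispatchSketch do
  # Reads the configured module every time it is called (runtime lookup),
  # falling back to the real implementation when nothing is configured
  defp integer_file do
    Application.get_env(:int_gen_sketch, :integer_file, RealIntegerFileSketch)
  end

  def describe, do: integer_file().describe()
end
```

Calling Application.put_env(:int_gen_sketch, :integer_file, FakeIntegerFileSketch) switches ConfigDispatchSketch.describe() from "real file system" to "in-memory fake" immediately. With the module attribute approach, the module is fixed when IntGen is compiled.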

Now here's the actual function implementation.

  @doc """
  Creates an integer file that contains an integer on each line

  ## Parameters

  - path: the path of the file to be created. If the file already exists,
  it will be overwritten
  - num: the number of integers to be written to the file. If there aren't
  enough integers in the random integer stream or enumerable to fulfill
  this number, then only the maximum possible number of integers are written.
  - integers: A stream or enumerable containing the integers to be written
  """
  @spec create_integer_file(String.t(), non_neg_integer(), Enumerable.t()) :: :ok
  def create_integer_file(path, num, integers) do
    # Create the integer file stream
    file_stream = @integer_file.create_integer_file_stream(path)

    # Pipe N integers to the file stream
    integers
    |> Stream.take(num)
    |> @integer_file.write_integers_to_stream(file_stream)
    |> Stream.run()
  end

In "config/config.exs", I have a statement that imports an environment-specific configuration file. This is where I configure which module is used.

Here are the contents of "config/dev.exs", which contains the configuration for a Dev environment build.

use Mix.Config

config :int_gen, :integer_file, LargeSort.Shared.IntegerFile

This tells Elixir that the :integer_file configuration setting is set to LargeSort.Shared.IntegerFile in a Dev build.

The "config/prod.exs" configuration file contains the same thing for production builds.

Here are the contents of "config/test.exs", which is used for Test environment builds.

use Mix.Config

config :int_gen, :integer_file, IntGen.IntegerFileMock

Instead of :integer_file being assigned the IntegerFile module, it's assigned a mock module called IntegerFileMock, which will contain the mock implementation for use in unit tests.

The application is built for the Dev environment by running mix, and it's built for the Test environment by running mix test. I don't yet know what triggers a Prod environment build, but I expect I'll eventually find out.

Another dependency injection option would have been to pass in the module name as a parameter, with the default value being IntegerFile. This would allow us to pass in a mock implementation during testing. That would look like this.

  def create_integer_file(path, num, integers, integer_file \\ IntegerFile) do
    # Create the integer file stream
    file_stream = integer_file.create_integer_file_stream(path)

    # Pipe N integers to the file stream
    integers
    |> Stream.take(num)
    |> integer_file.write_integers_to_stream(file_stream)
    |> Stream.run()
  end

I prefer the configuration-based approach because it reduces the number of parameters in the function and I don't have to think about it most of the time.
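For comparison, the parameter-based variant can be tested without Mox by passing in a hand-written fake. This is a hypothetical sketch, not the approach used in this project: InMemoryIntegerFile and ParamInjectionSketch are made-up names, and the default argument is omitted so the block compiles without the shared project. The fake sends its StringIO device to the calling process so that a test can read back what was written.

```elixir
defmodule InMemoryIntegerFile do
  # Hypothetical test double with the same two functions as
  # LargeSort.Shared.IntegerFile, backed by an in-memory StringIO device
  def create_integer_file_stream(_path) do
    {:ok, device} = StringIO.open("")
    # Hand the device to the caller so its contents can be inspected later
    send(self(), {:integer_file_device, device})
    IO.stream(device, :line)
  end

  def write_integers_to_stream(enumerable, out_stream) do
    enumerable
    |> Stream.map(fn integer -> Integer.to_string(integer) <> "\n" end)
    |> Stream.into(out_stream)
  end
end

defmodule ParamInjectionSketch do
  # Same body as the parameter-injection version of create_integer_file
  def create_integer_file(path, num, integers, integer_file) do
    file_stream = integer_file.create_integer_file_stream(path)

    integers
    |> Stream.take(num)
    |> integer_file.write_integers_to_stream(file_stream)
    |> Stream.run()
  end
end
```

A test would call ParamInjectionSketch.create_integer_file("unused.txt", 3, [5, 6, 7, 8], InMemoryIntegerFile), receive the {:integer_file_device, device} message, and read the written text with StringIO.close/1, which should be "5\n6\n7\n". It works, but every caller has to know about the extra parameter, which is the trade-off described above.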

Unit Testing the Function

Now for unit testing the function. I'm using the mox library to mock the IntegerFile module. In order to use mox, I had to add it as a test-only dependency in mix.exs in the IntGen project.

defp deps do
  [
    {:largesort_shared, path: "../largesort_shared"},
    {:mox, "~> 0.5.1", only: [:test]}
  ]
end

I used the only: option to indicate that this dependency is only used when testing.

Now that I can use the mox library to mock the IntegerFile module, I'll add some code to "test/test_helper.exs" to define the mock module before running the tests.

Mox.defmock(IntGen.IntegerFileMock, for: LargeSort.Shared.IntegerFileBehavior)

ExUnit.start()

The call to defmock tells Mox to create a mock module called IntGen.IntegerFileMock that implements the LargeSort.Shared.IntegerFileBehavior behaviour. This causes a module to be generated with some default function implementations. The documentation does not state what these default implementations do, so I dug into the Mox code. I discovered that by default, if you don't do anything else, calling a function in the mock module raises an error. You have to provide some code for those mock functions if you want them to be callable.

So now I have a mock module that doesn't do anything. When I write the unit tests, I'll provide an implementation for some of these mock functions depending on what the test is doing.

Here is the code for the unit tests for create_integer_file.

describe "create_integer_file -" do
  @test_file "test_integer_file.txt"
  @small_num_integers 100
  @large_num_integers 10000

  test "Create a file with a small number of random integers" do
    integer_range = -10..10

    random_stream = IntGen.random_integer_stream(integer_range)

    test_integer_file_with_random_stream(
      @test_file,
      @small_num_integers,
      random_stream,
      integer_range
    )
  end

  test "Create a file with a large number of random integers" do
    integer_range = -1000..1000

    random_stream = IntGen.random_integer_stream(integer_range)

    test_integer_file_with_random_stream(
      @test_file,
      @large_num_integers,
      random_stream,
      integer_range
    )
  end

  test "Create a file with positive integers" do
    integers = [3, 12, 4, 2, 32, 128, 12, 8]

    test_integer_file_with_specific_integers(integers, length(integers))
  end

  test "Create a file with negative integers" do
    integers = [-13, -1, -4, -23, -83, -3, -43, -8]

    test_integer_file_with_specific_integers(integers, length(integers))
  end

  test "Create a file with positive and negative integers" do
    integers = [332, -1, 4, 18, -23, 1345, 0, -83, -3, -43, 19, -8, 2]

    test_integer_file_with_specific_integers(integers, length(integers))
  end

  test "Create a file using a subset of a list of integers" do
    integers = [332, -1, 4, 18, -23, 1345, 0, -83, -3, -43, 19, -8, 2]

    test_integer_file_with_specific_integers(integers, 6)
  end

  test "Create a file with a single integer" do
    integers = [5]

    test_integer_file_with_specific_integers(integers, length(integers))
  end

  test "Create a file with zero integers" do
    integers = []

    test_integer_file_with_specific_integers(integers, length(integers))
  end

  # Tests creating an integers file with a specific list of integers
  @spec test_integer_file_with_specific_integers(list(integer()), integer()) :: :ok
  defp test_integer_file_with_specific_integers(integers, count) do
    result = test_integer_file(@test_file, count, integers, &verify_written_integers/2)

    assert result == :ok

    :ok
  end

  # Test creating an integer file with a random stream
  @spec test_integer_file_with_random_stream(
          String.t(),
          integer(),
          Enumerable.t(),
          Range.t()
        ) :: :ok
  defp test_integer_file_with_random_stream(
         path,
         num_of_integers,
         random_stream,
         integer_range
       ) do
    verify_integers = fn _, written_data ->
      verify_written_integers_range(num_of_integers, integer_range, written_data)
    end

    result = test_integer_file(path, num_of_integers, random_stream, verify_integers)

    assert result == :ok

    :ok
  end

  # Tests creating an integer file
  defp test_integer_file(path, num_of_integers, integers, verify) do
    # Create the test stream
    {test_device, test_stream} = create_test_stream()

    # Setup the IntegerFile mock
    IntGen.IntegerFileMock
    |> expect(
      :create_integer_file_stream,
      fn actual_path ->
        verify_create_file_stream(path, actual_path, test_stream)
      end
    )
    |> expect(
      :write_integers_to_stream,
      fn enumerable, out_stream ->
        verify_write_integers_to_stream(enumerable, test_stream, out_stream)
      end
    )

    # Call the test method and verify the results
    result = IntGen.create_integer_file(path, num_of_integers, integers)

    assert result == :ok

    # Close the test stream and get the data that was written to it
    {:ok, {_, written_data}} = StringIO.close(test_device)

    # Call the verification method
    verify.(Enum.take(integers, num_of_integers), written_data)
  end

  # Verifies the create file stream parameters and returns the test stream
  defp verify_create_file_stream(expected_path, actual_path, test_stream) do
    assert expected_path == actual_path

    test_stream
  end

  # Verifies the write_integers_to_stream parameters and write space-separated
  # integers to the output stream. Since the integers may be part of a stream
  # (particularly a random integer stream), we won't verify the integers
  # now. We'll just write them to the test stream.
  defp verify_write_integers_to_stream(
         integers,
         expected_stream,
         actual_stream
       ) do
    assert expected_stream == actual_stream

    # Separate each integer with a space
    integers
    |> Stream.map(fn integer -> Integer.to_string(integer) <> " " end)
    |> Stream.into(actual_stream)
  end

  # Verifies that the integers were written correctly
  defp verify_written_integers(integers, written_data) do
    written_integers = stream_data_to_integers(written_data)

    # Verify that the exact integer order matches
    written_integers
    |> Enum.zip(integers)
    |> Enum.each(fn {integer1, integer2} -> assert integer1 == integer2 end)

    # Assert that the number of integers is as expected
    assert Enum.count(integers) == Enum.count(written_integers)

    :ok
  end

  # Verifies that the correct number of integers within a certain range
  # were written
  defp verify_written_integers_range(expected_count, integer_range, written_data) do
    written_integers = stream_data_to_integers(written_data)

    # Assert that the number of integers is as expected
    assert Enum.count(written_integers) == expected_count

    # Assert that each integer is in the expected range
    Enum.each(written_integers, fn integer -> assert integer in integer_range end)
  end

  # Converts integer stream data to an enumerable containing integers
  defp stream_data_to_integers(data) do
    data
    |> String.trim()
    |> String.split(" ")
    |> Stream.reject(fn line -> line == "" end)
    |> Enum.map(&String.to_integer/1)
  end

  # Creates a test stream that reads from and writes to a string I/O device
  # Returns both the stream and the device that it wraps so that the contents
  # of the device can be read later on
  @spec create_test_stream() :: {pid(), Enumerable.t()}
  defp create_test_stream() do
    # Open a string I/O device
    {:ok, device} = StringIO.open("")

    # Turn the string I/O device into a text line stream
    {device, IO.stream(device, :line)}
  end
end

This is a lot of code, so I'm going to go through it and describe what I'm doing. Like the previous unit tests, I created some test functions, with each test passing in different parameters. The test_integer_file_with_random_stream/4 function tests creating an integer file with a stream of random numbers and the test_integer_file_with_specific_integers/2 function tests creating an integer file with a specific list of integers. The reason for doing both is that I wanted to test writing a file with random integers (since that is what this project will be doing), but the quality of the test for random integers is lower than the test for a specific list of integers.

The reason is that it's impossible to read the random integer stream in advance: every time the stream is enumerated, it generates new random numbers, so reading ahead would give us different numbers than the ones written to the file. So for the random number test, I just verify that the correct number of integers was written and that they were all within the expected range. For the specific integer test, I can verify that the expected sequence of numbers was written, which is a much more specific and thorough test. Testing both improves my confidence that the function being tested works as expected.
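If a test ever did need to know the random values in advance, one option is to seed the random number generator. This is an alternative, not what the tests above do. Enum.random/1 draws from Erlang's :rand module, whose state is kept per process, so seeding :rand with the same seed before each enumeration reproduces the same sequence. SeededStreamSketch and take_seeded/3 are hypothetical names for illustration.

```elixir
defmodule SeededStreamSketch do
  # Same shape as IntGen.random_integer_stream/1
  def random_integer_stream(integer_range) do
    Stream.repeatedly(fn -> Enum.random(integer_range) end)
  end

  # Seeds this process's :rand state, then takes `count` integers.
  # The seed must be set in the same process that enumerates the stream.
  def take_seeded(seed, integer_range, count) do
    :rand.seed(:exsss, seed)

    integer_range
    |> random_integer_stream()
    |> Enum.take(count)
  end
end

first = SeededStreamSketch.take_seeded({1, 2, 3}, -10..10, 20)
second = SeededStreamSketch.take_seeded({1, 2, 3}, -10..10, 20)
```

Here first and second contain the same 20 integers, so a test could compute the expected values with one seeded run and compare them against a second seeded run. The downside is that the test then depends on the generator's internal state, which is more fragile than checking counts and ranges.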

The core test function, test_integer_file/4, is called by both types of tests. This is where the core of the test logic resides.

# Tests creating an integer file
defp test_integer_file(path, num_of_integers, integers, verify) do
  # Create the test stream
  {test_device, test_stream} = create_test_stream()

  # Setup the IntegerFile mock
  IntGen.IntegerFileMock
  |> expect(
    :create_integer_file_stream,
    fn actual_path ->
      verify_create_file_stream(path, actual_path, test_stream)
    end
  )
  |> expect(
    :write_integers_to_stream,
    fn enumerable, out_stream ->
      verify_write_integers_to_stream(enumerable, test_stream, out_stream)
    end
  )

  # Call the test method and verify the results
  result = IntGen.create_integer_file(path, num_of_integers, integers)

  assert result == :ok

  # Close the test stream and get the data that was written to it
  {:ok, {_, written_data}} = StringIO.close(test_device)

  # Call the verification method
  verify.(Enum.take(integers, num_of_integers), written_data)
end
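The verify parameter is a function that receives the expected integers and the raw data written to the test stream. Here's a sketch of what a verifier for the specific-integer test could look like, assuming the output has one integer per line. SpecificVerifySketch and its functions are hypothetical, and a real test would use ExUnit's assert rather than returning a boolean.

```elixir
defmodule SpecificVerifySketch do
  # Assumption: the integer file format is one integer per line.
  def parse_written_integers(written_data) do
    written_data
    |> String.split("\n", trim: true)
    |> Enum.map(&String.to_integer/1)
  end

  # Same shape as the `verify` callback in test_integer_file/4: it receives
  # the expected integers and the raw data that was written to the stream,
  # and checks that the exact sequence was written.
  def verify_specific_integers(expected_integers, written_data) do
    parse_written_integers(written_data) == expected_integers
  end
end
```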

First, a test stream is created. Since we are mocking the file interface, the integers won't actually be written to a file. Instead, they are written to this test stream, which stands in for a file stream. That way I can examine what was "written to the file" without depending on an actual file system.
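Here's a minimal sketch of why swapping in a test stream works, assuming a writer shaped roughly like write_integers_to_stream (FakeFileSketch and its functions are hypothetical). Both File.Stream and IO.Stream implement the Collectable protocol, so code that collects lines into a stream doesn't care whether the stream is backed by a file or by an in-memory StringIO device.

```elixir
defmodule FakeFileSketch do
  # Hypothetical writer: turns each integer into a line of text and collects
  # the lines into whatever stream it is given. File.Stream and IO.Stream are
  # both Collectable, so the same code can target a file or a test device.
  def write_integers(integers, out_stream) do
    integers
    |> Stream.map(fn i -> Integer.to_string(i) <> "\n" end)
    |> Enum.into(out_stream)

    :ok
  end

  # Runs the writer against a StringIO device instead of a file and returns
  # everything that was written to it.
  def write_to_memory(integers) do
    {:ok, device} = StringIO.open("")
    :ok = write_integers(integers, IO.stream(device, :line))
    {:ok, {_unread_input, written}} = StringIO.close(device)
    written
  end
end
```

For example, FakeFileSketch.write_to_memory([5, 1, 9]) returns "5\n1\n9\n", and no file is ever touched.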

Then we mock the functions in the IntegerFileMock module. create_integer_file calls both file functions, so I mock both. With Mox, we mock a function using expect/4 (its call-count parameter is optional, so it can also be called with three arguments). The first parameter is the mock module, which is piped in here. The second parameter is the name of the function to be mocked, specified as an atom; function names in Elixir are atoms, so an atom is all that's needed to identify a function within the mock module. The third parameter is the number of times we expect the mocked function to be called. It is optional and defaults to 1. If the function is called more times than expected, Mox raises an error right away; if it is called fewer times, the failure is reported when the expectations are verified (for example, with verify_on_exit!/1). Since I expect each of these functions to be called only once, I did not specify the call count. The last parameter is a function that provides the code for the mocked function. When the mocked function is called during the unit test, this code is what runs.
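Mox itself needs a dependency, a behaviour, and a defmock call, so the sketch below is not Mox. It is a dependency-free toy (ExpectationSketch is a hypothetical name) that records the same three pieces of information an expect call does: the function name as an atom, an allowed call count that defaults to 1, and the code to run in place of the real function. It also shows the difference between calling a stub too many times (an immediate error) and too few times (caught later by a verification check).

```elixir
defmodule ExpectationSketch do
  # NOT Mox: a toy that mirrors what an expectation records: a function
  # name (an atom), an allowed call count (default 1), and replacement code.
  def expect(name, n \\ 1, code) when is_atom(name) and is_function(code) do
    {:ok, counter} = Agent.start_link(fn -> n end)

    # The stub takes a list of arguments, decrements the remaining count,
    # and raises as soon as it is called more times than allowed.
    stub = fn args ->
      remaining = Agent.get_and_update(counter, fn left -> {left, left - 1} end)

      if remaining <= 0 do
        raise "#{name} was called more than #{n} time(s)"
      end

      apply(code, args)
    end

    # Reports whether every expected call happened, similar in spirit to
    # Mox's verification step.
    all_called? = fn -> Agent.get(counter, & &1) == 0 end
    {stub, all_called?}
  end
end
```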

Here are two diagrams showing how the function calls flow during normal code execution and during unit tests. Keep in mind that I am testing create_integer_file, not the file system functionality, so I mock the file system functionality to keep the tests as simple as possible.

[Diagram: Normal Code Execution]

[Diagram: Unit Test Execution]
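The diagrams boil down to one idea: create_integer_file talks to a module that can be swapped out. One common way to wire this up with Mox is to look the module up from application configuration at runtime. The sketch below assumes that approach; the :wiring_sketch application, the :integer_file key, and the RealIntegerFile and FakeIntegerFile modules are all hypothetical and may differ from how ElixirLargeSort actually does it.

```elixir
defmodule RealIntegerFile do
  # Hypothetical stand-in for the real file module.
  def create_integer_file_stream(path), do: {:file_stream, path}
end

defmodule FakeIntegerFile do
  # Hypothetical stand-in for the mock used during tests.
  def create_integer_file_stream(path), do: {:test_stream, path}
end

defmodule WiringSketch do
  # Looks up which module to call at runtime, falling back to the real one.
  # The :wiring_sketch app name and :integer_file key are assumptions.
  defp integer_file do
    Application.get_env(:wiring_sketch, :integer_file, RealIntegerFile)
  end

  def open_stream(path), do: integer_file().create_integer_file_stream(path)
end
```

In a real project, the test configuration (or the test itself, via Application.put_env/3) points the key at the mock module, so the production code never has to know the mock exists.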

In the functions I mocked, I verify that the parameters being passed in are correct and return a response. The mocked create_integer_file_stream function just returns the test stream, but the mocked write_integers_to_stream function was a bit more involved. In this situation, I needed to be able to analyze the final contents of the data that was written to the stream, which ideally calls for an in-memory collection that I can access later. Because data in Elixir is immutable, I had a hard time doing this until I stumbled across the StringIO module.
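Here's a small sketch of the problem (ImmutabilitySketch is a hypothetical module). The mutable-language approach of appending to an outside list from inside a callback silently does nothing in Elixir, because rebinding a variable inside an anonymous function creates a new local binding. The functional fix is to thread an accumulator through Enum.reduce/3, but that only works when you control the loop.

```elixir
defmodule ImmutabilitySketch do
  # The mutable-language approach. The assignment inside the anonymous
  # function creates a new, local `recorded`; the outer list never changes.
  # (The compiler typically warns that the inner `recorded` is unused, a
  # hint that the assignment has no effect outside the function.)
  def try_to_record(integers) do
    recorded = []
    Enum.each(integers, fn i -> recorded = [i | recorded] end)
    recorded
  end

  # The functional alternative: pass the list along as an accumulator and
  # return the final value. Prepending then reversing keeps the original order.
  def record_with_reduce(integers) do
    integers
    |> Enum.reduce([], fn i, acc -> [i | acc] end)
    |> Enum.reverse()
  end
end
```

In this test, the loop lives inside the code being tested, so there is no accumulator to thread back out. That's why something that holds the data outside the function, like a StringIO device, is needed.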

The functions in StringIO create an in-memory device that deals with string data, which makes it well-suited for use in mocked functions in unit tests. You can give it data to be read, and you can write data to it as well, so it makes a convenient substitute for a real device. The IO.stream/2 and IO.binstream/2 functions let us convert the StringIO device into a stream. When we close the StringIO device with StringIO.close/1, we get back any input that was not read, along with all the data that was written to it, which we can then examine to verify that the function we're testing is working correctly. In a language with mutable data, I would have just added each integer to an outside collection, but since Elixir data is immutable, that is more difficult to pull off. In the end, StringIO turned out to be the best solution to this problem.
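Here's a short sketch of the StringIO round trip described above (StringIOSketch is a hypothetical module). It reads one line of the prepared input, writes some output, and then closes the device, which returns the unread input and the written output as a tuple.

```elixir
defmodule StringIOSketch do
  # Opens a StringIO device with some input, reads one line, writes some
  # output, then closes the device. StringIO.close/1 returns the input that
  # was NOT consumed together with everything that was written.
  def round_trip(input, to_write) do
    {:ok, device} = StringIO.open(input)
    first_line = IO.read(device, :line)
    :ok = IO.write(device, to_write)
    {:ok, {remaining_input, output}} = StringIO.close(device)
    {first_line, remaining_input, output}
  end
end
```

If you want to inspect the data without closing the device, StringIO.contents/1 returns the same {input, output} pair and leaves the device open.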

In my unit tests, create_test_stream/0 opens a StringIO device and uses IO.stream/2 to convert that device into a line-based stream.

Once the test finished, I closed the StringIO device to retrieve the data that had been written to it and verified that it was correct.

Formatting

I discovered that running "mix format" on a project runs the code through a standardized formatter, so I ran it on this project to standardize the code formatting. It left most of my code as it was, but it made some formatting choices that were different from the ones I had made. I had no objections, so I kept the formatting changes.

Running this tool as part of an automated commit process would be a good way to keep the code in a project consistently formatted. That's something I'll have to look into later.
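The same formatter that mix format uses is also available from code through Code.format_string!/1, which is handy for seeing what the formatter does to a snippet. Here's a small sketch (FormatSketch is a hypothetical wrapper). For an automated check, mix format --check-formatted exits with a non-zero status when any file isn't formatted, which makes it a natural fit for a CI step or a pre-commit hook.

```elixir
defmodule FormatSketch do
  # Code.format_string!/1 runs the standard Elixir formatter on a string of
  # source code. It returns iodata, so convert it to a binary for comparing.
  def format(source) do
    source
    |> Code.format_string!()
    |> IO.iodata_to_binary()
  end
end
```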

Giving IntGen a Command Line Interface

I now have IntGen working nicely and its unit tests are passing. There's still one more thing to do: connecting the IntGen functionality to a command line interface layer. That layer will translate command line arguments into function calls and handle any other command-line concerns. In the next post, I'll discuss the CLI layer and go over its code.
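As a rough preview, here's one way a CLI layer might translate arguments using Elixir's built-in OptionParser. This is a sketch under assumptions: CliSketch, the --count, --lower, and --upper switches, and the single output-path argument are all hypothetical, not IntGen's actual interface.

```elixir
defmodule CliSketch do
  # Hypothetical switches; with `strict:`, anything else is reported as invalid.
  @switches [count: :integer, lower: :integer, upper: :integer]

  # Turns raw command line arguments into either parsed options plus an
  # output path, or an error describing what went wrong.
  def parse_args(argv) do
    case OptionParser.parse(argv, strict: @switches) do
      {opts, [output_path], []} -> {:ok, output_path, opts}
      {_opts, _args, invalid} when invalid != [] -> {:error, {:invalid_options, invalid}}
      {_opts, args, []} -> {:error, {:expected_one_path, args}}
    end
  end
end
```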