I've learned enough now to start working on the "ElixirLargeSort" project. This project is a fun exercise in sorting a very large number of integers while keeping only a subset of integers in memory at any particular time, and I've been using it as a way to practice or improve my skills in various programming languages.
I originally saw this listed somewhere as an interview question: "How do you sort a large number of integers without being able to load all of them into memory at once?" I quickly realized that the solution would be to sort chunks of integers and write them to files. Then I would merge multiple sorted integer files into a single sorted integer file, solving the problem without ever having them all in memory at once.
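To make the idea concrete, here is a minimal in-memory sketch of chunk-sort-and-merge in Elixir. It is not the IntSort implementation, which comes in a later post. Plain lists stand in for the intermediate chunk files, so it shows the shape of the algorithm rather than the memory bound, and a simple pairwise merge stands in for merging many files at once. The module name ChunkSortSketch is hypothetical.

```elixir
defmodule ChunkSortSketch do
  # Hypothetical sketch: sort each chunk of `chunk_size` integers on its own,
  # then merge the sorted chunks. In the real project each sorted chunk would
  # be written to its own file; here each chunk is just a list.
  def sort(integers, chunk_size) do
    integers
    |> Stream.chunk_every(chunk_size)
    |> Stream.map(&Enum.sort/1)
    |> Enum.reduce([], &merge/2)
  end

  # Merges two already-sorted lists into a single sorted list
  def merge(left, right), do: do_merge(left, right, [])

  defp do_merge([], right, acc), do: Enum.reverse(acc, right)
  defp do_merge(left, [], acc), do: Enum.reverse(acc, left)

  defp do_merge([l | rest_left] = left, [r | rest_right] = right, acc) do
    if l <= r do
      do_merge(rest_left, right, [l | acc])
    else
      do_merge(left, rest_right, [r | acc])
    end
  end
end
```

For example, ChunkSortSketch.sort([5, 3, 9, 1, 4, 2], 2) sorts the chunks [5, 3], [9, 1], and [4, 2] independently and then merges them into [1, 2, 3, 4, 5, 9].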
I was learning Node.js at the time, so I implemented the solution in Node.js to get some experience with it. The Node.js project is called NodeLargeSort, and it's on my GitHub page. It also uses Bacon.js, a functional reactive programming library whose streams work very much like streams in Elixir.
Later on I implemented the solution in C# in the LargeSortCSharp project, also on my GitHub page. I already knew C# really well, but I thought it would be interesting to get to know .NET Core better, so I implemented the solution using .NET Core 2.1. It was an interesting exercise and I was able to compare it to the Node.js solution.
Now it's time to implement this solution in Elixir. So I created a project on my GitHub page called ElixirLargeSort. This project will accomplish the same thing that the other two projects did, but using Elixir. Like the other projects, I plan to create a full suite of unit tests to test the parts of the code.
Like the other projects, there are two runnable programs in ElixirLargeSort: IntGen and IntSort. IntGen is the simpler of the two: its job is to create an integer file of randomly-generated integers. This provides us with a collection of random integers to sort in chunks. The IntSort program will then read in the randomly-generated integers and sort them, spitting out a final sorted integer file.
When I talk about an "integer file" in the context of this project, I mean a text file with an integer on each line. Having each integer on its own line makes it easy for code to read the file and easy for people to view the file in a text editor. The editor's line numbers also tell us how many integers are in an integer file.
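As a small illustration of the format, here is a sketch that converts integers to integer-file text and back. The module IntegerTextSketch and its functions are hypothetical helpers for this example, not part of the project.

```elixir
defmodule IntegerTextSketch do
  # Renders integers in the integer-file format: one integer per line,
  # with each line terminated by "\n"
  def to_text(integers) do
    Enum.map_join(integers, "", fn integer -> Integer.to_string(integer) <> "\n" end)
  end

  # Parses integer-file text back into a list of integers
  def from_text(text) do
    text
    |> String.split("\n", trim: true)
    |> Enum.map(&String.to_integer/1)
  end
end
```

IntegerTextSketch.to_text([3, -7, 42]) produces "3\n-7\n42\n": three integers on three lines, which is why the line count equals the integer count.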
In this post, I'm going to walk you through the process of creating the IntGen tool. We'll work on IntSort later on.
Getting Started
First, I created the IntGen project (known to Elixir by the :int_gen atom) by running mix new int_gen. That created a nice project for me. Since Elixir is a functional language, I'm trying to create simpler, more composable functions, so I started work on a function that creates a random integer stream. Anyone who's been reading along knows that I love streams, so I'm going to try to use streams as much as I can (the Node.js implementation used Bacon.js streams).
So I created a module called IntGen, which will contain the main functions for the IntGen application. After I'm finished with this module, I'll put an interface layer (to the command line) on top of it. Back in lwm 43, where I talked about streams, I had already implemented a stream that generates random integers, so I'll just use that code here as well.
Here's the initial code for the IntGen module.
defmodule IntGen do
  @moduledoc """
  Contains the functionality for creating a file of random integers
  """
  @doc """
  Creates a stream that generates an endless number of random integers
  within integer_range (inclusive)
  ## Parameters
  - integer_range: the range of integers to be generated
  ## Returns
  A stream that will generate an endless number of random integers
  """
  @spec random_integer_stream(Range.t()) :: Enumerable.t()
  def random_integer_stream(integer_range) do
    Stream.repeatedly(fn -> Enum.random(integer_range) end)
  end
end
Most of the contents of this module are actually documentation. The code is quite small. The random_integer_stream/1 function takes a range that defines the range of the random integers to be generated and then returns a stream (created using Stream.repeatedly/1) that will generate those random integers. Note that this is an infinite stream, so it will generate as many random numbers as we need.
Once I finished with that code, I compiled it by running mix on the command line. After it compiled, I loaded the application into iex by running iex -S mix. Then I tested the function manually to see how it works, piping the resulting stream into Enum.take/2 to get a list of the first few randomly-generated integers.
> iex -S mix
Interactive Elixir (1.8.0) - press Ctrl+C to exit (type h() ENTER for help)
iex> IntGen.random_integer_stream(1..100)
#Function<54.117072283/2 in Stream.repeatedly/1>
iex> IntGen.random_integer_stream(1..100) |> Enum.take(10)
[3, 77, 31, 3, 73, 96, 91, 45, 3, 21]
iex> IntGen.random_integer_stream(-8..23) |> Enum.take(10)
[10, 22, 18, 21, 0, 10, 19, -7, -1, 2]
iex> IntGen.random_integer_stream(-10000..10000) |> Enum.take(10)
[6839, 2411, 8535, -1875, 8413, 3844, 2333, -2002, -3911, -2953]
It's important to use Enum.take/2 here because the stream is infinite. The first time I did this, I mistakenly used Enum.to_list/1 and was puzzled when it never finished, until I realized what I had done.
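If you do want a plain list, one option is to make the stream finite first with Stream.take/2, after which Enum.to_list/1 terminates. The sketch below repeats the random stream definition so it runs on its own; the module name FiniteStreamSketch is hypothetical.

```elixir
defmodule FiniteStreamSketch do
  # Same idea as IntGen.random_integer_stream/1: an infinite, lazy stream
  def random_integer_stream(integer_range) do
    Stream.repeatedly(fn -> Enum.random(integer_range) end)
  end

  # Stream.take/2 turns the infinite stream into a finite one,
  # so Enum.to_list/1 returns instead of running forever
  def first_n(integer_range, n) do
    integer_range
    |> random_integer_stream()
    |> Stream.take(n)
    |> Enum.to_list()
  end
end
```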
Manual tests appeared to be fine, so the next step was to write some unit tests for the function. Here are the tests in "test/int_gen_test.exs".
defmodule IntGenTest do
  use ExUnit.Case
  doctest IntGen
  @num_stream_elements 1000
  describe "random_integer_stream -" do
    test "Generates integers" do
      random_stream = IntGen.random_integer_stream(1..100)
      random_stream
      |> Enum.take(@num_stream_elements)
      |> test_integers()
    end
    test "Testing range with only positive numbers" do
      test_range(1, 100)
    end
    test "Testing range with positive and negative numbers" do
      test_range(-10, 10)
    end
    test "Testing range with negative numbers" do
      test_range(-87, -12)
    end
    test "Testing range that starts with 0" do
      test_range(0, 23)
    end
    test "Testing range that ends with 0" do
      test_range(-145, 0)
    end
    test "Testing range of size 2" do
      test_range(4, 5)
    end
    test "Testing range of size 1" do
      test_range(12, 12)
    end
    test "Testing range 0..0" do
      test_range(0, 0)
    end
    test "Testing descending range" do
      test_range(10, -2)
    end
    test "Testing large range" do
      test_range(-1_000_000_000, 1_000_000_000)
    end
    defp test_integers(enumerable) do
      Enum.each(enumerable, fn element -> assert is_integer(element) end)
    end
    defp test_range(min_value, max_value) do
      expected_range = min_value..max_value
      random_stream = IntGen.random_integer_stream(expected_range)
      # Assert (not just evaluate) that every generated integer is in range
      random_stream
      |> Enum.take(@num_stream_elements)
      |> Enum.each(fn integer -> assert integer in expected_range end)
    end
  end
end
I grouped all the tests for this function using describe. I tested that the stream only generates integers (and not some other type of data) and then tried testing with different ranges of values to verify that the stream still functions as expected. While writing these tests, I saw that they would mostly be the same, with the only variation being the range of integers being generated. So I created a function that contains the logic for all the tests, with the test variables passed in as parameters. All I had to do for each test was call test_range/2, passing the minimum and maximum values of the range I wanted to test. I typically structure my unit tests this way when I see a chance for reusability. It's a lot more maintainable and readable than copying and pasting code between tests.
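A variation on the same idea, which the post doesn't use, is to generate the tests from a table of ranges with a for comprehension at compile time, so each range still gets its own named test. This is a sketch: RangeGenSketch stands in for IntGen so the file runs on its own, the table rows and test names are made up, and the ExUnit.start(autorun: false) line is only there so it can run as a standalone script (in a Mix project, test_helper.exs starts ExUnit).

```elixir
ExUnit.start(autorun: false)

defmodule RangeGenSketch do
  # Stand-in for IntGen.random_integer_stream/1 so this file runs on its own
  def random_integer_stream(integer_range) do
    Stream.repeatedly(fn -> Enum.random(integer_range) end)
  end
end

defmodule RangeTableSketchTest do
  use ExUnit.Case, async: true

  # Hypothetical table of {description, range} pairs; one test per row
  @ranges [
    {"only positive numbers", 1..100},
    {"positive and negative numbers", -10..10},
    {"a range of size 1", 12..12}
  ]

  for {description, range} <- @ranges do
    # The attribute is read when each test is defined, so every test
    # captures the range from its own iteration
    @range range

    test "random_integer_stream with #{description}" do
      RangeGenSketch.random_integer_stream(@range)
      |> Enum.take(1000)
      |> Enum.each(fn integer -> assert integer in @range end)
    end
  end
end
```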
As I was writing the tests, I wanted to see each test start off failing and then go from failing to passing as I implemented it. In the past when I failed to do this, I occasionally found that I had forgotten to implement some tests, since they were all passing due to not containing any code. I call these sorts of tests false positives, so I like to insert something that deliberately causes them to fail until they are implemented.
In Elixir, I used the flunk function to deliberately fail a test.
test "Testing large range" do
  flunk "This test has not yet been implemented"
end
When I finished the unit tests, they were all passing, so I had a working random integer stream. Now I needed some functionality to write the random integers in that stream to an integer file.
ElixirLargeSort.Shared
In past implementations of this project, I found that there was some common integer file functionality that was shared between IntGen and IntSort, and I had no reason to expect this implementation to be any different. So I had to figure out how to share code between Elixir projects without having to publish a package to Hex.
Here's what I ended up doing. I created a separate project called ElixirLargeSort.Shared (located in the largesort_shared directory) that contains any functionality that will be shared between IntGen and IntSort. This is where I will put the generic integer file functionality. For the IntGen project, I wanted a function that creates a file stream for an integer file, and then, keeping in the spirit of functional composition, I wanted a separate function that takes an enumerable containing integers (which could be another stream) and writes it to a stream. The plan was to create the random integer stream and an integer file stream, and then write the integers in the random integer stream to the file stream.
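Here is a rough sketch of that plan end to end, writing to a real temporary file. PipelineSketch and the file name used below are hypothetical; the functions mirror the ones described here but are simplified, and File.stream!/1 is called with its defaults (line mode, no extra modes) rather than the [:utf8] mode the project uses.

```elixir
defmodule PipelineSketch do
  # Infinite stream of random integers within `range` (inclusive)
  def random_integer_stream(range) do
    Stream.repeatedly(fn -> Enum.random(range) end)
  end

  # Lazily turns integers into lines and directs them into a collectable,
  # such as a file stream; nothing is written until the stream is run
  def write_integers_to_stream(integers, out_stream) do
    integers
    |> Stream.map(fn integer -> Integer.to_string(integer) <> "\n" end)
    |> Stream.into(out_stream)
  end

  # Composes the pieces: random stream -> first `num` integers -> file
  def create_integer_file(path, num, range) do
    file_stream = File.stream!(path)

    range
    |> random_integer_stream()
    |> Stream.take(num)
    |> write_integers_to_stream(file_stream)
    |> Stream.run()
  end
end
```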
I was able to link the two projects by adding LargeSort.Shared as a dependency in the IntGen mix.exs file and using the path: option to specify the directory of the dependency.
defp deps do
  [
    {:largesort_shared, path: "../largesort_shared"},
  ]
end
I also kept in mind that when I was doing unit testing, I would want to mock out dependencies that produce side effects in order to make testing simpler. That requires decoupling the interface from the implementation and substituting a test implementation during unit tests. In C#, I did that using interfaces and implementations, and in Elixir, I do that using behaviours and behaviour implementations. If you want to review those subjects, I talked about unit tests and mocking in lwm 65 and I talked about behaviours in lwm 71.
So here's the behaviour in "lib/integer_file_behavior.ex", which I named LargeSort.Shared.IntegerFileBehavior.
defmodule LargeSort.Shared.IntegerFileBehavior do
  @callback create_integer_file_stream(String.t()) :: Enumerable.t()
  @callback write_integers_to_stream(Enumerable.t(), Collectable.t()) :: Enumerable.t()
end
That defines the integer file functionality using typespecs. Ideally, I would specify a stream of integers in the typespec, but typespecs can't currently get that specific. I can define a list of integers in a typespec, but streams are enumerables, not lists. So the best I could do was to specify an enumerable (Enumerable.t()) for streams that will be read from and a collectable (Collectable.t()) for streams that will be written to. File streams and IO streams (the File.Stream and IO.Stream structs) are both enumerables and collectables, so I can use either typespec depending on the primary role of the stream. Not every stream is collectable, though: the lazy streams built with the Stream module's functions only implement Enumerable.
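A quick way to check which of these protocols a value implements is the impl_for/1 function that every protocol defines. This sketch uses a StringIO device so no file is needed; the module ProtocolCheckSketch and its functions are hypothetical. It shows that an IO stream can be read from and written to, while a lazy Stream.map/2 result is enumerable but not collectable.

```elixir
defmodule ProtocolCheckSketch do
  # Reports whether a value implements Enumerable and/or Collectable
  def protocols(value) do
    %{
      enumerable: Enumerable.impl_for(value) != nil,
      collectable: Collectable.impl_for(value) != nil
    }
  end

  # Reads every line from an IO stream (Enumerable), then writes new lines
  # into the same stream (Collectable). Returns {lines_read, output_written}.
  def read_then_write(input, lines_to_write) do
    {:ok, device} = StringIO.open(input)
    io_stream = IO.stream(device, :line)
    lines_read = Enum.to_list(io_stream)
    Enum.into(lines_to_write, io_stream)
    {:ok, {_unread_input, written}} = StringIO.close(device)
    {lines_read, written}
  end
end
```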
Here's the IntegerFile module in "lib/integer_file.ex", which implements IntegerFileBehavior.
defmodule LargeSort.Shared.IntegerFile do
  alias LargeSort.Shared.IntegerFileBehavior
  @behaviour IntegerFileBehavior
  @moduledoc """
  Contains functionality for working with integer files
  """
  @doc """
  Creates a stream for an integer file that operates in line mode
  Any existing file will be overwritten when the stream is written to.
  If something goes wrong when the stream is used (for example, the file
  can't be opened), an exception will be raised.
  ## Parameters
  - path: the path of the file to be written to
  ## Returns
  A stream that can be used to read from or write to the file
  """
  @impl IntegerFileBehavior
  @spec create_integer_file_stream(String.t()) :: Enumerable.t()
  def create_integer_file_stream(path) do
    File.stream!(path, [:utf8], :line)
  end
  @doc """
  Writes an enumerable containing integers to a stream
  ## Parameters
  - enumerable: the enumerable whose integers are to be written to the file
  - out_stream: the stream to be written to. Actually, this doesn't necessarily
    have to be a stream. Any collectable will do.
  ## Returns
  A stream consisting of the write operations
  """
  @impl IntegerFileBehavior
  @spec write_integers_to_stream(Enumerable.t(), Collectable.t()) :: Enumerable.t()
  def write_integers_to_stream(enumerable, out_stream) do
    enumerable
    |> Stream.map(&Integer.to_string/1)
    |> Stream.map(&(&1 <> "\n"))
    |> Stream.into(out_stream)
  end
end
The alias statement at the top of the module allows me to just write IntegerFileBehavior instead of having to include the full namespace. The @behaviour IntegerFileBehavior statement declares that the module implements IntegerFileBehavior, so the compiler can warn me if one of the callbacks is missing.
Again, most of the content of the module is documentation. The code is relatively simple and brief, which is my goal. Notice that when writing integers to a stream, which will mostly be an integer file stream, I need to convert each integer to a string and then add a newline character to the end of each integer string before I write it to the stream. That's necessary to write to a text file in the proper format.
I tested both of these functions using the unit tests in "test/integer_file_test.exs".
defmodule LargeSortShared.Test.IntegerFile do
  use ExUnit.Case
  doctest LargeSort.Shared.IntegerFile
  alias LargeSort.Shared.IntegerFile
  @test_integer_file_name "test_integer_file.txt"
  # Tests create_integer_file_stream
  describe "create_integer_file_stream -" do
    setup do
      on_exit(&delete_test_file/0)
    end
    test "Create integer file stream and write to it" do
      # Create the file stream and write some test data to the file stream
      test_data = 1..10
      file_stream = IntegerFile.create_integer_file_stream(@test_integer_file_name)
      file_stream
      |> write_data_to_stream(test_data)
      |> Stream.run()
      # Verify that the stream was created correctly by verifying the data that was
      # written to the file stream
      verify_integer_file(@test_integer_file_name, test_data)
    end
    test "Create integer file stream and read from it" do
      # Create the file stream and write some test data to the file stream
      test_data = 1..10
      file_stream = IntegerFile.create_integer_file_stream(@test_integer_file_name)
      file_stream
      |> write_data_to_stream(test_data)
      |> Stream.run()
      # Create a new file stream, which we will use to read from the file
      file_stream = IntegerFile.create_integer_file_stream(@test_integer_file_name)
      # Verify that the stream can read from the file correctly
      verify_integer_stream(file_stream, test_data)
    end
  end
  describe "write_integers_to_stream -" do
    setup do
      on_exit(&delete_test_file/0)
    end
    test "Write a small number of integers to a stream" do
      test_write_integers_to_stream(-100..100)
    end
    test "Write a large number of integers to a stream" do
      test_write_integers_to_stream(1..100_000)
    end
    test "Write a single integer to a stream" do
      test_write_integers_to_stream([0])
    end
    test "Write an empty enumerable to a stream" do
      test_write_integers_to_stream([])
    end
    defp test_write_integers_to_stream(test_data) do
      # Create the file stream
      file_stream = IntegerFile.create_integer_file_stream(@test_integer_file_name)
      # Write the test data to the file stream
      test_data
      |> IntegerFile.write_integers_to_stream(file_stream)
      |> Stream.run()
      # Verify that the data was written to the file stream correctly
      verify_integer_file(@test_integer_file_name, test_data)
    end
  end
  # Deletes the test file
  defp delete_test_file() do
    File.rm!(@test_integer_file_name)
  end
  # Writes an enumerable containing integers to a stream
  defp write_data_to_stream(stream, data) do
    data
    |> Stream.map(&Integer.to_string/1)
    |> Stream.map(&(&1 <> "\n"))
    |> Stream.into(stream)
  end
  # Verifies that an integer file contains the expected contents
  defp verify_integer_file(path, expected_integers) do
    File.stream!(path, [:utf8], :line)
    |> verify_integer_stream(expected_integers)
  end
  # Verifies that a stream contains the expected contents
  defp verify_integer_stream(stream, expected_integers) do
    stream
    |> Stream.map(&String.trim/1)
    |> Stream.map(&String.to_integer/1)
    |> Stream.zip(expected_integers)
    |> Stream.each(&compare_integers/1)
    |> Stream.run()
  end
  # Compares the integers in a tuple to each other
  defp compare_integers({integer1, integer2}) do
    assert integer1 == integer2
  end
end
The typical test involves creating and writing to an integer file and then reading it back to verify that the function being tested worked correctly. The setup macro runs some code before each test in its describe block. In setup, I call the on_exit/2 function, which registers a function to be called when each test is finished. I configured on_exit to delete the test file at the end of every test.
Creating the Integer File
Now that I had the generic integer file functionality implemented, I could create a function in the IntGen module that combines those functions to write integers to an integer file. So I created a function that takes in an integer stream and writes the first N integers to an integer file.
Implementing the Function
I was aware that this function would need to call the integer file functions in the LargeSort.Shared project, which would give me two options for testing. The first option would be to just test the entire thing, including the side effects of writing to the file. That option was less than ideal, as it greatly expands the complexity of testing. The second option was to start implementing the dependency injection and mocking that I discussed in lwm 67, which takes more work to set up, but reduces test complexity. I went with the second option. Not only is that the better way to unit test something with an interface to an outside system (the file system in this case), but I definitely need to practice what I preached. I needed some practical mocking experience with Elixir.
So the idea is that instead of calling the IntegerFile module directly, I call the same functions on the @integer_file placeholder. As I explained in lwm 67, this is a module attribute that is assigned a module name that differs depending on the environment. In a Dev or Prod scenario, @integer_file is assigned the IntegerFile module, so the function calls go to that module. In a Test scenario, we can assign @integer_file to a mocked module that will perform whatever actions are needed for the test.
So here's what the @integer_file declaration looks like in the IntGen module.
@integer_file Application.get_env(:int_gen, :integer_file)
When the module is compiled (this happens at compile time, not at runtime), the value is read from the configuration in config.exs, which depends on the environment the application is being built for. This allows us to specify IntegerFile when the application is built for normal usage and a mock module when it's built for testing.
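The compile-time part matters: a module attribute captures the configuration value once, when the module is compiled, and later changes to the application environment don't affect it. This standalone sketch uses made-up names (:int_gen_sketch, ConfiguredSketch, FakeIntegerFile, OtherIntegerFile). Elixir 1.10 and later also provide Application.compile_env/3, which is intended for exactly this kind of compile-time read, and newer versions may warn when Application.get_env/3 is used in a module body.

```elixir
# Hypothetical app name and module names, used only for this sketch
Application.put_env(:int_gen_sketch, :integer_file, FakeIntegerFile)

defmodule ConfiguredSketch do
  # Evaluated once, while the module is being compiled
  @integer_file Application.get_env(:int_gen_sketch, :integer_file)

  def integer_file, do: @integer_file
end

# Changing the environment afterwards does not change the compiled attribute
Application.put_env(:int_gen_sketch, :integer_file, OtherIntegerFile)
```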
Now here's the actual function implementation.
@doc """
Creates an integer file that contains an integer on each line
## Parameters
- path: the path of the file to be created. If the file already exists,
  it will be overwritten
- num: the number of integers to be written to the file. If there aren't
  enough integers in the random integer stream or enumerable to fulfill
  this number, then only the max possible number of integers are written.
- integers: A stream or enumerable containing the integers to be written
"""
@spec create_integer_file(String.t(), non_neg_integer(), Enumerable.t()) :: :ok
def create_integer_file(path, num, integers) do
  # Create the integer file stream
  file_stream = @integer_file.create_integer_file_stream(path)
  # Pipe N integers to the file stream
  integers
  |> Stream.take(num)
  |> @integer_file.write_integers_to_stream(file_stream)
  |> Stream.run()
end
In "config/config.exs", I have a statement that imports an environment-specific configuration file. This is where I configure which module is used.
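For reference, the environment-specific import in a config/config.exs from the Mix 1.8 era typically looks like the lines below (mix new generated that import_config line, commented out, at the bottom of the file). This is a config fragment, not something to run on its own.

```elixir
# config/config.exs
use Mix.Config

# Loads config/dev.exs, config/test.exs, or config/prod.exs depending on the
# environment the project is built for (dev by default, test under mix test)
import_config "#{Mix.env()}.exs"
```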
Here are the contents of "config/dev.exs", which contains the configuration for a Dev environment build.
use Mix.Config
config :int_gen, :integer_file, LargeSort.Shared.IntegerFile
This tells Elixir that the :integer_file configuration setting is set to LargeSort.Shared.IntegerFile in a Dev build.
The "config/prod.exs" configuration file contains the same thing for production builds.
Here are the contents of "config/test.exs", which is used for Test environment builds.
use Mix.Config
config :int_gen, :integer_file, IntGen.IntegerFileMock
Instead of :integer_file being assigned the IntegerFile module, it's assigned a mock module called IntegerFileMock, which will contain the mock implementation for use in unit tests.
The application is built for the Dev environment by running mix, and for the Test environment by running mix test. I don't yet know what triggers a Prod environment build, but I expect I'll find that out eventually.
Another dependency injection option would have been to pass in the module name as a parameter, with the default value being IntegerFile. This would allow us to pass in a mock implementation during testing. That would look like this.
def create_integer_file(path, num, integers, integer_file \\ IntegerFile) do
  # Create the integer file stream
  file_stream = integer_file.create_integer_file_stream(path)
  # Pipe N integers to the file stream
  integers
  |> Stream.take(num)
  |> integer_file.write_integers_to_stream(file_stream)
  |> Stream.run()
end
I prefer the configuration-based approach because it reduces the number of parameters in the function and I don't have to think about it most of the time.
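To show the parameter-based alternative working without Mox, here is a self-contained sketch with a hand-written test double. IntGenSketch, InMemoryIntegerFile, and the use of the process dictionary to remember the StringIO device are all assumptions made for this example. The real default module (IntegerFile) is left out so the file runs on its own, which means the module argument is required here.

```elixir
defmodule IntGenSketch do
  # Same body as the parameter-based create_integer_file/4 above, except that
  # the module argument has no default here
  def create_integer_file(path, num, integers, integer_file) do
    file_stream = integer_file.create_integer_file_stream(path)

    integers
    |> Stream.take(num)
    |> integer_file.write_integers_to_stream(file_stream)
    |> Stream.run()
  end
end

defmodule InMemoryIntegerFile do
  # Hypothetical test double: ignores the path and returns a stream over an
  # in-memory StringIO device, remembered in the process dictionary
  def create_integer_file_stream(_path) do
    {:ok, device} = StringIO.open("")
    Process.put(:in_memory_integer_file, device)
    IO.stream(device, :line)
  end

  def write_integers_to_stream(enumerable, out_stream) do
    enumerable
    |> Stream.map(fn integer -> Integer.to_string(integer) <> "\n" end)
    |> Stream.into(out_stream)
  end

  # Closes the device and returns everything that was "written to the file"
  def written_contents do
    {:ok, {_input, output}} = StringIO.close(Process.get(:in_memory_integer_file))
    output
  end
end
```

Because the module is just a value, the caller picks the implementation at the call site instead of through configuration.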
Unit Testing the Function
Now for unit testing the function. I'm using the mox library to mock the IntegerFile module. In order to use mox, I had to add it as a test-only dependency in mix.exs in the IntGen project.
defp deps do
  [
    {:largesort_shared, path: "../largesort_shared"},
    {:mox, "~> 0.5.1", only: [:test]}
  ]
end
I used the only: option to indicate that this dependency is only used when testing.
Now that I can use the mox library to mock the IntegerFile module, I'll add some code to "test/test_helper.exs" to define the mock module before running the tests.
Mox.defmock(IntGen.IntegerFileMock, for: LargeSort.Shared.IntegerFileBehavior)
ExUnit.start()
The call to defmock tells mox to create a mock module called IntGen.IntegerFileMock that implements the LargeSort.Shared.IntegerFileBehavior behaviour. This will cause a module to be generated with some default function implementations. The documentation does not state what these default implementations do, so I dug into the Mox code. I discovered that by default, if you don't do anything else, the functions in the mock module will raise an error. You have to provide some code for those mock functions if you want them to be callable.
So now I have a mock module that doesn't do anything. When I write the unit tests, I'll provide an implementation for some of these mock functions depending on what the test is doing.
Here is the code for the unit tests for create_integer_file.
# expect/3 comes from Mox, so the enclosing test module (IntGenTest)
# needs `import Mox` for these tests to compile
describe "create_integer_file -" do
  @test_file "test_integer_file.txt"
  @small_num_integers 100
  @large_num_integers 10000
  test "Create a file with a small number of random integers" do
    integer_range = -10..10
    random_stream = IntGen.random_integer_stream(integer_range)
    test_integer_file_with_random_stream(
      @test_file,
      @small_num_integers,
      random_stream,
      integer_range
    )
  end
  test "Create a file with a large number of random integers" do
    integer_range = -1000..1000
    random_stream = IntGen.random_integer_stream(integer_range)
    test_integer_file_with_random_stream(
      @test_file,
      @large_num_integers,
      random_stream,
      integer_range
    )
  end
  test "Create a file with positive integers" do
    integers = [3, 12, 4, 2, 32, 128, 12, 8]
    test_integer_file_with_specific_integers(integers, length(integers))
  end
  test "Create a file with negative integers" do
    integers = [-13, -1, -4, -23, -83, -3, -43, -8]
    test_integer_file_with_specific_integers(integers, length(integers))
  end
  test "Create a file with positive and negative integers" do
    integers = [332, -1, 4, 18, -23, 1345, 0, -83, -3, -43, 19, -8, 2]
    test_integer_file_with_specific_integers(integers, length(integers))
  end
  test "Create a file using a subset of a list of integers" do
    integers = [332, -1, 4, 18, -23, 1345, 0, -83, -3, -43, 19, -8, 2]
    test_integer_file_with_specific_integers(integers, 6)
  end
  test "Create a file with a single integer" do
    integers = [5]
    test_integer_file_with_specific_integers(integers, length(integers))
  end
  test "Create a file with zero random integers" do
    integers = []
    test_integer_file_with_specific_integers(integers, length(integers))
  end
  # Tests creating an integer file with a specific list of integers
  @spec test_integer_file_with_specific_integers(list(integer()), integer()) :: :ok
  defp test_integer_file_with_specific_integers(integers, count) do
    result = test_integer_file(@test_file, count, integers, &verify_written_integers/2)
    assert result == :ok
    :ok
  end
  # Tests creating an integer file with a random stream
  @spec test_integer_file_with_random_stream(
          String.t(),
          integer(),
          Enumerable.t(),
          Range.t()
        ) :: :ok
  defp test_integer_file_with_random_stream(
         path,
         num_of_integers,
         random_stream,
         integer_range
       ) do
    verify_integers = fn _, written_data ->
      verify_written_integers_range(num_of_integers, integer_range, written_data)
    end
    result = test_integer_file(path, num_of_integers, random_stream, verify_integers)
    assert result == :ok
    :ok
  end
  # Tests creating an integer file
  defp test_integer_file(path, num_of_integers, integers, verify) do
    # Create the test stream
    {test_device, test_stream} = create_test_stream()
    # Set up the IntegerFile mock
    IntGen.IntegerFileMock
    |> expect(
      :create_integer_file_stream,
      fn actual_path ->
        verify_create_file_stream(path, actual_path, test_stream)
      end
    )
    |> expect(
      :write_integers_to_stream,
      fn enumerable, out_stream ->
        verify_write_integers_to_stream(enumerable, test_stream, out_stream)
      end
    )
    # Call the test method and verify the results
    result = IntGen.create_integer_file(path, num_of_integers, integers)
    assert result == :ok
    # Close the test stream and get the data that was written to it
    {:ok, {_, written_data}} = StringIO.close(test_device)
    # Call the verification method
    verify.(Enum.take(integers, num_of_integers), written_data)
  end
  # Verifies the create file stream parameters and returns the test stream
  defp verify_create_file_stream(expected_path, actual_path, test_stream) do
    assert expected_path == actual_path
    test_stream
  end
  # Verifies the write_integers_to_stream parameters and writes space-separated
  # integers to the output stream. Since the integers may be part of a stream
  # (particularly a random integer stream), we won't verify the integers
  # now. We'll just write them to the test stream.
  defp verify_write_integers_to_stream(
         integers,
         expected_stream,
         actual_stream
       ) do
    assert expected_stream == actual_stream
    # Separate each integer with a space
    integers
    |> Stream.map(fn integer -> Integer.to_string(integer) <> " " end)
    |> Stream.into(actual_stream)
  end
  # Verifies that the integers were written correctly
  defp verify_written_integers(integers, written_data) do
    written_integers = stream_data_to_integers(written_data)
    # Verify that the exact integer order matches
    written_integers
    |> Enum.zip(integers)
    |> Enum.each(fn {integer1, integer2} -> assert integer1 == integer2 end)
    # Assert that the number of integers is as expected
    assert Enum.count(integers) == Enum.count(written_integers)
    :ok
  end
  # Verifies that the correct number of integers within a certain range
  # were written
  defp verify_written_integers_range(expected_count, integer_range, written_data) do
    written_integers = stream_data_to_integers(written_data)
    # Assert that the number of integers is as expected
    assert Enum.count(written_integers) == expected_count
    # Assert that each integer is in the expected range
    Enum.each(written_integers, fn integer -> assert integer in integer_range end)
  end
  # Converts integer stream data to an enumerable containing integers
  defp stream_data_to_integers(data) do
    data
    |> String.trim()
    |> String.split(" ")
    |> Stream.reject(fn line -> line == "" end)
    |> Enum.map(&String.to_integer/1)
  end
  # Creates a test stream that reads from and writes to a string I/O device.
  # Returns both the device and the stream that wraps it so that the contents
  # of the device can be read later on
  @spec create_test_stream() :: {pid(), Enumerable.t()}
  defp create_test_stream() do
    # Open a string I/O device
    {:ok, device} = StringIO.open("")
    # Turn the string I/O device into a text line stream
    {device, IO.stream(device, :line)}
  end
end
This is a lot of code, so I'm going to go through it and describe what I'm doing. Like the previous unit tests, I created some test functions, with each test passing in different parameters. The test_integer_file_with_random_stream/4 function tests creating an integer file with a stream of random numbers, and the test_integer_file_with_specific_integers/2 function tests creating an integer file with a specific list of integers. The reason for doing both is that I wanted to test writing a file with random integers (since that is what this project will be doing), but the test for random integers is weaker than the test for a specific list of integers.
That's because the random integer stream doesn't hold a fixed sequence of numbers: every time it's enumerated, it generates new random numbers. Reading ahead would give us different numbers than the ones written to the file. So for the random number test, I just verify that the correct number of integers were written and that they were all within the expected range. For the specific integer test, I can verify that the exact expected sequence of numbers was written, which is a much more specific and thorough test. Testing both improves my confidence that the function being tested works as expected.
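One way around this, which the post doesn't use, is to seed the random number generator. Enum.random/1 draws from Erlang's :rand module using the state of the calling process, so re-seeding with the same seed replays the same sequence. The sketch below assumes the :exrop algorithm (available in OTP 20 and later); the module name SeededStreamSketch and the seed are arbitrary.

```elixir
defmodule SeededStreamSketch do
  # Seeds :rand for the current process, then takes `n` integers from an
  # infinite random stream. The same seed yields the same sequence.
  def take_seeded(seed, integer_range, n) do
    :rand.seed(:exrop, seed)

    fn -> Enum.random(integer_range) end
    |> Stream.repeatedly()
    |> Enum.take(n)
  end
end
```

In a test, this would let you compute the expected integers in advance, but it relies on the function under test drawing its random numbers in the same process, right after the same seeding.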
The core test function, test_integer_file/4, is called by both types of tests. This is where the core of the test logic resides.
# Tests creating an integer file
defp test_integer_file(path, num_of_integers, integers, verify) do
  # Create the test stream
  {test_device, test_stream} = create_test_stream()
  # Set up the IntegerFile mock
  IntGen.IntegerFileMock
  |> expect(
    :create_integer_file_stream,
    fn actual_path ->
      verify_create_file_stream(path, actual_path, test_stream)
    end
  )
  |> expect(
    :write_integers_to_stream,
    fn enumerable, out_stream ->
      verify_write_integers_to_stream(enumerable, test_stream, out_stream)
    end
  )
  # Call the test method and verify the results
  result = IntGen.create_integer_file(path, num_of_integers, integers)
  assert result == :ok
  # Close the test stream and get the data that was written to it
  {:ok, {_, written_data}} = StringIO.close(test_device)
  # Call the verification method
  verify.(Enum.take(integers, num_of_integers), written_data)
end
First a test stream is created. Since we are mocking the file interface, the integers won't really be written to a file. Instead, all the integers will be written to this test stream, which pretends to be a file stream. That way I can examine what was written to the file without having an actual dependency on a file system.
Then we mock the functions in the IntegerFileMock module. Both file functions are called by create_integer_file, so I mock both. With mox, we mock a function using expect/4, whose parameters are the mock module, the name of the function, the number of expected calls, and the code to run. In my tests, the mock module is piped in as the first argument. The function name is given as an atom, since function names in Elixir are atoms. The number of calls is optional and defaults to 1; since I expected these mocked functions to be called only once, I left it out. If a mocked function is called more times than expected, Mox raises an error right away; calling it fewer times is only reported when the expectations are verified (for example, with Mox.verify_on_exit!/1). The final parameter is a function that provides the code for the mocked function. When the mocked function is called during the unit test, it is this code that is run.
Here's a diagram showing how the function calls happen during normal code execution and during unit tests. Keep in mind that I am testing create_integer_file and not the file system functionality, so I mock the file system functionality to reduce testing complexity as much as possible.
(Diagrams: "Normal Code Execution" and "Unit Test Execution")
In the functions I mocked, I verify that the parameters being passed to the mocked function are correct and I return a response. The mocked create_integer_file_stream function just returns the test stream, but the mocked write_integers_to_stream function was a bit more involved. In this situation, I needed to be able to analyze the final contents of the data that was written to the stream. This ideally involves an in-memory collection that I can access later. Due to the nature of immutable data in Elixir, I had great difficulty doing this until I stumbled across the StringIO module in Elixir.
The functions in StringIO can create an in-memory device that deals with string data. It's well-suited for use in mocked functions in unit tests: you can give it data to be read and you can write data to it as well, which makes it a nice substitute for a real device. The IO.stream/2 and IO.binstream/2 functions allow us to convert the StringIO device to a stream. When we close the StringIO device, we get back any input that hasn't been read yet along with all the output that was written to it, which we can then examine to verify that the function we're testing is working correctly. I didn't know about StringIO when I first started writing unit tests, and I had to stumble around to figure out how to record the integers being written to the file. In a language with mutable data, I would have just added each integer to an outside collection, but since Elixir has immutable data, that is more difficult to pull off. In the end, StringIO turned out to be the best solution to this problem.
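Here is a small sketch of what StringIO.close/1 hands back: a tuple containing the input that was never read and the output that was written. The module StringIOSketch and its function are hypothetical.

```elixir
defmodule StringIOSketch do
  # Opens a device with some input, reads one line, writes some output,
  # and returns the line read plus what StringIO.close/1 reports
  def read_one_line_and_write(input, output) do
    {:ok, device} = StringIO.open(input)
    first_line = IO.read(device, :line)
    IO.write(device, output)
    {:ok, {unread_input, written_output}} = StringIO.close(device)
    {first_line, unread_input, written_output}
  end
end
```

For example, with input "1\n2\n3\n" and output "4\n", the first line read is "1\n", the unread input is "2\n3\n", and the written output is "4\n".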
In my unit tests, I use a stream that was created from a StringIO device with the IO.stream/2 function.
Once the test was finished, I read the contents of the StringIO device and verified that the correct data had been written to it.
Formatting
I discovered that running "mix format" on a project will run the code through a standardized formatter. So I ran "mix format" on the project to standardize the code formatting. It left most of my code as it was, but it made some formatting choices that were different from what I had chosen. I had no objections, so I kept the formatting changes.
Using this tool as part of an automatic commit process would be a good way to standardize code in a project. That's something I'll have to look into later.
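mix format is built on Elixir's code formatter, which is also available directly as Code.format_string!/2. For an automated check, mix format --check-formatted exits with a non-zero status when any file isn't already formatted, which makes it a natural fit for a commit hook or CI step. The sketch below runs the formatter on a badly spaced one-line function; the FormatSketch module is hypothetical.

```elixir
defmodule FormatSketch do
  # Runs Elixir's formatter (the same one mix format uses) on a code string
  # and returns the formatted code as a binary
  def format(code) do
    code
    |> Code.format_string!()
    |> IO.iodata_to_binary()
  end
end
```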
Giving IntGen a Command Line Interface
I now have IntGen working nicely and its unit tests are passing. There's still one more thing to do, and that's connecting the IntGen functionality to a command line interface layer. That layer will translate command line arguments into function calls and handle whatever else the command line requires. In the next post, I'll discuss the CLI layer and go over its code.