Learn With Me: Elixir - ElixirLargeSort IntGen Project Part 2 (#77)

Today I'm going to continue developing the IntGen project, which generates a file of random integers. Previously, I had a complete set of tested, working code, but no command line interface. Now I need to create one so that IntGen can actually be run from a terminal.

The code discussed here can be found in my ElixirLargeSort Github project.

The Command Line Options

Most of the command line logic involves parsing arguments, so I focused on that first. To start off with, I created a struct to hold the options. The options struct is called IntGen.CLI.Options, and it is located in lib/cli_options.ex. Here's the source code for the struct.

defmodule IntGen.CLI.Options do
  @moduledoc """
  Represents a set of command line options
  """
  defstruct lower_bound: 0,
            upper_bound: 0,
            count: 0,
            output_file: ""

  # Define the struct type definition
  @type t :: %IntGen.CLI.Options{
          lower_bound: integer(),
          upper_bound: integer(),
          count: non_neg_integer(),
          output_file: String.t()
        }

  @spec new(non_neg_integer(), integer(), integer(), String.t()) :: IntGen.CLI.Options.t()
  def new(count, lower_bound, upper_bound, output_file) do
    %IntGen.CLI.Options{
      count: count,
      lower_bound: lower_bound,
      upper_bound: upper_bound,
      output_file: output_file
    }
  end
end

IntGen accepts four options as command line arguments: the number of integers to generate (count), the range of integers to generate (lower-bound and upper-bound), and the file to write the integers to (output-file). The CLI.Options struct contains those options.

In addition, there's also a "--help" argument that shows usage information, but I handle that separately from the options.

Parsing the Options

The module that parses and validates the command line parameters is called IntGen.CLI.Args, and it can be found in lib/cli_args.ex. This module contains more Elixir code than anything I've written so far, and it's more involved than my previous Elixir code. There are a lot of functions in this module, but none of them are large or complex: it's a lot of small, simple functions that are combined to create the parsing and validation logic.

I'm going to start off with the only public function in the module, which forms the main entry point to argument parsing.

@spec parse_args(list(String.t())) :: options_response()
def parse_args(argv) do
  OptionParser.parse(argv,
    switches: [help: :boolean, count: :integer, lower_bound: :integer, upper_bound: :integer],
    aliases: [h: :help, c: :count, l: :lower_bound, u: :upper_bound]
  )
  |> args_to_options()
end

The parameter that this function receives (argv) is simply the list of tokens from the command line. This function uses the OptionParser module that ships with Elixir to transform that list of tokens into a tuple. The OptionParser.parse/2 function receives the raw command line tokens, along with which switches and aliases are allowed, and returns a tuple containing three things:

The parsed arguments: A keyword list containing any arguments with switches, where the switch name is the key and the switch value is the value
The additional arguments: A list of strings, which represent any additional arguments that didn't correspond to switches
The invalid arguments: A list of {switch, value} tuples for switches that were listed in the options but were rejected as invalid because their values had the wrong data type (for example, {"--count", "ten"})

So if I call "./int_gen --count 100 --lower-bound 1 --upper-bound 100 random_integers.txt", the raw arguments will be ["--count", "100", "--lower-bound", "1", "--upper-bound", "100", "random_integers.txt"] and the parsed result is {[count: 100, lower_bound: 1, upper_bound: 100], ["random_integers.txt"], []}. Note that OptionParser converts the dashes in switch names to underscores, so "--lower-bound" becomes the :lower_bound key.
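Here's a small script that runs OptionParser.parse/2 with the same switches and aliases that parse_args/1 uses, so you can see all three parts of the result. The alias and the non-integer count in the second and third calls are example inputs I made up for illustration; they aren't from the project's tests.

```elixir
# Same switch and alias definitions used by IntGen.CLI.Args.parse_args/1
opts = [
  switches: [help: :boolean, count: :integer, lower_bound: :integer, upper_bound: :integer],
  aliases: [h: :help, c: :count, l: :lower_bound, u: :upper_bound]
]

# The full example command line; dashed switch names become underscored keys
full =
  OptionParser.parse(
    ["--count", "100", "--lower-bound", "1", "--upper-bound", "100", "random_integers.txt"],
    opts
  )

IO.inspect(full)
# => {[count: 100, lower_bound: 1, upper_bound: 100], ["random_integers.txt"], []}

# A short alias is expanded to its full switch name
aliased = OptionParser.parse(["-c", "5", "out.txt"], opts)
IO.inspect(aliased)
# => {[count: 5], ["out.txt"], []}

# A value that can't be parsed as an integer lands in the invalid list
invalid = OptionParser.parse(["--count", "ten", "out.txt"], opts)
IO.inspect(invalid)
# => {[], ["out.txt"], [{"--count", "ten"}]}
```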

That definitely saves me some work, but the final result still isn't in the nice form of the Options struct that I would like, and the only thing that has been validated so far is the data type of the arguments. So I pass the results of OptionParser.parse/2 to args_to_options/1.

In the best case, args_to_options/1 returns {:ok, Options.t()}. This is the set of options that can be used for further processing. However, args_to_options/1 will return {:ok, :help} if the help switch was present and it will return {:error, list(String.t())} if there were any validation errors encountered. All of this is visible in the typespecs above each function. I made the typespecs even more readable by defining a bunch of types at the top of the file.

@type parsed_switches() :: keyword()
@type parsed_additional_args() :: list(String.t())
@type parsed_args() :: {parsed_switches(), parsed_additional_args(), list()}
@type error_response() :: {:error, list(String.t())}
@type validation_response() :: :ok | error_response()
@type validation_errors() :: {parsed_switches(), parsed_additional_args(), list()}
@type options_response() :: {:ok, Options.t()} | {:ok, :help} | error_response()

So instead of having a big expression that requires a lot of close inspection and thinking, I can write more meaningful typespecs that are easier to read. Sure, you still have to go look at what an "options_response()" is, but once you do that, the typespec expression becomes more meaningful and easier to read.

There are several args_to_options functions, but we'll first take a look at args_to_options/1, which is always called initially.

# Performs validation on the parsed arguments and converts any valid parsed arguments
# to options
@spec args_to_options(parsed_args()) :: options_response()
defp args_to_options({parsed_args, additional_args, _}) do
  # Validate the arguments
  validation_response = validate_args(parsed_args, additional_args)

  # Handle the validation response and convert to options
  args_to_options(parsed_args, additional_args, validation_response)
end

This function first validates the arguments and then calls args_to_options/3, which in turn determines what to return based on the validation response.

@spec args_to_options(
        parsed_switches(),
        parsed_additional_args(),
        atom() | validation_response()
      ) ::
        options_response()
defp args_to_options(parsed_args, additional_args, :ok) do
  args_to_options(parsed_args, additional_args)
end

defp args_to_options(_, _, validation_response) do
  validation_response
end

As you can see above, args_to_options/3 has two clauses. The first clause is called when the validation response is :ok, meaning that validation was successful; it simply calls args_to_options/2 to convert the arguments into an Options struct. If validation was not successful, the second clause is called, and it returns the validation error response, which contains messages indicating what went wrong.

Here's how args_to_options/2 converts the parsed arguments into a CLI.Options struct.

@spec args_to_options(parsed_switches(), parsed_additional_args()) ::
        {:ok, Options.t()} | {:ok, :help}
defp args_to_options(parsed_args, additional_args) do
  # If the help switch was set, return :help, otherwise convert the arguments
  # to an options struct
  if contains_help_switch(parsed_args) do
    {:ok, :help}
  else
    {:ok,
     Options.new(
       Keyword.get(parsed_args, :count),
       Keyword.get(parsed_args, :lower_bound),
       Keyword.get(parsed_args, :upper_bound),
       hd(additional_args)
     )}
  end
end

If the arguments contain the help switch, then {:ok, :help} is returned. Otherwise, the option values are extracted from the parsed switches, and the output file is taken from the additional arguments. The output file is the only valid additional argument, so the code uses hd/1 to grab the first one; validation has already ensured that an output file was supplied, so hd/1 won't be called on an empty list. Any further arguments are ignored entirely.
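To illustrate why that earlier validation matters, here's a sketch comparing hd/1 with an equivalent pattern match that makes the "first argument wins, the rest are ignored" rule explicit. The OutputFileSketch module is hypothetical and isn't part of IntGen; hd/1 itself raises an ArgumentError when given an empty list.

```elixir
defmodule OutputFileSketch do
  # Hypothetical helper (not in IntGen): takes the output file from the
  # additional arguments. Only the first element is used; the rest are ignored.
  def pick([output_file | _ignored]), do: {:ok, output_file}
  def pick([]), do: {:error, "The output file must be specified"}
end

IO.inspect(OutputFileSketch.pick(["random_integers.txt", "extra.txt"]))
IO.inspect(OutputFileSketch.pick([]))

# hd/1 returns the same first element, but raises ArgumentError on an empty list
first = hd(["random_integers.txt", "extra.txt"])
no_args = Enum.drop(["random_integers.txt"], 1)

raised =
  try do
    hd(no_args)
  rescue
    ArgumentError -> :argument_error
  end

IO.inspect({first, raised})
```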

Validating the Arguments

The only thing we haven't discussed yet in this module is how validation is done. The validation functions make up the majority of the lines of code, and there are a lot of functions involved, but the code in each validation function tends to be fairly small and understandable.

I'm going to start at the topmost validation function, validate_args/2.

@spec validate_args(parsed_switches(), parsed_additional_args()) ::
        :ok | error_response()
defp validate_args(parsed_args, additional_args) do
  if contains_help_switch(parsed_args) do
    :ok
  else
    validate_non_help_args(parsed_args, additional_args)
  end
end

If the help switch is present, the arguments are considered valid, no matter what else is there. That's because the program will ignore everything else and display only usage information.

If the help switch is not present, then validate_non_help_args/2 is called to do the "real" validation.

@spec validate_non_help_args(parsed_switches(), parsed_additional_args()) ::
        validation_response()
# Validates the non-help arguments
defp validate_non_help_args(parsed_args, other_args) do
  {_, _, errors} =
    {parsed_args, other_args, []}
    |> validate_count()
    |> validate_bounds()
    |> validate_output_file()

  if errors == [] do
    :ok
  else
    {:error, errors}
  end
end

This is the core of the validation logic. This function orchestrates smaller validation functions and makes a decision based on the combined validation result.

Writing this particular part of the validation logic took me a while because I had to think functionally. I want to collect all the validation error messages and pass them back. My normal approach to a problem like this is to pass around a collection of messages and add messages to the collection as validation errors occur. If the collection is empty at the end, validation succeeded. Since data is immutable in Elixir, functions can't add messages to a shared collection, so that approach just won't work!

So what I did is pass a tuple to each validation function containing the arguments and a list of errors (which starts out empty). Each validation function passes back a tuple with the same information, but with error messages added if it found a validation error. That tuple is then passed to the next validation function, and the tuple that emerges at the end of the pipeline contains all the resulting validation error messages.
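Here's a stripped-down, runnable sketch of that pattern. PipelineSketch and its validators are hypothetical simplifications of the real functions in IntGen.CLI.Args (they only check that each switch and the output file are present), but the error messages match the ones the application prints.

```elixir
defmodule PipelineSketch do
  # Hypothetical, simplified stand-ins for the validators in IntGen.CLI.Args.
  # Each takes {switches, additional_args, errors} and returns the same shape.
  def validate(switches, additional_args) do
    {_, _, errors} =
      {switches, additional_args, []}
      |> require_switch(:count, "The count has not been specified")
      |> require_switch(:lower_bound, "The lower bound has not been specified")
      |> require_switch(:upper_bound, "The upper bound has not been specified")
      |> require_output_file()

    if errors == [], do: :ok, else: {:error, errors}
  end

  defp require_switch({switches, args, errors}, key, message) do
    if Keyword.has_key?(switches, key) do
      {switches, args, errors}
    else
      # Prepend the new message; the tuple keeps flowing down the pipeline
      {switches, args, [message | errors]}
    end
  end

  defp require_output_file({switches, [], errors}),
    do: {switches, [], ["The output file must be specified" | errors]}

  defp require_output_file(acc), do: acc
end

IO.inspect(PipelineSketch.validate([count: 10, lower_bound: 1, upper_bound: 100], ["out.txt"]))
IO.inspect(PipelineSketch.validate([], []))
```

Because each validator prepends its message, the last check to fail ends up first in the list, which matches the order of the messages when IntGen is run without arguments later in this post.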

Looking back on it, I didn't really need to return the entire tuple, since the argument data was never modified, but returning it did save some repetitive argument passing, since the entire package moves through the pipeline.

At the end, if the errors list passed back from the final function call is empty, the function returns :ok to indicate successful validation. If not, it returns the error messages.
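As I mentioned, the arguments never change as they move through the pipeline. Here's a sketch of an alternative I did not use in IntGen: each check receives the parsed switches and additional arguments and returns a list of zero or more messages, and Enum.flat_map/2 runs every check and concatenates the results. The FlatMapSketch module and its checks are hypothetical.

```elixir
defmodule FlatMapSketch do
  # Hypothetical checks: each returns a (possibly empty) list of error messages
  defp checks do
    [
      fn switches, _args ->
        if Keyword.has_key?(switches, :count), do: [], else: ["The count has not been specified"]
      end,
      fn _switches, args ->
        if args == [], do: ["The output file must be specified"], else: []
      end
    ]
  end

  def validate(switches, args) do
    # Every check runs (no short-circuiting) and all messages are collected
    case Enum.flat_map(checks(), fn check -> check.(switches, args) end) do
      [] -> :ok
      errors -> {:error, errors}
    end
  end
end

IO.inspect(FlatMapSketch.validate([], []))
```

With this design the messages come out in the order the checks are listed rather than in reverse, and the arguments are passed explicitly instead of riding along in a tuple.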

I'm not going to go through every validation function because there are a lot of functions involved, but I will cover a few.

Let's first look at validate_count/1, which validates the count parameter.

@spec validate_count({parsed_switches(), parsed_additional_args(), list(String.t())}) ::
        validation_errors()
defp validate_count({parsed_args, other_args, errors}) do
  # First we check if the count exists, then we check its value
  with :ok <- validate_switch_exists(parsed_args, :count, "count"),
       :ok <- validate_count_value(Keyword.get(parsed_args, :count)) do
    {parsed_args, other_args, errors}
  else
    {:error, messages} ->
      {parsed_args, other_args, messages ++ errors}
  end
end

This function first checks whether the count switch exists. Then, if the switch does exist, the function validates the value of the count argument. This is done using the with expression. Besides being useful for creating temporarily scoped variables, the with expression is also useful for running a series of steps where the later steps are skipped if an earlier step fails.

That's the case when validating the count. If validate_switch_exists/3, a generic function for validating whether switches are present, returns anything other than :ok, validate_count_value/1 is never called. When all the validation functions return :ok, then the expression underneath the do is returned, which is the unmodified tuple that was passed into the function.

When a validation function does not return :ok, the with expression stops, and the value that failed to match is handed to the else block. Like a case expression, the else block contains one or more pattern-matching clauses, and the first clause that matches the failed value determines the result. In the case of validation, that value is an {:error, messages} tuple. Those error messages are concatenated with the list of error messages that was passed into the function, and the entire tuple (with the new error messages) is returned.
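Here's a small sketch of with's short-circuiting and of how the else block pattern matches the failed value. check_exists/2 and check_non_negative/1 are hypothetical simplifications of validate_switch_exists/3 and validate_count_value/1, and the non-negative-count message is my own wording.

```elixir
defmodule WithSketch do
  # Hypothetical simplified validators
  def check_exists(switches, key) do
    if Keyword.has_key?(switches, key),
      do: :ok,
      else: {:error, ["The #{key} has not been specified"]}
  end

  def check_non_negative(value) when is_integer(value) and value >= 0, do: :ok
  def check_non_negative(_), do: {:error, ["The count must be a non-negative integer"]}

  def validate_count({switches, args, errors}) do
    with :ok <- check_exists(switches, :count),
         # Only reached if the previous step returned :ok
         :ok <- check_non_negative(Keyword.get(switches, :count)) do
      {switches, args, errors}
    else
      # The first value that didn't match :ok is matched against these clauses
      {:error, messages} -> {switches, args, messages ++ errors}
    end
  end
end

IO.inspect(WithSketch.validate_count({[], [], []}))
```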

That was a good example of conducting multiple validation steps where a single error stops all further validation, but what if we want to keep validating when an error is encountered? Well, then we'll just use a pipeline mechanism like we saw earlier, where a list of messages is passed from one function to another.

I'm going to go over one more validation function, validate_bounds_values/2, which validates the values of the "--lower-bound" and "--upper-bound" arguments. This function ensures that both values are integers (which OptionParser.parse/2 will probably already have ensured, but I decided to include this check anyway) and that the upper bound is never less than the lower bound.

This function accepts the value of the lower bound and the value of the upper bound. It has three clauses, which make use of pattern matching.

@spec validate_bounds_values(integer(), integer()) :: validation_response()
defp validate_bounds_values(lower_bound, upper_bound)
     when is_integer(lower_bound) and
            is_integer(upper_bound) and lower_bound <= upper_bound do
  :ok
end

# This clause gets called when both bounds are integers but the lower bound is above
# the upper bound
defp validate_bounds_values(lower_bound, upper_bound)
     when is_integer(lower_bound) and
            is_integer(upper_bound) and lower_bound > upper_bound do
  {:error, ["The lower bound cannot exceed the upper bound"]}
end

# This clause gets called when one of the bounds is not an integer
defp validate_bounds_values(lower_bound, upper_bound) do
  {:error, []}
  |> validate_integer(lower_bound, "lower bound")
  |> validate_integer(upper_bound, "upper bound")
end

The first clause is called for the happy path: both values are integers and the upper bound is greater than or equal to the lower bound. The second clause is called when both values are integers but the lower bound is greater than the upper bound. The third clause is called when one of the bounds is not an integer, and it validates each one using a pipeline, because we don't want the short-circuiting behavior associated with the with expression: if both bounds are invalid, we want an error message for each.
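Here's a runnable version of those three clauses. validate_integer/3 isn't shown in this post, so the version below, including its message wording, is an assumption on my part. I've written the second guard with >, since equal bounds are always caught by the first clause.

```elixir
defmodule BoundsSketch do
  # Happy path: both integers, lower <= upper
  def validate_bounds_values(lower, upper)
      when is_integer(lower) and is_integer(upper) and lower <= upper,
      do: :ok

  # Both integers, but in the wrong order
  def validate_bounds_values(lower, upper)
      when is_integer(lower) and is_integer(upper) and lower > upper,
      do: {:error, ["The lower bound cannot exceed the upper bound"]}

  # At least one bound is not an integer: check both, without short-circuiting
  def validate_bounds_values(lower, upper) do
    {:error, []}
    |> validate_integer(lower, "lower bound")
    |> validate_integer(upper, "upper bound")
  end

  # Hypothetical helper: adds a message only when the value is not an integer
  defp validate_integer({:error, messages}, value, _name) when is_integer(value),
    do: {:error, messages}

  defp validate_integer({:error, messages}, _value, name),
    do: {:error, ["The #{name} must be an integer" | messages]}
end

IO.inspect(BoundsSketch.validate_bounds_values(nil, nil))
```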

Testing Argument Validation and Parsing

Note that the CLI.Args module has no side effects: it never prints anything to the screen. Instead, it returns a result that can be displayed on the screen (or sent elsewhere) by code that specializes in side effects. This separation of logic and side effects makes the module easy to test. In retrospect, I could have replaced the error message strings with atoms and had separate code translate those atoms into UI messages. That would have separated the error message details from error detection, allowing different messages to be displayed in different places or under different conditions. However, for a UI this simple, it's not a big deal.
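To make that alternative concrete, here's a minimal sketch of it (this is not what IntGen does). Validation returns atoms, and a separate function, which would live in the UI code, translates them into text. The ErrorAtomsSketch module and its atom names are hypothetical.

```elixir
defmodule ErrorAtomsSketch do
  # Pure validation: returns atoms describing what is wrong, no UI text
  def validate(switches, args) do
    errors =
      [
        {Keyword.has_key?(switches, :count), :missing_count},
        {args != [], :missing_output_file}
      ]
      |> Enum.reject(fn {valid?, _error} -> valid? end)
      |> Enum.map(fn {_valid?, error} -> error end)

    if errors == [], do: :ok, else: {:error, errors}
  end

  # UI layer: one possible translation of error atoms into messages
  def to_message(:missing_count), do: "The count has not been specified"
  def to_message(:missing_output_file), do: "The output file must be specified"
end

{:error, errors} = ErrorAtomsSketch.validate([], [])
Enum.each(errors, &IO.puts(ErrorAtomsSketch.to_message(&1)))
```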

These tests are located in the IntGen.CLI.Args.Test module in "lib/cli_args_test.exs".

First, I created a function that tests the argument parsing when the parsing is expected to be successful. It receives a set of arguments and the expected CLI.Options struct.

# Tests parse_args when success is expected
def test_with_args_success(args, expected_options) do
  {:ok, result} = Args.parse_args(args)

  assert result == expected_options
end

Then I created a function that tests argument parsing when a validation failure is expected. It receives a set of arguments and the number of errors that it expects to receive.

# Test parse_args when an error is expected
def test_with_args_error(args, num_errors) do
  {:error, error_messages} = Args.parse_args(args)

  assert length(error_messages) == num_errors
end

Finally, I created a function that handles the help switch parsing.

# Tests parse_args when a help response is expected
def test_with_args_help(args) do
  {:ok, result} = Args.parse_args(args)

  assert result == :help
end

Once I had those basic functions in place, I used them to create a series of test cases covering different parsing scenarios. Here are some of them.

test "Parsing args with full set of valid arguments" do
  test_args = ["--count", "10", "--lower-bound", "1", "--upper-bound", "100", @test_file]

  expected_options = Options.new(10, 1, 100, @test_file)

  test_with_args_success(test_args, expected_options)
end

test "Parsing args with no arguments" do
  test_args = []

  test_with_args_error(test_args, 4)
end

test "Parsing args with help argument" do
  test_args = ["--help"]

  test_with_args_help(test_args)
end

test "Parsing args with help argument and additional arguments" do
  test_args = [
    "--count",
    "10",
    "--lower-bound",
    "1",
    "--upper-bound",
    "100",
    "--help",
    @test_file
  ]

  test_with_args_help(test_args)
end

test "Parsing args with missing count argument" do
  test_args = ["--lower-bound", "1", "--upper-bound", "100", @test_file]

  test_with_args_error(test_args, 1)
end

test "Parsing args with a negative count" do
  test_args = ["--count", "-10", "--lower-bound", "1", "--upper-bound", "100", @test_file]

  test_with_args_error(test_args, 1)
end

test "Parsing args with a zero count" do
  test_args = ["--count", "0", "--lower-bound", "1", "--upper-bound", "100", @test_file]

  expected_options = Options.new(0, 1, 100, @test_file)

  test_with_args_success(test_args, expected_options)
end

There are more tests than this, but I didn't feel the need to list them all. I tried a lot of different scenarios and edge cases, and along the way I found and fixed some defects. Now I have a suite of tests I can easily run after making changes to the IntGen code.

The CLI Interface

So now the difficult part of the CLI code, parsing and validating the arguments, is complete. However, we still don't have a running application! We still need to write the main entry point and the code that glues the argument parsing and the core functionality together. I did this in the IntGen.CLI module, which is also the only module in this application that prints text to the screen. It's actually quite a simple module.

defmodule IntGen.CLI do
  @moduledoc """
  Handles the command line and options parsing logic
  """

  alias IntGen.CLI.Args

  @type parsed_args() :: {keyword(), list(String.t()), list()}

  # This is the main entry point to the application
  def main(argv) do
    argv
    |> Args.parse_args()
    |> process()
  end

  # Starts the application processing based on the parsing of the arguments
  defp process({:ok, :help}) do
    output_usage_info()

    System.halt(0)
  end

  defp process({:error, messages}) do
    Enum.each(messages, &IO.puts/1)

    IO.puts("")

    output_usage_info()
  end

  defp process({:ok, options}) do
    integer_range = options.lower_bound..options.upper_bound

    # Create the random integer stream
    random_stream = IntGen.random_integer_stream(integer_range)

    # Create the integer file using the random integer stream
    IntGen.create_integer_file(options.output_file, options.count, random_stream)
  end

  # Prints usage information
  defp output_usage_info() do
    IO.puts("""
    usage: int_gen --count <count> --lower-bound <lower bound> --upper-bound <upper bound> <file>

    example: int_gen --count 100 --lower-bound -100 --upper-bound 100 "random_integers.txt"
    """)
  end
end

After parsing the arguments, the code handles the possible results in process/1. Error messages are printed to the screen along with usage information; the help switch prints only the usage information; and if parsing and validation were successful, the code creates a random integer stream and passes it to IntGen.create_integer_file/3, which writes the random integers to a file using the options from the command line.
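Clause order matters in process/1: Elixir tries the clauses from top to bottom, so the {:ok, :help} clause has to come before the general {:ok, options} clause, or :help would be treated as an options struct. The sketch below reproduces that dispatch in a hypothetical DispatchSketch module that returns a description of what would happen instead of printing, halting, or writing a file, so it can be checked without side effects. It matches on a plain map rather than the CLI.Options struct so that it runs on its own.

```elixir
defmodule DispatchSketch do
  # Hypothetical stand-in for IntGen.CLI.process/1 that describes the action
  # instead of performing it. The specific {:ok, :help} clause comes first.
  def process({:ok, :help}), do: :show_usage
  def process({:error, messages}), do: {:show_errors_and_usage, length(messages)}

  def process({:ok, %{count: count, lower_bound: lower, upper_bound: upper}}),
    do: {:generate, count, lower..upper}
end

IO.inspect(DispatchSketch.process({:ok, :help}))
IO.inspect(DispatchSketch.process({:ok, %{count: 10, lower_bound: -100, upper_bound: 100}}))
```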

Application Package

Next, I created an application package (an escript) so that I can easily run IntGen from the command line. If you don't know what this is or how to create one, take a look at LWM 63, where I go through what an application package is and how to create one for a project.

> mix escript.build
==> largesort_shared
Compiling 2 files (.ex)
Generated largesort_shared app
==> int_gen
Generated escript int_gen with MIX_ENV=dev
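For reference, mix escript.build reads its settings from the :escript key in the project configuration in mix.exs, and the escript's entry point is the main/1 function of the configured main module, which here would be IntGen.CLI. The fragment below is a sketch of the relevant keys, not the project's actual mix.exs; the version number is a placeholder and deps/0 is assumed to be defined elsewhere in the file.

```elixir
# Sketch of the relevant part of mix.exs (not the project's actual file)
def project do
  [
    app: :int_gen,
    # Placeholder version number
    version: "0.1.0",
    # escript.build calls IntGen.CLI.main/1 when the escript runs
    escript: [main_module: IntGen.CLI],
    deps: deps()
  ]
end
```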

Running the IntGen Application

Now it's time to actually run this application and see what it can do!

Here's invoking the application without any arguments.

> ./int_gen
The output file must be specified
The upper bound has not been specified
The lower bound has not been specified
The count has not been specified

usage: int_gen --count <count> --lower-bound <lower bound> --upper-bound <upper bound> <file>

example: int_gen --count 100 --lower-bound -100 --upper-bound 100 "random_integers.txt"

Here's invoking it with the help option.

> ./int_gen --help
usage: int_gen --count <count> --lower-bound <lower bound> --upper-bound <upper bound> <file>

example: int_gen --count 100 --lower-bound -100 --upper-bound 100 "random_integers.txt"

Finally, I'm going to generate 10 numbers between -100 and 100.

> ./int_gen --count 10 --lower-bound -100 --upper-bound 100 "random_integers.txt"

> cat random_integers.txt
-32
39
79
37
-19
-28
-93
-76
-4
68

Very nice! It worked great the first time I ran it thanks to the unit tests I wrote.

It looks like I'm done with IntGen now, but I want to give it a bit of polish. In the Node.js and C# versions of this application, I used a progress bar to show the progress of integer generation. That's quite nice when I'm generating very large numbers of integers and I want to see how things are progressing. It's so much nicer than the appearance of doing nothing. In the next post I'm going to enable a progress bar and make code changes to support updating that progress bar.

Kevin Peter
