Learn With Me: Elixir - ElixirLargeSort IntGen Project Part 3 (#78)
Now is the time for a bit of polish for the IntGen project. I'm going to add a progress bar to show the current progress when generating random integers. This is really nice to have when generating large numbers of integers, since it gives the user some feedback on the current progress and gives them an idea of how long it will take to finish.
Integer Generation Events
In order to update a progress bar, I have to know when an integer is generated. In the Node.js and C# solutions, I did this with a callback function that I passed to the integer generation function. The integer generation process would call the callback function whenever an integer was generated.
In Elixir, that would look something like this.
```elixir
update_progress_bar = fn _ -> ... end
IntGen.create_integer_file(options.output_file, options.count, random_stream, update_progress_bar)
```
Then I would have to modify IntGen.create_integer_file/4 in order to call the callback function.
Since this is Elixir, however, there is a much easier, lower-impact way to do this, and it's all because of the way I wrote the code.
Here's what the process/1 code looked like prior to my progress bar updates.
```elixir
defp process({:ok, options}) do
  integer_range = options.lower_bound..options.upper_bound

  # Create the random integer stream
  random_stream = IntGen.random_integer_stream(integer_range)

  # Create the integer file using the random integer stream
  IntGen.create_integer_file(options.output_file, options.count, random_stream)
end
```
I create a random integer stream and compose it with the function that creates the integer file. Because I'm using streams and function composition, it's easy to insert some code that updates the progress bar. All I have to do is use Stream.each/2 on the random integer stream to observe each integer as it's generated and act on it. As long as the stream still emits the random integers at the end, I can do this without affecting any other part of the code. This is totally awesome! Woo hoo, streams!
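To illustrate the idea in isolation, here is a minimal sketch of Stream.each/2 observing a stream without changing it. It assumes IntGen.random_integer_stream/1 behaves roughly like Stream.repeatedly/1 combined with Enum.random/1; the stand-in below is hypothetical, and sending a message to the current process is just a convenient side effect to observe.

```elixir
# Hypothetical stand-in for IntGen.random_integer_stream/1:
# an infinite, lazy stream of random integers in a range
random_stream = Stream.repeatedly(fn -> Enum.random(-10..10) end)

# Observe each integer as it flows by; Stream.each/2 passes it through unchanged
observed_stream =
  Stream.each(random_stream, fn integer ->
    send(self(), {:observed, integer})
  end)

# Nothing has been generated yet because streams are lazy.
# Taking three elements runs the side effect exactly three times.
taken = Enum.take(observed_stream, 3)
IO.inspect(taken, label: "taken")
```

The consumer still gets back plain integers, so downstream code such as the file writer never needs to know the observation happened.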
Here's the same code with progress bar updating.
```elixir
defp process({:ok, options}) do
  integer_range = options.lower_bound..options.upper_bound

  # Create the random integer stream
  random_stream = IntGen.random_integer_stream(integer_range)

  # Intercept the integer generation for updating the progress bar
  random_stream =
    random_stream
    # Add an index onto the number being generated
    |> Stream.with_index()
    # Update the progress bar
    |> Stream.each(fn {_, current_index} ->
      update_progress_bar(current_index + 1, options.count)
    end)
    # Transform the integer-index tuple back to an integer
    |> Stream.map(fn {integer, _} -> integer end)

  # Create the integer file using the random integer stream
  IntGen.create_integer_file(options.output_file, options.count, random_stream)
end
```
Let's go through the changes I made to the random integer stream. My usual approach would be to increment a counter variable to keep track of the number of integers generated, which is what I tried first. That didn't work here: data is immutable in Elixir, and a closure captures the values of outer variables when it is created, so rebinding a variable inside the closure doesn't change the variable in the outer scope the way it would in JavaScript. I probably could have eventually gotten something to work along those lines, but then I realized that I could use Stream.with_index/2 to modify the stream to produce a tuple containing the integer and its index.
So that's what I did here. I took the random integer stream and fed it to Stream.with_index/2 to produce an {integer, index} tuple. Then I fed that stream to Stream.each/2, which calls another function to update the progress bar with the number of integers generated so far and the total number of integers to be generated. The index starts at 0, so I add 1 to it to get the count. This updates the progress bar every time a random integer is emitted from the original random integer stream.
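Here is a quick sketch of both halves of that reasoning: the counter approach that doesn't work, and what Stream.with_index/2 produces instead. The values are illustrative only. The compiler will even warn that the `count` bound inside the anonymous function is unused, which hints that the outer variable is never touched.

```elixir
count = 0

# Rebinding `count` inside the anonymous function creates a new, local binding;
# the outer `count` was captured by value and never changes.
Enum.each(1..3, fn _ -> count = count + 1 end)
IO.inspect(count, label: "outer count")
# prints: outer count: 0

# Stream.with_index/2 pairs every element with its zero-based position instead
indexed = [7, 3, 9] |> Stream.with_index() |> Enum.to_list()
IO.inspect(indexed, label: "indexed")
# prints: indexed: [{7, 0}, {3, 1}, {9, 2}]
```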
In Elixir, Stream.each/2 emits the same data that was fed into it (it's intended for observing data for the purposes of side effects), so it's still emitting tuples. So I fed the stream into Stream.map/2 to extract the random integer from each tuple, and once again we have a stream that emits random integers. That stream is fed to IntGen.create_integer_file/3 and everything continues to work like it did before. I love it! It's the ultimate in orthogonal code.
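The same three-step pattern can be pulled out into a reusable function. This is a minimal sketch that assumes a callback receiving (current, total) is all the progress bar needs; ProgressStream.with_progress/3 is a hypothetical name, not something in the IntGen code, and the callback here records a message instead of drawing anything.

```elixir
defmodule ProgressStream do
  # Hypothetical helper (not part of IntGen): wraps an enumerable so that
  # `on_progress` is called with (count_so_far, total) for every element,
  # while the elements themselves flow through unchanged.
  def with_progress(enumerable, total, on_progress) do
    enumerable
    # Pair each element with its zero-based index
    |> Stream.with_index()
    # Side effect only; Stream.each/2 re-emits the {element, index} tuple
    |> Stream.each(fn {_, index} -> on_progress.(index + 1, total) end)
    # Drop the index so consumers see the original elements again
    |> Stream.map(fn {element, _} -> element end)
  end
end

integers = [42, -7, 1999]

result =
  integers
  |> ProgressStream.with_progress(length(integers), fn current, total ->
    # Stand-in for update_progress_bar/2: record the call as a message
    send(self(), {:progress, current, total})
  end)
  |> Enum.to_list()

IO.inspect(result, label: "result")
# prints: result: [42, -7, 1999]
```

Because the returned stream still emits plain integers, it could be handed to IntGen.create_integer_file/3 exactly as before.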
I honestly didn't have this in mind when I originally wrote the code. The fact that I was using streams and combining the random integer stream with a function that creates an integer file made it all fall into place rather easily, which was a delight for me. It seems that the loose coupling that streams offer combined with separating the different pieces of functionality and then combining them made the code quite flexible.
Updating the Progress Bar
Now that the stream calls update_progress_bar/2 every time an integer is generated, let's look at how I actually update the progress bar.
I used a package called progress_bar that can create various types of progress bars and spinners in a console window. All I had to do was add {:progress_bar, "~> 2.0"} as a dependency in mix.exs, retrieve the dependency using mix deps.get, and I was ready to use it.
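For reference, the dependency entry in mix.exs would look something like this. It assumes the project lists its dependencies in the deps/0 function that mix new generates; any other dependencies the project has are omitted.

```elixir
# In mix.exs, inside the project module: the private deps/0 function
# that the project configuration uses to declare dependencies
defp deps do
  [
    {:progress_bar, "~> 2.0"}
  ]
end
```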
Here's a look at the initial progress bar code.
```elixir
defp update_progress_bar(current_integer, total_integers) do
  ProgressBar.render(current_integer, total_integers, progress_bar_format())
end

# Returns the format of the progress bar
defp progress_bar_format() do
  [
    suffix: :count
  ]
end
```
Calling ProgressBar.render/3 draws/redraws the progress bar in the console, with the first argument being the current number completed, the second argument being the total amount, and the third argument being a set of options. In this case, I only used the suffix: :count option, which draws "(current count/total count)" at the end of the progress bar. There are other options, like the characters and colors to be used in the progress bar, but in experimenting with them, I found that not all consoles supported UTF-8 characters and ANSI colors by default (Windows, I'm looking at you), so I just left it as a boring grey progress bar.
Here's what it looks like.
```
> ./int_gen --count 1000000 --lower-bound -2000 --upper-bound 2000 "data/random_integers.txt"
|========================================================-------| 87% (870000/1000000)
```
Well, that was pretty simple. Now I have a working progress bar with so much less effort than it took to implement this in the Node.js and C# versions of this application.
Whoa, I/O is Slow!
But wait, there's more! It turns out that I/O operations are pretty slow. In fact, I/O is one of the slowest things that a computer can do. Here we have both console I/O and file I/O happening at once. Performing an I/O operation (updating the progress bar) every time an integer is generated makes this application slow: it spends far more time doing I/O than generating integers. This isn't noticeable when generating a small number of integers, but when I'm generating millions of integers, the cost accumulates and makes the program far slower than it would be otherwise.
To be honest, I had anticipated this, so I didn't waste a bunch of time debugging. In the very first implementation of IntGen (in Node.js), I spent a lot of time trying to figure out why it was so incredibly slow. I eventually figured out that it was spending 99% of its time on I/O, most of it refreshing the progress bar every time an integer was generated. The file I/O, which makes good use of buffers and large writes to be more efficient, was taking a tiny fraction of the time that the console I/O was taking up.
So once I had the progress bar working, I immediately knew why generating large numbers of integers was taking so long. It was the console I/O!
So instead of updating the progress bar for every single integer, I changed the code to update the progress bar for every 1000th integer.
First, I created a module attribute to define the update frequency. This makes it so that the number is in one place and is easy to modify.
```elixir
@progress_update_frequency 1000
```
Then I changed update_progress_bar/2 so that it only updates the progress bar when the number of generated integers is evenly divisible by @progress_update_frequency.
```elixir
# Updates the progress bar
# This clause updates the progress bar occasionally when a larger number of integers
# is generated so that the program doesn't spend all its time on progress bar updates
defp update_progress_bar(current_integer, total_integers)
     when rem(current_integer, @progress_update_frequency) == 0 do
  ProgressBar.render(current_integer, total_integers, progress_bar_format())
end

# Updates the progress bar when all the integers have finished generating.
# Otherwise, it won't show at 100% unless the total happens to be evenly
# divisible by the update frequency
defp update_progress_bar(current_integer, total_integers)
     when current_integer == total_integers do
  ProgressBar.render(current_integer, total_integers, progress_bar_format())
end

# If the current integer does not match the update frequency, don't update
# the progress bar
defp update_progress_bar(_, _), do: :ok
```
I also make sure to update the progress bar at the very last integer. Otherwise, the progress bar would never show 100% unless the total happened to be evenly divisible by @progress_update_frequency, and if the total number of integers being generated were less than @progress_update_frequency, the progress bar would never be displayed at all.
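To check the throttling rule without a console, here is a sketch that copies the three clauses into a hypothetical ThrottleDemo.render?/2 function, which returns whether a redraw would happen instead of calling ProgressBar.render/3.

```elixir
defmodule ThrottleDemo do
  @progress_update_frequency 1000

  # Mirrors the guard clauses of update_progress_bar/2, but returns a boolean
  # instead of rendering, so the throttling rule can be checked in isolation.
  def render?(current, _total) when rem(current, @progress_update_frequency) == 0, do: true
  def render?(current, total) when current == total, do: true
  def render?(_current, _total), do: false
end

# Which counts would trigger a redraw when generating 2,500 integers?
total = 2500
rendered_at = Enum.filter(1..total, &ThrottleDemo.render?(&1, total))
IO.inspect(rendered_at, label: "redraws")
# prints: redraws: [1000, 2000, 2500]

# With fewer integers than the update frequency, only the final count redraws
small_run = Enum.filter(1..250, &ThrottleDemo.render?(&1, 250))
IO.inspect(small_run, label: "small run")
# prints: small run: [250]
```

Without the second clause, the first run would stop at 2000 (80%) and the small run would never draw anything.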
That change had a huge impact on performance. IntGen ran far faster, and because integers are generated so quickly, updating on every 1000th integer still keeps the progress bar moving fast and smoothly. Far less time is spent on console I/O and more time is spent generating integers and writing them to a file. If I understood how Elixir concurrency worked, I could have one process update the progress bar while another process generates integers and writes them to a file, but I haven't learned how to do that yet.
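This isn't something IntGen does yet, but as a rough sketch of that idea, assuming plain spawn/send/receive with no supervision, a reporter process could own the console while the generator just sends it messages. ProgressReporter is a hypothetical name, and IO.write stands in for ProgressBar.render/3 so the sketch has no dependencies. One caveat: if updates arrive faster than the console can draw them, the reporter's mailbox grows, so throttling would still be useful.

```elixir
defmodule ProgressReporter do
  # Hypothetical sketch: a separate process that owns console output, so the
  # generating process only sends a message instead of waiting on the console.
  def start do
    spawn(fn -> loop(0) end)
  end

  defp loop(render_count) do
    receive do
      {:progress, current, total} ->
        # In IntGen this would be ProgressBar.render/3
        IO.write("\rGenerated #{current}/#{total}")
        loop(render_count + 1)

      {:stop, caller} ->
        IO.write("\n")
        # Report how many updates were drawn, then let the process exit
        send(caller, {:reporter_stopped, render_count})
    end
  end
end

reporter = ProgressReporter.start()

# The "generator" fires off messages and keeps going without waiting
1..5
|> Stream.each(fn current -> send(reporter, {:progress, current, 5}) end)
|> Stream.run()

# Messages from one process to another arrive in the order they were sent,
# so :stop is handled after every progress update
send(reporter, {:stop, self()})

render_count =
  receive do
    {:reporter_stopped, rendered} -> rendered
  after
    5_000 -> raise "reporter did not stop"
  end
```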
So now IntGen is complete! That was the easy one. IntSort is significantly more complex, and that's what we'll work on next.