Learn With Me: Elixir - Getting to Know Elixir (#2)

Before we dive into Elixir, let's get to know it a bit better. It's good to know something about the language so that we have a better idea about where it came from and what it can do.

Before I learn a language, I like to read about it and what it's capable of, as well as learning about its advantages and disadvantages. That helps me to know where to apply it and when to use it.

The Nature of Elixir

Elixir the language is a dynamically-typed language like Python and JavaScript. This means that the types of data are not determined at compile time, but at runtime. A variable could refer to a string at one point and then later refer to an integer. This is in contrast to statically-typed languages such as C# and Java, where types must be known at compile time and are strictly enforced. A variable in a statically-typed language must always be of the same type and cannot hold any other type of data.
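Here's a minimal sketch of what that looks like in practice. The variable and function names are made up for illustration; the point is only that the same variable can refer to a string and then to an integer, and that a type mismatch shows up when the code runs rather than when it compiles.

```elixir
# `value` first refers to a string (strings are binaries in Elixir)...
value = "forty-two"
was_string = is_binary(value)

# ...and is then rebound to an integer. No declaration or cast is needed.
value = 42
is_now_integer = is_integer(value)

# A type mismatch is only detected at runtime, when the addition is attempted.
add_one = fn x -> x + 1 end

outcome =
  try do
    add_one.("not a number")
  rescue
    ArithmeticError -> :runtime_type_error
  end

IO.inspect({was_string, is_now_integer, outcome})
# prints {true, true, :runtime_type_error}
```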

This is a tradeoff between flexibility and early error detection. Statically-typed languages can catch a certain type-related class of errors at compile time and can generate more efficient compiled code with that type information, but the code is less flexible.

Elixir the language is functional. Functional languages are built with functions as first-class citizens and tend to involve a declarative style of programming, where you specify what is to be done rather than focusing on how to do it. Functions tend to be small and simple, and bigger functions are made by composing the smaller functions together. Functional languages also tend to use immutable data. I'm hardly a functional language expert at this point, but in a future post I'll go over what I understand of functional programming. It's a different mindset than imperative programming, and it's something I have not yet mastered. Practicing and looking at functional code examples is probably the key to getting better at this.
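As a rough sketch of "small functions composed into bigger ones", here's a tiny example. The module and function names (TextTools, normalize, words, word_count) are hypothetical; the `|>` pipe operator and the String and Enum functions are part of Elixir's standard library.

```elixir
defmodule TextTools do
  # Each function does one small thing.
  def normalize(text), do: text |> String.trim() |> String.downcase()
  def words(text), do: String.split(text)

  # A bigger function built by composing the smaller ones.
  def word_count(text), do: text |> normalize() |> words() |> length()
end

count = TextTools.word_count("  Hello Functional World  ")

# Functions are values too: here an anonymous function is passed to Enum.map/2.
doubled = Enum.map([1, 2, 3], fn n -> n * 2 end)

IO.inspect({count, doubled})
# prints {3, [2, 4, 6]}
```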

Elixir the language is oriented toward building scalable and maintainable applications. Elixir really emphasizes these principles, and I hope to understand this aspect of the language better as I learn it.

The Elixir Platform

Notice that I used the term "Elixir the language" because there is much more to Elixir than just the language. There's a whole platform it runs on and an ecosystem that surrounds it. Just as Java, Scala, and Clojure compile to JVM byte code and run on the Java virtual machine (VM), and C#, VB.NET, and F# compile to IL (Intermediate Language) byte code and run on the Common Language Runtime, the VM for the .NET languages, Elixir is also compiled into byte code and run in a virtual machine.

The virtual machine that Elixir runs on is not a virtual machine that was created for Elixir, but an already existing virtual machine called the Erlang VM (or BEAM). Its official name appears to be BEAM, but I've seen it more often referred to as the Erlang VM, so that's what I'll call it here. Erlang is an older language (from the mid-1980s) that runs on a platform that enables scalable, concurrent, and robust code. So instead of reinventing everything, Elixir just integrated itself into a platform that already existed.

Elixir can interact with Erlang code and can call Erlang libraries if the equivalent functionality doesn't exist in an Elixir library. This is similar to how the JVM or .NET languages work.
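A minimal sketch of calling Erlang from Elixir: Erlang modules are referred to by atoms, so Erlang's math and lists modules become :math and :lists. The variable names are just for illustration.

```elixir
# Call functions from Erlang's standard library directly.
root = :math.sqrt(16)
reversed = :lists.reverse([1, 2, 3])

IO.inspect({root, reversed})
# prints {4.0, [3, 2, 1]}
```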

There's more to the platform than just the Erlang VM. During the 1990s, Ericsson, the company where Erlang was created, added a bunch of functionality to the existing Erlang platform that enabled better handling of concurrency and more robustness. Since Ericsson was developing telecommunications systems, it was invested in enabling better scalability, robustness, and concurrency, which is what you absolutely want in a telecommunications system. This additional functionality is known as the Open Telecom Platform (OTP) and is bundled with the Erlang VM and libraries into the Erlang/OTP platform.

The OTP functionality is used for all kinds of solutions these days, not just for telecommunications. However, the original OTP name has survived. Although OTP isn't part of the Erlang VM, it's part of the platform that Elixir uses, and I suspect I'll rarely think about what functionality is part of the VM, what comes from an Erlang library, and what was added on as part of OTP. It'll all just be a big collection of goodness I can use when doing Elixir development.

Concurrency, Scalability, and Robustness

Although I haven't yet learned Elixir, when I was learning about why people liked Elixir so much, I got an overview of how Elixir achieves concurrency, scalability, and robustness.

First of all, the functional nature of Elixir and its immutable data means that a lot of the issues that plague concurrent code are avoided: mutation, shared state, and side-effects are truly the bane of multi-threaded programming. This is a big topic on its own and I probably won't drill down into much detail on this. This article on functional programming goes into this topic, so if you're interested in learning more, give it a read.

Elixir achieves scalable concurrency through its implementation of the actor model, which I explained in the previous post. An actor in Elixir is a very lightweight process (a virtual process in the virtual machine, not an operating system process) that consumes very little memory. Hundreds of thousands or even millions of these can run in the Erlang VM, allowing applications and services to continue to function under a very heavy load. This also allows a very high degree of concurrency, since all these processes run independently of each other.

Communication between processes is done via messages, which contain immutable data. This allows us not to have to worry about mutable data, shared state, locks, race conditions, and all the other stuff we would have to deal with when doing concurrent programming in other languages.
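Here's a small sketch of two processes talking to each other, assuming nothing beyond the built-in spawn, send, and receive. The message shapes ({:greet, from, name} and {:reply, text}) are my own invention for this example, not any kind of standard.

```elixir
parent = self()

# Start a lightweight process that waits for one message and answers it.
greeter =
  spawn(fn ->
    receive do
      {:greet, from, name} -> send(from, {:reply, "Hello, " <> name})
    end
  end)

# Messages are plain immutable data placed in the other process's mailbox.
send(greeter, {:greet, parent, "world"})

reply =
  receive do
    {:reply, text} -> text
  after
    1_000 -> :timeout
  end

IO.inspect(reply)
# prints "Hello, world"
```

The two processes never share any state. All the greeter ever sees is a copy of the message it was sent.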

Elixir is not just good at doing concurrent processing on the same machine, but makes it easy to do concurrent processing over multiple machines, enabling the easy (or easier at any rate) creation of distributed systems. An Elixir process can send a message to another Elixir process on the same machine just as easily as it can to an Elixir process running on a different machine. From what I understand, it's relatively simple to scale from two machines to hundreds (although I really want to see this to truly believe it).

Robustness is achieved by something called supervisors. I know very little about these at this point, but from what I understand from reading about Elixir, these can monitor Elixir processes and restart them when they crash. In fact, this makes Elixir processes so robust that, from what I understand, processes will sometimes deliberately let themselves crash when an error occurs, knowing that they'll be revived. Immortality can lead to some interesting consequences, I see. You're less afraid to die when you know you'll just be revived. I'm really curious to learn about this aspect of Elixir, but I suspect that this is advanced functionality I'll learn about much later. We need to master the basics first.
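To get a feel for this, here's a minimal sketch under some assumptions of mine: a supervisor watching a single Agent (a simple state-holding process) registered under the hypothetical name :counter. I kill the process by hand to simulate a crash and then check that a fresh one has taken its place. Real applications would be structured differently, so treat this only as an illustration of the restart behavior.

```elixir
# One child: an Agent holding the number 0, registered as :counter.
children = [
  %{id: :counter, start: {Agent, :start_link, [fn -> 0 end, [name: :counter]]}}
]

{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)

Agent.update(:counter, fn count -> count + 5 end)
old_pid = Process.whereis(:counter)

# Simulate a crash and wait until the old process is really gone.
ref = Process.monitor(old_pid)
Process.exit(old_pid, :kill)

receive do
  {:DOWN, ^ref, :process, _pid, _reason} -> :ok
end

# Give the supervisor a moment to start the replacement.
Process.sleep(100)

new_pid = Process.whereis(:counter)
restarted? = is_pid(new_pid) and new_pid != old_pid

# The replacement starts from its initial state, not the crashed one's state.
state_after_restart = Agent.get(:counter, fn count -> count end)

IO.inspect({restarted?, state_after_restart})
# prints {true, 0}
```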

Elixir's ability to make concurrent programming so much easier is a huge benefit in the current environment. Processors have only been experiencing incremental improvements in performance, but as the number of cores increases and cloud computing with multiple processors becomes cheaper and cheaper, whoever can make efficient use of parallel processing power will have a big advantage over someone who cannot.

Immutable Data

Data in Elixir is immutable. That means it can never be changed. Once the data has been created, it is set in stone (up to the point it gets garbage collected). Nothing you can do will ever change that data. Variables can be reassigned (rebound in Elixir terms) to different data, but the data itself will not change.

I imagine that most of you have dealt with constant data defined in the code that will never change, but you've also had mutable data structures available as well. You might be wondering how it is remotely possible to get anything done with immutable data. Data is used to represent state, and state changes over time.

Well, Elixir allows you to "modify" data by creating a new data structure that is the same as the original data structure plus the modifications that you wish to make. The original data structure is unchanged and anything referring to the original data structure still refers to it, but you now have another set of immutable data. Elixir embraces the concept of transforming a set of data into a new set of data rather than modifying it in place.
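A minimal sketch of what this looks like, using a map and a list with made-up contents. Map.put/3 and the `[head | tail]` list syntax are standard Elixir; everything else is just illustration.

```elixir
original = %{name: "Ada", age: 36}

# "Modifying" the map returns a new map; the original is untouched.
updated = Map.put(original, :age, 37)

numbers = [1, 2, 3]

# Prepending builds a new list whose tail is the existing list.
longer = [0 | numbers]

IO.inspect({original.age, updated.age, numbers, longer})
# prints {36, 37, [1, 2, 3], [0, 1, 2, 3]}
```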

This sounds weird and inefficient. It's not actually as inefficient as it sounds and it avoids a huge range of nasty issues that can develop when sharing mutable state between different threads. In fact, having immutable data gains efficiency in other areas. You can share data without worrying about it getting modified by something else and just knowing that data will never, ever change allows Elixir to make some optimizations that it could never achieve with mutable data.

Elixir and its underlying Erlang platform specialize in concurrency and scalability, and immutable data goes a long way in achieving that. Elixir will make the trade-off of some inefficiency at the micro level to gain efficiency for the entire system as it scales up to more threads, processors, and machines. We'll go into some more detail on immutable data in the next post.

Garbage Collection

Like many languages in use nowadays, Elixir is a garbage-collected language. This means that you don't have to manually allocate and deallocate memory like I did back when I was doing C and C++ development. That could be a big pain, especially chasing after the inevitable memory management issues that would often make themselves known in very non-obvious ways through weird, inexplicable defects. I still feel a sense of relief that I don't have to chase that class of bugs down anymore. Well, you can still do stupid things in garbage-collected languages like holding onto a reference to data that is no longer needed, but that isn't nearly as aggravating as having to chase down defects related to memory management.

Instead of having you manually allocate and free memory for data structures, the language runtime just automatically allocates memory for data: you don't have to worry about that at all. It also detects which data are being used and which data are no longer being referenced in the code. The garbage collector comes along once in a while, pauses execution, cleans up the unneeded data, and releases the memory back to the heap. You don't want this kind of behavior for operating-system-level code or code where every cycle matters, but the majority of development occurs at a higher level where this is not a big deal.

If you've been doing a lot of memory allocation in your code, creating and releasing a large amount of data (or millions of sets of small data) in a short amount of time, garbage collection can go from an unnoticeable pause to a significant and obvious pause. So programmers in a garbage-collected environment need to be aware of garbage collection behavior so that they can avoid code which will really stress out the garbage collector.

Elixir has the advantage that code tends to be distributed among multiple (virtual) processes for the sake of concurrency and scalability. Each process has its own heap. As a result, the garbage collector can collect garbage on each process individually rather than for the whole system at once. This means that the garbage collector cleans up a smaller amount of data at a time, resulting in a much smaller pause. If one virtual process is allocating and abandoning its memory like crazy, the lengthy garbage collection will just affect that particular process, and not the rest of the system. Elixir once again optimizes toward keeping the entire system scalable and stable.
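Here's a rough sketch of how you can observe the per-process heaps, assuming Process.info/2 with the :total_heap_size item (which reports a size in machine words) and Erlang's :erlang.garbage_collect/1. The worker process and its messages are hypothetical, and the exact numbers will vary from system to system.

```elixir
parent = self()

# A worker that builds a large list on its own heap and then waits.
worker =
  spawn(fn ->
    data = Enum.to_list(1..100_000)
    send(parent, :ready)

    receive do
      :stop -> length(data)
    end
  end)

receive do
  :ready -> :ok
end

# Each process reports its own heap size, independent of the others.
{:total_heap_size, worker_heap} = Process.info(worker, :total_heap_size)
{:total_heap_size, my_heap} = Process.info(self(), :total_heap_size)

# A collection can be requested for one process without touching the rest.
collected = :erlang.garbage_collect(worker)

IO.inspect(%{worker_heap: worker_heap, my_heap: my_heap, collected: collected})

send(worker, :stop)
```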

Elixir Trade-offs

It's very rare in the software development world that something does not involve trade-offs. So naturally, Elixir does involve some trade-offs. There are some things that it does very well and some things it's just not suitable for. I've already gone into a lot of what Elixir is good at, so let's go over what you'll be giving up when using Elixir.

The fact that Elixir is a functional language with immutable data structures means that it doesn't have the CPU efficiency that languages like C, C++, and Rust have. It runs in a VM, which reduces computing efficiency, and immutable data structures are never quite as efficient as mutable data structures. I would not recommend Elixir for implementing your graphics engine or creating an operating system: it's just not close enough to the hardware. A gaming client with 3D graphics, for example, should not be implemented in Elixir. However, implementing a networked game server would be an excellent use case for Elixir, especially if it will be handling a lot of connections.

Elixir is also not optimized for mathematical calculations, so it's not the best thing to use for calculations that really stress the processor. That's not to say that you can't use Elixir in applications that involve CPU-heavy calculations, but the part of the system that puts a big load on the CPU should probably be implemented in a language better suited to that task. The average computation that a typical networked application or service performs will not be a problem with Elixir, but you probably won't want to be doing statistical analysis on gigabytes of data or running computation-intensive physics simulations.

Elixir is a dynamically-typed language where much less is resolved at compile time, but much more is resolved at runtime, with all the trade-offs that come with that. It gives you more flexibility, but far fewer defects are detected prior to code execution. Statically-typed languages can catch far more issues at compile time.

Most languages used in the web world have similar advantages and disadvantages (particularly the dynamic languages) and most CPU- and memory-efficient languages are not fun to use in web development or concurrent programming. Elixir is hardly unique in this aspect.

The performance advantages of Elixir aren't that obvious on a small scale. Elixir and the Erlang platform it's built on top of optimize for predictable and measured performance over all the processes running on the Erlang VM. They do not maximize the throughput of any particular process. This means the performance of Elixir won't be anything remarkable if you only have a few connections where every microsecond counts. The performance benefits will be quite remarkable when there are 50,000 connections and the system remains stable and responsive. Elixir optimizes for concurrency and scaling at the system level even if it means additional inefficiency at a small scale. In other words, Elixir values macro efficiency over micro efficiency.

Another disadvantage comes from the fact that Elixir is fairly new and hasn't yet attained the popularity of languages like JavaScript, Ruby, C#, Python, or Java. This means that the ecosystem hasn't had as much time to develop and the hordes of packages that can be found in the Node.js or Ruby ecosystem just aren't there. You will find packages for many things, but you won't be able to find a package for everything.

Since Elixir doesn't have packages and libraries for everything, you may sometimes have to turn to Erlang libraries for some advanced functionality that already exists in the Erlang world. When that happens, you'll have to learn at least something about Erlang in order to use the library. You may also have to write your own Elixir library if the corresponding Elixir package or Erlang library does not exist.

I suspect that the ecosystem disadvantage is something that will slowly disappear over time as the ecosystem becomes more mature, more contributors create packages, and Elixir takes the place of many Erlang libraries, but I don't expect it to catch up to the Node.js or Ruby ecosystem anytime soon.

Origin of Elixir

Elixir was created by José Valim and first released in 2011, making it a much newer programming language than most currently in use. José was very active and well-known in the Ruby community and was a core maintainer of the Rails framework. He had come to realize that Ruby was not the optimal language for solving concurrency problems and set out to find a better language for doing that.

José found the Erlang programming language and the associated Erlang virtual machine and OTP platform. He was impressed by how well it solved concurrency problems, but he missed many of the more modern tools, documentation, and language constructs he'd seen in other languages. He set out to create an improved language and toolset that could run on the Erlang virtual machine and take advantage of all the functionality of the Erlang/OTP platform. That language became Elixir. It was meant to be a much-improved alternative to Erlang that could take advantage of all the goodness of the Erlang platform.

This "building on the shoulders of giants" was a good move. Elixir was able to get a massive boost from the existing Erlang platform and ecosystem that never would have been possible if José had decided to re-implement everything from scratch.

Since José Valim was very active in the Ruby community, Elixir inherited a lot of ideas and syntax from Ruby. At least that's what I'm told. I don't know any Ruby, so I won't personally be able to comment on those similarities. I imagine that someday I will learn Ruby and recognize the ideas and syntax I already know from Elixir.

Elixir Usage

Although Elixir usage has not matched any of the more popular languages such as JavaScript, Python, and Ruby, it's been rapidly increasing in popularity over the years. The main draw is its scalability and fault-tolerance. Developers using Elixir come from a variety of backgrounds, but I've gotten the impression that migrating to Elixir is a popular path for Ruby developers, probably because José Valim is well-regarded in the Ruby community and because many ideas and much of the syntax carry over from Ruby.

Elixir seems to be mainly used for creating scalable, robust, and concurrent systems, many of which are web services and web applications. The Phoenix framework helps developers create such services and applications. I plan to learn Phoenix as well, but that will come much later after I've already become familiar with Elixir.

There is an ecosystem surrounding Elixir consisting of a variety of tools. This includes a package manager and package repository. I'm not familiar enough with this ecosystem to be able to comment on how many packages are available, but I think it's safe to say that it's probably smaller than npm. :)

Elixir Adoption

From what I've picked up in reading about Elixir, systems are typically migrated to Elixir to make those systems much more scalable and fault-tolerant than the systems they are replacing. I've also seen claims of better maintainability with Elixir, although I don't know any details regarding that.

Before migrating, however, an organization needs to make sure that it can either obtain some developers already familiar with Elixir or train its existing staff to become familiar with Elixir. Usually, it's a combination of both. Although the popularity of Elixir has been steadily rising, it's not yet counted among the most popular of languages, such as C#, Java, Python, Ruby, and JavaScript. As a result, it may take some effort to find developers who already know Elixir.

Most Elixir developers will have learned it fairly recently, with it being a fairly new language, and they likely took the initiative to learn it rather than having had it taught to them. Even someone with a bit of existing knowledge of Elixir and a motivation to learn could prove to be a valuable resource to the implementation of an Elixir system.

The organizational and technical aspects of Elixir adoption and some stories of companies who've successfully done so are documented in the book Adopting Elixir. I'm currently reading this book to get a better idea of what Elixir is all about, and I'm finding it to be quite insightful.

Elixir Documentation

One of the great features of Elixir is its documentation. The Elixir documentation is quite good, and I have no problem reading it. I do find it's more useful as a reference, when I want to find out more about the details of a particular feature, than it is as a tutorial. The documentation is clear and understandable. It's one of the more comprehensive and well-written sets of documentation I've seen for a language, and the maintainers should be commended for the effort they have put into it.

You can find the documentation on the Elixir website. I encourage you to take a look at the website from time to time and do some reading.

Cool Tools

I'm aware of some nice tools that Elixir has. It comes with a package manager, a unit testing framework, a build tool, and an interactive shell where we can play around with Elixir code without going through a build and run process. I know very little about most of these tools so far, but I imagine I'll eventually learn a lot more about them.

Kevin Peter