In the last article I discussed how Elixir handles fault tolerance. Now we'll go over why Erlang and Elixir are considered real-time languages and the mechanisms that make this possible.
What is a Real-Time Language?
Real-time programs must guarantee a response within specified time constraints, often referred to as "deadlines".
In a hard real-time system, a missed deadline is a total system failure. A soft real-time system, by contrast, guarantees only a subset of deadlines, and which subset is chosen on a per-system basis. For example, all non-IO operations might be guaranteed to meet their deadline while IO operations are not.
It's important to note that real time does not mean fast. It can be fast but it's not a requirement. Real time refers to the reliability of a response within a certain timeframe. The system must be predictable. If the deadline is 1 year and we return the response in 1 week then the system is real time.
Why is Erlang a Real-Time Language?
Understanding the origins of Erlang helps to explain the design choices made in the language. Erlang was designed for programming telephone exchanges. Its designers wanted responses in milliseconds, but more importantly, they wanted them consistently. They also didn't want long-running computations blocking shorter ones that could finish sooner.
As explored in the previous article, if a process fails or misses a deadline then it is retried. This is why Erlang is soft real-time. It's not trying to be completely faultless, only fault-tolerant.
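To make the idea of a soft deadline with retries concrete, here is a minimal sketch using Elixir's Task module. The Deadline module, its run/3 function, and the timeout values are hypothetical names chosen for illustration. This is not how supervisors restart processes; it only shows a caller giving up on a slow computation and trying again.

```elixir
defmodule Deadline do
  # Hypothetical helper: runs `fun` in a separate process and waits at most
  # `timeout_ms` for the result. On a miss, the task is killed and the call
  # is retried up to `retries` more times.
  def run(fun, timeout_ms, retries) do
    task = Task.async(fun)

    # Task.yield/2 returns nil on timeout; Task.shutdown/2 then stops the task.
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      # Anything else (nil on timeout, or {:exit, reason}) counts as a miss.
      _ when retries > 0 -> run(fun, timeout_ms, retries - 1)
      _ -> {:error, :deadline_missed}
    end
  end
end

fast = Deadline.run(fn -> 1 + 1 end, 1_000, 2)
slow = Deadline.run(fn -> Process.sleep(2_000) end, 10, 1)
IO.inspect({fast, slow})
```

The slow call never blocks the caller for longer than the deadline multiplied by the number of attempts, which is the kind of predictability soft real-time is about.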
In short, Erlang was designed to handle a lot of long-lived but inactive processes, e.g. a call is initiated, then the system waits for the end of the call. Occasionally a process will consume a larger amount of processing but won't block other processes.
This style of system lends itself nicely to other applications such as web servers that multiplex many connections. BEAM is capable of fast context switching, fast process creation, and fast message passing. Processes are isolated and do not block one another. This is ideal for WebSockets and for large numbers of concurrent requests.
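As a rough illustration of cheap process creation and message passing, the sketch below spawns 10,000 mostly idle processes and exchanges one message with each. The process count and the :ping/:pong message shapes are arbitrary choices for this example.

```elixir
# Spawn many lightweight processes; each waits for a single ping.
pids =
  for _ <- 1..10_000 do
    spawn(fn ->
      receive do
        {:ping, from} -> send(from, :pong)
      end
    end)
  end

# Ping every process from the current one.
parent = self()
Enum.each(pids, fn pid -> send(pid, {:ping, parent}) end)

# Collect one reply per process, giving up on any reply after 5 seconds.
replies =
  for _ <- pids do
    receive do
      :pong -> :pong
    after
      5_000 -> :timeout
    end
  end

IO.puts("replies received: #{Enum.count(replies, &(&1 == :pong))}")
```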
What Design Choices Help Erlang Towards Real-Time Compliance?
Memory management: In many language runtimes, garbage collection pauses the whole program. In a long-running system this means every concurrent task can stall while memory is reclaimed. BEAM takes a different approach. Each process has its own memory heap, which is garbage collected separately and in isolation from all other processes. Not only does this prevent garbage collection from blocking all processes, it also makes each garbage collection run much faster because the heaps are much smaller.
Data in these private heaps is divided into two generations, young and old. The young generation is data that has been created since the last garbage collection run. The old generation is data that has already survived a garbage collection run.
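The per-process heaps can be observed directly. In this sketch, one process builds a large list while another stays idle, and we compare their heap sizes with Process.info/2. The list size and message names are arbitrary, and the exact numbers will vary between runs and OTP versions.

```elixir
parent = self()

# Idle process: allocates almost nothing on its heap.
idle =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

# Busy process: builds a large list on its own heap, then waits.
busy =
  spawn(fn ->
    data = Enum.to_list(1..100_000)
    send(parent, :ready)

    receive do
      # Using `data` here keeps the list alive while the process waits.
      :stop -> length(data)
    end
  end)

# Wait until the list has been built before measuring.
receive do
  :ready -> :ok
end

# :total_heap_size is reported in machine words, not bytes.
{:total_heap_size, idle_words} = Process.info(idle, :total_heap_size)
{:total_heap_size, busy_words} = Process.info(busy, :total_heap_size)
IO.puts("idle: #{idle_words} words, busy: #{busy_words} words")

send(idle, :stop)
send(busy, :stop)
```

The idle process's heap stays small regardless of what its neighbour allocates, so collecting it remains cheap.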
GC (Garbage Collection) runs for a process when its heap fills up. Each heap starts at the size set by the min_heap_size option, so a short-lived process that never fills that initial heap is never garbage collected.
GC can run in two modes: full sweep and generational garbage collection. A full sweep scans both the young and old generations. Generational garbage collection scans only the young generation, collects what can be disposed of, and then moves the remaining items into the old heap.
By default, a generational collection runs each time the heap fills up. The fullsweep_after option, set for example with spawn_opt(fun(), [{fullsweep_after, N}]), defines the number of generational collections that may run before a full sweep is forced.
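In Elixir the same options can be passed through Process.spawn/2, which forwards them to Erlang's spawn_opt. The sketch below sets fullsweep_after and min_heap_size for a single process and reads them back. The values 10 and 1_000 are arbitrary, and the VM may round the heap size up to the next size it supports.

```elixir
# Hypothetical worker: waits for one message and then exits.
worker = fn ->
  receive do
    :stop -> :ok
  end
end

# These GC settings apply to this one process only.
pid = Process.spawn(worker, fullsweep_after: 10, min_heap_size: 1_000)

# The :garbage_collection item returns a keyword list of the GC settings.
{:garbage_collection, gc_info} = Process.info(pid, :garbage_collection)
IO.inspect(Keyword.take(gc_info, [:fullsweep_after, :min_heap_size]))

send(pid, :stop)
```

A larger min_heap_size trades memory for fewer collections, while a small fullsweep_after makes the process reclaim old-generation data more eagerly.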
If the GC cannot free enough space, the heap size will be automatically increased.
Real-Time Scheduling: The BEAM VM uses a preemptive scheduler, meaning each process is given a small slice of CPU time and is suspended without its cooperation, i.e., the scheduler doesn't ask a process whether it can be descheduled before resuming or starting the next one. The scheduler is continuously context switching, resuming each process in turn. This allows all processes to keep making progress even when one process requires a lot of CPU cycles.
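Preemption can be seen with a small experiment: start more CPU-bound processes than there are schedulers, then check that an unrelated process still answers a message. PreemptDemo and its functions are hypothetical names for this sketch, and the 5-second timeout is only a generous safety net.

```elixir
defmodule PreemptDemo do
  # Tight loop that never yields voluntarily.
  def spin, do: spin()

  # Answers a single ping and exits.
  def echo do
    receive do
      {:ping, from} -> send(from, :pong)
    end
  end
end

# Twice as many busy loops as schedulers, so every core is saturated.
busy =
  for _ <- 1..(System.schedulers_online() * 2) do
    spawn(PreemptDemo, :spin, [])
  end

echo = spawn(PreemptDemo, :echo, [])
send(echo, {:ping, self()})

reply =
  receive do
    :pong -> :pong
  after
    5_000 -> :timeout
  end

IO.inspect(reply)

# Clean up the busy loops, which would otherwise run forever.
Enum.each(busy, &Process.exit(&1, :kill))
```

With a cooperative scheduler the busy loops would starve the echo process; here they are suspended regularly so it gets its turn.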
While the scheduler does not require cooperation from the tasks, it does selectively schedule based on priority. The priority can be specified when spawning a process with the priority option of spawn_opt, or changed by the process itself using the erlang:process_flag/2 function. A runnable process is placed into a run queue, and the scheduler selects the next process to resume based on priority.
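As a rough illustration of setting priorities, the sketch below uses Elixir's Process.spawn/2 with the :priority option and Process.flag/2, which wraps erlang:process_flag/2 and returns the previous value. The message shape is an arbitrary choice for this example.

```elixir
parent = self()

worker = fn ->
  # A process can change its own priority; the old value is returned.
  previous = Process.flag(:priority, :normal)
  send(parent, {:previous_priority, previous})
end

# Set the priority at spawn time.
Process.spawn(worker, priority: :high)

level =
  receive do
    {:previous_priority, previous} -> previous
  after
    1_000 -> :timeout
  end

IO.puts("spawned with priority: #{level}")
```

Raising priorities is best done sparingly, since a busy high-priority process can delay everything running at normal priority.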
Another factor in scheduling is reductions. A reduction is roughly one function call, and each process keeps a counter that is incremented as it performs them. This is an estimate of how much CPU time has been spent. The scheduler could measure actual time using the OS clock, but that's expensive, so counting reductions is a faster approximation. Each time a process is resumed it is allocated a reductions "budget" of 2000 (the exact figure is an implementation detail and is higher in recent OTP releases). Once the budget has been used up, the scheduler performs the next context switch.
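The reduction counter is visible from inside a process. This short sketch reads it before and after some work; the workload is arbitrary and the exact count will differ between runs and OTP versions.

```elixir
# Read the current process's reduction counter.
{:reductions, before_count} = Process.info(self(), :reductions)

# Some work: each call to the anonymous function costs reductions.
_sum = Enum.reduce(1..10_000, 0, fn n, acc -> n + acc end)

{:reductions, after_count} = Process.info(self(), :reductions)
IO.puts("reductions used: #{after_count - before_count}")
```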
Wrapping Up
Through these mechanisms, Erlang and Elixir are highly predictable. This allows us to build systems that are reliable and scalable. Typically you don't need to know much about how this works when using Elixir, as most of it is handled for you.
In the next article we'll take a look at Elixir's ability to hot-swap code on the fly.