In the last article I discussed how Elixir handles distribution through isolated processes and message mailboxes. Now we'll take a look at how Elixir makes those processes fault-tolerant.
Previously we created a KeyValueStore that implemented a GenServer interface. We didn't take care of errors in that module. In some situations it's a good idea to be defensive; in others it's counter-productive. We cannot know in advance every issue that might occur in our processes, so defending against all exceptions can be an exhausting task. In Elixir it's encouraged to "let it crash" and then restart the process in a fresh state. This makes sense in a lot of contexts, e.g. if the network drops, we just restart the process until the network reconnects. Of course there are also situations where it's best to handle some known edge cases.
We can use Supervisors to watch our processes and restart them if they crash. A Supervisor is a type of Process that monitors other Processes and takes some action if a child Process crashes. The Supervisor manages all life cycle events for its child Processes, including startup and shutdown.
Let's add a Supervisor to monitor our KeyValueStore:
defmodule KeyValueStore.Supervisor do
  use Supervisor

  def start_link(opts) do
    Supervisor.start_link(__MODULE__, :ok, opts)
  end

  def init(:ok) do
    children = [KeyValueStore]
    Supervisor.init(children, strategy: :one_for_one)
  end
end
You might notice a couple of things here.
What are these children? A Supervisor can manage more than one Process. Each Process that the Supervisor manages is called a child.
What is a one for one strategy? The strategy defines the action that happens when a child terminates:
- :one_for_one restarts only the process that terminated.
- :one_for_all terminates all of the other children and then restarts every child when any child process terminates.
- :rest_for_one restarts the process that terminated and also terminates and restarts every child that was started after it, e.g. if we have the following
children = [Module1, Module2, Module3]
If Module1 crashes, all of the processes will be restarted. However, if the Module2 process crashes, only Module2 and Module3 will be restarted.
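To see the three strategies side by side, here is a minimal sketch rather than anything from the KeyValueStore itself. The StrategySketch modules and the :worker1 to :worker3 names are made up for illustration and stand in for Module1 to Module3. The function kills one worker and reports which workers ended up with a new PID.

```elixir
defmodule StrategySketch.Worker do
  # Hypothetical worker: an Agent registered under the name it is given.
  use Agent

  def start_link(name) do
    Agent.start_link(fn -> :ok end, name: name)
  end
end

defmodule StrategySketch do
  # Hypothetical names standing in for Module1, Module2 and Module3.
  @names [:worker1, :worker2, :worker3]

  # Starts the three workers under `strategy`, kills `victim`, and returns
  # the names of the workers whose PID changed as a result.
  def restarted_after_killing(victim, strategy) do
    children =
      for name <- @names do
        # Each child needs a unique :id, so override the default one.
        Supervisor.child_spec({StrategySketch.Worker, name}, id: name)
      end

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    old = pids()

    Process.exit(old[victim], :kill)
    new = await_restart(victim, old[victim])
    Supervisor.stop(sup)

    Enum.filter(@names, fn name -> old[name] != new[name] end)
  end

  defp pids do
    Map.new(@names, fn name -> {name, Process.whereis(name)} end)
  end

  # Polls until the victim has a new PID and every name is registered again.
  defp await_restart(victim, old_pid) do
    current = pids()

    if (current[victim] not in [nil, old_pid]) and Enum.all?(Map.values(current)) do
      current
    else
      Process.sleep(10)
      await_restart(victim, old_pid)
    end
  end
end
```

Calling StrategySketch.restarted_after_killing(:worker2, :rest_for_one) returns [:worker2, :worker3], while the same call with :one_for_one returns only [:worker2].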
Starting a Supervisor
When we start the Supervisor, it will call child_spec/1 on each of its children. The child_spec/1 function returns a specification for how the process should be started. This function is automatically defined for us because our KeyValueStore module calls use GenServer.
def child_spec(arg) do
  %{
    id: __MODULE__,
    start: {__MODULE__, :start_link, [arg]}
  }
end
If we want to start the process in a different way, we can override this function to customize the args passed to the process on start. The Supervisor expects child_spec/1 to return a map with at minimum :id, to identify the process, and :start, which defines how the module will be started. You can also pass :restart, :shutdown, and :type. I won't go over those here.
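As a rough sketch of the two usual ways to customize the specification, the modules below are hypothetical and are not the article's KeyValueStore. The first passes options to use GenServer, which are merged into the generated spec. The second overrides child_spec/1 outright and adds a default name, :child_spec_sketch, which is made up for this example.

```elixir
defmodule ChildSpecSketch.Transient do
  # Options passed to `use GenServer` are merged into the generated spec.
  use GenServer, restart: :transient

  def start_link(opts), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok), do: {:ok, %{}}
end

defmodule ChildSpecSketch.Named do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Overrides the generated child_spec/1. Whatever the Supervisor passes in
  # is extended with a default name (made up for this example).
  def child_spec(opts) do
    %{
      id: __MODULE__,
      start: {__MODULE__, :start_link, [Keyword.put_new(opts, :name, :child_spec_sketch)]}
    }
  end

  @impl true
  def init(:ok), do: {:ok, %{}}
end
```

With these definitions, ChildSpecSketch.Transient.child_spec([]) includes restart: :transient next to the default :id and :start keys, and listing ChildSpecSketch.Named as a child starts a process registered as :child_spec_sketch.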
We can see from the child_spec/1 function above that the :start key defines how the module will be started. Its value is a {module, function, args} tuple whose elements are passed to apply/3. Later on we'll update the child specification in our Supervisor and pass a name to our KeyValueStore GenServer.
Ok, once the Supervisor has collected the child specifications, it starts the children one by one, beginning with the first child. It uses the value of the :start key in each child specification map to start the linked process, e.g.
%{
  id: KeyValueStore,
  # The args list holds a single argument: the options keyword list
  start: {KeyValueStore, :start_link, [[name: KeyValueStore]]}
}
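The hand-off from a child specification to apply/3 can be sketched roughly as follows. ApplySketch and its start_from_spec/1 helper are hypothetical names for illustration, and the real Supervisor does more bookkeeping around this step. Note that a process started this way is linked to the caller, not to a Supervisor.

```elixir
defmodule ApplySketch.Child do
  # Hypothetical child; `use Agent` generates a default child_spec/1.
  use Agent

  def start_link(opts), do: Agent.start_link(fn -> %{} end, opts)
end

defmodule ApplySketch do
  # Mimics the step performed for each child: unpack the
  # {module, function, args} tuple under :start and hand it to apply/3.
  def start_from_spec(%{start: {module, function, args}}) do
    apply(module, function, args)
  end
end
```

For example, ApplySketch.start_from_spec(ApplySketch.Child.child_spec(name: :apply_sketch)) ends up calling ApplySketch.Child.start_link(name: :apply_sketch) and returns {:ok, pid}.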
Let's Take Our KeyValueStore For A Spin
We need to make a small change to our KeyValueStore GenServer. Previously we used GenServer.start/3 to spawn the process, creating an unlinked process that lives outside of a supervision tree. We want to create a process linked to our supervisor, so let's change the GenServer.start/3 call to GenServer.start_link/3. The Supervisor expects our module to expose this through a start_link function whose single argument is a list of options.
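defmodule KeyValueStore do
  use GenServer

  # Starts the store linked to the calling process (our Supervisor)
  def start_link(opts) do
    GenServer.start_link(__MODULE__, :ok, opts)
  end

  # The rest of the module from the previous article stays unchanged
end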
This change also allows us to name our processes from our Supervisor. Let's do that too.
children = [
  {KeyValueStore, name: :primary_key_value_store}
]
Ok, now let's spin it up and make it crash.
KeyValueStore.Supervisor.start_link([])
:primary_key_value_store
|> KeyValueStore.set(:foo, :bar)
=> :ok
:primary_key_value_store
|> KeyValueStore.get(:foo)
=> :bar
GenServer.call(:primary_key_value_store, :invalid_call)
=> [error] GenServer :primary_key_value_store terminating
=> Last message (from #PID<0.501.0>): :invalid_call
=> State: %{foo: :bar}
=> Client #PID<0.501.0> is alive
:primary_key_value_store
|> KeyValueStore.get(:foo)
=> nil
Great! Now the Supervisor automatically boots our process back up when it crashes. The restarted process begins with a fresh state, which is why :foo now returns nil.
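One way to confirm that a restart really happened is to compare the PID registered under the name before and after the crash. The sketch below is self-contained, so it uses a hypothetical RestartSketch.Store in place of the KeyValueStore from the previous article, and the :restart_sketch_store name is also made up. It assumes, as in the session above, that an unhandled call crashes the store.

```elixir
defmodule RestartSketch.Store do
  # Hypothetical stand-in for the article's KeyValueStore.
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, :ok, opts)
  def set(server, key, value), do: GenServer.call(server, {:set, key, value})
  def get(server, key), do: GenServer.call(server, {:get, key})

  @impl true
  def init(:ok), do: {:ok, %{}}

  # No catch-all clause: any other call raises and crashes the process.
  @impl true
  def handle_call({:set, key, value}, _from, state) do
    {:reply, :ok, Map.put(state, key, value)}
  end

  def handle_call({:get, key}, _from, state) do
    {:reply, Map.get(state, key), state}
  end
end

defmodule RestartSketch do
  @name :restart_sketch_store

  def run do
    children = [{RestartSketch.Store, name: @name}]
    {:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

    old_pid = Process.whereis(@name)
    :ok = RestartSketch.Store.set(@name, :foo, :bar)

    # GenServer.call/2 exits in the caller when the server dies mid-call,
    # so catch the exit to keep this process alive.
    try do
      GenServer.call(@name, :invalid_call)
    catch
      :exit, _reason -> :ok
    end

    new_pid = await_new_pid(old_pid)
    value = RestartSketch.Store.get(@name, :foo)
    Supervisor.stop(sup)

    %{restarted?: new_pid != old_pid, value_after_restart: value}
  end

  # Polls until the name is registered to a PID other than the old one.
  defp await_new_pid(old_pid) do
    case Process.whereis(@name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_pid(old_pid)
    end
  end
end
```

RestartSketch.run() returns %{restarted?: true, value_after_restart: nil}: the name now points at a new process, and the state set before the crash is gone.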
What's next?
We've covered how Elixir manages distributed systems and fault tolerance. Next we'll take a look at why Elixir is considered soft real-time.