Let’s study distributed systems — 4. Leader election

Hidetatsu YAGINUMA
5 min readDec 28, 2019
From: Flickr, no modifications are made.

In distributed systems, sometimes we need to choose only one leader from some nodes. Choosing a leader is the same as granting a special privilege to the node. The process to choose a leader is often called Leader Election. The goal of leader election is that after leader election, every nodes, of course including the leader itself, knows exactly who is a leader.

Why do we want a leader?

Leader is basically expected to do some tasks which should be executed only once. Let’s say there are 3 processes. Now we want to send some WRITE requests to this distributed system. After the request, these WRITE requests must be applied to all the nodes. Of course, there must not be a data inconsistency.

Because there are multiple nodes, after one node accepts a request, it must propagate it to other nodes. If we implement it without consideration for locks or some other techniques to avoid data inconsistency, then it will cause a problem especially multiple requests comes almost at the same time.

One of the techniques of them is letting leader to manage WRITE requests. Only a leader node accepts WRITE requests, then propagate them to follower nodes. Of course, READ can be handled by non-leader nodes. This architecture, “only a leader handles WRITE requests” is used in Apache ZooKeeper[1].

To achieve what leader election can, instead of leader election, there are some other solutions.

  • Idempotent API — If a leader in the system is managing exactly-once execution, then it might be able to be replaced with idempotent-retry. If the API guarantees idempotency, it’s usually easier to implement/understand the system. Basically, in distributed systems, it’s nothing special that some processes are broken. So despite if a leader exists, implementing things idempotent is good.
  • Optimistic locking/CAS — Optimistic locking is another approach. When a WRITE request is coming, fetch original data first and save it on memory. Then, check fetched data before updating, if it’s already modified, then it cannot update the record. This is a similar approach with Idempotent API.
  • Workflow engine — Sometimes a leader manages workflow in a system. In the case, it might be able to be replaced with workflow engine such as Apache Airflow, .NET state machine workflow, or AWS Step Functions.

The benefits of leader election are;

  • There will be only 1 place to manage system’s concurrency. It makes easy to design system for human.
  • A leader can understand and control all the request in the system, so it can provide consistency to clients.

However, having a leader can cause other complexities.

  • Basically, an algorithm which chooses single leader is difficult. Especially, fully considering failure model and being able to work in real world is much complicated. Obviously, single leader is a single point of failure. Because a leader process can go down, sometimes we need to choose another leader, if the system doesn’t work without a leader. In distributed systems, sometimes the dead leader gets back to life. Always having only 1 leader is not easy.
  • Leader sometimes prevent system’s scalability. Leader can be a bottleneck when number of follower processes is increased.
  • If a leader is making something wrong, usually it causes problems to a huge amount of whole system.

Algorithm of Leader election

There are many leader election (distributed consensus) algorithms such as Leases[2], Raft[3] and Paxos[4]. In this section, let me explain a simple leader election algorithm which is none of them. I will use Token Ring network as an example, which is described in my previous post. I already described that distributed snapshot algorithm can detect that there is no token in the network. After the detection, it needs to regenerate a token in the network. In this case, we can do leader election and let the leader to to that. The same as previous post, we assume that each channel is FIFO.

Each process has its own ID, ID must be a unique, natural number. In this algorithm, a process which has the most biggest ID (and participating the election as a candidate) will become a leader. To know other processes’ ID, they will work like this pseudo code. As already described in previous post, in token ring network, processes are located like ring. each node has left node and right node, and it is not connected to any other nodes.

In this example, because the purpose of election generating token, follower processes don’t know which process is a leader. Then only know that they are not leader.

Class Process {
int ID
String status

Process() {
this.status = "waiting"
}

onReceiveElectionStart() {
if this.status == "waiting" {
sendToLeft(this.ID)
this.status = "candidate"
}
}
onReceiveIDFromRight(int ID) {
if ID == this.ID && this.status == "candidate" {
// My ID came from right
this.status == "leader"
return
}
if this.status == "waiting" || ID > this.ID {
// should not be a leader
this.status == "not_leader"
}
if ID != this.ID {
sendToLeft(ID)
}
}
}

Each process has ID and status. The status can be “waiting”… waiting leader election starts, “candidate” … is ready to be a leader, “not_leader” … not leader, “leader” … is leader. When token is missing, arbitrary processes receive Election Start Message. Only they becomes candidate of leader in the election. Processes which are not participating the election receives ID, but they just pass it to next process. If candidates received bigger ID than mine, then pass the received ID. Finally, only 1 process will receive its ID, which is originally passed by itself.

2 Leaders and no leaders

Because this is distributed systems, sometimes leader disappears all of a sudden. When leader disappears, basically, next leader will be elected quickly. However, next leader doesn’t know that previous leader finished its task. Usually, finishing tasks and telling other process tasks are finished are not atomic; so next leader might miss finishing previous task or execute task twice.

Or, 2 leaders might exist when trouble is happening in the system. In both case, the point is idempotency; if each task is idempotent, next leader can execute previous task confidently. Or, even multiple leaders can be acceptable.

Conclusion

In this article, it is described that how leader election work. Basically leader election is very useful tool for human, but it also causes another complexity. To use it in a better way, it is important to take care of idempotency always.

References

--

--

Hidetatsu YAGINUMA

Hidetatsu loves infrastructures, database, concurrent programming, transactions, distributed systems… https://github.com/hidetatz https://hidetatz.io