Let’s study distributed systems — 3. Distributed snapshots

Hidetatsu YAGINUMA
7 min readDec 22, 2019
From: Flickr

Hi there! Thank you for reading through my series of posts — Let’s study distributed systems. In my previous post, I wrote things about clock.

In this article, I am going to write something about distributed snapshots. First, let’s study what distributed snapshots is, and how it works. Then, let’s take a look in how we can use it.

What is distributed snapshots?

The problem is set like “How to capture current global state of a distributed systems”. Global state is the state of whole distributed system. Because distributed systems has multiple processes and each process works independently, it’s difficult to capture the status of every process at the same time.

In distributed snapshots world, we assume some things;

  • Channels are error-free
  • Buffer capacity is infinite
  • Message delay is indefinite, but finite
  • Message delivery is in-order — Messages are always sent in FIFO. No order change can be happen

Why is it difficult? Let’s see a simple example

Let’s say there are 2 processes which manages a state of 2 user’s coins. Each user already has 1000 coins, and they can send some of their coins to the other user. And they cannot consume coins in any other way.

Initial state is like this; Alice and Bob has 1000 coins.

I gave up to write diagram by google drawing — It’s too hard

After sending 100 coins from Alice to bob, then global state will be like this.

Be taking distributed snapshot, we want to know how many coins each user has. Because both user has 1000 coins, there are 2000 coins in the whole system.

When we try to take snapshots of both process, Bob tried to send 200 coins to Alice, unfortunately. Like;

  • Process B (Bob) sends 200 coins to Process A (Alice)
  • Process B captures its state as snapshot and save it
  • Process A captures its state as snapshot and save it
  • Process A receives 200 coins from B
Snapshot shows only 1800 coins

In this case, it seems that 200 coins are missing from system — which is actually in the channel between B and A.

This case causes missing resource, but in another case, there can be resource duplication. Suppose;

  • Process A captures its state as snapshot and save it
  • Process A sends 200 coins to Process B
  • Process A receives 200 coins from B
  • Process B captures its state as snapshot and save it
In this case, there are 2200 coins!

In this case total amount of coins is 2200 — Seems something wrong is happening in the system.

These snapshots are obviously not useful. These won’t help to understand what the system’s current state is. The problem of this snapshots model is that it’s only taking process’s snapshot. But we need channel’s snapshot as well. In next section, let’s explore how distributed snapshots algorithm can resolve this problem.

How distributed snapshots algorithm work?

In distributed snapshots, there is a special message which is called “Marker” . Marker represents a logical point of time.

The algorithm starts from a process. This is the flow of it;

/* Process P which starts the algorithm */
* Process P saves its state
* Send marker message to all the channels which is connected to P
* Loop - P receives a message from a channel until P received marker message from all the channels
* If the message is not a marker message, save message in channel's state
// -----

/* Other processes Pn */
* Process Pn saves its state when it received first marker message
* Send marker message to all the channels which is connected to Pn
* Loop - Pn receives a message from a channel until Pn received marker message from all the channels
* If the message is not a marker message, save message in channel's state

Let's see how it works using previous coins example.

M is marker. The order of M and 200 coins must not change.
  • Process A saves its state
  • Process A sends marker to Process B
  • Process B saves its state
  • Process B sends marker to Process A
  • Process B received marker message from all the channels, loop immediately stops
  • Process A received marker message from all the channels, loop immediately stops

Let’s see another example. This is an image which represents 3 processes system.

  • P2 starts the algorithm. P2 sends m2 (message2) then saves its state s2
  • P2 sends marker to P1 and P3
  • P2 receives m3, which comes before marker from P3 then P2 saves m3. Do the same thing for m1 from P1
  • P1 receives marker from P2, then P1 saves s1. Then, P1 sends marker to P2 and P3
  • Then P1 receives m4, but it’s not saved in the snapshot because it’s after marker.
  • P3 receives marker, then P3 saves s3. Then, P3 sends marker to P1 and P2. Because m5 is before marker from P1, it’s also saved.

Finally, we could get a set of process’s state (s1, s2, s3) and a set of channel’s state (m1, m3, m5) as this system’s snapshot.

Message delay

In the algorithm, message can delay as written above. Because of it, the algorithm can process different snapshot, even if the process’s work is the same. It doesn’t mean that these snapshots are invalid.

Distributed snapshots use cases

There are some use cases which distributed snapshots algorithm can be applied. In this section, 2 use cases are introduced.

Fault detection — Token Ring Network

Token ring network is a standard of networking technology which can be use in Data Link layer in OSI model. In Ethernet, there are multiple processes which can send messages. However, when multiple processes send message at the same time, connection will be invalid. To avoid it, when they send a packet in a network and it conflicts with other packet, then it stops sending and wait for random time.

In Token Ring, processes are placed like “ring” and there is always a token in the ring. There are 2 kind of tokens: Free token which represents “connection is free” and “new message can be sent”. Busy token which represents “connection is busy” and “do not send new message”.

At first, there is only 1 free token in the ring. The token is always going through the ring. When P1 wants to sent message to P2, P1 “catches” the free token, then changes it to a busy token and attach the message to it. When P2 receives it, it returns to P1, then P1 changes it to a free token again. When the token is busy token, no other processes can send message. To know more, please refer to IEEE 802.5.

In token ring network, there must not be multiple tokens. Also, missing token can cause some problems. When there are multiple tokens, then multiple processes might be able to send messages at the same time. If a token is missing, then no processes can send messages. To make token ring network work properly, the number of token must be always 1. To detect the state of token ring network, distributed snapshot algorithm can be applicable. After taking snapshot, if “number of tokens in processes” + “number of tokens in messages” != 1 then, it’s in invalid state!

Recovery from incident

Another use case is recovering from broken state. Because this is “snapshot”, we can create a system which can be recovered to an arbitrary snapshot. First, let’s see how processes are broken.

All of S1, S2, S3 must be recovered

In the diagram, P3 is in broken state. To recover the system, we need to rollback P1, P2 and P3.

When recovering to a snapshot, each process’s state must be recovered to one which is in snapshot, AND channel’s state must be recovered to contain saved messages.

In this diagram, even though broken process is only P3, other processes must be recovered as well. If only P3 gets recovered, then m5 must be missing. It causes invalid state, so both of processes and channels must be recovered.

Conclusion and What’s next?

Thank you for reading this article! In this article, we took a look into how distributed snapshots are created, and how it can be used in a real world.

In next article, we will be talking about leader election. In distributed systems, we sometimes need to elect only 1 representative process and every process must know that. Leader election algorithm is often used in server clustering.

Thank you.

References

--

--

Hidetatsu YAGINUMA

Hidetatsu loves infrastructures, database, concurrent programming, transactions, distributed systems… https://github.com/hidetatz https://hidetatz.io