Let’s study distributed systems — 1. Introduction

Why do we create distributed systems?

In most cases, there are two main reasons to build a distributed system:

  • To achieve performance that a single computer cannot reach
  • To keep working when some computers become unavailable

Hadoop — distributed data processing

Apache Hadoop is a framework that lets us process large data sets across clusters of computers.

HDFS

HDFS (Hadoop Distributed File System) is a distributed file system designed to be used with Hadoop. To use many computers effectively, we want to spread data across them as evenly as possible. We also need to copy the same data to multiple nodes so that data is not lost when a node crashes. In HDFS, each Datanode holds a part of the data.
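To make the two ideas above concrete (spreading data evenly and keeping several copies), here is a minimal sketch in Python. It is not how HDFS actually places replicas; the round-robin rule, the function names `place_blocks` and `surviving_blocks`, and the node names `dn1` to `dn4` are all assumptions made up for illustration.

```python
from collections import Counter


def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Start at a different node for each block so the load spreads evenly.
        placement[block] = [
            nodes[(block + i) % len(nodes)] for i in range(replication)
        ]
    return placement


def surviving_blocks(placement, crashed):
    """Return the blocks that still have a replica after `crashed` fails."""
    return [
        block
        for block, replicas in placement.items()
        if any(node != crashed for node in replicas)
    ]


nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical Datanode names
placement = place_blocks(8, nodes, replication=3)

# Count how many replicas each node holds.
load = Counter(node for replicas in placement.values() for node in replicas)
print(load)
print(surviving_blocks(placement, "dn2"))
```

With 8 blocks, 4 nodes, and 3 copies of each block, every node ends up holding the same number of replicas, and every block is still readable after any single node crashes.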

(From HDFS Architecture guide)

Hadoop MapReduce

For the same reason, Hadoop also distributes tasks (the processing of each piece of data). Data processing in Hadoop consists of two parts: the MapReduce processing platform and the MapReduce application. In this article I omit the description of the MapReduce application because it is not closely related to distributed systems themselves. If you are interested, you can refer to the official tutorial.

The benefits of distributed systems

Because Hadoop works as a distributed system, we get some benefits:

  • If the dataset is huge, processing time can be reduced
  • If one of the Datanodes or TaskTrackers crashes, processing can continue
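The second benefit can be sketched as follows. This is a toy model, not Hadoop's real scheduler: the function `run_job`, the worker names `tt1` to `tt3`, and the "processing" step (just measuring the length of a string) are assumptions for illustration.

```python
def run_job(tasks, workers, crashed=frozenset()):
    """Run every task on a live worker, skipping crashed ones.

    Hypothetical scheduler: each task has a preferred worker, and falls
    back to a live worker if the preferred one is down.
    """
    live = [w for w in workers if w not in crashed]
    if not live:
        raise RuntimeError("no live workers left")
    results = {}
    for i, task in enumerate(tasks):
        preferred = workers[i % len(workers)]
        # If the preferred worker is down, reassign the task to a live one.
        worker = preferred if preferred not in crashed else live[i % len(live)]
        results[task] = (worker, len(task))  # "processing" = measuring length
    return results


workers = ["tt1", "tt2", "tt3"]  # hypothetical TaskTracker names
results = run_job(["a", "bb", "ccc"], workers, crashed={"tt2"})
print(results)
```

Even though `tt2` is down, all three tasks finish, because the task that would have gone to `tt2` is handed to another worker.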

So what’s the downside?

However, you will also face some of the difficulties of distributed systems:

  • If the Namenode or JobTracker crashes, what happens?
  • How do we ensure that the multiple copies of data across Datanodes are the same?

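Will my system really become faster?

We also need to consider whether we can really get the benefits of a distributed system. Communication between computers over a network is far slower than access to local memory, so distributing work does not automatically make it faster.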

(From: Latency Numbers Every Programmer Should Know)
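As a rough way to reason about this question, here is a back-of-the-envelope sketch in Python. The cost model and the constants are assumptions: the constants are only roughly in line with commonly quoted latency figures, and the model assumes a single coordinator ships all the data to the workers over one network link before they process it in parallel.

```python
# Assumed, approximate costs (illustrative, not measured).
SEND_MS_PER_MB = 10.0  # sending about 1 MB over a 1 Gbps link
ROUND_TRIP_MS = 0.5    # one round trip inside a datacenter


def single_machine_ms(size_mb, process_ms_per_mb):
    """Time to process everything locally on one machine."""
    return size_mb * process_ms_per_mb


def distributed_ms(size_mb, process_ms_per_mb, workers):
    """Time when the data is first shipped to `workers` machines.

    Transfer is serial (one link); compute is split evenly across workers.
    """
    transfer = size_mb * SEND_MS_PER_MB
    compute = (size_mb / workers) * process_ms_per_mb
    return transfer + compute + ROUND_TRIP_MS


size_mb = 1000
for label, cost in [("cheap scan", 0.25), ("heavy compute", 100.0)]:
    local = single_machine_ms(size_mb, cost)
    spread = distributed_ms(size_mb, cost, workers=10)
    print(f"{label}: single={local} ms, distributed={spread} ms")
```

Under these assumptions, a cheap scan over data in memory is much slower when distributed, because the network transfer dominates. A job that spends 100 ms of computation per MB does get faster with 10 workers. Whether distribution pays off depends on how the computation cost compares with the cost of moving data.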

Conclusion and What’s next?

In this article, I tried to explain what distributed systems are and why we might want to choose them. I would be really happy if this article gives you a chance to dive into distributed systems.

Hidetatsu YAGINUMA


Hidetatsu loves infrastructures, database, concurrent programming, transactions, distributed systems… https://github.com/hidetatz https://hidetatz.io