Research Papers + Blog Posts to Read If You Are Awed by Distributed Systems

Akshay Kumar
2 min read · Jan 16, 2021

Lately, I fell in love with Distributed Systems. Mind you, distributed systems are really hard, and I hated working with them initially :) But once I made peace with their nuances, I started loving every aspect of them. This is a list of research papers + blog posts, collected from various sources, that I have read or bookmarked to read later. I will keep updating the list as and when I get a chance to.

Disclaimer(s):

I don’t maintain the URLs and they may break at any time. If that happens, you can simply search for the title on Google and read the paper online.

I am certainly inclined towards papers released by Google, since theirs are the papers I read more often than not. However, there are plenty more amazing ones out there, and I will be happy to include them too. Please respond in the comments.

Distributed Data Processing:

> A Relational Model of Data for Large Shared Data Banks [I would say SQL and RDBMSes were born here. This is not a distributed systems paper, but I believe anyone working with databases should read it]

> The Rise of Cloud Computing Systems [Has amazing breadth and describes a lot of technologies that make Distributed Systems tick]

> MapReduce: Simplified Data Processing on Large Clusters

> Vision Paper: Towards an Understanding of the Limits of Map-Reduce Computation

> The Google File System [Inspired HDFS]

> Dremel: Interactive Analysis of Web-Scale Datasets [Describes a few techniques that BigQuery uses for its blazing-fast speed. I summarized the encoding technique in detail here]

> Bigtable: A Distributed Storage System for Structured Data [Inspired Cassandra]

> Datacenter-scale Computing

> Spanner: Google’s Globally Distributed Database [Made the evolution of distributed relational databases, such as CockroachDB, possible. The underlying implementations are different, though.]

> Spark: Cluster Computing with Working Sets

Distributed Cluster Management Systems:

> Large-scale cluster management at Google with Borg

> Omega: flexible, scalable schedulers for large compute clusters

> Kubernetes: Scheduling the Future at Cloud Scale

> Borg: the Next Generation

> Kubernetes 101: Pods, Nodes, Containers, and Clusters [An introductory Medium blog post by Daniel Sanche]

Misc:

> What is Load Balancing [I had no idea about load balancing and this is an amazing post that explains the concept from a beginner’s standpoint]

> Why Vector Clocks are Easy

> Why Vector Clocks Are Hard

> Why Cassandra doesn’t need vector clocks

> The Part-Time Parliament [Explains Paxos, a distributed consensus protocol]

> In Search of an Understandable Consensus Algorithm (Extended Version)

> “The hows and whys of a distributed SQL database” by Alex Robinson

> CRDTs and the Quest for Distributed Consistency

> Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services [This paper explains the CAP theorem]

If you are interested in even more research papers on Distributed Systems, you can find many of them from Google here.

