Building a Redis Clone from Scratch – Part 1: In-Memory KV Store with TCP
Greetings,
I hope you all are doing well. Today I'm starting a new project series on this newsletter. We will build a distributed key-value store, like Redis, from scratch using Java.
Obviously it won't be as fully featured as Redis, but it will cover the core concepts: distributed systems, networking, storage engines, consistency models, and consensus algorithms.
One thing to note is that I'm not an expert in distributed systems; I'm just a curious person who is starting to learn them. That's why I've decided to build this in public and get everyone's feedback.
I might make mistakes while developing this project, and I will fix them along the way.
What are we going to build?
We will start with a basic hashmap and then try to convert it into a distributed key-value store. Below are the things we are going to cover.
In-Memory KV Store
TCP Server Networking
Simple Command Parser
Snapshotting (Data Persistence)
Multi Node Design & Clustering
Raft Algorithm for leader election
Consistent Hashing for data partitioning
Gossip Protocol for Communication
Expiry & TTL
Pub / Sub Mechanism
Much more if my dopamine does not die…
Part 1 Outline
Let's first discuss what we are going to implement in this part.
Learn what a key-value store is.
Build a simple key-value storage system.
Build a TCP server that handles GET, SET, and DEL commands and performs the corresponding action.
What is a Key Value Store?
A key-value store is one of the simplest forms of databases. Think of it like a dictionary or a hashmap, where you store data as a pair of key and value.
The key is a unique identifier (like a word).
The value is the data associated with that key (like the definition).
So when you ask the store for a value using a key, it gives it to you instantly. There’s no fancy querying or relations like in SQL—just raw speed and simplicity.
Under the hood, in-memory key-value stores like Redis keep their data in a hash table (a hashmap), which gives O(1) average-case time complexity for reads and writes. That is what makes them so fast.
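To make the idea concrete, here is a minimal sketch that uses Java's built-in `HashMap` as a tiny key-value store. The class name `KeyValueIdea` is only an illustration and is not part of the project.

```java
import java.util.HashMap;
import java.util.Map;

class KeyValueIdea {
    public static void main(String[] args) {
        // The map plays the role of the store: each key maps to one value.
        Map<String, String> dictionary = new HashMap<>();

        dictionary.put("name", "sushant");            // store a key-value pair
        System.out.println(dictionary.get("name"));   // prints "sushant"

        dictionary.remove("name");                    // delete the pair
        System.out.println(dictionary.get("name"));   // prints "null": key is gone
    }
}
```

Everything we build in this part is essentially this map, wrapped in a class and exposed over the network.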
Creating A Simple Storage System
Let's first start by creating a new Java Maven project. Use your favourite IDE to spin up a project and create a new package `store`.
In this package, create a new class named `KVStore`, which will hold all the logic for storing key-value data.
Here is the code for the KVStore class.
The code is fairly self-explanatory: the KVStore class is responsible for storing, retrieving, and deleting data in a ConcurrentHashMap through the methods it exposes.
It follows the Singleton Design Pattern so that only one instance of this class exists for the lifetime of the application.
This class also implements the Serializable interface. I'll explain why in a later part, to keep things simple for now.
We are using ConcurrentHashMap to store data because it is thread-safe. Our TCP server handles each connection on its own thread, so several clients can read and write at the same time, and individual get, put, and remove operations stay safe without any extra locking from us.
We now have a store that can save, retrieve, and delete key-value data.
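For reference, a class matching the description above could look roughly like the sketch below. The method names `getInstance`, `set`, `get`, and `delete` are my assumptions, and the class is left package-private so the snippet compiles on its own; in the project it would be `public` and live in the `store` package.

```java
import java.io.Serializable;
import java.util.concurrent.ConcurrentHashMap;

class KVStore implements Serializable {
    private static final long serialVersionUID = 1L;

    // Singleton: the one shared instance, created when the class is loaded.
    private static final KVStore INSTANCE = new KVStore();

    // Thread-safe map holding all key-value pairs.
    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();

    // Private constructor so nobody else can create a second instance.
    private KVStore() { }

    public static KVStore getInstance() {
        return INSTANCE;
    }

    // Stores the value under the key, replacing any previous value.
    public void set(String key, String value) {
        data.put(key, value);
    }

    // Returns the value for the key, or null if the key is not present.
    public String get(String key) {
        return data.get(key);
    }

    // Removes the key; returns true only if the key existed.
    public boolean delete(String key) {
        return data.remove(key) != null;
    }
}
```

Note that ConcurrentHashMap does not accept null keys or values, so `set` throws a NullPointerException if either argument is null.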
TCP Server To Handle Requests
External applications will talk to our database over raw TCP instead of HTTP. HTTP itself runs on top of TCP, so using it here would only add header and parsing overhead to every request and slow our database down.
Let’s create another package `server` and create a class `TCPServer` in it.
Let's start by giving the TCPServer class the basic things it needs.
Our class will have a port on which our TCP server will run and an instance of KVStore.
After that, create a method `startServer` that will start the TCP server and wait for connections. When a connection is accepted, it will spawn a new thread and give this connection to another method `handleConnection` that we will implement next.
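The accept loop described above can be sketched as follows. To keep the snippet compilable on its own, a plain `Map` stands in for the KVStore instance, and `handleConnection` is only a placeholder that sends one greeting line and closes the connection; both of these are my assumptions, not the final code.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Map;

class TCPServer {
    private final int port;
    // Stand-in for the KVStore instance (assumption, keeps the sketch self-contained).
    private final Map<String, String> store;

    TCPServer(int port, Map<String, String> store) {
        this.port = port;
        this.store = store;
    }

    // Blocks forever: accepts connections and hands each one to its own thread.
    void startServer() throws IOException {
        try (ServerSocket serverSocket = new ServerSocket(port)) {
            System.out.println("Listening on port " + port);
            while (true) {
                Socket client = serverSocket.accept(); // waits for the next client
                new Thread(() -> handleConnection(client)).start();
            }
        }
    }

    // Placeholder handler: greets the client, then closes the socket.
    private void handleConnection(Socket client) {
        try (Socket socket = client;
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            out.println("connected, " + store.size() + " keys stored");
        } catch (IOException e) {
            System.err.println("Connection error: " + e.getMessage());
        }
    }
}
```

One thread per connection is the simplest model to reason about. It does not scale to a very large number of clients, but it is enough for a first version.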
To understand this code, let's first look at how external applications will send commands to our database.
An application can connect to our TCP server and send us the following commands over TCP:
SET name sushant
GET name
DEL name
From the TCP connection we read a line of text as input, split it on " " (a single space), and store the pieces in the tokens array.
Then, in the switch statement, we check whether the request starts with SET, GET, or DEL and perform the appropriate action by calling the KVStore methods.
Before executing a command, we use the tokens array to validate the request against some rules. For example:
The GET command must have exactly 2 tokens (GET name).
The SET command must have exactly 3 tokens (SET name Sushant).
The DEL command must have exactly 2 tokens (DEL name).
If any of these conditions fails, we do not execute the operation and instead print the correct usage of the command. This is what I call a simple parsing protocol.
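The parsing and validation rules above can be sketched as a single method that takes one request line and returns the reply. The class name `CommandParser`, the method `execute`, and the reply strings ("OK", "(nil)", "1", "0", and the usage messages) are assumptions made for this illustration, and a plain `Map` stands in for KVStore so the snippet runs on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CommandParser {
    // Parses one request line, applies it to the store, and returns the reply text.
    static String execute(String line, Map<String, String> store) {
        String[] tokens = line.trim().split(" ");
        switch (tokens[0]) {
            case "SET":
                if (tokens.length != 3) return "Usage: SET <key> <value>";
                store.put(tokens[1], tokens[2]);
                return "OK";
            case "GET":
                if (tokens.length != 2) return "Usage: GET <key>";
                String value = store.get(tokens[1]);
                return value != null ? value : "(nil)";
            case "DEL":
                if (tokens.length != 2) return "Usage: DEL <key>";
                return store.remove(tokens[1]) != null ? "1" : "0";
            default:
                return "Unknown command: " + tokens[0];
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new ConcurrentHashMap<>();
        System.out.println(execute("SET name sushant", store)); // OK
        System.out.println(execute("GET name", store));         // sushant
        System.out.println(execute("DEL name", store));         // 1
        System.out.println(execute("GET name", store));         // (nil)
        System.out.println(execute("GET", store));              // Usage: GET <key>
    }
}
```

Because the line is split on single spaces, a value that contains a space (for example `SET name John Doe`) produces four tokens and is rejected with the usage message. That is a known limitation of such a simple protocol.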
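Let's Plug Things Together
In the `main` method, get the `KVStore` instance. Since it is a singleton, you obtain the one shared instance instead of constructing a new one each time. Then create an instance of `TCPServer`, passing it the KVStore instance and the port on which you want the server to listen.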
How To Test?
Once you start your application, it will listen on port 6349 (or whichever port you configured) for TCP connections, and you can use Telnet to connect and test your logic.
Use the command:
telnet 127.0.0.1 6349
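If Telnet is not installed on your machine, a few lines of Java can do the same job. The sketch below is a hypothetical helper, not part of the project: `KVClient` and `send` are names I made up, the port is only an example, and it assumes the server answers every command with a single line of text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

class KVClient {
    // Opens a connection, sends one command line, and returns the first reply line.
    static String send(String host, int port, String command) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println(command);   // autoflush sends the line immediately
            return in.readLine();   // assumption: the server replies with one line
        }
    }

    public static void main(String[] args) throws IOException {
        int port = 6349; // assumption: change this to the port your server listens on
        String[] commands = {"SET name sushant", "GET name", "DEL name"};
        for (String command : commands) {
            System.out.println(command + " -> " + send("127.0.0.1", port, command));
        }
    }
}
```

Each call opens a fresh connection, which is wasteful but keeps the example short. With Telnet you would keep one connection open and type all the commands in the same session.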
Summary
So in this part we created a simple key-value storage system and a TCP server that accepts commands over TCP and applies them to our key-value database. In the next posts, we will convert this single-node application into a distributed system. You can subscribe to this newsletter for free, and I'll make sure you get those posts directly in your inbox.