Inside Cassandra: The Internals That Make It Fast and Massively Scalable
Hi,
I’ve been reading about Apache Cassandra for the last two weeks. I went through a lot of online blogs and then ended up reading Cassandra: The Definitive Guide. In this post we will discuss what makes Cassandra so fast and scalable.
What is Apache Cassandra?
It is a NoSQL database aimed at high speed and massive scalability. It runs in a distributed environment without needing tons of configuration, and its storage engine is optimized for writes.
What makes Cassandra different from SQL Databases?
Horizontally Scalable: Cassandra can be easily horizontally scaled by adding more nodes without doing a lot of configuration.
Write Heavy: Cassandra handles write-heavy workloads extremely well compared to typical SQL databases.
Flexible Schema: In Cassandra a row does not have to store a value for every column, and new columns can be added to a table without rewriting existing rows, while SQL databases enforce a more rigid schema.
No Single Point of Failure: In a typical SQL setup, a single master that all writes go through can become a single point of failure. Cassandra has no master-slave mechanism; every node is a peer, so it avoids a single point of failure.
Data Distribution: In Cassandra data is distributed among all nodes with the help of consistent hashing. This balances data storage in a distributed environment.
Why is Cassandra scalable?
Cassandra is designed to run in a distributed environment as a group of Cassandra instances (nodes) called a cluster. A cluster can technically be a single node, but you need at least as many nodes as your replication factor to get real fault tolerance, so three or more is a common starting point. The client can connect to any node in this cluster and perform read/write operations.
The node that receives a client request is called the coordinator node. For a write, the coordinator hashes the partition key and uses consistent hashing to determine which nodes (replicas) are responsible for storing the data. Similarly, for a read, it uses the same technique to find which nodes hold the data and fetches it from them.
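To make the consistent hashing idea concrete, here is a minimal sketch in Python. It is not Cassandra's implementation: the class name HashRing, the node names, and the use of MD5 are assumptions made for illustration (Cassandra's default partitioner is Murmur3-based). The idea is the same, though: hash the key to a position on a ring, then walk clockwise to find the nodes that own it.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key to a position on the ring (MD5 is used only for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical toy ring; each node owns several positions (virtual nodes)."""

    def __init__(self, nodes, vnodes=8):
        self._ring = sorted(
            (token_for(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas_for(self, key: str, replication_factor: int = 1):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        wanted = min(replication_factor, self._node_count)
        start = bisect.bisect_right(self._tokens, token_for(key))
        owners = []
        for offset in range(len(self._ring)):
            if len(owners) >= wanted:
                break
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
        return owners


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.replicas_for("user:42", replication_factor=2))
```

Because only the positions next to a newly added node change hands, adding a node moves a fraction of the keys instead of reshuffling everything, which is what makes scaling out cheap.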
Of course a lot more happens during these read and write operations, but we will see that later when we discuss speed.
Adding new nodes to a cluster
When a cluster is created, some nodes are marked as seed nodes. When a new node is added to the cluster, it contacts these seed nodes to learn about the cluster topology, which is then kept up to date through Cassandra's gossip protocol. Once the new node has learned enough about the cluster, it can operate on its own.
What makes Cassandra so much faster?
Log Structured Storage
Cassandra never updates data “in place.” Instead, all writes are append-only: each write goes to a commit log on disk and to an in-memory structure (memtable), and memtables are later flushed to disk as immutable SSTables. Disk writes are therefore sequential, not random, and sequential I/O is much faster than random I/O, dramatically so on spinning disks and still noticeably on SSDs.
Think of Cassandra as a book: instead of rewriting existing pages, it adds new pages at the end and regularly cleans up outdated pages in the background (a process called compaction).
No Joins, No Foreign Keys
There is no concept of joins or foreign keys, because a join has to fetch and combine data from multiple tables, which in a distributed database means expensive work across nodes. Instead Cassandra relies on denormalization: the same data is stored in multiple tables, each shaped for one query, trading extra storage for faster reads.
Distributed Nature
There is no single master in Cassandra, so any node can handle any operation. Work is spread across the cluster instead of funnelling through one node, although a badly chosen partition key can still create hot spots.
In Memory Caching
Cassandra uses a memtable (discussed later in this post), an in-memory data structure that holds recent writes until they are flushed to disk, along with optional caches such as the key cache and row cache. Hot data can be served directly from memory, which is much faster than a disk read.
Bloom Filters
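If data is not available in memory, we must go to disk. But before reading an SSTable, Cassandra checks its Bloom filter, which can answer either “definitely not here” or “possibly here.” A “definitely not” answer lets Cassandra skip that SSTable entirely, which prevents unnecessary disk reads.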
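A Bloom filter is easy to sketch. The version below is a toy, not Cassandra's implementation: the class name, the bit-array size, and the use of SHA-256 with a salt are all assumptions chosen for readability. What it does show is the key property: a lookup can return a false positive, but never a false negative.

```python
import hashlib


class BloomFilter:
    """Hypothetical toy Bloom filter for illustration only."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive several positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("user:42")
print(sstable_filter.might_contain("user:42"))  # True: every added key is reported
```

A larger bit array or a better-tuned number of hash functions lowers the false-positive rate at the cost of memory, which is the trade-off any Bloom filter has to make.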
Tunable Consistency
In Cassandra, copies of data are stored on several nodes (replicas) for fault tolerance. For each read or write, we can configure how many replicas must acknowledge the operation before it counts as successful. If you set the consistency level to ONE, Cassandra returns success as soon as the first replica responds, which is much faster than waiting for all replicas but gives weaker consistency guarantees.
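The arithmetic behind consistency levels fits in a few lines. The helper names below are hypothetical; the level names (ONE, TWO, THREE, QUORUM, ALL) and the quorum formula are Cassandra's. The sketch also shows the usual rule of thumb: a read is guaranteed to overlap the latest write when read replicas plus write replicas exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def acks_required(level: str, replication_factor: int) -> int:
    """How many replicas must respond for the given consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    if level not in levels:
        raise ValueError(f"unsupported consistency level: {level}")
    return levels[level]


def reads_see_latest_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    writes = acks_required(write_level, replication_factor)
    reads = acks_required(read_level, replication_factor)
    return writes + reads > replication_factor


print(acks_required("QUORUM", 3))                     # 2
print(reads_see_latest_write("QUORUM", "QUORUM", 3))  # True
print(reads_see_latest_write("ONE", "ONE", 3))        # False
```

So ONE is the fastest choice, while QUORUM for both reads and writes is a common middle ground when stale reads are not acceptable.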
Important Data Structures & Mechanisms
There are several data structures and mechanisms in Cassandra that make writing and reading data much faster.
Commit Log
An append-only log on disk that records every write in sequence. Its job is durability: if a node crashes before its memtables have been flushed, Cassandra replays the commit log on restart to recover those writes. Because appending to the log is a cheap sequential write, the client does not have to wait for data files to be updated. A write is acknowledged once it has been recorded in the commit log and applied to the memtable.
Memtable
It is an in-memory write buffer for recent writes. Every write is applied to the memtable in addition to the commit log. Once a memtable reaches its configured size threshold, its data is flushed (stored permanently) to disk as an SSTable, and a new memtable takes its place. Each table has its own memtable.
SSTable
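SSTable stands for Sorted String Table. It is the on-disk structure that actually stores the data: an immutable file holding rows sorted by partition key (more precisely, by the partition key's token). Once written, an SSTable is never modified. Updates and deletions are written as new entries (deletions as markers called tombstones) that end up in newer SSTables, and background compaction later merges SSTables and discards obsolete versions.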
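Since SSTables are immutable, old and new versions of a row can sit in different files until they are merged. The sketch below shows that merge in a simplified form. The Cell type and compact function are hypothetical names, each SSTable is modelled as a list sorted by key, and a value of None stands in for a deletion marker. Real Cassandra compaction is more involved, for example it keeps deletion markers around for a grace period instead of dropping them immediately.

```python
import heapq
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    key: str
    timestamp: int
    value: Optional[str]  # None marks a deletion in this toy model


def compact(*sstables):
    """Merge sorted SSTables, keeping only the newest version of each key."""
    merged = []
    # heapq.merge streams the sorted inputs in (key, timestamp) order,
    # so for each key the older versions arrive before the newer ones.
    for cell in heapq.merge(*sstables, key=lambda c: (c.key, c.timestamp)):
        if merged and merged[-1].key == cell.key:
            merged[-1] = cell  # a newer version replaces the older one
        else:
            merged.append(cell)
    # Simplification: drop deleted rows right away.
    return [cell for cell in merged if cell.value is not None]


older = [Cell("alice", 1, "alice@old.example"), Cell("bob", 1, "bob@example.com")]
newer = [Cell("alice", 2, "alice@new.example"), Cell("bob", 3, None)]
print(compact(older, newer))
```

Here the result contains only the newer row for alice, because bob's latest entry is a deletion. The inputs are never modified, and the output would be written as a brand new SSTable.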
How does data writing work in Cassandra?
The client sends a write request to any Cassandra node, which becomes the coordinator.
The coordinator node identifies which nodes are responsible for storing the data using consistent hashing.
The coordinator node sends the data to those owner (replica) nodes.
Each owner node appends the write to its commit log, applies it to its memtable, and returns a success response.
Once enough replicas have responded to satisfy the consistency level, the coordinator reports success and the client application continues.
In the background, full memtables are flushed to disk as SSTables. The client application does not need to wait for this process.
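The write path on a single node can be sketched as a small class. This is a toy model under stated assumptions: ToyTable and flush_threshold are made-up names, the commit log is a Python list instead of a file, and the flush happens inline instead of in a background thread. It follows the way Cassandra's storage engine is usually described, where a write is recorded in the commit log and the memtable before it is acknowledged.

```python
class ToyTable:
    """Hypothetical single-node model of commit log, memtable, and SSTables."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []  # stand-in for an append-only file on disk
        self.memtable = {}    # in-memory buffer of recent writes
        self.sstables = []    # immutable, sorted snapshots; newest last
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> str:
        self.commit_log.append((key, value))  # 1. sequential append for durability
        self.memtable[key] = value            # 2. apply to the in-memory buffer
        if len(self.memtable) >= self.flush_threshold:
            self._flush()                     # done in the background in a real system
        return "ok"                           # 3. acknowledge the write

    def _flush(self) -> None:
        # Persist the memtable as an immutable, key-sorted SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Flushed writes no longer need to be replayed after a crash.
        self.commit_log.clear()


table = ToyTable(flush_threshold=2)
table.write("bob", "bob@example.com")
table.write("alice", "alice@example.com")
print(table.sstables)  # one SSTable, sorted by key
```

Notice that nothing in write ever seeks into an existing file and modifies it, which is the whole point of the design.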
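How does data reading work in Cassandra?
The client sends a read request.
The coordinator node identifies which nodes have the required data.
The coordinator node requests the data from an owner node (or several, depending on the consistency level).
The owner node checks whether the data is present in the row cache, if it is enabled.
If not, it checks the memtable and then, for each SSTable, the Bloom filter to see whether that SSTable could contain the data at all.
For SSTables that might contain it, the key cache is checked to find the position of the data in the file without searching the index.
The data is then read from those SSTables, merged with anything found in the memtable, and the most recent version is returned.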
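The lookup order on a single node can be sketched like this. It is a deliberate simplification: the function name and the dictionary layout are hypothetical, a plain Python set stands in for each SSTable's Bloom filter, the key cache is left out, and the first hit wins instead of merging versions by timestamp as Cassandra does.

```python
def read(key, row_cache, memtable, sstables):
    """Return (value, number_of_sstables_actually_read)."""
    if key in row_cache:  # fastest path: the whole row is cached in memory
        return row_cache[key], 0
    if key in memtable:   # recent writes that have not been flushed yet
        return memtable[key], 0
    disk_reads = 0
    for sstable in reversed(sstables):  # newest SSTable first
        if key not in sstable["filter"]:
            continue                    # "definitely not here": skip the disk read
        disk_reads += 1
        if key in sstable["rows"]:
            return sstable["rows"][key], disk_reads
    return None, disk_reads


sstables = [
    {"filter": {"alice", "bob"}, "rows": {"alice": "v1", "bob": "v1"}},  # older
    {"filter": {"alice"}, "rows": {"alice": "v2"}},                      # newer
]
print(read("alice", {}, {}, sstables))  # ('v2', 1)
print(read("bob", {}, {}, sstables))    # ('v1', 1)
print(read("carol", {}, {}, sstables))  # (None, 0)
```

The last call is the interesting one: the key exists nowhere, and the filters let the node answer without touching a single SSTable.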
Final Thoughts
So this is how Cassandra works at a high level.
Thank You