Setting the Stage for Distributed Storage Link to heading
Non-distributed Storage (Single-Master System):
Imagine your favorite local coffee shop. Keeping track of customer preferences on a simple notepad (single-server database) is a breeze. New entries are readily added, and the system is clear-cut. However, as the shop’s popularity explodes, the notepad becomes a bottleneck.
Read Replication Link to heading
Read Replication:
- The coffee shop thrives and attracts more customers.
- To handle the increased workload, the barista team expands.
- The single pad of paper (master database) is replicated for the new baristas (read replicas).
- The head barista (master) updates the original pad, then copies the changes to the replicas (asynchronously).
Benefits:
- Scales read traffic: Customers can be served faster by multiple baristas (read replicas) accessing customer preferences.
Drawbacks:
- Eventual Consistency: Updates might not be immediately reflected on all copies. Ordering a new favorite drink might not be available to all baristas right away.
- Increased Complexity: Managing and maintaining multiple copies introduces operational overhead.
- Limited for Write-Heavy Workloads: Read replication is not ideal for scenarios with frequent updates (writes).
Sharidng Link to heading
Sharding for Scalability:
- Read replication becomes insufficient for a highly popular coffee shop (heavy write workload).
- A single head barista updating all favorite drinks creates a bottleneck.
Sharding Approach:
- Sharding distributes the workload by splitting the data based on a key (customer name).
- Two head baristas (shards) are responsible for different name ranges (alphabetical).
- Each shard maintains its own read replica set for scalability.
Benefits:
- Increases write throughput by distributing updates across multiple shards.
Drawbacks:
- Increased Complexity: Clients need to know which shard to access for their data (routing).
- Limited Data Model: Sharding works best with data models where every query has a common key.
- Limited Data Access Patterns: Complex queries that span multiple shards require gathering data from all shards (scatter-gather), reducing efficiency.
Reference Link to heading
Distributed Systems in One Lesson by Tim Berglund: https://www.youtube.com/watch?v=Y6Ev8GIlbxc&ab_channel=DevoxxPoland