Chia Zhe Min

Takeaways and thoughts from flipping through the software developer's bible for building distributed systems.

DDIA gives the entire context behind current systems and patterns, design considerations and therefore the whys to every critical pattern, structure, offering, capability, as well as pros vs cons

Important concepts like serializability vs materializing conflicts

Race conditions in distributed databases

Religiously reading through this bible helps you to stop abstracting away the important concepts, concepts and details in distributed software and finally start understanding how to address common points of failure.

Contention in a database workload refers to a situation where multiple transactions or queries compete for the same resources, leading to performance bottlenecks. This can happen due to:

Lock Contention – Multiple transactions try to access the same row, table, or index, causing delays due to waiting for locks to be released.
CPU Contention – Too many queries compete for CPU resources, slowing down execution.
I/O Contention – Queries overload disk or network I/O, leading to slow reads/writes.
Memory Contention – Insufficient memory for caching or query execution, causing excessive disk swapping.

Reducing contention often involves indexing, query optimization, workload balancing, or using concurrency control mechanisms like MVCC (Multiversion Concurrency Control).

Detailed Explanation of Locks and Related Concepts

Shared Locks (S Locks)
- A shared lock allows multiple transactions to read a resource (e.g., row, page, or table) but not modify it.
- Other transactions can also acquire a shared lock simultaneously, ensuring consistent reads.
- However, if a transaction wants to modify the resource, it must wait until all shared locks are released.
- Used in SELECT ... FOR SHARE queries in some databases.
Exclusive Locks (X Locks)
- An exclusive lock prevents all other transactions from accessing the locked resource in any way (neither reading nor writing).
- This ensures no other transaction can interfere while the locking transaction makes changes.
- Typically applied during INSERT, UPDATE, or DELETE operations.
- Used in SELECT ... FOR UPDATE queries in some databases.
Predicate Locks
- A special type of lock that applies to a condition or range of values rather than specific rows.
- Used in Serializable Isolation to prevent phantom reads (i.e., new rows appearing unexpectedly in a transaction).
- For example, if a transaction runs SELECT * FROM orders WHERE amount > 1000;, a predicate lock would prevent new rows from being inserted that match this condition (amount > 1000).
- Implemented in Serializable Snapshot Isolation (SSI) in some databases.
Phantom Reads & Phantom Locks
- A phantom read occurs when a transaction reads a set of rows, but a concurrent transaction inserts or deletes rows that match the same query criteria, leading to inconsistent results.
- Example:
  1. Transaction A: SELECT * FROM employees WHERE salary > 50000;
  2. Transaction B: INSERT INTO employees (id, name, salary) VALUES (10, 'Alice', 60000);
  3. Transaction A runs the same query again and sees a new row that wasn’t there before.
- Phantom locks (or predicate locks) prevent phantom reads by locking ranges of data rather than individual rows.

Summary Table

Lock Type	What It Does	Allows Concurrent Reads?	Allows Concurrent Writes?	Use Case Example
Shared Lock (S)	Allows multiple transactions to read a resource	✅ Yes	❌ No	SELECT ... FOR SHARE
Exclusive Lock (X)	Prevents all other transactions from accessing the resource	❌ No	❌ No	UPDATE or DELETE
Predicate Lock	Locks a condition (e.g., amount > 1000) instead of specific rows	✅ Yes (if condition does not change)	❌ No (prevents inserts that match the condition)	Ensuring serializable isolation
Phantom Reads	A read anomaly where new rows appear during a transaction	Can occur	Can occur	Running the same query twice and getting different results due to new rows

Use Case / Priority	Use Functional Programming (FP)	Use Object-Oriented Programming (OOP)
Data transformation / pipelines	✔️ map/filter/reduce style fits well	❌ Not ideal
Concurrency / parallelism	✔️ Stateless, easier to parallelize	⚠️ Requires careful state handling
Immutability / predictability	✔️ Core feature	❌ Must be enforced manually
Modeling real-world entities	❌ Less intuitive	✔️ Natural fit
Large, stateful applications	⚠️ Can be complex	✔️ Designed for this
Code reuse	✔️ Via composition	✔️ Via inheritance/polymorphism
Testability	✔️ Pure functions = easier tests	⚠️ More mocks/stubs needed

🔹 Use FP when you're doing data-heavy, stateless, or concurrent logic. 🔹 Use OOP when you're modeling real-world systems with entities and state. 🔹 Use both when your language allows it — they complement each other well.

More to come...

[WIP] Designing Data-Intensive Applications

Detailed Explanation of Locks and Related Concepts

Summary Table