System Design Masterclass | HLD & LLD Deep Dive

Module 01

Prerequisites & Foundations

Before we architect skyscrapers, we pour the foundation. This module ensures you have the conceptual toolkit — networking, databases, concurrency, and environment setup — required to reason about distributed systems without hand-waving.

1.1 What You Need Before Starting

System design sits at the intersection of computer science, software engineering, and product thinking. You do not need a PhD, but you do need comfort with certain primitives. Think of it like learning to navigate the open ocean: you don't need to build the boat on day one, but you must understand tides, compass headings, and how sails catch wind.

✅ Required Comfort Level

• Basic programming in any language (Python, Java, JavaScript, Go)
• Understanding of variables, functions, loops, and basic data structures
• Familiarity with HTTP (you've made or consumed a REST API)
• Willingness to learn math-lite concepts (Big-O, percentages, orders of magnitude)

🎯 Helpful But Not Mandatory

• Prior backend or full-stack development experience
• Exposure to cloud platforms (AWS, GCP, Azure)
• Database query experience (SQL or NoSQL)
• Systems programming or OS course material

Analogy — The Island Cartographer: A cartographer mapping an archipelago doesn't need to have sailed every route, but they must understand scale, coordinates, and how islands connect via shipping lanes. System designers map software islands (services) connected by network lanes (APIs, queues, databases).

1.2 Networking Fundamentals

Every distributed system is, at its core, computers talking to each other over a network. When you design a chat app, a payment gateway, or a video streaming platform, you are really designing who talks to whom, over what protocol, with what latency budget, and what happens when the message never arrives.

The Network Stack — How Data Travels from App to Wire

Key Concepts You Must Internalize

IP Address & DNS

An IP address is a street address for a machine. DNS is the phone book that translates api.example.com into 203.0.113.42. In system design, DNS is also a load distribution tool (round-robin, geo-routing).

TCP vs UDP

TCP is reliable, ordered, connection-oriented — like registered mail with delivery confirmation. Use it for HTTP, database connections, file transfers. UDP is fire-and-forget — like shouting across a lagoon. Use it for live video, gaming, DNS queries where speed beats guaranteed delivery.

HTTP/HTTPS & REST

HTTP is the lingua franca of web APIs. REST is an architectural style using HTTP verbs (GET, POST, PUT, DELETE) on resources identified by URLs. HTTPS adds TLS encryption — non-negotiable for production systems handling user data.

Latency, Bandwidth, Throughput

Latency is how long one request takes (ms). Bandwidth is pipe width (Mbps). Throughput is completed requests per second (RPS/QPS). A wide pipe (bandwidth) doesn't help if each message takes forever (latency).
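The difference between the three terms is easier to see with numbers. The sketch below is illustrative only: the function names and sample values are assumptions, not part of the course material. It models transfer time as fixed latency plus serialization time, and uses Little's Law to relate throughput to latency.

```python
def transfer_time_ms(payload_bytes: int, latency_ms: float, bandwidth_mbps: float) -> float:
    """Approximate transfer time: fixed network latency plus time to push the bits."""
    bits = payload_bytes * 8
    serialization_ms = bits / (bandwidth_mbps * 1_000_000) * 1000
    return latency_ms + serialization_ms


def max_throughput_rps(concurrency: int, latency_ms: float) -> float:
    """Little's Law rearranged: throughput = requests in flight / time per request."""
    return concurrency / (latency_ms / 1000)


# A 2 KB API response over a link with 80 ms latency (hypothetical values).
narrow_pipe = transfer_time_ms(2_000, latency_ms=80, bandwidth_mbps=100)
wide_pipe = transfer_time_ms(2_000, latency_ms=80, bandwidth_mbps=1_000)

# 10x the bandwidth saves well under a millisecond: latency dominates small payloads.
print(f"100 Mbps: {narrow_pipe:.3f} ms, 1 Gbps: {wide_pipe:.3f} ms")

# With 100 requests in flight at 50 ms each, the ceiling is about 2,000 RPS.
print(f"max throughput: {max_throughput_rps(100, 50):.0f} RPS")
```

For small payloads, widening the pipe barely moves the total; cutting latency (or raising concurrency) is what changes throughput.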

1.3 Database Fundamentals

Data is the treasure buried on every island in your architecture. Choosing where and how to store it determines consistency, scalability, and operational complexity. At a foundation level, understand the two great families of databases and their trade-offs.

SQL vs NoSQL — When to Use Which

The ACID vs BASE Mental Model

ACID (Atomicity, Consistency, Isolation, Durability) guarantees that database transactions behave predictably — critical for banking. BASE (Basically Available, Soft state, Eventually consistent) accepts temporary inconsistency in exchange for availability and partition tolerance — common in globally distributed systems. You'll revisit this deeply when we cover CAP theorem in Module 13.
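Atomicity is the easiest ACID property to observe directly. The following is a minimal sketch using Python's built-in sqlite3 module; the accounts table, names, and amounts are hypothetical. A transfer that would overdraw an account violates a CHECK constraint, and the whole transaction rolls back, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint failed, so the credit above was undone too


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


ok = transfer(conn, "alice", "bob", 30)        # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it rolls back
print(ok, rejected, balances(conn))
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards. That "all or nothing" behavior is what a BASE-style store gives up in exchange for availability.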

-- Foundational SQL you'll encounter in LLD discussions
CREATE TABLE users (
  id         BIGSERIAL PRIMARY KEY,
  email      VARCHAR(255) UNIQUE NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

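-- The UNIQUE constraint on email already creates an index in PostgreSQL,
-- so a separate CREATE INDEX on email would be redundant.
CREATE INDEX idx_users_created_at ON users(created_at);
-- Indexes are the "table of contents": they trade write speed for read speed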

1.4 Computer Science Building Blocks

These concepts appear in every system design discussion. You don't need to implement a B-tree from scratch, but you must speak fluently about them when justifying design decisions.

Concept	Layman's Terms	System Design Relevance
Big-O Notation	How cost grows as input grows	Choosing algorithms, estimating database query cost
Hash Tables	Instant lookup by key (O(1) average)	Caching, sharding keys, consistent hashing
Trees & Graphs	Hierarchical or connected data structures	File systems, org charts, social networks, DNS
Queues & Stacks	FIFO vs LIFO processing order	Message queues, job schedulers, undo buffers
Concurrency	Multiple things happening "at once"	Thread pools, race conditions, locks, async I/O
Memory Hierarchy	Fast/small (CPU cache) → slow/big (disk)	Why caching layers exist at every level

The Memory Hierarchy — Why Caching Is Everywhere

1.5 Environment & Tooling Setup

Hands-on practice reinforces theory. Set up a lightweight environment for sketching architectures, running local services, and experimenting with APIs.

1. Diagramming Tools

• Excalidraw (free, hand-drawn aesthetic) — great for interviews
• draw.io / diagrams.net — professional architecture diagrams
• Mermaid — diagram-as-code in Markdown (used in this course)

2. Local Development Stack

# Recommended baseline tooling
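# macOS (Homebrew)
# Note: the "docker" formula provides the CLI only. The "docker run" commands
# below also need a running Docker engine (for example Docker Desktop).
brew install git node python@3.12 docker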

# Verify installations
git --version && node --version && python3 --version && docker --version

# Optional: run local Redis + PostgreSQL via Docker
docker run -d --name local-redis -p 6379:6379 redis:7-alpine
docker run -d --name local-postgres -e POSTGRES_PASSWORD=dev \
  -p 5432:5432 postgres:16-alpine

3. API Testing

Install curl (built into macOS/Linux) or use Postman / HTTPie to probe REST endpoints. Understanding request/response cycles is essential for API design modules later.

curl -X GET https://api.github.com/users/octocat
curl -X POST https://httpbin.org/post -H "Content-Type: application/json" \
  -d '{"message": "hello from system design course"}'

1.6 How to Use This Masterclass

Read sequentially first. Modules build on each other. Skipping to "Design Twitter" without understanding caching is like sailing without charts.
Sketch as you read. Redraw every diagram from memory on paper or Excalidraw. Active recall is far more effective than passive rereading.
Complete every quiz. Each module ends with MCQs designed to surface gaps in understanding. Read explanations even for questions you got right.
Time-box deep dives. Aim for 15–20 minutes per module section, 5 minutes per quiz. The full course targets 5–6 hours.
Revisit case studies. Modules 16–18 apply everything. Return to them after completing the theory modules.

Module 02

Introduction to System Design

What exactly is "system design"? Why does every senior engineering interview include it? This module defines the discipline, distinguishes high-level from low-level design, and introduces the structured thinking framework you'll use throughout this course.

2.1 What Is System Design?

System design is the process of defining the architecture, components, modules, interfaces, and data flows for a software system to satisfy specified requirements. It answers: "Given a problem like 'build Instagram' or 'process 1 million payments per hour,' how do we decompose it into reliable, scalable, maintainable pieces?"

It is not coding. It is not picking React vs Vue. It is the engineering decision-making layer that sits above individual features — deciding how services communicate, where state lives, what fails gracefully, and what trade-offs you accept.

Analogy — Designing a Resort Island: Building one beach bungalow is "feature development." Designing the entire resort — where the power plant goes, how freshwater reaches every villa, how guests move between islands, what happens during a hurricane — that's system design. You're the master planner, not the carpenter.

The System Design Landscape

2.2 High-Level Design vs Low-Level Design

The single most important distinction in this entire course. Confusing HLD with LLD is like confusing a city zoning map with a building's floor plan — both are "design," but they operate at different zoom levels.

Dimension	High-Level Design (HLD)	Low-Level Design (LLD)
Focus	Architecture & components	Internal structure & logic
Audience	Architects, tech leads, interviewers	Implementing engineers
Artifacts	Architecture diagrams, data flow, API contracts	Class diagrams, sequence diagrams, ER schemas
Key Questions	Microservices or monolith? SQL or NoSQL?	Which pattern? Which data structure? Thread-safe?
Example (URL Shortener)	Load balancer → API servers → Redis cache → DB cluster	`UrlService.createShortUrl()`, base62 encoding, DB schema
When in SDLC	Early — before major implementation	Just before / during implementation

HLD Example — URL Shortener Architecture (Bird's-Eye View)

2.3 Functional vs Non-Functional Requirements

Every system design session begins with requirements. Splitting them correctly prevents you from over-engineering features nobody asked for while ignoring the constraints that actually break production systems.

Functional Requirements (FRs)

What the system does — features and behaviors.

• Users can create a short URL from a long URL
• Users can redirect via the short URL
• Users can view click analytics
• Custom alias URLs are supported

Non-Functional Requirements (NFRs)

How well the system performs — quality attributes.

• Scalability: 100M URLs, 10K reads/sec
• Availability: 99.99% uptime
• Latency: Redirect < 100ms p99
• Durability: Zero URL data loss
• Security: Rate limiting, abuse prevention

Pro tip: In interviews, always clarify NFRs before drawing boxes. "How many users? Read/write ratio? Latency target? Consistency requirements?" These numbers drive every subsequent decision — cache or not, SQL or NoSQL, sync or async.

2.4 The Structured Design Process

Senior engineers don't freestyle. They follow a repeatable framework that ensures nothing critical is missed. Internalize this 7-step process — you'll use it in every case study module.

The 7-Step System Design Framework

Step 2 in Action: Back-of-the-Envelope Math

Estimation separates senior engineers from junior ones. You don't need exact numbers — you need the right order of magnitude.

Example: URL Shortener scale estimation
─────────────────────────────────────
Assumptions:
  • 100M new URLs/month
  • Read:Write ratio = 100:1
  • Average URL stored = 500 bytes
  • Retention = 5 years

Writes/sec  = 100M / (30 × 24 × 3600) ≈ 40 writes/sec
Reads/sec   = 40 × 100 = 4,000 reads/sec
Storage     = 100M × 12 × 5 × 500B = 3 TB (raw, before replication)

→ Reads dominate → aggressive caching (Redis) is essential
→ Writes are modest → single DB shard may suffice initially
→ 3 TB is manageable → no exotic storage needed yet
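The same arithmetic can be kept as a small script so the assumptions are easy to change. This is a sketch: the function and field names are illustrative, and the inputs are the assumptions listed above.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000


def estimate(new_urls_per_month, read_write_ratio, bytes_per_url, retention_years):
    """Back-of-the-envelope capacity numbers for a URL shortener."""
    writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH
    reads_per_sec = writes_per_sec * read_write_ratio
    total_urls = new_urls_per_month * 12 * retention_years
    storage_tb = total_urls * bytes_per_url / 1e12  # decimal terabytes, before replication
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "total_urls": total_urls,
        "storage_tb": storage_tb,
    }


result = estimate(
    new_urls_per_month=100_000_000,
    read_write_ratio=100,
    bytes_per_url=500,
    retention_years=5,
)
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

The exact write rate comes out slightly under 40 per second; rounding up to 40 is the kind of simplification that keeps the order of magnitude right.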

2.5 Trade-offs — The Core Skill

There is no perfect architecture — only trade-offs aligned with requirements. System design is less about finding the "right answer" and more about articulating why you chose A over B given constraints C and D.

Common System Design Trade-off Spectrum

When you propose a design, always pair it with: "I'm choosing X because [requirement]. The trade-off is Y, which we mitigate by Z." This sentence structure alone elevates interview performance dramatically.

2.6 Architectural Styles Preview

Before diving into individual patterns in later modules, orient yourself to the three dominant architectural styles. We'll explore each in depth in Module 4.

🏠

Monolith

Single deployable unit. Simple to develop and debug. Harder to scale individual components.

🏘️

Microservices

Independent services with own databases. Scales teams and components. Adds network complexity.

⚡

Serverless

Functions as a service. Zero server management. Cold starts and vendor lock-in are trade-offs.

Module 03

Requirements Engineering & Constraints

The difference between a senior engineer and a junior one often appears in the first five minutes of a design session: seniors interrogate requirements before touching a whiteboard. This module teaches you to extract, classify, prioritize, and constrain requirements like an architect surveying land before laying a foundation.

3.1 Why Requirements Come First

A system designed for 100 users looks nothing like one designed for 100 million. A banking ledger demands different consistency guarantees than a social media "like" counter. Requirements are the contract between problem and solution — miss them, and you optimize for the wrong thing entirely.

Analogy — The Tide Surveyor: Before building a pier, surveyors measure tides, storm surges, and seabed depth. Skipping this step means your pier floods at high tide or collapses in a storm. Requirements engineering is your tide survey — it tells you what forces your system must withstand.

Requirements Flow Into Every Design Decision

3.2 Functional Requirements — The Feature Contract

Functional requirements describe what the system must do. They should be specific, testable, and unambiguous. Vague requirements like "the system should be fast" are NFRs in disguise — functional requirements name concrete behaviors.

Writing Testable Functional Requirements

Weak (Vague)	Strong (Testable)
Users can share content	Users can generate a shareable link valid for 7 days
System handles search	Users can search products by name with prefix matching
Support notifications	Users receive push notifications within 30s of a new message

User Stories vs Use Cases

User stories (Agile): "As a [role], I want [feature], so that [benefit]." Great for prioritization. Use cases: step-by-step interaction flows including edge cases and alternate paths. Great for design completeness. In system design interviews, narrate use cases aloud while the interviewer nods or redirects.

Use Case: Create Short URL
─────────────────────────
Actor:     Registered user
Precond:   User is authenticated
Main flow:
  1. User submits long URL
  2. System validates URL format
  3. System generates unique 7-char code
  4. System persists mapping
  5. System returns short URL
Alt flow 3a: User provides custom alias → check uniqueness
Alt flow 4a: DB unavailable → return 503, do not return broken link
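Step 3 of the main flow can be sketched with base62 encoding, which the HLD vs LLD table mentions for this system. The sketch assumes the code is derived from a unique numeric ID (for example a database sequence); that is one common option, not necessarily the scheme this course settles on, and the helper names are hypothetical.

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe symbols
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n: int, width: int = 7) -> str:
    """Encode a non-negative integer as a fixed-width base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n >= BASE ** width:
        raise ValueError(f"ID does not fit in {width} characters")
    digits = []
    while n:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    # Left-pad with the zero symbol so every code has the same length.
    return "".join(reversed(digits)).rjust(width, ALPHABET[0])


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


code = base62_encode(125)
print(code, base62_decode(code))
print(f"7 characters cover {BASE ** 7:,} distinct IDs")
```

Seven characters give about 3.5 trillion codes, far above the 6 billion URLs estimated for five years. Note that codes derived from sequential IDs are guessable; if that matters, the ID would need to be randomized or obfuscated first.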

3.3 Non-Functional Requirements — The Quality Taxonomy

NFRs are where system design lives. They are often under-specified in interviews on purpose — the interviewer wants to see you ask the right clarifying questions. Memorize this taxonomy:

The NFR Wheel — Eight Quality Attributes

Scalability

Can the system handle 10× growth without redesign? Horizontal vs vertical scaling path.

Availability

Uptime SLA (99.9% ≈ 8.8 hrs downtime/year). Redundancy, failover.

Latency

p50, p95, p99 response times. Tail latency matters at scale.

Durability

Once acknowledged, data survives crashes. Replication, backups.
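Two of these attributes reduce to simple arithmetic that is worth being able to do on the spot. The sketch below converts an availability percentage into a yearly downtime budget and computes nearest-rank percentiles from latency samples. The helper names and the sample data are made up for illustration.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8,760


def downtime_hours_per_year(availability_pct: float) -> float:
    """Yearly downtime budget implied by an availability target."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def percentile(samples, p: float):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_hours_per_year(target):.2f} hours/year")

# Ten hypothetical response times in ms; one request hit a slow path.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 14]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"p50={p50} ms, p99={p99} ms, mean={sum(latencies_ms) / len(latencies_ms)} ms")
```

The median stays at 13 ms while p99 is 250 ms, which is why latency targets are stated as percentiles rather than averages.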

The "ilities" interview trick: When stuck, run through scalability, availability, reliability, maintainability, security, latency, durability, and cost. You will surface at least two NFRs the interviewer expected you to ask about.

3.4 Constraints — The Boundaries You Cannot Cross

Constraints are hard limits. Unlike NFRs (which are targets you optimize toward), constraints are immovable walls. Ignoring them invalidates your entire design.

Technical Constraints

Must use existing PostgreSQL cluster. Must integrate with legacy SOAP API. Must run on-premises (no cloud). Team only knows Python.

Business Constraints

Launch in 3 months. Budget capped at $5K/month infra. Must support only US users initially. No third-party data sharing.

Regulatory Constraints

GDPR (EU data residency). HIPAA (health data encryption). PCI-DSS (payment card handling). SOC 2 audit requirements.

Constraint Triangle — Pick Two, Sacrifice One

3.5 Scope Management — MoSCoW Prioritization

Not every requirement ships in v1. MoSCoW prevents scope creep and forces explicit prioritization — critical in interviews when the interviewer says "we have 45 minutes."

Priority	Meaning	URL Shortener Example
Must Have	Non-negotiable for launch	Create short URL, redirect, uniqueness
Should Have	Important but not blocking	Custom aliases, expiration dates
Could Have	Nice if time permits	Click analytics dashboard
Won't Have	Explicitly out of scope	QR code generation, A/B testing

In interviews, state your assumptions: "I'll treat analytics as a Could Have and focus the core design on redirect latency and scale." This shows product thinking and time management.

3.6 The Clarification Question Bank

Memorize and adapt these questions. Ask 5–8 at the start of any design session. The answers reshape your entire architecture.

SCALE
  • How many daily active users (DAU)? Total registered users?
  • Read-to-write ratio?
  • Expected growth rate (6 months, 1 year)?
  • Peak vs average traffic (burst factor)?

PERFORMANCE
  • Latency targets (p50, p99)?
  • Throughput (requests/sec)?
  • Real-time vs batch acceptable?

DATA
  • How much data stored per entity?
  • Retention period?
  • Consistency requirements (strong vs eventual)?
  • Can we lose data? Under what conditions?

USERS & GEO
  • Global or single region?
  • Mobile, web, or both?
  • Authenticated vs anonymous users?

CONSTRAINTS
  • Existing tech stack or greenfield?
  • Budget / team size / timeline?
  • Regulatory requirements (GDPR, HIPAA)?

Module 04

High-Level Design (HLD) Architecture

High-Level Design is the art of drawing the right boxes and arrows. You decide what major components exist, how they communicate, and where data flows — without writing a single line of implementation code. This module covers architectural styles, when to use each, and how to produce interview-grade HLD diagrams.

4.1 Anatomy of a Good HLD Diagram

A strong HLD diagram answers four questions at a glance: Who calls whom? Where does data live? What are the failure points? How does traffic scale? It uses consistent notation, labels protocols on arrows, and groups related components.

HLD Diagram Legend — Standard Notation

4.2 Monolith vs Microservices vs Serverless

The most common architectural decision. There is no universal winner — only the right fit for your team's size, scale, and operational maturity.

Three Architectural Styles Compared

Factor	Monolith	Microservices	Serverless
Team size	1–10 engineers	10–100+ (Conway's Law)	Small teams, event workloads
Deploy complexity	Low	High (CI/CD per service)	Very low
Scaling	Vertical + replicate all	Per-service horizontal	Auto-scale per function
Best for	MVPs, early stage	Large orgs, varied scale	Spiky, infrequent workloads

4.3 Layered (N-Tier) Architecture

The classic pattern: separate presentation, business logic, and data access into distinct layers. Each layer only talks to the layer directly below it. Simple, well-understood, and still the backbone of most enterprise applications.

Three-Tier Architecture

When to use: CRUD-heavy business apps, internal tools, e-commerce backends. Watch out for: the "anemic domain model" where the logic layer becomes a thin pass-through — keep business rules in the business layer, not scattered in controllers.

4.4 Event-Driven Architecture (EDA)

Instead of services calling each other directly (tight coupling), producers emit events to a message broker. Consumers subscribe and react independently. This enables loose coupling, async processing, and natural audit trails.

Event-Driven Flow — Order Placed Example

Benefits: decoupling, resilience (consumers can retry), scalability (add consumers without changing producer). Costs: eventual consistency, debugging complexity, need for idempotent consumers.
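The need for idempotent consumers is easiest to see in code. Below is a minimal in-process sketch, not a real broker: the EventBus class, topic name, and event fields are all hypothetical, and delivery here is synchronous, whereas a real broker such as Kafka or RabbitMQ delivers asynchronously. The producer publishes the same event twice, as a retry would, and each consumer deduplicates by event ID.

```python
from collections import defaultdict


class EventBus:
    """Toy message broker: producers publish to a topic, subscribers react."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The producer does not know who is listening, or how many consumers exist.
        for handler in self._subscribers[topic]:
            handler(event)


class InventoryConsumer:
    def __init__(self):
        self.total_reserved = 0
        self._seen = set()

    def handle(self, event):
        if event["event_id"] in self._seen:
            return  # duplicate delivery: ignore, so stock is not reserved twice
        self._seen.add(event["event_id"])
        self.total_reserved += event["quantity"]


class EmailConsumer:
    def __init__(self):
        self.sent = []
        self._seen = set()

    def handle(self, event):
        if event["event_id"] in self._seen:
            return
        self._seen.add(event["event_id"])
        self.sent.append(event["order_id"])


bus = EventBus()
inventory, emails = InventoryConsumer(), EmailConsumer()
bus.subscribe("order.placed", inventory.handle)
bus.subscribe("order.placed", emails.handle)

event = {"event_id": "evt-1", "order_id": "order-1001", "quantity": 3}
bus.publish("order.placed", event)
bus.publish("order.placed", event)  # simulated retry / at-least-once redelivery

print(inventory.total_reserved, emails.sent)
```

Adding a third consumer requires no change to the producer, which is the decoupling benefit; the deduplication sets are the price paid for at-least-once delivery.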

4.5 Data Flow & Component Interaction

Every HLD must show read path and write path separately — they often have different performance characteristics and caching strategies.

Read Path (URL Shortener redirect):
  Client → CDN edge → Load Balancer → API Server → Redis cache (hit?) → return 301
                                              ↓ miss
                                         PostgreSQL read replica → populate cache → return 301

Write Path (create short URL):
  Client → Load Balancer → API Server → generate code → PostgreSQL primary (write)
                                                    → async replicate to read replicas
                                                    → optionally warm Redis cache

4.6 Choosing Your Architecture Style

Use this decision flowchart mentally during interviews:

Architecture Selection Decision Tree

Module 05

API Design & Communication Patterns

APIs are the contracts between your system's islands. Poor API design creates coupling, versioning nightmares, and cascading failures. This module covers REST principles, protocol selection (REST vs gRPC vs GraphQL), sync vs async patterns, and production-grade API hygiene.

5.1 REST API Design Principles

REST (Representational State Transfer) models your system as resources identified by URLs, manipulated via HTTP verbs. Good REST APIs are predictable, cacheable, and self-describing.

Resource-Oriented URL Design (URL Shortener)
────────────────────────────────────────────
POST   /api/v1/urls              Create short URL       → 201 Created
GET    /api/v1/urls/{code}       Get URL metadata       → 200 OK
GET    /api/v1/urls/{code}/stats Get click analytics    → 200 OK
DELETE /api/v1/urls/{code}       Deactivate short URL   → 204 No Content

GET    /{code}                   Redirect (not /api/)   → 301 Moved Permanently

Anti-patterns to avoid:
  POST /api/createShortUrl       (verb in URL — use nouns)
  GET  /api/deleteUrl?id=123     (mutation via GET — never)
  GET  /api/v1/getAllUsers       (RPC style disguised as REST)

HTTP Status Codes You Must Know

200 OK — successful GET/PUT

201 Created — successful POST

204 No Content — successful DELETE

400 Bad Request — client error

401 Unauthorized — auth required

404 Not Found — resource missing

409 Conflict — duplicate resource

429 Too Many Requests — rate limited

500 Internal Server Error

503 Service Unavailable — overload

5.2 REST vs gRPC vs GraphQL

Three dominant API paradigms — each optimized for different constraints. Choosing the wrong one creates friction for years.

API Paradigm Comparison

5.3 Synchronous vs Asynchronous Communication

Synchronous: caller waits for response (HTTP request/response). Simple mental model, but creates temporal coupling — if the callee is slow or down, the caller suffers. Asynchronous: caller sends message and continues (queue/event). Decouples availability but introduces complexity (ordering, duplicates, eventual consistency).

Sync vs Async Communication Patterns

5.4 Idempotency, Retries & Rate Limiting

Distributed systems fail mid-request. Networks drop packets. Clients retry. Without idempotency, a payment API called twice charges the customer twice.

Idempotent operations produce the same result no matter how many times they're executed. GET, PUT, DELETE are naturally idempotent. POST is not — solve with Idempotency-Key headers stored in Redis with TTL.

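// Client sends idempotency key on POST
POST /api/v1/payments
Headers: Idempotency-Key: "550e8400-e29b-41d4-a716-446655440000"

// Server logic (pseudocode):
cached_response = redis.get(idempotency_key);
if (cached_response != null) {
  return cached_response;  // duplicate: return the same result
}
result = process_payment(request);
redis.setex(idempotency_key, 86400, result);  // keep for 24 hours
return result;
// Note: get-then-set is not atomic. Two concurrent retries can both miss and
// both charge. Production code should claim the key atomically first
// (e.g. Redis SET with NX) before processing the payment.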

// Rate limiting (Token Bucket algorithm)
// 100 requests/minute per API key → return 429 when exceeded
// Headers: X-RateLimit-Remaining: 42, X-RateLimit-Reset: 1620000000
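The token bucket named in the comment above can be sketched in a few lines. This is a single-process illustration under stated assumptions: the class name and the injectable clock are hypothetical, and a production limiter would keep one bucket per API key in shared storage such as Redis rather than in local memory.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, then a steady `refill_per_sec` rate."""

    def __init__(self, capacity: float, refill_per_sec: float, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._now = now              # injectable clock, handy for tests
        self._tokens = float(capacity)
        self._last = now()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, False if it should get a 429."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Refill for the time that has passed, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# 100 requests/minute per API key, as in the comment above.
per_key_limiter = TokenBucket(capacity=100, refill_per_sec=100 / 60)
burst = [per_key_limiter.allow() for _ in range(100)]
print(all(burst))  # the full burst of 100 is admitted
```

Compared with a fixed one-minute counter, the bucket refills continuously, so a client that has used its burst regains capacity gradually instead of all at once at the window boundary.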

5.5 API Versioning & Documentation

APIs evolve. Versioning strategy prevents breaking existing clients. Common approaches:

• URL versioning: /api/v1/users — simple, explicit, most common
• Header versioning: Accept: application/vnd.myapi.v2+json — clean URLs, harder to test in browser
• Query param: /api/users?version=2 — least recommended

Document APIs with OpenAPI (Swagger) specs — they generate interactive docs, client SDKs, and mock servers. In system design interviews, listing 3–4 key endpoints with request/response shapes demonstrates API thinking without writing full specs.

Module 06

Data Modeling & Database Selection

Data outlives code. The schema and storage engine you choose on day one constrains every feature for years. This module teaches you to model entities, choose between SQL and NoSQL, understand normalization trade-offs, and plan for horizontal data scaling.

6.1 Entity-Relationship Modeling

Before picking a database, model your entities (nouns: User, Order, URL) and relationships (verbs: places, contains, maps-to). This ER diagram drives your schema regardless of SQL or NoSQL.

ER Diagram — URL Shortener

-- Relational schema derived from ER diagram
CREATE TABLE short_urls (
  id          BIGSERIAL PRIMARY KEY,
  code        VARCHAR(10) UNIQUE NOT NULL,
  long_url    TEXT NOT NULL,
  user_id     BIGINT REFERENCES users(id),
  created_at  TIMESTAMPTZ DEFAULT NOW(),
  expires_at  TIMESTAMPTZ,
  is_active   BOOLEAN DEFAULT TRUE
);
-- code is UNIQUE, so PostgreSQL already maintains an index for the redirect lookup
CREATE INDEX idx_short_urls_user ON short_urls(user_id);   -- user's URLs list

6.2 Normalization vs Denormalization

Normalization eliminates redundancy — data lives in one place (3NF). Updates are consistent but reads may require JOINs. Denormalization duplicates data for faster reads — common in read-heavy systems at scale.

Normalized (3NF)

users table + orders table + order_items table. Insert order = writes to orders and order_items (the user row already exists). Read order with items = JOIN query.

Denormalized

orders document embeds user_name and items array. Read order = single document fetch. Update user name = update many documents.

Rule of thumb: Normalize for write-heavy, consistency-critical systems (banking). Denormalize for read-heavy systems where query speed matters more than storage redundancy (feeds, analytics dashboards).

6.3 SQL vs NoSQL — Decision Framework

Database Selection Decision Tree

6.4 Sharding & Partitioning

When a single database node can't hold your data or serve your query load, you partition (split) data across multiple nodes. Horizontal partitioning (sharding) splits rows by a shard key. Vertical partitioning splits columns or tables by feature.

Horizontal Sharding by User ID

Shard key selection is critical — a bad key (e.g., country) creates hot shards. Good keys distribute evenly (user_id hash, UUID). Avoid cross-shard JOINs — design queries to hit a single shard.
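A short sketch makes the difference between a hashed key and a skewed key visible. The shard count, key format, and country distribution below are invented for illustration. It uses hashlib rather than Python's built-in hash(), because string hashes from hash() are randomized per process and so cannot be used for stable routing.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4

# Hashing user IDs spreads 10,000 users roughly evenly.
by_hash = Counter(shard_for(f"user-{i}", NUM_SHARDS) for i in range(10_000))
print("hash(user_id):", dict(sorted(by_hash.items())))

# Sharding the same users by country (hypothetical traffic mix) creates a hot shard.
countries = ["US"] * 7_000 + ["IN"] * 2_000 + ["BR"] * 1_000
by_country = Counter(countries)
print("country:", dict(by_country))
```

One caveat of plain modulo routing is that changing the number of shards remaps most keys, which is the problem consistent hashing (section 8.3) is designed to reduce.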

6.5 Data Access Patterns Drive Everything

The single most important database design principle: design your schema around how data is read and written, not around how it looks on a whiteboard.

Access Pattern	Storage Choice	Why
Lookup by primary key	SQL B-tree index / DynamoDB	O(log n) or O(1) retrieval
Session / hot key cache	Redis	Sub-ms in-memory access
Full-text search	Elasticsearch	Inverted indexes for text
Time-series metrics	TimescaleDB / InfluxDB	Optimized for time-range queries
Social graph traversal	Neo4j / adjacency lists	JOINs on graphs are expensive in SQL

Module 07

Caching Strategies

Caching is the single highest-ROI optimization in system design. A well-placed cache turns 100ms database queries into 1ms memory lookups and can reduce database load by 90%+. This module covers every caching layer, pattern, eviction policy, and the infamous cache invalidation problem.

7.1 Why Caching Exists — The Memory Hierarchy at Scale

Module 1 introduced the memory hierarchy: CPU cache is roughly 100× faster than RAM, which is in turn roughly 1000× faster than disk. Distributed systems follow the same principle: keep hot data as close to the consumer as possible. Every cache layer trades freshness for speed.

The Distributed Cache Hierarchy

Analogy — The Beach Snack Shack: Instead of every tourist sailing to the mainland warehouse (database) for water, you place snack shacks (caches) at the beach, pier, and hotel lobby. Most requests never leave the island. You restock shacks periodically — that's cache invalidation.

7.2 Core Caching Patterns

Four fundamental patterns govern how application code interacts with cache and database. Know all four — interviews often ask you to pick one and justify it.

Cache-Aside (Lazy Loading) — Most Common Pattern

Pattern	How It Works	Trade-off
Cache-Aside	App checks cache; on miss, reads DB and populates cache	Simple; stale data possible between writes
Read-Through	Cache itself loads from DB on miss	Cleaner app code; cache library must support it
Write-Through	Write goes to cache AND DB synchronously	Consistent; higher write latency
Write-Behind	Write to cache; async flush to DB later	Fast writes; risk of data loss on crash

// Cache-Aside pseudocode (URL redirect)
function getLongUrl(shortCode):
  cached = redis.get("url:" + shortCode)
  if cached:
    return cached                          // cache HIT (~1ms)

  row = db.query("SELECT long_url FROM short_urls WHERE code = ?", shortCode)
  if not row:
    return null                            // unknown code: caller responds 404
  redis.setex("url:" + shortCode, 3600, row.long_url)  // TTL 1 hour
  return row.long_url                      // cache MISS (~10ms)

7.3 CDN Caching — Edge Proximity

A Content Delivery Network (CDN) caches static and dynamic content at edge servers geographically close to users. A user in Tokyo hits a Tokyo edge node instead of your US-origin server, which can cut round-trip latency from roughly 300ms to roughly 20ms.

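Cache-Control headers control CDN behavior: max-age=3600 (cache 1 hour), no-cache (revalidate every time), private (browser only, not CDN). For URL shortener redirects, CDN can cache 301 responses for popular short codes. The caveat is that a redirect served from a browser or CDN cache never reaches your servers, so those clicks go uncounted; if click analytics matter, prefer a 302 or a short max-age.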

7.4 Eviction Policies & TTL

Caches have finite memory. When full, something must go. TTL (Time To Live) expires entries automatically. Eviction policies decide what to remove when memory is full.

LRU (Least Recently Used)

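Evict the item not accessed for the longest time. Good general-purpose policy. In Redis it is opt-in: the default maxmemory-policy is noeviction, and allkeys-lru or volatile-lru enable an approximated LRU.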

LFU (Least Frequently Used)

Evict the item accessed fewest times. Better when access patterns have long-tail popularity (viral content).
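LRU is simple enough to sketch directly. The version below uses Python's OrderedDict to track recency; the class name is illustrative, and real caches such as Redis use sampled approximations rather than an exact ordering.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now more recent than "b"
cache.put("c", 3)   # over capacity: "b" is evicted
print(cache.get("a"), cache.get("b"), cache.get("c"))
```

An LFU variant would track a hit counter per key instead of recency, at the cost of slower adaptation when popularity shifts.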

TTL strategy: Short TTL (60s) for frequently changing data. Long TTL (24h) for static content. Jitter TTL (random ±10%) to prevent synchronized mass expiration.
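TTL jitter is a one-line idea. A minimal sketch, assuming a uniform ±10% spread and a hypothetical helper name:

```python
import random


def jittered_ttl(base_seconds: int, jitter_fraction: float = 0.10, rng=random) -> int:
    """Return base TTL plus or minus a random spread, so keys do not expire together."""
    spread = base_seconds * jitter_fraction
    return max(1, round(base_seconds + rng.uniform(-spread, spread)))


# Keys written in the same second now expire across a 12-minute window
# instead of all at once (for a 1-hour base TTL).
ttls = [jittered_ttl(3600) for _ in range(5)]
print(ttls)
```

The returned value would be passed wherever a fixed TTL is used today, for example as the expiry argument when writing a cache entry.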

7.5 Cache Invalidation — The Hard Problem

Phil Karlton famously said: "There are only two hard things in Computer Science: cache invalidation and naming things." When source data changes, stale cache entries must be updated or removed.

TTL-based expiration

Simplest — let entries expire naturally. Acceptable staleness window.

Write-invalidate

On DB write, delete cache key. Next read repopulates. Most common with cache-aside.

Write-update

On DB write, update cache entry directly. Keeps cache warm but more complex.

// Write-invalidate on URL update
function updateLongUrl(shortCode, newUrl):
  db.execute("UPDATE short_urls SET long_url = ? WHERE code = ?", newUrl, shortCode)
  redis.del("url:" + shortCode)   // invalidate — next read refreshes cache

7.6 Thundering Herd & Cache Stampede

When a popular cache key expires, thousands of concurrent requests all miss simultaneously and hammer the database — a cache stampede. Mitigations:

• Mutex / lock: Only one request rebuilds cache; others wait or return stale
• Probabilistic early expiration: Randomly refresh before TTL expires
• Never expire hot keys: Background refresh before expiration
• Request coalescing: Deduplicate in-flight requests for same key
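The mutex and coalescing ideas can be combined in a small in-process sketch. It is illustrative only: the class and loader names are hypothetical, and across multiple servers the per-key lock would have to be a distributed lock (for example a Redis key set with NX and a timeout) rather than threading.Lock. Twenty threads request the same cold key, and only one of them runs the expensive load.

```python
import threading
import time


class SingleFlightCache:
    """On a miss, one caller rebuilds the entry; concurrent callers wait for it."""

    def __init__(self, loader):
        self._loader = loader
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the per-key lock table

    def get(self, key):
        if key in self._data:           # fast path: cache hit
            return self._data[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            if key in self._data:       # someone rebuilt it while we waited
                return self._data[key]
            value = self._loader(key)   # only one thread per key gets here
            self._data[key] = value
            return value


calls = []


def slow_loader(key):
    calls.append(key)   # record each trip to the "database"
    time.sleep(0.05)    # simulate a slow query
    return key.upper()


cache = SingleFlightCache(slow_loader)
threads = [threading.Thread(target=cache.get, args=("hot",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(threads)} concurrent requests, {len(calls)} database call(s)")
```

The double check inside the lock is what prevents the stampede: threads that lost the race find the entry already populated and return it without touching the database.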

Module 08

Load Balancing & Horizontal Scaling

One server has limits — CPU cores, memory, network bandwidth. Load balancing distributes traffic across multiple servers so no single machine becomes the bottleneck. Combined with horizontal scaling, this is how systems grow from handling hundreds to millions of requests per second.

8.1 Vertical vs Horizontal Scaling

Vertical scaling (scale up): add more CPU/RAM to one machine. Simple, but it has a ceiling, and top-end instances are priced at a premium, so cost tends to grow faster than performance. Horizontal scaling (scale out): add more machines. The path to internet scale, but requires load balancing and stateless design.

Horizontal Scaling with Load Balancer

8.2 Layer 4 vs Layer 7 Load Balancing

L4 (Transport Layer)

Routes based on IP + port. Fast, no content inspection. Cannot route by URL path or HTTP headers.

Examples: AWS NLB, HAProxy (TCP mode)

L7 (Application Layer)

Routes based on HTTP headers, URL path, cookies. Can terminate TLS (often still called SSL termination), inject headers, route /api to one pool and /static to another.

Examples: AWS ALB, Nginx, Envoy

8.3 Load Balancing Algorithms

Algorithm	Behavior	Best For
Round Robin	Rotate through servers sequentially	Equal-capacity, uniform requests
Weighted Round Robin	More traffic to more powerful servers	Mixed instance sizes
Least Connections	Route to server with fewest active connections	Long-lived connections, variable request duration
IP Hash	Same client IP → same server	Session affinity without cookies
Consistent Hashing	Minimal redistribution when servers added/removed	Distributed caches, sharding
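
Two rows of the table above, Round Robin and Least Connections, can be sketched in a few lines. The `Backend` shape and the backend names are assumptions for illustration; production balancers also track health and weights.

```typescript
interface Backend {
  name: string
  activeConnections: number
}

// Round Robin: rotate through the backends in order.
class RoundRobinBalancer {
  private backends: Backend[]
  private nextIndex = 0

  constructor(backends: Backend[]) {
    if (backends.length === 0) throw new Error("at least one backend is required")
    this.backends = backends
  }

  pick(): Backend {
    const backend = this.backends[this.nextIndex]
    this.nextIndex = (this.nextIndex + 1) % this.backends.length
    return backend
  }
}

// Least Connections: choose the backend currently serving the fewest connections.
function pickLeastConnections(backends: Backend[]): Backend {
  if (backends.length === 0) throw new Error("at least one backend is required")
  return backends.reduce((best, candidate) =>
    candidate.activeConnections < best.activeConnections ? candidate : best
  )
}

const pool: Backend[] = [
  { name: "app-1", activeConnections: 12 },
  { name: "app-2", activeConnections: 3 },
  { name: "app-3", activeConnections: 7 },
]
const roundRobin = new RoundRobinBalancer(pool)
console.log([1, 2, 3, 4].map(() => roundRobin.pick().name).join(", ")) // app-1, app-2, app-3, app-1
console.log(pickLeastConnections(pool).name)                           // app-2
```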

8.4 Stateless Servers & Session Affinity

For true horizontal scaling, application servers must be stateless — any server can handle any request. Session data lives in Redis, not server memory. When state is unavoidable, sticky sessions (session affinity) route the same user to the same server — but this complicates scaling and failover.

Best practice: Externalize all state to Redis/DB. Avoid sticky sessions unless absolutely required (WebSocket connections are a common exception).

8.5 Health Checks & Auto-Scaling

Load balancers continuously health check backends — HTTP GET /health every 10s. Unhealthy servers are removed from rotation automatically. Auto-scaling groups add/remove servers based on CPU, request count, or custom metrics — paying only for capacity you need.

Auto-scaling policy example:
  Scale OUT when: avg CPU > 70% for 3 minutes
  Scale IN  when: avg CPU < 30% for 10 minutes
  Min instances: 2  |  Max instances: 20  |  Desired: 4

Health check endpoint:
  GET /health → 200 { "status": "ok", "db": "connected", "redis": "connected" }

Module 09

Message Queues & Async Processing

Not every operation needs an immediate response. Message queues decouple producers from consumers, absorb traffic spikes, and enable reliable background processing. This module covers queue fundamentals, delivery guarantees, and when to reach for Kafka vs RabbitMQ vs SQS.

9.1 Why Message Queues?

Without queues, every operation is synchronous — the user waits for email sending, image resizing, and analytics logging before seeing "Order Placed." Queues let you acknowledge fast and process slow.

• Decoupling: producer doesn't know about consumers
• Buffering: absorb traffic spikes without overwhelming downstream
• Reliability: messages persist if consumer is temporarily down
• Scalability: add more consumers to process faster

Point-to-Point vs Pub/Sub

9.2 Delivery Guarantees

The hardest problem in messaging: ensuring messages are processed exactly once in a world where networks fail and consumers crash. In practice, you choose a guarantee and design idempotent consumers.

Guarantee	Meaning	Risk / Trade-off
At-most-once	Fire and forget; message may be lost	Data loss, so use only where loss is acceptable (metrics, logs)
At-least-once	Message delivered ≥1 times; consumer must be idempotent	Duplicates possible — most common in production
Exactly-once	Processed precisely once	Expensive; Kafka transactions, or dedup at consumer

9.3 Kafka vs RabbitMQ vs SQS

Apache Kafka

Distributed commit log. High throughput, message replay, event sourcing. Retains messages for days/weeks. Best for event streams and analytics pipelines.

RabbitMQ

Traditional message broker. Complex routing (exchanges, bindings). Messages deleted after ack. Best for task queues and RPC patterns.

AWS SQS

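Fully managed, serverless queue. Standard queues give at-least-once delivery with best-effort ordering; FIFO queues give strict ordering within a message group plus exactly-once processing (deduplication within a 5-minute window). Best for AWS-native async workloads with zero ops.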

9.4 Dead Letter Queues & Backpressure

When a message fails processing repeatedly (poison message), it moves to a Dead Letter Queue (DLQ) for manual inspection — preventing infinite retry loops that block the queue.

Backpressure occurs when consumers can't keep up with producers. Solutions: scale consumers, throttle producers, increase queue capacity, or shed load (drop low-priority messages).

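// Idempotent consumer pattern
function processOrderEvent(event):
  // SET with NX + EX claims the event ID and sets the TTL in one atomic call
  // (plain SETNX takes no TTL argument)
  if redis.set("processed:" + event.id, 1, nx=true, ex=86400):
    charge_payment(event)
    send_confirmation_email(event)
  else:
    log("Duplicate event, skipping")  // safe to ignore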
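
The dead-letter flow described in this section can be sketched with two in-memory arrays standing in for the main queue and the DLQ. The retry limit of 3, the `QueueMessage` shape, and the function names are assumptions for illustration; real brokers expose this as configuration (for example a maximum receive count).

```typescript
interface QueueMessage {
  id: string
  body: string
  attempts: number
}

const MAX_ATTEMPTS = 3 // assumed retry limit before a message is treated as poison

// Processes one message. On failure it is either re-queued or parked in the DLQ.
function consumeOne(
  message: QueueMessage,
  handler: (body: string) => void,
  mainQueue: QueueMessage[],
  deadLetterQueue: QueueMessage[]
): void {
  try {
    handler(message.body)
  } catch (error) {
    const failed: QueueMessage = { ...message, attempts: message.attempts + 1 }
    if (failed.attempts >= MAX_ATTEMPTS) deadLetterQueue.push(failed) // stop retrying
    else mainQueue.push(failed)                                       // retry later
  }
}

// Drain a queue containing one healthy message and one poison message.
const mainQueue: QueueMessage[] = [
  { id: "m1", body: "ok", attempts: 0 },
  { id: "m2", body: "poison", attempts: 0 },
]
const deadLetterQueue: QueueMessage[] = []
const handler = (body: string): void => {
  if (body === "poison") throw new Error("cannot process message")
}

let next = mainQueue.shift()
while (next !== undefined) {
  consumeOne(next, handler, mainQueue, deadLetterQueue)
  next = mainQueue.shift()
}
console.log(deadLetterQueue.map((m) => `${m.id} after ${m.attempts} attempts`).join(", ")) // m2 after 3 attempts
```

The healthy message is processed once, while the poison message is retried up to the limit and then set aside so it cannot block the queue.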

9.5 Event Sourcing Preview

Instead of storing current state, event sourcing stores every state change as an immutable event log. Current state is reconstructed by replaying events. Kafka's commit log is naturally suited for this pattern — we'll see it applied in case studies.
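
Reconstructing state by replay can be sketched as a fold over the event log. The event names (`UrlCreated`, `UrlUpdated`, `UrlClicked`) and the `UrlState` shape are hypothetical, chosen only to match the URL shortener used elsewhere in this course.

```typescript
// Hypothetical events for a short-URL aggregate; the event names are illustrative.
type UrlEvent =
  | { type: "UrlCreated"; longUrl: string }
  | { type: "UrlUpdated"; longUrl: string }
  | { type: "UrlClicked" }

interface UrlState {
  longUrl: string | null
  clicks: number
}

// Current state is never stored directly; it is folded from the event log.
function replay(events: UrlEvent[]): UrlState {
  const state: UrlState = { longUrl: null, clicks: 0 }
  for (const event of events) {
    switch (event.type) {
      case "UrlCreated":
      case "UrlUpdated":
        state.longUrl = event.longUrl
        break
      case "UrlClicked":
        state.clicks += 1
        break
    }
  }
  return state
}

const eventLog: UrlEvent[] = [
  { type: "UrlCreated", longUrl: "https://example.com/a" },
  { type: "UrlClicked" },
  { type: "UrlUpdated", longUrl: "https://example.com/b" },
  { type: "UrlClicked" },
]
const rebuiltState = replay(eventLog)
console.log(rebuiltState.longUrl, rebuiltState.clicks) // https://example.com/b 2
```

Replaying a prefix of the log yields the state as it was at that point in time, which is what makes audit and debugging straightforward.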

Module 10

Low-Level Design (LLD) Fundamentals

HLD tells you what services exist. LLD tells you how they're built inside — classes, methods, algorithms, database schemas, and interaction sequences. This module bridges architecture diagrams to implementable code through SOLID principles, UML diagrams, and structured design thinking.

10.1 HLD to LLD — The Zoom-In Transition

After HLD defines the URL Shortener's boxes (API Server, Redis, PostgreSQL), LLD zooms into the API Server box and asks: What classes exist? What methods do they expose? How does encoding work? What exceptions are thrown?

From HLD Box to LLD Class Diagram

10.2 SOLID Principles

SOLID guides maintainable object-oriented design. Internalize these — interviewers probe them in LLD rounds.

S — Single Responsibility

A class should have one reason to change. UrlService handles URL logic; EmailService handles email. Don't mix them.

O — Open/Closed

Open for extension, closed for modification. Add new encoding strategies (Base62, Base64) via interface without changing UrlService.

L — Liskov Substitution

Subtypes must be substitutable for base types. Any UrlRepository implementation (PostgreSQL, MongoDB) must honor the same contract.

I — Interface Segregation

Don't force classes to implement methods they don't use. Separate ReadableUrlStore and WritableUrlStore if consumers differ.

D — Dependency Inversion

Depend on abstractions, not concretions. UrlService depends on the UrlRepository interface, not on PostgreSQL directly, which enables testing with mocks.
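
Dependency Inversion is easiest to see with a test double. The sketch below uses a simplified `UrlRepository` and two hypothetical classes, `InMemoryUrlRepository` and `RedirectService`, which are not part of the course's URL shortener code.

```typescript
interface ShortUrl { code: string; longUrl: string }

// The abstraction the service depends on (simplified from the interface in 10.4).
interface UrlRepository {
  save(url: ShortUrl): ShortUrl
  findByCode(code: string): ShortUrl | null
}

// Test double: no database needed.
class InMemoryUrlRepository implements UrlRepository {
  private rows = new Map<string, ShortUrl>()
  save(url: ShortUrl): ShortUrl { this.rows.set(url.code, url); return url }
  findByCode(code: string): ShortUrl | null {
    const row = this.rows.get(code)
    return row === undefined ? null : row
  }
}

class RedirectService {
  private repo: UrlRepository
  constructor(repo: UrlRepository) { this.repo = repo } // injected, never constructed here
  resolve(code: string): string {
    const found = this.repo.findByCode(code)
    if (found === null) throw new Error(`unknown short code: ${code}`)
    return found.longUrl
  }
}

const fakeRepo = new InMemoryUrlRepository()
fakeRepo.save({ code: "abc123", longUrl: "https://example.com/docs" })
console.log(new RedirectService(fakeRepo).resolve("abc123")) // https://example.com/docs
```

Swapping in a PostgreSQL-backed repository later would require no change to `RedirectService`, only a different constructor argument.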

10.3 Sequence Diagrams — Object Interactions Over Time

Sequence diagrams show who calls whom, in what order, over time. Essential for LLD interviews when explaining a use case flow.

Sequence Diagram — Create Short URL

10.4 LLD Code Structure — Layered Implementation

// Interface (Dependency Inversion)
interface UrlRepository {
  save(url: ShortUrl): ShortUrl
  findByCode(code: string): ShortUrl | null
  existsByCode(code: string): boolean
}

// Service layer (business logic)
class UrlService {
  constructor(
    private repo: UrlRepository,
    private encoder: UrlEncoder,
    private cache: CacheClient
  ) {}

  async createShortUrl(longUrl: string, userId?: string): Promise<ShortUrl> {
    this.validateUrl(longUrl)
    const code = await this.generateUniqueCode()
    const url = new ShortUrl(code, longUrl, userId)
    const saved = await this.repo.save(url)
    await this.cache.set(`url:${code}`, longUrl, 3600)
    return saved
  }

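  private async generateUniqueCode(maxAttempts = 5): Promise<string> {
    // Retry loop with collision detection.
    // Assumption: encoder.generate() is a hypothetical method returning a candidate code.
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      const code = this.encoder.generate()
      if (!(await this.repo.existsByCode(code))) return code
    }
    throw new Error("Could not generate a unique short code")
  }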
}
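
The `UrlEncoder` dependency is not defined above. One common choice, named in 10.2, is Base62 over a numeric ID. The following is a minimal sketch under that assumption; the function names and the alphabet order are illustrative, not the course's reference implementation.

```typescript
// Assumed alphabet order (digits, upper case, lower case); any fixed 62-character order works.
const BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

function encodeBase62(id: number): string {
  if (!Number.isSafeInteger(id) || id < 0) {
    throw new RangeError("id must be a non-negative safe integer")
  }
  if (id === 0) return BASE62_ALPHABET.charAt(0)
  let remaining = id
  let code = ""
  while (remaining > 0) {
    code = BASE62_ALPHABET.charAt(remaining % 62) + code // prepend the next digit
    remaining = Math.floor(remaining / 62)
  }
  return code
}

function decodeBase62(code: string): number {
  let id = 0
  for (let i = 0; i < code.length; i++) {
    const digit = BASE62_ALPHABET.indexOf(code.charAt(i))
    if (digit < 0) throw new RangeError(`invalid Base62 character: ${code.charAt(i)}`)
    id = id * 62 + digit
  }
  return id
}

console.log(encodeBase62(125))  // 21, because 125 = 2 × 62 + 1
console.log(decodeBase62("21")) // 125
```

Encoding a unique database ID this way avoids collisions entirely, so the retry loop is only needed when codes are generated randomly.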

10.5 LLD Interview Approach

Clarify scope: "Are we designing the full system or one component?"
Identify entities: nouns become classes (User, Order, ParkingSpot)
Define relationships: associations, inheritance, composition
Walk through use cases: draw sequence diagrams for 2–3 main flows
Handle edge cases: concurrency, validation, error handling
Discuss extensibility: how would you add feature X without rewriting?

Module 11

OOP Design Patterns for LLD

Design patterns are reusable solutions to recurring object-oriented design problems. In LLD interviews, naming the right pattern — and explaining why it fits — signals senior-level thinking. This module covers the patterns you'll use most, with concrete examples from parking lots, payment systems, and notification engines.

11.1 What Are Design Patterns?

A design pattern is a proven template for structuring classes and their interactions. Patterns are not copy-paste code — they are a shared vocabulary. Saying "I'll use Strategy for payment methods" instantly communicates intent to any experienced engineer.

The Gang of Four (GoF) Pattern Categories

Interview tip: Don't name-drop patterns without context. Always follow: "I'm using [Pattern] because [problem], which gives us [benefit] at the cost of [trade-off]."

11.2 Creational Patterns

Singleton — One Instance Only

Guarantees a class has exactly one instance with global access. Use for database connection pools, configuration managers, thread pools. Caution: overuse creates hidden dependencies and makes testing hard.

class DatabaseConnectionPool {
  private static instance: DatabaseConnectionPool
  private constructor() {}  // prevent direct instantiation

  static getInstance(): DatabaseConnectionPool {
    if (!DatabaseConnectionPool.instance) {
      DatabaseConnectionPool.instance = new DatabaseConnectionPool()
    }
    return DatabaseConnectionPool.instance
  }
}

Factory / Factory Method — Delegate Object Creation

Encapsulates object creation logic. Client asks for a "Notification" without knowing if it's Email, SMS, or Push. Classic LLD example: Design a Notification System.

interface Notification { send(message: string, recipient: string): void }

class EmailNotification implements Notification { /* ... */ }
class SmsNotification implements Notification { /* ... */ }
class PushNotification implements Notification { /* ... */ }

class NotificationFactory {
  static create(type: 'email' | 'sms' | 'push'): Notification {
    switch (type) {
      case 'email': return new EmailNotification()
      case 'sms':   return new SmsNotification()
      case 'push':  return new PushNotification()
    }
  }
}

Builder — Construct Complex Objects Step by Step

Use Builder when an object has many optional fields (an Order with items, discounts, shipping, gift wrap). It provides a fluent API and validates before construction.

const order = new OrderBuilder()
  .setCustomer(userId)
  .addItem(productId, quantity)
  .applyCoupon('SAVE10')
  .setShippingAddress(address)
  .build()  // validates all required fields before creating Order
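
A minimal sketch of what such an `OrderBuilder` might look like follows. The `Order` and `OrderItem` shapes and the specific validation rules are assumptions made for illustration.

```typescript
interface OrderItem { productId: string; quantity: number }

interface Order {
  customerId: string
  items: OrderItem[]
  coupon: string | null
  shippingAddress: string
}

class OrderBuilder {
  private customerId: string | null = null
  private items: OrderItem[] = []
  private coupon: string | null = null
  private shippingAddress: string | null = null

  // Each setter returns `this` so calls can be chained fluently.
  setCustomer(customerId: string): this { this.customerId = customerId; return this }
  addItem(productId: string, quantity: number): this {
    if (quantity <= 0) throw new Error("quantity must be positive")
    this.items.push({ productId, quantity })
    return this
  }
  applyCoupon(code: string): this { this.coupon = code; return this }
  setShippingAddress(address: string): this { this.shippingAddress = address; return this }

  // Validation happens once, here, so an Order can never exist half-initialised.
  build(): Order {
    if (this.customerId === null) throw new Error("customer is required")
    if (this.shippingAddress === null) throw new Error("shipping address is required")
    if (this.items.length === 0) throw new Error("at least one item is required")
    return {
      customerId: this.customerId,
      items: this.items.slice(),
      coupon: this.coupon,
      shippingAddress: this.shippingAddress,
    }
  }
}

const builtOrder = new OrderBuilder()
  .setCustomer("user-123")
  .addItem("sku-1", 2)
  .applyCoupon("SAVE10")
  .setShippingAddress("1 Example Street")
  .build()
console.log(builtOrder.items.length, builtOrder.coupon) // 1 SAVE10
```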

11.3 Structural Patterns

Adapter — Bridge Incompatible Interfaces

Wraps a legacy or third-party API so it conforms to your interface. Your payment system expects PaymentProcessor; Stripe SDK has a different API — Adapter bridges the gap.
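
The adapter idea can be sketched as follows. `VendorSdk` is a hypothetical stand-in for a third-party SDK and deliberately does not mirror Stripe's real API; only the `PaymentProcessor` name comes from the text above.

```typescript
// The interface our payment system expects (name taken from the text above).
interface PaymentProcessor {
  charge(amountCents: number, currency: string): string // returns a payment reference
}

// Hypothetical stand-in for a third-party SDK. This is NOT Stripe's real API.
class VendorSdk {
  createPayment(request: { total: number; currencyCode: string }): { ok: boolean; reference: string } {
    return { ok: request.total > 0, reference: `ref-${request.currencyCode}-${request.total}` }
  }
}

// The adapter translates our interface into the vendor's vocabulary and back.
class VendorAdapter implements PaymentProcessor {
  private sdk: VendorSdk
  constructor(sdk: VendorSdk) { this.sdk = sdk }

  charge(amountCents: number, currency: string): string {
    const response = this.sdk.createPayment({
      total: amountCents / 100,             // vendor expects major units
      currencyCode: currency.toUpperCase(), // vendor expects upper-case codes
    })
    if (!response.ok) throw new Error("payment declined")
    return response.reference
  }
}

const processor: PaymentProcessor = new VendorAdapter(new VendorSdk())
console.log(processor.charge(1999, "usd")) // ref-USD-19.99
```

Callers only ever see `PaymentProcessor`, so replacing the vendor means writing a new adapter rather than touching business logic.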

Decorator — Add Behavior Without Subclassing

Wraps an object to add responsibilities dynamically. A base Coffee can be wrapped with MilkDecorator, then WhipDecorator. In systems: add logging, caching, or encryption layers around a service.

Decorator Pattern — Layering Behaviors
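
Layering caching and logging around a service might look like the sketch below. The `UrlResolver` interface and all three class names are assumptions for illustration.

```typescript
interface UrlResolver {
  resolve(code: string): string
}

// Innermost component: stands in for a slow database lookup.
class DatabaseResolver implements UrlResolver {
  lookups = 0
  resolve(code: string): string {
    this.lookups += 1
    return `https://example.com/${code}`
  }
}

// Decorator 1: adds caching without changing DatabaseResolver.
class CachingResolver implements UrlResolver {
  private inner: UrlResolver
  private cache = new Map<string, string>()
  constructor(inner: UrlResolver) { this.inner = inner }
  resolve(code: string): string {
    const cached = this.cache.get(code)
    if (cached !== undefined) return cached
    const value = this.inner.resolve(code)
    this.cache.set(code, value)
    return value
  }
}

// Decorator 2: adds logging around whatever it wraps.
class LoggingResolver implements UrlResolver {
  private inner: UrlResolver
  readonly entries: string[] = []
  constructor(inner: UrlResolver) { this.inner = inner }
  resolve(code: string): string {
    this.entries.push(`resolve ${code}`)
    return this.inner.resolve(code)
  }
}

const database = new DatabaseResolver()
const layered = new LoggingResolver(new CachingResolver(database))
layered.resolve("abc123")
layered.resolve("abc123")
console.log(database.lookups, layered.entries.length) // 1 2
```

Both calls are logged, but only the first reaches the database; the order of wrapping decides which layer sees which calls.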

Facade — Simplified Interface to Complex Subsystem

OrderFacade.placeOrder() internally coordinates inventory, payment, shipping, and notification services — client sees one simple method.

Proxy — Stand-In With Controlled Access

Proxy controls access to a real object. Use cases: lazy loading (load image only when displayed), access control, remote proxy (RPC stub), caching proxy.

11.4 Behavioral Patterns — The LLD Workhorses

Strategy — Swap Algorithms at Runtime

Define a family of algorithms, encapsulate each, and make them interchangeable. Arguably the most frequently used pattern in LLD interviews.

// Parking Lot LLD — different pricing strategies
interface PricingStrategy {
  calculateFee(entryTime: Date, exitTime: Date): number
}

class HourlyPricing implements PricingStrategy { /* $5/hour */ }
class FlatRatePricing implements PricingStrategy { /* $20/day */ }
class WeekendPricing implements PricingStrategy { /* 1.5× multiplier */ }

class ParkingTicket {
  constructor(private strategy: PricingStrategy) {}
  getFee(entry: Date, exit: Date) { return this.strategy.calculateFee(entry, exit) }
}
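
The strategy bodies above are left as comments. A minimal sketch of how they might be filled in follows, using the rates from the comments ($5/hour, $20/day, 1.5× on weekends). Rounding up to started units, the one-unit minimum, and having `WeekendPricing` wrap another strategy are assumptions.

```typescript
interface PricingStrategy {
  calculateFee(entryTime: Date, exitTime: Date): number
}

const MS_PER_HOUR = 60 * 60 * 1000

// Counts started units (hours or days), charging at least one.
function startedUnits(entryTime: Date, exitTime: Date, unitMs: number): number {
  const elapsedMs = exitTime.getTime() - entryTime.getTime()
  if (elapsedMs < 0) throw new RangeError("exitTime must not be before entryTime")
  return Math.max(1, Math.ceil(elapsedMs / unitMs))
}

class HourlyPricing implements PricingStrategy {
  calculateFee(entryTime: Date, exitTime: Date): number {
    return startedUnits(entryTime, exitTime, MS_PER_HOUR) * 5 // $5/hour
  }
}

class FlatRatePricing implements PricingStrategy {
  calculateFee(entryTime: Date, exitTime: Date): number {
    return startedUnits(entryTime, exitTime, 24 * MS_PER_HOUR) * 20 // $20/day
  }
}

// Assumption: the 1.5× weekend multiplier wraps another strategy.
class WeekendPricing implements PricingStrategy {
  private base: PricingStrategy
  constructor(base: PricingStrategy) { this.base = base }
  calculateFee(entryTime: Date, exitTime: Date): number {
    return this.base.calculateFee(entryTime, exitTime) * 1.5
  }
}

const entryTime = new Date(Date.UTC(2024, 0, 6, 9, 0))
const exitTime = new Date(Date.UTC(2024, 0, 6, 11, 30)) // 2.5 hours later
const strategies: PricingStrategy[] = [
  new HourlyPricing(),
  new FlatRatePricing(),
  new WeekendPricing(new HourlyPricing()),
]
console.log(strategies.map((s) => s.calculateFee(entryTime, exitTime)).join(", ")) // 15, 20, 22.5
```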

Observer — Publish/Subscribe Within Code

When one object's state change must notify many dependents. OrderSubject notifies EmailObserver, InventoryObserver, AnalyticsObserver on order placement. Mirrors event-driven architecture at the code level.

Observer Pattern — Order Placed Event
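
A compact way to sketch Observer in TypeScript is a generic subject holding callback functions. The names `Subject`, `Observer`, and `OrderPlaced` are illustrative, and the observers here just record strings instead of sending email or updating inventory.

```typescript
type OrderPlaced = { orderId: string; total: number }
type Observer<E> = (event: E) => void

class Subject<E> {
  private observers: Observer<E>[] = []

  // Returns an unsubscribe function so observers can detach themselves.
  subscribe(observer: Observer<E>): () => void {
    this.observers.push(observer)
    return () => { this.observers = this.observers.filter((o) => o !== observer) }
  }

  notify(event: E): void {
    for (const observer of this.observers) observer(event)
  }
}

const orderPlaced = new Subject<OrderPlaced>()
const received: string[] = []

const unsubscribeEmail = orderPlaced.subscribe((e) => received.push(`email:${e.orderId}`))
orderPlaced.subscribe((e) => received.push(`inventory:${e.orderId}`))

orderPlaced.notify({ orderId: "o-1", total: 40 })
unsubscribeEmail()
orderPlaced.notify({ orderId: "o-2", total: 15 })

console.log(received.join(", ")) // email:o-1, inventory:o-1, inventory:o-2
```

The subject knows nothing about what its observers do, which is the same decoupling a message queue provides between services.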

Command — Encapsulate Actions as Objects

Turn requests into objects with execute() and undo(). Powers undo/redo in text editors, job queues, and transaction systems. Each command stores the receiver and parameters.
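
Undo support can be sketched with a stack of executed commands. `TextBuffer`, `AppendText`, and `CommandStack` are hypothetical names chosen for this example.

```typescript
interface Command {
  execute(): void
  undo(): void
}

class TextBuffer {
  content = ""
}

// Each command stores its receiver (the buffer) and its parameters (the text).
class AppendText implements Command {
  private buffer: TextBuffer
  private text: string
  constructor(buffer: TextBuffer, text: string) { this.buffer = buffer; this.text = text }
  execute(): void { this.buffer.content += this.text }
  undo(): void {
    const keep = this.buffer.content.length - this.text.length
    this.buffer.content = this.buffer.content.slice(0, keep)
  }
}

class CommandStack {
  private executed: Command[] = []
  run(command: Command): void { command.execute(); this.executed.push(command) }
  undoLast(): void {
    const command = this.executed.pop()
    if (command !== undefined) command.undo()
  }
}

const buffer = new TextBuffer()
const commands = new CommandStack()
commands.run(new AppendText(buffer, "Hello"))
commands.run(new AppendText(buffer, ", world"))
commands.undoLast()
console.log(buffer.content) // Hello
```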

State — Behavior Changes With Internal State

Object behavior changes when its state changes. A VendingMachine behaves differently in Idle, HasMoney, Dispensing, and OutOfStock states — each state is its own class implementing a common interface.

interface VendingState {
  insertCoin(machine: VendingMachine): void
  selectProduct(machine: VendingMachine, code: string): void
  dispense(machine: VendingMachine): void
}

class IdleState implements VendingState {
  insertCoin(m: VendingMachine): void { m.setState(new HasMoneyState()); m.addCredit(1) }
  selectProduct(m: VendingMachine, code: string): void { throw new Error("Insert coin first") }
  dispense(m: VendingMachine): void { throw new Error("No product selected") }
}

11.5 Pattern Selection Guide for LLD Interviews

Problem You Face	Reach For	LLD Example
Multiple interchangeable algorithms	Strategy	Payment methods, pricing rules, routing
Notify many components on state change	Observer	Order events, stock price alerts
Create objects without specifying exact class	Factory	Notification channels, DB drivers
Object behavior depends on current state	State	Elevator, vending machine, workflow
Undo/redo or queue operations	Command	Text editor, task scheduler
Add features without modifying class	Decorator	Logging, caching, compression layers
Simplify complex subsystem API	Facade	Order placement, checkout flow
Integrate incompatible third-party API	Adapter	Legacy payment gateway wrapper

11.6 Anti-Patterns to Avoid

• God Object: one class does everything — violates SRP, untestable
• Pattern overload: forcing Factory + Strategy + Decorator + Observer into a simple CRUD app
• Premature abstraction: "We might need 10 payment methods someday" — YAGNI applies
• Singleton abuse: global mutable state makes unit testing a nightmare
• Inheritance over composition: deep class hierarchies are brittle and often violate Liskov substitution; favor interfaces + composition

Golden rule: Start simple. Introduce a pattern only when you can name the specific problem it solves. Interviewers reward clarity over complexity.

Module 12

Concurrency & Thread Safety

Modern servers handle thousands of requests simultaneously. Concurrency unlocks performance, but shared mutable state is behind many of the production bugs that can't be reproduced locally. This module teaches you to reason about threads, races, locks, deadlocks, and the design patterns that keep multi-threaded systems correct.

12.1 Concurrency vs Parallelism

Concurrency is about dealing with many things at once — structuring your program so multiple tasks make progress. Parallelism is about doing many things at once — literally executing on multiple CPU cores simultaneously. You can have concurrency without parallelism (single-core time-slicing) and parallelism without much concurrency (embarrassingly parallel batch jobs).

Processes vs Threads

Analogy — Shared Kitchen: Threads are chefs in the same kitchen (shared memory). If two chefs grab the same knife (shared variable) without coordinating, someone gets cut (race condition). Processes are separate kitchens — safer but slower to pass ingredients between them.

12.2 Race Conditions & Critical Sections

A race condition occurs when the correctness of your program depends on the unpredictable timing of thread execution. The classic example: two threads increment a shared counter — both read 0, both write 1, result is 1 instead of 2.

// RACE CONDITION — counter may be less than 1000
let counter = 0

// Thread A and Thread B both run this:
counter = counter + 1   // NOT atomic! Read → increment → write = 3 steps

// After 1000 increments from 2 threads, counter might be 847, not 1000

A critical section is the code region that accesses shared resources. Only one thread may execute the critical section at a time — protected by synchronization primitives.

Race Condition Timeline — Lost Update

12.3 Synchronization Primitives

Primitive	What It Does	Use Case
Mutex (Lock)	Only one thread holds the lock at a time	Protecting critical sections
Read-Write Lock	Many readers OR one writer	Read-heavy caches, config stores
Semaphore	Limits concurrent access to N threads	Connection pool (max 10 DB connections)
Atomic Operations	Hardware-guaranteed indivisible read-modify-write	Counters, flags without full locks
Condition Variable	Thread waits until a condition is signaled	Producer-consumer queues, thread pools

// Mutex-protected counter — thread safe
mutex = new Mutex()
counter = 0

function increment():
  mutex.lock()
  try:
    counter = counter + 1    // critical section — only one thread here
  finally:
    mutex.unlock()

// Semaphore — limit concurrent DB connections to 10
dbSemaphore = new Semaphore(10)

function queryDatabase(sql):
  dbSemaphore.acquire()
  try:
    return db.execute(sql)
  finally:
    dbSemaphore.release()
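
The pseudocode above assumes OS threads. TypeScript on a single event loop has no shared-memory threads by default, but the same lost update appears whenever a read and a write are separated by an `await`. The sketch below shows the race and a minimal promise-based mutex that fixes it; `AsyncMutex` and `lockDemo` are hypothetical names, not a library API.

```typescript
// Minimal promise-based mutex: each critical section starts only after the previous one settles.
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve()

  runExclusive<T>(criticalSection: () => Promise<T>): Promise<T> {
    const result = this.tail.then(criticalSection)
    // Swallow the outcome so one failed section does not block the sections queued behind it.
    this.tail = result.then(() => undefined, () => undefined)
    return result
  }
}

async function lockDemo(): Promise<{ unsafe: number; safe: number }> {
  const shared = { counter: 0 }
  const increment = async (): Promise<void> => {
    const current = shared.counter // read
    await Promise.resolve()        // yield, letting another task interleave
    shared.counter = current + 1   // write based on a possibly stale read
  }

  await Promise.all([increment(), increment()])
  const unsafe = shared.counter // both tasks read 0, so one update is lost

  shared.counter = 0
  const mutex = new AsyncMutex()
  await Promise.all([mutex.runExclusive(increment), mutex.runExclusive(increment)])
  return { unsafe, safe: shared.counter }
}

lockDemo().then(({ unsafe, safe }) => console.log(`unsafe=${unsafe} safe=${safe}`)) // unsafe=1 safe=2
```

The mutex serialises the read-modify-write sequence, which is exactly what `mutex.lock()` and `mutex.unlock()` do for threads in the pseudocode.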

12.4 Deadlock — When Threads Wait Forever

A deadlock occurs when two or more threads are blocked forever, each waiting for a resource held by another. All four Coffman conditions must be true simultaneously:

Mutual exclusion — resource held by only one thread at a time
Hold and wait — thread holds one resource while waiting for another
No preemption — resources can't be forcibly taken away
Circular wait — A waits for B, B waits for A

Classic Deadlock — Two Threads, Two Locks

Deadlock Prevention Strategies

• Lock ordering: always acquire Lock 1 before Lock 2 — breaks circular wait
• Lock timeout: tryLock(timeout) — abort and retry instead of waiting forever
• Minimize lock scope: hold locks for the shortest time possible
• Avoid nested locks: redesign to need only one lock

12.5 Thread-Safe Design Patterns

Immutability

Objects that cannot change after creation are inherently thread-safe. No locks needed. Java String, Python tuple, event objects in event sourcing.

Thread-Local Storage

Each thread gets its own copy of a variable. No sharing = no races. Request context, DB transaction handles per thread.

Thread Pool

Fixed set of worker threads processing a task queue. Avoids thread creation overhead. Tomcat request threads, Node.js's libuv thread pool, Java ExecutorService.

Concurrent Collections

java.util.concurrent, Python queue.Queue, Go channels. Built-in thread-safe data structures instead of rolling your own locks. (Python's asyncio.Queue is designed for coroutines on one event loop and is not thread-safe.)

// Producer-Consumer with blocking queue (thread-safe by design)
queue = new BlockingQueue<Task>(capacity=100)

// Producer thread
function producer():
  while running:
    task = generateTask()
    queue.put(task)          // blocks if queue full (backpressure)

// Consumer threads (pool of N workers)
function consumer():
  while running:
    task = queue.take()      // blocks if queue empty
    process(task)

12.6 Concurrency in LLD & System Design

In LLD interviews, concurrency appears in parking lots (multiple entry/exit gates), elevators (multiple requests), ticket booking (seat reservation races), and rate limiters. Key questions to address:

• What shared state exists? Who reads/writes it?
• What happens if two users book the same seat simultaneously?
• Can you use optimistic locking (version numbers) vs pessimistic locking (mutex)?
• Should you use database transactions (SELECT FOR UPDATE) instead of in-memory locks?

Optimistic vs Pessimistic locking: Pessimistic = lock the row before reading (safe, lower concurrency). Optimistic = read freely, check version on write, retry if conflict (higher concurrency, good when conflicts are rare). E-commerce inventory with low contention → optimistic. Bank transfers → pessimistic.
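
Optimistic locking with a version number can be sketched against an in-memory table. `SeatTable` and `bookIfUnchanged` are hypothetical; in a real database the check and the write would be a single conditional UPDATE, as the comment indicates.

```typescript
interface SeatRow { seatId: string; bookedBy: string | null; version: number }

class SeatTable {
  private rows = new Map<string, SeatRow>()

  insert(row: SeatRow): void { this.rows.set(row.seatId, { ...row }) }

  read(seatId: string): SeatRow {
    const row = this.rows.get(seatId)
    if (row === undefined) throw new Error(`unknown seat ${seatId}`)
    return { ...row } // callers get a snapshot, like a SELECT result
  }

  // Emulates: UPDATE seats SET booked_by = ?, version = version + 1
  //           WHERE seat_id = ? AND booked_by IS NULL AND version = ?
  bookIfUnchanged(seatId: string, expectedVersion: number, user: string): boolean {
    const row = this.rows.get(seatId)
    if (row === undefined || row.version !== expectedVersion || row.bookedBy !== null) return false
    row.bookedBy = user
    row.version += 1
    return true
  }
}

const seats = new SeatTable()
seats.insert({ seatId: "12A", bookedBy: null, version: 1 })

// Both users read the same snapshot before either writes.
const aliceView = seats.read("12A")
const bobView = seats.read("12A")

const aliceBooked = seats.bookIfUnchanged("12A", aliceView.version, "alice")
const bobBooked = seats.bookIfUnchanged("12A", bobView.version, "bob")
console.log(aliceBooked, bobBooked) // true false
```

Bob's write fails because the version moved on; his client must re-read the row and either retry or report that the seat is taken.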

Module 13

Reliability, Fault Tolerance & CAP Theorem

Production systems fail — disks crash, networks partition, deployments go wrong. Reliability is not about preventing all failures; it's about designing systems that continue serving users correctly despite them. This module covers availability math, fault tolerance patterns, the CAP theorem, consistency models, and replication strategies that underpin every distributed database decision.

13.1 Reliability, Availability & Durability

Three terms often confused — each measures a different dimension of system trustworthiness:

Reliability

System performs correctly even when things go wrong. Fault tolerance + recoverability + absence of data corruption.

Availability

System is operational and responding to requests. Measured as uptime percentage (SLA).

Durability

Once data is acknowledged as written, it survives crashes. Replication + backups ensure no loss.

The Nines of Availability

Availability	Downtime / Year	Typical Use
99% (two nines)	3.65 days	Internal tools, dev environments
99.9% (three nines)	8.76 hours	Standard SaaS products
99.99% (four nines)	52.6 minutes	Payment systems, e-commerce
99.999% (five nines)	5.26 minutes	Telecom, critical infrastructure

Key insight: As a rule of thumb, each additional nine costs roughly 10× more in engineering and infrastructure. Don't over-engineer: match availability targets to business requirements.
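
The downtime column in the table above follows from one multiplication. A minimal sketch, assuming a 365-day year; the function name is illustrative.

```typescript
const MINUTES_PER_YEAR = 365 * 24 * 60 // 525,600 (non-leap year)

// Allowed downtime is simply the unavailable fraction of the year.
function downtimeMinutesPerYear(availabilityPercent: number): number {
  return MINUTES_PER_YEAR * (1 - availabilityPercent / 100)
}

for (const target of [99, 99.9, 99.99, 99.999]) {
  console.log(`${target}% -> ${downtimeMinutesPerYear(target).toFixed(2)} minutes/year`)
}
// Roughly 5256 minutes (3.65 days), 525.6 minutes (8.76 hours), 52.56 minutes, and 5.26 minutes.
```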

13.2 Fault Tolerance & Eliminating SPOFs

A Single Point of Failure (SPOF) is any component whose failure takes down the entire system. Fault tolerance means redundancy at every critical layer — no single server, rack, or data center is indispensable.

Eliminating Single Points of Failure

Failover Strategies

• Active-Passive: standby replica takes over on primary failure (simpler, no write conflicts, but idle standby capacity and a failover delay)
• Active-Active: all nodes serve traffic simultaneously (higher utilization, conflict resolution needed)
• Health-check driven: load balancer detects failure and reroutes within seconds

13.3 The CAP Theorem — The Fundamental Trade-off

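Eric Brewer's CAP theorem (formally proved by Gilbert and Lynch in 2002) states that a distributed data store cannot guarantee all three of the following properties at once. Because partitions cannot be ruled out, the practical choice arises during a network partition, when the system must sacrifice either consistency or availability: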

C — Consistency

Every read receives the most recent write or an error

A — Availability

Every request receives a non-error response (may be stale)

P — Partition Tolerance

System continues despite network failures between nodes

CAP Theorem — Pick Two During a Partition

Critical nuance: Partitions WILL happen in distributed systems — so P is non-negotiable. The real choice is CP vs AP during a partition: reject requests to stay consistent (CP), or serve potentially stale data to stay available (AP).

13.4 PACELC — CAP Extended

Daniel Abadi's PACELC theorem extends CAP: If there is a Partition (P), choose Availability (A) or Consistency (C). Else (E), choose Latency (L) or Consistency (C). This captures the trade-off even when the network is healthy — strong consistency often requires coordination that adds latency.

System	During Partition	Normal Operation
DynamoDB / Cassandra	AP — stay available	EL — low latency, eventual consistency
MongoDB / HBase	CP — reject writes	EC — consistent but higher latency
PostgreSQL (single node)	CA (no partition possible)	EC — strong consistency

13.5 Consistency Models

Consistency exists on a spectrum — not just "strong" or "eventual." Choose based on what your users can tolerate.

Strong Consistency

After a write completes, all reads see the new value. Required for bank balances, inventory counts. Implemented via consensus (Paxos, Raft) or single-leader replication with reads served by the leader or by synchronously updated replicas.

Eventual Consistency

Given no new writes, all replicas converge to the same value eventually. Acceptable for social media likes, view counts, DNS. Cassandra, DynamoDB default.

Causal Consistency

Middle ground — if event A caused event B, everyone sees A before B. Good for chat messages, comment threads.

Read-Your-Writes Consistency

A user always sees their own updates. Critical for profile edits, settings changes. Route user reads to the node that handled their write.

13.6 Replication Strategies

Leader-Follower (Primary-Replica) Replication

Strategy	Description	Example
Leader-Follower	One leader handles writes; followers replicate and serve reads	PostgreSQL, MySQL replication
Multi-Leader	Multiple nodes accept writes; conflict resolution needed	Multi-region CouchDB, offline-first apps
Leaderless	Any node accepts reads/writes; quorum-based consistency	Cassandra, Riak (Dynamo-style quorum reads/writes)

// Quorum consistency (leaderless, Dynamo-style stores such as Cassandra)
// N = number of replicas, W = write quorum, R = read quorum
// When W + R > N, every read quorum overlaps every write quorum

N = 3 replicas
W = 2  (write must succeed on 2 of 3 nodes)
R = 2  (read from 2 of 3 nodes)
W + R = 4 > N = 3  →  read set and write set share at least one node → the read sees the latest acknowledged write
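
The overlap argument can be checked with a tiny worst-case simulation: the write reaches only the first W replicas and the read contacts only the last R. The `Replica` shape and the function names are assumptions; real stores also handle concurrent writes, read repair, and node failures.

```typescript
interface Replica { version: number; value: string }

function quorumsOverlap(n: number, w: number, r: number): boolean {
  return w + r > n
}

// Worst case for the reader: only the first W replicas have applied the write...
function quorumWrite(replicas: Replica[], w: number, value: string, version: number): void {
  for (let i = 0; i < w; i++) replicas[i] = { version, value }
}

// ...and the read contacts the last R replicas, keeping the newest version it sees.
function quorumRead(replicas: Replica[], r: number): string {
  if (r < 1 || r > replicas.length) throw new RangeError("r must be between 1 and N")
  const contacted = replicas.slice(replicas.length - r)
  return contacted.reduce((newest, replica) => (replica.version > newest.version ? replica : newest)).value
}

function readAfterWrite(n: number, w: number, r: number): string {
  const replicas: Replica[] = []
  for (let i = 0; i < n; i++) replicas.push({ version: 0, value: "old" })
  quorumWrite(replicas, w, "new", 1)
  return quorumRead(replicas, r)
}

console.log(quorumsOverlap(3, 2, 2), readAfterWrite(3, 2, 2)) // true new
console.log(quorumsOverlap(3, 1, 1), readAfterWrite(3, 1, 1)) // false old
```

With W = R = 1 the read can land on a replica the write never reached, which is the stale read that eventual consistency permits.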

13.7 Designing for Failure — Interview Framework

In system design interviews, always address failure explicitly. Walk through this checklist:

Identify SPOFs: What single component failure kills the system?
Add redundancy: N+1 at every layer (servers, DB replicas, AZs)
Choose CAP position: CP for banking, AP for social feeds — justify it
Define failover: automatic vs manual, RTO (recovery time) and RPO (data loss window)
Plan degradation: what features can be disabled under stress? (circuit breakers, graceful degradation)
Backup & restore: how do you recover from catastrophic failure?

Module 14

Security in System Design

Security is not a feature you bolt on at the end — it is an architectural property woven through every layer. A system design interview that ignores authentication, encryption, and threat modeling will be incomplete. This module teaches you to design systems that protect confidentiality, integrity, and availability against real-world attacks.

14.1 The CIA Triad & Defense in Depth

All security goals reduce to three pillars. Every control you design maps to at least one:

Confidentiality

Only authorized parties access data. Encryption, access controls, least privilege.

Integrity

Data is not altered without authorization. Hashing, digital signatures, audit logs.

Availability

System remains accessible to authorized users. DDoS protection, redundancy, rate limiting.

Defense in depth layers multiple security controls so no single failure compromises the system — like a castle with moat, walls, guards, and a vault.

Defense in Depth — Security Layers

14.2 Authentication vs Authorization

These are distinct concerns — conflating them is a common design mistake.

Authentication (AuthN)

"Who are you?"

• Password + bcrypt/argon2 hashing
• Multi-factor authentication (MFA)
• OAuth 2.0 / OpenID Connect (SSO)
• API keys, mTLS for service-to-service

Authorization (AuthZ)

"What can you do?"

• Role-Based Access Control (RBAC)
• Attribute-Based Access Control (ABAC)
• JWT claims / OAuth scopes
• Policy engines (OPA, AWS IAM)

OAuth 2.0 Authorization Code Flow

// JWT structure (Header.Payload.Signature)
// Payload contains claims. It is only base64url-encoded, not encrypted, so never store secrets in a JWT!
{
  "sub": "user-123",
  "role": "admin",
  "exp": 1620000000,
  "iat": 1619996400
}

// API request
GET /api/v1/users
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...

// Server validates: signature valid? not expired? role permits action?

14.3 Encryption — In Transit & At Rest

Encryption in transit: TLS 1.3 for all client-server and service-to-service communication. HTTPS is non-negotiable. Internal microservices should use mTLS (mutual TLS) so services authenticate each other.

Encryption at rest: Database-level encryption (AES-256), disk encryption, and field-level encryption for PII (SSN, credit cards). Use a Key Management Service (AWS KMS, HashiCorp Vault) — never hardcode keys.

Technique	Purpose	Example
TLS / HTTPS	Encrypt data in transit	All public APIs, web traffic
AES-256 at rest	Encrypt stored data	Database disk encryption, S3 SSE
bcrypt / argon2	One-way hash passwords	Never store plaintext passwords
HMAC / SHA-256	Integrity verification	Webhook signatures, JWT signing
Tokenization	Replace sensitive data with tokens	PCI-DSS credit card handling

14.4 Common Attack Vectors & Defenses

SQL Injection

Attacker injects SQL via input fields. Defense: parameterized queries / prepared statements — never concatenate user input into SQL.

Cross-Site Scripting (XSS)

Malicious scripts injected into web pages. Defense: output encoding, Content-Security-Policy headers, sanitize user HTML.

CSRF (Cross-Site Request Forgery)

Tricks authenticated user into unwanted actions. Defense: CSRF tokens, SameSite cookies, verify Origin header.

DDoS (Distributed Denial of Service)

Overwhelms servers with traffic. Defense: CDN absorption, rate limiting, WAF, auto-scaling, anycast routing.

Broken Access Control

User accesses another user's data (IDOR). Defense: authorize every request server-side, never trust client-sent user IDs alone.

// WRONG — SQL injection vulnerable
query = "SELECT * FROM users WHERE email = '" + userInput + "'"

// RIGHT — parameterized query
query = "SELECT * FROM users WHERE email = ?"
db.execute(query, [userInput])

// Rate limiting at API gateway
// 100 req/min per IP → 429 Too Many Requests
// Slows brute-force attempts and abusive clients; volumetric DDoS still needs CDN/WAF absorption
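
The 100 requests per minute rule above can be sketched as a fixed-window counter. This is one of several possible algorithms (token bucket and sliding window are common alternatives); the class name and the explicit `nowMs` parameter, used here to keep the example deterministic, are assumptions.

```typescript
class FixedWindowRateLimiter {
  private limit: number
  private windowMs: number
  private windows = new Map<string, { windowStart: number; count: number }>()

  constructor(limit: number, windowMs: number) {
    this.limit = limit
    this.windowMs = windowMs
  }

  // Returns the HTTP status a gateway would send for this request.
  check(clientIp: string, nowMs: number): number {
    const current = this.windows.get(clientIp)
    if (current === undefined || nowMs - current.windowStart >= this.windowMs) {
      this.windows.set(clientIp, { windowStart: nowMs, count: 1 }) // start a fresh window
      return 200
    }
    if (current.count >= this.limit) return 429 // Too Many Requests
    current.count += 1
    return 200
  }
}

const limiter = new FixedWindowRateLimiter(100, 60000) // 100 req/min per IP
let lastStatus = 0
for (let i = 0; i < 101; i++) lastStatus = limiter.check("203.0.113.7", 1000)
console.log(lastStatus)                           // 429: the 101st request in the window
console.log(limiter.check("203.0.113.7", 61000))  // 200: a new window has started
```

In a multi-instance gateway the counters would live in a shared store such as Redis rather than in process memory.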

14.5 Secrets Management & Least Privilege

Never commit secrets to git. Use dedicated secret stores with automatic rotation:

• HashiCorp Vault — dynamic secrets, encryption as a service
• AWS Secrets Manager / GCP Secret Manager — cloud-native rotation
• Environment injection — secrets injected at runtime, not baked into images

Principle of least privilege: every service account, API key, and IAM role gets only the minimum permissions required. A compromised read-only analytics service shouldn't be able to delete production databases.

14.6 Zero Trust & Security in Interviews

Zero Trust assumes no user or service is trusted by default — even inside the corporate network. Every request is authenticated, authorized, and encrypted. Micro-segmentation limits blast radius if one service is compromised.

Security Checklist for System Design Interviews

Authentication: How do users/services prove identity? (OAuth, JWT, API keys)
Authorization: Who can access what? (RBAC, resource-level checks)
Encryption: TLS in transit, encryption at rest for PII
Input validation: Sanitize all external input at API boundary
Rate limiting: Prevent abuse and brute force
Audit logging: Who did what, when — immutable logs for forensics
Compliance: GDPR, HIPAA, PCI-DSS if applicable

Pro tip: Mentioning security proactively — even briefly — distinguishes senior candidates. "I'll place an API gateway with TLS termination, JWT validation, and rate limiting before traffic hits services" shows production thinking.

Module 15

Observability & Monitoring

You built it, you deployed it, it's serving traffic — but is it healthy? Observability is the discipline of understanding system internal state from external outputs. Without metrics, logs, and traces, debugging a production incident at 3 AM is guesswork. This module teaches you to instrument systems like a senior SRE.

15.1 Monitoring vs Observability

Monitoring tells you when something is wrong — predefined dashboards and alerts fire when thresholds breach. Observability lets you ask why — exploring arbitrary questions about system behavior you didn't anticipate when writing alerts.

Monitoring

"Is CPU above 80%?" Known unknowns. Dashboards, alerts, uptime checks. Reactive — you defined what to watch in advance.

Observability

"Why did checkout latency spike for users in EU only?" Unknown unknowns. Ad-hoc queries across metrics, logs, and traces.

Analogy — Ship Navigation: Monitoring is the dashboard warning light ("engine temperature high"). Observability is the ability to inspect any part of the engine, review the captain's log, and trace the ship's route to diagnose why temperature rose.

15.2 The Three Pillars of Observability

Metrics, Logs, and Traces — The Three Pillars

The pillars are complementary — metrics tell you something is wrong, logs tell you what happened, traces tell you where in the call chain it happened. Correlating all three (via trace IDs in logs) is the gold standard.

15.3 Golden Signals & RED/USE Methods

Google's SRE team defines four Golden Signals every user-facing service should monitor:

Latency — time to serve a request (distinguish success vs error latency)

Traffic — demand on the system (requests/sec, connections)

Errors — rate of failed requests (5xx, timeouts, exceptions)

Saturation — how "full" the system is (CPU, memory, queue depth)

RED Method (for services)

Rate (requests/sec) · Error rate · Duration (latency distribution)

USE Method (for resources)

Utilization · Saturation · Errors — applied per resource: CPU, memory, disk, network.

// Example Prometheus metrics (RED)
http_requests_total{method="GET", status="200", endpoint="/api/urls"} 45230
http_requests_total{method="GET", status="500", endpoint="/api/urls"} 12
http_request_duration_seconds{quantile="0.99", endpoint="/api/urls"} 0.087

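// Alert rule (Prometheus 2.x rule file, YAML); sum() aggregates across labels before dividing
groups:
  - name: http-errors
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: critical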
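
To show what the RED numbers mean underneath the metric names, the following sketch computes them from raw request records for one time window. The `Request` type and `red_summary` function are hypothetical names for illustration, and the p99 uses a simple nearest-rank rule, which may differ slightly from how a metrics backend estimates quantiles.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code
    duration_s: float  # time to serve the request, in seconds


def red_summary(requests: list, window_s: float) -> dict:
    """Compute Rate, Error ratio, and p99 Duration for one time window."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank percentile: smallest sample with at least 99% of samples at or below it.
    p99 = durations[math.ceil(total * 99 / 100) - 1] if total else 0.0
    return {
        "rate_per_s": total / window_s,
        "error_ratio": errors / total if total else 0.0,
        "p99_s": p99,
    }
```

An error ratio above 0.01 in this summary corresponds to the condition the alert rule checks.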

15.4 SLIs, SLOs, SLAs & Error Budgets

Reliability targets must be measurable and agreed upon. This hierarchy connects engineering to business:

SLI (Service Level Indicator)

A measured metric. "Percentage of successful HTTP requests" or "p99 redirect latency."

SLO (Service Level Objective)

Internal target. "99.9% of requests succeed" or "p99 latency < 100ms."

SLA (Service Level Agreement)

Contractual commitment to customers, usually with financial penalties for breaches. The SLO should be stricter than the SLA so that you have a buffer before the contract is violated.

Error budget: If SLO is 99.9% monthly, you have 0.1% budget for failures (~43 minutes/month). When budget is exhausted, freeze feature launches and focus on reliability. This aligns product velocity with stability.
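
The error-budget arithmetic can be checked with a few lines of Python. This is a sketch under the same assumption as the text (a 30-day month); the function names are illustrative.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full outage allowed in the window while still meeting the SLO."""
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    allowed_failures = (1.0 - slo) * total_requests
    return 1.0 - failed_requests / allowed_failures


print(round(error_budget_minutes(0.999), 1))   # 43.2 minutes per 30-day month
print(round(error_budget_minutes(0.9999), 2))  # 4.32 minutes per 30-day month
```

Each extra nine cuts the budget by a factor of ten, which is why the jump from 99.9% to 99.99% is so expensive.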

15.5 Distributed Tracing

In microservices, a single user request can traverse dozens of services. Distributed tracing assigns a unique trace_id at the edge and propagates it through every service call. Each call records a span (name, start time, duration, parent span), and the spans together form a waterfall view of the request.

Distributed Trace — Checkout Request Waterfall

OpenTelemetry is the vendor-neutral standard for generating traces, metrics, and logs. Instrument once, export to Jaeger, Datadog, or Honeycomb.
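
The propagation idea can be sketched without any tracing library. The code below is not the OpenTelemetry API; `Span`, `start_trace`, `child_span`, and the header names are hypothetical and only illustrate how one trace_id is shared while each span points back at its caller. Real deployments typically rely on the W3C `traceparent` header handled by the instrumentation library.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str
    name: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])


def start_trace(name: str) -> Span:
    """Edge service: mint a new trace_id and the root span."""
    return Span(trace_id=uuid.uuid4().hex, name=name)


def child_span(parent: Span, name: str) -> Span:
    """Downstream call: reuse the trace_id and point back at the caller's span."""
    return Span(trace_id=parent.trace_id, name=name, parent_id=parent.span_id)


def outgoing_headers(span: Span) -> dict:
    """Headers a service would attach to its outgoing request (hypothetical names)."""
    return {"x-trace-id": span.trace_id, "x-parent-span-id": span.span_id}
```

A tracing backend rebuilds the waterfall by grouping spans on `trace_id` and linking each `parent_id` to the matching `span_id`.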

15.6 Alerting & On-Call Best Practices

• Alert on symptoms, not causes: "Users experiencing high error rate" not "CPU is 82%"
• Every alert must be actionable: if no one can do anything, it's a dashboard metric, not an alert
• Reduce noise: group related alerts, use alert routing (PagerDuty, Opsgenie)
• Severity levels: P1 (wake someone up) vs P3 (fix next business day)
• Runbooks: every P1/P2 alert links to step-by-step remediation docs

// Structured logging — JSON for machine parsing
{
  "timestamp": "2024-06-15T10:23:45.123Z",
  "level": "ERROR",
  "service": "url-shortener-api",
  "trace_id": "abc123def456",
  "message": "Database connection timeout",
  "user_id": "user-789",
  "duration_ms": 5023,
  "endpoint": "POST /api/v1/urls"
}
// trace_id links this log entry to the distributed trace in Jaeger
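
One way to emit log lines in this shape is a custom formatter on Python's standard `logging` module. This is a minimal sketch: `JsonFormatter` and `make_logger` are illustrative names, the service name is hard-coded for brevity, and only a subset of the fields above is included.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": "url-shortener-api",  # assumed constant for this sketch
            "message": record.getMessage(),
            # Fields passed via `extra=` become attributes on the record.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(entry)


def make_logger(name: str, stream) -> logging.Logger:
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logger
```

A request handler would then call something like `log.error("Database connection timeout", extra={"trace_id": trace_id})`, so every line carries the ID needed to find the matching trace.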

15.7 Observability in System Design Interviews

Mentioning observability proactively signals production experience. Cover these in every design:

What to measure: Golden signals / RED metrics for each service
SLOs: "99.9% redirect success, p99 < 100ms"
Health checks: /health endpoint for load balancer
Tracing: propagate trace ID across services for debugging
Alerting: alert on error rate and latency SLO breaches
Dashboards: per-service Grafana boards for on-call engineers

Module 16 · Case Study

Design a URL Shortener (bit.ly / TinyURL)

The canonical system design interview question. We'll walk through the complete 7-step framework — requirements, scale, API, HLD, deep dives, trade-offs, and LLD — producing an interview-ready design.

16.1 Step 1 — Clarify Requirements

Functional Requirements

• Given a long URL → return a short unique URL
• Given short URL → 301 redirect to original
• Optional custom alias (e.g., /my-link)
• Optional expiration date
• Analytics: click count per URL (Should Have)

Non-Functional Requirements

• 100M new URLs/month, 5-year retention
• Read:Write ratio = 100:1
• Redirect latency p99 < 100ms
• 99.99% availability for redirects
• Short codes: as short as possible (base62)

16.2 Step 2 — Back-of-the-Envelope Estimation

Writes/sec  = 100M / (30 × 24 × 3600)  ≈ 40/sec
Reads/sec   = 40 × 100                  ≈ 4,000/sec
Storage     = 100M × 12 × 5 × 500 bytes ≈ 3 TB (raw)
QPS peak    = 4,000 × 3 (burst factor) ≈ 12,000 reads/sec peak

→ Reads dominate → Redis cache + CDN essential
→ Writes modest → single DB shard OK initially
→ 3 TB → plan sharding at ~10 TB
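
The same estimate as a small Python function, useful for re-running the numbers when an interviewer changes an assumption. The function and parameter names are illustrative; the inputs are the ones stated in the requirements.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, as in the estimate above


def estimate(new_urls_per_month: int, read_write_ratio: int,
             retention_years: int, bytes_per_record: int, burst_factor: int) -> dict:
    """Back-of-the-envelope throughput and storage figures."""
    writes_per_s = new_urls_per_month / SECONDS_PER_MONTH
    reads_per_s = writes_per_s * read_write_ratio
    total_records = new_urls_per_month * 12 * retention_years
    return {
        "writes_per_s": writes_per_s,
        "reads_per_s": reads_per_s,
        "peak_reads_per_s": reads_per_s * burst_factor,
        "storage_tb": total_records * bytes_per_record / 1e12,
    }


result = estimate(100_000_000, read_write_ratio=100, retention_years=5,
                  bytes_per_record=500, burst_factor=3)
```

The exact values (about 38.6 writes/sec, 3,858 reads/sec, 11,574 peak reads/sec, 3.0 TB) sit just under the rounded figures above; in an interview, rounding up to 40, 4,000, and 12,000 is the safer habit.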

16.3 Step 3 — API Design

POST   /api/v1/urls          → { long_url, custom_alias?, ttl? }  → 201 { short_url }
GET    /api/v1/urls/{code}   → metadata + click stats              → 200
DELETE /api/v1/urls/{code}   → deactivate                          → 204
GET    /{code}               → 301 redirect (separate from /api/)

16.4 Step 4 — High-Level Architecture

URL Shortener — Complete HLD

16.5 Step 5 — Deep Dive: Short Code Generation

Two approaches — discuss both in interviews:

Counter + Base62

DB auto-increment ID → encode to base62 (a-z, A-Z, 0-9). 7 chars = 62⁷ ≈ 3.5 trillion URLs. Simple, no collisions. Requires centralized counter (single DB or range allocation per server).

MD5/Hash + Collision Retry

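Hash the long URL (e.g., MD5), base62-encode the digest, and take the first 7 characters. Truncation makes collisions possible, so check whether the code already exists and retry with a salt if it does. No centralized counter is needed, but the collision rate rises as the keyspace fills.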

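# Base62 encode (Python)
CHARS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def encode(num, width=7):
    # Repeatedly divide by 62, prepending the digit for each remainder
    result = ""
    while num > 0:
        num, remainder = divmod(num, 62)
        result = CHARS[remainder] + result
    # Left-pad with 'a' (the zero digit) so every code has the same length
    return result.rjust(width, "a")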
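
The hash-based alternative can be sketched as follows. This is one possible reading of "retry with salt", not a reference implementation: the `store` dict stands in for the database, the salt format is an assumption, and in production the existence check and insert would need to be atomic (for example, a unique constraint on the code column).

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def base62(num: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    digits = ""
    while num > 0:
        num, rem = divmod(num, 62)
        digits = ALPHABET[rem] + digits
    return digits or ALPHABET[0]


def short_code(long_url: str, store: dict, length: int = 7, max_attempts: int = 5) -> str:
    """Derive a code from MD5(long_url); on collision, retry with a numeric salt."""
    for attempt in range(max_attempts):
        salted = long_url if attempt == 0 else f"{long_url}#{attempt}"
        digest = hashlib.md5(salted.encode("utf-8")).digest()
        code = base62(int.from_bytes(digest, "big"))[:length].rjust(length, ALPHABET[0])
        existing = store.get(code)
        if existing is None:
            store[code] = long_url  # free code: claim it
            return code
        if existing == long_url:
            return code             # same URL shortened again: reuse the code
    raise RuntimeError("could not find a free short code")
```

A side effect of hashing is that the same long URL always maps to the same code, which deduplicates storage but also means two users cannot get separate codes for one URL without extra salting.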

16.6 Step 6–7 — Trade-offs & LLD Touchpoints

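• 301 vs 302 redirect: 301 is permanent, so browsers and CDNs cache it (fewer origin hits, but repeat clicks bypass the server and go uncounted, and the destination is hard to change); 302 is temporary, so every click reaches the server, which keeps click analytics accurate and lets the destination change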
• Cache-aside for reads; write-invalidate on URL update
• Async analytics via Kafka — don't block redirect for click logging
• LLD classes: UrlService, UrlRepository, Base62Encoder, CacheClient
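
A minimal sketch of the cache-aside read path with invalidate-on-write, using plain dicts as stand-ins for Redis and the database. `UrlService` matches the class name listed above, but its methods and the `db_reads` counter are assumptions made for illustration.

```python
from typing import Optional


class UrlService:
    """Cache-aside reads with invalidate-on-write, using dicts as stand-ins."""

    def __init__(self, repository: dict, cache: dict):
        self.repository = repository  # stand-in for the database
        self.cache = cache            # stand-in for Redis
        self.db_reads = 0             # counts cache misses, for demonstration

    def resolve(self, code: str) -> Optional[str]:
        url = self.cache.get(code)
        if url is not None:
            return url                      # cache hit
        self.db_reads += 1
        url = self.repository.get(code)     # cache miss: fall back to the database
        if url is not None:
            self.cache[code] = url          # populate for later reads
        return url

    def update(self, code: str, long_url: str) -> None:
        self.repository[code] = long_url    # write to the source of truth first
        self.cache.pop(code, None)          # then invalidate; the next read repopulates
```

Invalidating rather than updating the cache on write keeps the write path simple and avoids caching a value that is never read again.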

Module 17 · Case Study

Design a Chat System (WhatsApp / Messenger)

Real-time messaging at scale requires WebSockets, presence tracking, message ordering, and offline delivery. This case study covers one-on-one and group chat for 500M DAU.

17.1 Requirements & Scale

• FRs: 1:1 chat, group chat (max 500), media sharing, read receipts, online presence
• NFRs: 500M DAU, message delivery < 500ms, 99.9% availability, offline message sync
• Scale: 500M DAU × 40 msgs/day = 20B msgs/day ≈ 230K msgs/sec

17.2 Architecture — WebSockets & Chat Servers

Chat System HLD

Key design: Clients maintain persistent WebSocket connections to chat servers. Redis stores user_id → server_id mapping so messages route to the correct server. Cassandra stores message history (write-heavy, time-series friendly).

17.3 Message Flow & Offline Delivery

User A sends message → Chat Server 1 → persist to Cassandra → publish to Kafka
Lookup User B in Redis → connected to Chat Server 2 → push via WebSocket
If User B offline → store in inbox table → send push notification via APNs/FCM → deliver queued messages when B reconnects
Group chat: fan-out message to all member connections (or fan-out on read for large groups)
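
The 1:1 routing decision in this flow can be sketched in memory. `ChatRouter` and its fields are hypothetical: dicts stand in for the Redis session registry and the inbox table, and persistence to Cassandra and Kafka is left out to keep the focus on the online/offline branch.

```python
from collections import defaultdict


class ChatRouter:
    """Route a message to the recipient's chat server, or queue it if offline."""

    def __init__(self):
        self.sessions = {}                  # user_id -> server_id (stand-in for Redis)
        self.delivered = defaultdict(list)  # server_id -> messages pushed over WebSocket
        self.inbox = defaultdict(list)      # user_id -> messages awaiting reconnect
        self.push_notifications = []        # user_ids we would notify via APNs/FCM

    def connect(self, user_id: str, server_id: str) -> list:
        """Register the session and return any messages queued while offline."""
        self.sessions[user_id] = server_id
        return self.inbox.pop(user_id, [])

    def disconnect(self, user_id: str) -> None:
        self.sessions.pop(user_id, None)

    def send(self, sender: str, recipient: str, text: str) -> str:
        message = {"from": sender, "to": recipient, "text": text}
        server_id = self.sessions.get(recipient)
        if server_id is not None:
            self.delivered[server_id].append(message)  # recipient online: push now
            return "delivered"
        self.inbox[recipient].append(message)          # recipient offline: queue
        self.push_notifications.append(recipient)
        return "queued"
```

On reconnect, the list returned by `connect()` is what the client would sync before receiving live traffic.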

17.4 Trade-offs

• WebSocket vs long polling: WebSocket for real-time; fallback to long polling for restrictive networks
• Small groups (up to the 500-member cap assumed here): fan-out on write. If the cap were raised to 1000+ members: fan-out on read
• Message ordering: per-conversation sequence numbers; causal ordering for group chats
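• Cross-server delivery: a WebSocket connection stays pinned to one chat server, so a message for a user on another server must be routed there, either via the Redis user_id → server_id lookup or via Redis pub/sub between chat servers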

Module 18 · Case Study

Design a News Feed (Twitter / Instagram)

The news feed is the core of every social platform. The central design decision: pre-compute feeds on write (fan-out on write) or assemble on read (fan-out on read). This module walks through both and when to hybridize.

18.1 Requirements & Scale

• FRs: Post tweets/photos, follow users, view personalized home feed, like/comment
• NFRs: 300M DAU, feed load < 500ms, 500 follows max per user (simplified)
• Scale: 300M DAU, 200M posts/day, avg user follows 200 people, read:write ≈ 100:1

18.2 Fan-out on Write vs Fan-out on Read

Fan-out Strategies Compared

18.3 Feed Architecture

The post service writes to the posts table. A fan-out worker pushes the new post ID into each follower's Redis feed cache (a sorted set scored by timestamp). The feed service reads the top N post IDs from the cache, then hydrates the full post content from the posts store.

// Redis feed cache per user (sorted set — score = timestamp)
ZADD feed:user-123  1620000000  "post-456"
ZADD feed:user-123  1620003600  "post-789"

// Get home feed: top 20 most recent (ZREVRANGE is deprecated since Redis 6.2; ZRANGE feed:user-123 0 19 REV is the equivalent)
ZREVRANGE feed:user-123 0 19

// Celebrity threshold: if follower_count > 1M → skip fan-out, pull on read
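
The hybrid strategy can be sketched in memory as follows. `FeedService` and its fields are illustrative stand-ins for the Redis feed cache and the posts store; the threshold is a constructor parameter so the behavior can be tried with small numbers.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # follower count above which fan-out on write is skipped


class FeedService:
    def __init__(self, threshold: int = CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)    # author -> follower ids
        self.following = defaultdict(set)    # user -> authors followed
        self.posts = defaultdict(list)       # author -> [(timestamp, post_id)]
        self.feed_cache = defaultdict(list)  # user -> [(timestamp, post_id)] pushed on write

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) > self.threshold

    def publish(self, author: str, post_id: str, timestamp: int) -> None:
        self.posts[author].append((timestamp, post_id))
        if not self.is_celebrity(author):
            for follower in self.followers[author]:  # fan-out on write
                self.feed_cache[follower].append((timestamp, post_id))

    def home_feed(self, user: str, limit: int = 20) -> list:
        # A set removes duplicates if an author crossed the threshold after earlier fan-outs.
        entries = set(self.feed_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):            # fan-out on read
                entries.update(self.posts[author])
        return [post_id for _, post_id in sorted(entries, reverse=True)[:limit]]
```

The read path pays a merge cost only for the few celebrity accounts a user follows, while ordinary posts are already waiting in the cache.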

18.4 Key Trade-offs

• Storage: pre-computed feeds use more storage (post ID × followers) but enable fast reads
• Ranking: production feeds use ML ranking — start with chronological, mention ranking as extension
• Media: store images/videos in object storage (S3) and serve them through a CDN (CloudFront); the feed cache stores only metadata + URLs

Module 19

Interview Framework & Best Practices

Knowledge alone doesn't pass interviews — execution does. This module synthesizes everything into a battle-tested framework for the 45–60 minute system design interview, with communication tactics, time management, and common pitfalls.

19.1 The 45-Minute Timeline

Phase	Time	Activity
Clarify	5 min	Requirements, scale, constraints — ask 5–8 questions
Estimate	5 min	Back-of-envelope: QPS, storage, bandwidth
API + HLD	15 min	Draw architecture — boxes, arrows, data flows
Deep Dive	15 min	Interviewer-directed: DB, cache, bottlenecks
Wrap-up	5 min	Trade-offs, extensions, what you'd do with more time

19.2 Communication Tactics That Win

• Think aloud: narrate your reasoning — silence makes interviewers nervous
• State assumptions: "I'll assume 100M DAU unless you say otherwise"
• Propose, don't dictate: "I'd lean toward Redis here — does that align with your constraints?"
• Name trade-offs: every decision gets a "because X, trade-off is Y, mitigated by Z"
• Check in: "Should I go deeper on the database layer or move to caching?"
• Draw while talking: diagrams on whiteboard/Excalidraw beat pure verbal descriptions

19.3 Common Mistakes to Avoid

❌ Jumping to microservices without justification

❌ Skipping requirements clarification

❌ Over-engineering (Kafka for 10 users/sec)

❌ Ignoring single points of failure

❌ No numbers — "a lot of users" without QPS

❌ Silent for 10 minutes drawing

19.4 What Interviewers Evaluate

Problem Solving

Structured approach, handles ambiguity

Technical Depth

Knows how components work, not just names

Trade-off Analysis

Articulates why A over B given constraints

Communication

Clear, collaborative, receptive to hints

Module 20

Capstone Review & Next Steps

You've completed all 20 modules. This capstone ties the full curriculum together — a mental map of everything you've learned, a self-assessment checklist, and a roadmap for continued mastery.

20.1 The Complete Mental Map

System Design Knowledge Map

20.2 Self-Assessment Checklist

Can you confidently explain each of these without notes?

☐ CAP theorem and when to choose CP vs AP

☐ Cache-aside pattern and invalidation

☐ Horizontal vs vertical scaling

☐ Leader-follower DB replication

☐ Consistent hashing for sharding

☐ OAuth 2.0 flow and JWT structure

☐ Fan-out on write vs read for feeds

☐ WebSocket chat architecture

☐ SLI / SLO / error budgets

☐ Strategy and Observer design patterns

20.3 Continued Learning Roadmap

• Practice: Design 2 systems/week on Excalidraw — Uber, Netflix, Dropbox, Rate Limiter
• Read: "Designing Data-Intensive Applications" by Martin Kleppmann (the bible)
• Watch: System Design Interview channels — mock interviews with narration
• Build: Implement a URL shortener or chat app — theory becomes intuition through code
• Mock interviews: Pramp, interviewing.io, or peer practice with this course's frameworks

🏝️ Congratulations!

You've completed the System Design Masterclass — all 20 modules, 200+ quiz questions, and 3 full case studies. You have the foundation of a senior engineer. Now go build, practice, and ace those interviews.