System Design Interview Guide 2026 — Framework, Examples & FAANG Questions
Master the 2026 system design interview: complete framework (requirements, estimation, API, high-level design, deep dive, bottlenecks), worked examples (URL shortener, Twitter feed), and junior vs staff vs principal expectations.
Last updated: May 2026
TL;DR
System design interviews are the round that decides senior+ offers — and the round most engineers under-prepare for. Unlike LeetCode, there’s no single “right answer” — you’re being assessed on whether you can scope an ambiguous problem, drive a structured conversation, and make defensible engineering tradeoffs in 45–60 minutes. The framework that wins: requirements (5 min) → back-of-envelope estimation (5 min) → API design (5 min) → high-level architecture (10 min) → deep dive on 2–3 components (15 min) → bottlenecks and scaling (5 min) → wrap-up (5 min). FAANG, fintech, and infrastructure companies all ask system design — at senior, staff, and principal levels the depth requirements scale dramatically. OphyAI Coding Interview is the Premium screenshot/diagram-analysis tool built specifically for live coding and system design rounds; OphyAI Interview Copilot supports behavioral and verbal portions; OphyAI Interview Coach drills SD questions with structured AI feedback.
What System Design Interviews Test
System design interviews test five distinct skills:
- Scoping — Can you turn an ambiguous prompt (“design Twitter”) into specific requirements?
- Estimation — Can you reason about scale (QPS, storage, bandwidth) with rough numbers?
- Architecture — Can you propose a high-level design that meets the requirements?
- Tradeoffs — Can you weigh consistency vs. availability, cost vs. latency, simplicity vs. flexibility?
- Communication — Can you drive a 45–60 minute conversation, respond to pushback, and stay structured?
The bar scales with seniority. Junior engineers are tested on basic distributed systems literacy. Senior engineers are expected to drive the entire conversation, anticipate failure modes, and connect design choices to operational reality. Staff and principal engineers are evaluated on the kinds of tradeoff conversations they’d have with cross-org stakeholders — capacity planning, multi-region failover, data residency, organizational ownership.
The Universal Framework (45–60 Min)
| Phase | Time | Output |
|---|---|---|
| 1. Clarify requirements | 5 min | Functional + non-functional requirements, scope cuts |
| 2. Back-of-envelope estimation | 3–5 min | QPS, storage, bandwidth, key throughput numbers |
| 3. API design | 3–5 min | 3–5 endpoints with request/response shapes |
| 4. High-level architecture | 10–15 min | Boxes and arrows: clients, services, data stores, caches, queues |
| 5. Deep dive | 15–20 min | 2–3 components explored in depth (chosen by interviewer or you) |
| 6. Bottlenecks and scaling | 5 min | Hot spots, sharding strategy, replication, caching |
| 7. Wrap-up | 2–3 min | What you’d build next, what you’d monitor, what could go wrong |
The exact time split varies — some interviewers prefer more time on deep dive, others on tradeoffs. Driving the structure visibly (“I’ll spend 5 minutes on requirements, then estimation…”) signals senior-level conversation management.
Phase 1: Requirements
The single most common failure: jumping into architecture before scoping. Spend the first 5 minutes asking questions and writing requirements on the whiteboard or shared doc.
Functional requirements
What does the system do? Start with the 2–3 core user actions.
For “design Twitter”: users post tweets (text-only, 280 chars), users follow other users, users see a home timeline of recent tweets from accounts they follow.
For “design a URL shortener”: users submit a long URL and receive a short URL, users click a short URL and get redirected to the long URL.
For “design YouTube”: users upload videos, users watch videos with adaptive bitrate streaming, users discover videos via search and recommendations.
Non-functional requirements
- Scale — DAU, MAU, QPS estimates, storage needs
- Latency — Read latency target (often p99 < 200ms), write latency target
- Availability — SLA target (99.9%, 99.99%, 99.999%)
- Consistency — Strict (financial transactions), eventual (social feed), causal (chat)
- Durability — Can data loss be tolerated? For how long?
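To make the availability targets concrete, it can help to translate each SLA percentage into a downtime budget. The sketch below is only illustrative arithmetic; the function name `downtime_budget_minutes` and the 365-day year are assumptions, not part of any standard.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Return the downtime (in minutes) allowed by an availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% allows about {minutes:.1f} minutes of downtime per year")
```

Each extra nine cuts the budget by a factor of ten: roughly 8.8 hours per year at 99.9%, 53 minutes at 99.99%, and 5 minutes at 99.999%.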
Out of scope
Explicitly call out what you’re not designing:
- Authentication (assume OAuth)
- Spam/abuse detection (defer)
- Analytics pipeline (separate system)
- Internationalization
This is a senior-level signal — making explicit cuts demonstrates judgment.
Phase 2: Back-of-Envelope Estimation
You don’t need exact numbers. You need order of magnitude.
Numbers every engineer should memorize
- 100K seconds per day (actually 86,400, but 100K for quick math)
- 1 KB = average tweet, 1 MB = average JPEG, 5 MB = average MP3
- 1 GB = 10⁹ bytes; 1 TB = 10¹² bytes
- L1 cache: 1 ns; L2 cache: 4 ns; main memory: 100 ns; SSD: 100 μs; HDD: 10 ms; cross-datacenter: 100 ms
- 1 server can handle ~1K QPS for simple reads; ~100 QPS for complex queries
Example: Twitter estimation
- 500M MAU, 200M DAU
- Each user reads 50 tweets/day, posts 1 tweet/day
- Read QPS: 200M × 50 / 100K = 100K QPS reads
- Write QPS: 200M × 1 / 100K = 2K QPS writes
- Read:write ratio = 50:1 (read-heavy → cache aggressively)
- Tweet size: 280 chars + metadata = ~1 KB
- Storage: 200M writes/day × 1 KB = 200 GB/day = ~70 TB/year
- Bandwidth: reads at 100K QPS × 1 KB = 100 MB/s = ~10 TB/day
Example: URL shortener estimation
- 100M new URLs/month, 10B redirects/month
- Read QPS: 10B / 30 / 100K = ~3.3K QPS
- Write QPS: 100M / 30 / 100K = ~33 QPS
- Read:write ratio = 100:1
- Storage per URL: ~500 bytes (long URL + short code + metadata)
- 100M URLs/month × 500 bytes = 50 GB/month = 600 GB/year
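The two estimates above follow the same arithmetic, so a small helper can reproduce them. This is a minimal sketch: the `Estimate` dataclass and the `estimate` function are hypothetical names, and the inputs are the figures assumed in the examples, including the rounded 100K seconds per day.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 100_000  # rounded up from 86,400 for quick mental math


@dataclass
class Estimate:
    read_qps: float
    write_qps: float
    storage_bytes_per_day: float
    read_bandwidth_bytes_per_sec: float


def estimate(reads_per_day: float, writes_per_day: float, bytes_per_item: float) -> Estimate:
    """Turn daily volumes into QPS, daily storage growth, and read bandwidth."""
    read_qps = reads_per_day / SECONDS_PER_DAY
    write_qps = writes_per_day / SECONDS_PER_DAY
    return Estimate(
        read_qps=read_qps,
        write_qps=write_qps,
        storage_bytes_per_day=writes_per_day * bytes_per_item,
        read_bandwidth_bytes_per_sec=read_qps * bytes_per_item,
    )


# Twitter: 200M DAU, 50 reads and 1 write per user per day, ~1 KB per tweet.
twitter = estimate(reads_per_day=200e6 * 50, writes_per_day=200e6 * 1, bytes_per_item=1_000)

# URL shortener: 10B redirects and 100M new URLs per 30-day month, ~500 bytes each.
shortener = estimate(reads_per_day=10e9 / 30, writes_per_day=100e6 / 30, bytes_per_item=500)

if __name__ == "__main__":
    print(f"Twitter reads: {twitter.read_qps:,.0f} QPS, writes: {twitter.write_qps:,.0f} QPS")
    print(f"Shortener reads: {shortener.read_qps:,.0f} QPS, writes: {shortener.write_qps:,.0f} QPS")
```

In an interview you would do this on the whiteboard, not in code; the point is that every number traces back to one stated assumption.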
Phase 3: API Design
Sketch 3–5 endpoints with request/response shapes. Be specific about authentication, pagination, and error handling.
Example: Twitter API
POST /api/v1/tweets
Body: { content: string, media_ids?: string[] }
Response: { tweet_id: string, created_at: timestamp }
GET /api/v1/users/{user_id}/timeline?limit=20&cursor=<opaque>
Response: { tweets: Tweet[], next_cursor: string }
POST /api/v1/users/{user_id}/follow
Response: { success: bool }
Example: URL shortener API
POST /api/v1/shorten
Body: { long_url: string, custom_alias?: string, ttl_days?: int }
Response: { short_url: string, short_code: string }
GET /{short_code}
Response: 302 redirect to long_url
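Use cursor-based pagination rather than offset: with offsets, the database has to scan and discard every skipped row, and pages shift when new items arrive mid-scroll. Accept idempotency keys on writes so client retries do not create duplicates, and return rate-limit headers in responses.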
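One way to picture an opaque cursor is as an encoded copy of the last item's sort key. The sketch below assumes tweets are ordered by `(created_at, tweet_id)` and uses an in-memory list as a stand-in for the database; `encode_cursor`, `decode_cursor`, and `get_timeline_page` are hypothetical helper names, not part of any real API.

```python
import base64
import json
from typing import Any, Dict, List, Optional, Tuple

Tweet = Dict[str, Any]


def encode_cursor(created_at: int, tweet_id: str) -> str:
    """Pack the last item's sort key into an opaque, URL-safe string."""
    raw = json.dumps({"created_at": created_at, "tweet_id": tweet_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[int, str]:
    data = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return data["created_at"], data["tweet_id"]


def get_timeline_page(tweets: List[Tweet], limit: int = 20,
                      cursor: Optional[str] = None) -> Dict[str, Any]:
    """Return the newest `limit` tweets strictly older than the cursor position."""
    ordered = sorted(tweets, key=lambda t: (t["created_at"], t["tweet_id"]), reverse=True)
    if cursor is not None:
        last_key = decode_cursor(cursor)
        # A real store would express this as a keyset predicate on an index
        # instead of filtering in memory.
        ordered = [t for t in ordered if (t["created_at"], t["tweet_id"]) < last_key]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(page[-1]["created_at"], page[-1]["tweet_id"]) if has_more else None
    return {"tweets": page, "next_cursor": next_cursor}
```

Because the cursor names a position in the ordering and not a row count, a tweet posted between two page requests does not shift or duplicate the items on later pages.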
Phase 4: High-Level Architecture
Draw boxes and arrows. Common components:
- Clients — Mobile apps, web app, third-party integrations
- Edge — CDN (CloudFront, Fastly), API gateway, load balancer
- Application services — Stateless services behind load balancers
- Data stores — Relational (PostgreSQL, MySQL), document (MongoDB), key-value (Redis, DynamoDB), wide-column (Cassandra), graph (Neo4j), search (Elasticsearch)
- Caches — Redis or Memcached for hot reads
- Queues / message brokers — Kafka, SQS, Pulsar for async processing
- Batch / stream processing — Spark, Flink, Beam for analytics
- Storage — S3 / blob storage for media, GFS / HDFS for large files
Example: URL shortener high-level
- Client → API Gateway → Shortener Service
- Shortener Service writes to PostgreSQL (URL mappings) and Redis cache (hot short codes)
- Client → CDN → Redirector Service for redirect path (looks up Redis, falls back to PostgreSQL)
- Async pipeline: redirects → Kafka → analytics processor → data warehouse
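The redirect path described above is a cache-aside read. A minimal sketch, assuming plain dictionaries as stand-ins for Redis and PostgreSQL (the `Redirector` class and its `db_reads` counter are hypothetical, added only to make the behavior observable):

```python
from typing import Dict, Optional


class Redirector:
    """Cache-aside lookup: try the cache first, fall back to the primary store."""

    def __init__(self, database: Dict[str, str]) -> None:
        self.database = database          # stand-in for PostgreSQL
        self.cache: Dict[str, str] = {}   # stand-in for Redis
        self.db_reads = 0                 # counts fallbacks to the primary store

    def resolve(self, short_code: str) -> Optional[str]:
        long_url = self.cache.get(short_code)
        if long_url is not None:
            return long_url               # cache hit: no database work
        self.db_reads += 1
        long_url = self.database.get(short_code)
        if long_url is not None:
            self.cache[short_code] = long_url   # populate the cache for later reads
        return long_url                   # None means the caller should return 404
```

Note that this sketch does not cache misses, so repeated lookups of unknown codes always reach the database. Negative caching or a Bloom filter would be natural follow-ups to raise in the interview.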
Example: Twitter home timeline (the “feed” problem)
Two classic approaches:
Pull (compute at read time):
- When user requests timeline, query all accounts they follow, merge tweets, return top 20
- Cheap writes, expensive reads — collapses for users following many accounts
Push / fanout (compute at write time):
- When user A tweets, write the tweet to every follower’s timeline cache
- Fast reads, expensive writes — collapses for celebrities with millions of followers (“celebrity problem”)
Hybrid (real world):
- Fanout for normal users
- Pull-on-read for celebrities (mix their tweets into the timeline at read time)
- Cache aggressively for hot users
Drawing this on the whiteboard, explaining the tradeoffs, and proposing the hybrid is a senior-level signal.
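The hybrid approach can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the `TimelineService` class, its method names, and the follower threshold are hypothetical, and dictionaries stand in for the tweet store and the per-user timeline cache.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

CELEBRITY_THRESHOLD = 10_000  # assumed cutoff; tune per workload

Entry = Tuple[int, str]  # (timestamp, tweet_id)


class TimelineService:
    def __init__(self, celebrity_threshold: int = CELEBRITY_THRESHOLD) -> None:
        self.celebrity_threshold = celebrity_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)   # author -> followers
        self.following: Dict[str, Set[str]] = defaultdict(set)   # user -> authors
        self.tweets_by_author: Dict[str, List[Entry]] = defaultdict(list)
        self.timeline_cache: Dict[str, List[Entry]] = defaultdict(list)

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) > self.celebrity_threshold

    def post(self, author: str, timestamp: int, tweet_id: str) -> None:
        entry = (timestamp, tweet_id)
        self.tweets_by_author[author].append(entry)
        if not self.is_celebrity(author):
            # Push path: fan out to every follower's cached timeline at write time.
            for follower in self.followers[author]:
                self.timeline_cache[follower].append(entry)

    def home_timeline(self, user: str, limit: int = 20) -> List[str]:
        # Start from the precomputed timeline, then pull celebrity tweets at read time.
        entries = set(self.timeline_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.update(self.tweets_by_author[author])
        newest_first = sorted(entries, reverse=True)
        return [tweet_id for _, tweet_id in newest_first[:limit]]
```

Using a set for the merge avoids duplicates when an author crosses the threshold after some of their tweets were already fanned out.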
Phase 5: Deep Dive
The interviewer will usually pick 2–3 components for deep dive. Be ready to go deep on:
- Database schema and indexing — exact tables, columns, indexes, query patterns
- Sharding / partitioning strategy — sharding key, consistent hashing, hotspot handling
- Caching strategy — what to cache, TTLs, invalidation, cache-aside vs. write-through
- Consistency model — strong, eventual, causal; how you handle conflicts
- Replication — leader-follower, multi-leader, leaderless (Dynamo-style)
- Async processing — queue choice, idempotency, exactly-once vs. at-least-once
Example deep dive: URL shortener short-code generation
Three options, each with tradeoffs:
Option A: Counter + base62 encoding
- Maintain a monotonically increasing counter
- Encode counter in base62 (a-z, A-Z, 0-9) → 6 characters supports 62⁶ ≈ 56B URLs
- Pros: short codes, sequential, predictable
- Cons: counter is a single point of contention; predictable codes are scrapable
Option B: Random short code with collision check
- Generate random 6-character string, check if it exists, retry on collision
- Pros: unpredictable, no contention
- Cons: extra read per write, collision rate grows as space fills
Option C: Pre-allocated batches per service
- Counter service hands out batches of 1000 IDs to each shortener service
- Pros: no per-request contention; sequential within batch
- Cons: lost IDs on service restart; coordination overhead
At 100M new URLs/month (~33 QPS writes): Option C is overkill, Option A makes the counter a single point of contention and produces scrapable codes, and Option B is the right pick. Walk through this reasoning explicitly.
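Options A and B can be sketched side by side. This is a minimal illustration: the function names are hypothetical, and the `existing` set stands in for a unique-key check that a real system would push down to the database.

```python
import secrets
import string
from typing import Set

# 62 symbols in the order the options above list them: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def base62_encode(n: int) -> str:
    """Option A: encode a non-negative counter value in base62."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def random_code(length: int = 6) -> str:
    """Option B: draw an unpredictable code from a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def allocate_code(existing: Set[str], length: int = 6, max_attempts: int = 5) -> str:
    """Option B with collision check: retry until an unused code is found."""
    for _ in range(max_attempts):
        code = random_code(length)
        if code not in existing:
            existing.add(code)
            return code
    raise RuntimeError("could not find a free short code")
```

In a real deployment the check and the insert should be one atomic step (for example an insert that fails on a duplicate primary key), otherwise two writers can claim the same code between the check and the write.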
Example deep dive: Twitter timeline cache
- Cache structure: Redis sorted set per user, scored by timestamp
- TTL: 7 days, with active users refreshing
- Cache size: 200M users × 20 cached tweets × 1 KB = ~4 TB
- Eviction: LRU; cold users fall back to DB-backed pull
- Invalidation: tweet edit or delete invalidates relevant entries
- Sharding: shard by user_id; each Redis cluster node holds ~1M users
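Sharding the cache by user_id is commonly paired with consistent hashing so that adding a node moves only a fraction of the keys. The sketch below is one simple way to build such a ring; the `HashRing` class, the node names, and the choice of 100 virtual nodes per server are assumptions for illustration.

```python
import bisect
import hashlib
from typing import List, Sequence, Tuple


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: Sequence[str], vnodes: int = 100) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            # Virtual nodes spread each server around the ring to even out load.
            for i in range(vnodes):
                ring.append((_hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing([f"redis-{i}" for i in range(4)])
    print(ring.node_for("user:42"))
```

When a fifth node joins, every key that changes owner moves to the new node, and the rest stay where they were. With naive `hash(key) % N` sharding, most keys would move.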
Phase 6: Bottlenecks and Scaling
After the high-level + deep dive, the interviewer wants to see you anticipate failure modes.
Common bottlenecks:
- Hot keys / hot shards — Celebrities, viral content, popular short codes
- Cache stampede — When a hot key expires, thousands of requests hit the DB
- Thundering herd — Many clients retrying on failure simultaneously
- Cross-region latency — Strict consistency across geographies is expensive
- Storage growth — When does your data outgrow your shards?
- Write amplification — Fanout writes scale with follower count
Mitigations to know cold:
- Cache warming and tiered caching
- Probabilistic early expiration (refresh-ahead caching)
- Exponential backoff with jitter for retries
- Bloom filters for negative lookups
- Read replicas for read-heavy workloads
- Sharding strategy review (consistent hashing, dynamic resharding)
- Write batching and async pipelines
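Of the mitigations above, exponential backoff with jitter is the easiest to show in code. This sketch uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling; the helper names and default values are assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, ceiling)


def retry(operation: Callable[[], T], max_attempts: int = 5, base: float = 0.1,
          cap: float = 10.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation`, sleeping a jittered, growing delay between failures."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Real code should catch only errors known to be retryable.
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, base, cap))
    raise RuntimeError("max_attempts must be at least 1")
```

The randomness is what prevents a thundering herd: without it, clients that failed together would all retry at the same instants.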
Phase 7: Wrap-Up
End by summarizing what you’d build first, what’s deferred, and what could go wrong.
“To recap: I designed a URL shortener with a stateless shortener service backed by PostgreSQL with a Redis cache, generating random 6-character short codes with collision check on insert. Reads go through a CDN to a redirector service that hits Redis first and PostgreSQL on cache miss. We can scale to 100K QPS reads horizontally. What I’d build next: analytics pipeline for click tracking, abuse detection for malicious URLs, and a TTL system for ephemeral links. The biggest risk is hot short codes — viral links — which I’d handle with aggressive CDN caching.”
This kind of wrap-up signals seniority. Most candidates skip it; doing it well leaves a strong final impression.
Junior vs. Senior vs. Staff vs. Principal Expectations
| Level | Expectation |
|---|---|
| Junior (L3/E3/SDE1) | Often not asked SD; if asked, basic CRUD design + simple distributed systems literacy |
| Mid (L4/E4/SDE2) | Can drive the framework with prompting; identifies obvious tradeoffs |
| Senior (L5/E5/SDE3) | Drives the framework independently, anticipates failure modes, can deep-dive any component |
| Staff (L6/E6/Staff) | Connects design to org structure, capacity planning, multi-region; pushes back on the prompt itself |
| Principal (L7/E7/Principal) | Discusses architectural strategy at a product/portfolio level; design quality + influence + judgment |
The same prompt (“design Twitter”) will be graded differently at each level. Senior+ engineers are expected to identify the celebrity problem within minutes. Staff engineers are expected to discuss multi-region replication and data residency. Principal engineers are expected to discuss how the design choice affects 5 downstream teams.
For coding-specific prep, see our technical interview prep for software engineers guide.
Top Companies That Ask System Design
FAANG / MAANG
- Google — L4+ candidates get SD rounds; L6+ candidates get 2 SD rounds (one product, one infrastructure)
- Meta — All E4+ candidates get SD; E5+ candidates get 2 SD rounds. Heavy emphasis on tradeoffs.
- Amazon — All SDE2+ candidates get SD plus Leadership Principles behavioral. Operational excellence emphasized.
- Apple — SD scaled by team; infrastructure teams emphasize systems depth, app teams emphasize iOS/macOS architecture.
- Netflix — Senior+ only (Netflix doesn’t hire junior). Heavy emphasis on streaming, recommendations, and operational scale.
Fintech and infrastructure
- Stripe, Block (Square), Plaid, Wise — Distributed systems + financial consistency emphasis
- Databricks, Snowflake, Confluent, MongoDB — Heavy data systems / storage depth
- Cloudflare, Fastly, Akamai — Edge, networking, CDN depth
- Datadog, Splunk, Elastic — Observability and time-series data systems
- HashiCorp, GitHub, GitLab — Developer infrastructure systems
High-growth tech
- Uber, DoorDash, Airbnb, Instacart — Marketplace and matching systems
- TikTok, Snapchat — Real-time media and feed systems
- Notion, Linear, Figma, Vercel — Collaboration and developer tools systems
For deep-dive on visual / coding analysis during these technical rounds, see our coding interview product page — the OphyAI Premium tool that analyzes screenshots and diagrams in real time during live coding and system design rounds.
Worked Example: Design a URL Shortener (Full Walkthrough)
Requirements (5 min)
- Functional: shorten long URL → short URL; redirect short URL → long URL
- Non-functional: 100M URLs/month, 10B redirects/month, p99 redirect latency < 100ms, 99.99% availability
- Out of scope: user accounts, custom domains, analytics dashboard (mention as future work)
Estimation (5 min)
- Read QPS: 10B / month / 30 days / 100K sec ≈ 3.3K QPS
- Write QPS: 100M / 30 / 100K ≈ 33 QPS
- Read:write ratio 100:1 → read-heavy, cache aggressively
- Storage: 100M URLs × 500 B = 50 GB/month, ~600 GB/year
- Bandwidth: 3.3K QPS × 500 B = ~1.7 MB/s reads
API (3 min)
- POST /api/v1/shorten → returns short_code
- GET /{short_code} → 302 redirect
High-level architecture (10 min)
- Client → CDN → API Gateway → Shortener Service / Redirector Service
- Shortener writes to PostgreSQL (primary store) and Redis (hot cache)
- Redirector reads Redis first, PostgreSQL on miss
- Async pipeline: redirects → Kafka → analytics (out of scope but mentioned)
Deep dive (15 min)
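- Short code generation: random 6-char base62 (62⁶ ≈ 56B), collision check on insert; the chance that a new code collides is roughly stored URLs / 62⁶, about 2% after one year (1.2B URLs) and above 50% near 30B URLs, so plan to lengthen codes well before then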
- Database: PostgreSQL with short_code primary key, single-table design; sharded by short_code hash at ~1B URLs
- Caching: Redis cluster, cache short_code → long_url with 24-hour TTL; cache warming for top 1% of links
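The collision behavior of random 6-character codes can be checked with simple arithmetic. The sketch below assumes codes are drawn uniformly from the 62⁶ space and that retries are independent; the function names are hypothetical.

```python
CODE_SPACE = 62 ** 6  # 56,800,235,584 possible 6-character base62 codes


def collision_probability(stored_urls: int, code_space: int = CODE_SPACE) -> float:
    """Chance that one freshly drawn random code is already taken."""
    return min(1.0, stored_urls / code_space)


def expected_attempts(stored_urls: int, code_space: int = CODE_SPACE) -> float:
    """Average number of draws needed to find a free code (geometric distribution)."""
    p = collision_probability(stored_urls, code_space)
    if p >= 1.0:
        raise ValueError("code space is exhausted")
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    for years in (1, 5, 25):
        stored = years * 12 * 100_000_000  # 100M new URLs per month
        print(f"after {years} years: collision chance "
              f"{collision_probability(stored):.1%}, "
              f"expected attempts {expected_attempts(stored):.2f}")
```

At 100M new URLs per month the per-insert collision chance is about 2% after one year and about 11% after five, so the retry loop stays cheap for a long time but not forever.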
Bottlenecks and scaling (5 min)
- Hot short codes (viral links) → aggressive CDN caching at edge
- Cache stampede on TTL expiration → use probabilistic early expiration
- Storage growth → re-shard PostgreSQL when single shard exceeds 500 GB
- Multi-region: read replicas in each region, async replication; accept eventual consistency for short code lookups
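Probabilistic early expiration, mentioned above for cache stampedes, can be sketched as a single decision function. This follows the commonly described approach (often called XFetch) in which each reader independently decides to refresh slightly before the TTL; the function name, parameter names, and default `beta` are assumptions.

```python
import math
import random
import time
from typing import Optional


def should_refresh_early(expiry: float, recompute_seconds: float,
                         now: Optional[float] = None, beta: float = 1.0,
                         rng: Optional[random.Random] = None) -> bool:
    """Decide whether this reader should recompute the cached value now.

    expiry: absolute time (seconds) at which the cached entry expires.
    recompute_seconds: how long the last recomputation took.
    beta: values above 1.0 refresh earlier, below 1.0 refresh later.
    """
    current = time.time() if now is None else now
    source = rng if rng is not None else random
    # 1.0 - random() lies in (0, 1], so the logarithm is always defined.
    u = 1.0 - source.random()
    # -log(u) is an exponentially distributed head start, scaled by recompute cost.
    return current - recompute_seconds * beta * math.log(u) >= expiry
```

Far from expiry almost no reader refreshes; close to expiry a growing fraction does, so the recomputation is usually finished by one reader before the entry actually expires and the rest never see a miss.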
Wrap-up (2 min)
“Built a 2-tier system: write path through Shortener Service to PostgreSQL + Redis. Read path through CDN → Redirector → Redis → PostgreSQL. Scales to 100K QPS reads with horizontal scaling and aggressive caching. Next I’d add: analytics pipeline, abuse detection, multi-region active-active for global availability. The biggest risk is hot keys — viral short codes — handled via CDN.”
Worked Example: Design Twitter Home Timeline
Requirements (5 min)
- Functional: users post tweets (text, 280 chars), follow users, see timeline of recent tweets from followed accounts
- Non-functional: 500M MAU, 200M DAU, p99 timeline latency < 200ms, 99.99% availability, eventually consistent ordering acceptable
- Out of scope: media, DMs, search, ads
Estimation (3 min)
- Read QPS: 200M × 50 reads/day / 100K sec = 100K QPS
- Write QPS: 200M × 1 tweet/day / 100K sec = 2K QPS
- Read:write = 50:1
- Storage: 2K writes/sec × 1 KB × 100K sec = 200 GB/day; 70 TB/year
- Follower graph: avg 200 followers/user, 100B total edges
API (3 min)
- POST /api/v1/tweets
- GET /api/v1/users/{user_id}/timeline (cursor-paginated)
- POST /api/v1/users/{user_id}/follow
High-level architecture (10 min)
- Tweets stored in Cassandra (high write throughput, sharded by tweet_id)
- Follower graph in dedicated graph store or wide-column store (sharded by user_id)
- Timeline cache in Redis (sorted set per user, top 1000 tweet IDs)
- Hybrid fanout:
- Normal user posts: async fanout to all followers’ timeline caches
- Celebrity (>10K followers) posts: pull at read time
- At read: merge cached fanout timeline + recent celebrity tweets from accounts user follows
Deep dive: hybrid fanout (15 min)
- Normal fanout: post → message queue → fanout worker → for each follower, push tweet_id to follower’s Redis sorted set
- Celebrity threshold: 10K followers (tunable)
- At read time: read user’s cached timeline + query “celebrity tweets from accounts I follow in last 24h” → merge by timestamp
- Cache size: 200M users × 1K tweet IDs × 50 bytes = ~10 TB; sharded across Redis cluster
Bottlenecks (5 min)
- Celebrity fanout (avoided via hybrid)
- Hot cache shards for popular users — replicate aggressively
- Tweet write amplification (1 tweet → 200 follower-cache writes) — mitigated with celebrity threshold
- Multi-region: tweet writes go to home region, async replicated globally
Wrap-up (2 min)
“Hybrid fanout balances read latency for normal users with write cost for celebrities. Reads hit Redis with celebrity pull-on-read at fan-in. Scales to 100K QPS reads with horizontal Redis + Cassandra. Next: media handling, search index, recommendation re-ranker. Biggest risk: Redis cache hot shards — mitigated with consistent hashing and per-shard replication.”
Preparation Timeline: 6–10 Weeks
| Week | Focus |
|---|---|
| 1–2 | Read Designing Data-Intensive Applications (Kleppmann), watch Hussein Nasser’s distributed systems videos |
| 3–4 | Drill the framework on 5 classic problems: URL shortener, Twitter, YouTube, Uber, WhatsApp |
| 5–6 | Build cheat sheets for caching strategies, sharding, replication, consensus algorithms |
| 7–8 | 10+ mock interviews. Time yourself at 45 min. Record and review for filler words and pacing |
| 9–10 | Company-specific prep. Practice deep-dives on systems aligned with target company (e.g., feed systems for Meta, distributed transactions for Stripe) |
For interactive SD practice with screenshot/diagram analysis, use the OphyAI Coding Interview tool — built specifically for live coding and system design rounds. Drill behavioral and verbal scaffolding with the OphyAI Interview Coach. For your live interview, the OphyAI Interview Copilot provides discreet real-time prompts.
Common Mistakes
Jumping into architecture without requirements. The biggest red flag. Spend 5 full minutes on requirements before drawing anything.
Over-engineering. Proposing Kafka + Cassandra + Spark for a problem that needs PostgreSQL + Redis signals you’re cargo-culting buzzwords.
Under-engineering. Proposing a single PostgreSQL instance with no cache or replicas for a 100K QPS read system signals you don’t understand scale.
Hand-waving on deep dive. “We’d use a database” is not a deep dive. Name the database, specify the schema, justify the choice.
Ignoring failure modes. Strong candidates anticipate what breaks at 10x scale, during a regional outage, when the cache cluster fails.
Talking too much, listening too little. If the interviewer is silent for 10 minutes, you’re not running an interview, you’re lecturing. Pause for input.
Not using the whiteboard. Verbal-only system design interviews are harder for both sides. Draw boxes and arrows even if remote (most interviewers share an Excalidraw or Miro link).
FAQ
How long is a typical system design interview?
45–60 minutes is standard at FAANG and most large tech companies. Some companies (Stripe, Databricks) extend to 75 or 90 minutes for senior+ levels. Smaller startups sometimes compress to 30 minutes.
Do junior engineers get system design interviews?
Usually not for L3/E3 (new grad). Some companies introduce a “lite” SD round at L4. By L5 / senior, SD is standard and weighted heavily.
What’s the difference between a system design interview and a coding interview?
Coding interviews test your ability to write correct code under time pressure (LeetCode-style). System design interviews test your ability to architect a distributed system under ambiguity. Both are required for senior+ tech hiring; system design typically carries more weight at higher levels.
Can I use an AI tool during a system design interview?
For virtual SD interviews, AI tools like the OphyAI Coding Interview (Premium screenshot/diagram analysis) and OphyAI Interview Copilot can provide structured prompts and analyze diagrams in real time. Different companies have different policies — some prohibit external tools in technical rounds. See our interview copilot ethics discussion for context.
What books should I read for system design prep?
The canonical list: “Designing Data-Intensive Applications” (Kleppmann), “System Design Interview” Vol 1 & 2 (Alex Xu), “Database Internals” (Petrov). For deep dives: papers like the Google File System paper, MapReduce, Dynamo, Spanner, and the Raft consensus paper.
How important is naming specific technologies (Kafka, Cassandra, etc.)?
Important but not decisive. Naming a specific technology shows you have hands-on awareness. But you should always be ready to justify the choice and discuss alternatives. “I’d use Kafka for the async pipeline because [reasons]; alternatively, SQS or Pulsar would work, with these tradeoffs…”
Do I need to know specific algorithms (consistent hashing, Raft, Paxos)?
At senior+ levels, yes — you should be able to explain consistent hashing, leader election, quorum-based reads, and at least one consensus algorithm at a working level. You don’t need to implement them, but you need to discuss them coherently when they come up.
How do I improve at system design if I haven’t built large-scale systems myself?
Read system design case studies from real companies (engineering blogs from Uber, Discord, Cloudflare, Stripe, Pinterest). Each is a worked example of how a real team scaled. Then drill the framework on 20+ practice problems until the structure is automatic.
Prepare for System Design with OphyAI
System design interviews reward structured thinking and rehearsed pattern recognition. The candidates who land staff+ offers have walked through 30+ practice problems and can drive any prompt through the universal framework without thinking about it.
- Practice live coding and system design with screenshot/diagram analysis in the Premium OphyAI Coding Interview — built specifically for live SD and coding rounds
- Drill SD scoping and deep dives with structured AI feedback in OphyAI Interview Coach
- Get real-time support on live virtual SD interviews with the OphyAI Interview Copilot
- Track every interview loop with the Application Tracker
Related guides
- Technical interview prep for software engineers
- Google interview guide
- Meta interview guide
- Amazon interview guide
- Stripe interview guide
- Databricks interview guide
- Behavioral interview questions and answers
For Premium screenshot and diagram analysis during live coding and system design rounds, see OphyAI Coding Interview. For more, see our Best AI Interview Copilot 2026 comparison.