
System Design Fundamentals: A Beginner's Guide to Building Scalable Systems
TL;DR: System design is the practice of deciding how different parts of a software system fit together — where data is stored, how requests flow, and how the system handles growth. This guide covers the essential building blocks: load balancers, caches, databases, message queues, CDNs, and API design. Whether you are preparing for interviews or building real systems, these fundamentals apply everywhere.
What is System Design?
System design is structured problem-solving for software. You break a problem into parts, assign responsibilities to each part, and define how they communicate. Every application you use — Google Search, Netflix, WhatsApp — is a system made up of these fundamental components working together.
Google processes over 8.5 billion searches per day. Netflix serves 260+ million subscribers across 190 countries. These systems did not start complex — they evolved from simple architectures by applying the same principles covered in this guide.
The Building Blocks of System Design
Every scalable system is built from a combination of these core components:
| Component | What It Does | Real-World Analogy |
|---|---|---|
| Load Balancer | Distributes traffic across multiple servers | A receptionist directing patients to available doctors |
| Cache | Stores frequently accessed data in fast memory | A sticky note on your desk with common phone numbers |
| Database | Persistent storage for application data | A filing cabinet with organized records |
| CDN | Serves static files from servers close to users | Local branches of a national library |
| Message Queue | Decouples services by buffering messages | A mailbox that holds letters until the recipient reads them |
| API Gateway | Single entry point for all client requests | A front desk that routes visitors to the right department |
| Reverse Proxy | Sits in front of servers, handling SSL, compression | A security guard who also carries your bags |
Load Balancers
A load balancer distributes incoming requests across multiple servers so no single server gets overwhelmed.
Why it matters: Without load balancing, a single server handles all traffic. If it crashes, your entire application goes down. With load balancing, traffic is spread across multiple servers — if one fails, the others continue serving users.
Load Balancing Algorithms
- Round Robin — Sends requests to servers in order (1, 2, 3, 1, 2, 3...)
- Least Connections — Sends to the server with the fewest active connections
- IP Hash — Routes the same user to the same server (useful for session persistence)
- Weighted Round Robin — Assigns more traffic to more powerful servers
```nginx
# Nginx load balancer configuration example
upstream backend_servers {
    least_conn;
    server backend1.example.com:3000 weight=3;
    server backend2.example.com:3000 weight=2;
    server backend3.example.com:3000 weight=1;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

Trade-off: Load balancers add a network hop (slight latency increase) but dramatically improve reliability and throughput.
Caching
Caching stores frequently accessed data in a fast storage layer (usually in-memory) to avoid expensive database queries or API calls.
A widely cited Amazon finding holds that every 100ms of added latency cost about 1% in sales. Caching is one of the most effective ways to reduce response times.
Caching Strategies
| Strategy | How It Works | Best For |
|---|---|---|
| Cache-Aside | App checks cache first; on miss, fetches from DB and updates cache | General-purpose, most common |
| Write-Through | Every write goes to cache AND database simultaneously | Strong consistency needed |
| Write-Behind | Writes go to cache first, then async to database | High write throughput |
| Read-Through | Cache automatically fetches from DB on miss | Simplified application code |
```typescript
// Cache-Aside pattern example with Redis
import Redis from "ioredis";

const redis = new Redis();

async function getUserProfile(userId: string) {
  // Step 1: Check cache
  const cached = await redis.get(`user:${userId}`);
  if (cached) {
    return JSON.parse(cached);
  }

  // Step 2: Cache miss — fetch from database
  const user = await db.users.findById(userId);

  // Step 3: Store in cache with 1-hour expiry
  await redis.set(`user:${userId}`, JSON.stringify(user), "EX", 3600);
  return user;
}
```

Trade-off: Caching improves read performance dramatically but introduces cache invalidation complexity — one of the hardest problems in computer science.
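The invalidation problem mentioned in the trade-off can be made concrete without any infrastructure. Below is a minimal sketch of cache-aside with delete-on-write invalidation, using an in-memory `Map` as a stand-in for Redis and a `fakeDb` map as a stand-in for a real database (both names are illustrative, not from any library):

```typescript
// Minimal cache-aside with delete-on-write invalidation.
// An in-memory Map stands in for Redis; fakeDb stands in for a real database.
const cache = new Map<string, string>();
const fakeDb = new Map<string, { id: string; name: string }>();

function getUser(id: string) {
  const hit = cache.get(`user:${id}`);
  if (hit) return JSON.parse(hit); // cache hit: no DB access
  const user = fakeDb.get(id);     // cache miss: read from the "DB"
  if (user) cache.set(`user:${id}`, JSON.stringify(user));
  return user;
}

function updateUser(id: string, name: string) {
  fakeDb.set(id, { id, name }); // write to the "DB" first
  cache.delete(`user:${id}`);   // invalidate: the next read refills the cache
}
```

Note the design choice: the write path deletes the cached entry rather than updating it in place. Updating in place risks two concurrent writers leaving stale data in the cache; deleting forces the next reader to fetch the current value.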
Databases: SQL vs NoSQL
Choosing the right database is one of the most impactful system design decisions.
| Feature | SQL (PostgreSQL, MySQL) | NoSQL (MongoDB, DynamoDB) |
|---|---|---|
| Data Model | Tables with rows and columns | Documents, key-value, graphs, or wide-column |
| Schema | Fixed schema, enforced | Flexible schema, can vary per document |
| Relationships | Strong (JOINs, foreign keys) | Weak (denormalized, embedded documents) |
| Scaling | Vertical (scale up the server) | Horizontal (add more servers) |
| ACID Compliance | Full ACID support | Varies (eventual consistency common) |
| Best For | Complex queries, transactions, relationships | High throughput, flexible data, rapid iteration |
| Examples | Banking, e-commerce, ERP systems | Social media feeds, IoT data, content management |
The practical answer: Start with SQL (PostgreSQL is an excellent default). Move to NoSQL when you have a specific need — like storing millions of documents with varying structures, or when you need horizontal scaling beyond what a single database server can handle. If you go with MongoDB, make sure to secure your queries against NoSQL injection attacks.
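The NoSQL injection warning deserves a concrete illustration. In MongoDB, query operators are ordinary objects (for example `{ $ne: null }`), so passing unvalidated request input straight into a query filter lets an attacker smuggle operators in. A hedged sketch of a scalar guard follows; the function name is ours, not a MongoDB API:

```typescript
// Reject anything that is not a plain string before it reaches a Mongo filter.
// Without this check, a request body like { "email": { "$ne": null } } would
// turn db.users.findOne({ email: input }) into "find any user at all".
function assertScalarString(input: unknown): string {
  if (typeof input !== "string") {
    throw new Error("Invalid input: expected a plain string");
  }
  return input;
}

// Hypothetical usage inside a login handler:
// const email = assertScalarString(req.body.email);
// const user = await db.users.findOne({ email });
```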
Content Delivery Networks (CDNs)
A CDN is a network of servers distributed across the globe that caches and serves static content (images, CSS, JavaScript) from the server closest to the user.
Without CDN: A user in Tokyo requests an image from a server in New York — ~200ms round trip. With CDN: The same image is served from a CDN edge server in Tokyo — ~20ms round trip.
Major CDN providers include Cloudflare (serves ~20% of all web traffic), AWS CloudFront, and Vercel Edge Network (built into Next.js).
Trade-off: CDNs dramatically improve performance for static content but add complexity for dynamic content that changes frequently.
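How long a CDN edge keeps a file is driven by the `Cache-Control` header the origin sends, so the usual practice is to mark fingerprinted static assets as long-lived and immutable while dynamic responses opt out of edge caching entirely. A small sketch of that policy (the helper name is ours, not a CDN API):

```typescript
// Build a Cache-Control header value for a given asset class.
// Fingerprinted static files (e.g. app.3f2a1b.js) can be cached for a year
// because any content change produces a new URL; dynamic API responses
// should not be cached at the edge or in the browser at all.
function cacheControlFor(kind: "static" | "dynamic"): string {
  if (kind === "static") {
    return "public, max-age=31536000, immutable"; // 1 year, never revalidate
  }
  return "no-store"; // never cache
}
```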
Message Queues
Message queues decouple services by allowing them to communicate asynchronously. Instead of Service A calling Service B directly (and waiting for a response), Service A puts a message in a queue, and Service B processes it when ready.
```
Without Queue: User → API → Send Email → Wait... → Response (slow)

With Queue:    User → API → Queue Message → Response (fast)
                                 ↓
                    Email Worker → Send Email (async)
```
Popular message queues: RabbitMQ, Apache Kafka (handles trillions of messages per day at LinkedIn), Amazon SQS, Redis Pub/Sub.
```typescript
// Message queue example with BullMQ (Redis-based)
import { Queue, Worker } from "bullmq";

interface ContactFormData {
  email: string;
  name: string;
}

// Producer: Add job to queue
const emailQueue = new Queue("emails");

async function handleContactForm(data: ContactFormData) {
  // Save to database (fast)
  await db.contacts.create(data);

  // Queue email sending (non-blocking)
  await emailQueue.add("send-welcome", {
    to: data.email,
    name: data.name,
  });

  return { success: true }; // Respond immediately
}

// Consumer: Process jobs from queue
const worker = new Worker("emails", async (job) => {
  await sendEmail({
    to: job.data.to,
    subject: `Thanks, ${job.data.name}!`,
    template: "welcome",
  });
});
```

Trade-off: Message queues improve responsiveness and reliability but add infrastructure complexity and make debugging harder (messages can get lost or duplicated).
API Design
APIs define how different parts of your system (and external clients) communicate. The two dominant styles in 2026:
REST — Resource-based, uses HTTP methods (GET, POST, PUT, DELETE). Widely understood and well-tooled.
GraphQL — Query-based, clients request exactly the data they need. Reduces over-fetching.
```
// REST API endpoint
// GET /api/users/123
// Returns: { id: 123, name: "Ajay", email: "...", posts: [...] }

// GraphQL query — client chooses what it needs
// query {
//   user(id: 123) {
//     name
//     posts { title }
//   }
// }
```

Practical recommendation: Use REST for most applications. Consider GraphQL when you have complex, nested data requirements and multiple client types (web, mobile, third-party).
The CAP Theorem
The CAP theorem states that a distributed system can guarantee at most two of three properties:
- Consistency — Every read receives the most recent write
- Availability — Every request receives a response
- Partition Tolerance — The system continues working despite network failures
Since network partitions are unavoidable in distributed systems, the real choice is between CP (consistency + partition tolerance) and AP (availability + partition tolerance).
CP systems (e.g., MongoDB with majority write concern, HBase): Prioritize correct data over availability. If a partition happens, some requests may fail.
AP systems (e.g., Cassandra, DynamoDB): Prioritize availability over consistency. Every request gets a response, but data might be slightly stale.
Practical example: A banking system needs CP — you cannot show incorrect balances. A social media feed can use AP — showing a slightly stale feed for a few seconds is acceptable.
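One way to make the CP/AP trade-off concrete is quorum arithmetic, used by Cassandra and DynamoDB-style stores: with N replicas, writes acknowledged by W of them and reads contacting R of them, a read is guaranteed to see the latest write when R + W > N, because the read and write sets must overlap in at least one replica. A tiny sketch (the function name is ours):

```typescript
// With N replicas, W write acknowledgements and R read probes, strong
// consistency requires the read and write sets to overlap: R + W > N.
// Choosing smaller R and W trades consistency for availability and latency.
function isStronglyConsistent(n: number, r: number, w: number): boolean {
  return r + w > n;
}

// N=3 with quorum reads and writes (R=2, W=2): sets overlap, reads are fresh.
// N=3 with R=1, W=1: faster and more available, but a read may miss the
// latest write — the AP end of the spectrum.
```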
Horizontal vs Vertical Scaling
Vertical Scaling (Scale Up): Add more CPU, RAM, or storage to your existing server. Like upgrading from a sedan to a truck.
Horizontal Scaling (Scale Out): Add more servers. Like adding more sedans to a fleet.
| Aspect | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Complexity | Low — upgrade hardware | High — distributed system design |
| Cost | Expensive at high end | Cost-effective with commodity hardware |
| Limit | Hardware ceiling | Virtually unlimited |
| Downtime | Often requires restart | Zero downtime possible |
| Data Consistency | Easy (single server) | Complex (distributed state) |
| Best For | Small-medium apps, databases | Large-scale web apps, microservices |
Practical advice: Scale vertically first (it is much less complex). Switch to horizontal scaling when you hit the limits of a single machine or need high availability.
Putting It All Together
Here is how these components combine in a typical web application architecture:
```
User → CDN (static files)
     → Load Balancer
         → API Server 1 → Cache (Redis) → Database (Primary)
         → API Server 2 → Cache (Redis) → Database (Replica)
         → API Server 3 → Message Queue → Background Workers
```
- CDN serves static assets (images, CSS, JS)
- Load Balancer distributes API requests across servers
- API Servers handle business logic
- Cache stores hot data to reduce database load
- Database with read replicas for scalability
- Message Queue handles async tasks (emails, notifications)
FAQ
Do I need to know system design for job interviews?
Yes. According to interviewing.io, system design is the most weighted round in senior engineering interviews at FAANG companies. For junior roles, understanding the fundamentals (this guide) is enough. For senior roles, you need to design systems end-to-end.
Should I start with microservices or a monolith?
Start with a monolith. Microservices add significant operational complexity (deployment, monitoring, inter-service communication). Build a well-structured monolith first, then extract services as needed when you hit specific scaling or team-size bottlenecks.
How much traffic can a single server handle?
A well-optimized Node.js server can handle 10,000-50,000 requests per second for simple API endpoints. A PostgreSQL database can handle 10,000-20,000 queries per second depending on query complexity. These numbers cover the needs of most applications.
What is the best resource to learn system design?
Start with "Designing Data-Intensive Applications" by Martin Kleppmann — it is the gold standard. For interview prep, "System Design Interview" by Alex Xu is excellent. For free resources, the system design roadmap at roadmap.sh is comprehensive.