System design starts with functional requirements: what features to build. Non-functional requirements define how the system behaves: response times, uptime guarantees, and user capacity. These constraints fundamentally shape architectural decisions.
Scale drives complexity
A chat app for 100 users needs basic request-response patterns. The same app for 100 million users requires caching layers, message queues, and distributed databases. Twitter's timeline demonstrates this clearly: simple database queries work for thousands of users, but millions require pre-computed feeds cached in Redis.
The big four constraints
Most systems optimize for four key non-functional requirements: availability (uptime), scalability (handling growth), latency (response speed), and consistency (data accuracy). Each constraint involves trade-offs that influence every architectural choice.
Availability: Staying Online
Availability measures system uptime as a percentage. 99.9% availability allows 8.77 hours of downtime per year, while 99.99% permits just 52.6 minutes. That order-of-magnitude gap demands very different architectures.
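Those budgets follow directly from the arithmetic; a quick sketch of the calculation:

```python
# Downtime budget for a given availability target, using a 365.25-day year.
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours

def downtime_per_year(availability_pct: float) -> float:
    """Return the allowed downtime in hours per year."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

print(f"99.9%:  {downtime_per_year(99.9):.2f} hours")          # ~8.77 hours
print(f"99.99%: {downtime_per_year(99.99) * 60:.1f} minutes")  # ~52.6 minutes
```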
Redundancy prevents single points of failure
Production systems use multiple servers across different data centers. When one fails, traffic automatically routes to healthy instances. Netflix runs identical services across three AWS availability zones: if an entire zone fails, users keep streaming without interruption.
Health checks enable fast recovery
Load balancers continuously probe backend servers every few seconds. Failed health checks trigger automatic traffic rerouting within 10-30 seconds. Slack's load balancers detect server failures and redistribute WebSocket connections before users notice the disconnection.
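A health-check endpoint can be very small; here is a minimal sketch using only Python's standard library, where the /healthz path and port are illustrative conventions rather than any particular product's API:

```python
# Minimal HTTP health-check endpoint. A load balancer would probe
# GET /healthz every few seconds and pull this instance out of rotation
# after a few consecutive failures.
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Real probes usually check a dependency (database connection, disk space) rather than returning a constant, so a failing instance reports itself unhealthy before it starts corrupting requests.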
Geographic distribution reduces blast radius
Critical systems replicate across continents. When AWS's us-east-1 region experiences outages, services like Route 53 continue operating from other regions. This isolation prevents single-region failures from causing global outages.
Scalability: Handling Growth
Scalability means maintaining performance as load increases. The key insight: scaling out with many small servers typically beats scaling up with larger machines. This approach enables elastic growth and reduces failure impact.
Stateless services scale horizontally
Each server handles requests independently without storing user sessions or temporary data. This design allows adding or removing servers based on demand. Uber's ride-matching service runs hundreds of identical stateless containers that AWS auto-scaling launches during peak hours.
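A minimal sketch of what "stateless" means in practice, assuming session state lives in a shared Redis store; the redis-py client usage is real, but the host name and key scheme are illustrative:

```python
# Stateless request handling: session data lives in a shared Redis store,
# so any server instance can handle any request and instances can be
# added or removed freely.
import json
import redis

store = redis.Redis(host="sessions.internal", port=6379)

def handle_request(session_id: str, payload: dict) -> dict:
    # Load the session from the shared store instead of local memory.
    raw = store.get(f"session:{session_id}")
    session = json.loads(raw) if raw else {}

    session["last_payload"] = payload  # update session state

    # Write it back with a one-hour TTL; the next request for this
    # session may land on a completely different instance.
    store.setex(f"session:{session_id}", 3600, json.dumps(session))
    return {"status": "ok", "session": session}
```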
Caching reduces database pressure
Frequently accessed data stays in fast memory instead of slow disk storage. Reddit caches post rankings and user profiles in Redis, handling millions of requests without overwhelming their PostgreSQL database. Cache hit rates above 90% are common for read-heavy workloads.
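The pattern behind numbers like these is usually cache-aside: check the cache first and fall back to the database only on a miss. A sketch, with the actual PostgreSQL query left as a placeholder:

```python
# Cache-aside: serve hot reads from Redis; only cache misses touch the
# database. At a 90%+ hit rate, the database sees a tenth of the reads.
import json
import redis

cache = redis.Redis(host="cache.internal", port=6379)

def get_post_ranking(subreddit: str) -> list:
    key = f"ranking:{subreddit}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database work

    ranking = fetch_ranking_from_db(subreddit)  # cache miss: query the database
    cache.setex(key, 60, json.dumps(ranking))   # short TTL keeps rankings fresh
    return ranking

def fetch_ranking_from_db(subreddit: str) -> list:
    ...  # placeholder for the real PostgreSQL query
```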
Async processing smooths traffic spikes
Message queues decouple request handling from background processing. When users upload videos to YouTube, the upload completes immediately while transcoding happens asynchronously. This prevents encoding delays from affecting user experience during traffic surges.
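An in-process sketch of that decoupling, with Python's queue module standing in for a real broker such as SQS or RabbitMQ and the transcoding step left as a placeholder:

```python
# Decoupled upload handling: the request path only enqueues work; a
# separate worker drains the queue at its own pace, so traffic spikes
# lengthen the queue instead of slowing down uploads.
import queue
import threading

transcode_queue = queue.Queue()

def handle_upload(video_id: str) -> dict:
    transcode_queue.put(video_id)  # enqueue and return immediately
    return {"status": "uploaded", "video_id": video_id}

def worker():
    while True:
        video_id = transcode_queue.get()
        transcode(video_id)  # slow work happens off the request path
        transcode_queue.task_done()

def transcode(video_id: str):
    ...  # placeholder for the actual encoding job

threading.Thread(target=worker, daemon=True).start()
```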
Latency: Speed Matters
Latency measures response time: the delay between request and response. Users expect sub-second responses for web pages and millisecond responses for API calls. High latency kills user engagement and business metrics.
Geographic proximity reduces network delay
Content Delivery Networks place static assets near users. Cloudflare serves images and CSS files from 270+ global locations, reducing average latency from 200ms to 20ms for distant users. Physical distance between servers and users creates unavoidable delays: light travels at a fixed speed.
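A back-of-the-envelope check on that physical floor, assuming signals in fiber propagate at roughly 200,000 km/s (about two-thirds of light's vacuum speed):

```python
# Lower bound on round-trip time imposed by physics: no amount of
# engineering beats the propagation delay over the fiber path.
FIBER_SPEED_KM_PER_S = 200_000  # approximate speed of light in fiber

def min_rtt_ms(distance_km: float) -> float:
    """Minimum round-trip time in milliseconds over the given distance."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000

print(f"New York -> London (~5,600 km):  {min_rtt_ms(5_600):.0f} ms")   # ~56 ms
print(f"New York -> Sydney (~16,000 km): {min_rtt_ms(16_000):.0f} ms")  # ~160 ms
```

Real routes are longer than great-circle distance and add queuing and processing delays, so these numbers are optimistic floors; the only way under them is moving the content closer.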
Database optimization eliminates bottlenecks
Slow queries kill performance. Instagram optimized their photo feed queries from 1 second to 50ms by adding database indexes and reducing joins. Query profiling reveals which operations consume the most time, enabling targeted optimizations.
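Profiling comes first, since optimizing a query nobody runs wastes effort. A self-contained sketch using SQLite's EXPLAIN QUERY PLAN to show an index turning a full table scan into an index search; the schema is illustrative, and PostgreSQL's EXPLAIN ANALYZE plays the same role in production:

```python
# Show how an index changes the query plan, using SQLite's built-in
# EXPLAIN QUERY PLAN.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, posted_at TEXT)"
)

query = "SELECT * FROM photos WHERE user_id = ? ORDER BY posted_at DESC"

# Without an index, the plan reports a full table scan.
for row in db.execute("EXPLAIN QUERY PLAN " + query, (42,)):
    print("before:", row[-1])  # e.g. "SCAN photos"

db.execute("CREATE INDEX idx_photos_user_time ON photos (user_id, posted_at)")

# With a composite index, the scan becomes an index search.
for row in db.execute("EXPLAIN QUERY PLAN " + query, (42,)):
    print("after:", row[-1])   # e.g. "SEARCH photos USING INDEX idx_photos_user_time"
```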
Edge computing processes data locally
Instead of sending all requests to central servers, edge nodes handle computations near users. Fastly runs customer code at edge locations, processing API requests without round-trips to origin servers. For globally distributed applications, this approach can cut latency by an order of magnitude.
Consistency: Data Accuracy
Consistency ensures all system replicas show identical data. In distributed systems, this becomes complex because network partitions and server failures can create temporary disagreements between nodes.
Strong consistency sacrifices availability
Bank account balances require immediate consistency across all servers: users can't withdraw the same $100 twice. Traditional banks use synchronous replication, where all database replicas must confirm writes before transactions complete. This guarantees accuracy but sacrifices availability: a single slow or unreachable replica can stall every write.
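A toy sketch of that rule: the write commits only after every replica acknowledges it, so one unreachable replica blocks the whole operation. The replica objects and their write/commit/rollback methods are hypothetical stand-ins for a real replication protocol:

```python
# Toy synchronous replication: a write commits only if every replica
# acknowledges it. Accuracy is guaranteed, but one slow or unreachable
# replica stalls all writes; that is the availability cost.
class ReplicationError(Exception):
    pass

def synchronous_write(replicas, account_id: str, new_balance_cents: int):
    pending = []
    for replica in replicas:
        try:
            # Hypothetical API: returns a pending transaction handle.
            pending.append(replica.write(account_id, new_balance_cents))
        except ConnectionError:
            # Roll back: no replica may expose the uncommitted balance.
            for txn in pending:
                txn.rollback()
            raise ReplicationError("write aborted: a replica was unreachable")
    for txn in pending:
        txn.commit()  # every replica confirmed; the write is now durable
```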
Eventual consistency enables scale
Social media systems tolerate temporary inconsistencies for better performance. When you post on Facebook, different users might see the post at slightly different times as updates propagate across data centers. The system eventually converges to a consistent state within seconds.
Choosing the right consistency model
Most systems mix consistency levels by operation type. Amazon's shopping cart uses eventual consistency (items might briefly disappear then reappear), but payment processing requires strong consistency (charges must be exact). This hybrid approach optimizes for both user experience and data correctness.
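One concrete mechanism for mixing levels per operation is DynamoDB's ConsistentRead flag, which selects eventual or strong consistency per request. A sketch using boto3, with table and key names invented for illustration:

```python
# Per-operation consistency in DynamoDB: reads are eventually consistent
# by default; a request opts into strong consistency with ConsistentRead.
import boto3

dynamodb = boto3.resource("dynamodb")
cart = dynamodb.Table("shopping-carts")       # hypothetical table name
payments = dynamodb.Table("payment-records")  # hypothetical table name

# Cart view: an eventually consistent read is fine; a briefly stale
# item list is an acceptable trade for cheaper, faster reads.
cart_state = cart.get_item(Key={"user_id": "u123"})

# Payment check: a strongly consistent read reflects all committed
# writes before returning, so the charge is exact.
payment = payments.get_item(Key={"user_id": "u123"}, ConsistentRead=True)
```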
Design Patterns for Non-Functional Requirements
Different architectural patterns address specific non-functional requirements. Understanding these relationships helps choose appropriate solutions for system constraints.
| Requirement | Core Strategy | Key Technologies | Production Example |
| --- | --- | --- | --- |
| Availability | Eliminate single points of failure | Load balancers, multi-region deployment, health checks | Netflix: 3 availability zones per service |
| Scalability | Scale horizontally with stateless services | Auto-scaling, caching, message queues | Uber: Container orchestration for demand spikes |
| Latency | Minimize network hops and processing time | CDNs, database indexes, edge computing | Instagram: 50ms photo feed queries |
| Consistency | Choose appropriate consistency model | Synchronous replication, consensus algorithms | Amazon: Strong for payments, eventual for recommendations |
Trade-offs shape architecture
These requirements often conflict. Strong consistency reduces availability during network partitions. Low latency increases complexity and cost. Successful systems prioritize requirements based on business needs rather than trying to optimize everything equally.