Non-functional Requirements in System Design Interviews

Functional requirements tell us which features we need to implement. Non-functional requirements, on the other hand, describe how the system should behave and the constraints under which it must operate: how well it works under specific conditions, such as a large number of users. Serving a small user base, say 100 users, calls for a much simpler design than serving a million. For instance, a Twitter timeline for 100 users could simply pull data from the database each time it is needed. At a million users, that approach quickly becomes a bottleneck, and the system instead needs to pre-fetch timeline data into a cache before a user requests it.

The most common non-functional requirements are availability, scalability, performance (latency and throughput), and consistency.


Availability

Definition: Availability refers to the degree to which a system is operational and accessible when needed. It's typically expressed as a percentage of uptime over the total time.

How to Achieve in Design:

  • Use load balancers to distribute network traffic evenly across servers.
  • Implement a health monitoring system to detect failures promptly, and set up automated processes for failover and recovery.
  • Implement redundant hardware and software components, which can include multiple servers in different geographic locations, also known as availability zones or regions.
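The failover idea above can be sketched in a few lines. This is a minimal illustration, not production code; the server names and classes are invented for the example.

```python
# Hypothetical sketch: a balancer that health-checks servers and fails
# over to a healthy replica when one goes down. Names are illustrative.
class Server:
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"

class FailoverBalancer:
    def __init__(self, servers):
        self.servers = servers

    def route(self, request):
        # Skip any server that fails its health check and use the next one.
        for server in self.servers:
            if server.healthy:
                return server.handle(request)
        raise RuntimeError("no healthy servers available")

servers = [Server("us-east-1a"), Server("us-east-1b")]  # two availability zones
lb = FailoverBalancer(servers)
servers[0].healthy = False           # simulate a zone failure
print(lb.route("GET /"))             # traffic fails over to us-east-1b
```

In a real deployment the health checks run continuously and the failover happens inside the load balancer or DNS layer, but the principle is the same: redundancy plus automated detection keeps the system reachable.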


Real-world Examples:

  • Amazon S3 achieves high availability by using redundant storage and automatic failover mechanisms.
  • Google Cloud Spanner uses replication and synchronous writes across multiple zones to ensure high availability.


Scalability

Definition: Scalability is a system's ability to handle increased load without a significant drop in performance.

How to Achieve in Design:

  • Utilize stateless servers, allowing for the addition of more servers as demand increases (also known as auto-scaling and horizontal scaling).
  • Optimize database performance and use caching to reduce load.
  • Implement message queues for asynchronous processing, helping to manage the processing of tasks in the background and bridge the gap in processing speed between services.
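The stateless-server point deserves a concrete picture. Because no per-user state lives on a worker, any worker can serve any request, so capacity grows by simply adding workers. The sketch below, with invented names, shows why interchangeability is what makes horizontal scaling work.

```python
from itertools import cycle

# Hypothetical sketch of horizontal scaling with stateless workers.
class StatelessWorker:
    def __init__(self, worker_id):
        self.worker_id = worker_id

    def handle(self, request):
        # No session state is kept here; anything needed arrives with the
        # request (e.g. a token), so every worker is interchangeable.
        return f"worker-{self.worker_id}:{request}"

def make_pool(n):
    """Scale out by creating n interchangeable workers in rotation."""
    return cycle([StatelessWorker(i) for i in range(n)])

pool = make_pool(3)                  # scale from one worker to three
handled = [next(pool).handle(f"req-{i}") for i in range(6)]
print(handled)                       # requests spread evenly across workers
```

If the workers held session state, adding a fourth worker would not help users pinned to the first three; statelessness is what lets auto-scaling add capacity linearly.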


Real-world Examples:

  • Twitter uses a message queue system called Kestrel to help handle high volumes of tweets, showcasing effective scalability.
  • Netflix uses a combination of caching, partitioning, and load balancing to handle the massive load of streaming requests.


Latency

Definition: Latency is the delay before a data transfer begins after a request has been made.

How to Achieve in Design:

  • Optimize database queries and employ efficient algorithms and data structures.
  • Use caching to store and quickly retrieve frequently accessed or recently accessed data.
  • Implement Content Delivery Networks (CDNs) to serve static content closer to users, reducing the delay caused by the physical distance between the server and the client.
  • Optimize network performance and employ edge computing where appropriate to reduce the round trip time of requests.
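The caching bullet above is the most common latency lever, so here is a minimal sketch of a TTL (time-to-live) cache in front of a slow data source. `slow_fetch` stands in for a database query; all names are invented for illustration.

```python
import time

# Hypothetical sketch of a small TTL cache that avoids repeated round
# trips to a slow backing store for frequently accessed keys.
class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key, loader):
        value, expiry = self.store.get(key, (None, 0.0))
        if time.monotonic() < expiry:
            return value                  # cache hit: no round trip
        value = loader(key)               # cache miss: go to the source
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

calls = []
def slow_fetch(key):
    calls.append(key)                     # count trips to the "database"
    return key.upper()

cache = TTLCache(ttl_seconds=60)
cache.get("user:42", slow_fetch)
cache.get("user:42", slow_fetch)          # second call served from cache
print(len(calls))                         # the source was hit only once
```

The TTL is the knob that trades latency against freshness: a longer TTL means fewer slow fetches but a longer window in which the cached value can be stale.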


Real-world Examples:

  • Cloudflare uses a global CDN to reduce latency for its users, delivering content faster by serving it from locations closer to the end-user.
  • Google Search uses a variety of techniques including caching, efficient data structures, and algorithms to provide low latency results.


Consistency

Definition: Consistency ensures that all nodes see the same data at the same time in a distributed system.

How to Achieve in Design:

  • Select the proper level of consistency. Depending on the system's needs, opt for a stronger or weaker consistency model: strong consistency guarantees that all nodes see the same data at the same time, while eventual consistency allows temporary inconsistencies between nodes.
  • Use database transactions or consensus algorithms in distributed systems, ensuring all nodes agree on the state of the system. However, this comes at a cost of higher complexity.
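One widely used way to tune the consistency level is read/write quorums, the mechanism behind the configurable consistency in systems like Cassandra and DynamoDB. With N replicas, choosing write quorum W and read quorum R such that W + R > N guarantees every read overlaps the latest write. The sketch below is a toy model with invented details, not a real replication protocol.

```python
# Hypothetical sketch of quorum reads/writes over N = 3 replicas.
N = 3
replicas = [{} for _ in range(N)]    # each dict is one replica's key store

def write(key, value, version, w):
    # A real system picks any w reachable replicas; we use the last w
    # here so the overlap with reads is easy to see.
    for replica in replicas[-w:]:
        replica[key] = (version, value)

def read(key, r):
    # Query r replicas and keep the value with the highest version number.
    answers = [replica.get(key, (0, "")) for replica in replicas[:r]]
    return max(answers)[1]

write("k", "v1", version=1, w=2)     # written to replicas 1 and 2
print(read("k", r=2))                # W + R = 4 > N = 3: read sees "v1"
print(read("k", r=1))                # W + R = 3 <= N: may miss the write
```

Raising W and R strengthens consistency at the cost of latency and availability; lowering them does the opposite, which is exactly the trade-off the bullet above describes.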


Real-world Examples:

  • Distributed databases like Apache Cassandra can be configured for strong or eventual consistency depending on the needs of the application. This flexibility enables applications to balance consistency needs with performance and availability considerations.
  • Amazon DynamoDB uses eventual consistency by default but also offers strong consistency options depending on application requirements.

Here’s a table summarizing what we have learned:

| Non-functional Requirement | Definition | Technologies to Achieve |
| --- | --- | --- |
| Availability | Operational and accessible when needed; % uptime over total time. | Load Balancers, Data Replication, Availability Zones, Monitoring |
| Scalability | Handle increased load without a performance drop. | Stateless Servers, Caching, Message Queues, Data Partitioning |
| Latency | Time taken to respond to requests. | DB Query Optimization, Algorithms, Caching, CDNs, Edge Computing |
| Throughput | Work or transactions handled in a given time frame. | DB Query Optimization, Algorithms, Caching, CDNs, Edge Computing |
| Consistency | All nodes see the same data at the same time. | Consistency Levels, Distributed DB Transactions, Consensus Algorithms |

System Design Components and Non-functional Requirements

Here’s another view, mapping non-functional requirements to commonly used system design components. We will dive deep into each of the non-functional requirements in the following articles.

| Non-functional Requirement | Load Balancing | Caching | Partitioning | Replication | Message Queue | Batch Processing |
| --- | --- | --- | --- | --- | --- | --- |
| Availability | X | X | X | X | X | |
| Scalability | X | X | X | X | X | X |
| Latency | X | X | X | | | |
| Throughput | | | X | | | X |
| Consistency (affected by) | | X | X | X | | |

For every row except the last, an 'X' means that the component helps achieve that non-functional requirement. The consistency row works differently: meeting a consistency requirement is really about picking the right consistency level and then choosing technologies that satisfy it, so an 'X' in that row means the component affects consistency and must be designed with it in mind.

System Design Components

Load Balancing

Load balancing distributes network or application traffic across a number of servers. This can improve availability by ensuring no single server becomes a bottleneck or point of failure, aids scalability by letting the system absorb increased traffic, and can reduce latency by keeping individual servers from being overwhelmed.

How it affects non-functional requirements:

  • Availability: Load balancing improves availability by distributing network traffic across multiple servers, eliminating single points of failure. If one server goes down, the load balancer redirects traffic to the remaining operational servers.
  • Scalability: By evenly distributing load, load balancers allow more requests to be served simultaneously, supporting system scaling.
  • Latency: Load balancing can reduce latency by ensuring that no individual server is overwhelmed with traffic, maintaining quick response times.
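The simplest distribution strategy, round-robin, makes these three effects concrete. The sketch below uses invented server names; real load balancers add health checks, weights, and connection counting on top of this core loop.

```python
# Hypothetical sketch of round-robin load balancing with removal of
# failed servers, so traffic keeps flowing to the remaining ones.
class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.next_index = 0

    def pick(self):
        server = self.servers[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.servers)
        return server

    def remove(self, server):
        # Called when a health check fails: take the server out of
        # rotation without interrupting service (availability).
        self.servers.remove(server)
        self.next_index %= len(self.servers)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
order = [lb.pick() for _ in range(6)]
print(order)                         # each server gets 2 of the 6 requests
lb.remove("app-2")                   # simulate a failed health check
print([lb.pick() for _ in range(2)])
```

Because each server receives an equal share of requests, no single machine saturates, which is where the latency and scalability benefits come from.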


Caching

Caching is the process of storing a copy of data in a temporary storage area (cache) so future requests for that data are served up faster. Caching contributes to scalability by reducing the load on the database or primary data source, thus allowing the system to serve more users. It can also improve latency by reducing the time it takes to fetch data. However, caching could pose challenges to maintaining consistency, especially in a distributed system, if not properly managed.

How it affects non-functional requirements:

  • Availability: Caching can indirectly enhance availability by reducing the load on the system, which can help prevent system overloads or crashes.
  • Scalability: By storing frequently accessed data and serving it quickly, caching reduces load on the primary data source, which supports system scaling.
  • Latency: By serving stored data much more quickly than the primary data source could, caching significantly reduces data retrieval times, thereby reducing latency.
  • Consistency: Caching can lead to the issue of stale data. This occurs when the original data in the database is updated, but the cached version remains unchanged. As a result, users may receive outdated information, creating a discrepancy between what is stored in the cache and the current data in the database, leading to inconsistencies.
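The cache-aside pattern with write invalidation is one common way to limit the stale-data problem just described. Below is a minimal sketch; the dictionaries stand in for a real cache (e.g. Redis) and a real database, and the key names are invented.

```python
# Hypothetical sketch of the cache-aside pattern with invalidation:
# reads populate the cache, and writes evict the stale entry.
database = {"user:1": "Alice"}       # stands in for the primary data store
cache = {}

def read(key):
    if key in cache:
        return cache[key]            # fast path: cache hit
    value = database[key]            # slow path: go to the database
    cache[key] = value               # populate for subsequent reads
    return value

def update(key, value):
    database[key] = value
    cache.pop(key, None)             # invalidate so the next read is fresh

read("user:1")                       # populates the cache
update("user:1", "Alicia")           # without the pop, the cache goes stale
print(read("user:1"))
```

Without the invalidation in `update`, the second `read` would return "Alice" from the cache, which is exactly the inconsistency the bullet above warns about.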


Partitioning

Partitioning (or sharding) is the process of dividing a database into smaller, more manageable pieces. It can significantly boost scalability by allowing a system to store and process more data than a single DBMS could handle. Partitioning can also help with latency by reducing the time it takes to query large databases. However, it can create challenges for consistency if different parts of the data need to be kept in sync across partitions.

How it affects non-functional requirements:

  • Availability: Partitioning improves availability by ensuring that even if a subset of the data is inaccessible due to failures in one partition, other partitions can still serve their data. In addition, if partitioning is combined with replication, even if one partition fails, a replica can continue to serve the data.
  • Scalability: Partitioning enhances scalability as the data is divided among multiple nodes or servers, allowing the system to handle more requests concurrently. As the system grows, new partitions can be added to distribute the data further.
  • Latency: By keeping related data in the same partition, partitioning can help reduce latency. This is because queries can be routed to the specific partition where the data resides, avoiding the need to search the entire dataset.
  • Throughput: By spreading data across multiple partitions, reads and writes for different keys can be processed in parallel on different nodes, increasing the total number of operations the system can handle per unit of time.
  • Consistency: Partitioning can make consistency more challenging to maintain, especially in a distributed system where data is partitioned across multiple nodes. However, with proper design and the use of protocols like two-phase commit, partitioning can be compatible with strong consistency. For example, data that needs to be consistently read and written can be kept in the same partition.
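A hash-based sharding function is the most common way to route each key to a partition. The sketch below is illustrative (shard count and key names invented); real systems often use consistent hashing instead of the modulo shown here, so that adding a shard does not remap every key.

```python
import hashlib

# Hypothetical sketch of hash-based partitioning: each key maps to
# exactly one shard, so lookups touch one shard, not the whole dataset.
NUM_SHARDS = 4
shards = [{} for _ in range(NUM_SHARDS)]

def shard_for(key):
    # A stable hash keeps a key on the same shard across processes
    # (Python's built-in hash() is randomized per process, so avoid it).
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key, value):
    shards[shard_for(key)][key] = value

def get(key):
    # Only one shard is consulted per lookup (lower latency), and
    # different keys can be served by different shards in parallel.
    return shards[shard_for(key)].get(key)

put("user:1001", "Ada")
put("user:1002", "Grace")
print(get("user:1001"), "lives on shard", shard_for("user:1001"))
```

The consistency caveat from the bullets shows up here too: a transaction touching keys on two different shards needs cross-partition coordination (e.g. two-phase commit), whereas keys that must be updated together atomically are best kept on the same shard.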


Replication

Replication involves creating and maintaining multiple copies of data. It contributes to availability by providing backup data sources if the primary source fails. Replication also aids scalability by allowing read requests to be distributed across multiple copies, reducing the load on any one server. However, like caching and partitioning, replication can pose challenges for consistency, especially in cases of write operations.

How it affects non-functional requirements:

  • Availability: By creating backup copies of data, replication enhances availability. If the primary data source fails, the system can continue operation using the backup data sources.
  • Scalability: By allowing read requests to be distributed across multiple copies of the data, replication can reduce the load on any one server, enhancing scalability.
  • Consistency: When several copies of the same data exist, they must be kept in sync, which is difficult. If one copy is updated and the others are not, different readers may see different values. Consistency models such as strong, eventual, and causal consistency define how, and how quickly, all the copies converge to the same state.
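A toy primary-replica model makes the trade-off visible: writes go to the primary and are copied to replicas asynchronously, and a read from a replica during that window returns stale data. All names and the on-demand `replicate` step are invented for illustration; real systems stream changes continuously.

```python
# Hypothetical sketch of asynchronous primary-replica replication,
# showing the eventual-consistency window between write and read.
primary = {}
replicas = [{}, {}]
pending = []                          # writes not yet applied to replicas

def write(key, value):
    primary[key] = value
    pending.append((key, value))      # replicate later, not in-line

def replicate():
    # In a real system this runs continuously; here we flush on demand.
    while pending:
        key, value = pending.pop(0)
        for replica in replicas:
            replica[key] = value

def read(key):
    return replicas[0].get(key)       # reads are served by a replica

write("balance", 100)
stale = read("balance")               # None: replication hasn't caught up
replicate()
print(stale, read("balance"))         # after catch-up the replica agrees
```

Making `write` copy to all replicas synchronously before returning would close the stale window (strong consistency) at the cost of higher write latency, which is the choice systems like Google Cloud Spanner make.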

Message Queue

A message queue provides an asynchronous communications protocol, meaning that the sender and receiver of the message do not need to interact with the message queue at the same time. This can improve scalability by offloading tasks from the main application threads and handling them asynchronously. Message queues can help with availability and reliability, as they often persist messages until they are processed, ensuring that important tasks are not lost even if a component fails.

How it affects non-functional requirements:

  • Availability: Message queues often persist messages until they are processed, which means that important tasks are not lost even if a component fails, enhancing availability.
  • Scalability: Message queues can offload tasks from the main application threads and handle them asynchronously, which can help in managing larger volumes of tasks, thus supporting system scaling.
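The producer/consumer decoupling described above can be sketched with Python's standard-library queue. This in-process version is only an illustration; production systems use brokers such as RabbitMQ or Kafka, which also persist messages across failures. The task names are invented.

```python
import queue
import threading

# Hypothetical sketch of asynchronous processing with a message queue:
# the producer enqueues work and moves on, while a background worker
# drains the queue at its own pace.
tasks = queue.Queue()
results = []

def worker():
    while True:
        job = tasks.get()
        if job is None:               # sentinel value: shut the worker down
            break
        results.append(f"processed {job}")
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

for i in range(3):
    tasks.put(f"email-{i}")           # enqueue and return immediately
tasks.put(None)                       # tell the worker to stop
t.join()                              # wait only so we can inspect results
print(results)
```

The producer never waits for a task to finish, which is what bridges the speed gap between a fast front end and a slower back-end service.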

Batch Processing

Batch processing refers to the execution of a series of jobs all at once instead of individually. It improves throughput by allowing you to process large volumes of data at once, usually during off-peak times. This can be particularly useful for non-interactive jobs like data analysis or backups, where the time taken to complete the task is less critical. However, batch processing could increase latency for individual tasks within the batch, as they have to wait their turn for processing.

How it affects non-functional requirements:

  • Scalability: By grouping similar tasks together and processing them as a unit, batch processing can handle large volumes of data more efficiently than processing each task individually, enhancing scalability.
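The core of batch processing is grouping records so that many of them share one expensive operation. In the sketch below, `bulk_insert` stands in for a single round trip to a database or API (the names and batch size are invented): ten records cost three round trips instead of ten.

```python
# Hypothetical sketch of batching: process records in groups so the
# per-operation overhead is paid once per batch, not once per record.
def chunked(items, batch_size):
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

calls = []
def bulk_insert(rows):
    calls.append(len(rows))           # stands in for one DB round trip

records = [f"row-{i}" for i in range(10)]
for batch in chunked(records, batch_size=4):
    bulk_insert(batch)                # 3 round trips instead of 10

print(calls)
```

This is also where the latency caveat comes from: the first record in a batch is not processed until the whole batch is assembled and its turn arrives, so individual-task latency rises even as overall throughput improves.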
