Why Caching?

Computer system latency numbers

To understand why we’d need caching in the first place, let’s review the latency numbers for common storage devices:

| Operation | Latency |
|---|---|
| execute typical instruction | 1/1,000,000,000 sec = 1 nanosec |
| fetch from L1 cache memory | 0.5 nanosec |
| branch misprediction | 5 nanosec |
| fetch from L2 cache memory | 7 nanosec |
| mutex lock/unlock | 25 nanosec |
| fetch from main memory | 100 nanosec |
| send 2K bytes over 1 Gbps network | 20,000 nanosec |
| read 1MB sequentially from memory | 250,000 nanosec |
| fetch from new disk location (seek) | 8,000,000 nanosec |
| read 1MB sequentially from disk | 20,000,000 nanosec |
| send packet US to Europe and back | 150 milliseconds = 150,000,000 nanosec |


The gist of the story:

  • In terms of access latency: CPU cache < memory < SSD < disk
  • Reading from memory is more than 50x faster than reading from disk, which is where traditional databases store data

This makes a strong case for using memory as a data store to speed up our applications.

Why Performance (Latency) Matters

Speed is serious business for modern websites. If your website or app is slow, people will leave. Google's research has found that as page load time goes from 1 to 3 seconds, the probability of a bounce increases by 32%. As load time goes from 1 to 5 seconds, the bounce rate increases by 90%. This highlights the importance of a fast-loading site that keeps users from leaving before they interact with the content. According to a study by Akamai, a 100-millisecond delay in website load time can hurt conversion rates by 7%, and a 2-second delay can push abandonment rates as high as 87%.

Reading from memory is roughly 50x faster than reading from an SSD, giving us sub-millisecond response times versus the milliseconds or even seconds it takes to hit a data source like a database. Here’s a comparison of disk vs memory:

| Feature | Disk Storage | In-Memory Storage |
|---|---|---|
| Speed | Slower. Disk I/O operations can take significantly more time. | Faster. Accessing data in memory is orders of magnitude faster than disk. |
| Cost | Less expensive. Traditional hard drives and even SSDs are relatively cheap. | More expensive. The cost per GB is much higher for RAM. |
| Data persistence | Persistent. Data stored on a disk remains there even if the power is turned off. | Non-persistent. Data in memory is typically lost when the power is turned off, unless the system is designed with persistence in mind (like some databases with snapshotting or journaling mechanisms). |
| Capacity | High capacity. Disks can store terabytes of data. | Lower capacity. The amount of data that can be stored is limited by the size of the RAM. |
| Usage | Best for long-term storage and large databases where speed is less critical. | Best for temporary storage, caching, and real-time computations where speed is critical. |
| Energy consumption | Lower. Disks require less energy to maintain data. | Higher. Maintaining data in memory requires continuous power. |
| Data volatility | Non-volatile. Data remains intact after a power cycle. | Volatile. Data is lost upon power loss or reboot (unless it's made persistent through specific technologies). |
| Access method | Block access. | Random access. |
| Throughput | Lower. Disk drives, especially mechanical hard drives, can handle fewer I/O operations per second compared to memory. | Higher. Memory can handle a significantly higher number of I/O operations per second. |
| Data model | Suitable for structured data models due to slower access times, but can accommodate a variety. | Suitable for both structured and unstructured data models due to fast access times. Great for data that requires frequent read/write operations. |
| Performance bottleneck | Disk I/O operations can be a significant bottleneck, especially for workloads that require frequent read/write operations. | Memory size can be a bottleneck. While read/write operations are fast, the amount of data that can be stored and accessed is limited by the size of the available memory. |
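A rough way to feel the difference is to time the same read from a file versus from a Python dict. This is only an illustrative sketch (the exact numbers depend heavily on your machine and the OS page cache), but the gap is usually dramatic:

```python
import os
import tempfile
import time

payload = "x" * 1_000_000  # ~1 MB of data

# "Disk storage": write the payload to a temporary file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write(payload)

# "In-memory storage": keep the same payload in a dict
memory_store = {"key": payload}

start = time.perf_counter()
with open(path) as f:
    from_disk = f.read()
disk_time = time.perf_counter() - start

start = time.perf_counter()
from_memory = memory_store["key"]
mem_time = time.perf_counter() - start

print(f"disk read:   {disk_time * 1e6:.1f} microseconds")
print(f"memory read: {mem_time * 1e6:.1f} microseconds")
os.remove(path)
```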

What is caching?

Caching is a technique used to improve the read performance of a system by storing frequently accessed data in a faster storage layer, such as RAM. When a program requests data that is stored in the cache, it can be accessed much faster than if it had to be retrieved from slower storage, such as a hard drive. This can significantly improve the overall performance of the system and reduce the time it takes for programs to access data.


Advantages of Caching

  • As mentioned earlier, caching improves query response time. Reading from memory is much faster than reading from disk, so users get their responses faster.
  • Additionally, caching relieves pressure on services like databases: reducing the number of read requests from the web server to the database means the same database can support more web servers.
  • Finally, by utilizing faster cache memory, web servers can retrieve information more quickly and efficiently, allowing them to handle a greater volume of requests.

Take note that caching is a broad concept, as it can be applied at multiple levels within a computer system, such as the CPU, disk, and network levels. For instance, CPU caches consist of tiny memory units adjacent to the processing core, which accelerates CPU performance. However, when discussing web services and system design, we focus on a higher level. Typically, caching refers to utilizing memory to store information temporarily, thus preventing the need to query databases or access slower media and file systems.

Also note that we used a database as the data source behind the cache, as it is the most common use case. In reality, the data source could be anything: data stored as files in the file system, object storage, or external API responses, for example.

Here’s a more concrete example of caching: consider an e-commerce store. A page has product recommendations, latest products, and the main product listings, among other things. Each section calls some backend service, which in turn accesses some database-like storage, so each page refresh hits multiple databases. All of those systems come under pressure every time someone loads the page. If the underlying information hasn’t changed, there is no need to go back to the databases on every refresh; instead, we should cache the results and only update the cache when the data changes.

Common caching technologies

Redis and Memcached are both popular in-memory data storage systems, often used for caching and improving the performance of web applications. Memcached is the OG of caching technologies: by the early 2010s, Facebook was already running it at massive scale. The more recent, and now more popular, technology is Redis.

Redis vs Memcached

The main difference is that Memcached stores opaque binary blobs, whereas Redis offers rich data structures. There are some more subtle differences:

| Feature | Redis | Memcached |
|---|---|---|
| Data Structures | Strings, lists, sets, sorted sets, hashes, bitmaps | Key-value pairs (strings or binary data) |
| Persistence | Optional (can save data to disk and recover after restart) | No (in-memory only) |
| Atomic Operations | Yes (e.g., increments, list manipulations) | Limited (increments/decrements, compare-and-swap) |
| Pub/Sub | Yes (built-in publish/subscribe messaging system) | No |
| High Availability | Yes (with Redis Sentinel and Redis Cluster) | No (third-party solutions available) |
| Complexity | More features and data structures, more complex | Simple and straightforward |
| Cache Eviction Policy | Configurable (e.g., LRU, LFU, volatile, or all keys) | Least Recently Used (LRU) |
| Use Cases | Advanced data structures, real-time applications, complex caching, etc. | Simple caching, session storage |
| Companies Using | Twitter, GitHub, Stack Overflow | Facebook, YouTube, Reddit |

Don’t worry if you haven’t learned about concepts like cache eviction policy and high availability. We’ll discuss these later in this article.

Why Caching can be complicated

There are only two hard things in Computer Science: cache invalidation and naming things.

- Phil Karlton

Caching by itself sounds like a very simple concept. However, correctly setting up caching can be surprisingly tricky. As Phil Karlton pointed out, cache consistency is tricky. When there are multiple users or systems accessing the same data, it's essential to keep the cached data consistent. This means making sure everyone sees the same information, even when updates are happening. Without cache consistency, users might see outdated data, leading to errors or poor user experiences. When data is updated, the system marks the relevant cache entries as invalid or outdated.

Let’s look at these in detail.

Caching Policies

  1. Cache Writing Policies: These policies determine when the application should write data to the cache.
    • Write-Through: The cache and the database are updated at the same time. This will ensure the cache and the database have the same data all the time but will slow down the write operation since it has to wait for both writes to succeed.
    • Write-Back/Write-Behind: Data is first written to cache and the write to the database is done later. The disadvantage is the potential inconsistency between cache and database since writing to database happens asynchronously.
    • Write-Around: Data is written directly to permanent storage, bypassing the cache. This can reduce the cache being flooded with write I/O that will not subsequently be re-read, but has the disadvantage that a read request for recently written data will create a “cache miss” and must be read from slower back-end storage and experience higher latency.
  2. Cache Reading Policies: These policies decide how to read data.
    • Cache-Aside/Lazy Loading: Data is loaded into the cache on demand, i.e., only when a cache miss occurs.
    • Read-Through: The cache takes responsibility for reading data from the database if it doesn't have the data.
  3. Cache Eviction Policies: These policies decide which item should be removed from the cache when the cache is full and a new item needs to be added. LRU and TTL fall into this category.
    • LRU: Discards the least recently used items first.
    • TTL: Data is kept or removed based on a time limit.
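The three write policies above can be sketched with plain dicts standing in for the cache and the database (an illustrative toy, not a production implementation):

```python
cache = {}
database = {}
pending_writes = []  # queue of deferred writes for write-back

def write_through(key, value):
    # Cache and database are updated together, so they never diverge.
    cache[key] = value
    database[key] = value

def write_back(key, value):
    # Write lands in the cache immediately; the database write is deferred.
    cache[key] = value
    pending_writes.append((key, value))

def flush_pending_writes():
    # Later (e.g. on a timer), deferred writes are applied to the database.
    while pending_writes:
        key, value = pending_writes.pop(0)
        database[key] = value

def write_around(key, value):
    # Database only; any stale cache entry is dropped, so the next read
    # of this key will be a cache miss.
    database[key] = value
    cache.pop(key, None)

write_through("a", 1)  # cache and database both have "a"
write_back("b", 2)     # cache has "b"; the database doesn't, until a flush
write_around("c", 3)   # database has "c"; the cache doesn't
```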

Let’s discuss each one in detail with examples.

Data Loading Patterns

In practice, the two primary caching patterns that are most often used are:

  • lazy loading, aka Cache-aside Caching, a reactive approach
  • preloading, aka Write-through Caching, a proactive approach

Cache-aside (Lazy loading) Pattern

Lazy loading, also known as cache-aside, is the primary pattern used and involves attempting to read from the cache first. If the information is not in the cache, it is considered a cache miss and the system reads from the source data store.


  1. Attempts to read from cache: When the application needs data, it first checks the cache to see if it's already available.
  2. If not found, cache miss: If the data is not found in the cache, this is called a cache miss.
  3. Read from data source and save in memory: When a cache miss occurs, the application fetches the data from the main data source (e.g., a database), saves it in the cache, and returns it to the user.


Let's create a Flask server that connects to a PostgreSQL database and uses Redis for caching with the lazy loading pattern.

Imagine we have users and comments data stored in PostgreSQL like this:

```sql
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL UNIQUE
);

INSERT INTO users (name, email) VALUES
    ('Alice', 'alice@example.com'),
    ('Bob', 'bob@example.com'),
    ('Charlie', 'charlie@example.com');

CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    user_id INTEGER REFERENCES users(id)
);

INSERT INTO comments (content, user_id) VALUES
    ('Great post!', 1),
    ('Interesting read.', 1),
    ('Thanks for sharing!', 2),
    ('I learned something new today.', 2),
    ('Very insightful!', 3);
```

And our app uses this query to get a user's comments:

```sql
SELECT users.id, users.name, users.email, comments.id, comments.content
FROM users
JOIN comments ON users.id = comments.user_id
WHERE users.name = %s AND users.email = %s;
```

Now let’s use the query parameters as the cache key and implement lazy loading.

```python
from flask import Flask, jsonify, request
import redis
import psycopg2
import json

app = Flask(__name__)

# Connect to Redis
cache = redis.Redis(host='localhost', port=6379, db=0)

# Database configuration
db_config = {
    "dbname": "your_database_name",
    "user": "your_user",
    "password": "your_password",
    "host": "your_host",
    "port": "your_port"
}

def fetch_data_from_database(name, email):
    conn = psycopg2.connect(**db_config)
    cursor = conn.cursor()
    query = '''
        SELECT users.id, users.name, users.email, comments.id, comments.content
        FROM users
        JOIN comments ON users.id = comments.user_id
        WHERE users.name = %s AND users.email = %s;
    '''
    cursor.execute(query, (name, email))
    data = cursor.fetchall()
    conn.close()
    return data

def get_data(name, email):
    # Generate a cache key from the query parameters
    cache_key = f"comments:name={name}:email={email}"
    data = cache.get(cache_key)

    # Cache miss
    if data is None:
        print(f"Fetching data for name={name} and email={email} from database...")
        data = fetch_data_from_database(name, email)
        if data:
            cache.set(cache_key, json.dumps(data), ex=600)  # Set a TTL of 600 seconds
        else:
            print(f"No comments found for name={name} and email={email}")
            return None
    # Cache hit
    else:
        print(f"Fetching data for name={name} and email={email} from cache...")
        data = json.loads(data.decode('utf-8'))

    return data

@app.route('/comments')
def get_comments():
    name = request.args.get('name', '')
    email = request.args.get('email', '')
    data = get_data(name, email)
    if data:
        return jsonify([
            {
                "user_id": row[0],
                "user_name": row[1],
                "user_email": row[2],
                "comment_id": row[3],
                "comment_content": row[4]
            }
            for row in data
        ])
    else:
        return jsonify({"error": f"No comments found for name={name} and email={email}"}), 404

if __name__ == '__main__':
    app.run(debug=True)
```

Notice that there are several key concepts related to caching in the code:

  1. Cache Key: The cache key is a unique identifier used to store and retrieve data from the cache. In our example, we generate a cache key based on the user's name and email address to ensure each unique combination has its cache entry. This allows us to differentiate between different queries and cache their results separately. The cache key is generated using the following format: comments:name={name}:email={email}.
  2. JSON Dump Serialization: When caching data, it's necessary to serialize the data into a format that can be stored and later deserialized back into the original data structure. In this example, we use JSON serialization to store the fetched data in the Redis cache. The json.dumps() function converts the data (a list of tuples) into a JSON-formatted string before storing it in Redis. When retrieving data from Redis, we use json.loads() to convert the JSON-formatted string back into a Python list (note that the tuples come back as lists, since JSON has no tuple type).
  3. Time-to-live (TTL): TTL is the duration for which the data should be stored in the cache before it expires and is removed. In our example, we set a TTL of 600 seconds for the cached data. This means that after 600 seconds, the data in the cache is considered stale and is removed. When a request for the same data is made after the TTL expires, the data is fetched from the PostgreSQL database again and stored in the cache with a new TTL. To set the TTL for the cached data, we use the ex parameter when calling cache.set(). In our example, we set the TTL as follows: cache.set(cache_key, json.dumps(data), ex=600).
  4. Cache Hit: A cache hit occurs when the requested data is found in the cache. In our example, when a request is made for comments associated with a user's name and email address, we first check if the data is available in the Redis cache. If it's found, we have a cache hit, and the data is retrieved from the cache instead of fetching it from the PostgreSQL database. Cache hits help improve the performance of the application, as fetching data from the cache is typically faster than fetching it from the primary data source (e.g., a database).
  5. Cache Miss: A cache miss occurs when the requested data is not found in the cache. In our example, if the data for a specific user's name and email address is not available in the Redis cache, we have a cache miss. In this case, the application fetches the data from the PostgreSQL database, stores it in the cache for future use, and serves the data to the requester. Cache misses can happen for various reasons, such as the data not being cached yet, the cache entry having expired (due to the TTL), or the cache having evicted the data to make room for other entries.

Note that with lazy loading, a write to the cache only happens when we try to fetch a record that doesn't already exist in the cache.

Lazy loading advantages:

  • Avoiding writing unnecessary information into the cache: Lazy loading ensures that only data that has been requested is stored in the cache. This means that the cached information is likely to be relevant and have a higher chance of being queried again.
  • Flexibility in cache population: You can populate the cache at any time, as data is added to the cache only when it's requested. This allows for more efficient cache management.

Lazy loading disadvantages:

  • Large initial cache miss cost: Lazy loading can result in a significant initial cache miss, which may be too expensive (in terms of time) for some services. For example, each cache miss might take a few milliseconds or even seconds, depending on the data source. This can put the main data source, such as a database, under high load. Whether this is acceptable depends on the specific application and its requirements.

Write-through Pattern (pre-loading)

Write-through caching is a caching pattern where data is written to the cache and the primary data source (e.g., a database) simultaneously. In this pattern, when new data is created or existing data is updated, the cache is updated in parallel with the primary data source to ensure consistency between the cache and the primary data source. Here are the main steps involved in write-through caching:

  1. When data is requested, check if the data is available in the cache.
  2. If the data is found in the cache (cache hit), return it to the requester.
  3. If the data is not found in the cache (cache miss), fetch it from the primary data source, store it in the cache, and return it to the requester.
  4. When data is created or updated, write the data to both the primary data source and the cache.



Write-through advantages:

  • Ensures data consistency between the cache and the primary data source.
  • Reduces the chances of serving stale data, as the cache is updated in tandem with the primary data source.


Write-through disadvantages:

  • Slower write performance compared to other caching strategies, as writes must be performed on both the cache and the primary data source.

Now, let's update the Flask server to implement write-through caching for adding a new comment. The important points in the code are explained in the comments:

```python
# ... (imports and previous code)

@app.route('/comments', methods=['POST'])
def create_comment():
    user_id = request.form.get('user_id')
    content = request.form.get('content')

    if not user_id or not content:
        return jsonify({"error": "Both user_id and content are required"}), 400

    # Step 4: Write data to the primary data source and the cache
    conn = psycopg2.connect(**db_config)
    cursor = conn.cursor()
    query = '''
        INSERT INTO comments (content, user_id)
        VALUES (%s, %s)
        RETURNING id;
    '''
    cursor.execute(query, (content, user_id))
    comment_id = cursor.fetchone()[0]
    conn.commit()
    conn.close()

    # Update the cache for the user
    cache_key = f"user_comments:user_id={user_id}"
    cached_data = cache.get(cache_key)
    if cached_data:
        # Deserialize the JSON-formatted data
        data = json.loads(cached_data.decode('utf-8'))
        # Add the new comment to the data
        data.append({
            "comment_id": comment_id,
            "comment_content": content
        })
        # Serialize and store the updated data in the cache
        cache.set(cache_key, json.dumps(data))

    return jsonify({"success": f"Comment with id {comment_id} created"}), 201

# ... (main function)
```

In this example, we've added a new route (/comments) to handle POST requests for creating a new comment. When a comment is created, the data is written to both the PostgreSQL database and the Redis cache (if the cache already contains data for the user). By updating the cache in parallel with the primary data source, we ensure that the cached data remains consistent with the data in the PostgreSQL database.

Cache-aside vs Write-through

Lazy loading is easier to implement and provides immediate benefits, while write-through ensures that the cache is always up-to-date. Both strategies can be used together for a fast and powerful caching system. Here’s a side-by-side comparison:

| Aspect | Cache-aside (Lazy Loading) | Write-through |
|---|---|---|
| Data Loading | Data is loaded into the cache only when it is requested and not found in the cache (cache miss). | Data is written to both the cache and the primary data source simultaneously when created or updated. |
| Cache Hit | Data is returned from the cache if it exists. | Data is returned from the cache if it exists. |
| Cache Miss | Data is fetched from the primary data source, stored in the cache, and then returned to the requester. | Data is fetched from the primary data source, stored in the cache, and then returned to the requester. |
| Data Consistency | Data consistency can be harder to maintain, as the cache is updated only when data is requested and not found in the cache. | Ensures better data consistency between the cache and the primary data source, as data is updated in the cache when it is written to the primary data source. |
| Write Performance | Write performance is generally faster, as data is written only to the primary data source. | Write performance can be slower, as data needs to be written to both the cache and the primary data source. |
| Stale Data | There is a higher risk of serving stale data, as the cache is updated only when data is requested and not found in the cache. | Reduces the chances of serving stale data, as the cache is updated in tandem with the primary data source. |
| Cache Space Utilization | Can lead to more efficient cache space utilization, as only frequently requested data is stored in the cache. | Cache space utilization might be less efficient, as all data is stored in the cache, regardless of its access frequency. |

Each caching pattern has its own advantages and disadvantages, and the choice between the two depends on the specific requirements and characteristics of the application.

  • Data-intensive applications such as social network timelines typically require a write-through cache. As demonstrated in the Twitter example discussed in the first chapter, retrieving tweets from a database for each individual user can put a significant strain on the system. Therefore, it is often preferable to store the results in a cache as an "inbox" for each user.
  • In many cases, a combination of both patterns is the best solution.

Cache Eviction Policies

A cache is by definition in-memory and needs to be fast, and it can only stay fast if we keep its size small. Since memory is limited, something has to be removed when the cache fills up.

Cache eviction policies are the rules that a caching system uses to determine which items to remove when the cache is full and new data needs to be accommodated.

Least Recently Used (LRU): This policy evicts the item that has not been accessed for the longest period of time. This is based on the assumption that if an item has not been accessed recently, it's less likely to be accessed in the near future. It's widely used because of its simplicity and performance. However, it can perform poorly if frequently accessed items are followed by long periods of inactivity.

Example: A web browser cache uses LRU. When you visit a webpage, the browser stores images, scripts, stylesheets, and other resources from the page in its cache. When the cache fills up, the browser uses the LRU policy to decide which items to remove, getting rid of items from webpages you haven't visited in a while.
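A minimal LRU cache can be built on Python's OrderedDict, which remembers insertion order: move_to_end() marks an entry as most recently used, and popitem(last=False) evicts the least recently used one. A sketch:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # cache miss
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" becomes the most recently used
lru.put("c", 3)  # cache is full: "b" (least recently used) is evicted
```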

Most Recently Used (MRU): The opposite of LRU, this policy evicts the most recently used items first. It's not used as often as LRU, but it can be effective in certain scenarios, such as when the newer an item is, the less likely it is to be accessed again soon.

Example: MRU could be used in a music streaming app like Spotify. If a listener rarely replays a song right after hearing it, then the song that was just played (the most recently used item) is the least likely to be needed again soon, so it could be the first to be evicted if the cache needs more space.

Least Frequently Used (LFU): This policy evicts the item that is least frequently accessed. It uses frequency as the metric, which can make it more effective than LRU in scenarios where the access pattern is more bursty. However, it has the downside of potentially keeping older, infrequently used data if a burst of different data is accessed at some point.

Example: An e-commerce app like Amazon might use LFU for product recommendations. Products that are often viewed or purchased would stay in cache, ensuring quick loading times for popular items. Less popular items (i.e., least frequently used) would be the first to be evicted from the cache.
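A toy LFU cache just needs a per-key access counter; when full, it evicts the key with the lowest count. (Real implementations use more elaborate bookkeeping to make eviction cheap, but this conveys the policy.)

```python
from collections import Counter

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.freq = Counter()  # access count per key

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict the least frequently used key
            victim, _ = min(self.freq.items(), key=lambda kv: kv[1])
            del self.data[victim]
            del self.freq[victim]
        self.data[key] = value
        self.freq[key] += 1

lfu = LFUCache(capacity=2)
lfu.put("popular", 1)
lfu.get("popular")       # "popular" now has a higher access count
lfu.put("unpopular", 2)
lfu.put("new", 3)        # full: "unpopular" (least frequently used) is evicted
```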

Time To Live (TTL): This policy evicts data after a certain duration of time. The duration can be set at the time the data is written into the cache. This is useful in scenarios where data becomes stale or irrelevant after a certain period of time.

Example: TTL is often used in caching weather app data. Weather forecasts change regularly, so a weather app might cache the forecast for a location with a TTL. When the TTL expires, the app would remove the forecast from the cache and fetch the updated data.
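A TTL cache only needs to remember an expiry timestamp per entry and treat expired entries as misses on read. A minimal sketch (the short TTL here is just to make the expiry observable):

```python
import time

class TTLCache:
    def __init__(self):
        self.data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self.data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.data[key]  # stale: evict and report a miss
            return None
        return value

forecasts = TTLCache()
forecasts.set("london", "rain", ttl_seconds=0.2)
fresh = forecasts.get("london")  # within the TTL: cache hit
time.sleep(0.5)
stale = forecasts.get("london")  # TTL expired: entry evicted, cache miss
```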

Random Replacement (RR): This policy randomly evicts items from the cache. It's the simplest of all policies but doesn't take into account any information about the access pattern or frequency. This policy is typically less effective than other policies, but it might be useful in certain scenarios where other, more complex policies are not needed or the overhead of tracking usage patterns is not justifiable.

Example: RR might be used in a news app where articles are being continuously updated, and the cache is used for temporarily storing articles for quick access. Since articles get old pretty fast and are replaced by newer articles, the cache could use an RR strategy to randomly replace cached articles when new ones need to be cached.

Here’s a comparison table:

| Policy | Description | Example |
|---|---|---|
| Least Recently Used (LRU) | Evicts the data that has not been accessed for the longest time | A web browser cache removing data from webpages you haven't visited in a while |
| Most Recently Used (MRU) | Evicts the most recently accessed data | A music streaming app like Spotify discarding the song just played from the cache |
| Least Frequently Used (LFU) | Evicts the least often accessed data | An e-commerce app like Amazon removing less popular (infrequently viewed) products from cache |
| Time To Live (TTL) | Evicts data after a specific lifespan has passed | A weather app removing cached forecasts after they've expired |
| Random Replacement (RR) | Evicts random data when the cache is full | A news app randomly replacing cached articles when new ones need to be cached |


Scaling Redis

We can use techniques from the replication and partitioning sections to scale Redis.

Replicate to Scale Reads

Redis supports primary-secondary replication, where one Redis server (the primary) is mirrored by one or more Redis servers (secondaries). All write operations are performed on the primary, while read operations can be asynchronously distributed across the secondaries. This setup helps distribute read traffic and improve read performance. As the number of secondaries increases, the system can handle a higher volume of read requests. This can be particularly useful for read-heavy workloads or when the application needs to serve a large number of clients.

Partition to Scale Writes

Partitioning involves dividing the data set into smaller, manageable partitions (shards) and distributing them across multiple Redis instances. Each instance is responsible for a subset of the data, which allows for better write performance and horizontal scaling. With partitioning, the system can handle a higher volume of write operations as each Redis instance is responsible for a smaller subset of the data. To distribute data evenly across the Redis instances, consistent hashing can be used. This technique minimizes the amount of data movement when adding or removing instances, which helps maintain performance and reduce operational overhead.
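The consistent hashing mentioned above can be sketched as a hash ring: each node (the instance names below are hypothetical) is hashed onto the ring at several "virtual node" positions, and a key belongs to the first node clockwise of the key's own hash, so adding or removing a node only remaps the keys adjacent to it:

```python
import bisect
import hashlib

def _hash(value):
    # Map a string to a position on the ring
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node appears at `vnodes` positions to smooth the distribution
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.positions = [h for h, _ in self.ring]

    def node_for(self, key):
        # First node clockwise of the key's hash (wrapping around the ring)
        idx = bisect.bisect(self.positions, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["redis-1", "redis-2", "redis-3"])
owner = ring.node_for("user:42")  # the same key always maps to the same node
```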

Replicate + Partition for High Availability

Redis Cluster mode combines replication and partitioning to create a highly available and scalable solution. In this setup, data is partitioned across multiple master nodes, and each master node has one or more slave nodes for replication. By combining replication and partitioning, Redis Cluster provides a solution that can scale both reads and writes while maintaining high availability. This makes it suitable for demanding applications with large data sets and high traffic loads. In case of a master node failure, Redis Cluster can automatically promote one of the slave nodes to become the new master. This ensures data availability and allows the system to continue serving requests even in the event of node failures.


Figure: Redis Cluster mode's replication + partitioning

Time-to-Live (TTL)

What is TTL

As mentioned in the cache eviction section, TTL is the duration for which the data should be stored in the cache before it expires and is removed.

Why do we need TTL

Time to live is used to get rid of data that might stay around for too long, ensuring that the cache remains up-to-date and efficient.

How to decide what TTL to use

Redis does not have a default time to live: objects are permanent by default. A TTL can be assigned when a key is created or modified, or after the fact, so that the key expires after a fixed duration. It's important to set the TTL based on what you're doing: consider the nature of the data and how frequently it is accessed or updated, and choose a value that keeps data relevant without causing unnecessary evictions and cache misses. Here are some examples of picking a TTL:

| Use Case | TTL Value | Explanation |
|---|---|---|
| User session data | 1800s (30 minutes) | Matches session timeout, removing data when no longer needed. |
| API rate limiting | 3600s (1 hour) | Matches rate limiting window, resetting the counter automatically. |
| Cached search results | 300s (5 minutes) - 86400s (24 hours) | Depends on data update frequency and user experience requirements. |
| Configuration data | None or 2592000s (30 days) | Rarely changing data, stored until explicitly updated or removed. |
| Weather data | 600s (10 minutes) | Matches weather data update frequency, ensuring up-to-date information. |
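Concretely, assigning and inspecting TTLs in Redis uses standard commands, shown here as an illustrative redis-cli session against a hypothetical key:

```
SET session:1234 "session-data" EX 1800   # store with a 30-minute TTL
TTL session:1234                          # seconds remaining (-1 means no TTL)
EXPIRE session:1234 3600                  # set or replace the TTL after the fact
PERSIST session:1234                      # remove the TTL, making the key permanent
```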
