Caching

Why Caching?

Computer system latency numbers

To understand why we’d need caching in the first place, let’s review the latency numbers for common computer operations:

execute typical instruction	1/1,000,000,000 sec = 1 nanosec
fetch from L1 cache memory	0.5 nanosec
branch misprediction	5 nanosec
fetch from L2 cache memory	7 nanosec
Mutex lock/unlock	25 nanosec
fetch from main memory	100 nanosec
send 2K bytes over 1Gbps network	20,000 nanosec
read 1MB sequentially from memory	250,000 nanosec
fetch from new disk location (seek)	8,000,000 nanosec
read 1MB sequentially from disk	20,000,000 nanosec
send packet US to Europe and back	150 milliseconds = 150,000,000 nanosec

(source: http://norvig.com/21-days.html#answers)

The gist of the story is:

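In terms of access latency: CPU cache < memory < SSD < disk
Reading from memory is far faster than reading from disk, which is where traditional databases store data. By the numbers above, a sequential 1MB read is 80x faster from memory (250,000 vs 20,000,000 nanosec), and a single random access is about 80,000x faster (100 vs 8,000,000 nanosec).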

The idea of caching is to store the data that will likely be requested in the near future in a faster storage so it can be served faster.
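To make the gap concrete, the ratios can be worked out directly from the latency table. The snippet below is only an arithmetic sketch: the dictionary keys and the `speedup` helper are names made up for this illustration, and the figures are copied from the table above.

```python
# Latency figures copied from the table above, in nanoseconds.
LATENCY_NS = {
    "main memory reference": 100,
    "disk seek": 8_000_000,
    "read 1MB sequentially from memory": 250_000,
    "read 1MB sequentially from disk": 20_000_000,
}


def speedup(slow: str, fast: str) -> float:
    """Return how many times faster the `fast` operation is than the `slow` one."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


sequential = speedup("read 1MB sequentially from disk",
                     "read 1MB sequentially from memory")
random_access = speedup("disk seek", "main memory reference")

print(f"sequential 1MB read: memory is {sequential:.0f}x faster")  # 80x
print(f"random access: memory is {random_access:,.0f}x faster")    # 80,000x
```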

Why Performance (Latency) Matters

Speed is very important for modern applications. If your website or app is slow, people will leave. Google's research has found that as page load time goes from 1 to 3 seconds, the probability of a bounce increases by 32%. As load time goes from 1 to 5 seconds, the bounce rate increases by 90%. This highlights the significance of having a fast-loading site to prevent users from leaving before interacting with the content. According to a study by Akamai, a 100-millisecond delay in website load time can hurt conversion rates by 7%. Additionally, a 2-second delay in load time can lead to an abandonment rate of up to 87%.

What is Caching?

Caching is a technique for improving a system's read performance by storing frequently accessed data in a faster storage layer, such as RAM. When a program requests data that is stored in the cache, it can be accessed much faster than if it had to be retrieved from slower storage, such as a hard drive. This can significantly improve the overall performance of the system and reduce the time it takes for programs to access data.


Advantages of Caching

As mentioned earlier, caching improves query response time. Reading from memory is much faster than reading from disk, so users get their responses sooner.
Additionally, caching relieves pressure on services like databases. With fewer read requests going from the web servers to the database, the same database can support more web servers.
Finally, by utilizing faster cache memory, web servers can retrieve information more quickly and efficiently, allowing them to handle a greater volume of requests.

Non-memory caching: A Note on Terminology

Take note that caching is a broad concept, as it can be applied at multiple levels within a computer system, such as the CPU, disk, and network levels. For instance, CPU caches consist of tiny memory units adjacent to the processing core, which accelerate CPU performance. However, when discussing web services and system design, we focus on a higher level. Typically, caching refers to using memory to store information temporarily, which avoids the need to query databases or access slower media and file systems.

Also note that we used a database as the data source behind the cache because it is the most common use case. In reality, the data source could be anything: for example, files in a file system, object storage, or external API responses.

Let's briefly touch on some of the non-memory caching techniques that are used in web services.

Browser caching

Web browsers cache static assets like images, CSS, and JavaScript files to reduce server load and improve page load times. This is done by the browser itself, guided by the HTTP headers sent by the server.
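As a rough sketch of what "guided by the HTTP headers" can look like on the server side, the function below picks `Cache-Control` and `ETag` values for a response. The function name, the extension list, and the one-year lifetime are assumptions chosen for illustration, not a recommended policy. `Cache-Control` and `ETag` themselves are standard HTTP headers.

```python
import hashlib

# Hypothetical policy: static assets may be cached for a year,
# everything else must be revalidated with the server before reuse.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")


def cache_headers(path: str, body: bytes) -> dict:
    """Build HTTP caching headers for a response (illustrative policy only)."""
    # The ETag is a fingerprint of the content; the browser can send it back
    # in If-None-Match so the server can answer "304 Not Modified".
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    if path.endswith(STATIC_EXTENSIONS):
        cache_control = "public, max-age=31536000, immutable"
    else:
        # "no-cache" still allows storing, but requires revalidation first.
        cache_control = "no-cache"
    return {"Cache-Control": cache_control, "ETag": etag}


print(cache_headers("/static/app.css", b"body { margin: 0 }"))
print(cache_headers("/account", b"<html>...</html>"))
```

A long `max-age` like this is usually paired with versioned file names, so that a changed asset gets a new URL instead of waiting for the old copy to expire.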

CDN caching

A Content Delivery Network (CDN) is a network of servers that delivers content to users based on their geographic location. Data is cached geographically closer to the end users, which reduces latency for global users and load on the origin server. Cloudflare and Akamai are two popular CDNs. As a rule of thumb, static assets and large files such as videos and images should be cached by the CDN.

In the context of web services, "caching" usually means in-memory caching, and we will focus on this in the following sections.

Common In-memory Caching Technologies

Redis and Memcached are both popular in-memory data stores, often used for caching and improving the performance of web applications. Memcached is the OG of caching technology: it was first released in 2003, and Facebook has run it at very large scale since the 2000s. Redis, first released in 2009, is the more recent of the two and is also very popular.

Redis vs Memcached

The main difference is that Memcached deals with binary blobs whereas Redis has rich data structures. There are some more subtle differences:

Feature	Redis	Memcached
Data Structures	Strings, lists, sets, sorted sets, hashes, bitmaps	Key-value pairs (strings or binary data)
Persistence	Optional (can save data to disk and recover after restart)	No (in-memory only)
Atomic Operations	Yes (e.g., increments, list manipulations)	Limited (e.g., incr/decr, compare-and-swap)
Pub/Sub	Yes (built-in publish/subscribe messaging system)	No
High Availability	Yes (with Redis Sentinel and Redis Cluster)	No (third-party solutions available)
Complexity	More features and data structures, more complex	Simple and straightforward
Cache Eviction Policy	Configurable (e.g., LRU, LFU, random, or TTL-based, over all keys or only keys with an expiry)	Least Recently Used (LRU)
Use Cases	Advanced data structures, real-time applications, complex caching etc	Simple caching, session storage
Companies Using	Twitter, GitHub, Stack Overflow	Facebook, YouTube, Reddit

Don’t worry if you haven’t learned about concepts like cache eviction policy and high availability. We’ll discuss these later in this article.

Challenges of Caching: Why It's Not As Simple As It Seems

1. Consistency

There are only two hard things in Computer Science: cache invalidation and naming things.

- Phil Karlton

Caching by itself sounds like a very simple concept. However, setting it up correctly can be surprisingly tricky, and as Phil Karlton pointed out, the hard part is keeping the cache consistent. When multiple users or systems access the same data, everyone should see the same information, even while updates are happening. Without cache consistency, users might see outdated data, leading to errors or a poor user experience. The usual remedy is cache invalidation: when data is updated, the system marks the relevant cache entries as invalid or outdated so that the next read fetches fresh data.

There are a few patterns to handle cache consistency. We will explore these in detail in the following lessons.
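Before getting to those patterns, here is a minimal sketch of the stale-data problem and of invalidation on update. Both the "database" and the "cache" are plain dictionaries standing in for real systems, and all function names are hypothetical.

```python
database = {"user:1": "Alice"}  # stand-in for the source of truth
cache = {}                      # stand-in for an in-memory cache


def read(key):
    """Serve from the cache, loading from the database on a miss."""
    if key not in cache:
        cache[key] = database[key]
    return cache[key]


def update_without_invalidation(key, value):
    database[key] = value       # the cache still holds the old value


def update_with_invalidation(key, value):
    database[key] = value
    cache.pop(key, None)        # drop the cached copy so the next read reloads it


print(read("user:1"))           # Alice (loaded from the database, now cached)
update_without_invalidation("user:1", "Bob")
print(read("user:1"))           # Alice (stale: the database says Bob)
update_with_invalidation("user:1", "Carol")
print(read("user:1"))           # Carol
```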

2. Expiry and Eviction

Since caches operate in memory with limited size, we must decide:

When should items expire to prevent stale data?
Which eviction strategy should be used when the cache becomes full? Popular policies include LRU (Least Recently Used), FIFO (First In, First Out), and LFU (Least Frequently Used).
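The two decisions above can be combined in one small structure. The class below is a sketch, not how Redis or Memcached implement it: it assumes a single fixed TTL for all entries, checks expiry lazily on read, and evicts the least recently used entry when capacity is exceeded. The class name and the injectable `clock` parameter are inventions for this example.

```python
import time
from collections import OrderedDict


class LRUCacheWithTTL:
    """Bounded cache: entries expire after a TTL, and LRU decides eviction."""

    def __init__(self, capacity: int, ttl_seconds: float, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.clock = clock            # injectable so expiry can be tested
        self._items = OrderedDict()   # key -> (value, expires_at), least recent first

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]      # expired: treat as a miss
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = (value, self.clock() + self.ttl)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def __len__(self):
        return len(self._items)


lru = LRUCacheWithTTL(capacity=2, ttl_seconds=60)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")           # "a" is now the most recently used entry
lru.put("c", 3)        # over capacity: "b" is evicted
print(lru.get("b"))    # None
print(lru.get("a"))    # 1
```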

3. Fault Tolerance

Cache failures are inevitable in large-scale distributed systems. Because caches are in-memory, data can be lost during a failure. Your system should:

Fall back to the database or another source to serve requests.
Gracefully handle increased latency or reduced availability.
Implement mechanisms to rebuild the cache quickly once recovered.
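A minimal sketch of the first and third points: reads fall back to the database when the cache is unreachable, and the cache is refilled lazily once it is healthy again. `FlakyCache` and `CacheUnavailable` are hypothetical stand-ins for a real cache client and its connection errors.

```python
class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when the cache cannot be reached."""


class FlakyCache:
    """In-memory stand-in for a cache server that can be switched off."""

    def __init__(self):
        self.data = {}
        self.healthy = True

    def get(self, key):
        if not self.healthy:
            raise CacheUnavailable(key)
        return self.data.get(key)

    def set(self, key, value):
        if not self.healthy:
            raise CacheUnavailable(key)
        self.data[key] = value


def read_with_fallback(key, cache, database):
    """Try the cache first; on a miss or a cache failure, read the database."""
    try:
        value = cache.get(key)
        if value is not None:
            return value
    except CacheUnavailable:
        return database[key]      # cache is down: serve directly from the database
    value = database[key]         # ordinary cache miss
    try:
        cache.set(key, value)     # repopulate so the cache rebuilds over time
    except CacheUnavailable:
        pass                      # a failed cache write must not fail the request
    return value


database = {"product:42": "keyboard"}
cache = FlakyCache()
print(read_with_fallback("product:42", cache, database))  # keyboard (miss, then cached)

cache.healthy = False
cache.data.clear()                                        # simulate losing in-memory data
print(read_with_fallback("product:42", cache, database))  # keyboard (from the database)

cache.healthy = True
print(read_with_fallback("product:42", cache, database))  # keyboard (cache refilled)
```

Note that while the cache is down, every read hits the database, so the database needs enough headroom to absorb that extra load.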

Caching Patterns

With the challenges of caching in mind, let's discuss some of the common patterns to handle these challenges.

Cache Reading Patterns: These patterns decide how to read data.
- Cache-Aside/Lazy Loading: Data is loaded into the cache on demand, i.e., only when a cache miss occurs.
- Read-Through: The cache takes responsibility for reading data from the database if it doesn't have the data.
Cache Writing Patterns: These patterns determine when the application should write data to the cache.
- Write-Through: The cache and the database are updated at the same time.
- Write-Back/Write-Behind: Data is first written to cache and the write to the database is done later.
- Write-Around: Data is written directly to permanent storage, bypassing the cache.
Cache Eviction Patterns: These patterns decide which item should be removed from the cache when the cache is full and a new item needs to be added. LRU and TTL fall into this category.
- LRU: Discards the least recently used items first.
- TTL: Data is kept or removed based on a time limit.
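To make the difference between the read and write patterns more tangible, here is a compact sketch with the cache and the database both modelled as dictionaries. The class and method names are made up for this illustration. In `write_around`, dropping any existing cached copy is an added assumption so that the example does not serve stale data.

```python
class Store:
    """A cache in front of a database, both modelled as plain dictionaries."""

    def __init__(self):
        self.database = {}
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet to the database

    def read_cache_aside(self, key):
        """Cache-aside: the application loads data into the cache on a miss."""
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write_through(self, key, value):
        """Write-through: update the database and the cache together."""
        self.database[key] = value
        self.cache[key] = value

    def write_back(self, key, value):
        """Write-back: update the cache now and the database later."""
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Persist pending write-back entries to the database."""
        for key in self.dirty:
            self.database[key] = self.cache[key]
        self.dirty.clear()

    def write_around(self, key, value):
        """Write-around: write to the database only, bypassing the cache."""
        self.database[key] = value
        self.cache.pop(key, None)  # assumption: drop any stale cached copy


store = Store()
store.database["a"] = 1
print(store.read_cache_aside("a"))   # 1 (miss, then cached)
store.write_through("b", 2)          # in both the cache and the database
store.write_back("c", 3)             # only in the cache for now
print("c" in store.database)         # False
store.flush()
print(store.database["c"])           # 3
store.write_around("a", 10)          # the database is updated, the cache is skipped
print(store.read_cache_aside("a"))   # 10 (reloaded from the database)
```

The write-back path shows the trade-off from the fault tolerance discussion: anything still in `dirty` is lost if the cache fails before `flush` runs.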

Don't worry if these patterns look similar and confusing. We will go through each pattern in detail in the following lessons.

How to Use Caching in System Design Interviews

In an interview, you are almost always asked to design a system that is scalable and highly available. Caching is a common technique to reduce latency and improve scalability, so mentioning it in your design is almost a no-brainer. However, be ready to dive deep into topics such as:

How do you handle the inconsistency between cache and the data source?
What do you do when the cache becomes full?
How do you handle a cache failure and the resulting loss of cached data?

We will go through the patterns in great detail in the following lessons so you will have a good understanding of how to answer these questions.