Why Caching?
Computer system latency numbers
To understand why we’d need caching in the first place, let’s review the latency numbers for common storage devices:
Operation | Latency |
---|---|
execute typical instruction | 1/1,000,000,000 sec = 1 nanosec |
fetch from L1 cache memory | 0.5 nanosec |
branch misprediction | 5 nanosec |
fetch from L2 cache memory | 7 nanosec |
Mutex lock/unlock | 25 nanosec |
fetch from main memory | 100 nanosec |
send 2K bytes over 1Gbps network | 20,000 nanosec |
read 1MB sequentially from memory | 250,000 nanosec |
fetch from new disk location (seek) | 8,000,000 nanosec |
read 1MB sequentially from disk | 20,000,000 nanosec |
send packet US to Europe and back | 150 milliseconds = 150,000,000 nanosec |
(source: http://norvig.com/21-days.html#answers)
The gist of the story is:
- Latency increases in the order: CPU cache < memory < SSD < disk
- Reading sequentially from memory is roughly 80x faster than reading sequentially from disk (250,000 ns vs 20,000,000 ns per MB), and disk is where traditional databases store data.
This really makes a good case for using memory as a data store to speed up our applications.
Why Performance (Latency) Matters
Speed is very important for modern websites. If your website or app is slow, people will leave. Google's research has found that as page load time goes from 1 to 3 seconds, the probability of a bounce increases by 32%. As load time goes from 1 to 5 seconds, the bounce rate increases by 90%. This highlights the significance of having a fast-loading site to prevent users from leaving before interacting with the content. According to a study by Akamai, a 100-millisecond delay in website load time can hurt conversion rates by 7%. Additionally, a 2-second delay in load time can lead to an abandonment rate of up to 87%.
Reading from memory is orders of magnitude faster than reading from disk. It gives us sub-millisecond response times, versus milliseconds or even seconds from data sources like databases. Here's a comparison of disk vs in-memory storage:
Feature | Disk Storage | In-Memory Storage |
---|---|---|
Speed | Slower. Disk I/O operations can take significantly more time. | Faster. Accessing data in memory is orders of magnitude faster than disk. |
Cost | Less expensive. Traditional hard drives and even SSDs are relatively cheap. | More expensive. The cost per GB is much higher for RAM. |
Data persistence | Persistent. Data stored on a disk remains there even if the power is turned off. | Non-persistent. Data in memory is typically lost when the power is turned off, unless the system is designed with persistence in mind (like some databases with snapshotting or journaling mechanisms). |
Capacity | High capacity. Disks can store terabytes of data. | Lower capacity. The amount of data that can be stored is limited by the size of the RAM. |
Usage | Best for long-term storage and large databases where speed is less critical. | Best for temporary storage, caching, and real-time computations where speed is critical. |
Energy consumption | Lower. Disks require less energy to maintain data. | Higher. Maintaining data in memory requires continuous power. |
Data volatility | Non-volatile. Data remains intact after a power cycle. | Volatile. Data is lost upon power loss or reboot (unless it's made persistent through specific technologies). |
Access method | Block access. | Random access. |
Throughput | Lower. Disk drives, especially mechanical hard drives, can handle fewer I/O operations per second compared to memory. | Higher. Memory can handle a significantly higher number of I/O operations per second. |
Data model | Suitable for structured data models due to slower access times, but can accommodate a variety. | Suitable for both structured and unstructured data models due to fast access times. Great for data that requires frequent read/write operations. |
Performance bottleneck | Disk I/O operations can be a significant bottleneck, especially for workloads that require frequent read/write operations. | Memory size can be a bottleneck. While read/write operations are fast, the amount of data that can be stored and accessed is limited by the size of available memory. |
What is caching?
Caching is a technique used to improve query read performance of a system by storing frequently accessed data in a faster memory storage area, such as RAM. When a program requests data that is stored in the cache, it can be accessed much faster than if it had to be retrieved from slower storage, such as a hard drive. This can significantly improve the overall performance of the system and reduce the time it takes for programs to access data.
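As a minimal sketch of this idea, here is a cache lookup in Python, using a plain dict as the cache and a deliberately slow function standing in for the database (all names here are illustrative, not from any real library):

```python
import time

# Simulated slow data source (e.g., a database query); illustrative only.
def query_database(product_id):
    time.sleep(0.05)  # pretend this round trip takes 50 ms
    return {"id": product_id, "name": f"Product {product_id}"}

cache = {}  # in-process cache: product_id -> record

def get_product(product_id):
    if product_id in cache:              # cache hit: served from memory
        return cache[product_id]
    record = query_database(product_id)  # cache miss: go to the source
    cache[product_id] = record           # store for future requests
    return record

# The first call misses and pays the "database" latency;
# the second is served from memory.
first = get_product(42)
second = get_product(42)
assert first == second
```

Every subsequent request for product 42 skips the slow source entirely, which is the whole point of the technique.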
Advantages of Caching
- As mentioned earlier, caching improves query response time. Reading from memory is much faster than reading from disk, so users get their responses faster.
- Caching also relieves pressure on services like databases. Reducing the number of read requests from the web servers to the database means the database can support more web servers.
- Finally, by serving requests from fast cache memory, web servers can retrieve information more quickly and efficiently, allowing them to handle a greater volume of requests.
Take note that caching is a broad concept, as it can be applied at multiple levels within a computer system, such as the CPU, disk, and network levels. For instance, CPU caches consist of tiny memory units adjacent to the processing core, which accelerates CPU performance. However, when discussing web services and system design, we focus on a higher level. Typically, caching refers to utilizing memory to store information temporarily, thus preventing the need to query databases or access slower media and file systems.
Also note that we used a database as the data source behind the cache because it is the most common use case. In reality, the data source could be anything: files in a file system, object storage, or external API responses.
Here’s a more concrete example of caching: consider an e-commerce store with product recommendations, latest products, and the main product listings, among other things. Each of these sections calls some backend service, which in turn reads from database-like storage, so every page refresh hits multiple databases. All of those systems take load every time someone opens the page. If the underlying information hasn’t changed, there is no need to query the databases on every refresh. What we should really do is cache the results and only update the cache when the data changes.
Common caching technologies
Redis and Memcached are both popular in-memory data storage systems, often used for caching and improving the performance of web applications. Memcached is the OG of caching technologies; Facebook famously runs it at massive scale and described its setup in the 2013 paper "Scaling Memcache at Facebook". The other, more recent and popular technology is Redis.
Redis vs Memcached
The main difference is that Memcached stores opaque binary blobs, whereas Redis offers rich data structures. There are also some subtler differences:
Feature | Redis | Memcached |
---|---|---|
Data Structures | Strings, lists, sets, sorted sets, hashes, bitmaps | Key-value pairs (strings or binary data) |
Persistence | Optional (can save data to disk and recover after restart) | No (in-memory only) |
Atomic Operations | Yes (e.g., increments, list manipulations) | Limited (atomic increment/decrement and check-and-set) |
Pub/Sub | Yes (built-in publish/subscribe messaging system) | No |
High Availability | Yes (with Redis Sentinel and Redis Cluster) | No (third-party solutions available) |
Complexity | More features and data structures, more complex | Simple and straightforward |
Cache Eviction Policy | Configurable (e.g., LRU, LFU, volatile, or all keys) | Least Recently Used (LRU) |
Use Cases | Advanced data structures, real-time applications, complex caching etc | Simple caching, session storage |
Companies Using | Twitter, GitHub, Stack Overflow | Facebook, YouTube, Reddit |
Don’t worry if you haven’t learned about concepts like cache eviction policy and high availability. We’ll discuss these later in this article.
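As a small preview of cache eviction, here is a minimal LRU (least recently used) cache sketch built on Python's `collections.OrderedDict`; real systems like Redis implement this far more efficiently, so treat it as illustration only:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache sketch: evicts the least recently used key
    once capacity is exceeded (illustrative, not production code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)   # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now the most recently used
cache.put("c", 3)     # capacity exceeded: "b" is evicted
assert cache.get("b") is None
assert cache.get("a") == 1
```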
Why Caching can be complicated
> There are only two hard things in Computer Science: cache invalidation and naming things.
>
> — Phil Karlton
Caching by itself sounds like a very simple concept. However, setting it up correctly can be surprisingly tricky. As Phil Karlton pointed out, cache consistency is hard: when multiple users or systems access the same data, the cached copies must be kept consistent, so that everyone sees the same information even while updates are happening. Without cache consistency, users may see stale data, leading to errors or a poor user experience. The usual remedy is cache invalidation: when data is updated, the system marks the relevant cache entries as invalid or outdated. Let’s look at these ideas in detail.
Caching Policies
- Cache Writing Policies: These policies determine when the application should write data to the cache.
  - Write-Through: The cache and the database are updated at the same time. This ensures the cache and the database always have the same data, but slows down the write operation since it has to wait for both writes to succeed.
  - Write-Back/Write-Behind: Data is first written to the cache, and the write to the database is done later. The disadvantage is potential inconsistency between cache and database, since the database write happens asynchronously.
  - Write-Around: Data is written directly to permanent storage, bypassing the cache. This keeps the cache from being flooded with write I/O that will never be re-read, but has the disadvantage that a read request for recently written data will be a cache miss: the data must be read from slower back-end storage, with higher latency.
- Cache Reading Policies: These policies decide how to read data.
  - Cache-Aside/Lazy Loading: Data is loaded into the cache on demand, i.e., only when a cache miss occurs.
  - Read-Through: The cache takes responsibility for reading data from the database if it doesn't have the data.
- Cache Eviction Policies: These policies decide which item should be removed from the cache when the cache is full and a new item needs to be added. LRU and TTL fall into this category.
  - LRU: Discards the least recently used items first.
  - TTL: Data is kept or removed based on a time limit.
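The writing and reading policies above can be sketched in a few lines of Python, using plain dicts to stand in for the cache and the database (all names are illustrative):

```python
# Dicts standing in for the cache and the database in this sketch.
cache, database, dirty_keys = {}, {}, set()

def write_through(key, value):
    # Write-Through: update cache and database together.
    # Always consistent, but the write pays for both updates.
    cache[key] = value
    database[key] = value

def write_back(key, value):
    # Write-Back: update only the cache now; flush to the database later.
    cache[key] = value
    dirty_keys.add(key)

def flush():
    # Deferred database write for the write-back policy
    # (in a real system this runs asynchronously).
    for key in dirty_keys:
        database[key] = cache[key]
    dirty_keys.clear()

def read_cache_aside(key):
    # Cache-Aside / lazy loading: populate the cache only on a miss.
    if key not in cache:
        cache[key] = database[key]   # in reality, a database query
    return cache[key]

write_through("a", 1)
write_back("b", 2)
assert "b" not in database   # not yet flushed: cache and DB disagree
flush()
assert database["b"] == 2
```

The window between `write_back` and `flush` is exactly the cache/database inconsistency that the Write-Back description warns about.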
Data Loading Patterns
In practice, the two primary caching patterns that are most often used are:
- lazy loading, a.k.a. cache-aside caching: a reactive approach
- preloading, a.k.a. write-through caching: a proactive approach
Let's discuss each one in detail with examples in the following lessons.