All
Star

Caching

Why Caching?

Computer system latency numbers

To understand why weā€™d need caching in the first place, letā€™s review the latency numbers for common storage devices:

execute typical instruction1/1,000,000,000 sec = 1 nanosec
fetch from L1 cache memory0.5 nanosec
branch misprediction5 nanosec
fetch from L2 cache memory7 nanosec
Mutex lock/unlock25 nanosec
fetch from main memory100 nanosec
send 2K bytes over 1Gbps network20,000 nanosec
read 1MB sequentially from memory250,000 nanosec
fetch from new disk location (seek)8,000,000 nanosec
read 1MB sequentially from disk20,000,000 nanosec
send packet US to Europe and back150 milliseconds = 150,000,000 nanosec

(source: http://norvig.com/21-days.html#answers)

The gist of the story is

  • CPU cache < memory < SSD < disk
  • Reading from memory is > 50x faster than reading from disk which is where traditional databases store data

This really makes a good case for using memory as data store to speed up our applications.

Why Performance (Latency) Matters

Speed is a very important business for modern websites. If your website or app is slow, people will leave. Google's research has found that as page load time goes from 1 to 3 seconds, the probability of a bounce increases by 32%. As load time goes from 1 to 5 seconds, the bounce rate increases by 90%. This highlights the significance of having a fast-loading site to prevent users from leaving before interacting with the content. According to a study by Akamai, a 100-millisecond delay in website load time can hurt conversion rates by 7%. Additionally, a 2-second delay in load time can lead to an abandonment rate of up to 87%.

Reading from memory is 50x faster than reading from SSD. Reading from memory gives us sub-millisecond response time vs milliseconds or even seconds from data sources like databases. Hereā€™s the comparison table of disk vs memory

FeatureDisk StorageIn-Memory Storage
SpeedSlower. Disk I/O operations can take significantly more time.Faster. Accessing data in memory is orders of magnitude faster than disk.
CostLess expensive. Traditional hard drives and even SSDs are relatively cheap.More expensive. The cost per GB is much higher for RAM.
Data persistencePersistent. Data stored on a disk remains there even if the power is turned off.Non-persistent. Data in memory is typically lost when the power is turned off, unless the system is designed with persistence in mind (like some databases with snapshotting or journaling mechanisms).
CapacityHigh capacity. Disks can store terabytes of data.Lower capacity. The amount of data that can be stored is limited by the size of the RAM.
UsageBest for long-term storage and large databases where speed is less critical.Best for temporary storage, caching, and real-time computations where speed is critical.
Energy consumptionLower. Disks require less energy to maintain data.Higher. Maintaining data in memory requires continuous power.
Data volatilityNon-volatile. Data remains intact after a power cycle.Volatile. Data is lost upon power loss or reboot (unless it's made persistent through specific technologies).
Access methodBlock access.Random access.
ThroughputLower. Disk drives, especially mechanical hard drives, can handle fewer I/O operations per second compared to memory.Higher. Memory can handle a significantly higher number of I/O operations per second.
Data modelSuitable for structured data models due to slower access times, but can accommodate a variety.Suitable for both structured and unstructured data models due to fast access times. Great for data that requires frequent read/write operations.
Performance bottleneckDisk I/O operations can be a significant bottleneck, especially for workloads that require frequent read/write operations.Memory size can be a bottleneck. While read/write operations are fast, the amount of data that can be stored and accessed is limited by the size of available memory.

What is caching?

Caching is a technique used to improve query read performance of a system by storing frequently accessed data in a faster memory storage area, such as RAM. When a program requests data that is stored in the cache, it can be accessed much faster than if it had to be retrieved from slower storage, such as a hard drive. This can significantly improve the overall performance of the system and reduce the time it takes for programs to access data.

what-is-caching

Advantages of Caching

  • As mention earlier, caching improves query response time. Reading from memory is much faster than reading from disk so users will be able to get their response faster.
  • Additionally, caching relieves pressure from services like databases. Reducing the number of read requests from the web server to the database also relieves pressure from the database which means it will be able to support more web servers.
  • Finally, by utilizing faster cache memory, web servers can retrieve information more quickly and efficiently, allowing them to handle a greater volume of requests.

Take note that caching is a broad concept, as it can be applied at multiple levels within a computer system, such as the CPU, disk, and network levels. For instance, CPU caches consist of tiny memory units adjacent to the processing core, which accelerates CPU performance. However, when discussing web services and system design, we focus on a higher level. Typically, caching refers to utilizing memory to store information temporarily, thus preventing the need to query databases or access slower media and file systems.

Also note that we used database as data source behind the cache as it is the most common use case. In realty, the data source could be anything. For example, data stored as files in the file system, object storage, or external API responses.

Hereā€™s a more concrete example of caching: consider an e-commerce store like this, you have product recommendations, latest products, main product listings among other things. Each of the section calls some service on the backend which access some sort of database like storage. So each page refresh requires accessing multiple databases. All the systems are given pressure each time someone access the webpage. If the information hasnā€™t changed, we shouldnā€™t really need to go to the databases each refresh. What we should really do is to cache the results and only update the cache when it changes.

Common caching technologies

Redis and Memcached are both popular in-memory data storage systems, often used for caching and improving the performance of web applications. Memached is the OG in caching technology. Facebook started using Memcached back in 2010s. The other more recent and popular technology is Redis.

Redis vs Memcached

The main difference is Memcached deals with binary blobs whereas redis has rich data structures. There are some more subtle differences:

FeatureRedisMemcached
Data StructuresStrings, lists, sets, sorted sets, hashes, bitmapsKey-value pairs (strings or binary data)
PersistenceOptional (can save data to disk and recover after restart)No (in-memory only)
Atomic OperationsYes (e.g., increments, list manipulations)No
Pub/SubYes (built-in publish/subscribe messaging system)No
High AvailabilityYes (with Redis Sentinel and Redis Cluster)No (third-party solutions available)
ComplexityMore features and data structures, more complexSimple and straightforward
Cache Eviction PolicyConfigurable (e.g., LRU, LFU, volatile, or all keys)Least Recently Used (LRU)
Use CasesAdvanced data structures, real-time applications, complex caching etcSimple caching, session storage
Companies UsingTwitter, GitHub, Stack OverflowFacebook, YouTube, Reddit

Donā€™t worry if you havenā€™t learned about concepts like cache eviction policy and high availability. Weā€™ll discuss these later in this article.

Why Caching can be complicated

There are only two hard things in Computer Science: cache invalidation and naming things.

- Phil Karlton

Caching by itself sounds like a very simple concept. However, correctly setting up caching can be surprisingly tricky. As Phil Karlton pointed out, cache consistency is tricky. When there are multiple users or systems accessing the same data, it's essential to keep the cached data consistent. This means making sure everyone sees the same information, even when updates are happening. Without cache consistency, users might see outdated data, leading to errors or poor user experiences. When data is updated, the system marks the relevant cache entries as invalid or outdated.

Letā€™s look at these in details.

Caching Policies

  1. Cache Writing Policies: These policies determine when the application should write data to the cache.
    • Write-Through: The cache and the database are updated at the same time. This will ensure the cache and the database have the same data all the time but will slow down the write operation since it has to wait for both writes to succeed.
    • Write-Back/Write-Behind: Data is first written to cache and the write to the database is done later. The disadvantage is the potential inconsistency between cache and database since writing to database happens asynchronously.
    • Write-Around: Data is written directly to permanent storage, bypassing the cache. This can reduce the cache being flooded with write I/O that will not subsequently be re-read, but has the disadvantage that a read request for recently written data will create a ā€œcache missā€ and must be read from slower back-end storage and experience higher latency.
  2. Cache Reading Policies: These policies decide how to read data.
    • Cache-Aside/Lazy Loading: Data is loaded into the cache on demand, i.e., only when a cache miss occurs.
    • Read-Through: The cache takes responsibility for reading data from the database if it doesn't have the data.
  3. Cache Eviction Policies: These policies decide which item should be removed from the cache when the cache is full and a new item needs to be added. LRU and TTL fall into this category.
    • LRU: Discards the least recently used items first.
    • TTL: Data is kept or removed based on a time limit.

Data Loading Patterns

In practice, the two primary caching patterns that are most often used are:

  • lazy loading, aka Cache-aside Caching, a reactive approach
  • preloading, aka Write-through Caching, a proactive approach

Let's discuss each one in detail with examples in the following lessons.

ā†
ā†‘TA šŸ‘Øā€šŸ«