How does CodePen use Redis? Critical and simple use cases, backups, and config tweaks

Written by Christophe Limpalair on 11/13/2015

Redis is a widely popular data store used for a variety of reasons. In this article, we're going to see how CodePen uses Redis in their stack to accomplish anything from small to critical tasks.

Please keep in mind these stats are from June 2015, so certain facts might have changed since then.

How is CodePen running Redis?

Before we dive in, let's take a look at how they've integrated Redis.

If you read my previous post on CodePen's security and technical challenges, you already know that they are running on AWS.

If you're not familiar with AWS, they have a number of different options when it comes to instances. These instances have a variety of different CPU, memory, storage, and networking capacity combinations.

Of course, the instances also have different price points. All of this gives more flexibility when it comes to the price:performance ratio.

CodePen uses 2 m3.large instances to host their Redis primary-replica servers. At the time of writing, m3.large has:

  • 2 vCPU
  • 7.5 GiB Memory
  • 32 GB SSD Storage

This setup handles 1,000 requests per second and has processed 11 billion calls in the last 150 days (stats from late June 2015).

If you're familiar with AWS, then you might also be wondering why they don't use ElastiCache instead of running Redis themselves on m3.large instances.

At the time of the interview with Tim Sabat (their server guy), ElastiCache didn't allow more than one database per instance, and CodePen uses 4 different databases: one for volatile cache, one for permanent cache (for counts), one for queuing, and one more. Covering all of them with ElastiCache would have meant paying for several instances, which would be considerably more expensive.

Why use Redis primary-replica, and how do you set it up?

The process is simple. You take a replica (a "slave" in the Redis terminology of the time) and point it to the primary (the "master").
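Here's a minimal sketch of that step in Python. It assumes `replica` is a redis-py style client (for example `redis.Redis(host=...)`) connected to the server that should become the replica. The function name and host are made up for illustration, and this is not necessarily how CodePen does it; the same thing can be done in the Redis config file.

```python
def attach_replica(replica, primary_host, primary_port=6379):
    """Tell the server behind `replica` to start copying from the primary.

    `replica` is assumed to be a redis-py style client. SLAVEOF was the
    command name in 2015; newer Redis versions also accept REPLICAOF.
    """
    replica.slaveof(primary_host, primary_port)
    # INFO replication reports the role the server now believes it has
    return replica.info("replication")["role"]


# Hypothetical usage:
#   import redis
#   attach_replica(redis.Redis(host="replica.internal"), "primary.internal")
```

Keep in mind that a command issued at runtime like this is not written to the config file, so a restarted replica needs the same setting in its configuration.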

While some applications use this setup to farm out reads to the replica and only keep the primary as write-only, that's not what CodePen does. Instead, they use this setup primarily for disaster recovery.

Simple use cases

So what do they really use Redis for? Let's talk about real examples.

Every time someone saves a Pen, they need to update a snapshot image of the final result. Instead of updating it on every single save, they queue it up and process the queue in 3-minute intervals. Even if you save 20 times during that 3-minute period, only the final result gets a snapshot.

How? By using Sets. Members are added to a Set with the SADD command. A Set doesn't allow duplicate members, so if a member is already in the Set, adding it again is ignored. This is a practical way to avoid duplicating screenshots.
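To make that concrete, here's a minimal sketch in Python. It assumes `client` is a redis-py style client (for example `redis.Redis()`). The key name, the function names, and the `render` callback are hypothetical, not CodePen's actual code.

```python
SNAPSHOT_QUEUE = "pens:needs_snapshot"  # hypothetical key name


def queue_snapshot(client, pen_id):
    """Mark a Pen as needing a fresh snapshot.

    SADD returns the number of members actually added, so saving the same
    Pen 20 times still leaves a single entry in the set.
    """
    return client.sadd(SNAPSHOT_QUEUE, pen_id)


def drain_snapshots(client, render):
    """Run from a periodic job: pop each queued Pen once and render it."""
    processed = []
    while True:
        pen_id = client.spop(SNAPSHOT_QUEUE)  # None once the set is empty
        if pen_id is None:
            break
        render(pen_id)  # `render` is a caller-supplied function
        processed.append(pen_id)
    return processed
```

Calling `queue_snapshot` on every save and `drain_snapshots` every 3 minutes gives at most one snapshot per Pen per interval.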

They also use it for scheduling jobs that run in the background. This includes cron jobs, and jobs like queuing Pens that appear to be spam.

Another use case is for popularity votes. This is very easy with Sorted Sets.
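The interview didn't go into how the votes are stored, so treat this as one possible sketch rather than CodePen's implementation. It assumes `client` is a redis-py style client, and the key and function names are invented for the example.

```python
POPULAR_KEY = "pens:popularity"  # hypothetical key name


def vote(client, pen_id, amount=1):
    """Add a vote to a Pen; ZINCRBY creates the member if it is missing."""
    # Keyword arguments avoid the positional-order change between
    # redis-py 2.x and 3.x.
    return client.zincrby(name=POPULAR_KEY, amount=amount, value=pen_id)


def top_pens(client, count=10):
    """Return the highest-scored Pens as (member, score) pairs, best first."""
    return client.zrevrange(POPULAR_KEY, 0, count - 1, withscores=True)
```

Redis keeps the Sorted Set ordered by score as votes come in, so reading the top of the list doesn't require sorting anything in the application.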

Critical use cases

CodePen had an issue with Solr scaling and running out of memory. Solr is used to fulfill user searches. While it's not great to take search down, it's much better than taking down the entire application.

Since Solr was tightly coupled to the application, if a query ran slow, so did the application. If Solr went down, so would CodePen. That's obviously not what they want, and this is where Redis came to the rescue.

Before sending a request to Solr, the app checks Solr's status, which it tracks in Redis. If Solr is down, the app redirects to a "temporarily unavailable" page and doesn't check again for 30 seconds.

How do you do that? Redis lets you expire keys. So if Solr is down, set a key and expire it in 30 seconds. While the key exists, don't check for Solr's status. When the key doesn't exist, check it.
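Here's a minimal sketch of that logic in Python, assuming `client` is a redis-py style client. The key name, the function name, and the `solr_is_up` health-check callback are hypothetical.

```python
SOLR_DOWN_KEY = "solr:down"   # hypothetical key name
RETRY_AFTER_SECONDS = 30      # the 30-second window described above


def search_allowed(client, solr_is_up):
    """Return True if the app should send the search to Solr."""
    if client.exists(SOLR_DOWN_KEY):
        # The flag is still alive: skip the health check entirely.
        return False
    if solr_is_up():  # caller-supplied health check
        return True
    # Mark Solr as down. Redis deletes the key by itself after 30 seconds,
    # which is what triggers the next health check.
    client.set(SOLR_DOWN_KEY, "1", ex=RETRY_AFTER_SECONDS)
    return False
```

When `search_allowed` returns False, the app can show the "temporarily unavailable" page. Nothing has to clear the flag by hand, because the expiration does it.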

They've talked about using Redis as an LRU cache (least recently used) to avoid having to set manual expiration times. This setting makes Redis evict keys that have been least recently used when it needs more space. There are also a few other eviction options that might fit your needs better.

If you're curious as to how they check for Solr's status, they use a Resque call around the HTTP call.

For more general Redis use case examples, see this article.

What Redis configurations have they tweaked?

On the primary, they don't write any backups to disk. This is because keys change so often that background saves would run almost constantly.

Instead, the backups come from the replica. The replica runs on an EBS-optimized instance with a high-IOPS disk, so it can write the RDB files quickly.
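As a rough sketch of that split, the `save` setting can be changed at runtime. This assumes `primary` and `replica` are redis-py style clients. The function names and the example schedule are illustrative, not CodePen's actual values.

```python
def disable_snapshots(client):
    """Turn off RDB background saves (intended for the primary)."""
    # An empty "save" value removes every snapshot rule.
    client.config_set("save", "")


def enable_snapshots(client, schedule="900 1 300 10"):
    """Turn on RDB background saves (intended for the replica).

    The schedule is a list of "<seconds> <changes>" pairs. The example
    default means: save after 900 s if at least 1 key changed, or after
    300 s if at least 10 keys changed.
    """
    client.config_set("save", schedule)


# Hypothetical usage:
#   disable_snapshots(primary)
#   enable_snapshots(replica)
```

Settings changed this way last until the server restarts, so the same values should also go in each server's config file.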

How do they handle Redis backups?

Redis saves RDB files to disk. These files can be pushed to S3 for permanent storage.

There are a few ways to do this, but here's one:

  1. Have a cron job that runs a shell script
  2. Shell script compresses RDB file and uploads to S3
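The steps above describe a shell script. To keep the examples in this post in one language, here's the same idea sketched in Python. The function names and file naming scheme are assumptions, and the upload step is left as a caller-supplied function so the sketch doesn't depend on any particular S3 library.

```python
import gzip
import shutil
from datetime import datetime, timezone
from pathlib import Path


def compress_rdb(rdb_path, out_dir):
    """Gzip the RDB file into out_dir under a timestamped name."""
    rdb_path = Path(rdb_path)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(out_dir) / f"{rdb_path.stem}-{stamp}.rdb.gz"
    with open(rdb_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so large dumps are fine
    return target


def backup(rdb_path, out_dir, upload):
    """Compress the dump, then hand the archive to `upload`.

    `upload` is a caller-supplied function that receives the archive path,
    for example a small wrapper around an S3 client.
    """
    archive = compress_rdb(rdb_path, out_dir)
    upload(archive)
    return archive
```

A cron entry would then run a small script that calls `backup` with the path to the replica's RDB file.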

Actions to take

Even though use cases are often very specific to an application, seeing how Redis is used at other companies can help you improve your apps and wrap your head around how best to use it.

But, let's try to take it a step further. How can you take action from information in this post? Here are a few ways:

1) Think of ways that Redis could be used to soften the load on your disk-based database.

Since Redis keeps the entire data set in memory, manipulating data in Redis is much faster than pulling it from disk. However, even with backups, you might still lose whatever was written after the last save. While having super fast data manipulations is nice, be aware of the risk as you make your list.

2) Go over configurations to understand how certain things work.

Understanding backups, for example, is pretty important. Do you know how often your data set gets written to disk? Does it even get backed up? If you can't answer this, it may be a good idea to check it out.

Another important setting to understand is maxmemory. On 64-bit systems, the default is to set no memory limit at all, so Redis will keep taking memory for as long as the operating system gives it some. Refer to the previous link to see what to do about that.

3) Take a look at evictions

If Redis runs out of memory, a number of things can happen. This all depends on how you configure settings, which I just talked about in point #2.

Setting an eviction policy not only tells Redis what to do in that situation, it can also clear out unused keys without you having to expire them manually. If you have data that doesn't need to be saved permanently, it's a good idea to set expiration times to free up RAM. Alternatively, you could set a memory limit and an eviction policy, and let Redis decide which keys to drop.
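Here's a small sketch of what that could look like, assuming `client` is a redis-py style client. The function name and the 100mb limit are placeholders. `allkeys-lru` is one of the policies Redis ships with.

```python
def enable_lru_cache(client, max_memory="100mb", policy="allkeys-lru"):
    """Cap memory and pick an eviction policy on a running server."""
    # Without a maxmemory limit, no eviction policy ever kicks in.
    client.config_set("maxmemory", max_memory)
    # allkeys-lru evicts the least recently used keys, with or without a TTL.
    client.config_set("maxmemory-policy", policy)
    # Read the settings back so the caller can log or verify them.
    return client.config_get("maxmemory*")
```

As with any runtime config change, put the same values in the Redis config file if you want them to survive a restart.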

There's an entire page dedicated to using Redis as an LRU cache. Even if you don't plan on using it now, it's nice to know that the option exists.

How do you use Redis? Do you have any similar use cases?