4 Powerful Redis Data Types

Written by Christophe Limpalair on 09/03/2015

This post is extracted from one of my episodes in the Redis and Laravel series.

One of the best ways I learn new things is by seeing implementations alongside explanations. That is exactly what this post provides.

Coming to something like Redis from a SQL background can be a very confusing experience. I know because I've been there. That is precisely why I've been creating and promoting a lot of Redis content recently. Redis makes certain tasks much easier and quicker to accomplish. But what are those tasks, and how do you get started? Let me show you.

Lists
Lists in Redis are implemented as linked lists, which means they keep strings in insertion order. If you have worked with linked lists or double-ended queues in other languages (Python's collections.deque, for example), this should look quite familiar to you.

Redis List illustration example

With commands like LPUSH and RPUSH, we can add items to the head or tail of a list. Want to insert an item somewhere else in the list? No problem, just use LINSERT with BEFORE or AFTER and the existing value you want to insert next to, and Redis will take it from there.

Here are some examples:
LPUSH list "last"
LPUSH list "first"
LRANGE list 0 -1
// returns:
1) "first"
2) "last"


LPOP list
// returns "first"
LRANGE list 0 -1
// returns:
1) "last"


The good news is that adding strings to the head or tail of our list takes constant time, O(1), so it doesn't matter if you have 10 million elements. The bad news is that finding or manipulating items in the middle of the list is slower: Redis has to walk the list to reach them, which takes O(N) time.

This is what makes Lists useful for queues. We rarely (if ever) need to access anything other than the head or tail. Have a new job coming in? LPUSH it to the list that stores your jobs and have a separate worker process handle them. The worker "polls" the list for jobs (taking them from the tail, for example with RPOP) and processes them as they come in.
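
To make the queue idea concrete, here is a minimal sketch in plain Python. It does not talk to a Redis server: the `lpush` and `rpop` helpers are hypothetical stand-ins that only imitate what the LPUSH and RPOP commands do, using a deque in place of the Redis List.

```python
from collections import deque

# Plain-Python stand-in for a Redis List used as a job queue (not a Redis client).
jobs = deque()

def lpush(queue, value):
    """Model of LPUSH: add to the head and return the new length."""
    queue.appendleft(value)
    return len(queue)

def rpop(queue):
    """Model of RPOP: remove and return the tail, or None if the list is empty."""
    return queue.pop() if queue else None

# Producer side: new jobs go on the head.
lpush(jobs, "job:1")
lpush(jobs, "job:2")
lpush(jobs, "job:3")

# Worker side: take from the tail, so the oldest job is handled first.
processed = []
while (job := rpop(jobs)) is not None:
    processed.append(job)

print(processed)  # ['job:1', 'job:2', 'job:3']
```

Pushing on one end and popping from the other gives first-in, first-out ordering. Against a real server, a worker would likely prefer the blocking BRPOP command over polling in a loop.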



Lists aren't limited to queues, though. We could use them for social networking sites like Facebook and Twitter where we want to list the latest posts by users (see this video for more on that). We can use a command like LTRIM to limit how many elements are kept in our list. This ensures we only keep the newest posts from our users and don't end up with massive lists.
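
As a rough illustration of the "latest posts" pattern, the sketch below models an LPUSH followed by an LTRIM in plain Python. The `push_and_trim` helper and the `MAX_POSTS` cap are names I made up for this example, and the deque only stands in for a Redis List.

```python
from collections import deque

MAX_POSTS = 3  # hypothetical cap, chosen only for this example

def push_and_trim(timeline, post_id, limit=MAX_POSTS):
    """Model of LPUSH followed by LTRIM key 0 limit-1."""
    timeline.appendleft(post_id)
    while len(timeline) > limit:
        timeline.pop()  # drop the oldest entry from the tail
    return list(timeline)

timeline = deque()
for post_id in ["post:1", "post:2", "post:3", "post:4", "post:5"]:
    latest = push_and_trim(timeline, post_id)

print(latest)  # ['post:5', 'post:4', 'post:3']
```

Because the newest post always enters at the head, trimming to the first few indexes is what throws away the oldest entries.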



Do you use lists for any other use cases? I'd love to hear about them.



Sets
Unlike Lists, Sets store information in an unordered fashion. Their strength comes from the powerful operations we can run on them, such as intersections, unions, and differences. For ScaleYourCode.com, Sets could be useful to answer questions like "What are all of our article tags?", "What are all of our interview tags?" and "What article and interview tags intersect?".

Another beautiful aspect of Sets is their speed. You can add, remove, and check for members in constant time, O(1). This means that no matter how many elements you have inside of a Set, it will take the same amount of time to perform one of those actions.

Keep in mind that Sets don't allow duplicate members. If you try to add the same member multiple times, the extra additions are simply ignored. This saves us from having to write if/else conditions.
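
Here is a small sketch of that duplicate handling, using a plain Python set as a stand-in. The `sadd` helper is hypothetical; it imitates how the SADD command reports how many members were actually new.

```python
def sadd(target, *members):
    """Model of SADD: add members, return how many were actually new."""
    before = len(target)
    target.update(members)
    return len(target) - before

tags = set()
first = sadd(tags, "redis", "laravel")
second = sadd(tags, "redis")  # already present, so nothing changes

print(first, second, sorted(tags))  # 2 0 ['laravel', 'redis']
```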

Intersections


As I mentioned a few paragraphs up, Sets can really help with tags (see this episode for more on tagging). Using an intersection command like SINTER, we can implement a powerful tagging system. Say, for example, that you run an ecommerce store that sells laptops. You have high-end laptops as well as low-end laptops. If a user clicks on a cheap low-end laptop that's also marketed for students and web surfing, you can run an intersection on those tag sets to pull only other cheap laptops that have a student discount and are optimized for web surfing. You can then show the result in a "Recommended Laptops" section on the product page the user lands on.

Here's an example:
affordable = { laptop:1, laptop:23, laptop:77, laptop:78 }
web_surfing = { laptop:1, laptop:23, laptop:77 }
students_discount = { laptop:23, laptop:77 }

SINTERSTORE recommend affordable web_surfing students_discount
// returns (integer) 2 and stores recommend = { laptop:23, laptop:77 }


By the way, intersections are O(N*M) in the worst case, where N is the number of members in the smallest set and M is the number of sets. More on SINTERSTORE.
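
To show where that N*M comes from, here is a sketch of the intersection in plain Python. It is only a model: the `sinterstore` helper and the `db` dictionary are assumptions standing in for the command and the keyspace, and this is not how Redis is actually implemented internally.

```python
def sinterstore(db, dest, *keys):
    """Model of SINTERSTORE: store the intersection under dest, return its size."""
    sets = sorted((db.get(k, set()) for k in keys), key=len)
    smallest, others = sets[0], sets[1:]
    # Check each member of the smallest set against every other set:
    # roughly N members times M sets worth of lookups.
    db[dest] = {m for m in smallest if all(m in s for s in others)}
    return len(db[dest])

db = {
    "affordable": {"laptop:1", "laptop:23", "laptop:77", "laptop:78"},
    "web_surfing": {"laptop:1", "laptop:23", "laptop:77"},
    "students_discount": {"laptop:23", "laptop:77"},
}
count = sinterstore(db, "recommend", "affordable", "web_surfing", "students_discount")

print(count, sorted(db["recommend"]))  # 2 ['laptop:23', 'laptop:77']
```

Starting from the smallest set keeps the work proportional to the fewest possible candidates, which is why the size of the smallest set is the N in the complexity.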

Unions


Unions take all the specified sets and combine them into one result. I'm going to take our previous example with Intersections but this time get a different result that could also benefit us in our ecommerce website. Even though our user selected a laptop that is affordable, has a student discount, and is optimized for web surfing, why limit our recommendations to just those? Maybe that user would also be interested in a laptop that can play video games or is optimized for video editing.

Let me show you what I mean:
affordable = { laptop:1, laptop:23, laptop:77, laptop:78 }
web_surfing = { laptop:1, laptop:23, laptop:77 }
students_discount = { laptop:23, laptop:77, laptop:88 }

SUNIONSTORE recommend affordable web_surfing students_discount
// returns (integer) 5 and stores recommend = { laptop:1, laptop:23, laptop:77, laptop:78, laptop:88 }


With this list of laptop IDs, we know for a fact that they contain at least one of three tags: affordable, web_surfing, or students_discount. From here, we can display a laptop that may be "affordable" and "optimized for gaming" at the same time. Even if it doesn't have a student discount, your user may not mind spending a little bit more for the ability to play games. You can upsell that way while giving your user exactly what they want.
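
Here is one way that upsell could be sketched, again with plain Python sets standing in for Redis Sets. The `gaming` tag set and its members are hypothetical; they are not part of the example above.

```python
affordable = {"laptop:1", "laptop:23", "laptop:77", "laptop:78"}
web_surfing = {"laptop:1", "laptop:23", "laptop:77"}
students_discount = {"laptop:23", "laptop:77", "laptop:88"}
gaming = {"laptop:78", "laptop:88", "laptop:90"}  # hypothetical tag set

# Model of SUNION: everything carrying at least one of the three tags.
recommend = affordable | web_surfing | students_discount

# Model of SINTER: narrow the broad pool down to laptops that can also game.
gaming_picks = recommend & gaming

print(sorted(recommend))     # ['laptop:1', 'laptop:23', 'laptop:77', 'laptop:78', 'laptop:88']
print(sorted(gaming_picks))  # ['laptop:78', 'laptop:88']
```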

Differences


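Differences do exactly what their name implies: they take several sets and return the members of the first set that aren't in any of the others (the SDIFF command). Order matters here, because only members of the first set can appear in the result.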
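
A quick sketch of a difference, reusing the laptop tags from the Intersections example. Python's set subtraction stands in for the SDIFF command here, so treat it as a model of the behavior rather than actual Redis usage.

```python
affordable = {"laptop:1", "laptop:23", "laptop:77", "laptop:78"}
web_surfing = {"laptop:1", "laptop:23", "laptop:77"}
students_discount = {"laptop:23", "laptop:77"}

# Model of SDIFF affordable web_surfing students_discount:
# affordable laptops that carry neither of the other two tags.
only_affordable = affordable - web_surfing - students_discount

# Swapping the order changes the answer, because the first set is the starting point.
only_web_surfing = web_surfing - affordable - students_discount

print(sorted(only_affordable))   # ['laptop:78']
print(sorted(only_web_surfing))  # []
```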

Random


One more command I'd like to mention is SRANDMEMBER. Pass in just a key and it returns one random member from the set. If you add a positive count after the key, it returns that many distinct random members (or the whole set, if the count is larger than the set). Pass in a negative count, however, and you get exactly that many members back, possibly with repeats.

For example:
test = { 1, 5, 0, 63 }

SRANDMEMBER test
// returns one random member, e.g. "5"

SRANDMEMBER test 2
// returns two distinct random members, e.g. "1", "63"
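
The three count behaviors can be sketched in plain Python like this. The `srandmember` helper is a hypothetical model built on the standard random module, not a Redis client call, and the results are random, so no fixed output is shown.

```python
import random

def srandmember(members, count=None, rng=random):
    """Model of SRANDMEMBER's three modes: no count, positive count, negative count."""
    pool = list(members)
    if count is None:
        # No count: a single random member, or None for an empty set.
        return rng.choice(pool) if pool else None
    if count >= 0:
        # Positive count: distinct members, capped at the size of the set.
        return rng.sample(pool, min(count, len(pool)))
    # Negative count: exactly abs(count) picks, repeats allowed.
    return [rng.choice(pool) for _ in range(-count)] if pool else []

numbers = {1, 5, 0, 63}
print(srandmember(numbers))      # one member
print(srandmember(numbers, 2))   # two distinct members
print(srandmember(numbers, -6))  # six picks, repeats guaranteed with only four members
```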




Sorted Sets
Sets are great, but what if we want our data to be in a specific order? Sorted Sets are what you are looking for. Of the types covered in this post, they are the only one besides Lists that maintains order. This order is determined by scores. For example, post:1 can have a score of 1, post:2 a score of 2, and so on. As you can imagine, this is very useful if we want to display the latest posts or the posts with the most views (refer to episode 3 to see how this is implemented).

Members in a Sorted Set are unique, but scores can be repeated. This is very useful when we want to keep score in a game or create a voting system like the one at Stack Overflow. It very well could be that two members have the same score at the same time.

// user writes an answer to question. Initialize with 0 votes
ZADD post:1:votes 0 "5466"


In this example, post:1:votes is the key, 0 is the score, and 5466 is the answer's ID. This ID could then be used as a Hash key, and we could load the answer's information from there (such as the author's user ID, date posted, content, and so on). Hashes are explained in the last section.

When another user upvotes this answer:

ZINCRBY post:1:votes 1 "5466"
ZSCORE post:1:votes "5466"
// returns "1"


What if we have multiple answers and we want to display the top 3?

ZADD post:443:votes 4 "32" 11 "5587" 51 "1126" 1 "673"
ZREVRANGE post:443:votes 0 2
// returns:
1) "1126"
2) "5587"
3) "32"


The number 2 in ZREVRANGE post:443:votes 0 2 is not a typo. Both indexes are zero-based and inclusive, so 0 through 2 covers three elements, which is why we get 3 results returned to us.
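
If the inclusive indexes feel odd, this plain-Python sketch may help. The `zrevrange` helper is a hypothetical model that keeps members and scores in a dict and only handles non-negative indexes; it mimics the command's ordering rather than calling Redis.

```python
def zrevrange(scores, start, stop):
    """Model of ZREVRANGE for non-negative indexes: highest score first, stop inclusive."""
    # Sorting on (score, member) and reversing also puts tied scores
    # in reverse lexicographical order, as Redis does.
    ranked = sorted(scores, key=lambda member: (scores[member], member), reverse=True)
    return ranked[start:stop + 1]  # +1 because Python slices exclude the stop index

votes = {"32": 4, "5587": 11, "1126": 51, "673": 1}
votes["673"] += 1  # model of ZINCRBY post:443:votes 1 "673"

print(zrevrange(votes, 0, 2))  # ['1126', '5587', '32']
```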


Another benefit over Lists is that we can check for members very quickly even if they are in the middle of our Sorted Set.

Do keep in mind that Sorted Sets use more memory than Sets, and most operations cost more in time complexity: adding a member with ZADD is O(log(N)), for example, compared with O(1) for SADD. This is not a big deal unless you have a massive amount of things to store. If you do need more efficient storage, check out Hashes.



Hashes
Hashes are great for representing objects such as blog posts or users. That is not where their usefulness ends, however. Redis stores small Hashes in a compact encoding that makes them very memory efficient. Instagram has a post about this and how they used Hashes to save on memory. Read more about it here.

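Let me give you an example of how Hashes work :) (On Redis 4.0 and later, HSET accepts multiple field/value pairs and HMSET is considered deprecated, although it still works.)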
HMSET user:1 username christophe year 2015 password test
HGET user:1 username
// returns christophe

HGETALL user:1
// returns:
1) username
2) christophe
3) year
4) 2015
5) password
6) test


(In the HGETALL reply, the odd-numbered entries are the fields and the even-numbered entries are their values.)

You can also retrieve multiple values using HMGET:


HMGET user:1 username password
// returns:
1) christophe
2) test
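
Since a Hash is essentially a small field/value map, a Python dict makes a reasonable stand-in. The sketch below pairs up a flat HGETALL-style reply and models HMGET. The `hmget` helper and the `email` field are hypothetical and only there for illustration.

```python
# Flat reply in the same field, value, field, value order that HGETALL uses.
reply = ["username", "christophe", "year", "2015", "password", "test"]

# Pair every field (even index) with the value that follows it (odd index).
user = dict(zip(reply[0::2], reply[1::2]))

def hmget(hash_map, *fields):
    """Model of HMGET: one value per requested field, None where the field is missing."""
    return [hash_map.get(field) for field in fields]

print(user["username"])                             # christophe
print(hmget(user, "username", "password", "email")) # ['christophe', 'test', None]
```

Fields that don't exist come back as empty slots instead of raising an error, which mirrors the nil entries HMGET returns.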


Conclusion
Even with use cases and a proper understanding of what differentiates the various data types in Redis, implementing your own logic can be a bit tricky, especially coming from a relational database background.

I hope you can keep referring back to this post and compare what you are doing with what I've laid out. I've used this list whenever I hit a brick wall in my logic, and it has been a lot of help. Please let me know if you come up with more use cases. I would love to include them.

Go out there and build something great!