Redis as a Data Store

As useful and ubiquitous as relational database management systems like PostgreSQL are, they have had to make trade-offs in order to solve particular sets of problems well. In today's demanding applications, this means they can become performance hogs and scalability roadblocks.

Redis is a data storage server that has made somewhat different trade-offs in order to free itself from the traditional limitations of RDBMSes. For example, it stores data in much faster but volatile memory instead of on disk, it offers looser data persistence guarantees, and it uses a much simpler data model than the relational model. On top of it all, it is simple to operate and very easy to get started with.

This has resulted in Redis being much faster and very easily scalable. Developers who decide to accept these trade-offs to get the gains first need to understand how to port an existing data-centric application onto this tool. Although Redis is generally categorized as an in-memory key-value store, it has gone out of its way to prove itself as more than that. It offers two different methods of persistence, which form the foundation for using it as a permanent data store. It also offers many more data types than a simple key-value mapping, including efficient implementations of lists, sets, ordered sets, hashes, bitmaps, streams, and geospatial data types. On top of these data types, Redis implements many operations that are readily available to the programmer.

Mapping The Data Model

For storing records, I choose a key that is the combination of the most common WHERE-clause criteria used to access the record in an SQL SELECT query. This means using the record's primary key as the Redis key and also keeping pointers to this key from all other candidate keys that might be used to get to the record. Having said that, try to keep the key fairly short; for example, if it is substantially bigger than the output of a well-known hash function, consider using the hash of the key instead. If the record needs to be ephemeral, I use the SET command, because only top-level keys can have expiry times; in this case, the value should be the serialized version of the object being stored. Otherwise, use HSET with fields and values mirroring the object's properties. The key can start with "entity_name:", which helps distinguish different entities in a namespace and avoid conflicts. If this scheme is followed, the KEYS command, or better yet SCAN, can be used to easily inspect all keys and data types related to an entity, and RENAME can be used to change a key if needed. Depending on the nature of the data, a set or list might be what we actually need, and these are directly supported by Redis.
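As an illustration, here is a minimal sketch of this key scheme using the redis-py client; the "user" entity, field names, and key prefixes are hypothetical.

import redis

r = redis.Redis(decode_responses=True)

# Primary record: a hash keyed by "user:<primary key>".
r.hset("user:42", mapping={"name": "Ada", "email": "ada@example.com"})

# Pointer from a candidate key (the e-mail address) back to the primary key.
r.set("user:email:ada@example.com", "user:42")

# Ephemeral record: serialize the object and SET it with an expiry,
# since only top-level keys can expire.
r.set("session:42", '{"user": 42, "ip": "10.0.0.1"}', ex=3600)

# Inspect everything belonging to the entity without blocking the server.
for key in r.scan_iter(match="user:*"):
    print(key, r.type(key))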

For sorting, unique constraints, and min/max values, I use sorted sets, which, given a sort criterion, point to an existing key in the keyspace related to the record. Redis can sort members based on a numerical score or lexicographically based on the member itself. Although Redis has the SORT command for one-off sorting of a list or set, its time complexity can justify keeping a separate sorted set instead.
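A small sketch of such an index, again with redis-py and made-up keys: the score carries the sort criterion and the members point at existing record keys.

import redis

r = redis.Redis(decode_responses=True)

# Numeric index: the score is a signup timestamp, the member is the record key.
r.zadd("user:by_signup", {"user:42": 1700000000, "user:43": 1700003600})

# Range query on the score, analogous to WHERE ... ORDER BY signup_ts.
recent = r.zrangebyscore("user:by_signup", 1700000000, "+inf")

# Lexicographic index: give every member the same score and use ZRANGEBYLEX.
r.zadd("user:by_name", {"ada": 0, "grace": 0, "linus": 0})
print(r.zrangebylex("user:by_name", "[a", "[h"))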

Check the Redis docs page on secondary indexes for a more complete explanation of indexing methods.

Transactions and ACID

Redis offers the MULTI/EXEC/DISCARD combo of commands that buffer a set of commands to be executed atomically. This feature, exposed by most client libraries as a "pipeline", makes a huge difference in execution speed because the network round-trip latency is amortized over the number of commands in the pipeline. One caveat is that Redis does not have the notion of a "rollback": although pipelines execute atomically, they cannot be rolled back if one command in the sequence fails. The remaining commands keep executing and the program is only informed via the returned list of results. In practice this might not be an issue, because Redis commands usually cannot fail, with the notable exception of running a command on a key of an incompatible type, which is a bug in the program that needs fixing anyway.
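As a sketch of how this looks from a client, here is redis-py's pipeline wrapper; the keys are made up.

import redis

r = redis.Redis(decode_responses=True)

# Commands are buffered client-side, sent in one round-trip, and executed
# atomically on the server between MULTI and EXEC.
pipe = r.pipeline(transaction=True)
pipe.hset("user:42", "visits", 0)
pipe.incr("stats:total_users")
pipe.expire("session:42", 3600)
results = pipe.execute()   # one list of replies, in command order
print(results)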

ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of RDBMSes that we as programmers have grown to love and take for granted as the default behavior of such systems. But the truth is that this set of guarantees comes at a performance and scalability cost, which Redis has decided to do away with. That being said, most of these properties can be achieved by making a conscious effort when working with Redis.

Among the ACID properties, Redis does not provide atomicity in the all-or-none sense: you cannot have a set of commands that either all take effect or none do, and it is not possible to change your mind about the modifications made by a command after it has been sent to the server. The EXEC and DISCARD commands in pipelines might look like such functionality, but they are not analogous to the RDBMS COMMIT and ROLLBACK commands. The good news is that Redis commands usually don't fail, so if the set of commands has been designed correctly in the first place, no rollback is needed.

Redis can guarantee consistency using pipelines because it buffers the commands and runs them sequentially in its single thread of execution, thus ensuring that in the meantime other clients not only cannot change the keys this pipeline depends on, but cannot modify any other part of the entire Redis keyspace!

Redis can also provide isolation using WATCH guards and optimistic locking.

In Redis, durability at the granularity of single commands can only be enabled globally for the entire server, using the AOF (Append-Only File) feature with a sync operation after each command:

appendonly yes
appendfsync always

In reality, this configuration is so taxing on performance that if you actually need it, you might be better off with an RDBMS. The closest compromise is to sync the latest changes in the keyspace every second:

appendfsync everysec

Or fall back to snapshotting, which dumps the dataset to disk whenever at least X key changes have occurred within a Y-second window (the directive reads save Y X):

save 900 1

For transactional consistency and data locking, Redis allows setting WATCH guards on any number of target keys; the client is notified if any of those keys has been changed by another client before the pipeline starts executing, allowing the pipeline to be retried as many times as needed to avoid race conditions and read consistent data. Having said that, simple increments and decrements of numeric values are already atomic via INCR/DECR, INCRBY/DECRBY, INCRBYFLOAT, and similar commands, so they need no WATCH at all.
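The canonical retry loop looks roughly like this in redis-py; the transfer function and key names are hypothetical.

import redis

r = redis.Redis(decode_responses=True)

def transfer(src, dst, amount):
    # Optimistic locking: WATCH the keys, read, queue the writes, and retry
    # from scratch if another client touched the keys before EXEC.
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch(src, dst)
                balance = int(pipe.get(src) or 0)
                if balance < amount:
                    pipe.unwatch()
                    return False
                pipe.multi()
                pipe.decrby(src, amount)
                pipe.incrby(dst, amount)
                pipe.execute()
                return True
            except redis.WatchError:
                continue   # a watched key changed under us; retry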

Other RDBMS Functionality

For advisory locking, Redis offers the SETNX command, which can be used to check whether a lock has already been acquired by another process. This pattern can be used to create a distributed locking mechanism among processes. See Distributed locks in the Redis docs to find out how this functionality can be implemented.
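A minimal single-instance sketch of that pattern, using the atomic SET ... NX EX form rather than a bare SETNX so a crashed holder cannot leave the lock stuck forever; the lock names are hypothetical, and the full recipe in the docs is more careful about releasing.

import uuid
import redis

r = redis.Redis(decode_responses=True)

def acquire_lock(name, ttl=10):
    # NX: only set if the key does not exist; EX: auto-expire after ttl seconds.
    token = str(uuid.uuid4())
    if r.set(f"lock:{name}", token, nx=True, ex=ttl):
        return token
    return None

def release_lock(name, token):
    # Only delete the lock if we still own it. Note that this check-and-delete
    # is not atomic here; the official recipe uses a small Lua script for that.
    if r.get(f"lock:{name}") == token:
        r.delete(f"lock:{name}")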

For complex functions in queries, ETL-style data access, or generally any logic that should run on the server side, Redis has supported Lua scripting since version 2.6.0 via the EVAL command. Lua fits the bill perfectly here and can help reduce the number of round-trips to the Redis server for simple enough computations, resulting in considerable performance improvements depending on the workload. This feature is akin to PL/Lua support in Postgres.
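A tiny illustration of the idea: read a hash field, scale it, and write it back in one atomic server-side round-trip instead of a client-side GET, compute, SET. The key and field names are made up.

import redis

r = redis.Redis(decode_responses=True)

script = """
local current = tonumber(redis.call('HGET', KEYS[1], ARGV[1]) or '0')
local scaled = current * tonumber(ARGV[2])
redis.call('HSET', KEYS[1], ARGV[1], scaled)
return scaled
"""

r.hset("user:42", "score", 10)
print(r.eval(script, 1, "user:42", "score", 3))   # numkeys=1; prints 30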

Triggers are the RDBMS way of registering code to be run when specific changes are made to the data. The Redis equivalent is called keyspace notifications, and it is built on top of the publish/subscribe feature. At the moment, though, these notifications are sent in a fire-and-forget manner to subscribed clients, so they are not as reliable as database triggers.
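For example, expired-key events can be turned on and consumed like this with redis-py; notifications are off by default and delivery is best-effort.

import redis

r = redis.Redis(decode_responses=True)

# "E" enables keyevent channels, "x" enables expired-key events;
# this can also be set with notify-keyspace-events in redis.conf.
r.config_set("notify-keyspace-events", "Ex")

p = r.pubsub()
p.psubscribe("__keyevent@0__:expired")

r.set("session:42", "payload", ex=1)

for message in p.listen():
    if message["type"] == "pmessage":
        print("expired key:", message["data"])   # fire-and-forget delivery
        break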

Extra Features

Redis also offers features not available in traditional RDBMSes, at least not in their default setting.

First of all, Redis allows setting an expiry time, with millisecond accuracy, on any top-level key in its keyspace, after which the key is automatically deleted. This feature allows Redis to be used as an independent ephemeral data store or as a very fast intermediary cache layer in front of another data store. The expiry time on a key can be set, queried, and lifted. There are also LRU and LFU eviction policies for deciding which keys to purge when memory runs low.
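Cache-style usage of the expiry machinery looks like this with redis-py; the key is made up.

import redis

r = redis.Redis(decode_responses=True)

r.set("cache:user:42", '{"name": "Ada"}', px=5000)   # expire in 5000 ms
print(r.pttl("cache:user:42"))     # remaining lifetime in milliseconds
r.pexpire("cache:user:42", 60000)  # push the expiry further out
r.persist("cache:user:42")         # lift the expiry entirely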

Bitmaps, HyperLogLogs, streams, and geospatial data types can offer alternative ways to re-implement parts of an existing application. See the introduction to Redis data types for the complete reference.

Blocking fetch commands like BLPOP, BRPOP, BRPOPLPUSH, BZPOPMIN and BZPOPMAX are a delight to work with when we need an inter-process blocking queue to pass messages across programs. The queue can be optionally capped at a certain size, making for a nice ring buffer.
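A sketch of such a queue with redis-py; the list name and the cap are arbitrary.

import redis

r = redis.Redis(decode_responses=True)

# Producer: push to the head and trim so the list behaves like a ring buffer.
r.lpush("jobs", "job-1001")
r.ltrim("jobs", 0, 999)   # keep at most 1000 pending items

# Consumer (typically another process): block until a job arrives,
# or give up after 5 seconds.
item = r.brpop("jobs", timeout=5)
if item is not None:
    queue_name, payload = item
    print("got", payload)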

Message passing in a publish/subscribe fashion is rather hard to do using an RDBMS, but is readily supported in Redis by PUBLISH/SUBSCRIBE commands. This feature is somewhat different from what Redis is known for because there is no data structure or persistence involved, but it has been useful and simple enough to land in Redis as a first-class feature. That being said, this feature in Redis is not nearly as complete as what a purpose-built message broker like RabbitMQ provides.
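In its simplest form, with redis-py and a made-up channel name:

import redis

r = redis.Redis(decode_responses=True)

# Subscriber side (normally a separate process).
p = r.pubsub()
p.subscribe("alerts")

# Publisher side; the return value is the number of subscribers reached.
r.publish("alerts", "disk usage above 90%")

for message in p.listen():
    if message["type"] == "message":
        print(message["channel"], message["data"])
        break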

Streams are an interesting way of processing data that RDBMSes do not support; Redis has only basic support for it, but that might well get the job done. Streams are very similar to queues, except that data items can be structured, are automatically tagged with the timestamp at which they were inserted, and can be freely traversed forwards and backwards based on this tag. That means data items in streams are expected to be longer-lived than those in queues, and since a stream is an append-only data structure, its size can grow without bound. Streams are commonly used to collect data from different sources and filter it based on some criteria, possibly in a pipeline. Streams also enable multiple consumers to process the same or different portions of the stream in parallel. Again, while this feature is very useful, it is not comparable to what a dedicated stream processing solution like Kafka provides.
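A rough sketch of a stream producer and reader with redis-py; the stream name and fields are invented.

import redis

r = redis.Redis(decode_responses=True)

# Each entry is a small field/value map; XADD returns the auto-generated,
# timestamp-based ID ("<milliseconds>-<sequence>").
r.xadd("sensor:temps", {"room": "lab", "celsius": "21.4"})
r.xadd("sensor:temps", {"room": "lab", "celsius": "21.9"})

# Traverse forwards over the whole stream (XREVRANGE goes backwards).
for entry_id, fields in r.xrange("sensor:temps", min="-", max="+"):
    print(entry_id, fields)

# Cap growth: keep roughly the most recent 10,000 entries.
r.xtrim("sensor:temps", maxlen=10000, approximate=True)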

Redis also comes with the ability to load modules: pluggable pieces of compiled code that extend the internal functionality of the server. These are akin to how PostgreSQL extensions add functionality to the core DBMS. For example, RediSearch is a full-text search engine built for Redis and distributed as a Redis module. RedisJSON, RedisTimeSeries, and RedisGraph are other extremely useful functionalities built as modules on top of Redis. See redismodules.com for an official list of Redis modules.

Performance Optimization

Most Redis commands are cheap to execute (the actual complexity of every command is documented in big-O notation), but as I have found, a significant portion of a command's latency is the network round-trip it takes for the command to reach the server and for the reply to come back. To reduce this round-trip time, you can try connecting over Unix sockets instead of IP sockets, which gave me a nearly two-fold increase in command throughput. To use Unix sockets, make sure these two lines are present in the Redis configuration file:

unixsocket /var/run/redis/redis-server.sock
unixsocketperm 760
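From redis-py, connecting over that socket instead of TCP is just a different constructor argument; the path must match the unixsocket directive above.

import redis

r = redis.Redis(unix_socket_path="/var/run/redis/redis-server.sock",
                decode_responses=True)
print(r.ping())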

Another very effective solution to slow performance is to see if the "multiple" version of a command can be used: for example, MGET and MSET instead of repeated GET and SET calls, a single variadic LPUSH or RPUSH instead of pushing elements one at a time, and HMGET and HMSET instead of repeated HGET and HSET calls. There are many more similar commands in the Redis command reference. When using these commands to send parameters in bulk, be aware that the maximum buffer Redis allocates to an incoming command is 512 MB, so, as a ballpark, do not send such commands with parameters in the order of millions; for example, limit a "multiple" command to about a thousand parameters per call, as sketched below.
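A sketch of that chunking with redis-py; the key pattern and chunk size are arbitrary.

import redis

r = redis.Redis(decode_responses=True)

keys = [f"user:{i}" for i in range(100_000)]

# Fetch in modest chunks instead of one gigantic MGET, keeping each command
# comfortably below the server's buffer limits.
CHUNK = 1000
values = []
for i in range(0, len(keys), CHUNK):
    values.extend(r.mget(keys[i:i + CHUNK]))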

Although Redis is famously single-threaded, which might be worrisome for some, it comes with excellent built-in tools to profile, monitor, and inspect the performance characteristics of a running instance; for example, look into the DBSIZE, INFO, SLOWLOG, LATENCY, and MEMORY DOCTOR commands. Redis claims that the CPU is not the bottleneck in processing commands and that it is usually memory or network bandwidth that limits throughput. Judging by the CPU usage of a very busy Redis instance that I worked with, this looks to be true.

The redis-benchmark command and the memtier project can measure the throughput of a Redis instance, and redis-cli --latency measures the latency of command execution, which should be very low and, more importantly, very predictable. These commands help find keys that are big themselves or that store a large object as their value:

redis-cli --bigkeys
redis-cli --memkeys

Without these excellent tools to find performance bottlenecks, it’s hard to explain why the application is not any faster when you’ve spent weeks implementing the “fast cache” feature. Don’t ask me how I know.

Setting maxclients in the Redis configuration file can also help increase performance. Redis, being a single-threaded service, can only leverage one CPU core and might slow down if it has to process many client connections at the same time. A connection-pooling proxy service like nutcracker (twemproxy) can help in this scenario. Also, never use the same Redis instance for separate applications, as they will unnecessarily compete for Redis connections, even if you use the SELECT command to create independent databases. You can very easily spin up separate Redis instances using a tool like Docker and enjoy the speedup and isolation of each service running on its own CPU core. So if you find yourself using the SELECT command and having performance issues, you are doing it wrong.

Set maxmemory and maxmemory-policy in the Redis configuration. This puts an upper limit on memory usage and helps avoid surprises from Redis becoming too slow because of swapping to disk, or outright crashing when the "OOM killer" kicks in. If you are using the snapshotting feature for persistence (the default), you should not set maxmemory to more than about 45% of the available system memory, because snapshotting forks the Redis process, which in the worst case effectively doubles the memory usage. AOF persistence, on the other hand, does not need the extra memory, and up to 95% of the available memory can be allocated to Redis. If more memory is needed, data partitioning can easily be set up in Redis to distribute the keyspace among several nodes.
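For example, on a machine with roughly 8 GB of memory available to Redis and the default snapshotting persistence, a conservative configuration might look like this; the exact numbers depend entirely on your workload.

maxmemory 3gb
maxmemory-policy allkeys-lru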

Finally, to top it all off, Redis has an actual artistic side in its genes, which is hard to find in RDBMSes! Try the LOLWUT command to see what I mean.
