As useful and ubiquitous as relational database management systems like PostgreSQL are, they have had to make trade-offs in order to solve particular sets of problems well. This means that in today’s demanding applications, they can become performance hogs and scalability roadblocks.
Redis is a data storage server that has made somewhat different trade-offs in order to free itself from the traditional limitations of RDBMSes. For example, it has opted for storing data in much faster but volatile memory instead of on disk, it has looser data-persistence guarantees, and it has a much simpler data model than the relational model. On top of it all, it is simple to operate and very easy for a new user to pick up.
This has resulted in Redis being much faster and very easily scalable. Developers who decide to accept these trade-offs to get the gains first need to understand how to port an existing data-centric application onto this tool. Although Redis is generally categorized as an in-memory key-value store, it has gone out of its way to prove itself as more than that. It offers two different methods of persistence, which form the foundation for creating a permanent data store. It also offers many more data types than a simple key-value mapping, including efficient implementations of lists, sets, ordered sets, hashes, bitmaps, streams, and geospatial data types. On top of these data types, Redis itself implements many operations that are readily available to the programmer.
Mapping The Data Model
For storing records, I choose a key which is the combination of the most common WHERE clause criteria in an SQL SELECT query for accessing the record. This means using the record’s primary key as the Redis key and also keeping pointers to this key from all other candidate keys that might be used to get to the record. That said, try to keep the key fairly short; if it is substantially longer than the output of a well-known hash function, consider using the hash of the key instead.
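As a rough sketch of that key-shortening idea (the `entity:` prefix, the 64-character cutoff, and SHA-1 as the hash are my arbitrary choices here, not anything Redis mandates):

```python
import hashlib

def redis_key(entity, natural_key, max_len=64):
    """Use the natural key directly when it is short; otherwise fall back
    to its SHA-1 digest (40 hex chars) to keep the Redis key compact."""
    if len(natural_key) <= max_len:
        return f"{entity}:{natural_key}"
    digest = hashlib.sha1(natural_key.encode("utf-8")).hexdigest()
    return f"{entity}:{digest}"

print(redis_key("user", "alice@example.com"))  # short enough: used as-is
print(len(redis_key("user", "x" * 500)))       # long key: replaced by its hash
```

Because the hash is deterministic, every part of the program derives the same Redis key from the same natural key.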
If the record needs to be ephemeral, I use the SET command, because only top-level keys can have expiry times; in this case, the value should be the serialized version of the object being stored. Otherwise, use HSET with keys and values mirroring the object’s properties.
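A minimal sketch of this SET-versus-HSET decision, assuming JSON serialization; `FakeRedis` is a made-up in-memory stand-in mimicking the redis-py client shape, used only so the example runs without a server:

```python
import json

def store_ephemeral(client, key, obj, ttl_seconds):
    # Ephemeral record: serialize the whole object into one value, so a
    # top-level expiry can apply to it. Mirrors: SET key value EX ttl
    client.set(key, json.dumps(obj), ex=ttl_seconds)

def store_persistent(client, key, obj):
    # Long-lived record: one hash field per object property.
    # Mirrors: HSET key field value [field value ...]
    client.hset(key, mapping={k: str(v) for k, v in obj.items()})

class FakeRedis:
    """Tiny in-memory stand-in for illustration; with a real server you
    would pass a redis.Redis() instance instead."""
    def __init__(self):
        self.data, self.ttls = {}, {}
    def set(self, key, value, ex=None):
        self.data[key] = value
        if ex is not None:
            self.ttls[key] = ex
    def hset(self, key, mapping):
        self.data.setdefault(key, {}).update(mapping)

r = FakeRedis()
store_ephemeral(r, "session:42", {"user_id": 7}, ttl_seconds=300)
store_persistent(r, "user:7", {"name": "alice", "age": 30})
```

The trade-off: the serialized blob expires as a unit but must be rewritten whole, while the hash allows per-field reads and updates.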
The key can start with “entity_name:”, which helps distinguish different entities in a namespace and avoids conflicts. If this scheme is followed, the KEYS command, or better yet SCAN, can be used to easily inspect all keys and data types related to an entity, and the RENAME command can be used to change a key if we need to.
Depending on the nature of the data, a set or list might be what we actually need, both of which are directly supported by Redis.
For sorting, unique constraints, and min/max values, I use sorted sets whose members, given a sort criterion, point to existing keys in the keyspace related to the records. Redis can sort members based on a numerical score or lexicographically based on the member itself.
Although Redis has the SORT command for one-off sorting of a list or set, its time complexity could justify keeping a separate sorted set instead.
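To make the index idea concrete, here is a pure-Python sketch of how a sorted set orders members (pointers to record keys) by score, the way ZADD and ZRANGEBYSCORE would; the key names such as `user:7` are illustrative:

```python
index = {}  # member -> score, modeling a sorted set like "user:age_index"

def zadd(index, member, score):
    # ZADD: insert or update a member with its score.
    index[member] = score

def zrangebyscore(index, lo, hi):
    # ZRANGEBYSCORE: members with score in [lo, hi], ordered by
    # (score, member) just as Redis breaks score ties lexicographically.
    return [m for m, s in sorted(index.items(), key=lambda kv: (kv[1], kv[0]))
            if lo <= s <= hi]

zadd(index, "user:7", 30)
zadd(index, "user:8", 25)
zadd(index, "user:9", 41)
print(zrangebyscore(index, 25, 35))  # ['user:8', 'user:7']
```

Each member here is a pointer back into the keyspace, so a range query over the index yields the keys of the matching records.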
Check the Redis docs page on secondary indexes for a more complete explanation of indexing methods.
Transactions and ACID
Redis offers the MULTI/EXEC/DISCARD combination of commands, which buffers a set of commands to be executed atomically. This feature, commonly combined with pipelining, makes a huge difference in the speed at which commands are executed, because the network round-trip latency is amortized over the number of commands in the pipeline. One caveat is that Redis does not have the notion of a “rollback”: although pipelines execute atomically, they cannot be rolled back if one command in the sequence fails. The commands keep executing, and the program is only informed via the returned list of results. In practice, this might not be an issue, because Redis commands usually cannot fail, with the notable exception of running a command on a key of an incompatible type, which is a bug in the program that needs fixing anyway.
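The client-side shape of this is a pipeline object that buffers commands locally and ships them in one round-trip, with EXEC returning one result per command. The sketch below mirrors redis-py's chained-pipeline style, but `FakePipeline` is invented for illustration; it only records what would be buffered, and its `execute` results are placeholders:

```python
class FakePipeline:
    """Stand-in for a client pipeline; with redis-py you would call
    r.pipeline() on a real connection instead."""
    def __init__(self):
        self.commands = []
    def set(self, key, value):
        self.commands.append(("SET", key, value))
        return self          # chaining, as client libraries allow
    def incr(self, key):
        self.commands.append(("INCR", key))
        return self
    def execute(self):
        # A real client sends MULTI, the buffered commands, then EXEC,
        # paying the network round-trip once for the whole batch.
        return [True, 1]     # placeholder per-command results

pipe = FakePipeline()
pipe.set("user:7:name", "alice").incr("user:7:logins")
results = pipe.execute()
print(len(results))  # one result per buffered command
```

Note that nothing reaches the server until `execute`, which is exactly why a failing command mid-batch cannot trigger a rollback of the earlier ones.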
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of RDBMSes that we as programmers have grown to love and take for granted as the default behavior of such systems. But the truth is that this set of guarantees comes at a performance and scalability cost, which Redis has decided to do away with. That being said, most of these properties can be achieved by making a conscious effort when working with Redis.
Among the ACID properties, Redis notably does not provide atomicity in the all-or-none sense: you cannot have a set of commands that either all take effect or none do, and it is not possible to change your mind about the modifications made by a command after it has been sent to the server.
The EXEC and DISCARD commands in pipelines might look like such functionality, but they are not analogous to the RDBMS COMMIT and ROLLBACK commands. The good news is that Redis commands usually don’t fail, so if the set of commands has been designed correctly in the first place, no rollback is needed.
Redis can guarantee consistency using pipelines, because it buffers the commands and runs them sequentially in its single thread of execution, ensuring that in the meantime other clients not only cannot change the keys affecting this pipeline, but cannot modify any other part of the entire Redis key space either.
Redis can also provide isolation using WATCH guards and optimistic locking.
In Redis, durability at the granularity of single commands can only be enabled globally for the entire server, using the AOF (Append-Only File) feature with a sync operation after each command:
appendonly yes
appendfsync always
In reality, this configuration is so taxing on performance that if you actually need it, you might be better off with an RDBMS. The closest compromise is to sync the latest changes to the key space every second:
appendfsync everysec
Alternatively, the snapshotting feature persists the whole dataset after at least Y keys have changed within an X-second window; for example, snapshot if at least 1 key changed in the last 900 seconds:
save 900 1
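Putting the persistence knobs together, a redis.conf fragment combining the two mechanisms might look like this (values illustrative, not a recommendation):

```conf
# AOF: log every write command, fsync the log once per second
# (the usual latency/durability compromise)
appendonly yes
appendfsync everysec

# RDB: additionally snapshot the dataset if at least 1 key
# changed in the last 900 seconds
save 900 1
```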
For transactions and data-locking consistency, Redis allows setting WATCH guards on any number of target keys; if any of those keys is changed by another client before the pipeline starts executing, the transaction is aborted, allowing the pipeline to be retried as many times as needed to avoid race conditions and read consistent data.
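The resulting read-compute-retry loop can be sketched without a server by modeling WATCH as a per-key version counter; everything here (the dicts, the function names) is a made-up stand-in for the real WATCH/MULTI/EXEC flow:

```python
def watch_and_retry(store, versions, key, update, max_retries=10):
    """Optimistic locking: retry the whole read/compute/write cycle
    whenever the watched key changed underneath us."""
    for _ in range(max_retries):
        seen = versions.get(key, 0)         # WATCH key: note its version
        new_value = update(store.get(key))  # read and compute optimistically
        if versions.get(key, 0) == seen:    # EXEC succeeds only if untouched
            store[key] = new_value
            versions[key] = seen + 1
            return new_value
        # another client modified the key between WATCH and EXEC: retry
    raise RuntimeError("too much contention")

store, versions = {"counter": 0}, {}
print(watch_and_retry(store, versions, "counter", lambda v: (v or 0) + 1))  # 1
```

Against a real server, redis-py wraps this same loop for you in its `transaction` helper.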
Having said that, simple increments and decrements of integers are supported atomically by INCR/DECR, INCRBY/DECRBY, INCRBYFLOAT, and similar commands. (Note that there is no DECRBYFLOAT; pass a negative increment to INCRBYFLOAT instead.)
Other RDBMS Functionality
For advisory locking, Redis offers the SETNX command, which can be used to check for locks acquired by other processes. This pattern can be used to create a distributed locking mechanism among processes; see Distributed locks to find out how this functionality can be implemented.
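The core of that pattern is "create the lock key only if absent, holding a random token so only the owner can release it" (in Redis terms, SET lock_key token NX PX ttl). A minimal in-memory sketch, with a plain dict standing in for the server and all names illustrative:

```python
import uuid

def acquire(locks, name):
    """SETNX semantics: take the lock only if nobody holds it."""
    token = uuid.uuid4().hex
    if name not in locks:
        locks[name] = token
        return token        # caller keeps the token to release later
    return None             # another process holds the lock

def release(locks, name, token):
    """Delete the lock only if we still own it (compare the token first),
    so a process cannot free a lock someone else has since acquired."""
    if locks.get(name) == token:
        del locks[name]
        return True
    return False

locks = {}
t = acquire(locks, "lock:report")
print(acquire(locks, "lock:report"))  # None: second acquire fails
release(locks, "lock:report", t)
```

A real deployment also needs a TTL on the lock key so a crashed holder cannot block everyone forever, which is exactly what the PX option provides.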
For complex functions in queries, ETL-style data access, or generally logic that should run on the server side, Redis has supported Lua scripting since version 2.6.0 via the EVAL command. Lua perfectly fits the bill here and can help reduce the number of round-trips to the Redis server for simple enough computations, resulting in considerable performance improvements depending on the workload. This feature is akin to PL/Lua support in Postgres.
Triggers are the RDBMS mechanism for registering code to run when specific changes are made to the data. The Redis equivalent is called keyspace notifications, and it is based on the built-in publish/subscribe feature, although at the moment these notifications are sent in a fire-and-forget manner to subscribed clients and are not as reliable as database triggers.
Extra Features
Redis also offers features not available in traditional RDBMSes, at least not in their default setting.
First of all, Redis allows setting an expiry time on all top-level keys in its key space with millisecond accuracy, after which the key will be automatically deleted. This feature allows it to be used as an independent ephemeral data store or an intermediary cache layer for accessing data stored in another data store in a very fast way. The expiry time on a key can be set, queried, and lifted. There are also LRU and LFU policies for deciding when to purge a key.
Bitmaps, HyperLogLogs, streams, and geospatial data types could offer alternative ways to re-implement parts of an existing application. See the introduction to Redis data types for the complete reference.
Blocking fetch commands like BLPOP, BRPOP, BRPOPLPUSH, BZPOPMIN, and BZPOPMAX are a delight to work with when we need an inter-process blocking queue to pass messages across programs. The queue can optionally be capped at a certain size, making for a nice ring buffer.
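The capping itself is done by trimming after each push (in Redis, LPUSH followed by LTRIM key 0 cap-1). A plain Python list stands in for the server-side list in this sketch:

```python
def push_capped(items, value, cap):
    """Ring-buffer push: newest items at the head, oldest fall off the
    tail, mirroring LPUSH + LTRIM key 0 cap-1 on a Redis list."""
    items.insert(0, value)   # LPUSH: new item enters at the head
    del items[cap:]          # LTRIM 0 cap-1: drop everything past the cap
    return items

buf = []
for i in range(6):
    push_capped(buf, i, cap=4)
print(buf)  # only the 4 most recent items survive: [5, 4, 3, 2]
```

Consumers then read with BRPOP from the tail, blocking until an item arrives.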
Message passing in a publish/subscribe fashion is rather hard to do using an RDBMS, but is readily supported in Redis by the PUBLISH/SUBSCRIBE commands. This feature is somewhat different from what Redis is known for, because there is no data structure or persistence involved, but it has been useful and simple enough to land in Redis as a first-class feature. That being said, pub/sub in Redis is not nearly as complete as what a purpose-built message broker like RabbitMQ provides.
Streams are an interesting way to process data that RDBMSes do not support; Redis has basic support for them that might get the job done. Streams are very similar to queues, except that data items can be structured, are automatically tagged with the timestamp at which they were inserted, and can be freely traversed forwards and backwards based on this tag. That means data items in streams are expected to be more timeless than those of queues, and since a stream is an append-only data structure, its size can grow indefinitely. Streams are commonly used to collect data from different sources and filter it based on some criteria, possibly in a pipeline. Streams also enable multiple consumers to process the same or different portions of the stream in parallel. Again, while this feature is very useful, it is not comparable to what a dedicated stream-processing solution like Kafka provides.
Redis also comes with the ability to load modules: pluggable pieces of compiled code that extend the internal functionality of the server. These are akin to how PostgreSQL extensions add functionality to the core DBMS. For example, RediSearch is a full-text search engine built for Redis and distributed as a Redis module. RedisJSON, RedisTimeSeries, and RedisGraph are other extremely useful functionalities built as modules on top of Redis. See redismodules.com for an official list of Redis modules.
Performance Optimization
Most Redis queries do not have a high time complexity (the actual complexity of each command, in big-O notation, is mentioned in the documentation), but as I have found, a significant portion of the latency of a command is the network round-trip it takes to be sent to the server and received back. To reduce this round-trip time, you can try connecting over Unix sockets instead of IP sockets, which gave me a nearly two-fold increase in command throughput. To use Unix sockets, make sure these two lines are present in the Redis configuration file:
unixsocket /var/run/redis/redis-server.sock
unixsocketperm 760
Another very effective remedy for slow performance is to see if the “multiple” version of a command can be used: for example, MGET and MSET instead of GET and SET, LPUSH and RPUSH called once with multiple elements instead of once per element, and HMGET and HMSET instead of HGET and HSET. There are many more similar commands in the Redis command reference.
When using these commands to send parameters in bulk, be aware that the maximum buffer Redis allocates to an incoming command is 512 MB, so as a ballpark, do not send such commands with parameters in the order of millions; for example, limit a “multiple” command to about a thousand parameters in the program.
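Batching like that is a small chunking loop on the client side. A sketch, where the 1000-pair chunk size is just the ballpark figure above, not a hard Redis limit:

```python
def chunked(pairs, size=1000):
    """Split a large key/value mapping into dicts of at most `size` pairs,
    so that no single MSET payload gets anywhere near the 512 MB buffer."""
    pairs = list(pairs)
    for i in range(0, len(pairs), size):
        yield dict(pairs[i:i + size])

data = {f"key:{i}": i for i in range(2500)}
batches = list(chunked(data.items()))
print([len(b) for b in batches])  # [1000, 1000, 500]

# Against a real redis-py client `r`, each chunk becomes one MSET:
# for batch in batches:
#     r.mset(batch)
```

Each chunk still amortizes the round-trip over a thousand keys, so the speedup over per-key SETs is largely preserved.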
Although Redis is famously single-threaded, which might be worrisome for some, it comes with excellent built-in tools to profile, monitor, and inspect the performance characteristics of a running instance; for example, look at these Redis commands: DBSIZE, INFO, SLOWLOG, LATENCY, and MEMORY DOCTOR. Redis claims that the CPU is not the bottleneck in processing commands; it is usually memory or network bandwidth that limits throughput. Judging by the CPU usage of a very busy Redis instance that I worked with, this looks to be true.
The redis-benchmark command and the memtier project can measure the throughput of a Redis instance, and redis-cli --latency can measure the latency of command execution, which should be very low and, more importantly, very predictable.
These commands can help find keys that are big themselves or store
a large object as their value:
redis-cli --bigkeys
redis-cli --memkeys
Without these excellent tools to find performance bottlenecks, it’s hard to explain why the application is not any faster when you’ve spent weeks implementing the “fast cache” feature. Don’t ask me how I know.
Setting maxclients in the Redis configuration file can also help keep performance predictable: Redis, being a single-threaded service, can only leverage one CPU core and might slow down if it has to process many client connections at the same time.
A connection-pooling proxy service like nutcracker can help in this scenario. Also, never use the same Redis instance for separate applications, as they will unnecessarily compete for Redis connections, even if you use the SELECT command to create independent databases. You can very easily spin up separate Redis instances using a tool like Docker and enjoy the speedup and isolation resulting from each service running on a separate CPU core. So if you find yourself using the SELECT command and having performance issues, you are doing it wrong.
Set maxmemory and maxmemory-policy in the Redis configuration. This puts an upper limit on memory usage and can help avoid surprises from Redis becoming too slow because of swapping to disk, or outright crashing when the “OOM killer” kicks in. If you are using the snapshotting feature for persistence (the default), you should not set maxmemory to more than 45% of the available system memory, because snapshotting forks the Redis process, which in the worst case effectively doubles memory usage. AOF persistence, on the other hand, does not need the extra memory, and up to 95% of the available memory can be allocated to Redis. If more memory is needed, data partitioning in Redis can easily be set up to distribute the key space among several nodes.
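The simplest form of such partitioning is client-side: hash each key to one of N nodes, similar in spirit to Redis Cluster's hash slots (which actually use CRC16 over 16384 slots). A sketch, with made-up node addresses:

```python
import zlib

NODES = ["redis-node-0:6379", "redis-node-1:6379", "redis-node-2:6379"]

def node_for(key, nodes=NODES):
    """Map a key deterministically to one node, so every client that
    knows the node list agrees on where each key lives."""
    slot = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return nodes[slot]

# The same key always lands on the same node.
assert node_for("user:7") == node_for("user:7")
print(node_for("user:7") in NODES)  # True
```

Note that naive modulo hashing reshuffles most keys when the node count changes, which is why consistent hashing or fixed hash slots are used in production setups.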
Finally, to top it all off, Redis has an actual artistic side in its genes, which is hard to find in RDBMSes! Try the LOLWUT command to see what I mean.