from Hacker News

Do you really need Redis? How to get away with just PostgreSQL

by hyzyla on 6/12/21, 6:57 AM with 472 comments

by _ugfj on 6/12/21, 7:29 AM
You really don't need anything fancy to implement a queue using SQL. You need a table with a primary id and a "status" field. An "expired" field can be used instead of the "status". We used the latter because it allows easy retries.
1. SELECT item_id WHERE expire = 0. If this is empty, no items are available.
2. UPDATE SET expire = some_future_time WHERE item_id = $selected_item_id AND expire = 0. Then check whether UPDATE affected any rows. If it did, item_id is yours. If not, loop. If the database has a sane optimizer it'll note at most one document needs locking as the primary id is given.
All this needs is a very weak property: document level atomic UPDATE which can return whether it changed anything. (How weak? MongoDB could do that in 2009.)
Source code at https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/... (We cooked this up for Drupal in 2009 but I am reasonably sure we didn't invent anything new.)
Of course, this is not the fastest job queue there is but it is quite often good enough.
by taf2 on 6/12/21, 2:29 PM
Redis to me is the magic solution to solve so many complex multiple process syncing issues, a global lock, rate limiter, uniqueness with a set, stasher of all the temporary things… no migration, quick solution to complex problems- streams of events to organize pending tasks, etc… to me it’s all about the temporal / temporary state my application needs to keep track in order to work in a multiuser - multiprocess and multi service environment… just a different tool for a different job then what I would use a database for… but really it’s computer science, to each is their own
by nickjj on 6/12/21, 10:13 AM
I think one of the biggest advantages of using Redis for job queing vs Postgres comes down to library support.
For example Python has Celery and Ruby has Sidekiq. As far as I know there's no libraries in either language that has something as battle hardened with comparable features for background tasks using Postgres as a backend.
There's a big difference between getting something to work in a demo (achievable by skimming PG's docs and rolling your own job queue) vs using something that has tens of thousands of hours of dev time and tons of real world usage.
I'm all for using PG for things like full text search when I can because it drastically reduces operation complexity if you can avoid needing to run Elasticsearch, but Redis on the other hand is a swiss army knife of awesome. It's often used for caching or as a session back-end so you probably have it as part of your stack already. It's also really easy to run, uses almost no resources and is in the same tier as nginx in terms of how crazy efficient it is and how reliable it is. I don't see not using Redis for a job queue as that big of a win.
by pilif on 6/12/21, 8:43 AM
For pub/sub, I would recommend against using PostgreSQL if you're doing it at any kind of scale because LISTEN ties up one connection completely and Postgres connections are very expensive compared to a redis connection.
by petepete on 6/12/21, 7:08 AM
I've used PostgreSQL in the first two scenarios and would love an opportunity to in the third. It's worked really well and hasn't caused any problems under decent loads.
The one feature of Redis I'd love to have supported in PostgreSQL is to be able to set a TTL on a record. On a recent project where we could only save personal data for so long, it was a must have feature so we had to use Redis for that purpose instead.
by Aeolun on 6/12/21, 7:40 AM
This seems a bit silly. We might be able to use only postgres, but this seems harder than using Postgres and Redis.
Any time someone says ‘this works for small workloads’ I hear, ‘we’ll eventually have to migrate this’.
That’s not a problem if you are talking about introducing a hugely complex piece of infrastructure, but Redis is hardly that.
by truth_seeker on 6/12/21, 1:29 PM
While you are at it, don't forget to use UNLOGGED tables. UNLOGGED == In Memory.
But if you must use disk based table for Job queueing, set fillfactor = 50. This takes care of heavy updates.
Indexes are helpful but costs memory and CPU, so always make sure you partition the table based on job_type for performant pending job query.
I wouldn't recommend using LISTEN/NOTIFY unless you are okay with "at most once" semantics. I have used disk table based approach for PUB/SUB to replace Kafka. More fine tuned approach will also allow (job_type, consumer_group, publisher) as a partition key.
Ref - https://www.postgresql.org/docs/current/sql-createtable.html
by simonw on 6/12/21, 2:35 PM
The best reason I know of to use a relational database as a queue is that it lets you trigger queue operations from within a transaction - so a queue item is guaranteed to be created if the transaction succeeds, and guaranteed not to be created if the transaction fails.
Brandur wrote a great article about that here: https://brandur.org/postgres-queues
by Rygu on 6/12/21, 10:08 AM
Don't forget to use Partial Indexes on jobs/tasks tables with queries like WHERE status = 'pending'
More: https://use-the-index-luke.com/sql/where-clause/partial-and-...
by arpa on 6/12/21, 7:20 AM
As German engineers are known to say: "Why make things simple when complicated will do?"
by hardwaresofton on 6/12/21, 9:45 AM
A blog post series I've been meaning to write for over 3 years now:
* Every database a Postgres 1: Key/Value store
* Every database a Postgres 2: Document stores
* Every database a Postgres 3: Logs (Kafka-esque)
* Every database a Postgres 4: Timeseries
* Every database a Postgres 5: Full Text Search
* Every database a Postgres 6: Message Queues
Low key, you could make almost every single type of database a modern startup needs out of Postgres, and get the benefits (and drawbacks) of Postgres everywhere.
Should you do it? Probably not. Is it good enough for a theoretical ~70% of the startups out there who really don't shuffle around too much data or need to pretend to do any hyper scaling? Maybe.
If anyone from 2ndQuadrant/Citus/EDB see this, please do a series like this, make the solutions open source, and I bet we'd get some pretty decent performance out of Postgres compared to the purpose built solutions (remember, TimescaleDB did amazing compared to InfluxDB, a purpose built tool, not too long ago).
New features like custom table access methods and stuff also shift the capabilities of Postgres a ton. I'm fairly certain I could write a table access method that "just" allocated some memory and gave it to a redis subprocess (or even a compiled-in version) to use.
[EDIT] - It's not clear but the listing is in emacs org mode, those bullet points are expandable and I have tons of notes in each one of these (ex. time series has lots of activity in postgres -- TimescaleDB, native partitioning, Citus, etc). Unfortunately the first bullet point is 43 (!) bullet points down. If someone wants to fund my yak shaving reach out, otherwise someone signal boost this to 2Q/Citus/EDB so professionals can take a stab at it.
[EDIT2] - I forgot some, Postgres actually has:
- Graph support, w/ AgensGraph now known as AGE[0]
- OLAP workloads with Citus Columnar[1] (and zedstore[2]).
[0]: https://age.apache.org
[1]: https://www.citusdata.com/blog/2021/03/05/citus-10-release-o...
[2]: https://github.com/greenplum-db/postgres/tree/zedstore
by mosselman on 6/12/21, 7:45 AM
For rails I’ve used que in the past, which is a job queue adapter for Postgres. It supports activejob.
https://github.com/que-rb/que
by deckard1 on 6/12/21, 9:27 AM
I imagine most people using Redis as a queue were already using it as a cache and just needed some limited queuing ability. Much like how places end up using a DB as a queue.
Using a DB as a queue has been a thing for a very long time. Every billing system I've seen is a form of a queue: at a certain point in the month a process kicks off that scans the DB and bills customers, marking their record as "current".
The challenge is always going to be: what if the worker dies. What if the worker dies, the job is re-ran, and the customer is billed twice. Thank god it's been many years since I've had to touch cron batch jobs or queue workers. The thought of leaving the office knowing some batch job is going to run at 3am and the next morning might be total chaos... shudder.
by BiteCode_dev on 6/12/21, 9:18 AM
Oh but redis is much more than that.
It's so simple any little script can interact with it in seconds, instead of having to craft complex SQL or import your wrapper. You can call redis from the weirdest places, like from the inside of an nginx config file. You needn't even a compiled driver if that matters.
It's easier to get it to perform. It's possible to get great perfs with PostgreSQL, but you just got them for free with redis. Very hard to screw up. Read, write ? N+1 ? Batching ? Who cares, it will be fast no matter the concurrency or the load.
Sure you can expire with Postgres, but having done so in practice, it's way harder than it looks to get right. With redis, you don't have to care. Set a cache, set a lock with a timeout, store data your are not sure you need. It will all disapear.
Redis is not just key/value. You have sets, so ensuring unicity is a piece of cake, no constraints to define, then insertion to check against. And of course, you have sorted sets, which you are kept ordered at insertion by any number you pair the value with, such as a timestamp/score/index, and truncate by range for extra magic.
And then you have bonuses like hyperloglog which you need an extension for in Posgres. Super handy for stats.
Finally, you have streams, which for most apps will fill the gap for a timeserie database or a way to store your app logs. All that with a 2ms latency at 10k/s requests. None of my projects ever needed more than 1K/s though, even one with 1M users/month.
You have all of that with a dead simple install and basically no maintenance.
In fact, redis by itself consume almost no resource, it's not even worth it to not have it in your toolset. I just install it by default on all my projects: I never regretted it, there is always something it can do for you. It not now, just wait a bit, it's costing you nothing, and something will come up.
So no, let use Postgres for what it's great at, which is being a robust all purpose database. Redis is a fantastic complement to it, not a competitor, just use both.
Unless you are google size, there are little chances you will reach a stage where you need to migrate from any of them.
It's part of those tech that are just too good to be true, like SQLite or Python.
My only regret is that it doesn't exist on windows.
P.S: if you need caching and can't afford redis on a small python script, use Disk Cache, it's awesome: http://www.grantjenks.com/docs/diskcache/index.html
by skunkworker on 6/12/21, 7:52 AM
Hopefully one day projects like que[1][2] will become stable and battle-tested enough to use in a production environment. Until then I'll be using something like sidekiq (if you're going for a client-side job queue, eg: the clients don't really know about each other and only have rate-limiting, not true throttling).
With Postgres you also need to worry about high churn, especially since you are creating/locking/deleting rows constantly. This can be alleviated through a variety of means, of which personally I would use per-day table partitioning and truncate older partitions on a cron, not to mention the sharp increase in database connections to the host now required.
Ignoring the literal elephant in the room of synced writes to the store. Redis can be used quite effectively in a blocking manner with RPOPLPUSH/LMOVE(6.2+) for a reliable queue, allowing an item to not be lost because atomically the pop and push from two different lists are done together.
[1] https://github.com/que-rb/que [2] https://gist.github.com/chanks/7585810
by osigurdson on 6/12/21, 2:32 PM
This seems to come up on HN at least once a year. Sure it can work but LISTEN ties up a connection which limits scalability as connections are limited and expensive. Also, mitigation strategies like PgBouncer cannot be used with this approach (nor can scale out solutions like CitusDB I don't think).
Of course, if scalability is not a concern (or the connection limitations are eventually fixed in postgres), this would be a very viable approach.
by fasteo on 6/12/21, 9:53 AM
For use case 1 (job queue) I can only recommend beanstalkd[1]. Simple tcp plain text protocol. Install and forget. Plus you get some really useful features like deferred jobs, burying, kicking, job priorities, etc.
We have literally processed tens of billions of jobs without a single failure.
Old and simple technology that just works
[1] https://beanstalkd.github.io/
by dutchbrit on 6/12/21, 11:47 AM
Why use Redis when you have more suitable solutions like RabbitMQ or Kafka?
Obviously depends on the scale and needs of a project, Postgres etc is fine for simple queues.
I often see people waste unnecessary time by writing their own complex solutions, resulting in increasing technical debt, when you already have perfectly suitable open source options available that do a better job..
by cpursley on 6/12/21, 8:28 AM
With Elixir you can replace both with something like Quantum or backed by Postgres with Oban. Or even just a hand rolled genserver.
You can also listen to Postgres WAL changes to build an Event like system with Elixir. Supabase is doing this with their Realtime app.
by chmod775 on 6/12/21, 7:07 AM
Funny. My approach is usually the other way around: Can I get away with just Redis?
by leetrout on 6/12/21, 12:06 PM
Watch out for transaction ID and sequence exhaustion if you have a lot of things rolling through a table as a queue.
Postgres is awesome but logging and queuing in a table can cause gotchas you won’t have with redis.
by yukinon on 6/12/21, 8:03 AM
This is a great article because it outlines an alternative to using Redis for any given use cases. If we don't constantly re-evaluate our toolsets and their capabilities, it can lead to poor decision making.
That being said, I've found Redis largely a pleasure to work with for these use cases and don't really see a real incentive to changing my current approach.
by antihero on 6/12/21, 9:13 AM
I think one of the draws of redis was, back when it came out, how simple it was to set up and maintain in comparison to an RDBMS.
by mxyzptlk on 6/12/21, 2:49 PM
We tried it with just PostgreSQL and struggled with missed jobs. We tried it with Redis + PostgreSQL and haven't looked back. I'll take the blame for not engineering the first version adequately (this was before upsert) but Redis has been useful in so other ways that I'm glad it ended up in our architecture.
by jdlyga on 6/12/21, 5:01 PM
It's kind of like buying all sorts of running outfits and a camelbak to go on a 3 mile jog once a week. It's about overeager optimism of doing races and marathons in the future. Where in reality, you can get away with just running in the track pants and tshirt you already have on.
by jugg1es on 6/12/21, 6:15 PM
Why would you use a DB or Redis for job queuing when there are extremely inexpensive and highly optimized queuing systems in every major cloud provider?
I've had so many horrible experiences with DB-based queuing over my career once you get to a certain scale. Just use a message bus, bro!
by fiznool on 6/12/21, 6:13 PM
We’ve seen success in our current node.js application using pgboss [1] for job queueing. Since the primary database is Postgres, it’s nice to not introduce another dependency into the system, and take advantage of the infrastructure that we already have. The library supports most of what you’d expect from a job queue, and is crucially well maintained!
That being said, I agree with other comments that this is somewhat swimming against the tide. Redis is much more commonly used, and so if you don’t mind adding another service into the mix, I’d probably recommend going with Redis instead.
[1] https://github.com/timgit/pg-boss
by gsvclass on 6/12/21, 6:54 PM
And a search engine in Posgres instead of Elastic Search https://twitter.com/dosco/status/1400643969030127620
by larodi on 6/12/21, 10:16 AM
yes, do you really need PDF convertors when you can have them as PostgreSQL extensions.
the point (ok, one point of many) of REDIS is that it is not the main DB so you can have a sense of security and decoupling in the architecture. Besides - there is no silver bullet for all things. While you can have your app do everything with PostgreSQL (and much more with Oracle, something people dislike it about), the fact itself does not mean is a good design decision or is a more stable decision.
Because when you have redis for sessions, kafka for event streams, postgre (or else) for data storage - you have components that can fail separately and thus the system degrades gracefully.
by manishsharan on 6/12/21, 12:11 PM
Sometimes we need network based data structures that are other than relational tables. Redis delivers quite a few of those .. You can always implement then in a SQL database but Redis is just a better solution for those cases.
by void_mint on 6/12/21, 7:24 PM
"Do you really need < insert well-tested, verified technology > ? Why not just hand-roll your own in < different technology > ?"
Hopefully the emphasis is clear why this is silly.
by avinassh on 6/12/21, 8:03 AM
Most of the times Redis's distributed lock works fine, however one should know that its not fail proof and you might run into really weird bugs
references:
0 - https://aphyr.com/posts/283-jepsen-redis
1 - https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...
by bob1029 on 6/12/21, 3:57 PM
I feel I could write a similar article titled:
```
  Do you really need PostgreSQL? How to Get Away with Just SQLite
```
How many apps are running pg on a single node and don't really have plans to scale beyond this?
Exclusive ownership of a SQLite database by the application can deliver better performance than exclusive ownership over a single node postgres database. This also extends to SQL Server, DB2, Oracle, or any other hosted solution.
by Xophmeister on 6/12/21, 1:40 PM
SKIP LOCKED looks interesting; I'll have to try that. I've used advisory locks in the past, but I kept running into deadlocks when I tried to acquire them intelligently (just for popping off the queue). It was unclear why, at the time, so I just put an advisory lock on every transaction. Obviously that causes serious contention problems as the number of jobs and workers increase.
by markus_zhang on 6/12/21, 6:26 PM
I think it more or less depends on the mindset of the developer. For example from my observation developrrs from telecom or other big companies tend to turn every program into a "solution" i. e. it must have a full weaponry and other bells and whistles. Me? I'm the opposite amd won't add anything or fix anything unless it is really useful.
by zomgwat on 6/12/21, 10:14 PM
Security is another factor in choosing not to use a database for things like queues. I like to keep access to the database as tight as possible. I don’t want to start dishing out network and account access to services just because they need a queue or a distributed lock. I could run a separate database instance but that’s worse than just running Redis.
by munro on 6/12/21, 9:47 PM
I feel like sanity is being restored. Maybe I’m lazy, but yeah I use PostgreSQL for everything I can. Zookeeper is great, but I’ve used PostgreSQL for distributed locking & queueing. Sometimes it’s quicker for me to write a 10–20 line algorithm than install a new piece of infrastructure.
by dvfjsdhgfv on 6/12/21, 8:43 AM
Has anyone tested how listen/notify in pg (for in-memory databases) compares to pub/sub in Redis?
by gchamonlive on 6/12/21, 3:49 PM
Get your requirements in check.
How sensitive is your app to latency? How much data and request volume you need to handle?
Do proof of concepts, write thorough load tests and go for what makes sense.
Either way, no matter which tech you choose, make sure monitoring and alarms are in place and that you do regular maintenance exercises.
by gsvclass on 6/12/21, 6:53 PM
And others like this thread on building a MongoDB like JSON database in Postgres in one line. https://twitter.com/dosco/status/1401413712842346501
by didip on 6/12/21, 7:36 PM
The issue with Redis is that it's distributed story is not great. I wonder if their Raft experiment is finally GA or not. https://github.com/RedisLabs/redisraft
by gregors on 6/13/21, 3:52 AM
Remember when Circleci kept going down due to weirdness with their queued jobs in their database?
Read here https://status.circleci.com/incidents/8rklh3qqckp1
by phendrenad2 on 6/12/21, 8:02 AM
Wordpress historically didn't need a cache and just used page visits and/or a cron job to kick off automated processes, backed by MySQL. Is it fast? Yes. Is it nearly as fast as Redis? No. Do you need it to be? Not for Wordpress lol
by beforeolives on 6/12/21, 12:14 PM
Can anyone explain the need for redis in a clear way for someone who knows how databases work but isn't a backend developer? What alternatives are there? What did people do to solve the same problem before redis existed?
by gsvclass on 6/12/21, 6:52 PM
I just did a thread on the topic myself https://twitter.com/dosco/status/1402909104012623873
by papito on 6/12/21, 3:05 PM
The world would be a simpler place if people actually knew their storage engine like Postgres. I mean, knew it. All the features and how to leverage them in a performant way. Would eat less cloud energy too.
by xdanger on 6/13/21, 10:33 AM
A great job queue for PostgreSQL running on Node.js https://github.com/graphile/worker
I've been very happy with it.
by posharma on 6/12/21, 8:43 AM
What happened to RabbitMQ? Is that not used as a queue any more?
by flefto on 6/12/21, 10:38 AM
MySQL and Microsoft SQL server also support SKIP LOCKED.
by MichaelMoser123 on 6/12/21, 12:27 PM
it's cool that they added listen notify to postgres; I wonder when they will add many-to-many relationships; my problems with postgress start when i have to do a many to many relationship. For these you need to do an extra table for the link entries; now performance tanks, and you need to denormalise this for read requests, that's where the fun starts...
by qatanah on 6/12/21, 12:01 PM
For simple task queues, yes pg is ok. For high loads, redis is still better as PG generates a lot of WAL & connection overhead.
by busymom0 on 6/13/21, 8:11 AM
How does this compare in terms of performance since Redis keeps everything in memory while Postgres doesn't as far as I know?
by nameless912 on 6/12/21, 1:22 PM
It's always wild when an article on HN has an answer to the exact problem i wanted to solve this weekend. Kudos!
by u678u on 6/12/21, 3:39 PM
Surely redis is trivial to set up. Postgres gives you lots more maintenance and admin headaches.
by ibraheemdev on 6/12/21, 1:44 PM
Okay, but why? Redis is perfect for job queues and pub/sub. Use the right tool for the job.
by sgt on 6/12/21, 9:02 AM
As much as I love Postgres, I would rather use Redis for this. I haven't used Redis much though, and on our project we actually decided on using Kafka. Admittedly much heavier and maintenance intensive, it seems to do the job very well.
Any opinions and experiences here with Kafka vs Redis as a queue?
by iamgopal on 6/12/21, 10:11 AM
Does groupcache for golang works with cloud run ? As a redis replacement ?
by chrisallick on 6/12/21, 2:07 PM
interesting... im in opposite camp. do you really need anything other than key/value store and a data structure? ive used redis exclusively for close to a decade.
by ram_rar on 6/12/21, 7:42 AM
Why do you need postgres, if you can get away with Redis? For all the 3 points mentioned in the article, I would rather just use redis as apposed to postgres. Unless, you really need a rds solution for the project.
by encoderer on 6/12/21, 4:56 PM
Because one can, does not mean one should.
by dave_sid on 6/12/21, 12:58 PM
I’d think you don’t really need redis or elastic search in about 80% of the places you see it. You certainly don’t need the baggage.
by dandarie on 6/12/21, 10:38 AM
How about tags/sets?
by cs02rm0 on 6/12/21, 7:53 PM
I tend to ask the opposite question more often.
Do you really need Postgres?
by sometimesshit on 6/12/21, 5:36 PM
I need an article how to get away with just MangoDB
by axiosgunnar on 6/12/21, 7:38 AM
The point the author is missing is that most people are not deliberately choosing Redis.
They use packages such as celery which happen use Redis under the hood but the user doesn‘t have to deal with these details other than pasting some Redis connection URL somewhere.