by hyzyla on 6/12/21, 6:57 AM with 472 comments
by _ugfj on 6/12/21, 7:29 AM
1. SELECT item_id WHERE expire = 0. If this is empty, no items are available.
2. UPDATE SET expire = some_future_time WHERE item_id = $selected_item_id AND expire = 0. Then check whether UPDATE affected any rows. If it did, item_id is yours. If not, loop. If the database has a sane optimizer it'll note at most one document needs locking as the primary id is given.
All this needs is a very weak property: document level atomic UPDATE which can return whether it changed anything. (How weak? MongoDB could do that in 2009.)
Source code at https://git.drupalcode.org/project/drupal/-/blob/9.2.x/core/... (We cooked this up for Drupal in 2009 but I am reasonably sure we didn't invent anything new.)
Of course, this is not the fastest job queue there is but it is quite often good enough.
by taf2 on 6/12/21, 2:29 PM
by nickjj on 6/12/21, 10:13 AM
For example Python has Celery and Ruby has Sidekiq. As far as I know there's no libraries in either language that has something as battle hardened with comparable features for background tasks using Postgres as a backend.
There's a big difference between getting something to work in a demo (achievable by skimming PG's docs and rolling your own job queue) vs using something that has tens of thousands of hours of dev time and tons of real world usage.
I'm all for using PG for things like full text search when I can because it drastically reduces operation complexity if you can avoid needing to run Elasticsearch, but Redis on the other hand is a swiss army knife of awesome. It's often used for caching or as a session back-end so you probably have it as part of your stack already. It's also really easy to run, uses almost no resources and is in the same tier as nginx in terms of how crazy efficient it is and how reliable it is. I don't see not using Redis for a job queue as that big of a win.
by pilif on 6/12/21, 8:43 AM
by petepete on 6/12/21, 7:08 AM
The one feature of Redis I'd love to have supported in PostgreSQL is to be able to set a TTL on a record. On a recent project where we could only save personal data for so long, it was a must have feature so we had to use Redis for that purpose instead.
by Aeolun on 6/12/21, 7:40 AM
Any time someone says ‘this works for small workloads’ I hear, ‘we’ll eventually have to migrate this’.
That’s not a problem if you are talking about introducing a hugely complex piece of infrastructure, but Redis is hardly that.
by truth_seeker on 6/12/21, 1:29 PM
But if you must use disk based table for Job queueing, set fillfactor = 50. This takes care of heavy updates.
Indexes are helpful but costs memory and CPU, so always make sure you partition the table based on job_type for performant pending job query.
I wouldn't recommend using LISTEN/NOTIFY unless you are okay with "at most once" semantics. I have used disk table based approach for PUB/SUB to replace Kafka. More fine tuned approach will also allow (job_type, consumer_group, publisher) as a partition key.
Ref - https://www.postgresql.org/docs/current/sql-createtable.html
by simonw on 6/12/21, 2:35 PM
Brandur wrote a great article about that here: https://brandur.org/postgres-queues
by Rygu on 6/12/21, 10:08 AM
More: https://use-the-index-luke.com/sql/where-clause/partial-and-...
by arpa on 6/12/21, 7:20 AM
by hardwaresofton on 6/12/21, 9:45 AM
* Every database a Postgres 1: Key/Value store
* Every database a Postgres 2: Document stores
* Every database a Postgres 3: Logs (Kafka-esque)
* Every database a Postgres 4: Timeseries
* Every database a Postgres 5: Full Text Search
* Every database a Postgres 6: Message Queues
Low key, you could make almost every single type of database a modern startup needs out of Postgres, and get the benefits (and drawbacks) of Postgres everywhere.
Should you do it? Probably not. Is it good enough for a theoretical ~70% of the startups out there who really don't shuffle around too much data or need to pretend to do any hyper scaling? Maybe.
If anyone from 2ndQuadrant/Citus/EDB see this, please do a series like this, make the solutions open source, and I bet we'd get some pretty decent performance out of Postgres compared to the purpose built solutions (remember, TimescaleDB did amazing compared to InfluxDB, a purpose built tool, not too long ago).
New features like custom table access methods and stuff also shift the capabilities of Postgres a ton. I'm fairly certain I could write a table access method that "just" allocated some memory and gave it to a redis subprocess (or even a compiled-in version) to use.
[EDIT] - It's not clear but the listing is in emacs org mode, those bullet points are expandable and I have tons of notes in each one of these (ex. time series has lots of activity in postgres -- TimescaleDB, native partitioning, Citus, etc). Unfortunately the first bullet point is 43 (!) bullet points down. If someone wants to fund my yak shaving reach out, otherwise someone signal boost this to 2Q/Citus/EDB so professionals can take a stab at it.
[EDIT2] - I forgot some, Postgres actually has:
- Graph support, w/ AgensGraph now known as AGE[0]
- OLAP workloads with Citus Columnar[1] (and zedstore[2]).
[1]: https://www.citusdata.com/blog/2021/03/05/citus-10-release-o...
by mosselman on 6/12/21, 7:45 AM
by deckard1 on 6/12/21, 9:27 AM
Using a DB as a queue has been a thing for a very long time. Every billing system I've seen is a form of a queue: at a certain point in the month a process kicks off that scans the DB and bills customers, marking their record as "current".
The challenge is always going to be: what if the worker dies. What if the worker dies, the job is re-ran, and the customer is billed twice. Thank god it's been many years since I've had to touch cron batch jobs or queue workers. The thought of leaving the office knowing some batch job is going to run at 3am and the next morning might be total chaos... shudder.
by BiteCode_dev on 6/12/21, 9:18 AM
It's so simple any little script can interact with it in seconds, instead of having to craft complex SQL or import your wrapper. You can call redis from the weirdest places, like from the inside of an nginx config file. You needn't even a compiled driver if that matters.
It's easier to get it to perform. It's possible to get great perfs with PostgreSQL, but you just got them for free with redis. Very hard to screw up. Read, write ? N+1 ? Batching ? Who cares, it will be fast no matter the concurrency or the load.
Sure you can expire with Postgres, but having done so in practice, it's way harder than it looks to get right. With redis, you don't have to care. Set a cache, set a lock with a timeout, store data your are not sure you need. It will all disapear.
Redis is not just key/value. You have sets, so ensuring unicity is a piece of cake, no constraints to define, then insertion to check against. And of course, you have sorted sets, which you are kept ordered at insertion by any number you pair the value with, such as a timestamp/score/index, and truncate by range for extra magic.
And then you have bonuses like hyperloglog which you need an extension for in Posgres. Super handy for stats.
Finally, you have streams, which for most apps will fill the gap for a timeserie database or a way to store your app logs. All that with a 2ms latency at 10k/s requests. None of my projects ever needed more than 1K/s though, even one with 1M users/month.
You have all of that with a dead simple install and basically no maintenance.
In fact, redis by itself consume almost no resource, it's not even worth it to not have it in your toolset. I just install it by default on all my projects: I never regretted it, there is always something it can do for you. It not now, just wait a bit, it's costing you nothing, and something will come up.
So no, let use Postgres for what it's great at, which is being a robust all purpose database. Redis is a fantastic complement to it, not a competitor, just use both.
Unless you are google size, there are little chances you will reach a stage where you need to migrate from any of them.
It's part of those tech that are just too good to be true, like SQLite or Python.
My only regret is that it doesn't exist on windows.
P.S: if you need caching and can't afford redis on a small python script, use Disk Cache, it's awesome: http://www.grantjenks.com/docs/diskcache/index.html
by skunkworker on 6/12/21, 7:52 AM
With Postgres you also need to worry about high churn, especially since you are creating/locking/deleting rows constantly. This can be alleviated through a variety of means, of which personally I would use per-day table partitioning and truncate older partitions on a cron, not to mention the sharp increase in database connections to the host now required.
Ignoring the literal elephant in the room of synced writes to the store. Redis can be used quite effectively in a blocking manner with RPOPLPUSH/LMOVE(6.2+) for a reliable queue, allowing an item to not be lost because atomically the pop and push from two different lists are done together.
[1] https://github.com/que-rb/que [2] https://gist.github.com/chanks/7585810
by osigurdson on 6/12/21, 2:32 PM
Of course, if scalability is not a concern (or the connection limitations are eventually fixed in postgres), this would be a very viable approach.
by fasteo on 6/12/21, 9:53 AM
We have literally processed tens of billions of jobs without a single failure.
Old and simple technology that just works
by dutchbrit on 6/12/21, 11:47 AM
Obviously depends on the scale and needs of a project, Postgres etc is fine for simple queues.
I often see people waste unnecessary time by writing their own complex solutions, resulting in increasing technical debt, when you already have perfectly suitable open source options available that do a better job..
by cpursley on 6/12/21, 8:28 AM
You can also listen to Postgres WAL changes to build an Event like system with Elixir. Supabase is doing this with their Realtime app.
by chmod775 on 6/12/21, 7:07 AM
by leetrout on 6/12/21, 12:06 PM
Postgres is awesome but logging and queuing in a table can cause gotchas you won’t have with redis.
by yukinon on 6/12/21, 8:03 AM
That being said, I've found Redis largely a pleasure to work with for these use cases and don't really see a real incentive to changing my current approach.
by antihero on 6/12/21, 9:13 AM
by mxyzptlk on 6/12/21, 2:49 PM
by jdlyga on 6/12/21, 5:01 PM
by jugg1es on 6/12/21, 6:15 PM
I've had so many horrible experiences with DB-based queuing over my career once you get to a certain scale. Just use a message bus, bro!
by fiznool on 6/12/21, 6:13 PM
That being said, I agree with other comments that this is somewhat swimming against the tide. Redis is much more commonly used, and so if you don’t mind adding another service into the mix, I’d probably recommend going with Redis instead.
by gsvclass on 6/12/21, 6:54 PM
by larodi on 6/12/21, 10:16 AM
the point (ok, one point of many) of REDIS is that it is not the main DB so you can have a sense of security and decoupling in the architecture. Besides - there is no silver bullet for all things. While you can have your app do everything with PostgreSQL (and much more with Oracle, something people dislike it about), the fact itself does not mean is a good design decision or is a more stable decision.
Because when you have redis for sessions, kafka for event streams, postgre (or else) for data storage - you have components that can fail separately and thus the system degrades gracefully.
by manishsharan on 6/12/21, 12:11 PM
by void_mint on 6/12/21, 7:24 PM
Hopefully the emphasis is clear why this is silly.
by avinassh on 6/12/21, 8:03 AM
references:
0 - https://aphyr.com/posts/283-jepsen-redis
1 - https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...
by bob1029 on 6/12/21, 3:57 PM
Do you really need PostgreSQL? How to Get Away with Just SQLite
How many apps are running pg on a single node and don't really have plans to scale beyond this?Exclusive ownership of a SQLite database by the application can deliver better performance than exclusive ownership over a single node postgres database. This also extends to SQL Server, DB2, Oracle, or any other hosted solution.
by Xophmeister on 6/12/21, 1:40 PM
by markus_zhang on 6/12/21, 6:26 PM
by zomgwat on 6/12/21, 10:14 PM
by munro on 6/12/21, 9:47 PM
by dvfjsdhgfv on 6/12/21, 8:43 AM
by gchamonlive on 6/12/21, 3:49 PM
How sensitive is your app to latency? How much data and request volume you need to handle?
Do proof of concepts, write thorough load tests and go for what makes sense.
Either way, no matter which tech you choose, make sure monitoring and alarms are in place and that you do regular maintenance exercises.
by gsvclass on 6/12/21, 6:53 PM
by didip on 6/12/21, 7:36 PM
by gregors on 6/13/21, 3:52 AM
Read here https://status.circleci.com/incidents/8rklh3qqckp1
by phendrenad2 on 6/12/21, 8:02 AM
by beforeolives on 6/12/21, 12:14 PM
by gsvclass on 6/12/21, 6:52 PM
by papito on 6/12/21, 3:05 PM
by xdanger on 6/13/21, 10:33 AM
I've been very happy with it.
by posharma on 6/12/21, 8:43 AM
by flefto on 6/12/21, 10:38 AM
by MichaelMoser123 on 6/12/21, 12:27 PM
by qatanah on 6/12/21, 12:01 PM
by busymom0 on 6/13/21, 8:11 AM
by nameless912 on 6/12/21, 1:22 PM
by u678u on 6/12/21, 3:39 PM
by ibraheemdev on 6/12/21, 1:44 PM
by sgt on 6/12/21, 9:02 AM
Any opinions and experiences here with Kafka vs Redis as a queue?
by iamgopal on 6/12/21, 10:11 AM
by chrisallick on 6/12/21, 2:07 PM
by ram_rar on 6/12/21, 7:42 AM
by encoderer on 6/12/21, 4:56 PM
by dave_sid on 6/12/21, 12:58 PM
by dandarie on 6/12/21, 10:38 AM
by cs02rm0 on 6/12/21, 7:53 PM
Do you really need Postgres?
by sometimesshit on 6/12/21, 5:36 PM
by axiosgunnar on 6/12/21, 7:38 AM
They use packages such as celery which happen use Redis under the hood but the user doesn‘t have to deal with these details other than pasting some Redis connection URL somewhere.