by billwashere on 10/8/23, 9:43 AM with 71 comments
by samokhvalov on 10/8/23, 4:50 PM
Indeed, looking at the benchmark source code (thanks for providing it!), it completely lacks index for the native case, leading to a false statement the that native full-text search indexes Postgres provides (usually GIN indexes on tsvector columns) are slow.
https://github.com/paradedb/paradedb/blob/bb4f2890942b85be3e... – here the tsvector is being built. But this is not an index. You need CREATE INDEX ... USING gin(search_vector);
This mistake could be avoided if bencharks included query plans collected with EXPLAIN (ANALYZE, BUFFERS). It would quickly become clear that for the "native" case, we're dealing with SeqScan, not IndexScan.
GINs are very fast. They are designed to be very fast for search – but they have a problem with slower UPDATEs in some cases.
Another point, fuzzy search also exists, via pg_trgm. Of course, dealing with these things require understanding, tuning, and usually a "lego game" to be played – building products out of the existing (or new) "bricks" totally makes sense to me.
by retakeming on 10/8/23, 5:11 PM
pg_bm25 is our first step in building an Elasticsearch alternative on Postgres. We built it as a result of working on hybrid search in Postgres and becoming frustrated with Postgres' sparse feature set when it comes to full text search.
To address a few of the discussion points, today pg_bm25 can be installed on self-hosted Postgres instances. Managed Postgres providers like RDS are pretty restrictive when it comes to the Postgres extension ecosystem, which is why we're currently working on a managed Postgres database called ParadeDB which comes with pg_bm25 preinstalled. It'll be available in private beta next week and there's a waitlist on our website (https://www.paradedb.com/).
by phamilton on 10/8/23, 3:33 PM
My understanding of the spirit of the license is that it should be fine as long as modifications are made available. Anyone know of any existing extensions in RDS that are AGPL?
by hardwaresofton on 10/8/23, 2:47 PM
Awesome to see so many high quality extensions come out of it.
by wkoszek on 10/8/23, 7:20 PM
This was the top reason that made us (Segmed.ai) give up on PostgreSQL FTS -- our folks require a very exact count of matches for medical conditions that are present in 20M reports. And doing COUNT() in PostgreSQL was crazy, crazy slow. If your extension could do simple len(invertedindex[word]) that would already be a great improvement.
ELK has it immediately, but at a cost of being one more thing to maintain, and the whole Logstash thing is clunky. I'd love to use FTS inside of PostgreSQL.
by machty on 10/11/23, 11:18 PM
by mkleczek on 10/8/23, 2:29 PM
by iamdanieljohns on 10/8/23, 7:13 PM
by ckok on 10/9/23, 8:30 PM
by antman on 10/8/23, 2:25 PM
by tristan957 on 10/9/23, 1:27 AM
by rawsh on 10/8/23, 6:56 PM
by anon373839 on 10/8/23, 10:58 PM
by est on 10/8/23, 3:28 PM
by aiunboxed on 10/8/23, 2:56 PM
by canadiantim on 10/8/23, 3:04 PM
by eclectic29 on 10/8/23, 9:56 PM
by mugivarra69 on 10/8/23, 4:37 PM
by stopman on 10/8/23, 4:51 PM