from Hacker News

DuckDB 1.0.0

by nnx on 6/3/24, 1:18 PM with 31 comments

  • by gunapologist99 on 6/3/24, 1:49 PM

    > DuckDB Labs, the company that employs DuckDB’s core contributors, has not had any outside investments, and as a result, the company is fully owned by the team. Labs’ business model is to provide consulting and support services for DuckDB, and we’re happy to report that this is going well. With the revenue from contracts, we fund long-term and strategic DuckDB development with a team of almost 20 people. At the same time, the intellectual property in the project is guarded by the independent DuckDB Foundation. This non-profit foundation ensures that DuckDB will be around long-term under the MIT license.

    This seems like an excellent structure for long-term protection of the open source project. What other projects have taken this approach?

  • by nomilk on 6/3/24, 1:44 PM

    Do any data scientists here use duckdb daily? Keen to hear your experiences and comparisons to other tools you used before it.

    I love tools that make life simpler. I've been toying with the idea of storing 1TB of data in S3 and querying it using duckdb on an EC2 instance. That's really old/boring infrastructure, but is hugely appealing to me, since it's so much simpler than what I currently use.

    Would love to hear of others' experiences with duckdb.

  • by mgt19937 on 6/3/24, 2:25 PM

    One cool feature of duckdb is that you can directly run sql against a pandas dataframe/arrow table.[1] The seamless integration is amazing.

    [1]: https://duckdb.org/docs/api/python/overview.html#dataframes

  • by losvedir on 6/3/24, 1:47 PM

    Congrats to the team! I feel like I see lots of posts here on HN and go "wow, I didn't know DuckDB could do that". It seems like a very powerful tool, which I haven't had the pleasure of using yet.

    Due to policies at work it's unlikely we would use this in production, but as I understand it, it's still pretty useful for exploring and poking around local data. Is that right? Does anyone have examples of problems they've used it on to digest local files or logs or something?

  • by bufferoverflow on 6/3/24, 3:51 PM

    Have they fixed the incredibly slow queries on indexed columns?

    https://www.lukas-barth.net/blog/sqlite-duckdb-benchmark/

  • by aranw on 6/3/24, 2:10 PM

    I've been wanting to explore using DuckDB for in-process aggregation and windowing in stream processing with Golang, as I think it would be a great solution.

    Curious if anyone else is using DuckDB for something similar? Does anyone have an example?

  • by netcraft on 6/3/24, 1:37 PM

    I haven't had a chance to really use it yet, but I know duckdb is in my future. I'm looking forward to connecting it to all the different data sources to run analytical queries, plus the support for Parquet.

  • by amath on 6/3/24, 8:56 PM

    This seems like a good model for sustaining open source, but raises some questions.

    Does anybody know how the DuckDB Foundation works? The sponsors are MotherDuck, Voltron, and Posit, which are heavily venture-funded. Do DuckDB Labs employees work on sponsored projects for the foundation?

    I am also curious if anyone can shed light on what kind of contract work DuckDB does to align its work with the open source project. This has always seemed like the holy grail, but it is difficult to do in practice.

  • by art__friedman on 6/4/24, 6:35 AM

    May I ask, what is the value of using DuckDB vs. loading data from Parquet directly into pandas? Apart from the fact that with DuckDB you can load part of the data rather than the entire file into memory?

  • by dnoberon on 6/3/24, 1:35 PM

    Amazing. So glad this project came along!

  • by canadiantim on 6/3/24, 1:47 PM

    Does DuckDB have any graph capabilities?