from Hacker News

Show HN: AWS S3 Connector with DuckDB – Query AI/ML Batch Results Directly in S3

by nagstler on 7/5/24, 3:24 AM with 2 comments

[ https://github.com/Multiwoven/multiwoven ]

Multiwoven, our Open Source alternative to Hightouch, Census and RudderStack, has always been about making data available where it's needed.

We've added a new AWS S3 connector as a data source to Multiwoven. This data source connector has been a highly requested feature from the community.

We believe we've not only added AWS S3 as a data source, but also optimized the performance of querying data stored in S3 buckets.

We've integrated DuckDB, an in-memory analytical database, to provide fast and efficient SQL query execution on large datasets directly in S3.

-> Features:

1. IAM and Role-based Access - Securely connect to AWS S3 buckets using IAM or role-based permissions.

2. File Format Support - Native support for CSV and Parquet file formats.

3. DuckDB Powered Performance - Utilizes DuckDB, an in-memory analytical database, for fast and efficient SQL query execution on large datasets directly in S3.

4. Native SQL Interface - Execute SQL queries directly on data stored in S3 buckets, eliminating the need for intermediate scripting steps or data movement to a separate database.
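
To give a rough idea of what querying S3 files with DuckDB can look like, here is a minimal Python sketch. It is not Multiwoven's connector code. The helper names (source_expr, build_query, run_query), the bucket path, and the region are hypothetical. It assumes the duckdb Python package and DuckDB's httpfs extension, and it leaves credential setup out, so as written run_query would only reach objects that are readable without credentials.

```python
def source_expr(s3_uri: str) -> str:
    """Pick a DuckDB table function based on the file extension (CSV or Parquet)."""
    safe_uri = s3_uri.replace("'", "''")  # escape single quotes for the SQL literal
    lowered = s3_uri.lower()
    if lowered.endswith(".parquet"):
        return f"read_parquet('{safe_uri}')"
    if lowered.endswith(".csv"):
        return f"read_csv_auto('{safe_uri}')"
    raise ValueError(f"unsupported file format: {s3_uri}")


def build_query(s3_uri: str, where: str = "", limit: int = 100) -> str:
    """Build a SELECT that reads the S3 object in place.

    `where` is inserted as-is, so it must come from a trusted source.
    """
    sql = f"SELECT * FROM {source_expr(s3_uri)}"
    if where:
        sql += f" WHERE {where}"
    return f"{sql} LIMIT {int(limit)}"


def run_query(s3_uri: str, region: str, where: str = "", limit: int = 100):
    """Run the query with DuckDB. Needs network access and `pip install duckdb`."""
    import duckdb  # imported lazily so the query builders work without it

    con = duckdb.connect()          # database lives inside this process
    con.execute("INSTALL httpfs")   # extension that adds s3:// support
    con.execute("LOAD httpfs")
    safe_region = region.replace("'", "''")
    con.execute(f"SET s3_region = '{safe_region}'")
    # Assumption: credentials for private buckets are configured separately.
    return con.execute(build_query(s3_uri, where, limit)).fetchall()


if __name__ == "__main__":
    # Hypothetical path to ML batch predictions stored as Parquet.
    print(build_query("s3://example-bucket/preds/batch.parquet",
                      where="score > 0.9", limit=10))
```

Calling run_query("s3://example-bucket/preds/batch.parquet", "us-east-1", where="score > 0.9") would then return the matching rows as a list of tuples, without copying the file into a separate database first.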

-> Use Cases:

* Query and Transform - Convert ML model batch results stored in S3 buckets into actionable insights.

* Sync Data - Sync log data or event streams from S3 to business applications like Salesforce, Google Sheets, or other destinations for real-time analytics.

[ https://github.com/Multiwoven/multiwoven ]

Refer to our GitHub repository for more information & hit the star button to show your support :)

  • by 8organicbits on 7/5/24, 4:34 AM

    Isn't DuckDB an in-process database, not an in-memory database?