from Hacker News

Building Open Source Real-Time Data Replication in Go for MongoDB -> Iceberg

by pkhodiyar on 1/15/25, 7:14 AM with 1 comment

  • by pkhodiyar on 1/15/25, 7:14 AM

    When building OLake, our goal was simple: build the fastest database-to-lakehouse data pipeline, starting with Apache Iceberg.

    Check out the GitHub repository for OLake: https://github.com/datazip-inc/olake

    Over time, many of us who’ve worked with data pipelines have dealt with the toil of building one-off ETL scripts, battling performance bottlenecks, or worrying about vendor lock-in.

    With OLake, we wanted a clean, open-source solution that solves these problems in a straightforward, high-performing manner.

    In this post, I'll walk you through the architecture of OLake: how we capture data from MongoDB, write it to S3 in Apache Iceberg (or other lakehouse formats), and handle everything from schema evolution to high-volume parallel loads.