from Hacker News

GitLab is working on a tool just for data teams

by TheMissingPiece on 8/1/18, 9:56 PM with 94 comments

  • by slap_shot on 8/1/18, 11:44 PM

    This looks like an amalgamation of 8+ open source projects or industries with products put forth by companies that have dozens of employees and worked on their products for years.

    It also doesn't even categorize the products they compete with correctly[0].

    Why not contribute some of your resources to one of the many active open source libraries already trying to solve some of these problems, and focus your engineering efforts on your core product?

    [0] Fivetran is only listed under "Orchestrate" but actually competes directly with Alooma in Extract and Load. Also, there are DOZENS of companies in that space. https://gitlab.com/meltano/meltano/blob/master/README.md#dat...

  • by cheghook on 8/2/18, 12:20 AM

    I can't understand why GitLab thinks they have to embark on a new project every so often instead of focusing on their current product and features. There is a lot left to work on; many of the current features/products are half-assed. At my place we moved to GitLab 2.5 years ago, and updates were smoother back then. In the past few months, though, we had to hire a new sysadmin for our build machines and GitLab server to follow new issues created on GitLab.com and decide whether it's safe to update, and even then he still reports 4-5 issues to GitLab support after every update. We were expecting it to be an easy `yum update` like a normal package, but it's getting worse update after update. It's so bad that my manager asked me to look into GitHub + another CI/CD solution.
  • by georgewfraser on 8/2/18, 6:29 AM

    Data pipelines are not a great subject for an open-source project. We've been building these for the last 3+ years at Fivetran, and I can tell you that the challenge is:

      - Studying each source to figure out the right data model
      - Chasing down a million weird corner cases
      - Working around dumb bugs in the data sources
    
    This is the kind of problem where paying for software really works better. When people build data pipelines in-house, they tend to hack at it until it works for their use case and then stop. When we build data pipelines, we map out every feature of the data source, implement the whole thing at once, and then put it through a beta period with multiple real users. This is easy to do when you have a tight-knit dev team; much harder for a group of part-time open-source contributors.
  • by tbrock on 8/2/18, 11:19 AM

    I wish they would focus on making a fast, stable, GitHub alternative.
  • by n42 on 8/2/18, 12:53 AM

    Is there any example of an open source software company that has taken on so many products at once, so early in its life, and succeeded?
  • by veritas3241 on 8/1/18, 11:31 PM

    Taylor from GitLab here! Happy to answer any questions about what we're doing.
  • by _pmf_ on 8/2/18, 8:36 AM

    GitLab's usage of team members in marketing material is creeping me out (as does the whole team page[0]).

    [0] https://about.gitlab.com/team/

  • by ageofwant on 8/1/18, 11:05 PM

    https://quiltdata.com/ ticks a lot of boxes in this space for me.
  • by danpalmer on 8/2/18, 8:57 AM

    Reading this I was concerned that it would be written in Ruby. While Ruby is a reasonable language for server development, it has almost no data science community when compared with some other ecosystems.

    I was very glad to see this is Python! Python has some of the best data tools out there, and a mature ecosystem for solving all the engineering problems that go along with a great data stack.

  • by tamersalama on 8/2/18, 5:16 AM

    Is there some resemblance to FloydHub (http://floydhub.com/)?
  • by Luuseens on 8/2/18, 10:06 AM

    The page mentions MVC, and the issue page[0] keeps mentioning MVC as well. Was this supposed to be MVP, or something else? Model-view-controller doesn't make sense in this context.

    [0] https://gitlab.com/meltano/meltano/issues/10

  • by ajbosco on 8/2/18, 1:51 PM

    Do you see this as a (future) competitor to Airflow/Luigi-type workflow tools?
  • by hn_throwaway_99 on 8/1/18, 11:20 PM

    I'd be interested to know all the competitors in this space. https://data.world/ is the one I am most familiar with.
  • by gandutraveler on 8/2/18, 6:57 AM

    Looks like GitLab just wants to be in the news since Microsoft's acquisition of GitHub.
  • by sbr464 on 8/2/18, 9:00 AM

    Are you releasing/sharing any of the extractors you built for various services?