from Hacker News

Apache Arrow Flight: A Framework for Fast Data Transport

by stablemap on 10/15/19, 4:20 PM with 21 comments

  • by fulafel on 10/16/19, 5:24 AM

    It's interesting how much faster your laptop SSD is compared to these high-end, performance-oriented systems, keeping in mind that the localhost/TLS-disabled number is an upper bound. (Not singling out Arrow by any means; most others are slower.)

    I wonder which came first: the petering out of wired network hardware performance improvements, or the software bottlenecks that become obvious when we try to use today's faster networks. 100 Mb Ethernet came in 1995, GigE in 1999, and 10 GigE in 2002, each gaining adoption within a few years. On that track we should have had 100 GigE in 2006 and seen it in servers by 2008 and workstations by 2010, and switches/routers should have reached terabit Ethernet by 2010. Today's servers (X) seem to be at about 25 GbE, and with multicore that's just 1-2 gigabits per core.

    (X) according to https://www.supermicro.com/products/system/1U/
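
    A quick sketch of the arithmetic above, assuming a 25 Gbit/s NIC shared evenly across 12 to 24 cores (illustrative core counts, not taken from the linked page) and a "10x roughly every 4 years" trend read off the dates in the comment. The function names are hypothetical.

```python
def per_core_gbps(nic_gbps: float, cores: int) -> float:
    """Bandwidth each core gets if the NIC is shared evenly (assumption)."""
    return nic_gbps / cores


def projected_speed(start_mbps: float, start_year: int, year: int, years_per_10x: float) -> float:
    """Ethernet speed in Mbit/s if the historical 10x-per-period trend had continued."""
    return start_mbps * 10 ** ((year - start_year) / years_per_10x)


# 25 GbE split across a few plausible core counts lands at roughly 1-2 Gbit/s per core.
for cores in (12, 16, 24):
    print(f"{cores} cores: {per_core_gbps(25, cores):.2f} Gbit/s per core")

# Extrapolating from 10 GigE (10,000 Mbit/s) in 2002 at 10x every 4 years.
for year in (2006, 2010):
    print(f"{year}: {projected_speed(10_000, 2002, year, 4):,.0f} Mbit/s")
```

    With these assumptions the extrapolation gives 100 Gbit/s in 2006 and 1 Tbit/s in 2010, matching the timeline in the comment.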

  • by jumpingmice on 10/15/19, 6:21 PM

    More people should try high-performance services with non-traditional protobuf implementations. The fact that every language has a generated parser in no way precludes you from parsing messages yourself. Hand-rolled serialization of your outbound messages can also be really fast, and the C++ gRPC stack will happily accept preformatted messages and put them on the wire. Finally, the existence of gRPC itself should not stop you from implementing the entire protocol yourself. It's just HTTP/2 with conventional headers.
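
    To make the hand-rolled serialization point concrete, here is a minimal sketch of the protobuf wire format for a hypothetical `message Ticket { uint64 id = 1; string name = 2; }`. Each field is a varint key `(field_number << 3) | wire_type` followed by either a varint value (wire type 0) or a length-prefixed byte string (wire type 2). It handles only those two wire types and non-negative integers; `encode_ticket` and the message definition are illustrative, not from any real schema.

```python
WIRE_VARINT = 0  # int32/int64/uint64/bool/enum
WIRE_LEN = 2     # string/bytes/embedded messages


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set when more bytes follow."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    """Field key = field number shifted left 3 bits, OR'd with the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_ticket(ticket_id: int, name: str) -> bytes:
    """Hand-serialize the hypothetical Ticket message, fields in number order."""
    name_bytes = name.encode("utf-8")
    return (
        field_key(1, WIRE_VARINT) + encode_varint(ticket_id)
        + field_key(2, WIRE_LEN) + encode_varint(len(name_bytes)) + name_bytes
    )


print(encode_ticket(150, "hi"))
```

    `encode_ticket(150, "hi")` yields `b'\x08\x96\x01\x12\x02hi'`: key `0x08` (field 1, varint), `0x96 0x01` (150), key `0x12` (field 2, length-delimited), length 2, then the UTF-8 bytes.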

  • by wodenokoto on 10/16/19, 6:11 AM

    A bit off topic, but since this is implemented using gRPC, I'd like to ask: what is RPC, and how does one make a (g)RPC call?

    My understanding is that it's a binary alternative to JSON/REST APIs and that all Google Cloud Platform services use it. However, since I haven't managed to make a single RPC call against GCP (or any other service), I wonder whether my understanding is completely wrong.
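
    In short, RPC (remote procedure call) means invoking a function on another machine as if it were local, and gRPC is one such framework. Normally you never build requests by hand: you call client stubs generated from a `.proto` file, or a vendor client library that wraps them. On the wire, as the earlier comment says, a gRPC call is an HTTP/2 POST whose `:path` names the service and method, with a body of length-prefixed protobuf messages (a 1-byte compressed flag, then a 4-byte big-endian length). The following sketch only builds those bytes and headers and does not open a connection. It targets Arrow Flight's `DoGet` method on `arrow.flight.protocol.FlightService`; the `localhost:8815` authority and the `abc` ticket are hypothetical.

```python
import struct
from typing import List, Tuple


def grpc_frame(message: bytes, compressed: bool = False) -> bytes:
    """Length-prefixed message: 1-byte compressed flag + 4-byte big-endian length + payload."""
    return struct.pack(">BI", 1 if compressed else 0, len(message)) + message


def grpc_unframe(data: bytes) -> List[bytes]:
    """Split a well-formed body back into its messages (no error handling)."""
    messages = []
    offset = 0
    while offset < len(data):
        _flag, length = struct.unpack_from(">BI", data, offset)
        offset += 5  # skip the 5-byte prefix
        messages.append(data[offset:offset + length])
        offset += length
    return messages


def grpc_request_headers(service: str, method: str, authority: str) -> List[Tuple[str, str]]:
    """HTTP/2 request headers for a gRPC call: POST to /<service>/<method>."""
    return [
        (":method", "POST"),
        (":scheme", "http"),
        (":path", f"/{service}/{method}"),
        (":authority", authority),
        ("content-type", "application/grpc"),
        ("te", "trailers"),
    ]


# Flight's Ticket message is `bytes ticket = 1;`. Serialized by hand, a ticket of
# b"abc" is key 0x0A (field 1, wire type 2), length 0x03, then the raw bytes.
ticket_msg = b"\x0a\x03abc"
body = grpc_frame(ticket_msg)
headers = grpc_request_headers("arrow.flight.protocol.FlightService", "DoGet", "localhost:8815")
print(headers, body)
```

    An HTTP/2 client would send these headers and body as one stream. The server replies with framed messages of its own and ends the stream with trailers such as `grpc-status`.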

  • by riboflavin on 10/15/19, 6:59 PM

  • by algorithmsRcool on 10/16/19, 12:32 AM

    Are there any thoughts about where compression fits into this model?

    I know networks are getting very fast, but with data of this size I wonder whether there are realizable gains left with modern algorithms like Snappy.
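
    Whether compression pays off depends on the compression ratio and on codec throughput relative to link speed. Here is a rough model as a sketch; zlib from the standard library stands in for Snappy, and the throughput figures are made-up parameters, not measurements. It assumes compression, transfer, and decompression run back to back with no overlap, which is pessimistic for pipelined, multi-core implementations.

```python
import os
import struct
import zlib


def compression_ratio(buf: bytes, level: int = 1) -> float:
    """Compressed size / raw size, using zlib as a stand-in for a fast codec like Snappy."""
    return len(zlib.compress(buf, level)) / len(buf)


def worth_compressing(ratio: float, link_gbps: float, codec_gbps: float) -> bool:
    """True if compress + send + decompress beats sending raw (serial stages, per bit).

    raw cost        = 1 / link
    compressed cost = 1 / codec (compress) + ratio / link (send) + 1 / codec (decompress)
    """
    raw = 1 / link_gbps
    compressed = 2 / codec_gbps + ratio / link_gbps
    return compressed < raw


# A low-cardinality int64 column (e.g. a status code) compresses very well...
column = struct.pack("<100000q", *(i % 4 for i in range(100_000)))
# ...while random bytes barely compress at all.
noise = os.urandom(len(column))

print(f"column ratio: {compression_ratio(column):.4f}")
print(f"noise ratio:  {compression_ratio(noise):.4f}")
```

    Under this model a 0.5 ratio with a 20 Gbit/s codec pays off on a 1 Gbit/s link but loses on a 10 Gbit/s link, which is why the answer depends heavily on both the data and the network.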

  • by RocketSyntax on 10/15/19, 6:43 PM

    We are struggling with reliability when using mount-based solutions to access big data in S3. Would this help?

  • by maximente on 10/15/19, 6:19 PM