from Hacker News

Pandas on Ray – Early Lessons from Parallelizing Pandas

by xmo on 7/7/18, 12:04 PM with 19 comments

  • by miggyrozay on 7/8/18, 2:03 AM

    How does this compare to dask.distributed? Dask dataframes are also a wrapper around the pandas API.

    Edit: They explain the differences in a section of this blog post: https://rise.cs.berkeley.edu/blog/pandas-on-ray/

  • by innagadadavida on 7/7/18, 11:55 PM

    Does anyone here know if Ray is some sort of Yarn competitor? If not, what problem space is it in?
  • by kgos on 7/7/18, 6:57 PM

  • by axiom92 on 7/8/18, 1:55 AM

    This could be really helpful for implementations that were written with relatively small datasets in mind but now need to be scaled up. However, for someone starting from scratch, it is not clear what advantages it offers over Spark used with the DataFrame API.
  • by rmbeard on 7/8/18, 12:02 AM

    Unclear what this is good for.
  • by guard0g on 7/8/18, 2:43 AM

    This looks interesting. Thanks for sharing; I will have my DS team try it out.