by bshanks on 1/15/22, 7:11 PM with 59 comments
by sweezyjeezy on 1/17/22, 1:13 PM
Was this just a high-level (possibly misguided) paradigm that the pandas devs fell in love with, or is there a good, performance-related reason to embed it so deeply in the API?
by peatmoss on 1/17/22, 3:34 PM
Ditto the rest of the tidyverse.
by dunefox on 1/17/22, 5:23 PM
Comparison to dplyr, ...: https://dataframes.juliadata.org/stable/man/comparisons/#Com...
Comparison to Pandas: https://dataframes.juliadata.org/stable/man/comparisons/#Com...
by mgradowski on 1/17/22, 1:19 PM
by closed on 1/17/22, 4:07 PM
Siuba has come a long way since I wrote this, and it can now optimize grouped operations to run fast:
* https://github.com/machow/siuba
* https://siuba.readthedocs.io/en/latest/developer/pandas-grou...
by psimm on 1/17/22, 10:29 PM
As others have said, escaping pandas is hard. Many visualization, data manipulation, validation, and analysis libraries expect pandas input.
Siuba is really cool in that it offers a convenient syntax on top of pandas (and SQL databases) without requiring its own data format.
by lysecret on 1/17/22, 1:04 PM
  out_rec = []
  for group_id, group in data_frame.groupby("id"):
      # ladidida.... (other per-group work goes here)
      result = f(group)
      out_rec.append(result)
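A self-contained sketch of the loop above, for comparison. The toy `data_frame` and the per-group function `f` are hypothetical stand-ins (the comment doesn't show them); the last step computes the same result with `groupby(...).apply`, so the two approaches can be checked against each other or timed.

```python
import pandas as pd

# Hypothetical toy data; the original comment's data_frame is not shown.
data_frame = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "value": [10, 20, 1, 2, 3],
})

def f(group: pd.DataFrame) -> pd.Series:
    # Hypothetical per-group function: summarize one group into one row.
    return pd.Series({"n": len(group), "total": group["value"].sum()})

# Manual loop: iterate over (key, sub-DataFrame) pairs and collect results.
out_rec = []
keys = []
for key, group in data_frame.groupby("id"):
    keys.append(key)
    out_rec.append(f(group))
looped = pd.DataFrame(out_rec)
looped.index = pd.Index(keys, name="id")

# Built-in equivalent: apply calls f once per group and combines the rows.
# Selecting [["value"]] keeps the grouping column out of each group.
applied = data_frame.groupby("id")[["value"]].apply(f)
```

Both produce one row per `id` with columns `n` and `total`. The loop trades a little convenience for full control over what happens per group (logging, early exits, mixed return types).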
In my experience it isn't much slower than a groupby.apply.
by armanboyaci on 1/17/22, 7:23 PM
  (user_courses
   .set_index(["student_id", "course_id"])
   .unstack()
   .apply(lambda x: x + 1))
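The chain above reshapes long-format rows into a wide table. A minimal sketch with a hypothetical `user_courses` frame (the `score` column and the data are assumptions; the comment doesn't show them) makes each step visible and checks that `apply` with an elementwise lambda matches plain `+ 1`:

```python
import pandas as pd

# Hypothetical long-format table: one row per (student, course) pair.
user_courses = pd.DataFrame({
    "student_id": [1, 1, 2],
    "course_id": ["math", "art", "math"],
    "score": [90, 75, 80],
})

wide = (user_courses
        .set_index(["student_id", "course_id"])  # two-level row index
        .unstack()                               # move course_id into the columns
        .apply(lambda x: x + 1))                 # lambda runs once per column

# Student 2 never took "art", so unstack fills that cell with NaN,
# which also makes the columns float64.

# The lambda is elementwise, so vectorized arithmetic gives the same frame.
vectorized = (user_courses
              .set_index(["student_id", "course_id"])
              .unstack() + 1)
```

For elementwise work like this, the vectorized form is usually the simpler choice; `apply` earns its place when the function genuinely needs a whole column at a time.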
by tpoacher on 1/17/22, 10:35 PM
(No disrespect to the package in the article, or to OP, who I know is active in this thread. Just a general motif that I keep coming across in Python.)
by usermi on 1/17/22, 2:57 PM
by mint2 on 1/16/22, 4:29 PM
Port the functionality of the R package, but try to keep it idiomatic Python. Run flake8.
by bobolito on 1/17/22, 2:19 PM