by 0xedb on 6/27/24, 8:46 PM with 23 comments
by galkk on 6/27/24, 11:39 PM
Most recent example - converting huge amount of xml files to parquet. I started very fast with python + pyarrow, but when I realized that parallelizing execution would help enormously, I hit GIL or picking/unpickling/multiprocessing costs.
It did work in python, in the end, but I feel that writing that in Rust/C# (even if I don't know Rust besides tutorials) in the end would be much more performant.
by bin_bash on 6/28/24, 2:30 AM
by gumby on 6/27/24, 10:05 PM
by Talinx on 6/28/24, 2:54 PM
> Before our migration, the old pipeline utilized a C library accessed through a Python service, which buffered and bundled data. This was really the critical aspect that was causing our latency.
How much speed up would there have been if they moved to a Rust wrapper around the same C library?
Using something other than Python is almost always going to be faster. This Reddit post does not give any insights into which aspects of Python lead to small/large performance hits. They show that it was the right solution for them with ample documentation which is great, but they don't provide any generalizable information.
by Havoc on 6/28/24, 9:43 AM
Just comes down to whether you need speed of building it or speed of program
by pjmlp on 6/28/24, 9:01 AM
by gregors on 6/28/24, 4:44 PM
Guess I'm never satisfied
by jti107 on 6/28/24, 12:34 AM