by cgarciae on 9/24/18, 4:21 AM with 46 comments
by anentropic on 9/24/18, 11:43 AM
pypeline --> pypeln
multiprocessing pipeline --> pr
threads pipeline --> th
asyncio pipeline --> io
This is totally unnecessary.
If I want short, abbreviated names in my code, I can always write `from pypeline import multiprocess_pipeline as pr`.
Your library shouldn't export them like this as the default.
`io` is especially bad, since it shadows the `io` module in the Python stdlib.
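Here is what the shadowing concern looks like in practice. A name bound by `from package import io` replaces the standard-library `io` in that namespace. The sketch below fakes a third-party package with `types.ModuleType` so that it runs without pypeln installed. `fakepkg` is a hypothetical name, and the last lines show that aliasing on import, as suggested above, avoids the clash.

```python
import io  # the standard-library io module
import sys
import types

stdlib_io = io

# Hypothetical third-party package that exports a submodule named `io`
# (a stand-in for illustration; this is not pypeln's real code).
fake_pkg = types.ModuleType("fakepkg")
fake_io = types.ModuleType("fakepkg.io")
fake_pkg.io = fake_io
sys.modules["fakepkg"] = fake_pkg
sys.modules["fakepkg.io"] = fake_io

from fakepkg import io  # rebinds the name `io` in this namespace

shadowed_io = io
print(io is stdlib_io)          # False: the stdlib module is hidden
print(hasattr(io, "StringIO"))  # False: stdlib helpers are unreachable via `io`

# Aliasing on import keeps both modules usable.
import io
from fakepkg import io as pipeline_io

print(io is stdlib_io, pipeline_io is fake_io)  # True True
```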
by elsherbini on 9/24/18, 12:50 PM
Snakemake [0] also lets you connect dependent jobs with UNIX pipes when that is appropriate [1].
[0] https://snakemake.readthedocs.io/en/stable/index.html
[1] https://snakemake.readthedocs.io/en/stable/snakefiles/rules....
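Snakemake declares piped outputs inside its rules [1]. The sketch below does not use Snakemake at all. It only illustrates the underlying idea with Python's standard `subprocess` module: two dependent jobs (hypothetical inline scripts) stream data through an OS pipe instead of through an intermediate file.

```python
import subprocess
import sys

# Job 1 (producer): writes its results to stdout instead of to a file.
producer = subprocess.Popen(
    [sys.executable, "-c", "for i in range(5): print(i)"],
    stdout=subprocess.PIPE,
)

# Job 2 (consumer): reads job 1's output from stdin as it is produced.
consumer = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sum(int(line) for line in sys.stdin))"],
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
    text=True,
)

# Close our copy of the pipe so the producer sees SIGPIPE if the consumer exits early.
producer.stdout.close()
output, _ = consumer.communicate()
producer.wait()
print(output.strip())  # 10
```

Both jobs run concurrently, and the consumer starts working before the producer has finished. That concurrency is the main benefit over writing and then re-reading a temporary file.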
by somewhatoff on 9/24/18, 10:30 AM
by adamcharnock on 9/24/18, 9:12 AM
Pypeline was designed to solve simple medium-data tasks that require concurrency and parallelism, but where using frameworks like Spark or Dask feels exaggerated or unnatural.
This is exactly what I was looking for very recently. Thank you for writing this; I'll certainly look into it.
by chrisjc on 9/24/18, 3:25 PM
https://github.com/pditommaso/awesome-pipeline/blob/master/R...
by snidane on 9/24/18, 7:50 PM
Piping using the | operator can make tracebacks pretty ugly with some operators.
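One common way to implement this kind of `|` syntax (assumed here; this is not pypeln's actual source) is for each stage object to define `__ror__`. Then `iterable | stage` falls back to `stage.__ror__(iterable)`, because `range` and generators have no `__or__` that accepts the stage. In the minimal sketch below, `Stage`, `flatmap` and `map_` are hypothetical names. The stages are lazy generators, so an exception raised inside a stage's lambda only surfaces when `list()` consumes the chain, not where the stage is attached.

```python
from typing import Callable, Iterable, Iterator


class Stage:
    """Hypothetical pipe-able stage (illustrative; not pypeln's implementation)."""

    def __init__(self, fn: Callable[[Iterable], Iterator]):
        self.fn = fn  # takes an iterable, returns a lazy iterator

    def __ror__(self, iterable: Iterable) -> Iterator:
        # `iterable | stage` calls this because the left operand
        # (range, generator, ...) has no `__or__` that accepts a Stage.
        return self.fn(iterable)


def flatmap(f: Callable) -> Stage:
    # Emit every element of f(x) for each input x.
    return Stage(lambda xs: (y for x in xs for y in f(x)))


def map_(f: Callable) -> Stage:
    # Apply f to each input element.
    return Stage(lambda xs: (f(x) for x in xs))


result = list(
    range(10)
    | flatmap(lambda x: [x + 1, x + 2])
    | map_(lambda x: x * x)
)
print(result[:4])  # [1, 4, 4, 9]
```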
If you want to keep the code somewhat 'pythonic' without the syntax magic of |, you can write the same pipeline without it. Instead of:
(
    range(10)
    | pp.flatmap(lambda x: [x + 1, x + 2])
    | pp.map(lambda x: x * x)
    # ...
)
You can do this instead:

xs = range(10)
xs = pp.flatmap(xs, lambda x: [x + 1, x + 2])
xs = pp.map(xs, lambda x: x * x)
...
It helps to keep the operand as the first argument instead of the last, because those lambdas are best kept at the end. So instead of
map(fn, xs)
do map(xs, fn)
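Here is a minimal sketch of the data-first convention described above, using plain generator functions. `flatmap` and `map_` are hypothetical helpers written for illustration. They are not pypeln's API, and `map_` is named that way to avoid shadowing the built-in `map`. Each helper takes the iterable first and the function last, so the lambda sits at the end of the call.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def flatmap(xs: Iterable[T], f: Callable[[T], Iterable[U]]) -> Iterator[U]:
    # Data first, function last: lazily yield every element of f(x) for each x.
    for x in xs:
        yield from f(x)


def map_(xs: Iterable[T], f: Callable[[T], U]) -> Iterator[U]:
    # Data first, function last: lazily yield f(x) for each x.
    for x in xs:
        yield f(x)


xs = range(10)
xs = flatmap(xs, lambda x: [x + 1, x + 2])
xs = map_(xs, lambda x: x * x)
ys = list(xs)
print(ys[:4], sum(ys))  # [1, 4, 4, 9] 890
```

Reassigning `xs` at each step keeps the chain readable without any operator overloading. Each step is an ordinary function call that is easy to step through in a debugger.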
by roel_v on 9/24/18, 1:48 PM
(I was actually just writing a spec this afternoon for a new tool that does just this, because I couldn't find anything suitable.)
by bayesian_horse on 9/24/18, 5:20 PM
Also, there is Streamz, which solves a similar problem, seems more mature, and can work with or without Dask or Dask-Distributed.
by timkpaine on 9/24/18, 3:05 PM
by TBastiani on 9/24/18, 12:11 PM
by davidnet on 9/24/18, 4:41 PM
by make3 on 9/24/18, 3:50 PM