by Mond_ on 4/21/25, 12:16 PM with 348 comments
by invalidator on 4/21/25, 11:31 PM
Compare with a simple pipeline in bash:
grep needle < haystack.txt | sed 's/foo/bar/g' | xargs wc -l
Each of those components executes in parallel, with the intermediate results streaming between them. You get a similar effect with coroutines. Compare Ruby:
data = File.readlines("haystack.txt")
.map(&:strip)
.grep(/needle/)
.map { |i| i.gsub('foo', 'bar') }
.map { |i| File.readlines(i).count }
In that case, each line is processed sequentially, with a complete array being created between each step. Nothing actually gets pipelined. Despite being clean and readable, I don't tend to do it any more, because it's harder to debug. More often these days, I write things like this:
data = File.readlines("haystack.txt")
data = data.map(&:strip)
data = data.grep(/needle/)
data = data.map { |i| i.gsub('foo', 'bar') }
data = data.map { |i| File.readlines(i).count }
It's ugly, but you know what? I can set a breakpoint anywhere and inspect the intermediate states without having to edit the script in prod. Sometimes ugly and boring is better.
by bnchrch on 4/21/25, 1:14 PM
However.
I would be lying if I didn't secretly wish that all languages adopted the `|>` syntax from Elixir.
```
params
|> Map.get("user")
|> create_user()
|> notify_admin()
```
by Straw on 4/21/25, 5:25 PM
For example, we can write: (foo (bar (baz x))) as (-> x baz bar foo)
If there are additional arguments, we can accommodate those too: (sin (* x pi)) as (-> x (* pi) sin)
where the expression so far gets inserted as the first argument to any form. If you want it inserted as the last argument, you can use ->> instead:
(filter positive? (map sin x)) as (->> x (map sin) (filter positive?))
You can also get full control of where to place the previous expression using as->.
Full details at https://clojure.org/guides/threading_macros
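The same first-position threading can be sketched in plain Python with a small hypothetical `thread_first` helper (not a real library function; just to illustrate the `->` semantics):

```python
import math

def thread_first(value, *forms):
    """Minimal analogue of Clojure's ->: each form is either a function
    or a (function, *extra_args) tuple; the threaded value is inserted
    as the first argument."""
    for form in forms:
        if isinstance(form, tuple):
            fn, *args = form
            value = fn(value, *args)
        else:
            value = form(value)
    return value

# (-> x (* pi) sin)  is  sin(x * pi)
result = thread_first(0.5, (lambda v, m: v * m, math.pi), math.sin)
```

A `thread_last` variant that appends the value as the last argument would correspond to `->>`.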
by duped on 4/21/25, 3:39 PM
One day, we'll (re)discover that partial application is actually incredibly useful for writing programs and (non-Haskell) languages will start with it as the primitive for composing programs instead of finding out that it would be nice later, and bolting on a restricted subset of the feature.
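The point about partial application as a composition primitive can be illustrated with Python's `functools.partial`, which fixes some arguments now and supplies the rest later:

```python
from functools import partial

def scale(factor, x):
    return factor * x

def add(amount, x):
    return amount + x

# Partially apply to build small, reusable steps.
double = partial(scale, 2)
increment = partial(add, 1)

# The partially applied steps compose into a simple pipeline.
result = [increment(double(v)) for v in [1, 2, 3]]
```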
by SimonDorfman on 4/21/25, 12:41 PM
by amai on 4/21/25, 5:54 PM
by kordlessagain on 4/21/25, 12:51 PM
Pipelining can become hard to debug when chains get very long. The author doesn't address how hard it can be to identify which step in a long chain caused an error.
They do make fun of Python, but they don't say much about why they dislike it, beyond showing a low-res photo of a rock with a pipe routed around it.
Ambiguity about what constitutes "pipelining" is the real issue here. The definition keeps shifting throughout the article. Is it method chaining? Operator overloading? First-class functions? The author uses examples that function very differently.
by epolanski on 4/21/25, 1:11 PM
Building pipelines:
https://effect.website/docs/getting-started/building-pipelin...
Using generators:
https://effect.website/docs/getting-started/using-generators...
Having both options is great (at the beginning Effect had only pipe-based pipelines). After years of writing Effect, I'm convinced that most of the time you'd rather write and read imperative code than pipelines, which definitely have their place in code bases.
In fact most of the community, at large, has converged on imperative-style generators over pipelines. Having onboarded many devs, and having seen many long-time pipeliners converge on classical imperative control flow, seems to confirm that both debugging and maintenance are easier.
by vitus on 4/21/25, 9:36 PM
No longer do we have to explain that expressions are evaluated in the order of FROM -> JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT (and yes, I know I'm missing several other steps). We can simply express how our data flows from one statement to the next.
(I'm also stating this as someone who has yet to play around with the pipelining syntax, but honestly anything is better than the status quo.)
by osigurdson on 4/21/25, 1:01 PM
by singularity2001 on 4/21/25, 1:39 PM
by 0xf00ff00f on 4/21/25, 2:36 PM
auto get_ids(std::span<const Widget> data)
{
return data
| filter(&Widget::alive)
| transform(&Widget::id)
| to<std::vector>();
}
by cutler on 4/21/25, 1:23 PM
by okayishdefaults on 4/21/25, 9:18 PM
Point-free style and pipelining were meant for each other. https://en.m.wikipedia.org/wiki/Tacit_programming
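The fit between point-free style and pipelining can be sketched in Python with a small hypothetical `compose` helper: the pipeline is defined without ever naming its argument.

```python
from functools import reduce

def compose(*fns):
    """Right-to-left function composition: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fns)

# Point-free: no intermediate variable is named anywhere.
clean = compose(str.upper, str.strip)
result = clean("  hello  ")
```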
by mrkeen on 4/21/25, 1:07 PM
data.iter()
.filter(|w| w.alive)
.map(|w| w.id)
.collect()
collect(map(filter(iter(data), |w| w.alive), |w| w.id))
The second approach is open for extension - it allows you to write new functions on old datatypes.
> Quick challenge for the curious Rustacean, can you explain why we cannot rewrite the above code like this, even if we import all of the symbols?
Probably for lack of
> weird operators like <$>, <*>, $, or >>=
by rocqua on 4/22/25, 5:16 AM
f.g = f(g(x))
Based on this, I think a reverse polish type of notation would be a lot better. Though perhaps it is a lot nicer to think of "the sine of an angle" than "angle sine-ed". Not that it matters much; the switching costs are immense. Getting people able to teach it would be impossible, and collaboration with people taught in the other system would be horrible. I am doubtful I could make the switch, even if I wanted to.
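The "angle sine-ed" reading can be sketched with a tiny postfix (RPN-style) evaluator in Python (a hypothetical helper, purely to illustrate strict left-to-right evaluation):

```python
import math

def rpn(*tokens):
    """Tiny postfix evaluator: numbers push onto a stack,
    functions pop one value and push the result."""
    stack = []
    for tok in tokens:
        if callable(tok):
            stack.append(tok(stack.pop()))
        else:
            stack.append(tok)
    return stack.pop()

# "the sine of pi/2" becomes "pi/2, sine-ed":
result = rpn(math.pi / 2, math.sin)
```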
by dapperdrake on 4/21/25, 1:47 PM
https://dspace.mit.edu/handle/1721.1/6035
https://dspace.mit.edu/handle/1721.1/6031
https://dapperdrake.neocities.org/faster-loops-javascript.ht...
by snthpy on 4/23/25, 9:41 PM
In fact I tried to make some similar points in my CMU "SQL or Death" Seminar Series talk on PRQL (https://db.cs.cmu.edu/events/sql-death-prql-pipelined-relati...) in that I would love to see PRQL (or something like it) become a universal DSL for data pipelines. Ideally this wouldn't even have to go through some query engine and could just do some (byte)codegen for your target language.
P.S. Since you mentioned the Google Pipe Syntax HYTRADBOI 2025 talk, I just want to throw out that I also have a 10 min version for the impatient: https://www.hytradboi.com/2025/deafce13-67ac-40fd-ac4b-175d5... That's just a PRQL overview though. The Universal Data Pipeline DSL ideas and comparison to LINQ, F#, ... are only in the CMU talk. I also go a bit into imperative vs declarative and point out that since "pipelining" is just function composition it should really be "functional" rather than imperative or declarative (which also came up in this thread).
by andyferris on 4/22/25, 3:20 AM
In fact, I always thought it would be a good idea for all statement blocks (in any given programming language) to allow an implicit reference to the value of the previous statement. The pipeline operation would essentially be the existing semicolons (in a C-like language) and there would be a new symbol or keyword used to represent the previous value.
For example, the MATLAB REPL allows for referring to the previous value as `ans` and the Julia REPL has inherited the same functionality. You can copy-paste this into the Julia REPL today:
[1, 2, 3];
map(x -> x * 2, ans);
@show ans;
filter(x -> x > 2, ans);
@show ans;
sum(ans)
You can't use this in Julia outside the REPL, and I don't think `ans` is a particularly good keyword for this, but I honestly think the concept is good enough. The same thing in JavaScript, using `$` as an example:
{
[1, 2, 3];
$.map(x => x * 2);
(console.log($), $);
$.filter(x => x > 2);
(console.log($), $);
$.reduce((acc, next) => acc + next, 0)
}
I feel it would work best with expression-based languages having blocks that return their final value (like Rust), since you can do all sorts of nesting and so on.
by davemp on 4/22/25, 11:11 AM
by huyegn on 4/21/25, 8:46 PM
https://datapad.readthedocs.io/en/latest/quickstart.html#ove...
by weinzierl on 4/21/25, 3:55 PM
by RHSeeger on 4/21/25, 2:44 PM
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
collect(map(filter(map(iter(data), |w| w.toWingding()), |w| w.alive), |w| w.id))
}
to
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
data.iter()
.map(|w| w.toWingding())
.filter(|w| w.alive)
.map(|w| w.id)
.collect()
}
The first one would read more easily (and, since it was called out, diff better) if formatted with newlines:
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
collect(
map(
filter(
map(iter(data), |w| w.toWingding()), |w| w.alive), |w| w.id))
}
Admittedly, the chaining is still better. But a fair number of the article's complaints are about the lack of newlines being used, not about chaining itself.
by bjourne on 4/22/25, 9:50 AM
iter [ alive? ] filter [ id>> ] map collect
The beauty of this is that everything can be evaluated strictly left-to-right. Every single symbol. "Pipelines" in other languages are never fully left-to-right evaluated. For example, ".filter(|w| w.alive)" in the author's example requires one to switch from postfix to infix evaluation to evaluate the filter application. The major advantage is that handling multiple streams is natural. Suppose you want to compute the dot product of two files where each line contains a float:
fileA fileB [ lines [ str>float ] map ] bi@ [ mul ] 2map 0 [ + ] reduce
by relaxing on 4/21/25, 5:36 PM
Being able to inspect the results of each step right at the point you’ve written it is pretty convenient. It’s readable. And the compiler will optimize it out.
by flakiness on 4/21/25, 4:50 PM
by hliyan on 4/21/25, 1:49 PM
by EnPissant on 4/22/25, 3:53 AM
fn get_ids(data: Vec<Widget>) -> Vec<Id> {
let mut result = Vec::new();
for widget in &data {
if widget.alive {
result.push(widget.id);
}
}
result
}
more readable than this: fn get_ids(data: Vec<Widget>) -> Vec<Id> {
data.iter()
.filter(|w| w.alive)
.map(|w| w.id)
.collect()
}
and I also dislike Rust requiring you to write "mut" for function mutable values. It's mostly just busywork and dogma.
by otsukare on 4/21/25, 5:19 PM
by neuroelectron on 4/21/25, 9:14 PM
by layer8 on 4/21/25, 4:12 PM
by 1899-12-30 on 4/21/25, 1:36 PM
by immibis on 4/22/25, 2:15 PM
x = iter(data);
y = filter(x, w=>w.isAlive);
z = map(y, w=>w.id);
return collect(z);
It doesn't need new syntax, but to implement this with the existing syntax you do have to figure out what the intermediate objects are, but you also have that problem with "pipelining" unless it compiles the whole chain into a single thing a la Linq.
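In Python, for instance, those intermediate objects are just the built-in lazy iterators, so the same shape works today without new syntax (illustrative sketch):

```python
data = [{"id": 1, "alive": True},
        {"id": 2, "alive": False},
        {"id": 3, "alive": True}]

x = iter(data)                       # iter
y = filter(lambda w: w["alive"], x)  # filter (lazy, nothing runs yet)
z = map(lambda w: w["id"], y)        # map (lazy, nothing runs yet)
result = list(z)                     # collect: the whole chain runs here
```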
by shae on 4/21/25, 12:45 PM
This is my biggest complaint about Python.
by wavemode on 4/21/25, 5:31 PM
The tone of this (and the entire Haskell section of the article, tbh) is rather strange. Operators aren't special syntax and they aren't "added" to the language. Operators are just functions that by default use infix position. (In fact, any function can be called in infix position. And operators can be called in prefix position.)
The commit in question added & to the prelude. But if you wanted & (or any other character) to represent pipelining you have always been able to define that yourself.
Some people find this horrifying, which is a perfectly valid opinion (though in practice, when working in Haskell it isn't much of a big deal if you aren't foolish with it). But at least get the facts correct.
by mexicocitinluez on 4/21/25, 12:48 PM
by pxc on 4/21/25, 4:06 PM
by jmyeet on 4/22/25, 12:07 AM
$x = vec[2,1,3]
|> Vec\map($$, $a ==> $a * $a) // $$ with value vec[2,1,3]
|> Vec\sort($$); // $$ with value vec[4,1,9]
It is a nice feature. I do worry about error reporting with any feature that combines multiple statements into a single statement, which is essentially what this does. In Java, there was always an issue with NullPointerExceptions being thrown, and if you chain several things together you're never sure which one was null.
[1]: https://docs.hhvm.com/hack/expressions-and-operators/pipe
by rokob on 4/22/25, 1:22 AM
by raggi on 4/21/25, 10:27 PM
Um, you can:
#![feature(import_trait_associated_functions)]
use Iterator::{collect, map, filter};
fn get_ids2(data: Vec<usize>) -> Vec<usize> {
collect(map(filter(<[_]>::iter(&data), |v| ...), |v| ...))
}
and you can because it's lazy, which is also the same reason you can write it the other way in Rust. I think the author was getting at an ownership trap, but that trap is avoided the same way for both arrangements; the call order is the same in both arrangements. If the calls were actually a pipeline (if collect didn't exist and didn't need to be called), then other considerations show up.
by jiggawatts on 4/22/25, 4:23 AM
For comparison, UNIX pipes support only trivial byte streams from output to input.
PowerShell allows typed object streams where the properties of the object are automatically wired up to named parameters of the commands on the pipeline.
Outputs at any stage can not only be wired directly to the next stage but also captured into named variables for use later in the pipeline.
Every command in the pipeline also gets begin/end/cancel handlers automatically invoked so you can set up accumulators, authentication, or whatever.
UNIX scripting advocates don’t know what they’re missing out on…
by TrianguloY on 4/21/25, 2:00 PM
a().let{ b(it) }.let{ c(it) }
by _heimdall on 4/22/25, 3:49 AM
by zelphirkalt on 4/21/25, 1:05 PM
Pipelining can guide one to write a bit cleaner code, viewing steps of computation as such, and not as modifications of global state. It forces one to make each step return a result, write proper functions. I like proper pipelining a lot.
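That discipline can be sketched in Python: each step below is a proper function that returns a result, rather than a modification of global state (hypothetical helper names, mirroring the thread's grep/sed example):

```python
def strip_lines(lines):
    return [line.strip() for line in lines]

def keep_needles(lines):
    return [line for line in lines if "needle" in line]

def substitute(lines):
    return [line.replace("foo", "bar") for line in lines]

# Each step takes a value in and returns a new value out;
# no step mutates shared state.
result = substitute(keep_needles(strip_lines(["  foo needle  ", "hay", "needle foo"])))
```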
by chewbacha on 4/21/25, 1:21 PM
by taeric on 4/21/25, 4:08 PM
This is far different than the pattern described in the article, though. Small shame they have come to have the same name. I can see how both work with the metaphor; such that I can't really complain. The "pass a single parameter" along is far less attractive to me, though.
by stuaxo on 4/21/25, 11:20 PM
by true_blue on 4/21/25, 5:17 PM
by dpc_01234 on 4/21/25, 7:54 PM
BTW. For people complaining about debug-ability of it: https://doc.rust-lang.org/std/iter/trait.Iterator.html#metho... etc.
by XorNot on 4/21/25, 10:35 PM
You have a create_user function that doesn't error? Has no branches based on type of error?
We're having arguments over the best way break these over multiple lines?
Like.. why not just store intermediate results in variables? Where our branch logic can just be written inline? And then the flow of data can be very simply determined by reading top to bottom?
by amelius on 4/21/25, 1:31 PM
Instead of writing: a().b().c().d(), it's much nicer to write: d(c(b(a()))), or perhaps (d ∘ c ∘ b ∘ a)().
by kissgyorgy on 4/22/25, 9:54 AM
gitRef = with lib;
pipe .git/HEAD [
readFile
trim
(splitString ":")
last
trim
(ref: .git/${ref})
readFile
trim
];
Super clean and cool!
by middayc on 4/22/25, 1:47 AM
by jesse__ on 4/22/25, 3:58 AM
by bcoates on 4/22/25, 1:10 AM
from customer
left join orders on c_custkey = o_custkey and o_comment not like '%unusual%'
group by c_custkey
alias count(o_orderkey) as count_of_orders
group by count_of_orders
alias count(*) as count_of_customers
order by count_of_customers desc
select count_of_customers, count_of_orders;
I'm using 'alias' here as a strawman keyword for what the slide deck calls a free-standing 'as' operator, because you can't reuse that keyword; it makes the grammar a mess. The aliases aren't really necessary: you could just write the last line as 'select count(count(*)) ncust, count(*) nord' if you aren't afraid of nested aggregations, and if you are, you'll never understand window functions, so...
The |> syntax adds visual noise without expressive power, and the novelty 'aggregate'/'call' operators are weird special-case syntax for something that isn't that complex in the first place.
The implicit projection is unnecessary too, for the same reason any decent SQL linter will flag an ambiguous 'select *'
by moralestapia on 4/22/25, 2:50 PM
Anyway, JS wins again. Give it a try if you haven't; it's one of the best languages out there.
by drchickensalad on 4/21/25, 12:52 PM
by kuon on 4/21/25, 12:52 PM
The |> operator is really cool.
by ZYbCRq22HbJ2y7 on 4/21/25, 9:48 PM
by bluSCALE4 on 4/21/25, 4:17 PM
by jongjong on 4/21/25, 10:11 PM
One difference is that currying returns an incomplete result (another function) which must be called again at a later time. On the other hand, pipelining usually returns raw values. Currying returns functions until the last step. The main philosophical failure of currying is that it treats logic/functions as if they were state which should be passed around. This is bad. Components should be responsible for their own state and should just talk to each other to pass plain information. State moves, logic doesn't move. A module shouldn't have awareness of what tools/logic other modules need to do their jobs. This completely breaks the separation of concerns principle.
When you call a plumber to fix your drain, do you need to provide them with a toolbox? Do you even need to know what's inside their toolbox? The plumber knows what tools they need. You just show them what the problem is. Passing functions to another module is like giving a plumber a toolbox which you put together by guessing what tools they might need. You're not a plumber, why should you decide what tools the plumber needs?
Currying encourages spaghetti code which is difficult to follow when functions are passed between different modules to complete the currying. In practice, if one can design code which gathers all the info it needs before calling the function once; this leads to much cleaner and much more readable code.
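For illustration, the "incomplete result" point can be sketched in Python with `functools.partial`: every application but the last returns another function.

```python
from functools import partial

def add(a, b, c):
    return a + b + c

# Each partial application returns another function;
# only the final call produces a raw value.
step1 = partial(add, 1)    # a function, not a value
step2 = partial(step1, 2)  # still a function
result = step2(3)          # now a raw value
```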
by Weryj on 4/22/25, 6:35 AM
by jaymbo on 4/21/25, 12:42 PM
by joeevans1000 on 4/21/25, 4:46 PM
by wslh on 4/21/25, 2:30 PM
A
.B
.C
|| D
|| E
by tantalor on 4/21/25, 2:34 PM
I have no idea what this is trying to say, or what it has to do with the rest of the article.
by guerrilla on 4/21/25, 3:30 PM
by HackerThemAll on 4/22/25, 5:04 PM
by tpoacher on 4/21/25, 4:12 PM
... looking at you R and tidyverse hell.
by blindseer on 4/21/25, 1:48 PM