by marcyb5st on 2/10/25, 10:38 AM with 113 comments
by jakozaur on 2/13/25, 4:06 PM
Still, the best is yet to come. Previously, SQL extensions were a pain: there was no good place to put them, and table-valued functions were a mess.
Now, it would be possible to have higher-order functions such as enrichment, predictions, grouping or other data contracts. Example:
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
|> INNER JOIN customers USING(customer_id)
|> CALL ENRICH.APOLLO(EMAIL => customers.email)
|> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
This could be executed as one SQL query to determine the distinct e-mail domains, then an enrichment step to prepare a lookup dataset, and finally the closing SQL with a JOIN. Iterative SQL with pipes may also work better with GenAI.
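A rough sketch of that decomposition (ENRICH.APOLLO and the table names here are hypothetical, extending the example above):
-- 1. One query collects the distinct values the enrichment needs:
SELECT DISTINCT SPLIT(email, '@')[OFFSET(1)] AS email_domain
FROM customers;
-- 2. The enrichment service materializes a lookup table,
--    say apollo_enrichment(email_domain, company_country).
-- 3. The final query joins the enriched data back in:
FROM high_value_customers
|> INNER JOIN apollo_enrichment USING(email_domain)
|> AGGREGATE COUNT(*) AS high_value_customers GROUP BY company_country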
by willvarfar on 2/13/25, 3:17 PM
https://sqlite.org/forum/forumpost/5f218012b6e1a9db
(Note that the fiddle linked no longer supports the syntax)
It is very interesting that they found it completely unnecessary to actually require the pipe characters. The grammar works when the pipe characters are optional.
And, imo, looks a lot better!
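For illustration, the first example in this thread might read like this with the pipes dropped (a sketch; since the fiddle no longer accepts the syntax, the exact grammar may differ):
FROM orders
WHERE order_date >= '2024-01-01'
AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
WHERE total_spent > 1000
INNER JOIN customers USING(customer_id)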
by notpushkin on 2/13/25, 2:18 PM
from invoices
filter invoice_date >= @1970-01-16
derive {
transaction_fees = 0.8,
income = total - transaction_fees
}
filter income > 1
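For reference, the PRQL compiler turns that into ordinary SQL along these lines (approximate output; the exact shape depends on the target dialect):
SELECT
  *,
  0.8 AS transaction_fees,
  total - 0.8 AS income
FROM invoices
WHERE invoice_date >= DATE '1970-01-16' AND total - 0.8 > 1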
by jmull on 2/13/25, 3:05 PM
Of course, SQL has numerous issues, both in an absolute sense and relative to what we've come to expect. And the obvious course of action in each individual case is to add syntax to support the missing feature.
But as you keep adding syntax, SQL gets more and more complex, both in the individual variants and due to the complexities of multiple variants with different syntax support at different times.
A transpiling approach makes more sense to me. I'd like the SQL implementers to focus on source maps and other things that better support plugging in external, alternate syntaxes (is a standardized mechanism too much to ask for?).
Then individual projects/people/efforts can choose the SQL syntax variant that works for them, and it can be one that evolves separate from the host db.
by Taikonerd on 2/13/25, 2:19 PM
Their syntax is a lot cleaner, because it's a new language -- it's not backwards-compatible with SQL. But then again, they don't have the resources of Google behind them.
by nicoritschel on 2/13/25, 2:52 PM
by mccanne on 2/13/25, 3:12 PM
by Xmd5a on 2/13/25, 4:01 PM
(-> (select :a :b :c)
(from :foo)
(where [:= :foo.a "baz"]))
=> {:select [:a :b :c] :from [:foo] :where [:= :foo.a "baz"]}
Effort: zero (0). That's what "simple made easy" is about.
by code_runner on 2/13/25, 9:49 PM
by mwexler on 2/13/25, 3:11 PM
Malloy is from Lloyd Tabb, a co-founder of Looker.
by pjmlp on 2/13/25, 2:13 PM
by ajfriend on 2/13/25, 4:00 PM
import duckboat as uck
csv = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv'
uck.Table(csv).do(
"where sex = 'female' ",
'where year > 2008',
'select *, cast(body_mass_g as double) as grams',
'select species, island, avg(grams) as avg_grams group by 1,2',
'select * replace (round(avg_grams, 1) as avg_grams)',
'order by avg_grams',
)
I still can't tell if it's too goofy, or if I really like it. :)
I write a lot of SQL anyway, so this approach is nice in that I find I almost never need to look up function syntax like I would with Pandas, since it is just using DuckDB SQL under the hood, while removing the need to write `select * from ...` repeatedly. And when you're ready to exit the data exploration phase, it's easy to gradually translate things back to "real SQL".
The whole project is pretty small, essentially just a light wrapper around DuckDB to do this expression chaining and lazy evaluation.
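For comparison, that chain corresponds to nested queries roughly like this (a sketch of the equivalent DuckDB SQL, not duckboat's actual generated output):
SELECT * REPLACE (round(avg_grams, 1) AS avg_grams)
FROM (
  SELECT species, island, avg(grams) AS avg_grams
  FROM (
    SELECT *, CAST(body_mass_g AS DOUBLE) AS grams
    FROM read_csv_auto('penguins.csv')
    WHERE sex = 'female' AND year > 2008
  )
  GROUP BY 1, 2
)
ORDER BY avg_grams;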
by ralmidani on 2/13/25, 7:51 PM
The moral of the story? Let’s give this new SQL syntax a chance.
by epistasis on 2/13/25, 2:52 PM
I first encountered this style of data manipulation in R's tidyverse. Some say C# (or F#?) had similar pipes and influences, but I haven't seen specifics.
Some other data libraries like Polars have sort of similar parse trees, but they also have lots of extra cruft because of limitations to the underlying language. Python's functional calling semantics are extremely powerful, but not quite as powerful as R's.
by tdfirth on 2/13/25, 7:49 PM
Sadly it's a few decades too late, though, and it just fragments the "ecosystem" further.
by xiphias2 on 2/13/25, 8:15 PM
It would be great to have it standardized fast and implemented everywhere.
by tqi on 2/13/25, 7:20 PM
Personally, I continue to prefer CTEs because they let me write queries in a more modular way, and, in conjunction with descriptive names, they help me keep less context in my head when working out the logic. When I look at a pipe syntax query, it almost feels like the difference between reading a long paragraph vs bullet points?
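For instance, the orders query from earlier in the thread as CTEs (names are made up for illustration):
WITH customer_totals AS (
  SELECT customer_id, SUM(order_amount) AS total_spent
  FROM orders
  WHERE order_date >= '2024-01-01'
  GROUP BY customer_id
),
high_value_customers AS (
  SELECT * FROM customer_totals WHERE total_spent > 1000
)
SELECT *
FROM high_value_customers
JOIN customers USING (customer_id);
Each CTE name documents an intermediate result, which is exactly the context the pipe version keeps implicit.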
by perdomon on 2/13/25, 5:03 PM
by ejcx on 2/13/25, 8:07 PM
I know prql exists, but the syntax is pretty cumbersome and not something I enjoyed writing, though I do understand why folks gravitate towards it.
by lordofgibbons on 2/13/25, 4:08 PM
Hopefully, it gets adopted in Postgresql too.
by dangoodmanUT on 2/13/25, 5:35 PM
by beoberha on 2/13/25, 3:08 PM
This syntax from Google is nice but it’s still just SQL.
by sweeter on 2/13/25, 11:10 PM
by fforflo on 2/13/25, 7:58 PM
by Taikonerd on 2/13/25, 2:17 PM
by schultzer on 2/13/25, 3:09 PM
Why are SQL parsers even concerned with that? Isn't that why we have recursive-descent parsers?
by iddan on 2/14/25, 7:13 AM
by dnst on 2/13/25, 8:30 PM
by zendist on 2/13/25, 8:19 PM
by brikym on 2/14/25, 5:53 AM
by whalesalad on 2/13/25, 1:45 PM
by eb0la on 2/13/25, 10:14 PM
by etaham on 2/14/25, 3:46 AM
by gigatexal on 2/13/25, 6:37 PM
by data-ottawa on 2/13/25, 4:46 PM
For data exploration, analysis, and cleaning, this is way more productive than plain SQL. As an example, fusing the aggregate functions with the GROUP BY keys creates a much faster workflow. I like that it unifies WHERE/HAVING/QUALIFY, and the SET/EXTEND/DROP operators help (though Snowflake still wins for being able to declare a column and use it in the same SELECT). Ultimately this gives me a one-liner for situations where I'd otherwise be creating hard-to-name intermediate CTEs, and that's awesome!
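A small example of what that fusion looks like (BigQuery pipe syntax; the table and columns are made up):
FROM sales
|> WHERE sale_date >= '2024-01-01'  -- pre-aggregation filter
|> EXTEND price * quantity AS revenue  -- derived column, usable immediately
|> AGGREGATE SUM(revenue) AS total_revenue GROUP BY region  -- aggregate fused with its keys
|> WHERE total_revenue > 10000  -- post-aggregation filter: still WHERE, no HAVING
|> SELECT region, total_revenue  -- pin down the output shape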
Iterating on array columns or parsing non-trivial JSON is much better with this syntax too.
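For example, pulling fields out of a JSON column step by step (a sketch, assuming an events table with a payload JSON column):
FROM events
|> EXTEND JSON_VALUE(payload, '$.user.id') AS user_id
|> CROSS JOIN UNNEST(JSON_QUERY_ARRAY(payload, '$.items')) AS item
|> EXTEND JSON_VALUE(item, '$.sku') AS sku
|> AGGREGATE COUNT(*) AS item_events GROUP BY sku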
This is a shift back to the data frame APIs of a few years ago, but we're still missing the typing helpers and support that data frame APIs could provide. It would be nice to have a system like this that plugs into language servers, can list fields at each step on cursor hover, and offers editor completions (the BQ console does an okay job).
This syntax is great for DBT macros. You can just drop in entire transforms without worrying about polluting the query namespace or working around existing columns on your reference tables.
There's a dark side to this syntax. The imperative coding style comes with a tradeoff: the reader needs to track internal state through many steps. It's the same reason why SELECT * is often not recommended.
As a best practice I like to throw a `|> SELECT X,Y,Z` at the end of these pipe blocks to reinforce to the reader what the output looks like.
I should mention that it's not as portable, but frankly SQL isn't truly portable across the major DBs anyway, and other databases are now adding support for this syntax too.
tl;dr: I like this feature, but use it sparingly. Avoid overuse in model files, definitely lean on it in analyses.