by marcyb5st on 2/10/25, 10:38 AM with 113 comments
by jakozaur on 2/13/25, 4:06 PM
Still, the best is yet to come. Previously, SQL extensions were a pain: there was no good place to put them, and table-valued functions were a mess.
Now, it would be possible to have higher-order functions such as enrichment, predictions, grouping or other data contracts. Example:
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
|> INNER JOIN customers USING(customer_id)
|> CALL ENRICH.APOLLO(EMAIL => customers.email)
|> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
This could be executed as one SQL query to determine the distinct e-mail domains, then an enrichment step to prepare a lookup dataset, and finally the closing SQL with a JOIN. Iterative SQL with pipes may also work better with GenAI.
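A rough sketch of that decomposition (ENRICH.APOLLO and the table names here are hypothetical, extending the example above):
-- 1. One query collects the distinct values the enrichment needs:
SELECT DISTINCT SPLIT(email, '@')[OFFSET(1)] AS email_domain
FROM customers;
-- 2. The enrichment service materializes a lookup table,
--    say apollo_enrichment(email_domain, company_country).
-- 3. The final query joins the enriched data back in:
FROM high_value_customers
|> INNER JOIN apollo_enrichment USING(email_domain)
|> AGGREGATE COUNT(*) AS high_value_customers GROUP BY company_country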
by willvarfar on 2/13/25, 3:17 PM
https://sqlite.org/forum/forumpost/5f218012b6e1a9db
(Note that the fiddle linked no longer supports the syntax)
It is very interesting that they found it completely unnecessary to actually require the pipe characters. The grammar works when the pipe characters are optional.
And, imo, looks a lot better!
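For illustration, the first example in this thread might read like this with the pipes dropped (a sketch; since the fiddle no longer accepts the syntax, the exact grammar may differ):
FROM orders
WHERE order_date >= '2024-01-01'
AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
WHERE total_spent > 1000
INNER JOIN customers USING(customer_id)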
by notpushkin on 2/13/25, 2:18 PM
from invoices
filter invoice_date >= @1970-01-16
derive {
transaction_fees = 0.8,
income = total - transaction_fees
}
filter income > 1
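For reference, the PRQL compiler turns that into ordinary SQL along these lines (approximate output; the exact shape depends on the target dialect):
SELECT
  *,
  0.8 AS transaction_fees,
  total - 0.8 AS income
FROM invoices
WHERE invoice_date >= DATE '1970-01-16' AND total - 0.8 > 1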
by jmull on 2/13/25, 3:05 PM
Of course, SQL has numerous issues, both in an absolute sense and relative to what we've come to expect. And the obvious course of action in each individual case is to add syntax to support the missing feature.
But as you keep adding syntax, SQL gets more and more complex, both in the individual variants and due to the complexities of multiple variants with different syntax support at different times.
A transpiling approach makes more sense to me. I'd like the SQL implementers to focus on source maps and other things that better support plugging in external, alternate syntaxes (is a standardized mechanism too much to ask for?).
Then individual projects/people/efforts can choose the SQL syntax variant that works for them, and it can be one that evolves separate from the host db.
by Taikonerd on 2/13/25, 2:19 PM
Their syntax is a lot cleaner, because it's a new language -- it's not backwards-compatible with SQL. But then again, they don't have the resources of Google behind them.
by nicoritschel on 2/13/25, 2:52 PM
by mccanne on 2/13/25, 3:12 PM
by Xmd5a on 2/13/25, 4:01 PM
(-> (select :a :b :c)
(from :foo)
(where [:= :foo.a "baz"]))
=> {:select [:a :b :c] :from [:foo] :where [:= :foo.a "baz"]}
Effort: zero (0). That's what "simple made easy" is about.
by code_runner on 2/13/25, 9:49 PM
by mwexler on 2/13/25, 3:11 PM
Malloy is from Lloyd Tabb, a co-founder of Looker.
by pjmlp on 2/13/25, 2:13 PM
by ajfriend on 2/13/25, 4:00 PM
import duckboat as uck
csv = 'https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv'
uck.Table(csv).do(
"where sex = 'female' ",
'where year > 2008',
'select *, cast(body_mass_g as double) as grams',
'select species, island, avg(grams) as avg_grams group by 1,2',
'select * replace (round(avg_grams, 1) as avg_grams)',
'order by avg_grams',
)
I still can't tell if it's too goofy, or if I really like it. :)
I write a lot of SQL anyway, so this approach is nice in that I find I almost never need to look up function syntax like I would with Pandas, since it is just using DuckDB SQL under the hood, while removing the need to write `select * from ...` repeatedly. And when you're ready to exit the data exploration phase, it's easy to gradually translate things back to "real SQL".
The whole project is pretty small, essentially just a light wrapper around DuckDB to do this expression chaining and lazy evaluation.
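For comparison, that chain corresponds to nested queries roughly like this (a sketch of the equivalent DuckDB SQL, not duckboat's actual generated output):
SELECT * REPLACE (round(avg_grams, 1) AS avg_grams)
FROM (
  SELECT species, island, avg(grams) AS avg_grams
  FROM (
    SELECT *, CAST(body_mass_g AS DOUBLE) AS grams
    FROM read_csv_auto('penguins.csv')
    WHERE sex = 'female' AND year > 2008
  )
  GROUP BY 1, 2
)
ORDER BY avg_grams;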
by ralmidani on 2/13/25, 7:51 PM
The moral of the story? Let’s give this new SQL syntax a chance.
by epistasis on 2/13/25, 2:52 PM
I first encountered this style of data manipulation in R's tidyverse. Some say C# (or F#?) had similar pipes and influences, but I haven't seen specifics.
Some other data libraries like Polars have sort of similar parse trees, but they also have lots of extra cruft because of limitations to the underlying language. Python's functional calling semantics are extremely powerful, but not quite as powerful as R's.
by tdfirth on 2/13/25, 7:49 PM
Sadly it's a few decades too late, though, and it just fragments the "ecosystem" further.
by xiphias2 on 2/13/25, 8:15 PM
It would be great to have it standardized fast and implemented everywhere.
by tqi on 2/13/25, 7:20 PM
Personally, I continue to prefer CTEs because they let me write queries in a more modular way, and, in conjunction with descriptive names, they help me keep less context in my head when working out the logic. When I look at a pipe syntax query, it almost feels like the difference between reading a long paragraph vs bullet points?
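For instance, the orders query from earlier in the thread as CTEs (names are made up for illustration):
WITH customer_totals AS (
  SELECT customer_id, SUM(order_amount) AS total_spent
  FROM orders
  WHERE order_date >= '2024-01-01'
  GROUP BY customer_id
),
high_value_customers AS (
  SELECT * FROM customer_totals WHERE total_spent > 1000
)
SELECT *
FROM high_value_customers
JOIN customers USING (customer_id);
Each CTE name documents an intermediate result, which is exactly the context the pipe version keeps implicit.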
by perdomon on 2/13/25, 5:03 PM
by ejcx on 2/13/25, 8:07 PM
I know prql exists, but the syntax is pretty cumbersome and not something I enjoyed writing, though I do understand why folks gravitate towards it.
by lordofgibbons on 2/13/25, 4:08 PM
Hopefully, it gets adopted in Postgresql too.
by dangoodmanUT on 2/13/25, 5:35 PM
by beoberha on 2/13/25, 3:08 PM
This syntax from Google is nice but it’s still just SQL.
by sweeter on 2/13/25, 11:10 PM
by fforflo on 2/13/25, 7:58 PM
by Taikonerd on 2/13/25, 2:17 PM
by schultzer on 2/13/25, 3:09 PM
Why are SQL parsers even concerned with that? Isn't that why we have recursive-descent parsers?
by iddan on 2/14/25, 7:13 AM
by dnst on 2/13/25, 8:30 PM
by zendist on 2/13/25, 8:19 PM
by brikym on 2/14/25, 5:53 AM
by whalesalad on 2/13/25, 1:45 PM
by eb0la on 2/13/25, 10:14 PM
by etaham on 2/14/25, 3:46 AM
by gigatexal on 2/13/25, 6:37 PM
by data-ottawa on 2/13/25, 4:46 PM
For data exploration, analysis, and cleaning, this is way more productive than plain SQL. As an example, fusing the aggregate functions with the GROUP BY keys creates a much faster workflow. I like that it unifies WHERE/HAVING/QUALIFY, and the SET/EXTEND/DROP operators help (though Snowflake still wins for being able to declare a column and use it in the same SELECT). Ultimately this gives me a one-liner for situations where I'd otherwise be creating hard-to-name intermediate CTEs, and that's awesome!
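A small example of what that fusion looks like (BigQuery pipe syntax; the table and columns are made up):
FROM sales
|> WHERE sale_date >= '2024-01-01'  -- pre-aggregation filter
|> EXTEND price * quantity AS revenue  -- derived column, usable immediately
|> AGGREGATE SUM(revenue) AS total_revenue GROUP BY region  -- aggregate fused with its keys
|> WHERE total_revenue > 10000  -- post-aggregation filter: still WHERE, no HAVING
|> SELECT region, total_revenue  -- pin down the output shape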
Iterating on array columns or parsing non-trivial JSON is much better with this syntax too.
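For example, pulling fields out of a JSON column step by step (a sketch, assuming an events table with a payload JSON column):
FROM events
|> EXTEND JSON_VALUE(payload, '$.user.id') AS user_id
|> CROSS JOIN UNNEST(JSON_QUERY_ARRAY(payload, '$.items')) AS item
|> EXTEND JSON_VALUE(item, '$.sku') AS sku
|> AGGREGATE COUNT(*) AS item_events GROUP BY sku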
This is a shift back to the data frame APIs of a few years ago, but we're still missing the typing helpers and support that data frame APIs could provide. It would be nice to have a system like this that plugs into language servers, can list fields at each step on cursor hover, and offers editor completions (the BQ console does an okay job).
This syntax is great for DBT macros. You can just drop in entire transforms without worrying about polluting the query namespace or working around existing columns on your reference tables.
There's a dark side to this syntax. The imperative coding style comes with a tradeoff: the reader needs to track internal state through many steps. It's the same reason why SELECT * is often not recommended.
As a best practice I like to throw a `|> SELECT X,Y,Z` at the end of these pipe blocks to reinforce to the reader what the output looks like.
I should mention that it's not as portable, but frankly SQL isn't truly portable across the major DBs anyway, and other databases are now adding support for this syntax too.
tl;dr: I like this feature, but use it sparingly. Avoid overuse in model files, definitely lean on it in analyses.