by nicholashandel on 4/6/22, 10:12 PM with 26 comments
MetricFlow strives to make what has historically been an extremely repetitive process, writing SQL queries on core normalized data models, much more DRY. MetricFlow consolidates the definitions for joins, aggregations, filters, etc., and programmatically generates SQL to construct data marts. You can think of it like LookML, but more powerful and ergonomic (and open source!). The project has three components:
1. MetricFlow Spec: The specification encapsulates metric logic in a more reusable set of abstractions: data_sources, measures, dimensions, identifiers, metrics, and materializations.
2. DataFlow Planner: The Query Planner is a generalized SQL constructor. We take in data sources (ideally normalized data models) and generate a graph of data transformations (a flow, if you will) – joins, aggregations, filters, etc. We take that graph and render it down to db-specific SQL while optimizing it for performance and legibility.
3. MetricFlow Interfaces: The CLI and Python SDK rely on the flexibility of the Spec and Planner to build just about any query you could ask for on top of your data warehouse.
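To make the Spec-to-Planner idea concrete, here is a minimal sketch in Python. It is not MetricFlow's actual API: the `DataSource` and `Measure` classes, the `render_metric_sql` function, and the table and column names are all hypothetical, and the "planner" only handles a single measure with one-hop joins on a shared identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    name: str  # metric-facing name, e.g. "revenue"
    agg: str   # SQL aggregate, e.g. "sum"
    expr: str  # column the aggregate is applied to


@dataclass(frozen=True)
class DataSource:
    # Hypothetical stand-in for a data source definition; not the real spec.
    name: str
    table: str
    identifiers: tuple  # join keys exposed by this source
    measures: tuple = ()
    dimensions: tuple = ()


def render_metric_sql(measure_name, group_by, sources):
    """Find the source that owns the measure, join in any source that owns
    a requested dimension via a shared identifier, and render the SQL."""
    fact = next(s for s in sources
                if any(m.name == measure_name for m in s.measures))
    measure = next(m for m in fact.measures if m.name == measure_name)

    dim_cols, joins = [], []
    for dim in group_by:
        if dim in fact.dimensions:
            dim_cols.append(f"{fact.name}.{dim}")
            continue
        # The dimension lives elsewhere: pick a source sharing a join key.
        other = next(s for s in sources
                     if s is not fact and dim in s.dimensions
                     and set(s.identifiers) & set(fact.identifiers))
        key = sorted(set(other.identifiers) & set(fact.identifiers))[0]
        join = (f"LEFT JOIN {other.table} AS {other.name} "
                f"ON {fact.name}.{key} = {other.name}.{key}")
        if join not in joins:
            joins.append(join)
        dim_cols.append(f"{other.name}.{dim}")

    select_items = dim_cols + [
        f"{measure.agg.upper()}({fact.name}.{measure.expr}) AS {measure.name}"
    ]
    lines = ["SELECT", "  " + ",\n  ".join(select_items),
             f"FROM {fact.table} AS {fact.name}", *joins]
    if dim_cols:
        lines.append("GROUP BY " + ", ".join(dim_cols))
    return "\n".join(lines)


# Hypothetical example sources; names are made up for illustration.
SOURCES = (
    DataSource("orders", "analytics.orders", ("customer_id",),
               measures=(Measure("revenue", "sum", "amount"),),
               dimensions=("ordered_at",)),
    DataSource("customers", "analytics.customers", ("customer_id",),
               dimensions=("country",)),
)

sql = render_metric_sql("revenue", ["country"], SOURCES)
print(sql)
```

The point of the sketch is that the join to `customers` is never written by hand: it is derived from the shared `customer_id` identifier once a dimension from another source is requested.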
These components enable novel features that other semantic layers struggle to support today:
- MetricFlow enables the user to traverse the entire graph of a company’s data warehouse without confining their analysis to pre-built data models (dbt), Explores (in Looker), or Cubes (in lots of tools).
- The Metric abstraction allows the construction of complex metrics that traverse the graph described above to rely on multiple data sources. We support several common metric types today, and adding more is a critical part of the open-source roadmap.
- The Materialization abstraction allows users to define and then programmatically generate data marts that rely on a single DRY expression of the metrics and dimensions.
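The Materialization idea in the last bullet can be sketched roughly as follows. This is an illustration only, not MetricFlow's implementation: `MetricDef`, `Materialization`, `render_materialization`, and every table name are hypothetical, and the generated SQL assumes a warehouse that supports `FULL OUTER JOIN ... USING`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDef:
    # Hypothetical, already-resolved metric: one aggregate over one table.
    name: str
    agg_expr: str
    table: str


@dataclass(frozen=True)
class Materialization:
    name: str          # target table for the data mart
    metrics: tuple     # MetricDef instances
    dimensions: tuple  # shared grouping columns


def render_materialization(mat):
    """Render one CREATE TABLE AS statement: one CTE per metric, all
    grouped by the same dimensions, then joined on those dimensions."""
    dims = ", ".join(mat.dimensions)
    ctes = [
        f"{m.name}_cte AS (\n"
        f"  SELECT {dims}, {m.agg_expr} AS {m.name}\n"
        f"  FROM {m.table}\n"
        f"  GROUP BY {dims}\n"
        f")"
        for m in mat.metrics
    ]
    first, *rest = mat.metrics
    lines = [
        f"CREATE TABLE {mat.name} AS",
        "WITH " + ",\n".join(ctes),
        f"SELECT {dims}, " + ", ".join(m.name for m in mat.metrics),
        f"FROM {first.name}_cte",
    ]
    lines += [f"FULL OUTER JOIN {m.name}_cte USING ({dims})" for m in rest]
    return "\n".join(lines)


# Hypothetical data mart: two metrics from different tables, by day.
daily_kpis = Materialization(
    name="marts.daily_kpis",
    metrics=(
        MetricDef("revenue", "SUM(amount)", "analytics.orders"),
        MetricDef("active_users", "COUNT(DISTINCT user_id)", "analytics.events"),
    ),
    dimensions=("ds",),
)

sql = render_materialization(daily_kpis)
print(sql)
```

Each metric is defined once and reused by whatever materialization lists it, which is the DRY property the bullet describes.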
MetricFlow is open source (https://github.com/transform-data/metricflow) and distributed through PyPI (`pip install metricflow`). You can set up a set of sample configs (`mf setup`) and try out a tutorial (`mf tutorial`). The docs are all here (https://docs.transform.co/docs/overview/metricflow-overview). We’d love contributions on GitHub. We’re adding new Issues to share our roadmap in the coming days, but feel free to open your own.
We’re also opening up a Slack community (https://community.transform.co/metricflow-signup) to talk about the project and, more generally, metric tooling.
Let us know what you think – we’ll be here answering any questions!
by mjirv on 4/7/22, 12:21 AM
What’s the quick pitch for why I should use this instead of Cube or dbt’s metrics layer?
by seektable on 4/7/22, 7:33 AM
In other words, BI tools need to have a special connector that automatically utilizes the MetricFlow Python API (or CLI). Which BI tools can already use MetricFlow in this way (the open-source part of the project)?
Actually I'm asking about that as a BI tool vendor. We have added an ability to use custom connectors (web API) so potentially this kind of connector can use MetricFlow for SQL generation.
by theboat on 4/12/22, 2:02 AM
It seems like you think MetricFlow should be the data mart layer and not just the metrics layer. If that's true...why? Why would I join my fact and dimension tables in metricflow instead of in dbt? One of the value adds of dbt is that it centralizes business logic in a single place. Joins are business logic. The industry seems to be moving towards creating very wide data mart tables in dbt and surfacing them to the semantic layer 1:1, or building the metrics layer on top of them.
by XCSme on 4/8/22, 12:57 PM
by pstoll on 4/6/22, 11:17 PM
We adopted Looker at $previous_job. Then they got bought by Google, which was great for us as we were becoming a big GCP customer. I strongly encouraged the Google/Looker team to at least open source their LookML (Looker modeling language - equivalent to MQL). They couldn’t figure it out.
This type of metric definition is so empowering for businesses. Not enough engineers grok why this is useful.
by mritchie712 on 4/7/22, 12:18 PM
by awinter-py on 4/6/22, 11:47 PM
my wishlist item for 'standard metrics definitions' is for libraries + servers to ship with a spec of what they export
so that if I'm using, for example, a plugin for a reverse proxy, or a twilio verification library, it can install its own metrics and alerts in my dashboard system
by moltar on 4/7/22, 12:35 AM