from Hacker News

Show HN: MetricFlow – open-source metric framework

by nicholashandel on 4/6/22, 10:12 PM with 26 comments

Hi HN community, I’m Nick, co-founder/CEO of Transform.co. I’m thrilled to share MetricFlow, an open-source metric creation framework: https://github.com/transform-data/metricflow

MetricFlow strives to make what has historically been an extremely repetitive process, writing SQL queries on core normalized data models, much more DRY. MetricFlow consolidates the definitions for joins, aggregations, filters, etc., and programmatically generates SQL to construct data marts. You can think of it like LookML, but more powerful and ergonomic (and open source!). The project has three components:

1. MetricFlow Spec: The specification encapsulates metric logic in a more reusable set of abstractions: data_sources, measures, dimensions, identifiers, metrics, and materializations.

2. DataFlow Planner: The Query Planner is a generalized SQL constructor. We take in data sources (ideally normalized data models) and generate a graph of data transformations (a flow, if you will) – joins, aggregations, filters, etc. We take that graph and render it down to db-specific SQL while optimizing it for performance and legibility.

3. MetricFlow Interfaces: The CLI and Python SDK rely on the flexibility of the Spec and Planner to build just about any query you could ask for on top of your data warehouse.
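
To make the Spec-to-SQL idea concrete, here is a minimal Python sketch of a declarative definition being rendered into a query. It is not MetricFlow's actual API or config format: the `Measure` and `DataSource` classes, the `render_metric_sql` function, and the `analytics.transactions` table are all hypothetical names chosen for illustration, and the real planner builds a graph of transformations rather than formatting a single string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    """Hypothetical stand-in for a measure: a named aggregation."""
    name: str
    agg: str   # SQL aggregate function, e.g. "SUM"
    expr: str  # column or SQL expression being aggregated


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in for a data source: a table plus its semantics."""
    name: str
    sql_table: str
    measures: tuple
    dimensions: tuple


def render_metric_sql(source, measure_name, group_by):
    """Render one measure, grouped by the requested dimensions, as SQL."""
    measure = next(m for m in source.measures if m.name == measure_name)
    unknown = [d for d in group_by if d not in source.dimensions]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    select_cols = list(group_by) + [
        f"{measure.agg}({measure.expr}) AS {measure.name}"
    ]
    sql = f"SELECT {', '.join(select_cols)}\nFROM {source.sql_table}"
    if group_by:
        sql += f"\nGROUP BY {', '.join(group_by)}"
    return sql


# The definition is written once; each query only names what it wants.
transactions = DataSource(
    name="transactions",
    sql_table="analytics.transactions",  # assumed table name
    measures=(Measure("revenue", "SUM", "amount"),),
    dimensions=("country", "ds"),
)

print(render_metric_sql(transactions, "revenue", ["country"]))
```

The point of the sketch is only that the aggregation logic lives in one definition, and every query that asks for `revenue` reuses it instead of restating `SUM(amount)`.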

These components enable novel features that other semantic layers struggle to support today:

- MetricFlow enables the user to traverse the entire graph of a company’s data warehouse without confining their analysis to pre-built data models (dbt), Explores (in Looker), or Cubes (in lots of tools).

- The Metric abstraction allows the construction of complex metrics that traverse the graph described above to rely on multiple data sources. We support several common metric types today, and adding more is a critical part of the open-source roadmap.

- The Materialization abstraction allows users to define and then programmatically generate data marts that rely on a single DRY expression of the metrics and dimensions.
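
As a rough illustration of the materialization idea, the Python sketch below generates two data mart tables from one shared set of metric expressions. It assumes nothing about MetricFlow's real materialization config: `render_materialization`, the `METRICS` dictionary, and the table names are hypothetical, and the `CREATE TABLE ... AS` form is just one common way a warehouse might persist the result.

```python
def render_materialization(name, source_table, metrics, dimensions):
    """Build a CREATE TABLE AS statement from shared metric expressions.

    metrics maps a metric name to its SQL aggregate expression.
    """
    select_cols = list(dimensions) + [
        f"{expr} AS {metric}" for metric, expr in metrics.items()
    ]
    select_sql = f"SELECT {', '.join(select_cols)} FROM {source_table}"
    if dimensions:
        select_sql += f" GROUP BY {', '.join(dimensions)}"
    return f"CREATE TABLE {name} AS {select_sql}"


# One DRY expression of the metrics (hypothetical names and expressions).
METRICS = {"revenue": "SUM(amount)", "orders": "COUNT(*)"}

# Two marts that differ only in the dimensions they are grouped by.
daily = render_materialization(
    "marts.revenue_daily", "analytics.transactions", METRICS, ["ds"]
)
by_country = render_materialization(
    "marts.revenue_by_country", "analytics.transactions", METRICS, ["country"]
)

print(daily)
print(by_country)
```

Changing the definition of `revenue` in `METRICS` changes every generated mart, which is the property the Materialization abstraction is described as providing.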

MetricFlow is open source (https://github.com/transform-data/metricflow) and distributed through PyPI (`pip install metricflow`). You can set up a set of sample configs (`mf setup`) and try out a tutorial (`mf tutorial`). The docs are all here (https://docs.transform.co/docs/overview/metricflow-overview). We’d love contributions on GitHub. We’re adding new Issues to share our roadmap in the coming days, but feel free to open your own.

We’re also opening up a Slack community (https://community.transform.co/metricflow-signup) to talk about the project and, more generally, metric tooling.

Let us know what you think – we’ll be here answering any questions!

  • by mjirv on 4/7/22, 12:21 AM

    This is awesome, though I would love some more detail in the documentation.

    What’s the quick pitch for why I should use this instead of Cube or dbt’s metrics layer?

  • by seektable on 4/7/22, 7:33 AM

    It looks like MetricFlow shines at constructing SQL queries on demand, which means it should be used directly by a BI tool. Am I right about this? Generating static SQL (with the CLI) for each report doesn't seem very usable in practice.

    In other words, BI tools need a special connector that automatically uses the MetricFlow Python API (or CLI). Which BI tools can already use MetricFlow in this way (the open-source part of the project)?

    Actually, I'm asking as a BI tool vendor. We have added the ability to use custom connectors (web API), so potentially this kind of connector could use MetricFlow for SQL generation.

  • by theboat on 4/12/22, 2:02 AM

    Thank you for open sourcing this. More competition in the budding metrics ecosystem is good for end users.

    It seems like you think MetricFlow should be the data mart layer and not just the metrics layer. If that's true...why? Why would I join my fact and dimension tables in metricflow instead of in dbt? One of the value adds of dbt is that it centralizes business logic in a single place. Joins are business logic. The industry seems to be moving towards creating very wide data mart tables in dbt and surfacing them to the semantic layer 1:1, or building the metrics layer on top of them.

  • by XCSme on 4/8/22, 12:57 PM

    It's a bit unclear how I would go about integrating MetricFlow with https://uxwizz.com, for example, which uses a MySQL database to store analytics data. From the docs, I don't really understand how it actually "understands" the underlying SQL database and how to retrieve the data I need. It feels like I have to write the query to get the data I want, but in a different syntax. Is there any point in using MetricFlow if you only have one data source?

  • by pstoll on 4/6/22, 11:17 PM

    Cool. Like open source Looker.

    We adopted Looker at $previous_job. Then they got bought by Google, which was great for us as we were becoming a big GCP customer. I strongly encouraged the Google/Looker team to at least open source LookML (the Looker modeling language, equivalent to MQL). They couldn’t figure it out.

    This type of metric definition is so empowering for businesses. Not enough engineers grok why this is useful.

  • by mritchie712 on 4/7/22, 12:18 PM

    Is MetricFlow what Transform (the product) uses under the hood?

  • by awinter-py on 4/6/22, 11:47 PM

    love this as an area of innovation

    my wishlist item for 'standard metrics definitions' is for libraries + servers to ship with a spec of what they export

    so that if I'm using, for example, a plugin for a reverse proxy, or a twilio verification library, it can install its own metrics and alerts in my dashboard system

  • by moltar on 4/7/22, 12:35 AM

    How does it compare with dbt metrics, and how do you plan to compete with them?