by chotmat on 9/3/21, 7:07 AM with 19 comments
by erosenbe0 on 9/5/21, 3:42 AM
Cube means you can answer questions across dimensions of time, region, and the like. Temporal queries are not always simple.
You don't query your live transactional db to answer a question like, "are third-week-of-February margins on Acme Widget Q better in place A or B?"
That's really it.
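A minimal pandas sketch of that kind of slice-and-compare question (the product, regions, weeks, and margin figures are all hypothetical, and "third week of February" is taken as ISO week 7):

    import pandas as pd

    # Hypothetical pre-aggregated margin data, one row per (product, region, week).
    sales = pd.DataFrame({
        "product": ["Widget Q"] * 4,
        "region":  ["A", "A", "B", "B"],
        "week":    ["2021-W07", "2021-W08", "2021-W07", "2021-W08"],
        "margin":  [0.31, 0.28, 0.35, 0.33],
    })

    # "Are third-week-of-February margins on Acme Widget Q better in place A or B?"
    q = sales[(sales["product"] == "Widget Q") & (sales["week"] == "2021-W07")]
    print(q.groupby("region")["margin"].mean())  # region A: 0.31, region B: 0.35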
by rstuart4133 on 9/5/21, 10:03 AM
My view of OLAP changed recently: I was asked to load some data into an OLAP package. Nothing special about that, but it meant I got to sit in on the training session for the people using it. They were the same people our IT had to help on a daily basis with the simplest of things. Yet within an hour it had clicked, and they were happily exploring how each dimension in many-dimensional data affected the others.
That was my light bulb moment. The reason OLAP is still a thing 20 years later isn't some special algorithm or data structure - it's that it's a unique GUI for exploring very complex data that non-computer people find intuitive. That's my take on it, anyway.
by rahimnathwani on 9/5/21, 1:30 AM
Think of a pivot table. It has two kinds of things:
A) rows and columns
B) values
Dimensions are things you might include in A.
Facts (or measures) are things you can aggregate, and would be part of B.
In the old days, you often couldn't quickly (on demand) compute the answer to a question, due to memory and processing limitations.
But, if you could decide ahead of time what dimensions and facts you wanted, you could compute all the little slices, e.g. the value of sales in the north region by salesperson Bob in July 1996.
Then, when you needed some particular pivot table, your software could provide it to you just by summing these precomputed slices.
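A toy sketch of that precomputed-slices idea, reusing the hypothetical region/salesperson/month dimensions from the example above:

    from collections import defaultdict

    # Hypothetical fact rows: (region, salesperson, month, sales_value)
    facts = [
        ("north", "Bob",   "1996-07", 1200.0),
        ("north", "Bob",   "1996-07",  300.0),
        ("north", "Alice", "1996-07",  500.0),
        ("south", "Bob",   "1996-08",  700.0),
    ]

    # Precompute every little slice once (the "cube").
    cube = defaultdict(float)
    for region, rep, month, value in facts:
        cube[(region, rep, month)] += value

    # Later, a pivot table cell is just a sum of precomputed slices,
    # e.g. total sales by Bob across all regions in July 1996:
    bob_july = sum(v for (region, rep, month), v in cube.items()
                   if rep == "Bob" and month == "1996-07")
    print(bob_july)  # 1500.0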
The OL in OLAP stands for 'online', i.e. you'd do your analytical processing ('AP') live, rather than waiting for some long-running batch job.
by mr_toad on 9/5/21, 2:33 AM
by sradman on 9/5/21, 3:46 AM
by gfody on 9/5/21, 1:53 AM
by vinay_ys on 9/5/21, 11:57 AM
In my experience, people often underestimate the continuous effort needed to maintain Kimball's "Enterprise Data Warehouse Bus Architecture" diagram, even with more powerful machines and modern distributed tooling.
In today's world of fast-evolving Internet apps, data use cases and scenarios change just as quickly. That brings its own set of challenges.
Having good, usable tools for managing the lifecycle of entity and event definitions, their variants (emitted/logged vs. cleaned/processed/synthesized), and their data quality checks, and ensuring they are easily discoverable and understandable by everyone in the org, is super crucial and significantly under-appreciated.
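As a hedged sketch of what such tooling might hold (the class, event name, and fields are entirely hypothetical), an event definition that carries its variant, its owner, and a basic quality check:

    from dataclasses import dataclass, field

    @dataclass
    class EventDefinition:
        name: str      # e.g. "checkout_completed"
        variant: str   # "emitted/logged" vs. "cleaned/processed/synthesized"
        owner: str     # team responsible, so the definition stays discoverable
        fields: dict = field(default_factory=dict)  # field name -> expected type

        def check_quality(self, record: dict) -> list:
            """Return a list of problems; an empty list means the record passes."""
            problems = []
            for fname, ftype in self.fields.items():
                if fname not in record:
                    problems.append(f"missing field: {fname}")
                elif not isinstance(record[fname], ftype):
                    problems.append(f"bad type for {fname}")
            return problems

    checkout = EventDefinition("checkout_completed", "emitted", "payments-team",
                               fields={"order_id": str, "amount_cents": int})
    print(checkout.check_quality({"order_id": "o-1"}))  # ['missing field: amount_cents']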
Usually, the strong systems engineers in charge of the data platform focus on building the data infra (job scheduling, data pipeline execution, storage, etc.), but the crucial work of defining the data dictionaries, event and entity models, etc. is left out. The data producers and data consumers spread throughout the organization have to muddle through this on their own, without any centralized tooling to support the activity. This makes data use difficult and siloed.
Usually, there is a team of BI analysts tasked with getting answers out of the data for the questions asked of them by various data users. Funnily enough, these analysts also work in silos, assigned to those different data users. Inevitably, they become a super-inefficient intermediary between the data users and the data insights.
The pre-cooked data insights are presented in spreadsheets and slides in review meetings, where a narrative has already been prepared by the analysts.
This robs the data users of the opportunity to explore and ask data questions on their own, in a fast iteration cycle, and so improve their intuition and understanding of their product/business environment.
IMO, these challenges remain largely unsolved to this day, across organizations of all sizes and scales.
by cm277 on 9/5/21, 3:04 PM
The core insight the article is missing, though, is that these dimensions are well-defined (and finite!) within the OLAP architecture, and so are the roll-ups (aggregations) into their parent dimensions. The classic example is a date: a date has to roll up to a week. Obvious. But the set of dates within the environment also has to be (or should be, anyway) well defined: you can't just use any date; it has to fall, e.g., between 2019 and 2021.
Why? Because the structure of the dimensions makes algebra across OLAP cubes trivial. In fact, OLAP algebra is pretty much linear algebra, just defined slightly differently for a more fluid problem space. That's why I like to think that OLAP isn't obsolete: in fact, most databases are just now becoming OLAP-y :)
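To make the "OLAP algebra is pretty much linear algebra" point concrete, here's a small numpy sketch (dates and values hypothetical): rolling a well-defined, finite set of daily values up to weeks is just multiplication by a 0/1 aggregation matrix:

    import numpy as np

    # Daily sales for a finite, well-defined set of 14 dates (two weeks).
    daily = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7], dtype=float)

    # Roll-up matrix: one row per week, 1.0 where the day belongs to that week.
    rollup = np.zeros((2, 14))
    rollup[0, :7] = 1.0   # week 1 aggregates days 0..6
    rollup[1, 7:] = 1.0   # week 2 aggregates days 7..13

    weekly = rollup @ daily
    print(weekly)  # [25. 43.]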
Source: 20+ years of OLAP systems development.
by olapalo on 9/5/21, 12:21 AM
by philipodonnell on 9/5/21, 11:43 AM
by 7thaccount on 9/5/21, 4:18 AM
It sounds like certain common queries may be extra fast because the answers are, in effect, pre-computed?