by collectedparts on 12/19/21, 4:55 AM with 5 comments
Maybe some other business KPIs, but the emphasis is on building "graphs that the operational team should be looking at on a ~hourly basis to prioritize fixes and spot systemic regressions."
Ideally it would integrate with PagerDuty, in that it'd be easy to express "page me if X is above Y."
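If the tool itself can't express "page me if X is above Y," a small check can send the page directly through PagerDuty's Events API v2. This is a minimal sketch, assuming a PagerDuty service with an Events API v2 integration (which provides the routing key); the metric name, threshold, and `source` string are hypothetical placeholders.

```python
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key, metric, value, threshold, source):
    """Return an Events API v2 'trigger' payload, or None if value is within bounds."""
    if value <= threshold:
        return None  # nothing to page about
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Reusing the same dedup_key groups repeated breaches into one open alert.
        "dedup_key": f"{source}:{metric}",
        "payload": {
            "summary": f"{metric} is {value}, above threshold {threshold}",
            "source": source,
            "severity": "critical",
        },
    }


def send_event(event, timeout=10):
    """POST the event to PagerDuty and return the HTTP status code."""
    req = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Usage would be along the lines of `event = build_trigger_event(routing_key, "queue_depth", current_depth, 1000, "cron-metrics")` followed by `send_event(event)` when `event` is not `None`. Keeping the threshold check separate from the HTTP call makes it easy to test without paging anyone.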
My preference is to run my own cron job that captures the metrics from our queues/databases and pushes them to the tool via a (roughly) REST API, but I could be convinced of a different method.
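The cron-and-push approach could look roughly like the script below: query the job store, turn the results into timestamped points, and POST them as JSON. This is a minimal sketch under several assumptions: the `jobs` table with a `status` column, the endpoint URL, and the JSON shape are all hypothetical, and a real tool (Datadog included) defines its own payload format and authentication headers. SQLite stands in for whatever queue or database is actually in use.

```python
import json
import sqlite3
import time
import urllib.request

# Hypothetical ingestion endpoint; substitute the tool's real API and auth.
METRICS_URL = "https://metrics.example.com/api/v1/points"


def collect_metrics(conn: sqlite3.Connection):
    """Count rows per status in a (hypothetical) jobs table as timestamped points."""
    cur = conn.execute("SELECT status, COUNT(*) FROM jobs GROUP BY status")
    counts = dict(cur.fetchall())
    now = int(time.time())
    return [
        {"name": f"jobs.{status}", "value": n, "timestamp": now}
        for status, n in sorted(counts.items())
    ]


def push_metrics(points, url=METRICS_URL, timeout=10):
    """POST the points as a JSON array and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(points).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Wrapped in a small entry point, this could be scheduled every five minutes with a crontab line such as `*/5 * * * * python3 /path/to/push_metrics.py` (path hypothetical), which gives the hourly-reviewed graphs finer-grained data than the review cadence itself.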
So far the only thing I can think of is Datadog, specifically https://www.datadoghq.com/solutions/real-time-business-intelligence/
Anything else I should be looking at? Or any other tips?
by XCSme on 12/19/21, 10:07 PM
From what I understand, it's more like an easier-to-use Grafana, where you can build charts/graphs from different data sources.
by yuppie_scum on 12/19/21, 6:07 AM
Self-hosted-wise: Prometheus/Grafana, the ELK stack, or maybe old-school Nagios.
AWS also has managed Grafana/Prometheus for a fairly reasonable price. I would recommend this if you don’t have a lot of time to mess around updating your Grafana stack every few months.
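Prometheus normally pulls (scrapes) metrics, so the usual way to fit a cron job like the one described in the question is the Prometheus Pushgateway: the job pushes metrics in Prometheus's text exposition format over HTTP, and Prometheus scrapes them from the gateway. A minimal stdlib sketch; the Pushgateway address, job name, and metric names are hypothetical.

```python
import urllib.parse
import urllib.request

# Hypothetical Pushgateway address (9091 is its default port).
PUSHGATEWAY = "http://pushgateway.example.internal:9091"


def render_exposition(metrics):
    """Render {name: value} as Prometheus text exposition format, one gauge each."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"  # the format requires a trailing line feed


def push_to_pushgateway(metrics, job, base_url=PUSHGATEWAY, timeout=10):
    """PUT metrics under /metrics/job/<job>, replacing that job's previous group."""
    url = f"{base_url}/metrics/job/{urllib.parse.quote(job, safe='')}"
    req = urllib.request.Request(
        url,
        data=render_exposition(metrics).encode("utf-8"),
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

PUT replaces every metric previously pushed for that job, whereas POST would only replace metrics with the same names. From there, "page me if X is above Y" becomes a Prometheus alerting rule (an expression like `queue_depth > 1000`) routed to PagerDuty through Alertmanager's PagerDuty receiver.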
by AishwaryaVenkat on 12/20/21, 7:41 AM
by Jugurtha on 12/19/21, 7:26 PM