by wastedbrains on 10/30/13, 11:53 PM with 16 comments
by foz on 10/31/13, 5:32 AM
Often, when things have gone really wrong (DoS, internal network issues, app errors, disk full), the affected machine(s) stop reporting to Graphite (or under-report data). We get alerted by monitoring the services, not the stats.
Being alerted about low or unusual values might be helpful in some cases, but in my experience it would be too noisy. Usually when something bad happens, we investigate Graphite and our analytics tools anyway to understand the impact on traffic and KPIs.
I could see Rearview being useful for some cases, but not as a replacement for real monitoring and alerting tools.
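To make the "stops reporting" case concrete, here is a rough sketch (not how Rearview or any particular tool does it) of telling a silent series apart from a merely low one. It assumes datapoints shaped like Graphite's JSON render output, i.e. (value, timestamp) pairs in time order with null values for missing samples. The function name, the 120-second silence window and the threshold of 10 are made up for illustration.

```python
from typing import List, Optional, Tuple

# (value, unix timestamp); value is None when nothing was reported for that interval.
Datapoint = Tuple[Optional[float], int]


def classify_series(
    datapoints: List[Datapoint],
    now: int,
    max_silence_s: int = 120,      # hypothetical: how long a gap counts as "stopped reporting"
    low_threshold: float = 10.0,   # hypothetical: values below this count as "low"
) -> str:
    """Return 'silent', 'low', or 'ok' for a series ordered by timestamp."""
    # Keep only the intervals where the host actually sent a value.
    reported = [(value, ts) for value, ts in datapoints if value is not None]

    # No data at all, or the last real sample is too old: the host went quiet.
    if not reported or now - reported[-1][1] > max_silence_s:
        return "silent"

    latest_value = reported[-1][0]
    return "low" if latest_value < low_threshold else "ok"
```

A "silent" result is the case that service-level monitoring catches directly, while "low" is the one that tends to be noisy as an alert.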
by jwatte on 10/31/13, 6:10 AM
by dekz on 10/31/13, 6:03 AM
Why not a full Ruby stack? Or was the "live" scripting added after the initial inception?
by fit2rule on 10/31/13, 10:42 AM
by mh- on 10/31/13, 3:25 AM
the UI is quite polished