by aiunboxed on 10/7/23, 7:23 AM with 8 comments
We tend to miss a lot of critical alerts that come in, simply because the alerting is not set up properly.
by slap_shot on 10/10/23, 5:26 PM
Personally, I find notifications in Slack to be an anti-pattern: a lot of teams expect someone to just "pick up" the incident based on their availability or expertise, and _maybe_ the resolution is documented. Assigning direct responsibility by component and on-call schedule, and appending the RCA to the incident, reduces the time-to-resolution and the overall toil of the process.
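To make the "direct responsibility by component and on-call schedule" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `ROTATIONS` table, the names in it, the weekly rotation rule, and the `Incident` fields are hypothetical, not any particular paging tool's model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical rotations: component -> engineers who take turns week by week.
ROTATIONS = {
    "payments": ["alice", "bob"],
    "search": ["carol", "dave", "erin"],
}


@dataclass
class Incident:
    component: str
    summary: str
    assignee: str               # exactly one named owner, no "someone will pick it up"
    rca: Optional[str] = None   # root-cause analysis, appended at resolution


def on_call(component: str, when: datetime) -> str:
    """Return who is on call for `component` during the ISO week of `when`."""
    rota = ROTATIONS[component]
    week = when.isocalendar()[1]
    return rota[week % len(rota)]


def open_incident(component: str, summary: str, when: datetime) -> Incident:
    """Open an incident that is already assigned to the current on-call owner."""
    return Incident(component, summary, on_call(component, when))


def resolve(incident: Incident, rca: str) -> Incident:
    """Close out an incident; an empty RCA is rejected so the write-up is not optional."""
    if not rca.strip():
        raise ValueError("an RCA is required to resolve an incident")
    incident.rca = rca
    return incident
```

The point of the sketch is that the alert never lands in a shared channel unowned: the assignee is computed when the incident is opened, and resolution cannot happen without the RCA text.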
by nip on 10/7/23, 2:55 PM
Errors are reported in dedicated Slack channels.
The “MVP” was built in 1 week after we were faced with an outrageous bill from an observability vendor and decided to take a shot at implementing it ourselves.
In total I’d say that we invested 2 additional weeks of man-hours to get to where we are today.
It has worked extremely well for us and has needed little maintenance (granted, we pay AWS to not have to do that maintenance).
by mtmail on 10/7/23, 11:14 AM
by guybedo on 10/13/23, 2:55 AM
- regular http monitoring for websites
- run test queries on my sql & mongo databases
- check that rabbitmq queues are not overflowing
- check that docker containers are up
If something goes wrong, I get email & telegram alerts.
fwiw i'm using https://uptimefunk.com
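For anyone who wants to roll a list of checks like the one above by hand, a rough sketch of the pattern follows. It is not how uptimefunk works; `http_check`, `queue_depth_check`, `run_checks`, and the `alert` callback are hypothetical names, and the database, RabbitMQ, and docker probes are left as callables you would supply yourself.

```python
import urllib.request
from typing import Callable, Dict, List, Optional

# A check returns None when healthy, or a short error message when not.
Check = Callable[[], Optional[str]]


def http_check(url: str, timeout: float = 5.0) -> Check:
    """Build a check that fetches `url` and fails on errors or timeouts."""
    def run() -> Optional[str]:
        try:
            # urlopen raises HTTPError for 4xx/5xx responses.
            with urllib.request.urlopen(url, timeout=timeout):
                return None
        except OSError as exc:  # URLError, HTTPError and socket timeouts are OSError subclasses
            return f"{url} failed: {exc}"
    return run


def queue_depth_check(get_depth: Callable[[], int], limit: int) -> Check:
    """Build a check that fails when a queue holds more than `limit` messages.

    `get_depth` is a placeholder for however you read the queue size.
    """
    def run() -> Optional[str]:
        depth = get_depth()
        if depth > limit:
            return f"queue depth {depth} exceeds limit {limit}"
        return None
    return run


def run_checks(checks: Dict[str, Check], alert: Callable[[str], None]) -> List[str]:
    """Run every check, send one alert per failure, return the failed check names."""
    failed = []
    for name, check in checks.items():
        try:
            error = check()
        except Exception as exc:  # a crashing check counts as a failure too
            error = f"check raised {exc!r}"
        if error is not None:
            failed.append(name)
            alert(f"[{name}] {error}")
    return failed
```

In practice `alert` would be whatever sends the email or telegram message, and `run_checks` would be called from cron or a loop with a sleep.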
by rozenmd on 10/7/23, 10:55 AM
by girishso on 10/7/23, 10:45 AM
by Cicero22 on 10/7/23, 2:18 PM
by 0xebo on 10/7/23, 12:05 PM