by threesevenths on 7/5/24, 11:25 PM with 106 comments
by ttul on 7/6/24, 12:21 AM
It reflects poorly on Canada and on Canadians that foreign competitors are still prohibited from entering the market.
by elchief on 7/5/24, 11:52 PM
Oops. That's actually quite funny.
In case you didn't know, Rogers is one of Canada's "big 3" telecom providers. The 2022 outage basically crippled our economy for a couple of days (most ATMs and Interac didn't work).
by maltalex on 7/6/24, 12:15 AM
This is networking 101. Heck, this is engineering 101. The real question is how a network provider as large as Rogers managed to be so poorly engineered in the first place.
by atyvr on 7/6/24, 2:35 AM
Route leaks can happen to anyone, but the fact that it brought down their entire network, including voice and internet services, across all provinces, was unacceptable.
What's even more concerning is that they had no out-of-band access, which meant no management access to their network. This explains why the outage lasted a whopping 24 hours.
In my opinion, the lack of OOB was the most critical failure and yet the most preventable. Proper OOB is a must; I wouldn't operate a network without it, and I don't understand why Rogers thought that was acceptable.
by WaitWaitWha on 7/6/24, 12:53 AM
> [...] both the wireless and wireline networks sharing a common IP core network, the scope of the outage was extreme in that it resulted in a catastrophic loss of all services. [...] It is a design choice by [...] Rogers, that seeks to balance cost with performance.
Based on this write-up and the details I gathered, I believe the root cause, the fundamental reason for the failure, is that senior management struck the wrong balance between cost and performance.
by great_psy on 7/6/24, 1:15 AM
by kn100 on 7/7/24, 3:56 AM
by yelnatz on 7/6/24, 2:00 AM
1) Remote into the boxes to see what's happening.
2) Talk to other devs because their phones are on Rogers network.
My stress couldn't.
by shrubble on 7/6/24, 12:47 AM
by jtchang on 7/6/24, 1:41 AM
Talk about some grade-A gaslighting here. In the post-mortem, they first tell you it wasn't a design flaw, then say they routed all their data through one core router (and had no separate management network). Then they say they are going to fix things by separating the wireless and wired traffic.
Why would you fix things if it wasn't a design flaw?
Out-of-band access is like resilient architecture 101. Hell, even homelabs generally have some way to do it. It's appalling that Rogers didn't have a way to access the core IP routers out of band. Yes, it might mean having to use a competitor's infrastructure, but they ended up having to do it anyway. And with the failure of the service, all the infrastructure providers are now under additional scrutiny. Rogers should be striking agreements with other providers to carry core traffic during an outage, as in this DR situation. For example, Visa, MC, and Amex all have agreements in place to process each other's auth data in case one of them goes down. The thinking here is that a credit card outage makes everyone look bad.
by Insanity on 7/6/24, 12:36 AM