IT outage
#141
On Reserve
Joined APC: Jul 2024
Posts: 12
bro, you’re an idiot. The chief pilot office in ATL made an announcement over the loudspeaker that if you are willing to fly, come into the office. The coverage ladder is out the window with this. GA is about to call in the National Guard to airlift people out of ATL 😂… that’s how badly Delta is doing.
But to each their own. I’m done with you all on this conversation. See what you want to see.
#142
Gets Weekends Off
Joined APC: Feb 2007
Position: Big ones
Posts: 774
#143
Gets Weekends Off
Joined APC: Jul 2010
Posts: 3,371
That's not the solution they are looking at. A simple rules change permitting commuters to NOT have to contact CS or PA to book a seat on an at-capacity flight is an easy fix.
DOT has declared this a "controllable" incident, which means airlines are on the hook.
I fail to see how they concluded this when a buggy software update from CrowdStrike was the causal factor. The updates are pushed automatically, with no testing done prior to deployment. That's a huge foul. We've got thousands of affected computers and servers that have required a lengthy manual reboot process to un-f#@$& this mess that they (CrowdStrike) are responsible for.
everyone else recovered already.
The PS solution works, but the company doesn't know where the crews are. More bases = more Gs can go out where people leave.
#144
Gets Weekends Off
Joined APC: Jan 2023
Posts: 1,520
CrowdStrike is a contractor to Microsoft. CrowdStrike routinely pushes updates to their piece of the Microsoft puzzle. The one they pushed in the early morning (here in the US) of July 19th is the culprit. It hit Europe first. We knew about it, but by the time we could react the updates were already deployed to the Microsoft systems we employ.
So yes, the buggy update pushed automatically by CrowdStrike is what's responsible for bricking our computers and servers; gate agent computers, crew tracking and scheduling tools, etc. were all affected.
To your PS statement: not really. The common scenario lately has been that a crew books PS, then that flight cancels. Now they try to book another PS flight at close range, but they can't because that flight is at capacity and travel net will not allow it. So they have to reach out to CS or PA to make the PS listing for them. That takes hours, and by then the flight they were trying to PS on has departed. Gate agents can't list you for PS unless they get the OK from Mecca.
So, the simple solution is to remove the rules on booking PS on at-capacity flights when IROPs are invoked. Or at least give gate agents the ability to list you at the gate.
#145
Gets Weekends Off
Joined APC: Jul 2022
Posts: 930
CrowdStrike is the "third party vendor" referred to in prior news releases from the company. It wasn't until yesterday that we named them in our messaging.
CrowdStrike is a contractor to Microsoft. CrowdStrike routinely pushes updates to their piece of the Microsoft puzzle. The one they pushed on the morning (here in the US) of July 19th, at around midnight, is the culprit. It hit Europe first. We knew about it, but by the time we could react the updates were already deployed to the Microsoft systems we employ.
So yes, the buggy update pushed automatically by CrowdStrike is what's responsible for bricking our computers and servers; gate agent computers, crew tracking and scheduling tools, etc. were all affected.
Our competitors are blowing us away with their recovery efforts.
#146
Gets Weekends Off
Joined APC: Jan 2023
Posts: 1,520
It’s fair to blame CrowdStrike for Friday’s initial disruption. However, Saturday, Sunday, and now Monday fall squarely on Delta. Our infrastructure and recovery capability are severely lacking, as this management team has always been more focused on optics than substance.
Our competitors are blowing us away with their recovery efforts.
#147
Line Holder
Joined APC: Sep 2022
Posts: 95
I think that's being a bit disingenuous. In order to "recover" we had to manually reboot thousands of computers (gate agent, load, crew tracking, scheduling, etc.) all over the world. The manual reboot process is lengthy; it requires IT folks to work one machine or server at a time.
That is why we have taken some time to get back on step.
Point fingers at the cause.
Now, can we improve comms between those out on the line and Mecca? Yes. This is something like my 7th major meltdown, and I see the same thing every time. We can do better at getting robust and timely comms to and from our personnel out on the line. RA even talked about it many years ago, but here we are.
#148
Line Holder
Joined APC: Feb 2020
Posts: 84
CrowdStrike is the "third party vendor" referred to in prior news releases from the company. It wasn't until yesterday that we named them in our messaging.
CrowdStrike is a contractor to Microsoft. CrowdStrike routinely pushes updates to their piece of the Microsoft puzzle. The one they pushed in the early morning (here in the US) of July 19th is the culprit. It hit Europe first. We knew about it, but by the time we could react the updates were already deployed to the Microsoft systems we employ.
So yes, the buggy update pushed automatically by CrowdStrike is what's responsible for bricking our computers and servers; gate agent computers, crew tracking and scheduling tools, etc. were all affected.
The preliminary information that has come out so far is that the file CrowdStrike pushed contained nothing but zeros. Because this file lives at such a low level in the operating system, it caused Windows to boot-loop. How this file came to contain no information, and why it was not caught in QA before being pushed, is yet to be determined.
The reason this is so devastating is that removing the faulty file requires physical access to each machine if the machine does not have out-of-band management (Intel vPro/IPMI come to mind, though those have vulnerability issues of their own). BitLocker (a Microsoft device-encryption feature for Windows) comes into play here: it prevents someone from removing the drive and placing it into another computer, or booting from a USB drive and manipulating the underlying Windows OS (removing passwords, etc.). BitLocker is actually doing what it's designed to do in this case and isn't a culprit, but the BitLocker recovery key is needed to get to a command prompt in recovery mode in order to remove the faulty CrowdStrike update.
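For anyone curious about the mechanics, here's a minimal toy sketch of why a null-filled content file is so dangerous. This is purely illustrative and assumed, not CrowdStrike's actual code (their sensor is a Windows kernel driver, not Python): a loader that parses a content ("channel") file without validating it first will choke on a file of all zeros, and in kernel mode, choking means a bugcheck (BSOD) on every boot attempt, hence the boot loop.

```python
# Hypothetical, simplified illustration of the failure mode. The file
# format, field layout, and function names here are invented.

def parse_channel_file(data: bytes) -> int:
    """Pretend parser: reads a little-endian record count from the header."""
    # A defensive loader rejects an empty or null-filled file up front.
    if not data or not any(data):
        raise ValueError("channel file is empty or all zeros")
    return int.from_bytes(data[:4], "little")

# A well-formed file parses fine.
good = bytes([7, 0, 0, 0]) + b"payload"
print(parse_channel_file(good))  # prints 7

# A file of nothing but zeros is caught by the check. Skip that check,
# and the record count silently comes back 0 while downstream code
# walks records that do not exist: in user mode an exception, in
# kernel mode a crash at boot.
try:
    parse_channel_file(bytes(291))
except ValueError as err:
    print("rejected:", err)
```

The point of the sketch is that validation has to happen before the data is trusted; once a driver in the boot path trusts a bad file, no amount of rebooting fixes it, which is why the remediation had to be manual.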
TLDR: This is entirely on CrowdStrike. They are a third-party vendor. Nothing to do with Microsoft for a change.
Backend systems are a different story. Rumor on the block is that Crew360 is the cause of our pain. I don't know much about the software or its backend database, but apparently this is our Achilles' heel at the moment.
Last edited by Transit; 07-22-2024 at 05:57 AM. Reason: Spelling
#149
Gets Weekends Off
Joined APC: Jul 2022
Posts: 930
I think that's being a bit disingenuous. In order to "recover" we had to manually reboot thousands of computers (gate agent, load, crew tracking, scheduling, etc.) all over the world. The manual reboot process is lengthy; it requires IT folks to work one machine or server at a time.
That is why we have taken some time to get back on step.
Point fingers at the cause.
Now, can we improve comms between those out on the line and Mecca? Yes. This is something like my 7th major meltdown, and I see the same thing every time. We can do better at getting robust and timely comms to and from our personnel out on the line. RA even talked about it many years ago, but here we are.
The continued abysmal recovery lies squarely on Delta. We lack system redundancy, IT personnel, and OCC personnel.
None of that matters to management; they believe our customers will continue to pay a premium to fly on us as long as we wear hats and stand in the way saying goodbye during deplaning.