IT outage
#141
On Reserve
Joined APC: Jul 2024
Posts: 12
bro, you’re an idiot. The chief pilot office in ATL made an announcement over the loudspeaker that if you are willing to fly, come into the office. The coverage ladder is out the window with this. GA is about to call in the National Guard to airlift people out of ATL 😂… that’s how badly Delta is doing.
But to each their own. I’m done with you all on this conversation. See what you want to see.
#142
Gets Weekends Off
Joined APC: Feb 2007
Position: Big ones
Posts: 774
#143
Gets Weekends Off
Joined APC: Jul 2010
Posts: 3,371
That's not the solution they are looking at. A simple rules change permitting commuters to NOT have to contact CS or PA to book a seat on an at-capacity flight is an easy fix.
DOT has declared this a "controllable" incident, which means airlines are on the hook.
I fail to see how they concluded this when a buggy software update from CrowdStrike was the causal factor. The updates are pushed automatically, with no testing done prior to deployment. That's a huge foul. We've got thousands of affected computers and servers that have required a lengthy manual reboot process to un-f#@$& this mess that they (CrowdStrike) are responsible for.
everyone else recovered already.
The PS solution works, but the company doesn't know where the crews are. More bases = more Gs can go out where people leave.
#144
Gets Weekends Off
Joined APC: Jan 2023
Posts: 1,520
CrowdStrike is a contractor to Microsoft. CrowdStrike routinely pushes updates to their piece of the Microsoft puzzle. The one they pushed in the early morning (here in the US) of July 19th is the culprit. It hit Europe first. We knew about it, but by the time we could react the updates were already deployed to the Microsoft systems we employ.
So yes, the buggy update pushed automatically by CrowdStrike is what's responsible for bricking our computers and servers; gate agent computers, crew tracking and scheduling tools, etc. were all affected.
To your PS statement: not really. The common scenario lately has been that a crew books PS, then that flight cancels. Now they try to book another PS flight at close range, but they can't because that flight is at capacity and travel net will not allow it. So they have to reach out to CS or PA to make the PS listing for them. That takes hours, and by then the flight they were trying to PS on has departed. Gate agents can't list you for PS unless they get the OK from Mecca.
So, the simple solution is to remove the rules on booking PS on at-capacity flights when IROPs are invoked. Or at least give gate agents the ability to list you at the gate.
#145
Gets Weekends Off
Joined APC: Jul 2022
Posts: 930
CrowdStrike is the "third party vendor" referred to in prior news releases from the company. It wasn't until yesterday that we named them in our messaging.
CrowdStrike is a contractor to Microsoft. CrowdStrike routinely pushes updates to their piece of the Microsoft puzzle. The one they pushed on the morning (here in the US) of July 19th, at around midnight, is the culprit. It hit Europe first. We knew about it, but by the time we could react the updates were already deployed to the Microsoft systems we employ.
So yes, the buggy update pushed automatically by CrowdStrike is what's responsible for bricking our computers and servers; gate agent computers, crew tracking and scheduling tools, etc. were all affected.
Our competitors are blowing us away with their recovery efforts.
#146
Gets Weekends Off
Joined APC: Jan 2023
Posts: 1,520
It’s fair to blame CrowdStrike for Friday’s initial disruption. However, Saturday, Sunday, and now Monday fall squarely on Delta. Our infrastructure and recovery capability are severely lacking, as this management team has always been more focused on optics than substance.
Our competitors are blowing us away with their recovery efforts.
#147
Line Holder
Joined APC: Sep 2022
Posts: 95
I think that's being a bit disingenuous. In order to "recover" we had to manually reboot thousands of computers (gate agent, load, crew tracking, scheduling, etc.) all over the world. The manual reboot process is lengthy; it requires IT folks to work one machine or server at a time.
That is why we have taken some time to get back on step.
Point fingers at the cause.
Now, can we improve comms between those out on the line and Mecca? Yes. This is something like my 7th major meltdown, and I see the same thing every time. We can do better at getting robust and timely comms to and from our personnel out on the line. RA even talked about it many years ago, but here we are.
#148
Line Holder
Joined APC: Feb 2020
Posts: 84
CrowdStrike is the "third party vendor" referred to in prior news releases from the company. It wasn't until yesterday that we named them in our messaging.
CrowdStrike is a contractor to Microsoft. CrowdStrike routinely pushes updates to their piece of the Microsoft puzzle. The one they pushed in the early morning (here in the US) of July 19th is the culprit. It hit Europe first. We knew about it, but by the time we could react the updates were already deployed to the Microsoft systems we employ.
So yes, the buggy update pushed automatically by CrowdStrike is what's responsible for bricking our computers and servers; gate agent computers, crew tracking and scheduling tools, etc. were all affected.
The preliminary information that has come out so far is that the file CrowdStrike pushed contained nothing but zeros. Because this file lives at such a low level in the operating system, it caused Windows to boot-loop. How this file came to contain no information, and why it was not caught in QA before being pushed, is yet to be determined.
The reason this is so devastating is that removing the faulty file requires physical access to each machine if the machine does not have out-of-band management (Intel vPro/IPMI come to mind, though those have vulnerability issues of their own). BitLocker (a Microsoft device-encryption feature for Windows) comes into play here: it prevents someone from removing the drive and placing it into another computer, or booting from a USB drive and manipulating the underlying Windows OS (removing passwords, etc.). BitLocker is actually doing what it's designed to do in this case and isn't a culprit, but the BitLocker recovery key is needed to get to a command prompt in recovery mode in order to remove the faulty CrowdStrike update.
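For anyone curious about the mechanics, here's a minimal toy sketch of why a null-filled content file is so dangerous. This is purely illustrative and assumed, not CrowdStrike's actual code (their sensor is a Windows kernel driver, not Python): a loader that parses a content ("channel") file without validating it first will choke on a file of all zeros, and in kernel mode, choking means a bugcheck (BSOD) on every boot attempt, hence the boot loop.

```python
# Hypothetical, simplified illustration of the failure mode. The file
# format, field layout, and function names here are invented.

def parse_channel_file(data: bytes) -> int:
    """Pretend parser: reads a little-endian record count from the header."""
    # A defensive loader rejects an empty or null-filled file up front.
    if not data or not any(data):
        raise ValueError("channel file is empty or all zeros")
    return int.from_bytes(data[:4], "little")

# A well-formed file parses fine.
good = bytes([7, 0, 0, 0]) + b"payload"
print(parse_channel_file(good))  # prints 7

# A file of nothing but zeros is caught by the check. Skip that check,
# and the record count silently comes back 0 while downstream code
# walks records that do not exist: in user mode an exception, in
# kernel mode a crash at boot.
try:
    parse_channel_file(bytes(291))
except ValueError as err:
    print("rejected:", err)
```

The point of the sketch is that validation has to happen before the data is trusted; once a driver in the boot path trusts a bad file, no amount of rebooting fixes it, which is why the remediation had to be manual.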
TLDR: This is entirely on CrowdStrike. They are a third-party vendor. Nothing to do with Microsoft for a change.
Backend systems are a different story. Rumor on the block is that Crew360 is the cause of our pain. I don't know much about the software or its backend database, but apparently this is our Achilles' heel at the moment.
Last edited by Transit; 07-22-2024 at 05:57 AM. Reason: Spelling
#149
Gets Weekends Off
Joined APC: Jul 2022
Posts: 930
I think that's being a bit disingenuous. In order to "recover" we had to manually reboot thousands of computers (gate agent, load, crew tracking, scheduling, etc.) all over the world. The manual reboot process is lengthy; it requires IT folks to work one machine or server at a time.
That is why we have taken some time to get back on step.
Point fingers at the cause.
Now, can we improve comms between those out on the line and Mecca? Yes. This is something like my 7th major meltdown, and I see the same thing every time. We can do better at getting robust and timely comms to and from our personnel out on the line. RA even talked about it many years ago, but here we are.
The continued abysmal recovery lies squarely on Delta. We lack system redundancy, IT personnel, and OCC personnel.
None of that matters to management; they believe our customers will continue to pay a premium to fly on us as long as we wear hats and stand in the way saying goodbye during deplaning.