A joint statement from UniSuper CEO Peter Chun and Google Cloud CEO Thomas Kurian
8 May 2024
UniSuper and Google Cloud understand the disruption to services experienced by members has been extremely frustrating and disappointing. We extend our sincere apologies to all members.
While supporting UniSuper to bring its systems back online, Google Cloud has been conducting a root cause analysis.
Thomas Kurian has confirmed that the disruption arose from an unprecedented sequence of events, where an inadvertent misconfiguration during provisioning of UniSuper's Private Cloud services ultimately resulted in the deletion of UniSuper's Private Cloud subscription.
This is described as an isolated, "one-of-a-kind occurrence" that has never before occurred with any Google Cloud client globally. This should not have happened. Google Cloud has identified the sequence of events and taken measures to ensure it does not happen again.
Why did the outage last so long?
UniSuper had duplication across two geographies as protection against outages and data loss. However, the deletion of the Private Cloud subscription triggered deletion across both geographies.
Restoring the Private Cloud required significant coordination and effort between UniSuper and Google Cloud, including recovery of hundreds of virtual machines, databases, and applications.
dangoodmanUT today at 1:22 AM
It has been 0 days since GCP has taken down a startup (again).
You see this at least once a year. Never heard of this from AWS or Azure.
In all seriousness, this is why we don't use them. They have the most ergonomic cloud of the big three, then absolutely murder it by having this kind of reputation.
binarycleric today at 2:00 AM
How the heck do these things happen, especially with companies with huge monthly spend? At my last job we had some suspicious workloads running on AWS and our TAM reached out to us before taking any action. Who wants to bet this was some AI automation gone wrong and because GCP seems to be allergic to actually contacting a human to get a response, this just sits in some support queue that outsourced workers look at after a few hours just to give a canned response?
BitWiseVibe today at 2:07 AM
As someone who runs some public APIs, the amount of spam from Railway IPs is insane. They have horrible abuse prevention. Hopefully this encourages them to improve their operations.
chatmasta today at 2:57 AM
I thought Railway was building their own data centers? [0]
> The fact of the matter is, you simply cannot build a cloud on someone else's cloud.
All in on cloud so we don't need to worry about backups. Now your subscription is the single point of failure.
bearjaws today at 2:02 AM
I will never leverage GCP in an enterprise setting, it's honestly amazing how hard they fumble the bag. It will be interesting to see when GCP support started working with them; from the updates, there was an hour and change between when they identified the issue and when GCP support was confirmed.
In the cloud space it seems like AWS does nothing and wins.
brokenodo today at 2:20 AM
Well, as a 2 week tenured and very happy Railway customer until now, I am now a Render customer. Somehow DNS cut over within 1 min(!) and live after about 30 minutes of work. Not bad!
thrownthatway today at 4:20 AM
Huh.
Railway dot com
Has nothing to do with railways.
I wish software people would get their own words.
UrbanNorminal today at 2:05 AM
Is google allergic to humans or something? Cannot they just send an email or call the company before taking a wrecking ball to the entire company's infra? Are they stupid?
codegeek today at 1:33 AM
This is bad. Even their own website is down at railway.com. Looks like total dependency on google cloud. Surprising for a company of their scale with all this VC money.
padolsey today at 1:59 AM
Does anyone know how this even happens inside the walls of google? Is it an automated process? How is such a (presumably) high revenue account just magically blocked without human intervention? I'm quite perplexed.
sammy2255 today at 2:53 AM
The 3-2-1 backup rule is pretty outdated in the world of cloud. You could have 3 complete copies of your data in different S3 buckets, but if they're all under the same account you've lost your blast radius protection
r_lee today at 1:39 AM
seriously, is it possible to trust GCP with critical data/services at this point if you're not a billion dollar company?
I'm exaggerating but someone said they got "auto banned"
what if that happens to a small account which hosts some really important data/services there?
tux today at 1:45 AM
At this point you can't trust Google anymore, it keeps breaking things. Imagine having Google AI do these things automatically. We will have an apocalypse in a day.
jefborges today at 1:49 AM
Railway is back, but I'm not sure if I can trust keeping my projects there, so I'm going to migrate to another company.
usernametaken29 today at 2:46 AM
I didn't know Railway, so with this misleading headline I thought a Google Cloud data centre was being built in the way of a railroad. That would have been a funny story to read.
bilalq today at 3:57 AM
Building a startup on GCP (or even Google Workspace) is an existential risk.
zelon88 today at 3:10 AM
Wild to me that any tech sector business would want to rent an operating environment to park their entire infrastructure into. This is the equivalent to traveling shoe salesmen setting up a tent in the parking lot of a strip mall.
koolhead17 today at 4:01 AM
Let's blame some rogue AI agent at GCP causing this.
hnburnsy today at 2:50 AM
From their founder on X...
"Absolutely. The Railway network is a mesh ring between AWS, GCP, and Metal
So:
- High availability interconnects
- High availability path routing between clouds
- Database itself is high availability
However, Google's VPC itself is not. So we will add a shard to Metal and AWS"
orliesaurus today at 1:51 AM
I wonder if someone has exploited a weird Google-safety automated process to report something on Railway which caused Google to block the whole thing.
There's a lot of, what seems to me, unfounded blame being directed at Google for this. Isn't railway the company that just blamed Anthropic for deleting their prod database?
jujube3 today at 3:00 AM
If you buy a cloud-on-a-cloud, you're a clown-on-a-clown.
shevy-java today at 4:00 AM
Do not become dependent on Google. Ever.
redanddead today at 2:16 AM
one of the many reasons companies are cloud agnostic and don't want to get locked in