Railway Blocked by Google Cloud

309 points - today at 12:23 AM


Comments

valgaze today at 3:32 AM
May 2024 UniSuper incident: https://cloud.google.com/blog/products/infrastructure/detail...

https://www.unisuper.com.au/about-us/media-centre/2024/a-joi...

A joint statement from UniSuper CEO Peter Chun and Google Cloud CEO Thomas Kurian

8 May 2024

UniSuper and Google Cloud understand the disruption to services experienced by members has been extremely frustrating and disappointing. We extend our sincere apologies to all members.

While supporting UniSuper to bring its systems back online, Google Cloud has been conducting a root cause analysis.

Thomas Kurian has confirmed that the disruption arose from an unprecedented sequence of events, where an inadvertent misconfiguration during provisioning of UniSuper’s Private Cloud services ultimately resulted in the deletion of UniSuper’s Private Cloud subscription.

This is described as an isolated, “one-of-a-kind occurrence” that has never before occurred with any Google Cloud client globally. This should not have happened. Google Cloud has identified the sequence of events and taken measures to ensure it does not happen again.

Why did the outage last so long?

UniSuper had duplication across two geographies as protection against outages and data loss. However, the deletion of the Private Cloud subscription triggered deletion across both geographies.

Restoring the Private Cloud required significant coordination and effort between UniSuper and Google Cloud, including recovery of hundreds of virtual machines, databases, and applications.

dangoodmanUT today at 1:22 AM
It has been 0 days since GCP has taken down a startup (again).

You see this at least once a year. Never heard of this from AWS or Azure.

In all seriousness, this is why we don't use them. They have the most ergonomic cloud of the big three, then absolutely murder it by having this kind of reputation.

binarycleric today at 2:00 AM
How the heck do these things happen, especially with companies with huge monthly spend? At my last job we had some suspicious workloads running on AWS and our TAM reached out to us before taking any action. Who wants to bet this was some AI automation gone wrong and because GCP seems to be allergic to actually contacting a human to get a response, this just sits in some support queue that outsourced workers look at after a few hours just to give a canned response?
BitWiseVibe today at 2:07 AM
As someone who runs some public APIs, the amount of spam from Railway IPs is insane. They have horrible abuse prevention. Hopefully this encourages them to improve their operations.
chatmasta today at 2:57 AM
I thought Railway was building their own data centers? [0]

> The fact of the matter is, you simply cannot build a cloud on someone else’s cloud.

Indeed…

[0] https://blog.railway.com/p/launch-week-02-welcome

mjy78 today at 4:16 AM
All in on cloud so we don’t need to worry about backups. Now your subscription is the single point of failure.
bearjaws today at 2:02 AM
I will never leverage GCP in an enterprise setting, it's honestly amazing how hard they fumble the bag. Will be interesting to see when GCP support started working with them; going by the updates, there was an hour and change between when they identified the issue and when GCP support was confirmed.

In the cloud space it seems like AWS does nothing and wins.

brokenodo today at 2:20 AM
Well, as a 2 week tenured and very happy Railway customer until now, I am now a Render customer. Somehow DNS cut over within 1 min(!) and live after about 30 minutes of work. Not bad!
thrownthatway today at 4:20 AM
Huh.

Railway dot com

Has nothing to do with railways.

I wish software people would get their own words.

UrbanNorminal today at 2:05 AM
Is Google allergic to humans or something? Can't they just send an email or call the company before taking a wrecking ball to the entire company's infra? Are they stupid?
codegeek today at 1:33 AM
This is bad. Even their own website is down at railway.com. Looks like total dependency on google cloud. Surprising for a company of their scale with all this VC money.
padolsey today at 1:59 AM
Does anyone know how this even happens inside the walls of google? Is it an automated process? How is such a (presumably) high revenue account just magically blocked without human intervention? I'm quite perplexed.
sammy2255 today at 2:53 AM
The 3-2-1 backup rule is pretty outdated in the world of cloud. You could have 3 complete copies of your data in different S3 buckets, but if they're all under the same account you've lost your blast radius protection
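
To make the blast-radius point concrete, here is a minimal Python sketch that checks whether a set of backup copies would survive losing any one cloud account. Everything in it is hypothetical and for illustration only: the `BackupCopy` fields, the function names, and the account and bucket labels are assumptions, not any provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data. All field values here are hypothetical labels."""
    provider: str   # e.g. "aws", "gcp", "onprem"
    account: str    # the account or subscription that owns the copy
    location: str   # bucket, region, or device name


def copies_surviving_account_loss(copies, provider, account):
    """Return the copies still reachable if (provider, account) is suspended or deleted."""
    return [c for c in copies if (c.provider, c.account) != (provider, account)]


def single_points_of_failure(copies):
    """Return every (provider, account) whose loss would leave zero copies."""
    accounts = {(c.provider, c.account) for c in copies}
    return sorted(
        a for a in accounts
        if not copies_surviving_account_loss(copies, *a)
    )


# Three copies, but all owned by the same account: one suspension takes out everything.
same_account = [
    BackupCopy("aws", "acct-1", "bucket-a"),
    BackupCopy("aws", "acct-1", "bucket-b"),
    BackupCopy("aws", "acct-1", "bucket-c"),
]

# Three copies spread across independent accounts and providers.
spread_out = [
    BackupCopy("aws", "acct-1", "bucket-a"),
    BackupCopy("aws", "acct-2", "bucket-b"),
    BackupCopy("onprem", "rack-1", "tape-01"),
]

print(single_points_of_failure(same_account))
print(single_points_of_failure(spread_out))
```

The first list reports the shared account as a single point of failure, while the second reports none. The check only models account ownership; it says nothing about shared billing, shared credentials, or an organization-level parent that could take down several accounts at once.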
r_lee today at 1:39 AM
seriously, is it possible to trust GCP with critical data/services at this point if you're not a billion dollar company?

I'm exaggerating but someone said they got "auto banned"

what if that happens to a small account which hosts some really important data/services there?

tux today at 1:45 AM
At this point you can’t trust Google anymore, it keeps breaking things. Imagine having Google AI do these things automatically. We'll have an apocalypse in a day.
jefborges today at 1:49 AM
Railway is back, but I’m not sure if I can trust keeping my projects there, so I’m going to migrate to another company.
usernametaken29 today at 2:46 AM
I didn’t know Railway, so with this misleading headline I thought a Google Cloud data centre was being built in the way of a railroad. That’d have been a funny story to read..
bilalq today at 3:57 AM
Building a startup on GCP (or even Google Workspace) is an existential risk.
zelon88 today at 3:10 AM
Wild to me that any tech sector business would want to rent an operating environment to park their entire infrastructure into. This is the equivalent to traveling shoe salesmen setting up a tent in the parking lot of a strip mall.
koolhead17 today at 4:01 AM
Let's blame some rogue AI agent at GCP causing this.
hnburnsy today at 2:50 AM
From their founder on X...

"Absolutely. The Railway network is a mesh ring between AWS, GCP, and Metal

So: - High availability interconnects - High availability path routing between clouds - Database itself is high availability

However, Google's VPC itself is not. So we will add a shard to Metal and AWS"

orliesaurus today at 1:51 AM
I wonder if someone has exploited a weird Google-safety automated process to report something on Railway which caused Google to block the whole thing.
eezing today at 4:02 AM
“Deletion of private cloud subscription…”

Who deleted it?

gnabgib today at 12:24 AM
Dupe - join the discussion started an hour ago instead of query string work (12 points, 4 comments) https://news.ycombinator.com/item?id=48200827
parineum today at 2:19 AM
There's a lot of, what seems to me, unfounded blame being directed at Google for this. Isn't railway the company that just blamed Anthropic for deleting their prod database?
jujube3 today at 3:00 AM
If you buy a cloud-on-a-cloud, you're a clown-on-a-clown.
shevy-java today at 4:00 AM
Do not become dependent on Google. Ever.
redanddead today at 2:16 AM
one of the many reasons companies are cloud agnostic and don't want to get locked in
isninkhamiss today at 1:35 AM
github got way more noise for less
ChrisArchitect today at 2:00 AM
fnord77 today at 3:21 AM
wish I knew what "railway" is
rvz today at 1:31 AM
Let me guess… Googler running AI agent in production that blocked this startup’s account.
rekabis today at 12:55 AM
TL;DR: putting all your eggs into one basket is bad, man.