How the Google Cloud Outage Crashed the Internet

ByteByteGo
Jun 17 


Disclaimer: The details in this post have been derived from the details shared online by the Google Engineering Team. All credit for the technical details goes to the Google Engineering Team. The links to the original articles and sources are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.
On June 12, 2025, a significant portion of the internet experienced a sudden outage. What started as intermittent failures on Gmail and Spotify soon escalated into a global infrastructure meltdown. For millions of users and hundreds of companies, critical apps simply stopped working.
At the heart of it all was a widespread outage in Google Cloud Platform (GCP), which serves as the backend for a vast ecosystem of digital services. The disruption began at 10:51 AM PDT, and within minutes, API requests across dozens of regions were failing with 503 errors. Over a few hours, the ripple effects became undeniable.
Among consumer platforms, the outage took down:
Spotify (approximately 46,000 users reported problems on Downdetector).
Snapchat, Discord, Twitch, and Fitbit: users were unable to stream, chat, or sync their data.
Google Workspace apps (including Gmail, Calendar, Meet, and Docs). These apps power daily workflows for hundreds of millions of users.
The failure was just as acute for enterprise and developer tools:
GitLab, Replit, Shopify, Elastic, LangChain, and other platforms relying on GCP services saw degraded performance, timeouts, or complete shutdowns.
Thousands of CI/CD pipelines, model serving endpoints, and API backends stalled or failed outright.
Vertex AI, BigQuery, Cloud Functions, and Google Cloud Storage were all affected, halting data processing and AI operations.
In total, more than 50 distinct Google Cloud services across over 40 regions worldwide were affected. 
Perhaps the most significant impact was on Cloudflare, a company often viewed as a pillar of internet reliability. While its core content delivery network (CDN) remained operational, Cloudflare's authentication systems, which rely on Google Cloud, failed. This led to issues with session validation, login workflows, and API protections for many of its customers.
The financial markets felt the outage too: the stock of Alphabet, Google’s parent, fell by nearly 1 percent. The incident raises an obvious question: how did a platform built for global scale suffer such a cascading collapse?
Let’s understand more about it.
Inside the Outage
To understand how such a massive outage occurred, we need to look under the hood at a critical system deep inside Google Cloud’s infrastructure. It’s called Service Control.
The Key System: Service Control
Service Control is one of the foundational components of Google Cloud's API infrastructure. 
Every time a user, application, or service makes an API request to a Google Cloud product, Service Control sits between the client and the backend. It is responsible for several tasks such as:
Verifying if the API request is authorized.
Enforcing quota limits (how many requests can be made).
Checking various policy r