Disclaimer: The details in this post have been derived from information shared online by the Google Engineering Team. All credit for the technical details goes to the Google Engineering Team. Links to the original articles and sources are in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

On June 12, 2025, a significant portion of the internet experienced a sudden outage. What started as intermittent failures on Gmail and Spotify soon escalated into a global infrastructure meltdown. For millions of users and hundreds of companies, critical apps simply stopped working.

At the heart of it all was a widespread outage in Google Cloud Platform (GCP), which serves as the backend for a vast ecosystem of digital services. The disruption began at 10:51 AM PDT, and within minutes, API requests across dozens of regions were failing with HTTP 503 (Service Unavailable) errors. Over the next few hours, the ripple effects became undeniable.

Among consumer platforms, the outage took down:
The failure was just as acute for enterprise and developer tools:
In total, more than 50 distinct Google Cloud services across over 40 regions worldwide were affected.

Perhaps the most significant impact came from Cloudflare, a company often viewed as a pillar of internet reliability. While its core content delivery network (CDN) remained operational, Cloudflare's authentication systems, which relied on Google Cloud, failed. This led to issues with session validation, login workflows, and API protections for many of its customers.

The financial markets also felt the impact of this outage. Alphabet (Google’s parent company) saw its stock fall by nearly 1 percent.

This raises an obvious question: how did a platform built for global scale suffer such a cascading collapse? Let’s take a closer look.

Inside the Outage

To understand how such a massive outage occurred, we need to look under the hood at a critical system deep inside Google Cloud’s infrastructure: Service Control.

The Key System: Service Control

Service Control is one of the foundational components of Google Cloud's API infrastructure. Whenever a user, application, or service makes an API request to a Google Cloud product, that request passes through Service Control, which sits between the client and the backend. It is responsible for several tasks, such as:
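To make Service Control's position in the request path concrete, here is a minimal Python sketch of a gatekeeper layer that every request must pass before reaching a backend. It assumes two typical gateway duties, authorization and quota enforcement, purely for illustration. `ControlGate`, `Policy`, and `load_policies` are hypothetical names, and this is not Google's implementation. The sketch also assumes the gate "fails closed": if the check layer itself breaks, it returns a 503 rather than forwarding the request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple


@dataclass
class Policy:
    """Hypothetical per-caller policy: which services it may call and its remaining quota."""
    allowed_services: Set[str]
    remaining_quota: int


@dataclass
class ControlGate:
    """Toy stand-in for a Service Control-style layer placed in front of every backend."""
    load_policies: Callable[[], Dict[str, Policy]]  # hypothetical policy/quota store
    backend: Callable[[str], str]                   # the actual product API

    def handle(self, caller: str, service: str) -> Tuple[int, str]:
        try:
            policies = self.load_policies()
            policy = policies.get(caller)
            # Authorization check: unknown callers or disallowed services are rejected.
            if policy is None or service not in policy.allowed_services:
                return 403, "permission denied"
            # Quota check: reject once the caller's budget is spent.
            if policy.remaining_quota <= 0:
                return 429, "quota exhausted"
            policy.remaining_quota -= 1
        except Exception:
            # Fail closed (an assumption of this sketch): if the control layer
            # itself breaks, no request reaches the backend.
            return 503, "service unavailable"
        return 200, self.backend(service)


store = {"app-1": Policy({"storage"}, remaining_quota=1)}
gate = ControlGate(lambda: store, lambda svc: f"{svc}: ok")

print(gate.handle("app-1", "storage"))   # (200, 'storage: ok')
print(gate.handle("app-1", "storage"))   # (429, 'quota exhausted')
print(gate.handle("app-1", "bigquery"))  # (403, 'permission denied')


def broken_store() -> Dict[str, Policy]:
    # Simulate the control layer failing while reading its policy data.
    raise KeyError("malformed policy record")


broken = ControlGate(broken_store, lambda svc: f"{svc}: ok")
print(broken.handle("app-1", "storage"))  # (503, 'service unavailable')
```

Because every request goes through the gate, a fault inside it (simulated here by `broken_store`) becomes a 503 for every API behind it, even when the backends themselves are healthy. Failing closed is a design choice made for this sketch. A system could instead fail open for some checks, trading strict enforcement for availability.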