k8s.af

This site compiles public Kubernetes failure stories, making it a useful resource for learning from other teams' operational incidents. It offers an unvarnished look at common pitfalls and complex issues in Kubernetes environments.

Visit k8s.af →

Questions & Answers

What is k8s.af?
k8s.af is a curated public list of links to blog posts, postmortems, and presentations detailing various failures and incidents encountered when operating Kubernetes clusters. It focuses on real-world problems and their resolutions.
Who is the target audience for k8s.af?
This resource is primarily for SREs, DevOps engineers, platform engineers, and anyone managing or developing applications on Kubernetes. It helps professionals anticipate potential issues and improve system resilience.
How does k8s.af differ from official Kubernetes documentation or general incident reports?
Official documentation describes how Kubernetes is meant to work. k8s.af instead links to accounts of specific, real-world operational failures and their detailed postmortems from various companies. It collects practical, "in the trenches" insights that theoretical guides often leave out.
When should I refer to k8s.af?
Refer to k8s.af when troubleshooting persistent or unusual Kubernetes issues, during system design to learn from past mistakes, or as general reading to help avoid common operational mistakes.
What kind of technical details can be found in the failure stories listed on k8s.af?
The stories often name the specific Kubernetes components involved, such as CNI plugins (e.g., Calico, AWS CNI), CPU limits, conntrack, kubelet, DNS, or etcd, and describe the root cause and the impact, such as an outage or high latency.