Bomb Squad

Bomb Squad is a Kubernetes sidecar for Prometheus that automatically detects and silences high-cardinality series to keep the server operationally stable. I find it a crucial tool for preventing cardinality explosions.

Visit github.com →

Questions & Answers

What is Bomb Squad?
Bomb Squad is an alpha project implemented as a sidecar for Kubernetes-deployed Prometheus instances. Its primary function is to detect and suppress cardinality explosions by inserting dynamic silencing rules into Prometheus configurations.
Who is Bomb Squad for?
Bomb Squad is for SREs and operators managing Prometheus in Kubernetes environments who need to ensure operational stability. It helps keep Prometheus instances online and usable when faced with rapid cardinality inflation caused by misconfigured services or applications.
How does Bomb Squad differ from other Prometheus management tools?
Bomb Squad runs as a Kubernetes sidecar that automatically detects cardinality explosions and dynamically inserts metric_relabel_configs into all scrape configurations. This automated, real-time suppression inside the Prometheus configuration itself sets it apart from manual or static remediation methods.
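As a rough illustration of what inserting a suppression rule into every scrape job could look like, here is a minimal Python sketch that works on a Prometheus configuration already parsed into a dictionary. The helper names (build_silence_rule, inject_rule), the "silenced" placeholder value, and the choice to overwrite the label value rather than drop the series are assumptions made for this example, not Bomb Squad's actual implementation. Only the relabel fields themselves (source_labels, regex, target_label, replacement, action) are standard Prometheus configuration keys.

```python
import copy


def build_silence_rule(metric_name, label_name, placeholder="silenced"):
    """Build a relabel rule that overwrites one label on one metric.

    Hypothetical helper: the placeholder value is an assumption for this sketch.
    """
    return {
        "source_labels": ["__name__"],
        "regex": metric_name,          # Prometheus anchors relabel regexes
        "target_label": label_name,
        "replacement": placeholder,
        "action": "replace",
    }


def inject_rule(config, rule):
    """Return a copy of the config with the rule added to every scrape job."""
    updated = copy.deepcopy(config)
    for job in updated.get("scrape_configs", []):
        rules = job.setdefault("metric_relabel_configs", [])
        if rule not in rules:          # keep the operation idempotent
            rules.append(rule)
    return updated


# Example configuration, as it might look after parsing prometheus.yml.
config = {
    "scrape_configs": [
        {"job_name": "api", "static_configs": [{"targets": ["api:8080"]}]},
        {
            "job_name": "worker",
            "metric_relabel_configs": [
                {"source_labels": ["__name__"], "regex": "go_gc_.*", "action": "drop"}
            ],
        },
    ],
}

rule = build_silence_rule("http_requests_total", "request_id")
patched = inject_rule(config, rule)

for job in patched["scrape_configs"]:
    print(job["job_name"], len(job["metric_relabel_configs"]))
```

Collapsing the label to a single placeholder value keeps the metric visible while folding the exploding series into one; dropping the series outright with a "drop" action would be an alternative design.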
When should I use Bomb Squad?
You should use Bomb Squad when your Prometheus instances are vulnerable to high-cardinality issues, such as those caused by runaway autoscaling or by bad code deployments that inject unique identifiers into metric labels. It is designed to prevent these 'explosions' from destabilizing your monitoring system.
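To make the failure mode concrete, the sketch below counts how many distinct time series a single metric produces with and without a unique identifier in its labels. In Prometheus, each unique combination of metric name and label values is its own series. The metric name, the label names, and the series_count helper are hypothetical and exist only for this illustration.

```python
def series_count(samples):
    """Count distinct series: one per unique (metric name, label set) pair."""
    return len({(name, tuple(sorted(labels.items()))) for name, labels in samples})


# Simulate 1000 requests hitting an instrumented service.
requests = [
    {
        "method": ("GET", "POST")[i % 2],
        "status": ("200", "500")[(i // 2) % 2],
        "request_id": f"req-{i}",
    }
    for i in range(1000)
]

# Bounded labels only: cardinality is capped by methods x statuses.
safe = series_count(
    ("http_requests_total", {"method": r["method"], "status": r["status"]})
    for r in requests
)

# A unique identifier in a label: every request creates a new series.
unsafe = series_count(("http_requests_total", r) for r in requests)

print(safe)    # 4
print(unsafe)  # 1000
```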
What is a key technical detail about Bomb Squad's operation?
Bomb Squad monitors custom recording rules to detect cardinality explosions, specifically exploding label values. Once detected, it generates and injects dynamic metric_relabel_configs into all Prometheus scrape jobs and triggers a hot-reload of the Prometheus configuration to apply these suppression rules.
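The detection step can be approximated with a simple distinct-value count per metric and label. The sketch below is a stand-in for illustration only: it scans label sets directly instead of reading recording rules as Bomb Squad does, and the find_exploding_labels function and its threshold parameter are assumptions, not part of the project. It also leaves out the configuration reload that follows detection.

```python
from collections import defaultdict


def find_exploding_labels(series, threshold):
    """Return (metric, label) pairs with more distinct values than threshold.

    `series` is an iterable of label dictionaries that include "__name__".
    """
    values = defaultdict(set)
    for labels in series:
        name = labels.get("__name__", "")
        for label, value in labels.items():
            if label != "__name__":
                values[(name, label)].add(value)
    return sorted(key for key, seen in values.items() if len(seen) > threshold)


# Fifty series that differ only by a per-request identifier.
series = [
    {
        "__name__": "http_requests_total",
        "method": ("GET", "POST")[i % 2],
        "request_id": f"req-{i}",
    }
    for i in range(50)
]

exploding = find_exploding_labels(series, threshold=10)
print(exploding)  # [('http_requests_total', 'request_id')]
```

Each pair returned here would be a candidate for a suppression rule targeting that metric and label.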