{"id":1293,"date":"2026-02-09T15:48:49","date_gmt":"2026-02-09T14:48:49","guid":{"rendered":"https:\/\/simon-frey.com\/blog\/?p=1293"},"modified":"2026-02-13T09:39:36","modified_gmt":"2026-02-13T08:39:36","slug":"vibeops-kubernetes","status":"publish","type":"post","link":"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/","title":{"rendered":"VibeOps: A Secure read-only setup for AI-Assisted Kubernetes (k8s) Debugging"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">There is a lot of noise right now about letting AI &#8220;fix&#8221; your infrastructure (be it via AWS CLI commands or, in the case of this article, Kubernetes). You paste an error, the AI suggests a <code>kubectl apply<\/code>, and you hope it knows what it&#8217;s doing.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Do NOT work that way.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When production is acting up, you need to maintain a complete mental model of the system. If you let the AI be the driving force, you lose the overview.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Instead, I use Claude for what I (and others) call <strong>&#8220;VibeDebugging&#8221;<\/strong>: getting a second opinion on the state of the cluster while I do the actual surgery.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When a problem shows up or an alert fires, I kick off two parallel streams:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>The Vibe Check (AI):<\/strong> I trigger Claude, which has MCP access to the Kubernetes cluster, with a broad directive, e.g. <em>&#8220;Analyze the logs and events for the <code>payment-service<\/code> in the <code>prod<\/code> namespace. 
Look for correlation with the database.&#8221;<\/em><\/li>\n\n\n\n<li><strong>The Deep Dive (me):<\/strong> While Claude is processing, I start my own investigation, checking pods, tailing logs, looking at Grafana&#8230; the same flow I have always followed, including in the pre-AI days.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">This effectively forces me to stay in the loop. I don&#8217;t just blindly follow the AI; I verify its findings against what I am seeing myself. It\u2019s not an autopilot; it\u2019s a force multiplier that spots the weird log lines I might scroll past.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I would never feel comfortable doing this if the AI had write access or could read secrets. My rule for AI in Ops: <strong>Read-Only<\/strong>, No Secrets.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I don&#8217;t want an LLM to hallucinate a command that prints my DB passwords into a chat history, or to have the AI start changing stuff in the cluster and make the entire incident even worse by introducing a non-deterministic agent.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Set up the Kubernetes MCP Server in read-only mode<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Dedicated, read-only ServiceAccount<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We need a ServiceAccount that has permissions to see everything relevant (Deployments, Pods, Logs) but is strictly blind to sensitive data and can&#8217;t change anything.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Kubernetes Roles work with an explicit-allow strategy, hence we have to actively grant access to resources. There is a wildcard <code>*<\/code> option, but especially for the core group I advise against it, as we don&#8217;t want the agent to be able to read secrets.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Exactly what you give the agent\/MCP server access to is your decision. Here is a basic setup that grants access to most &#8220;standard&#8221; resources. 
(If you have operators and CRDs in your cluster, you might want to allow access to those as well.)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Note: We are adding this to a new namespace, <code>debug-access-ns<\/code>, so cleanup is easy: deleting that namespace removes the ServiceAccount and its token Secret. The <code>ClusterRole<\/code> and <code>ClusterRoleBinding<\/code> are cluster-scoped, so delete those separately to make sure there are no leftovers.<\/em><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">apiVersion: v1<br>kind: Namespace<br>metadata:<br>  name: debug-access-ns<\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Hint: To list all the resources in your cluster, run <code>kubectl api-resources<\/code>. The NAME column gives the &#8220;resource&#8221;, and everything before the <code>\/<\/code> in the APIVERSION column gives the &#8220;apiGroup&#8221; (a bare <code>v1<\/code> means the core group, <code>\"\"<\/code>).<\/em><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: rbac.authorization.k8s.io\/v1\nkind: ClusterRole\nmetadata:\n  name: ai-read-everything-except-secrets\nrules:\n  # 1. Core API Group (The critical one)\n  # explicitly listing resources to exclude 'secrets'\n  - apiGroups: &#91;\"\"]\n    resources: \n      - bindings\n      - componentstatuses\n      - configmaps\n      - endpoints\n      - events\n      - limitranges\n      - namespaces\n      - nodes\n      - persistentvolumeclaims\n      - persistentvolumes\n      - pods\n      - pods\/log\n      - pods\/status\n      - podtemplates\n      - replicationcontrollers\n      - resourcequotas\n      - serviceaccounts\n      - services\n    verbs: &#91;\"get\", \"list\", \"watch\"]\n\n  # 2. 
All other common API Groups\n  # It is safe to wildcard '*' resources here because Secrets live in the Core group above.\n  - apiGroups: \n      - \"apps\"\n      - \"autoscaling\"\n      - \"batch\"  # Jobs and CronJobs\n      - \"extensions\"\n      - \"policy\"\n      - \"networking.k8s.io\"\n      - \"rbac.authorization.k8s.io\"\n      - \"storage.k8s.io\"\n      - \"apiextensions.k8s.io\"\n      - \"admissionregistration.k8s.io\"\n      - \"metrics.k8s.io\"\n      - \"discovery.k8s.io\"\n    resources: &#91;\"*\"]\n    verbs: &#91;\"get\", \"list\", \"watch\"]\n\n  # 3. Non-Resource URLs (Optional but recommended for full visibility)\n  # Allows checking healthz, version, and metrics endpoints\n  - nonResourceURLs: &#91;\"*\"]\n    verbs: &#91;\"get\"]<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Next, create the ServiceAccount and bind it to the ClusterRole:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: v1<br>kind: ServiceAccount<br>metadata:<br>  name: ai-debugger<br>  namespace: debug-access-ns<br>---<br>apiVersion: rbac.authorization.k8s.io\/v1<br>kind: ClusterRoleBinding<br>metadata:<br>  name: ai-debugger-binding<br>roleRef:<br>  apiGroup: rbac.authorization.k8s.io<br>  kind: ClusterRole<br>  name: ai-read-everything-except-secrets<br>subjects:<br>  - kind: ServiceAccount<br>    name: ai-debugger<br>    namespace: debug-access-ns<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">In modern Kubernetes versions (1.24+), ServiceAccounts do not get long-lived tokens by default. 
You have to manually create a Secret to generate one.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: v1<br>kind: Secret<br>metadata:<br>  name: ai-debugger-token<br>  namespace: debug-access-ns<br>  annotations:<br>    kubernetes.io\/service-account.name: \"ai-debugger\"<br>type: kubernetes.io\/service-account-token<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Apply all these resources, then generate a dedicated <code>kubeconfig<\/code> for the ServiceAccount.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Creating the Kubeconfig<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Now we need to extract the data (Token, CA Cert, and Server URL) and mash it into a valid <code>kubeconfig<\/code> file. You can use this bash script to generate a <code>readonly-config.yaml<\/code> automatically:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># 1. Get the Token<br>TOKEN=$(kubectl get secret ai-debugger-token -n debug-access-ns -o jsonpath='{.data.token}' | base64 --decode)<br><br># 2. Get the Cluster CA Certificate (--minify restricts the output to the current context)<br>CA=$(kubectl config view --minify --raw -o jsonpath='{.clusters&#91;0].cluster.certificate-authority-data}')<br><br># 3. Get the API Server URL of the current context's cluster<br>SERVER=$(kubectl config view --minify -o jsonpath='{.clusters&#91;0].cluster.server}')<br><br># 4. 
Write the kubeconfig file<br>cat &lt;&lt;EOF &gt; ~\/.kube\/readonly-config.yaml<br>apiVersion: v1<br>kind: Config<br>clusters:<br>- cluster:<br>    certificate-authority-data: $CA<br>    server: $SERVER<br>  name: secure-cluster<br>contexts:<br>- context:<br>    cluster: secure-cluster<br>    user: ai-debugger<br>  name: secure-context<br>current-context: secure-context<br>users:<br>- name: ai-debugger<br>  user:<br>    token: $TOKEN<br>EOF<br><br>echo \"File '~\/.kube\/readonly-config.yaml' created successfully.\"<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: The Locked-Down MCP Server<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Now that we have the identity, we need to connect it. I use the <a href=\"https:\/\/github.com\/containers\/kubernetes-mcp-server\">Kubernetes MCP Server<\/a> to let Claude talk to the cluster.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">But I don&#8217;t just run it. I lock it down with flags to ensure it can&#8217;t switch contexts or try to write data.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here is the command I use to add it to my Claude config:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>claude mcp add kubernetes --scope user -- npx -y kubernetes-mcp-server@v0.0.57 \\<br>  --read-only \\<br>  --kubeconfig ~\/.kube\/readonly-config.yaml \\<br>  --cluster-provider kubeconfig \\<br>  --disable-multi-cluster<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">There are a few important things happening here:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Pin the Version (<code>@v0.0.57<\/code>):<\/strong> Never, ever use <code>@latest<\/code> for infrastructure tools. I don&#8217;t want an auto-update to change behavior or introduce a bug during a debugging session. 
Check the <a href=\"https:\/\/github.com\/containers\/kubernetes-mcp-server\/releases\/\">releases page<\/a> and pin it.<\/li>\n\n\n\n<li><strong><code>--kubeconfig<\/code><\/strong>: I point this explicitly to the restricted config we made above. Even if the MCP server code has a bug, the Kubernetes API will reject any write attempts.<\/li>\n\n\n\n<li><strong><code>--read-only<\/code><\/strong>: A second layer of defense. This tells the application layer to disable tool-calling for creating or updating resources.<\/li>\n\n\n\n<li><strong><code>--disable-multi-cluster<\/code><\/strong>: This keeps the AI focused. It ensures Claude works <em>only<\/em> on the specific cluster I pointed it to and can&#8217;t wander off into other contexts defined in my standard kubeconfig.<\/li>\n<\/ol>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n<p class=\"wp-block-paragraph\">This setup gives me the best of both worlds. I get the speed and correlation powers of AI, but I keep the situational awareness of a human engineer. I verify the infrastructure; Claude checks the vibes. And thanks to the setup, I know for a fact it can&#8217;t delete my database.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There is a lot of noise right now about letting AI &#8220;fix&#8221; your infrastructure (be it via aws cli commands, or in the case for this article: kubernetes). 
You paste an error, the AI suggests&hellip;<\/p>\n<p><a href=\"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> of VibeOps: A Secure read-only setup for AI-Assisted Kubernetes (k8s) Debugging<\/span><span aria-hidden=\"true\"> &rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[240,374,231],"tags":[],"class_list":["post-1293","post","type-post","status-publish","format-standard","hentry","category-ai","category-kubernetes-k8s","category-sre"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>VibeOps: A Secure read-only setup for AI-Assisted Kubernetes (k8s) Debugging - Blog by Simon Frey<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Simon Frey\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"VibeOps: A Secure read-only setup for AI-Assisted Kubernetes (k8s) Debugging - Blog by Simon Frey","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/","twitter_misc":{"Written by":"Simon Frey","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/#article","isPartOf":{"@id":"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/"},"author":{"name":"Simon Frey","@id":"https:\/\/simon-frey.com\/blog\/#\/schema\/person\/34753982b648415636ee7a079f3e19a3"},"headline":"VibeOps: A Secure read-only setup for AI-Assisted Kubernetes (k8s) Debugging","datePublished":"2026-02-09T14:48:49+00:00","dateModified":"2026-02-13T08:39:36+00:00","mainEntityOfPage":{"@id":"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/"},"wordCount":806,"publisher":{"@id":"https:\/\/simon-frey.com\/blog\/#\/schema\/person\/34753982b648415636ee7a079f3e19a3"},"articleSection":["AI","Kubernetes (k8s)","SRE"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/","url":"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/","name":"VibeOps: A Secure read-only setup for AI-Assisted Kubernetes (k8s) Debugging - Blog by Simon Frey","isPartOf":{"@id":"https:\/\/simon-frey.com\/blog\/#website"},"datePublished":"2026-02-09T14:48:49+00:00","dateModified":"2026-02-13T08:39:36+00:00","breadcrumb":{"@id":"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/simon-frey.com\/blog\/vibeops-kubernetes\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/simon-frey.com\/blog\/"},{"@type":"ListItem","position":2,"name":"VibeOps: A Secure read-only setup for AI-Assisted Kubernetes (k8s) Debugging"}]},{"@type":"WebSite","@id":"https:\/\/simon-frey.com\/blog\/#website","url":"https:\/\/simon-frey.com\/blog\/","name":"Blog by Simon 
Frey","description":"","publisher":{"@id":"https:\/\/simon-frey.com\/blog\/#\/schema\/person\/34753982b648415636ee7a079f3e19a3"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/simon-frey.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/simon-frey.com\/blog\/#\/schema\/person\/34753982b648415636ee7a079f3e19a3","name":"Simon Frey","logo":{"@id":"https:\/\/simon-frey.com\/blog\/#\/schema\/person\/image\/"},"sameAs":["https:\/\/simon-frey.com","https:\/\/www.linkedin.com\/in\/simonfrey\/","https:\/\/x.com\/eu_frey"]}]}},"_links":{"self":[{"href":"https:\/\/simon-frey.com\/blog\/wp-json\/wp\/v2\/posts\/1293","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/simon-frey.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/simon-frey.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/simon-frey.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/simon-frey.com\/blog\/wp-json\/wp\/v2\/comments?post=1293"}],"version-history":[{"count":4,"href":"https:\/\/simon-frey.com\/blog\/wp-json\/wp\/v2\/posts\/1293\/revisions"}],"predecessor-version":[{"id":1303,"href":"https:\/\/simon-frey.com\/blog\/wp-json\/wp\/v2\/posts\/1293\/revisions\/1303"}],"wp:attachment":[{"href":"https:\/\/simon-frey.com\/blog\/wp-json\/wp\/v2\/media?parent=1293"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/simon-frey.com\/blog\/wp-json\/wp\/v2\/categories?post=1293"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/simon-frey.com\/blog\/wp-json\/wp\/v2\/tags?post=1293"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}