<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[KubeLedger Engineering]]></title><description><![CDATA[Bringing High-Performance Computing (HPC) rigor to Kubernetes efficiency. We explore advanced strategies for resource optimization, capacity planning, and cost ]]></description><link>https://blog.kubeledger.io</link><generator>RSS for Node</generator><lastBuildDate>Sun, 19 Apr 2026 12:03:32 GMT</lastBuildDate><atom:link href="https://blog.kubeledger.io/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[What 6 Months of Tracking a Production OpenShift Cluster Revealed About Kubernetes Costs]]></title><description><![CDATA[Most Kubernetes teams track pod CPU and memory. Almost none track what the cluster actually costs to run.
We decided to change that. For the past 6 months, we have been running KubeLedger on our own O]]></description><link>https://blog.kubeledger.io/what-6-months-of-tracking-a-production-openshift-cluster-revealed-about-kubernetes-costs</link><guid isPermaLink="true">https://blog.kubeledger.io/what-6-months-of-tracking-a-production-openshift-cluster-revealed-about-kubernetes-costs</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[finops]]></category><category><![CDATA[openshift]]></category><category><![CDATA[cost-optimisation]]></category><category><![CDATA[kubeledger]]></category><dc:creator><![CDATA[Rodrigue Chakode]]></dc:creator><pubDate>Tue, 24 Mar 2026 08:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6978bf83cfe27952e5ca8545/41fdb71f-48e2-40ce-88b5-9835bd43278c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>Most Kubernetes teams track pod CPU and memory. Almost none track what the cluster <strong>actually</strong> costs to run.</p>
<p>We decided to change that. For the past 6 months, we have been running KubeLedger on our own OpenShift cluster — the one we use daily to manage hosted control planes, multi-cluster operations, and security workloads.</p>
<p>The results were eye-opening. Here is what we found, with real numbers and zero embellishment.</p>
<h2>1. The Hidden 30% Tax</h2>
<p>Every Kubernetes node reserves resources for the operating system, kubelet, and system daemons. These resources are called <strong>non-allocatable</strong> — they are consumed but can never be scheduled to pods.</p>
<blockquote>
<p><strong>24–30% of total CPU</strong> was non-allocatable — consistent across every month for 6 months, never dropping below 24%.</p>
</blockquote>
<p>On our cluster, non-allocatable overhead consumed between 24% and 30% of total CPU every single month. That means for every 10 cores you pay for, 2.4 to 3 cores are invisible to <code>kubectl top nodes</code>.</p>
<p>Most monitoring tools report namespace-level usage and call it a day. They show you what pods consumed, but not the gap between what the node provides and what can actually be scheduled. This gap is your hidden tax.</p>
<p><strong>KubeLedger tracks both sides: allocatable and non-allocatable.</strong> Because you cannot optimize what you cannot see.</p>
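<p>To make the gap concrete, here is a minimal Python sketch of the calculation. The capacity and allocatable figures are illustrative, not taken from the cluster above; in practice they come from each node's <code>status.capacity</code> and <code>status.allocatable</code>.</p>

```python
# Sketch: compute the non-allocatable "hidden tax" for one node.
# Figures are hypothetical; real values come from `kubectl get node -o json`
# (status.capacity vs status.allocatable).

def non_allocatable_share(capacity_cores: float, allocatable_cores: float) -> float:
    """Fraction of node CPU reserved for the OS, kubelet, and system daemons."""
    return (capacity_cores - allocatable_cores) / capacity_cores

# A hypothetical 16-core node where 4 cores are reserved:
share = non_allocatable_share(capacity_cores=16.0, allocatable_cores=12.0)
print(f"non-allocatable: {share:.0%}")  # 25% of this node is invisible to pods
```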
<h2>2. Where CPU Actually Goes</h2>
<p>When we broke down CPU consumption by functional category, the picture became clear:</p>
<table>
<thead>
<tr>
<th>Category</th>
<th>CPU Share</th>
<th>What It Includes</th>
</tr>
</thead>
<tbody><tr>
<td>Platform (OpenShift)</td>
<td>31%</td>
<td>API Server, etcd, OVN, monitoring, OLM, DNS</td>
</tr>
<tr>
<td>Non-Allocatable</td>
<td>30%</td>
<td>OS, kubelet, system daemons</td>
</tr>
<tr>
<td>Multi-cluster (ACM)</td>
<td>16%</td>
<td>Hub controllers, observability, agents</td>
</tr>
<tr>
<td>Hosted Control Planes</td>
<td>14%</td>
<td>HCP #1 and HCP #2 API servers</td>
</tr>
<tr>
<td>Security (ACS)</td>
<td>8%</td>
<td>StackRox/RHACS operator and services</td>
</tr>
<tr>
<td>Other</td>
<td>1%</td>
<td>Storage, networking, misc operators</td>
</tr>
</tbody></table>
<p><strong>Only ~14% of CPU was available for hosted workloads.</strong> The remaining 86% was infrastructure tax — the cost of running the platform itself.</p>
<p>This is not unusual. Most enterprise OpenShift clusters with ACM, ACS, and ODF will see similar ratios. The difference is whether you can see it or not.</p>
<h2>3. Growth Nobody Noticed: 840 to 3,708 Cores</h2>
<p>KubeLedger tracked total cluster CPU consumption month by month:</p>
<table>
<thead>
<tr>
<th>Month</th>
<th>Total CPU (cores)</th>
<th>Change</th>
</tr>
</thead>
<tbody><tr>
<td>Sep 2025</td>
<td>840</td>
<td>Baseline</td>
</tr>
<tr>
<td>Oct 2025</td>
<td>1,796</td>
<td>+114%</td>
</tr>
<tr>
<td>Nov 2025</td>
<td>1,808</td>
<td>+1%</td>
</tr>
<tr>
<td>Dec 2025</td>
<td>1,928</td>
<td>+7%</td>
</tr>
<tr>
<td>Jan 2026</td>
<td>3,708</td>
<td>+92%</td>
</tr>
<tr>
<td>Feb 2026</td>
<td>1,936</td>
<td>-48% (partial month)</td>
</tr>
</tbody></table>
<blockquote>
<p><strong>+341% CPU growth</strong> from September 2025 to January 2026 — driven by HCP rollout, ACM scaling, and ACS deployment.</p>
</blockquote>
<p>The jump from September to January was driven by three events: the rollout of two hosted control planes (December), ACM scaling to manage additional clusters, and the deployment of the full ACS security stack.</p>
<p>Without historical tracking, this growth would have been invisible until the infrastructure bill arrived. KubeLedger stores 12 months of history with zero database management, powered by RRDtool.</p>
<p><strong>Capacity planning starts with data. Not guesswork.</strong></p>
<h2>4. Top 5 Consumers Own 72% of CPU</h2>
<p>In February 2026, just five namespaces consumed 72% of all allocatable CPU:</p>
<table>
<thead>
<tr>
<th>Rank</th>
<th>Namespace</th>
<th>CPU (cores)</th>
<th>Share</th>
</tr>
</thead>
<tbody><tr>
<td>#1</td>
<td>open-cluster-management (ACM)</td>
<td>187.6</td>
<td>26%</td>
</tr>
<tr>
<td>#2</td>
<td>rhacs-operator (ACS)</td>
<td>155.6</td>
<td>22%</td>
</tr>
<tr>
<td>#3</td>
<td>clusters-hcp2</td>
<td>140.4</td>
<td>10%</td>
</tr>
<tr>
<td>#4</td>
<td>clusters-hcp1</td>
<td>134.8</td>
<td>10%</td>
</tr>
<tr>
<td>#5</td>
<td>openshift-kube-apiserver</td>
<td>115.8</td>
<td>8%</td>
</tr>
</tbody></table>
<p>This is per-namespace accounting — the foundation for chargeback and showback. When a platform team can show exactly which components consume which resources, conversations about infrastructure cost become data-driven instead of political.</p>
<p><strong>KubeLedger is the System of Record that makes this data available without Prometheus queries or custom dashboards.</strong></p>
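<p>As an illustration of what ratio-based showback looks like, here is a hedged Python sketch that splits a hypothetical monthly bill across the top namespaces from the table above. The $10,000 figure is invented for the example; it is not the tool's actual cost model.</p>

```python
# Sketch: ratio-based showback from per-namespace CPU usage.
# Namespace names and core counts echo the table above; the monthly
# bill is a made-up figure for illustration.

def showback(usage_cores: dict, monthly_cost: float) -> dict:
    """Allocate a monthly bill proportionally to per-namespace CPU usage."""
    total = sum(usage_cores.values())
    return {ns: round(monthly_cost * cores / total, 2) for ns, cores in usage_cores.items()}

usage = {"open-cluster-management": 187.6, "rhacs-operator": 155.6, "clusters-hcp2": 140.4}
bills = showback(usage, monthly_cost=10_000.0)  # hypothetical $10k/month cluster
```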
<h2>5. Memory: The Silent Budget Killer</h2>
<p>CPU gets all the attention in Kubernetes capacity discussions. Meanwhile, memory quietly drains budgets.</p>
<p>In January 2026, our cluster consumed 25 TiB of cumulative memory:</p>
<table>
<thead>
<tr>
<th>Category</th>
<th>Memory (TiB)</th>
<th>Share</th>
</tr>
</thead>
<tbody><tr>
<td>Platform (OpenShift)</td>
<td>10.6</td>
<td>42%</td>
</tr>
<tr>
<td>Multi-cluster (ACM)</td>
<td>3.9</td>
<td>16%</td>
</tr>
<tr>
<td>Security (ACS)</td>
<td>3.9</td>
<td>15%</td>
</tr>
<tr>
<td>Hosted Clusters</td>
<td>3.2</td>
<td>13%</td>
</tr>
<tr>
<td>Storage backends</td>
<td>2.3</td>
<td>9%</td>
</tr>
<tr>
<td>Non-Allocatable</td>
<td>0.8</td>
<td>3%</td>
</tr>
<tr>
<td>Other</td>
<td>0.3</td>
<td>1%</td>
</tr>
</tbody></table>
<p>Memory is often the first resource to hit limits — and the hardest to debug when OOM kills start cascading across pods. Unlike CPU, memory cannot be throttled: memory pressure causes immediate OOM kills and evictions.</p>
<p><strong>KubeLedger tracks CPU and memory side by side, per namespace, over time.</strong> Because optimizing one while ignoring the other is half a strategy.</p>
<h2>6. 88 Namespaces, Zero Configuration</h2>
<p>Over the 6-month tracking period, our cluster grew from 77 to 88 namespaces:</p>
<table>
<thead>
<tr>
<th>Month</th>
<th>Namespaces</th>
<th>New additions</th>
</tr>
</thead>
<tbody><tr>
<td>Sep 2025</td>
<td>77</td>
<td>Baseline</td>
</tr>
<tr>
<td>Oct 2025</td>
<td>80</td>
<td>+3 (operators)</td>
</tr>
<tr>
<td>Nov 2025</td>
<td>81</td>
<td>+1</td>
</tr>
<tr>
<td>Dec 2025</td>
<td>87</td>
<td>+6 (HCP namespaces)</td>
</tr>
<tr>
<td>Jan 2026</td>
<td>88</td>
<td>+1</td>
</tr>
<tr>
<td>Feb 2026</td>
<td>88</td>
<td>Stable</td>
</tr>
</tbody></table>
<p>KubeLedger auto-discovers every namespace on the cluster and starts tracking immediately. No labels to add. No annotations to configure. No Prometheus rules to write.</p>
<p>Every new namespace was automatically picked up — including the HCP namespaces that appeared in December when we rolled out hosted control planes.</p>
<h2>7. Hourly Trends Reveal What Daily Averages Hide</h2>
<p>KubeLedger captures hourly granularity, revealing patterns that daily averages smooth away:</p>
<ul>
<li><p><strong>API Server:</strong> peaks at 0.7 cores during business hours, drops to 0.4 at night</p>
</li>
<li><p><strong>ACM:</strong> bursts up to 1.0 core during reconciliation cycles, with a baseline of 0.66</p>
</li>
<li><p><strong>Monitoring:</strong> fluctuates 0.3–0.7 cores depending on scrape intervals and alert evaluation</p>
</li>
<li><p><strong>HCP #2:</strong> shows the widest variance (0.45–1.25 cores) due to workload scheduling patterns</p>
</li>
</ul>
<p>These patterns matter for right-sizing. If you provision based on daily averages, you will either overprovision (wasting money) or underprovision (risking throttling during peaks).</p>
<p><strong>Daily averages hide the peaks. KubeLedger does not.</strong></p>
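<p>A small Python sketch illustrates the point, using made-up hourly samples loosely shaped like the HCP #2 range quoted above (0.45–1.25 cores):</p>

```python
# Sketch: why daily averages hide peaks.
# Hourly samples are illustrative, not real KubeLedger data.

hourly_cores = [0.45, 0.5, 0.6, 0.9, 1.1, 1.25, 1.0, 0.7, 0.5, 0.45, 0.6, 0.8]

average = sum(hourly_cores) / len(hourly_cores)
peak = max(hourly_cores)

# Provisioning at the average risks throttling during the peak;
# provisioning at the peak wastes the gap the rest of the day.
print(f"avg={average:.2f} cores, peak={peak:.2f} cores, gap={peak - average:.2f}")
```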
<h2>8. A Cost Monitoring Tool That Costs Almost Nothing</h2>
<p>We believe a cost monitoring tool should not cost you a fortune to run. Here is what KubeLedger consumes on the cluster it monitors:</p>
<blockquote>
<p><strong>0.016 cores / 12 MiB</strong> — that is 0.002% of total cluster CPU, tracking 88 namespaces with 12 months of history.</p>
</blockquote>
<p>For context, KubeLedger uses less CPU than a single sidecar container. The tool tracking 88 namespaces with hourly, daily, and monthly granularity runs on 16 millicores.</p>
<p>How? No Prometheus dependency. No external database. No storage backend to scale. KubeLedger uses RRDtool for fixed-size, append-only storage that never grows beyond its initial allocation.</p>
<p><strong>Less than 100 MB of storage for 1 year of retention.</strong> Deploy and get answers.</p>
<h2>Conclusion: You Cannot Optimize What You Cannot See</h2>
<p>Six months of tracking our own cluster taught us four things:</p>
<ol>
<li><p><strong>The hidden tax is real.</strong> 24–30% of CPU is non-allocatable and invisible to most tools. If you are not tracking it, you are underestimating your actual infrastructure cost.</p>
</li>
<li><p><strong>Growth is gradual until it is not.</strong> Our cluster went from 840 to 3,708 cores in 5 months. Without historical tracking, this is invisible until the bill arrives.</p>
</li>
<li><p><strong>Per-namespace accounting changes conversations.</strong> When you can show exactly which components consume which resources, cost discussions become evidence-based.</p>
</li>
<li><p><strong>And this is just CPU and memory.</strong> KubeLedger also tracks NVIDIA GPU utilization and memory consumption per pod — with the same per-namespace accounting, powered by DCGM Exporter metrics. As AI workloads drive GPU costs through the roof, the same visibility gap exists: teams pay for expensive GPUs without knowing how they are actually consumed.</p>
</li>
</ol>
<p>KubeLedger is the open-source System of Record that tracks the full picture of Kubernetes costs — revealing the 30% hidden in non-allocatable overhead for precise, per-namespace accounting.</p>
<hr />
<h3>Try KubeLedger</h3>
<p>Deploy on your cluster in under 5 minutes. Zero configuration. Zero database. Full visibility.</p>
<ul>
<li><p><strong>Website:</strong> <a href="https://kubeledger.io">kubeledger.io</a></p>
</li>
<li><p><strong>GitHub:</strong> <a href="https://github.com/realopslabs/kubeledger">github.com/realopslabs/kubeledger</a></p>
</li>
<li><p><strong>Follow RealOps Labs</strong> on LinkedIn for more production insights.</p>
</li>
</ul>
<hr />
<h3>Appendix: Methodology</h3>
<ul>
<li><p><strong>Cluster:</strong> OpenShift 4.x bare-metal, multi-node, running ACM, ACS, CNV, and ODF.</p>
</li>
<li><p><strong>Tracking period:</strong> September 2025 – February 2026 (6 months).</p>
</li>
<li><p><strong>Tool:</strong> KubeLedger (formerly kube-opex-analytics), deployed as a single pod with read-only RBAC access.</p>
</li>
<li><p><strong>Data source:</strong> Kubernetes Metrics API (resource.metrics.k8s.io) polled at 5-minute intervals.</p>
</li>
<li><p><strong>Non-allocatable calculation:</strong> Total node capacity minus allocatable capacity, as reported by the Kubernetes API.</p>
</li>
<li><p><strong>Storage:</strong> RRDtool round-robin databases. Fixed-size, no external database required.</p>
</li>
</ul>
<p><em>All data in this article comes from real CSV exports of KubeLedger running on our own OpenShift lab cluster. No data was simulated or adjusted.</em></p>
]]></content:encoded></item><item><title><![CDATA[Announcing KubeLedger: The Evolution of kube-opex-analytics]]></title><description><![CDATA[We are thrilled to announce a major milestone in our journey: kube-opex-analytics is now KubeLedger.
This is more than just a new logo. It represents a maturation of the project and a sharpened focus on what we do best: Kubernetes Resource Accounting...]]></description><link>https://blog.kubeledger.io/announcing-kubeledger-the-evolution-of-kube-opex-analytics</link><guid isPermaLink="true">https://blog.kubeledger.io/announcing-kubeledger-the-evolution-of-kube-opex-analytics</guid><category><![CDATA[Announcement]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[finops]]></category><category><![CDATA[Rebranding]]></category><category><![CDATA[accounting]]></category><dc:creator><![CDATA[Rodrigue Chakode]]></dc:creator><pubDate>Sat, 14 Feb 2026 07:45:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770887403175/8c7ebd2f-f179-4258-8375-ac3a8998acb5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We are thrilled to announce a major milestone in our journey: <strong>kube-opex-analytics is now KubeLedger.</strong></p>
<p>This is more than just a new logo. It represents a maturation of the project and a sharpened focus on what we do best: <strong>Kubernetes Resource Accounting.</strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1770969005048/9b49605f-a5da-412b-9831-05b791b5b8f4.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-why-the-change">Why the change?</h3>
<p>When <strong>Rodrigue Chakode</strong> first architected this tool, the goal was to bring <strong>High-Performance Computing (HPC) rigor</strong> to Kubernetes efficiency. The mission was to track resources with extreme precision without the "heaviness" of standard observability stacks.</p>
<p>Over time, we realized that our users weren't just using us for "analytics" (looking at graphs). They were using us as a <strong>Ledger</strong>—a System of Record to allocate costs, charge back to namespaces, and hold teams accountable for their usage.</p>
<p>The name <strong>KubeLedger</strong> reflects this reality. We are the accountant for your cluster, designed to reveal the <strong>30-50% of costs often hidden</strong> in unused resources and non-allocatable overhead.</p>
<h3 id="heading-what-is-changing">What is changing?</h3>
<h4 id="heading-1-the-name-amp-home">1. The Name &amp; Home</h4>
<p>The project has moved to a dedicated GitHub organization backed by <strong>RealOps Labs</strong>: <a target="_blank" href="http://github.com/realopslabs/kubeledger">github.com/realopslabs/kubeledger</a>. This ensures long-term stability, enterprise-grade focus, and continuous innovation for the community.</p>
<h4 id="heading-2-the-development-model-ensuring-sustainability">2. The Development Model: Ensuring Sustainability</h4>
<p>To ensure the long-term sustainability of KubeLedger in an era where open source projects are often commoditized by large platforms without contribution, we are adopting the <strong>Business Source License (BSL 1.1)</strong>.</p>
<p><strong>What does this mean for you?</strong></p>
<ul>
<li><p><strong>For Users &amp; Companies:</strong> Nothing changes. You can use, modify, and deploy KubeLedger internally for free, just as before.</p>
</li>
<li><p><strong>For Contributors:</strong> The source code remains fully available. Pull requests and community contributions are welcome.</p>
</li>
<li><p><strong>For Cloud Vendors:</strong> You cannot sell KubeLedger as a managed service (e.g. SaaS, PaaS).</p>
</li>
</ul>
<p><strong>Our Commitment to Openness:</strong> We believe in the open ecosystem. That is why our license includes a strict <strong>Change Date</strong>. After 4 years, every version of KubeLedger automatically converts to the permissive <strong>Apache 2.0 License</strong>, guaranteed. This model protects our ability to innovate today while guaranteeing that the software eventually becomes permissively licensed open source.</p>
<h4 id="heading-3-the-terminology">3. The Terminology</h4>
<p>We are standardizing our environment variables from <code>KOA_</code> to <code>KL_</code>. (Don't worry, the old variables will continue to work for the next 6 months to ensure a smooth transition).</p>
<h3 id="heading-what-remains-the-same">What remains the same?</h3>
<p><strong>The Efficiency.</strong> We still believe that a cost monitoring tool shouldn't cost you a fortune to run. KubeLedger remains:</p>
<ul>
<li><p><strong>Lightweight:</strong> &lt; 100MB memory footprint.</p>
</li>
<li><p><strong>Maintenance-Free:</strong> No database management required (thanks to RRDtool).</p>
</li>
<li><p><strong>Long-Term:</strong> 12 months of history out of the box.</p>
</li>
</ul>
<h3 id="heading-how-to-migrate">How to Migrate</h3>
<p>Moving to KubeLedger is straightforward. We have prepared a detailed <a target="_blank" href="https://www.kubeledger.io/docs/migration-from-kube-opex-analytics-to-kubeledger/">Migration Guide</a> to help you switch your Docker images and update your manifests.</p>
<h3 id="heading-a-note-from-realops-labs">A Note from RealOps Labs</h3>
<p>We are cloud specialists passionate about efficiency. By bringing HPC strategies to Cloud Native environments, we aim to eliminate waste and "guesswork" in capacity planning.</p>
<p>Thank you for trusting us with your cluster metrics. We are excited to build the future of Kubernetes Accounting with you.</p>
<p>— <strong>The RealOps Labs Team</strong></p>
]]></content:encoded></item><item><title><![CDATA[Stop Guessing Your Kubernetes Costs: Introducing kube-opex-analytics]]></title><description><![CDATA[TL;DR: kube-opex-analytics is a new open-source tool designed to help you track resource usage and allocate costs per namespace. It's lightweight, easy to deploy, and features built-in GPU tracking capabilities.
The Hidden Cost of Kubernetes
If you'r...]]></description><link>https://blog.kubeledger.io/stop-guessing-your-kubernetes-costs-introducing-kube-opex-analytics</link><guid isPermaLink="true">https://blog.kubeledger.io/stop-guessing-your-kubernetes-costs-introducing-kube-opex-analytics</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[finops]]></category><category><![CDATA[Devops]]></category><category><![CDATA[cost-optimisation]]></category><dc:creator><![CDATA[Rodrigue Chakode]]></dc:creator><pubDate>Tue, 27 Jan 2026 15:11:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769537826675/40c426ff-a253-42f3-8119-13dc1a5f2ca3.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>TL;DR:</strong> <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics">kube-opex-analytics</a> is a ne<a target="_blank" href="https://github.com/rchakode/kube-opex-analytics">w open-source tool</a> designed to help you track resource usage and allocate costs per namespace. It's lightweight, easy to deploy, and features built-in GPU tracking capabilities.</p>
<h2 id="heading-the-hidden-cost-of-kubernetes"><strong>The Hidden Cost of Kubernetes</strong></h2>
<p>If you're running Kubernetes, you know the drill: <strong>Setting requests and limits is an art, and paying the cloud bill is a painful reality.</strong></p>
<p>We've all been there—over-provisioning "just in case," or wondering why the cluster autoscaler is spinning up nodes when half the pods are idle. Efficiency is hard to track without expensive enterprise tools.</p>
<p>That's why I built <strong>kube-opex-analytics</strong>.</p>
<h2 id="heading-what-is-kube-opex-analytics"><strong>What is kube-opex-analytics?</strong></h2>
<p><a target="_blank" href="https://github.com/rchakode/kube-opex-analytics">kube-opex-analytics</a> is an open-source tool designed to give you instant visibility into your Kubernetes cluster's resource consumption and cost allocation.</p>
<p>It focuses on what matters most for cost optimization:</p>
<ul>
<li><p><strong>Actual Usage vs. Requests</strong>: See exactly how much CPU and Memory your namespaces are using compared to what they reserved.</p>
</li>
<li><p><strong>Cost Allocation</strong>: Break down costs per namespace based on flexible billing models.</p>
</li>
<li><p><strong>Trends Over Time</strong>: Visualize hourly, daily, and monthly trends to spot patterns.</p>
</li>
<li><p><strong>GPU Support</strong>: First-class citizen support for tracking NVIDIA GPU utilization (because those resources are very expensive!).</p>
</li>
</ul>
<p><img src="https://raw.githubusercontent.com/rchakode/kube-opex-analytics/main/screenshots/kube-opex-analytics-demo.gif" alt="Dashboard Demo" /></p>
<h2 id="heading-key-features"><strong>Key Features</strong></h2>
<h3 id="heading-1-real-time-amp-historical-analytics"><strong>1. 📊 Real-time &amp; Historical Analytics</strong></h3>
<p>Unlike some tools that only show "now," kube-opex-analytics consolidates metrics into hourly, daily, and monthly views. This is crucial for spotting that 3 AM spike or the weekend idle time.</p>
<h3 id="heading-2-cost-chargeback-amp-showback"><strong>2. 💰 Cost Chargeback &amp; Showback</strong></h3>
<p>Want to show the Data Science team how much their experiments are costing? The tool allows you to set hourly rates or use a ratio-based cost model to generate accurate cost reports per namespace.</p>
<h3 id="heading-3-efficiency-analysis"><strong>3. 📉 Efficiency Analysis</strong></h3>
<p>The "Usage Efficiency" view overlays your actual resource usage against your requested capacity. This is your "aha!" moment for right-sizing pods.</p>
<h3 id="heading-4-gpu-metrics"><strong>4. 🧠 GPU Metrics</strong></h3>
<p>With the rise of AI/ML workloads, GPU visibility is non-negotiable. We integrate with the NVIDIA DCGM exporter to show you who is actually saturating those A100s.</p>
<h2 id="heading-quick-start"><strong>Quick Start 🚀</strong></h2>
<p>You can have it running in your cluster in less than 2 minutes.</p>
<p><strong>Using Kustomize:</strong></p>
<pre><code class="lang-bash">kubectl create namespace kube-opex-analytics
kubectl apply -k https://github.com/rchakode/kube-opex-analytics/manifests/kustomize -n kube-opex-analytics
</code></pre>
<p><strong>Using Helm:</strong></p>
<pre><code class="lang-bash"><span class="hljs-comment"># Clone the repos</span>
git <span class="hljs-built_in">clone</span> https://github.com/rchakode/kube-opex-analytics.git --depth=1
<span class="hljs-built_in">cd</span> kube-opex-analytics

<span class="hljs-comment"># Create namespace</span>
kubectl create namespace kube-opex-analytics

<span class="hljs-comment"># Install with Helm</span>
helm upgrade --install kube-opex-analytics ./manifests/helm -n kube-opex-analytics

<span class="hljs-comment"># Watch pod status</span>
kubectl get pods -n kube-opex-analytics -w
</code></pre>
<p>Once deployed, just port-forward to see the dashboard:</p>
<pre><code class="lang-bash">kubectl port-forward svc/kube-opex-analytics 5483:80 -n kube-opex-analytics
</code></pre>
<p>Open <code>http://localhost:5483</code> and start optimizing!</p>
<h2 id="heading-why-open-source"><strong>Why Open Source?</strong></h2>
<p>Cloud cost management shouldn't be a luxury. By making this tool open source, I hope to help DevOps engineers and platform teams get the visibility they need without the enterprise price tag.</p>
<p>We are actively looking for contributors! Whether it's a new dashboard feature, a helm chart tweak, or just a bug report—your input is welcome.</p>
<ul>
<li><p><strong>GitHub</strong>: <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics">rchakode/kube-opex-analytics</a></p>
</li>
<li><p><strong>License</strong>: Apache 2.0</p>
</li>
</ul>
<hr />
<p><em>Give it a spin and let me know what you think in the comments! 👇</em></p>
]]></content:encoded></item><item><title><![CDATA[Bringing Prometheus Metrics and Grafana Dashboard for Cost Allocation on Kubernetes Clusters]]></title><description><![CDATA[This story introduces a Prometheus Exporter along with a Grafana Dashboard intending to provide cost-oriented consolidated resource usage analytics for Kubernetes clusters. Those analytics actually aim at highlighting factual metrics to help organiza...]]></description><link>https://blog.kubeledger.io/kube-opex-analytics-prometheus-metrics-and-grafana-dashboard-for-cost-allocation-on-kubernetes-clusters</link><guid isPermaLink="true">https://blog.kubeledger.io/kube-opex-analytics-prometheus-metrics-and-grafana-dashboard-for-cost-allocation-on-kubernetes-clusters</guid><category><![CDATA[ Cost Allocation]]></category><category><![CDATA[Capacity Planning]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[#prometheus]]></category><category><![CDATA[Grafana]]></category><category><![CDATA[finops]]></category><dc:creator><![CDATA[Rodrigue Chakode]]></dc:creator><pubDate>Sun, 30 Jun 2019 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770973400002/847a6cca-2a3d-4ed6-bd7d-af457daee673.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This story introduces a Prometheus Exporter along with a Grafana Dashboard intending to provide cost-oriented consolidated resource usage analytics for Kubernetes clusters. Those analytics actually aim at highlighting factual metrics to help organizations easily make cost allocation and capacity planning decisions on short-, mid-, and long terms.</p>
<blockquote>
<p><strong><em>Sidenote:</em></strong> <em>Readers may also be interested in this</em> <a target="_blank" href="https://blog.kubeledger.io/introducing-an-analytics-tool-for-kubernetes-cost-allocation-and-capacity-planning"><em>related story</em></a></p>
</blockquote>
<h2 id="heading-introduction"><strong>Introduction</strong></h2>
<p>A couple of months ago, <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics">Kubernetes Opex Analytics</a> was introduced as an original open-source resource usage analytics tool that makes cost-sharing and capacity-planning decisions easier on Kubernetes clusters. Released under the terms of the Apache 2.0 License, the tool comes with built-in analytics charts covering various use cases. It received a lot of user feedback, some of it requesting the ability to expose its metrics to existing Prometheus environments.</p>
<p>A Prometheus exporter was introduced in version 0.3.0, and this story describes it in detail. It comes with an integrated Grafana dashboard to help users get started quickly, complementing the native Kubernetes Opex Analytics features and dashboard.</p>
<h3 id="heading-what-will-be-covered-next"><strong>What will be covered next</strong></h3>
<p>This story details the exposed metrics and provides steps to set up the exporter and the bundled Grafana dashboard. Before that, we review the core features of Kubernetes Opex Analytics for readers not yet familiar with it.</p>
<blockquote>
<p><em>If you’re already familiar with the concepts of Kubernetes Opex Analytics, you can skip the next section and move forward.</em></p>
</blockquote>
<h2 id="heading-kubernetes-opex-analytics-in-a-nutshell"><strong>Kubernetes Opex Analytics in a Nutshell</strong></h2>
<p>Kubernetes Opex Analytics is designed on the following core concepts:</p>
<ul>
<li><p><strong>Namespace-focused</strong>: Consolidated resource usage metrics treat individual namespaces as the fundamental units of resource sharing. Special care is taken to also account for and highlight <code>non-allocatable resources</code>.</p>
</li>
<li><p><strong>Hourly Usage &amp; Trends</strong>: As on public clouds, resource usage for each namespace is consolidated on an hourly basis. This corresponds to the ratio (<code>%</code>) of resources used per namespace during each hour. It is the foundation for cost calculation, and it also reveals over-time trends in resource consumption, both per namespace and at the scale of the whole Kubernetes cluster.</p>
</li>
<li><p><strong>Daily and Monthly Usage Costs:</strong> Provide, for each period (daily/monthly), namespace, and resource type (CPU/memory), a <em>consolidated cost</em> computed in one of the following ways: <em>(i)</em> accumulated hourly usage over the period; <em>(ii)</em> actual costs computed from resource usage and a given hourly billing rate; <em>(iii)</em> normalized ratio of usage per namespace compared against the global cluster usage.</p>
</li>
<li><p><strong>Efficient Visualization:</strong> For the metrics it generates, Kubernetes Opex Analytics provides dashboards with relevant charts covering both the last couple of hours and the last 12 months, as shown below.</p>
</li>
</ul>
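<p>The three cost-consolidation models above can be sketched in a few lines of Python. The functions and inputs below are illustrative, not the tool's actual implementation:</p>

```python
# Sketch of the three cost-consolidation models, with hypothetical inputs.
# hourly_ratios: per-hour usage ratios (%) for one namespace over a period.

def cumulative_usage(hourly_ratios):
    """(i) Accumulated hourly usage over the period."""
    return sum(hourly_ratios)

def billed_cost(hourly_ratios, hourly_rate, cluster_capacity_cores):
    """(ii) Actual cost from usage and an hourly billing rate (per core-hour)."""
    return sum(r / 100 * cluster_capacity_cores * hourly_rate for r in hourly_ratios)

def normalized_ratio(namespace_usage, cluster_usage):
    """(iii) Namespace usage normalized against global cluster usage."""
    return namespace_usage / cluster_usage
```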
<p><img src="https://miro.medium.com/v2/resize:fit:1000/1*cGfFeGTUezdDmo7_zGBEtg.png" alt /></p>
<p>Kubernetes Opex Analytics — Screenshot of the Built-in Dashboard</p>
<h2 id="heading-installing-kubernetes-opex-analytics"><strong>Installing Kubernetes Opex Analytics</strong></h2>
<p>In what follows, we assume the installation is done on a Kubernetes cluster; you can also run the tool on Docker as described <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics#start-koa-on-docker">here</a>.</p>
<p>There is a <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics/tree/master/helm/kube-opex-analytics">Helm chart</a> to ease the deployment on Kubernetes, either by using Helm <code>Tiller</code> or <code>kubectl</code>.</p>
<p>In both cases, check the <code>values.yaml</code> file and adjust the configuration options to your needs (e.g. to use a persistent volume for data storage).</p>
<p>Using <code>Helm Tiller</code>:</p>
<pre><code class="lang-plaintext">helm upgrade \
  --install kube-opex-analytics \
  helm/kube-opex-analytics/
</code></pre>
<p>Using <code>kubectl</code>:</p>
<pre><code class="lang-plaintext">helm template \
  --name kube-opex-analytics \
  helm/kube-opex-analytics/ | kubectl apply -f -
</code></pre>
<blockquote>
<p><em>This enables the built-in dashboard via an HTTP service named</em> <code>kube-opex-analytics</code> on port <code>80</code>.</p>
</blockquote>
<h2 id="heading-prometheus-exporter"><strong>Prometheus Exporter</strong></h2>
<p>Metrics are exposed for Prometheus via the <code>/metrics</code> endpoint.</p>
<h3 id="heading-exposed-metrics"><strong>Exposed Metrics</strong></h3>
<p>As shown in the sample in the figure below, the exposed metrics are:</p>
<ul>
<li><p><code>koa_namespace_hourly_usage</code>: exposes, for each namespace, its current hourly resource usage for both CPU and memory.</p>
</li>
<li><p><code>koa_namespace_daily_usage</code>: exposes, for each namespace, its cumulative resource usage so far for the ongoing day, for both CPU and memory.</p>
</li>
<li><p><code>koa_namespace_monthly_usage</code>: exposes, for each namespace, its cumulative resource usage so far for the ongoing month, for both CPU and memory.</p>
</li>
</ul>
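<p>To make the shape of these metrics concrete, here is a small Python sketch that parses a few exposition-format lines like those the exporter emits. The sample values and the exact label layout are illustrative assumptions — check your own <code>/metrics</code> output for the real ones.</p>
<pre><code class="lang-plaintext">import re

# Illustrative sample in Prometheus exposition format; the values and
# label layout are assumptions, not real output.
SAMPLE = """\
koa_namespace_hourly_usage{namespace="kube-system",resource="cpu"} 0.42
koa_namespace_hourly_usage{namespace="kube-system",resource="memory"} 1.37
koa_namespace_hourly_usage{namespace="default",resource="cpu"} 0.11
"""

LINE_RE = re.compile(r'(\w+)\{namespace="([^"]+)",resource="([^"]+)"\}\s+([\d.]+)')

def parse_metrics(text):
    """Return {(namespace, resource): value} for each sample line."""
    usage = {}
    for match in LINE_RE.finditer(text):
        _metric, namespace, resource, value = match.groups()
        usage[(namespace, resource)] = float(value)
    return usage

usage = parse_metrics(SAMPLE)
print(usage[("kube-system", "cpu")])   # 0.42
</code></pre>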
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*obCxbKRH3j_i4Wf4zvgD8Q.png" alt /></p>
<p>Kubernetes Opex Analytics — Sample of Metrics Exposed for Prometheus</p>
<h3 id="heading-prometheus-scrape-job"><strong>Prometheus Scrape Job</strong></h3>
<p>The scrape job can be configured as shown below. An interval shorter than 5 minutes (i.e. <code>300s</code>) is pointless, as no new metrics are generated in the meantime.</p>
<pre><code class="lang-plaintext">scrape_configs:
  - job_name: 'kube-opex-analytics'
    scrape_interval: 300s
    static_configs:
      - targets: ['kube-opex-analytics:80']
</code></pre>
<blockquote>
<p><em>Recall that Kubernetes Opex Analytics works with hourly-consolidated metrics, so you may need to wait at least an hour to have all metrics available.</em></p>
</blockquote>
<h2 id="heading-grafana-dashboard"><strong>Grafana Dashboard</strong></h2>
<p>Once the metrics are available in Prometheus, get <a target="_blank" href="https://grafana.com/dashboards/10282">this Grafana dashboard</a> and import it into Grafana. The dashboard relies on a variable <code>KOA_DS_PROMETHEUS</code> that must point to your Prometheus data source.</p>
<p>Once configured properly, the dashboard should work out of the box and display the charts described hereafter.</p>
<h3 id="heading-hourly-usage"><strong>Hourly Usage</strong></h3>
<p>There are two panels displaying usage charts for CPU (left) and memory (right) over the selected interval (7 days by default). Series for the different namespaces are stacked, which makes usage comparison easy and also shows how loaded the cluster is. In the example below, we can see that during the last 5 days global CPU and memory usage reached more than 90%.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:1000/1*5bKOYCGVnlizBFeFEZMWRg.png" alt /></p>
<p>Kubernetes Opex Analytics — Hourly Resource Usage</p>
<h3 id="heading-current-days-usage"><strong>Current Day’s Usage</strong></h3>
<p>The two panels display cost charts for CPU (left) and memory (right) over the ongoing day. Values are computed using the cost algorithms described earlier in this story.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:1000/1*PSEQP-oQ5MoqGt97RDIQag.png" alt /></p>
<p>Kubernetes Opex Analytics — Current Day’s Resource Usage</p>
<h3 id="heading-current-months-usage"><strong>Current Month’s Usage</strong></h3>
<p>The two panels below display cost charts for CPU (left) and memory (right) over the ongoing month. Values are computed using the cost algorithms described earlier in this story.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:1000/1*mU0qKLMq8iIQ1gnl_6jPgg.png" alt /></p>
<p>Kubernetes Opex Analytics — Current Month’s Resource Usage</p>
<h2 id="heading-move-forward"><strong>Move forward</strong></h2>
<p>In short, this story introduced a Prometheus exporter as well as a Grafana dashboard for <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics">Kubernetes Opex Analytics</a>.</p>
<p>As you may have noticed, the charts of this Grafana dashboard are less rich than those of the built-in dashboard of Kubernetes Opex Analytics. For instance, daily and monthly usage are limited to the current day and the current month respectively, which makes it difficult to compare current usage with previous periods. <a target="_blank" href="https://community.grafana.com/t/display-stacked-series/3402/15">These are limitations inherent in how Grafana handles bar charts based on series names</a>. The current implementation hence leaves room for further improvements; any contributions will be really appreciated.</p>
<p>Remember that Kubernetes Opex Analytics is open source and open to contribution. We’re always pleased to receive feedback and contributions on <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics">GitHub</a>: submit an issue if you encounter problems or have ideas for improvement, make a pull request, or give it a star.</p>
<p>Enjoy!</p>
]]></content:encoded></item><item><title><![CDATA[Introducing an Analytics Tool for Kubernetes Cost Allocation and Capacity Planning]]></title><description><![CDATA[[Article originally published on Medium]
Usage accounting and cost allocation are two of the main operational problems for production Kubernetes clusters. This article is not just another discussion about the problem, but a pragmatic contribution that g...]]></description><link>https://blog.kubeledger.io/introducing-an-analytics-tool-for-kubernetes-cost-allocation-and-capacity-planning</link><guid isPermaLink="true">https://blog.kubeledger.io/introducing-an-analytics-tool-for-kubernetes-cost-allocation-and-capacity-planning</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[cost-optimisation]]></category><category><![CDATA[Cost efficiency]]></category><category><![CDATA[Cloud Usage Analysis]]></category><category><![CDATA[finops]]></category><dc:creator><![CDATA[Rodrigue Chakode]]></dc:creator><pubDate>Sun, 19 May 2019 10:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1770368491579/d103c4d2-d672-491f-8f68-32c1ade67e8e.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>[Article originally <a target="_blank" href="https://rodrigue-chakode.medium.com/kubernetes-resource-usage-analytics-for-cost-allocation-and-capacity-planning-416800e85d16"><em>published</em></a> <em>on Medium</em>]</p>
<p>Usage accounting and cost allocation are two of the main operational problems for production Kubernetes clusters. This article is not just another discussion about the problem, but a pragmatic contribution that goes further and introduces <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics">Kubernetes Opex Analytics</a> — an open source project (Apache License 2.0) that provides short-, mid- and long-term CPU and memory resource usage analytics for cost allocation and capacity planning on Kubernetes clusters. The analytics are computed on a per-namespace basis, with different time aggregation perspectives covering up to a year. Beyond namespaces, the analytics also highlight the share of resources dedicated to the infrastructure (i.e. non-allocatable capacity), as illustrated by the following screenshot.</p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*vIFEMU73dqttBR8BnOqaoA.png" alt /></p>
<p>Kubernetes Opex Analytics — A Sample of Monthly Resource Usage per Namespace.</p>
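<p>To make the notion of non-allocatable capacity concrete, here is a minimal Python sketch splitting a node&#x27;s CPU capacity into per-namespace, infrastructure (non-allocatable) and idle shares. It illustrates the idea with made-up numbers; it is not the project&#x27;s actual accounting algorithm.</p>
<pre><code class="lang-plaintext">def usage_breakdown(capacity, allocatable, namespace_usage):
    """Split a node's CPU capacity (cores) into per-namespace shares,
    the non-allocatable share reserved for the infrastructure, and idle.
    Simplified illustration, not the project's actual accounting."""
    non_allocatable = capacity - allocatable
    used = sum(namespace_usage.values())
    shares = {ns: u / capacity for ns, u in namespace_usage.items()}
    shares["non-allocatable"] = non_allocatable / capacity
    shares["idle"] = (allocatable - used) / capacity
    return shares

shares = usage_breakdown(
    capacity=4.0,       # total CPU of the node (cores)
    allocatable=3.5,    # capacity minus system/kubelet reservations
    namespace_usage={"default": 2.0, "kube-system": 1.0},
)
print(shares["non-allocatable"])  # 0.125
</code></pre>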
<h2 id="heading-before-the-tool-a-story"><strong>Before the Tool, a Story</strong></h2>
<p>A few months ago I decided to give <a target="_blank" href="https://cloud.google.com/kubernetes-engine/">Google Kubernetes Engine</a> (GKE) a try. Along the way I found that Google offers Free Tier accounts with $300 that you can spend at will on any Google Cloud Platform products. I found it really interesting to have enough credit to try out GKE without the capacity limits that can apply elsewhere. Then I asked myself how to make these free $300 more beneficial than just starting and stopping GKE clusters. Sure, I would have learnt GKE things at no cost, but…</p>
<p>But I decided to spend my free $300 in a more productive way, so I started working on Kubernetes Opex Analytics while trying out GKE. Today I’m very happy with my journey through GKE and really proud to contribute Kubernetes Opex Analytics back to the cloud native and Kubernetes communities.</p>
<h2 id="heading-how-it-works"><strong>How it Works</strong></h2>
<p>Kubernetes Opex Analytics periodically collects CPU and memory usage metrics from the Kubernetes APIs, then stores, processes and consolidates them over time to produce its target analytics.</p>
<h3 id="heading-what-kubernetes-apis"><strong>Required Kubernetes APIs</strong></h3>
<p>Kubernetes Opex Analytics needs read-only access to the following APIs:</p>
<ul>
<li><p><code>/apis/metrics.k8s.io/v1beta1</code></p>
</li>
<li><p><code>/api/v1</code></p>
</li>
</ul>
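<p>As an illustration, these endpoints can be queried through the proxy started by <code>kubectl proxy</code> (see the Getting Started section). In the Python sketch below, the <code>/pods</code> and <code>/nodes</code> resource paths are example resources under those API groups, not an exhaustive list.</p>
<pre><code class="lang-plaintext">import json
import urllib.request

PROXY = "http://127.0.0.1:8001"  # default address exposed by kubectl proxy

# Example resources under the two read-only API groups the tool needs;
# the /pods and /nodes paths are illustrative, not exhaustive.
ENDPOINTS = {
    "pod_metrics": "/apis/metrics.k8s.io/v1beta1/pods",
    "nodes": "/api/v1/nodes",
}

def endpoint_url(name):
    """Build the proxied URL for a named endpoint."""
    return PROXY + ENDPOINTS[name]

def fetch(name):
    """Fetch one endpoint through the proxy (requires a live cluster)."""
    with urllib.request.urlopen(endpoint_url(name)) as resp:
        return json.load(resp)

print(endpoint_url("pod_metrics"))
</code></pre>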
<h3 id="heading-current-analytics"><strong>Current Analytics</strong></h3>
<p>The following analytics are currently implemented, but they are expected to be extended in future versions:</p>
<ul>
<li><p><strong>One-week CPU and Memory Usage Trends</strong> as consolidated hourly usage per namespace and globally for a cluster over the last 7 days.</p>
</li>
<li><p><strong>Two-weeks Daily CPU and Memory Usage</strong> per namespace, as cumulative hourly usage for each namespace on each of the last 14 days.</p>
</li>
<li><p><strong>One-year Monthly CPU and Memory Usage</strong> per namespace, as cumulative daily usage for each namespace in each of the last 12 months.</p>
</li>
<li><p><strong>Last Nodes’ Occupation by Pods</strong>, providing for each node the share of resources used by the active pods on the node.</p>
</li>
</ul>
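<p>The consolidation behind these views can be pictured in a few lines of Python — a simplified model of the hourly-to-daily roll-up, not the project&#x27;s actual implementation; monthly figures sum the daily totals in the same way:</p>
<pre><code class="lang-plaintext">from collections import defaultdict
from datetime import datetime

# (timestamp, namespace, cpu_cores) hourly samples; values are made up
hourly_samples = [
    ("2019-05-01T10:00", "default", 1.25),
    ("2019-05-01T11:00", "default", 0.75),
    ("2019-05-01T10:00", "kube-system", 0.5),
    ("2019-05-02T10:00", "default", 2.0),
]

def daily_cumulative(samples):
    """Sum hourly usage samples into per-day, per-namespace totals."""
    totals = defaultdict(float)
    for ts, namespace, cpu in samples:
        day = datetime.strptime(ts, "%Y-%m-%dT%H:%M").date().isoformat()
        totals[(day, namespace)] += cpu
    return dict(totals)

print(daily_cumulative(hourly_samples)[("2019-05-01", "default")])  # 2.0
</code></pre>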
<blockquote>
<p><strong><em>Edit: May 12, 2019.</em></strong> <em>Starting from version 0.2.0, Kubernetes Opex Analytics introduces other cost allocation models. Refer to the</em> <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics/blob/master/README.md"><em>README</em></a> <em>to learn how to configure cost model.</em></p>
</blockquote>
<h2 id="heading-getting-started"><strong>Getting Started</strong></h2>
<p>This section will show you how to install and get started with Kubernetes Opex Analytics in less than 5 minutes.</p>
<h3 id="heading-requirements"><strong>Requirements</strong></h3>
<p>We assume that you access the Kubernetes API through a proxy, started as follows:</p>
<pre><code class="lang-plaintext">$ kubectl proxy
</code></pre>
<p>This command will make the proxy access available at <a target="_blank" href="http://127.0.0.1:8001/"><code>http://127.0.0.1:8001</code></a> .</p>
<h3 id="heading-start-kubernetes-opex-analytics"><strong>Start Kubernetes Opex Analytics</strong></h3>
<p>Kubernetes Opex Analytics is released as a Docker image. So you can quickly start an instance as follows:</p>
<pre><code class="lang-plaintext">$ docker run -d \
        --net="host" \
        --name 'kube-opex-analytics' \
        -v /var/lib/kube-opex-analytics:/data \
        -e KOA_DB_LOCATION=/data/db \
        -e KOA_K8S_API_ENDPOINT=http://127.0.0.1:8001 \
        rchakode/kube-opex-analytics
</code></pre>
<p>This will make the web access available at <a target="_blank" href="http://127.0.0.1:5483/"><code>http://127.0.0.1:5483/</code></a> .</p>
<p>This command provides:</p>
<ul>
<li><p>A local path <code>/var/lib/kube-opex-analytics</code> as data volume for the container. That's where Kubernetes Opex Analytics will store its internal analytics data. You can change the local path to another location, but you MUST take care to adapt the <code>KOA_DB_LOCATION</code> environment variable accordingly.</p>
</li>
<li><p>An environment variable <code>KOA_DB_LOCATION</code> pointing to the path used by Kubernetes Opex Analytics to store its internal data. Note that this directory belongs to the data volume attached to the container.</p>
</li>
<li><p>An environment variable <code>KOA_K8S_API_ENDPOINT</code> setting the address of the Kubernetes API endpoint.</p>
</li>
</ul>
<h3 id="heading-watch-analytics-charts"><strong>Watch Analytics Charts</strong></h3>
<p>Due to the time needed to gather sufficient data to consolidate, you may need to wait almost an hour for all charts to be filled. This is normal operation for Kubernetes Opex Analytics.</p>
<p>When everything is ready, you should see charts like the ones below.</p>
<blockquote>
<p><em>Each chart provides tooltips (mouse hover action) giving details on resource usage for each individual namespace. Furthermore, you can also export each chart as PNG image and its associated dataset in JSON and CSV formats.</em></p>
</blockquote>
<p><strong>Last Week Hourly Resource Usage Trends</strong></p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*uoQOdklF8XCgeVyRLGQ7dg.png" alt /></p>
<p><strong>Two-weeks Daily CPU and Memory Usage</strong></p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*X9xiWRX0mUPxl0-WVc0uzw.png" alt /></p>
<p><strong>One-year Monthly CPU and Memory Usage</strong></p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*vIFEMU73dqttBR8BnOqaoA.png" alt /></p>
<p><strong>Last Nodes’ Occupation by Pods</strong></p>
<p><img src="https://miro.medium.com/v2/resize:fit:700/1*Fz7b05UONEagJr17H5FFng.png" alt /></p>
<h2 id="heading-summary-amp-next-steps"><strong>Summary &amp; next steps</strong></h2>
<p>I’m very happy to share this open source project with the cloud-native and Kubernetes community, and I hope you’ll enjoy it. <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics">Kubernetes Opex Analytics</a> is expected to be a fast-moving project, so we welcome contributions in any form (feedback, documentation, code…). Please submit any issues or enhancement requests <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics/issues">here</a>. If you like this work, consider giving it a star on <a target="_blank" href="https://github.com/rchakode/kube-opex-analytics">GitHub</a>. You can also fork it and make a pull request if you want to share back improvements you made on your side!</p>
<blockquote>
<p><em>Edit — June 30, 2019: Kubernetes Opex Analytics now provides a Prometheus exporter.</em> <a target="_blank" href="https://medium.com/swlh/bringing-prometheus-metrics-and-grafana-dashboard-for-cost-allocation-on-kubernetes-clusters-1ee7f68cd677"><em>Read the related story</em></a><em>.</em></p>
</blockquote>
]]></content:encoded></item></channel></rss>