
KubeCon Europe 2026 — When Kubernetes Becomes the "Operating System" for Global AI Infrastructure

At KubeCon EU 2026 in Amsterdam, 13,500+ attendees confirmed that Kubernetes is now the control plane for AI — with Red Hat, IBM, and Google donating llm-d to CNCF in a move that could reshape inference across the industry.

1 Apr 2026 · 11 min read · CNCF Blog
Kubernetes · KubeCon · AI Infrastructure · CNCF · DevOps · Cloud Native

Introduction — The Event That Proved "Kubernetes Has Won"

Last week, Amsterdam became the center of the cloud native world as KubeCon + CloudNativeCon Europe 2026 drew the largest attendance in its history — more than 13,500 people including engineers, architects, CTOs, and DevOps leaders from around the globe.

But more interesting than the headcount was the event’s clear shift in focus.

Last year, everyone was talking about "AI hype." This year, everyone was talking about AI infrastructure.

Jonathan Bryce, Executive Director of the CNCF, opened the event by declaring that cloud native is entering its "second founding era" — a phase where Kubernetes is no longer just a container management tool, but is becoming the operating system for AI workloads.

In this article, we’ll break down what happened in Amsterdam and look at what it means for Thai organizations now planning their AI and cloud native strategies.


The Numbers Say It All — Cloud Native in 2026

Before diving deeper, here are the figures CNCF shared at the event:

  • 19.9 million — cloud native developers worldwide (up 28% in six months)
  • 7.3 million — developers working specifically on AI in cloud native environments
  • 82% — Kubernetes adoption among enterprise organizations
  • 2 out of 3 — the share of generative AI workloads already running on Kubernetes
  • $255 billion — projected inference market value by 2030
  • 67% — the share of AI compute expected to be inference, not training, by the end of 2026

All of these numbers point to the same conclusion: AI is moving from the lab into production, and Kubernetes is the foundation most organizations are choosing.

But there was one especially striking figure: although 82% are already using Kubernetes, only 7% deploy AI to production every day. That is the "execution gap" this year’s event was trying to address.


llm-d — When Red Hat, IBM, and Google Join Forces to Build a Standard for AI Inference

The biggest announcement of the event was the donation of llm-d to CNCF as a Sandbox project.

What is llm-d? In simple terms, it’s a "blueprint" for running large AI models efficiently on Kubernetes. It was developed through a collaboration between IBM Research, Red Hat, and Google Cloud, with support from NVIDIA, CoreWeave, AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI.

Why does it matter?

The hardest part of running LLMs in production is not just choosing the right model. It’s how to manage inference so it stays cost-efficient, fast, and scalable.

In reality, serving AI models is highly complex. You need to manage GPU memory, route requests to the right replicas, scale up and down under unpredictable traffic, and keep latency low at the same time. Today, most organizations have to stitch together multiple tools on their own — which is difficult and expensive.

llm-d addresses this with several core ideas:

  • Intelligent routing — sends requests to the replica with the most suitable cache state instead of distributing traffic randomly
  • Disaggregated inference — separates prompt processing from token generation so each can scale independently
  • Hierarchical caching — manages cache across multiple layers, from GPU to CPU to storage
  • Hardware-aware autoscaling — scales based on the actual state of the hardware, not just generic metrics
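The "intelligent routing" idea above can be sketched in a few lines: pick the replica whose cached prefix overlaps the incoming prompt the most, and break ties by load. This is a minimal illustration of the concept only; the names and scoring here are hypothetical and much simpler than llm-d's actual implementation.

```python
# Illustrative sketch of cache-aware routing (NOT the real llm-d API):
# prefer the replica with the longest matching cached prefix, then the
# one with the fewest in-flight requests.
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    cached_prefixes: set = field(default_factory=set)
    in_flight: int = 0  # requests currently being served

def prefix_overlap(prompt: str, prefix: str) -> int:
    """Length of a cached prefix if it matches the start of the prompt."""
    return len(prefix) if prompt.startswith(prefix) else 0

def route(prompt: str, replicas: list) -> Replica:
    def score(r: Replica):
        best = max((prefix_overlap(prompt, p) for p in r.cached_prefixes), default=0)
        # Sort by (highest cache overlap, lowest load).
        return (-best, r.in_flight)
    return min(replicas, key=score)

replicas = [
    Replica("pod-a", {"You are a helpful assistant."}, in_flight=3),
    Replica("pod-b", set(), in_flight=1),
]
# pod-a wins despite higher load: its KV cache already holds the prompt prefix.
print(route("You are a helpful assistant. Summarize this text.", replicas).name)
```

The point of the sketch is the trade-off itself: a random or round-robin balancer would often send the request to a replica that has to recompute the entire prompt from scratch.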

Real-world results

In tests using the Qwen3-32B model, llm-d was able to maintain near-zero latency while scaling up to ~120,000 tokens per second across eight pods — significantly outperforming standard Kubernetes services under real load.

Why donate it to CNCF?

Red Hat explained that llm-d was created to "close the gap between AI experimentation and production" — and that the best way to do that is through neutral governance that no single vendor controls.

IBM Research reinforced that the goal is a "vendor-agnostic, Kubernetes-native blueprint for high-performance inference that any enterprise can adopt."

This is not just a technical move — it’s also a business signal that the industry is moving toward open standards for AI inference.


NVIDIA Donates a GPU Driver to Kubernetes — A Turning Point for GPU Orchestration

Another major announcement was that NVIDIA joined CNCF as a Platinum Member and donated its GPU Dynamic Resource Allocation (DRA) Driver to the Kubernetes community.

Why is this such a big deal?

Until now, GPU allocation in Kubernetes has relied heavily on vendor-specific plugins, creating lock-in and incompatibility across hardware platforms. Donating the DRA driver means GPU scheduling is becoming part of standard Kubernetes — no longer tied to a single vendor.

AWS, Broadcom, Canonical, Google Cloud, Microsoft, Nutanix, Red Hat, and SUSE all backed the effort.

NVIDIA also announced that KAI Scheduler, a scheduler purpose-built for AI workloads, has been accepted as a CNCF Sandbox project as well.

For enterprises, the takeaway is simple: GPUs are becoming first-class citizens in Kubernetes, just as CPU and memory already are.
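"First-class citizen" has a concrete meaning here: the scheduler can fit a pod onto a node by checking GPUs with exactly the same logic it uses for CPU and memory, instead of going through a vendor-specific side channel. The toy scheduler below illustrates that idea under stated assumptions; it mirrors the intent of Dynamic Resource Allocation, not its actual API.

```python
# Sketch: GPUs treated as just another countable node resource.
# A pod fits only if every requested resource (cpu, memory, gpu alike)
# is available; reservation is uniform across resource types.

def fits(node_free: dict, pod_request: dict) -> bool:
    return all(node_free.get(res, 0) >= qty for res, qty in pod_request.items())

def schedule(nodes: dict, pod_request: dict):
    """Return the first node that can host the pod, or None."""
    for name, free in nodes.items():
        if fits(free, pod_request):
            for res, qty in pod_request.items():
                free[res] -= qty  # reserve cpu, memory, and gpu the same way
            return name
    return None

nodes = {
    "node-1": {"cpu": 8, "memory_gib": 32, "gpu": 0},
    "node-2": {"cpu": 16, "memory_gib": 64, "gpu": 4},
}
# node-1 lacks GPUs, so the pod lands on node-2.
print(schedule(nodes, {"cpu": 4, "memory_gib": 16, "gpu": 2}))
```

Before standardized GPU scheduling, the `gpu` line in this sketch would have required a vendor device plugin with its own allocation model; that is the lock-in the DRA donation is meant to remove.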


SNCF Wins Top End User Award — How France’s Railway Proved Kubernetes Works at Any Scale

This year’s CNCF Top End User Award went to SNCF, the French national railway operator — and their story is remarkable.

The numbers are impressive:

  • Migrated more than 2,000 applications to the cloud, with 70% using Kubernetes as a unified control plane
  • Managing more than 200 clusters across AWS and Azure
  • Built a private cloud with OpenStack for workloads requiring data sovereignty
  • Achieved public cloud parity with full automation on an open source platform

What makes this especially compelling is that SNCF is not a tech company — it is a railway operator with more than 200,000 employees and systems that support 5 million passengers per day.

If an organization at this scale can transform successfully, the question for Thai enterprises becomes: why haven’t we done the same yet?

Saxo Bank — Another award-winning case study

Saxo Bank of Denmark won the End User Case Study Contest for extending GitOps automation far beyond containerized workloads to include databases, identity providers, and other enterprise systems.

The result: instead of manual provisioning that once took days, they now run more than 1,800 automated operations completed in just minutes.
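The core of GitOps automation like this is a reconcile loop: desired state lives in Git, actual state lives in the cluster (or the database, or the identity provider), and a controller computes the operations needed to converge them. A minimal sketch of that loop, with made-up resource names purely for illustration:

```python
# Minimal GitOps-style reconcile: diff desired state (from Git) against
# actual state and emit create/update/delete operations. Illustrative only.

def reconcile(desired: dict, actual: dict) -> list:
    """Operations that converge `actual` to `desired`."""
    ops = []
    for name, spec in desired.items():
        if name not in actual:
            ops.append(("create", name))
        elif actual[name] != spec:
            ops.append(("update", name))
    for name in actual:
        if name not in desired:
            ops.append(("delete", name))
    return ops

desired = {"orders-db": {"size": "large"}, "auth-idp": {"realm": "prod"}}
actual = {"orders-db": {"size": "small"}, "legacy-queue": {}}
print(reconcile(desired, actual))
# One loop handles databases and identity providers alike -- the extension
# beyond containers is in what the operations target, not in the loop.
```

Running this loop continuously is what replaces days of manual provisioning with minutes of automated convergence.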


How Did Kubernetes Become the "Operating System for AI"?

A question many people are asking is: why did Kubernetes win? Why not some other platform?

The answer lies in three key pillars that no other platform currently matches as well in combination:

1. Dynamic Resource Allocation (DRA) — GPUs Become a Standard Resource

With NVIDIA donating the GPU DRA driver, GPUs, TPUs, and other accelerators can now be allocated through Kubernetes more like CPUs — without custom workarounds.

2. Gateway API Inference Extension — Intelligent Routing for AI

Kubernetes is developing an extension specifically for routing AI traffic — sending requests to model replicas with the right cache state, rather than just using round-robin load balancing.

3. LeaderWorkerSet — Orchestration for Distributed Inference

This new Kubernetes primitive is designed to manage multi-node AI workloads, allowing large models that require multiple GPUs across several nodes to operate as a single coordinated unit.

Together, these three components allow Kubernetes to manage the full lifecycle of AI models — from deployment, serving, and scaling to monitoring and cost governance.
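The LeaderWorkerSet idea in pillar 3 is easiest to see as a scaling rule: one logical "replica" of a multi-node model is an entire group (one leader plus N workers) that is created, scaled, and deleted atomically. The sketch below shows only that grouping rule; the pod-naming scheme and function are hypothetical, not the real API.

```python
# Sketch of group-atomic scaling: each logical replica expands into a
# leader pod plus its worker pods, so scaling never produces a lone,
# uncoordinated shard of a multi-GPU model.

def expand_replicas(replicas: int, workers_per_group: int) -> list:
    pods = []
    for g in range(replicas):
        pods.append(f"group-{g}-leader")
        pods.extend(f"group-{g}-worker-{w}" for w in range(workers_per_group))
    return pods

# Scaling from 1 to 2 replicas adds a whole 4-pod group at once.
print(expand_replicas(2, 3))
```

Contrast this with a plain Deployment, where scaling up by one adds a single interchangeable pod; for a model sharded across several nodes, a single extra pod is useless on its own.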

Microsoft went so far as to state publicly that "Kubernetes is the operating system for AI Infrastructure" — not just as an analogy, but as an official positioning.


Why Does All of This Matter for Thai Enterprises?

This is where we need to bring the conversation closer to home.

The challenge Thai organizations are facing

Many organizations in Thailand are currently stuck in the same situation: they’ve invested in AI, but the results still haven’t materialized. They have trained models that can’t be deployed, successful proofs of concept that can’t scale, and teams that are still struggling to bridge the gap.

This is exactly the same "execution gap" highlighted at KubeCon — the gap between AI that works in a notebook and AI that works in production.

5 lessons from KubeCon for Thai CTOs

1. Kubernetes is no longer nice-to-have — it is the foundation

When 82% of enterprises globally use Kubernetes and two-thirds of GenAI workloads run on it, not having a Kubernetes strategy effectively means being left out of the AI ecosystem as it grows.

2. Inference, not training, is the next battleground

Training a model is expensive, but it usually happens once — or a few times. Inference happens every single time a user interacts with the system. If it’s not managed well, inference cost will become the biggest recurring expense and a financial black hole.
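A back-of-the-envelope calculation makes the recurring-cost point concrete. Every number below is a made-up placeholder, not a benchmark; the structure of the arithmetic is what matters.

```python
# Why inference dominates: training is (roughly) a one-time spend,
# inference recurs with every request. All figures are assumptions.

training_cost = 200_000.0      # one-time fine-tuning spend, USD (assumed)
cost_per_1k_tokens = 0.002     # serving cost, USD (assumed)
tokens_per_request = 1_500
requests_per_day = 50_000

daily_inference = requests_per_day * tokens_per_request / 1000 * cost_per_1k_tokens
yearly_inference = daily_inference * 365
print(f"inference per year: ${yearly_inference:,.0f}")  # $54,750 at these assumptions
```

Under these placeholder numbers, inference overtakes the one-time training spend within four years, and it scales linearly with traffic; techniques like cache-aware routing attack the `cost_per_1k_tokens` term directly.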

3. Vendor lock-in is a risk you need to avoid

The fact that llm-d, the GPU DRA driver, and KAI Scheduler are all being donated to CNCF sends a strong message: open standards will win. Organizations tied too closely to a single vendor are building unnecessary risk into their future.

4. "Many small models" is better than "one giant model"

One of the clearest themes from KubeCon was that the future of AI lies in smaller models fine-tuned for specific tasks, not massive frontier models. That is good news for Thai enterprises — because it means you don’t need a huge GPU cluster, but you do need infrastructure that can manage multiple models efficiently.

5. Platform Engineering + AI = teams that move 10x faster

Combining platform engineering with AI infrastructure on Kubernetes gives development teams the ability to deploy AI features on their own — without waiting on infrastructure teams every time.


Looking Ahead — What Will AI Infrastructure Look Like in 2027?

Based on what we saw at KubeCon Europe 2026, several trends are becoming clear:

Inference will outgrow training

By the end of 2026, inference is expected to account for 67% of all AI compute, and that share will continue growing toward 93.3 GW of compute capacity by 2030. Inference startups with combined valuations above $12 billion — including Baseten, Fireworks, and Modal — are strong evidence that capital is already flowing in this direction.

GPU orchestration will become standard

NVIDIA’s donation of the DRA driver means that within the next 1–2 years, GPU scheduling may become as simple as CPU scheduling. That would dramatically lower the barrier for mid-sized enterprises.

A real inference standard will emerge

llm-d + Gateway API Inference Extension + LeaderWorkerSet are likely to become the standard "inference stack" on Kubernetes — much like Ingress, Service Mesh, and Prometheus became foundational standards in the earlier cloud native era.

Organizations that are ready will gain the advantage

The gap between the 7% deploying AI every day and the 82% already running Kubernetes represents a massive opportunity for organizations that can close that gap early.


Conclusion — Kubernetes Is No Longer Just a Container Orchestrator

KubeCon Europe 2026 in Amsterdam delivered a clear message:

Kubernetes is evolving from a container orchestrator into an operating system for AI.

When major players such as IBM, Red Hat, Google, and NVIDIA all choose to donate core technology to open source, it signals that this is not just another trend — it is a paradigm shift that will permanently change how organizations build and run AI.

For Thai enterprises, the question is no longer "Should we use Kubernetes?" It is now "When will we start building an AI-ready Kubernetes platform?"

Because the longer you wait, the wider the gap between you and your competitors will become.


Ready to Upgrade Your Organization’s AI Infrastructure?

Enersys has a team specializing in Kubernetes, cloud native architecture, and AI infrastructure, ready to help Thai organizations close the "execution gap" — from platform strategy to deploying AI workloads in production.

Talk to the Enersys team today >>>

