Machine learning in production requires more than just a trained model—you need reliable orchestration, repeatable deployments, seamless scaling, and robust infrastructure. Kubeflow delivers exactly that by sitting on top of Kubernetes to provide portable, consistent MLOps that work everywhere: your laptop, on-premises servers, or the cloud. With Seeweb’s reliable cloud-based GPU service, you can serve GPU-accelerated models without the heavy setup effort and upfront hardware cost.
In this comprehensive guide, you’ll learn how to set up Kubeflow on Ubuntu using MicroK8s, enable NVIDIA GPU support, and serve a GPU-accelerated model with KServe backed by NVIDIA Triton Inference Server, on GPU infrastructure provided by Seeweb.
Why Choose Kubeflow and GPUs?
The Kubeflow Advantage
Kubeflow provides the essential building blocks for production MLOps:
- Pipelines for orchestrating complex, multi-step ML workflows
- Notebooks (Jupyter/VS Code) that run directly on cluster resources
- KServe for production-grade model serving with autoscaling, canary rollouts, and multi-framework support
The GPU Performance Boost
Modern deep learning models thrive on parallel processing, making GPUs essential for inference workloads. Here’s what you gain:
- Lower latency: Milliseconds instead of seconds
- Higher throughput: Process more predictions per second
- Better resource utilization: Maximize compute-intensive model performance
While this tutorial focuses on KServe + Triton for GPU-accelerated serving, you’ll have access to the complete Kubeflow toolbox once your cluster is running.
What You’ll Need
Before we begin, make sure you have:
- Ubuntu 22.04+ (64-bit)
- NVIDIA GPU with recent drivers
- 8+ GB RAM and 4+ CPU cores (more for heavier models)
- ~20 GB free disk space for container images
Pro Tip: If you’re running on bare metal, prepare a small IP range (e.g., 10.64.140.43-10.64.140.49) for MetalLB, the load balancer used by MicroK8s in non-cloud environments.
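When sizing that range, it helps to know how many external IPs you are actually handing to MetalLB (one per LoadBalancer service). A minimal Python sketch using the standard-library ipaddress module, with the example range from above (the function name is illustrative, not part of any MetalLB tooling):

```python
import ipaddress

def metallb_range_size(ip_range: str) -> int:
    """Return the number of addresses in an inclusive MetalLB range
    written as 'start-end', e.g. '10.64.140.43-10.64.140.49'."""
    start, end = ip_range.split("-")
    return int(ipaddress.ip_address(end)) - int(ipaddress.ip_address(start)) + 1

# The example range above yields 7 assignable external IPs:
print(metallb_range_size("10.64.140.43-10.64.140.49"))  # 7
```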
Step 1: Set Up NVIDIA Drivers and Container Support
Before running GPU-accelerated Kubernetes workloads, we need to ensure your system can recognize and use the GPU properly. This involves installing the correct NVIDIA driver and enabling container runtime GPU support.
Install NVIDIA GPU Drivers
First, let’s identify and install the recommended driver:
sudo apt update
ubuntu-drivers devices
# Install the recommended driver (example shows version 550)
sudo apt install -y nvidia-driver-550
sudo reboot
After rebooting, verify the installation:
nvidia-smi
You should see your GPU details, driver version, and CUDA information. If nothing appears, double-check your Secure Boot settings and driver versions.
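If you want to check the driver programmatically rather than eyeballing the table, nvidia-smi also has a machine-readable query mode (`--query-gpu`/`--format=csv` are standard nvidia-smi options). A hedged Python sketch parsing that output; the sample line and values are illustrative, not real output from your machine:

```python
def parse_gpu_csv(line: str) -> dict:
    """Parse one line of `nvidia-smi --query-gpu=name,driver_version,memory.total
    --format=csv,noheader` output into a dict."""
    name, driver, memory = (field.strip() for field in line.split(","))
    return {"name": name, "driver_version": driver, "memory_total": memory}

# Illustrative output line -- actual values depend on your GPU and driver:
sample = "NVIDIA GeForce RTX 4090, 550.90.07, 24564 MiB"
print(parse_gpu_csv(sample))
```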
NVIDIA Container Toolkit (Optional)
Note: MicroK8s uses containerd and its GPU addon automatically configures the NVIDIA runtime. Only install this toolkit manually if you want Docker containers on the host to access the GPU.
# Add NVIDIA's APT repository and install the toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure Docker (if you use it locally)
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Quick Docker test
sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
Step 2: Install Kubernetes with MicroK8s
Now we’ll set up a Kubernetes cluster using MicroK8s—a lightweight, fully-featured Kubernetes environment that’s perfect for both local development and small-scale production deployments.
Install MicroK8s and Configure kubectl
sudo snap install microk8s --classic
sudo usermod -a -G microk8s $USER
mkdir -p ~/.kube && chmod 700 ~/.kube
# Re-enter your shell to apply group membership
newgrp microk8s
# Optional: Use your host kubectl instead of microk8s kubectl
microk8s config > ~/.kube/config
Enable Essential Add-ons
Enable the required add-ons, including GPU support. Remember to replace the MetalLB IP range with one appropriate for your network:
microk8s enable dns hostpath-storage metallb:10.64.140.43-10.64.140.49 rbac gpu
Here’s what each add-on provides:
- dns: Service discovery within the cluster
- hostpath-storage: Simple default StorageClass (ideal for single-node/development)
- metallb: External IP addresses for bare metal deployments
- rbac: Authentication and authorization primitives used by Kubeflow
- gpu: NVIDIA GPU Operator, device plugin, and container runtime configuration
Verify GPU Add-on Installation
Check that the GPU components are working correctly:
# The validator should report success
microk8s kubectl logs -n gpu-operator-resources \
-l app=nvidia-operator-validator -c nvidia-operator-validator
# Optional: Run a quick CUDA test in the cluster
microk8s kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata: { name: cuda-vector-add }
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda
      image: k8s.gcr.io/cuda-vector-add:v0.1
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
# Check the test results
microk8s kubectl logs cuda-vector-add
Wait for all pods to become ready:
microk8s kubectl get pods -A -w
Step 3: Deploy Kubeflow
With Kubernetes running, we’ll deploy Charmed Kubeflow (CKF) using Juju. This installation includes all Kubeflow components: Pipelines, Notebooks, KServe, Istio, and authentication.
Install and Configure Juju
# Install Juju
sudo snap install juju --classic
# Bootstrap a controller on MicroK8s
juju bootstrap microk8s microk8s-localhost
# Create the Kubeflow model (must be named exactly 'kubeflow')
juju add-model kubeflow
# Deploy the latest stable CKF bundle
juju deploy kubeflow --trust
# Monitor the deployment progress
juju status --watch 5s
Configure Authentication and Access the Dashboard
Once all applications show as “active” in the juju status output:
# Set up basic authentication credentials
juju config dex-auth static-username=admin
juju config dex-auth static-password=admin
# Get the dashboard IP address
microk8s kubectl -n kubeflow \
get svc istio-ingressgateway-workload \
-o jsonpath="{.status.loadBalancer.ingress[0].ip}"
Open the displayed IP address in your browser and log in using the credentials you just configured (admin/admin).
Important: Charmed Kubeflow includes its own Istio ingress gateway, so you don’t need the MicroK8s ingress (NGINX) add-on for the standard installation.
Step 4: Deploy Your First GPU-Accelerated Model
Now for the exciting part! We’ll deploy a sample TorchScript model for CIFAR-10 classification using KServe and NVIDIA Triton Inference Server. This example demonstrates how to request GPU resources and test the inference endpoint.
Create a User Workspace
In Kubeflow, each user should have an isolated Profile, which creates a Kubernetes namespace with proper RBAC and Istio settings.
Recommended approach: From the Kubeflow Dashboard, create a new Profile (e.g., “gputest”).
Alternative approach: If you prefer creating a namespace manually, ensure it has the correct label:
kubectl create namespace gputest
kubectl label namespace gputest \
serving.kubeflow.org/inferenceservice=enabled
Define the Inference Service
Create a file called torchscript-cifar.yaml:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torchscript-cifar10
  namespace: gputest
spec:
  predictor:
    triton:
      # Use a Triton image tag supported by your cluster
      runtimeVersion: "20.10-py3"
      storageUri: "gs://kfserving-examples/models/torchscript"
      env:
        - name: OMP_NUM_THREADS
          value: "1"
      resources:
        limits:
          nvidia.com/gpu: 1
Deploy the service:
kubectl apply -f torchscript-cifar.yaml
# Monitor the deployment
kubectl get pods -n gputest -w
kubectl get inferenceservice torchscript-cifar10 -n gputest
Wait until the READY status shows “True”—your model is now live and ready for inference!
Test the Model Endpoint
First, gather the necessary connection details:
export INGRESS_HOST=$(kubectl -n kubeflow \
get svc istio-ingressgateway-workload \
-o jsonpath="{.status.loadBalancer.ingress[0].ip}")
export INGRESS_PORT=80
export SERVICE_HOSTNAME=$(kubectl -n gputest \
get inferenceservice torchscript-cifar10 \
-o jsonpath="{.status.url}" | cut -d "/" -f 3)
Now let’s test the model with sample data:
# Download the sample input data
curl -O https://raw.githubusercontent.com/kserve/kserve/master/docs/samples/v1beta1/triton/torchscript/input.json
# Send a prediction request
MODEL_NAME=cifar10
# Triton serves the KServe v2 inference protocol: POST /v2/models/<name>/infer
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/infer \
  -d @./input.json
You should receive a JSON response containing prediction scores for all 10 CIFAR-10 classes!
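To turn those raw scores into a human-readable label, you can take the argmax over the output tensor. A minimal Python sketch, assuming the response follows the KServe v2 protocol shape; the `OUTPUT__0` tensor name matches the upstream sample model, and the score values below are illustrative:

```python
# CIFAR-10 class labels in their canonical index order
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def top_class(response: dict) -> str:
    """Return the highest-scoring class from a v2-protocol inference response."""
    scores = response["outputs"][0]["data"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return CIFAR10_CLASSES[best]

# Illustrative response; real scores come from the deployed model
sample = {
    "model_name": "cifar10",
    "outputs": [{"name": "OUTPUT__0", "datatype": "FP32", "shape": [1, 10],
                 "data": [-2.0, -1.1, 0.3, 4.2, 0.1, 2.7, 1.0, 0.2, -1.5, -0.9]}],
}
print(top_class(sample))  # cat
```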
Step 5: Monitor GPU Usage and Performance
Real-time GPU Monitoring
Watch GPU utilization in real-time from your host system:
watch -n1 nvidia-smi
During inference, you should see a tritonserver process consuming GPU memory, with utilization spikes when processing requests.
Check Model Server Logs
View detailed logs from your inference service:
kubectl logs -n gputest \
$(kubectl get pods -n gputest \
-l serving.kserve.io/inferenceservice=torchscript-cifar10 \
-o jsonpath="{.items[0].metadata.name}") \
-c kserve-container
Load Testing
Test your model’s performance under load with multiple concurrent requests:
# Send 10 parallel requests
for i in {1..10}; do
  curl -s -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
    http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/cifar10/infer \
    -d @./input.json -o /dev/null &
done; wait
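Once you have collected per-request timings (for example via curl's `-w '%{time_total}'` write-out option), a quick Python sketch for summarizing them; the percentile calculation is a simple nearest-rank approximation and the latency numbers are illustrative:

```python
import statistics

def latency_summary(latencies_ms: list) -> dict:
    """Summarize per-request latencies (milliseconds) from a load test."""
    xs = sorted(latencies_ms)
    p95_index = min(len(xs) - 1, int(0.95 * len(xs)))  # nearest-rank p95
    return {"p50": statistics.median(xs), "p95": xs[p95_index], "max": xs[-1]}

# Illustrative samples: a warm steady state with one cold-start outlier
print(latency_summary([12.1, 11.8, 13.0, 12.4, 55.2, 12.0, 11.9, 12.2, 12.6, 12.3]))
```

A p95 far above the median usually points to cold starts or queueing; that is exactly the signal to watch when tuning Knative autoscaling later.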
Troubleshooting Common Issues
InferenceService Won’t Start or Deploy
Issue: Service appears stuck or won’t pull the model.
Solutions:
- Avoid deploying into control-plane namespaces (like kubeflow)
- Check for errors: kubectl describe inferenceservice <name> -n <namespace>
- For manually created namespaces, verify the label: serving.kubeflow.org/inferenceservice=enabled
GPU Not Available in Pods
Issue: Pods can’t access the GPU.
Solutions:
- Confirm nvidia-smi works on the host
- Check GPU Operator status: kubectl get pods -n gpu-operator-resources
- Verify pod configuration: kubectl describe pod <pod> -n <namespace> and look for nvidia.com/gpu: 1 in limits
Can’t Access the Model Endpoint
Issue: Unable to reach the inference service.
Solutions:
- Verify MetalLB assigned an external IP: check istio-ingressgateway-workload service in the kubeflow namespace
- Test the gateway directly: curl -I http://${INGRESS_HOST}
- As a fallback, use port-forwarding: kubectl -n kubeflow port-forward svc/istio-ingressgateway-workload 8080:80 and access the service at http://localhost:8080
Production Tips and Next Steps
Now that you have a working GPU-accelerated ML serving setup, consider these enhancements for production use:
User Management: Use Profiles for proper user isolation with correct RBAC and Istio configurations.
Auto-scaling: Leverage Knative’s ability to scale model servers to zero during idle periods and back up under load.
Model Versioning: Implement safe model rollouts using traffic splits and canary deployments.
Monitoring: Integrate Prometheus/Grafana for metrics and set up centralized log aggregation.
Security: Enable HTTPS and consider implementing Istio CNI for enhanced security.
Multi-tenancy: For high-end GPUs like A100/H100, explore MIG (Multi-Instance GPU) for partitioning GPUs across multiple tenants.
UI Management: Use the KServe Models Web App to create and manage InferenceService objects through a graphical interface.
Technical Appendix
Choosing the Right Triton Image Tag
The spec.predictor.triton.runtimeVersion field maps to a specific Triton container tag. Choose a tag that’s compatible with your driver and CUDA stack. This example uses 20.10-py3, which matches the upstream samples for broad compatibility.
GPU Resource Configuration
GPUs are extended resources in Kubernetes. It’s sufficient to set only the limits (e.g., nvidia.com/gpu: 1). If you also specify requests, they must exactly match the limit value.
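For instance, a sketch of a resources block that sets both fields explicitly (the values here are illustrative and must match each other exactly):

```yaml
resources:
  requests:
    nvidia.com/gpu: 1   # must equal the limit below
  limits:
    nvidia.com/gpu: 1
```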
Storage Considerations
The hostpath storage used in this guide works great for single-node development environments. For high-availability or multi-node clusters, consider using distributed storage backends.