Deploying LLMs with KubeAI on k3s (Ubuntu) ‣ Seeweb


Section 1: Prepare the Ubuntu Machine

These steps prepare your Ubuntu system with the necessary dependencies.

Update System Packages

sudo apt-get update
sudo apt-get upgrade -y

Verify NVIDIA Drivers

Ensure your NVIDIA drivers are installed and the GPU is recognized:

nvidia-smi

If this command fails or doesn’t show your GPU, you must install or troubleshoot your NVIDIA drivers before proceeding.
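Beyond confirming that nvidia-smi runs, it can help to check that the GPU has enough memory for the model. The sketch below is a hypothetical helper (check_vram is not part of any NVIDIA tool). It reads "name, memory" pairs in the CSV form nvidia-smi can print and compares each against an assumed 16384 MiB threshold, taken from the rough 16 GB FP16 estimate for Qwen2.5-7B-Instruct.

```shell
# Hypothetical helper: reads "name, memory_in_MiB" lines on stdin and
# flags GPUs below a minimum. The 16384 MiB default is an assumption
# based on the ~16 GB FP16 estimate for Qwen2.5-7B-Instruct.
check_vram() {
  min="${1:-16384}"
  status=0
  while IFS=, read -r name mem; do
    mem=$(printf '%s' "$mem" | tr -d ' ')   # strip the space after the comma
    if [ "$mem" -ge "$min" ]; then
      echo "OK: $name (${mem} MiB)"
    else
      echo "LOW: $name (${mem} MiB, need ${min} MiB)"
      status=1
    fi
  done
  return "$status"
}

# On the GPU host (requires a working driver):
# nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | check_vram
```

Pass a different threshold as the first argument (for example check_vram 8192) if you plan to run a quantized model.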

Install Docker

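Kubernetes needs a container runtime. k3s bundles its own containerd, and Docker is installed here so you can verify GPU access from a container before setting up the cluster.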

# Install prerequisites
sudo apt-get install -y apt-transport-https ca-certificates \
  curl software-properties-common

# Add Docker's official GPG key
curl -fsSL <DOCKER_GPG_KEY_URL> | sudo apt-key add -

# Install Docker CE
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

Troubleshooting Docker: If sudo systemctl status docker shows issues, check logs with sudo journalctl -xeu docker.service. Common issues include port conflicts or resource limitations.

Install NVIDIA Container Toolkit

This allows Docker (and subsequently Kubernetes) to use your NVIDIA GPUs.

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L <NVIDIA_GPG_KEY_URL> | sudo apt-key add -
curl -s -L <NVIDIA_REPO_LIST_URL> | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Test if Docker can see the GPU:

sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

This command should output the same nvidia-smi details as before, but from within a Docker container.

Install Helm

Helm is a package manager for Kubernetes, used to install KubeAI.

curl <HELM_SIGNING_KEY_URL> | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null

sudo apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] <HELM_APT_REPO_URL> all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list

sudo apt-get update
sudo apt-get install -y helm
helm version

Section 2: Install and Configure k3s Kubernetes Cluster

k3s is a lightweight, certified Kubernetes distribution.

Install k3s

# Install K3s without Traefik (we don't need it)
curl -sfL <K3S_INSTALL_SCRIPT_URL> | INSTALL_K3S_EXEC="--disable=traefik" sh -

# Configure kubectl
mkdir -p $HOME/.kube
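# k3s.yaml is readable only by root, so copy it with sudo and take ownership
sudo cp /etc/rancher/k3s/k3s.yaml $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=$HOME/.kube/config
echo "export KUBECONFIG=$HOME/.kube/config" >> $HOME/.bashrc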

# Verify K3s is running
kubectl get nodes

You should see your node in a Ready state.
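If you want to script this check rather than read the table by eye, one option is to parse the STATUS column. The sketch below assumes the default column layout of kubectl get nodes --no-headers (name first, status second). The function name all_nodes_ready is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on
# stdin and fails if any node's STATUS column is not exactly "Ready",
# or if no nodes are listed at all.
all_nodes_ready() {
  awk '
    $2 != "Ready" { bad = 1; print "not ready: " $1 " (" $2 ")" }
    END { if (NR == 0 || bad) exit 1 }
  '
}

# On the server:
# kubectl get nodes --no-headers | all_nodes_ready && echo "cluster is ready"
```

A cordoned node reports a status such as Ready,SchedulingDisabled and is flagged by this strict comparison, which is usually what you want before scheduling a model.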

Troubleshooting k3s:

If kubectl get nodes fails (e.g., connection refused):

    1. Check k3s service status: sudo systemctl status k3s
    2. View logs: sudo journalctl -u k3s -f
    3. Restart if necessary: sudo systemctl restart k3s

Install NVIDIA GPU Operator

For Kubernetes to effectively manage and schedule workloads on NVIDIA GPUs, the NVIDIA GPU Operator is highly recommended.

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set driver.enabled=false \
  --wait

Note: We set driver.enabled=false because the driver is already installed on the host. The operator will handle the rest (container runtime, device plugin, etc.).

Verify the operator pods are running:

kubectl get pods -n gpu-operator

Wait until all pods are in a Running or Completed state. This can take several minutes. Once the operator is running, your nodes with GPUs should be labeled and have allocatable GPU resources.
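To see at a glance which operator pods are still starting, you can filter the pod list down to the ones that are not yet Running or Completed. This is a minimal sketch that assumes the default kubectl get pods --no-headers columns (name, ready, status). The name pending_pods is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on
# stdin and prints the name and status of every pod that is neither
# Running nor Completed. Empty output means nothing is left to wait for.
pending_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# On the server:
# kubectl get pods -n gpu-operator --no-headers | pending_pods
```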

Check the node's GPU resources with:

kubectl describe node $(kubectl get nodes -o jsonpath="{.items[0].metadata.name}") | grep nvidia.com/gpu

You should see nvidia.com/gpu listed under Allocatable and Capacity.

Section 3: Install KubeAI with NVIDIA GPU Support

Now we’ll install KubeAI with specific NVIDIA GPU configurations.

# Add the KubeAI helm repository
helm repo add kubeai https://www.kubeai.org
helm repo update

# Create namespace for KubeAI
kubectl create namespace kubeai

# Download the values file for NVIDIA GPU
curl -L -O https://raw.githubusercontent.com/substratusai/kubeai/refs/heads/main/charts/kubeai/values-nvidia-k8s-device-plugin.yaml

# Install KubeAI
helm upgrade --install kubeai kubeai/kubeai \
  -f values-nvidia-k8s-device-plugin.yaml \
  --namespace kubeai \
  --wait

If you need to use Hugging Face models that require authentication, add

--set secrets.huggingface.token=$HF_TOKEN

to the helm command above (after exporting your token with export HF_TOKEN=your-hugging-face-token).
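One way to avoid maintaining two versions of the install command is to build the argument list in a small function and append the token flag only when HF_TOKEN is set. The sketch below reuses the chart, values file, and flag named above. The function name kubeai_helm_args is hypothetical.

```shell
# Hypothetical helper: prints the helm arguments one per line, adding the
# Hugging Face token flag only when HF_TOKEN is set and non-empty.
kubeai_helm_args() {
  set -- upgrade --install kubeai kubeai/kubeai \
    -f values-nvidia-k8s-device-plugin.yaml \
    --namespace kubeai --wait
  if [ -n "${HF_TOKEN:-}" ]; then
    set -- "$@" --set "secrets.huggingface.token=${HF_TOKEN}"
  fi
  printf '%s\n' "$@"
}

# Usage on the server (GNU xargs passes each line as one argument):
# kubeai_helm_args | xargs -d '\n' helm
```

Because the function prints the token in plain text, avoid running it on its own in a shared terminal or a logged session.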

Section 4: Deploy Qwen2.5-7B-Instruct Model

Let’s deploy the Qwen2.5-7B-Instruct model using KubeAI’s Model CRD (Custom Resource Definition).

cat <<EOF | kubectl apply -f -
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: qwen2.5-7b
  namespace: kubeai
spec:
  features: [TextGeneration]
  owner: Qwen
  url: hf://Qwen/Qwen2.5-7B-Instruct
  engine: VLLM
  resourceProfile: nvidia-gpu-l40:1
  minReplicas: 1
EOF
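If you expect to deploy more than one model, the same manifest can be generated from parameters. The following sketch is a hypothetical wrapper (render_model is not a KubeAI command). It only substitutes the name, owner, URL, and resource profile into the fields used in the manifest above and assumes nothing else about the Model CRD.

```shell
# Hypothetical helper: prints a KubeAI Model manifest.
# Arguments: $1 = name, $2 = owner, $3 = model URL, $4 = resource profile.
render_model() {
  cat <<EOF
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: $1
  namespace: kubeai
spec:
  features: [TextGeneration]
  owner: $2
  url: $3
  engine: VLLM
  resourceProfile: $4
  minReplicas: 1
EOF
}

# Usage on the server:
# render_model qwen2.5-7b Qwen hf://Qwen/Qwen2.5-7B-Instruct nvidia-gpu-l40:1 | kubectl apply -f -
```

The resource profile must match one defined by your KubeAI installation, so check the values file you installed with before changing it.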

Check if the model deployment is in progress:

kubectl get model -n kubeai
kubectl get pods -n kubeai

Wait for the model pod to reach Running state. This may take several minutes as the container downloads the model from Hugging Face.
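Rather than re-running the commands by hand, you could poll until the pod is up. The sketch below is a generic retry loop. The function names retry and model_ready, the 60 attempts, and the 10 second delay are illustrative assumptions, not values from KubeAI.

```shell
# Hypothetical helper: runs a command up to N times, sleeping between
# attempts, and succeeds as soon as the command does.
# Arguments: $1 = attempts, $2 = delay in seconds, rest = command to run.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical check for use on the server: succeeds once no pod in the
# kubeai namespace reports a status other than Running or Completed.
# model_ready() {
#   ! kubectl get pods -n kubeai --no-headers \
#     | awk '$3 != "Running" && $3 != "Completed"' | grep -q .
# }
# retry 60 10 model_ready   # polls for up to about 10 minutes
```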

Section 5: Access the Model Service

To reach the model service from your local machine, combine an SSH tunnel from your local machine with a kubectl port-forward on the server:

Step 1: Create SSH Tunnel

On your local machine, create an SSH tunnel to the server:

ssh -i ~/.ssh/your_key -L 8080:localhost:8080 root@your_server_ip

Replace your_key with your SSH key file name and your_server_ip with your server’s IP address.

Step 2: Set Up Port Forwarding on the Server

Once the SSH tunnel is active, on the remote server run:

kubectl -n kubeai port-forward svc/open-webui 8080:80

This forwards the open-webui service in the kubeai namespace to port 8080 on the server, which the SSH tunnel then exposes on port 8080 of your local machine.

Section 6: Testing the Model

With the port forwarding active, you can now:

    1. Open your web browser and navigate to http://localhost:8080
    2. Use the KubeAI web interface to interact with your Qwen2.5-7B-Instruct model

Alternatively, you can make API calls directly:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-7b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 200
  }'
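For repeated tests it can be convenient to generate the request body from a model name and a prompt. The sketch below is a hypothetical helper (chat_payload) that builds a reduced version of the body with only a user message and max_tokens. BASE_URL in the usage comment is an assumed variable that should point at your forwarded endpoint.

```shell
# Hypothetical helper: builds a minimal chat request body.
# Arguments: $1 = model name, $2 = user prompt, $3 = max tokens (default 200).
# The prompt is inserted verbatim, so it must not contain double quotes
# or backslashes.
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":%s}\n' \
    "$1" "$2" "${3:-200}"
}

# Usage from the local machine (BASE_URL is an assumption):
# curl -s "$BASE_URL/v1/chat/completions" \
#   -H "Content-Type: application/json" \
#   -d "$(chat_payload qwen2.5-7b 'What is the capital of France?')"
```

For prompts with arbitrary characters, a JSON-aware tool is a safer way to build the body than printf.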

Conclusion

You have now successfully deployed a Qwen2.5-7B-Instruct model on your Ubuntu machine using k3s and KubeAI. This setup provides a lightweight yet powerful infrastructure for running AI models locally with GPU acceleration.

For more advanced configurations, including multiple models, custom resource profiles, or integration with other services, refer to the official KubeAI documentation.


Prerequisites

Before you begin, ensure your Ubuntu machine meets the following requirements:

    1. Ubuntu Linux Machine: A server or desktop running a recent version of Ubuntu (e.g., 20.04 LTS or 22.04 LTS).
    2. SSH Access: You can SSH into the machine.
    3. Sudo Privileges: You have sudo access.
    4. NVIDIA GPU: An NVIDIA GPU suitable for running the chosen Qwen model (Qwen2.5-7B-Instruct may need roughly 16 GB or more of VRAM at FP16; quantized models may require less).
    5. NVIDIA Drivers: NVIDIA drivers correctly installed, with the nvidia-smi command working and showing your GPU.
    6. Internet Access: For downloading packages, Docker images, and models.
    7. Sufficient Resources: Adequate CPU cores, RAM (32 GB to 48 GB or more, depending on model precision and system overhead), and disk space.