Deploy GPU Workload Using vK8s

Objective

This document provides instructions on how to deploy GPU application workloads using Volterra vK8s. To learn more about how Volterra distributes application deployment, see Distributed Application Management.

Volterra supports enabling GPU capability in either of the following ways:

  • Through VoltStack site configuration - GPU can be enabled on a VoltStack site through its configuration, provided that the site hardware includes a GPU.
  • Through fleet configuration - GPU capability is applied to all Volterra sites that are part of the fleet, provided that the site hardware includes a GPU.

The GPU applications are then deployed using a Volterra vK8s object that is associated with a virtual site selecting those sites (the GPU-enabled VoltStack sites or the sites of the fleet).

Using the instructions in this guide, you can enable GPU capability on Volterra sites through VoltStack site or fleet configuration, and deploy GPU application workloads using the kubeconfig of the vK8s object.


Prerequisites


Configuration

Deploying GPU applications using vK8s consists of the following sequence of actions:

  • Enabling GPU capability through fleet or VoltStack site configuration.
  • Creating a vK8s object and associating it with a virtual site that selects the GPU-enabled VoltStack sites or the sites of the created fleet.
  • Deploying the application using the kubeconfig of the vK8s object.

Perform the following steps in VoltConsole.

Step 1: Enable GPU for sites.

You can enable GPU for sites in either of the following ways:

VoltStack Site: Create a VoltStack site with GPU enabled.
  • In the System namespace, navigate to Manage -> Site Management and select VoltStack Site from the options.
  • Click Add VoltStack Site and enter a name and other required fields for your site.
  • Go to the Advanced Configuration section and enable the Show Advanced Fields option.
  • Select the GPU Enabled option for the Enable/Disable GPU field.

Figure: VoltStack Site GPU Setting

  • Click Save and Exit.

Note: This step shows only how to enable GPU in the VoltStack site configuration. For detailed information on VoltStack site creation, see Site Management.

Fleet: Create a fleet with GPU enabled.
  • In the System namespace, navigate to Manage -> Site Management and select Fleets from the options.
  • Click Add fleet and enter a name for your fleet in the metadata section.
  • Enter a label value in the Fleet Label Value field in the Fleet Configuration section. You assign this value to sites in Step 2.
  • Go to the Advanced Configuration section and enable the Show Advanced Fields option.
  • Select GPU Enabled for the Enable/Disable GPU field.
  • Go to the Enable Default Fleet Config Download section and enable the Show Advanced Fields option.
  • Select the Enable Default Fleet Config Download checkbox.

Figure: Fleet Configuration Enabled with GPU

  • Click Save and Exit to create the fleet.

Note: For detailed information on fleet configuration and operation, see Fleets and Vsites.

Step 2: Apply the fleet label to the sites (only if GPU is enabled using a fleet).
  • Navigate to Sites -> Site List in the System namespace.
  • Select a site where you want to deploy the GPU application and click ... -> Edit.
  • Click on the Labels field, select ves.io/fleet from the list of keys, and assign the fleet label value you entered in the previous step.
  • Click Save changes.
  • Repeat the above steps for all the sites of your choice.

Note: If you apply the fleet label to a site whose hardware does not include a GPU, GPU applications are not deployed on that site.

For detailed instructions on fleets, see the Create Fleet document.

Step 3: Navigate to your vK8s object and download its kubeconfig.

Note: Ensure that the vK8s object is associated with a virtual site that groups all the GPU-enabled VoltStack sites or the sites that are part of the fleet created in the previous steps.

  • Click on the application namespace option on the namespace selector. Select your application namespace from the namespace dropdown list to change to that namespace.
  • Select Applications in the configuration menu and Virtual K8s in the options pane.
  • Click ... -> Download for the created vK8s object to download its kubeconfig file.
  • Set the KUBECONFIG environment variable to the downloaded kubeconfig file on your local machine.
export KUBECONFIG=<vK8s-kubeconfig>

Note: When you deploy using kubectl, setting the KUBECONFIG variable directs kubectl to deploy the resources to the vK8s.

Step 4: Deploy the workload to the vK8s.

You can deploy the application workload from VoltConsole or using kubectl. This example shows deployment using kubectl.

  • Create a file on your local machine containing the GPU application manifest in JSON or YAML format. Ensure that the container resources in the manifest specify that a GPU is required. For NVIDIA GPUs, set the GPU count under resources.limits: Kubernetes uses the limit as the request, and rejects a GPU request that has no matching limit. See the following resources sample for an NVIDIA GPU:
spec:
  containers:
    - name: nvidia-pytorch
      image: "nvcr.io/nvidia/pytorch:18.05-py3"
      resources:
        limits:
          nvidia.com/gpu: 1
  • Deploy the workload using the application manifest and the kubeconfig downloaded in the previous step, as in the following sample command. The --kubeconfig flag can be omitted if KUBECONFIG is already set to the vK8s kubeconfig.
kubectl apply -f k8s-app-manifest.yaml --kubeconfig vk8s-kubecfg.yaml
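For reference, a complete k8s-app-manifest.yaml that wraps the container sample in a Kubernetes Deployment (the recommended kind for continuous GPU use) might look like the following sketch. The Deployment name, the app label, and the sleep command are hypothetical placeholders; replace the command with your application's own entrypoint. The GPU count is set under resources.limits, which is where Kubernetes expects the nvidia.com/gpu resource.

```yaml
# Hypothetical k8s-app-manifest.yaml: a long-running GPU Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pytorch-gpu                # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pytorch-gpu             # must match the pod template labels below
  template:
    metadata:
      labels:
        app: pytorch-gpu
    spec:
      containers:
        - name: nvidia-pytorch
          image: "nvcr.io/nvidia/pytorch:18.05-py3"
          # Placeholder that keeps the container running; replace with your app.
          command: ["sleep", "infinity"]
          resources:
            limits:
              nvidia.com/gpu: 1    # one GPU; the limit is also used as the request
```

Because the pod keeps running, the GPU stays allocated to it until the Deployment is scaled down or deleted.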

Note: For workloads that use the GPU continuously, such as video monitoring applications, use a Kubernetes Deployment. For other workloads, use a Kubernetes Job so that the GPU is released after the task completes.
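For a one-off task, a Kubernetes Job might look like the following sketch, assuming the same NVIDIA PyTorch image as the sample above. The Job name, the backoffLimit value, and the Python one-liner (which only prints whether PyTorch can see a CUDA device) are hypothetical placeholders for your own task.

```yaml
# Hypothetical one-off GPU task: the pod completes and releases its GPU.
apiVersion: batch/v1
kind: Job
metadata:
  name: pytorch-gpu-task           # hypothetical name
spec:
  backoffLimit: 2                  # retry a failed pod at most twice
  template:
    spec:
      restartPolicy: Never         # a Job requires Never or OnFailure
      containers:
        - name: nvidia-pytorch
          image: "nvcr.io/nvidia/pytorch:18.05-py3"
          # Placeholder task: report whether a CUDA device is visible.
          command: ["python", "-c", "import torch; print(torch.cuda.is_available())"]
          resources:
            limits:
              nvidia.com/gpu: 1    # GPUs are requested through limits
```

The Job can be applied with the same kubectl apply command. When the container exits, the pod moves to a completed state and no longer holds the GPU.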

Step 5: Verify that the deployment is utilising the GPU.

You can verify that the sites have GPU enabled and that the application processes are consuming the GPU resources.

  • Log in to your node and check the GPU processes. The following example shows the command to monitor the GPU devices of NVIDIA EGX:
nvidia-smi

Note: The nvidia-smi command displays the information on the GPU devices and the running processes for that GPU.

  • Log in to VoltConsole and navigate to the Sites -> Site List page. Click on a GPU-enabled site (a VoltStack site with GPU enabled or a site that is part of the fleet you created) to open the site dashboard. Click the Nodes tab and click on a node to open its dashboard. Click Metrics to monitor the GPU usage, GPU temperature, and GPU throughput metrics.

Concepts


API References