Container Security and Isolation
VoltStack container security protects every site against errant or malicious containers, a critical requirement when running third-party apps where an enterprise has no visibility into the container or application.
The VoltStack container security and isolation service includes the following key features:
Resource Limits per Container
VoltStack protects against an errant container, such as one stuck in a loop consuming high CPU, by ensuring it is evicted as soon as it exceeds the CPU or memory limit configured for the system. VoltStack uses the Linux kernel feature called cgroups to restrict resources such as CPU and memory per container. In the deployment manifest file, enterprises can define the amount of resources requested and the maximum limit using the resources: requests and resources: limits fields.
Requests: The amount of resources a container is allowed to use with a strong guarantee of availability. The scheduler will not overcommit requests.
Limits: The maximum amount of resources a container can use, regardless of guarantees. The scheduler ignores limits.
For example, a container might request 64 MiB of memory with a limit of 128 MiB, and 250 milliCPU of CPU with a limit of 500 milliCPU.
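A manifest with these values could look like the following minimal sketch. It assumes VoltStack accepts a standard Kubernetes Pod specification; the Pod name, container name, and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend                  # hypothetical Pod name
spec:
  containers:
  - name: app                     # hypothetical container name
    image: example.com/app:1.0    # hypothetical image
    resources:
      requests:
        memory: "64Mi"            # guaranteed memory
        cpu: "250m"               # guaranteed CPU (250 milliCPU)
      limits:
        memory: "128Mi"           # maximum memory
        cpu: "500m"               # maximum CPU (500 milliCPU)
```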
CPU resource requests and limits
The CPU resource is measured in CPU units. One CPU is equivalent to:
- 1 AWS vCPU
- 1 GCP Core
- 1 Azure vCore
- 1 hyperthread on a bare-metal Intel processor with Hyper-Threading
Fractional values are allowed. A container that requests 0.5 CPU is guaranteed half as much CPU as a Container that requests 1 CPU. You can use the suffix “m” to mean milli. For example, 100m CPU, 100 milliCPU, and 0.1 CPU are all the same. Precision finer than 1m is not allowed. CPU is always requested as an absolute quantity, never as a relative quantity; 0.1 is the same amount of CPU on a single-core, dual-core, or 48-core machine.
CPU requests and limits are associated with Containers, but it is useful to think of a Pod as having a CPU request and limit. The CPU request for a Pod is the sum of the CPU requests for all the Containers in the Pod. Likewise, the CPU limit for a Pod is the sum of the CPU limits for all the Containers in the Pod. Pod scheduling is based on requests. A Pod is scheduled to run on a Node only if the Node has enough CPU resources available to satisfy the Pod's CPU request.
By configuring the CPU requests and limits of the Containers that run in your cluster, you can make efficient use of the CPU resources available on your cluster's Nodes. By keeping a Pod's CPU request low, you give the Pod a good chance of being scheduled. By having a CPU limit that is greater than the CPU request, you accomplish two things:
- The Pod can have bursts of activity where it makes use of CPU resources that happen to be available.
- The amount of CPU resources a Pod can use during a burst is limited to some reasonable amount.
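As an illustration of how per-container values add up to a Pod-level request and limit, consider the sketch below. It assumes a standard Kubernetes Pod specification; all names and images are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar          # hypothetical Pod name
spec:
  containers:
  - name: web                     # hypothetical container
    image: example.com/web:1.0    # hypothetical image
    resources:
      requests:
        cpu: "250m"
      limits:
        cpu: "500m"               # can burst up to twice its request
  - name: sidecar                 # hypothetical container
    image: example.com/sidecar:1.0
    resources:
      requests:
        cpu: "0.1"                # fractional form, same as 100m
      limits:
        cpu: "200m"
# Effective Pod CPU request: 250m + 100m = 350m (used for scheduling)
# Effective Pod CPU limit:   500m + 200m = 700m (upper bound during bursts)
```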
Memory resource requests and limits
A Container can exceed its memory request if the Node has memory available. But a Container is not allowed to use more than its memory limit. If a Container allocates more memory than its limit, the Container becomes a candidate for termination. If the Container continues to consume memory beyond its limit, the Container is terminated.
Memory requests and limits are associated with Containers, but it is useful to think of a Pod as having a memory request and limit. The memory request for the Pod is the sum of the memory requests for all the Containers in the Pod. Likewise, the memory limit for the Pod is the sum of the limits of all the Containers in the Pod. Pod scheduling is based on requests. A Pod is scheduled to run on a Node only if the Node has enough available memory to satisfy the Pod’s memory request.
By configuring memory requests and limits for the Containers that run in your cluster, you can make efficient use of the memory resources available on your cluster’s Nodes. By keeping a Pod’s memory request low, you give the Pod a good chance of being scheduled. By having a memory limit that is greater than the memory request, you accomplish two things:
- The Pod can have bursts of activity where it makes use of memory that happens to be available.
- The amount of memory a Pod can use during a burst is limited to some reasonable amount.
Quality of Service Tiers and Eviction rules
VoltStack offers Quality of Service Tiers per container, with clear eviction rules, thereby providing granular container isolation from noisy neighbors.
Guaranteed QoS tier:
In this tier, Pods are considered top-priority and are guaranteed to not be killed until they exceed their limits. For a Pod to be given a QoS class of Guaranteed:
- Every Container in the Pod must have a memory limit and a memory request, and they must be the same.
- Every Container in the Pod must have a CPU limit and a CPU request, and they must be the same.
Burstable QoS tier:
In this tier, Pods have some form of minimal resource guarantee but can use more resources when available. Under system memory pressure, these containers are more likely to be killed once they exceed their requests and no Best-Effort Pods exist. A Pod is given a QoS class of Burstable if:
- The Pod does not meet the criteria for QoS class Guaranteed.
- At least one Container in the Pod has a memory or CPU request.
Best-Effort QoS tier:
In this tier, Pods are treated as the lowest priority. Processes in these Pods are the first to be killed if the system runs out of memory. However, these containers can use any amount of free memory on the Node. For a Pod to be given a QoS class of BestEffort, the Containers in the Pod must not have any memory or CPU limits or requests.
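The three tiers can be contrasted with one Pod per tier, as in the sketch below. It assumes standard Kubernetes Pod specifications, where the QoS class is derived from the resources stanza and is not set explicitly; Pod names and the image are hypothetical.

```yaml
# Guaranteed: every container sets CPU and memory, with requests equal to limits.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed            # hypothetical
spec:
  containers:
  - name: app
    image: example.com/app:1.0    # hypothetical image
    resources:
      requests: {cpu: "500m", memory: "128Mi"}
      limits:   {cpu: "500m", memory: "128Mi"}
---
# Burstable: at least one request is set, but the Guaranteed criteria are not met.
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable             # hypothetical
spec:
  containers:
  - name: app
    image: example.com/app:1.0
    resources:
      requests: {memory: "64Mi"}
      limits:   {memory: "128Mi"}
---
# BestEffort: no requests or limits on any container.
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort            # hypothetical
spec:
  containers:
  - name: app
    image: example.com/app:1.0
```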
Administrators can define a default, minimum and maximum for both requests and limits, for both CPU and memory resources, on a per-tenant basis.
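In standard Kubernetes, such defaults and bounds are expressed with a LimitRange object. The sketch below assumes VoltStack exposes the same mechanism and that a tenant maps to a namespace; both are assumptions, and the names and values are hypothetical.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits          # hypothetical name
  namespace: tenant-a             # hypothetical; assumes one namespace per tenant
spec:
  limits:
  - type: Container
    defaultRequest:               # request applied when a container sets none
      cpu: "250m"
      memory: "64Mi"
    default:                      # limit applied when a container sets none
      cpu: "500m"
      memory: "128Mi"
    min:                          # smallest values a container may specify
      cpu: "100m"
      memory: "32Mi"
    max:                          # largest values a container may specify
      cpu: "1"
      memory: "512Mi"
```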
OS Kernel Capabilities
VoltStack uses the kernel capabilities feature to provide granular control over what the root user is allowed to do. For example, capabilities such as inserting or removing kernel modules and manipulating the system clock are blocked. In containers, many of the capabilities used to manage the network and other services are not actually needed. Many root capabilities, including the ability to modify logs, change networking, and modify kernel memory, are disabled for containers.
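In a standard Kubernetes Pod specification, capabilities are dropped per container through securityContext. The sketch below is illustrative only: it assumes VoltStack honors this field, and the choice of capabilities, names, and image are hypothetical examples rather than VoltStack's actual default set.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-caps           # hypothetical
spec:
  containers:
  - name: app
    image: example.com/app:1.0    # hypothetical image
    securityContext:
      capabilities:
        drop:
        - SYS_MODULE              # load and unload kernel modules
        - SYS_TIME                # set the system clock
        - NET_ADMIN               # change network configuration
        - SYS_RAWIO               # raw I/O, including kernel memory devices
```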
VoltStack uses a kernel feature called Seccomp to limit the system calls a process can make based on the specified profile. Examples of system calls include bind, accept, fork, settimeofday, mount, etc.
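A seccomp profile is attached to a Pod through securityContext in standard Kubernetes. The sketch below assumes VoltStack accepts this field and uses the container runtime's default profile; whether VoltStack applies its own profile instead is not stated here. Names and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo              # hypothetical
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault        # use the container runtime's default syscall filter
  containers:
  - name: app
    image: example.com/app:1.0    # hypothetical image
```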
Security-Enhanced Linux (SELinux) is a mandatory access control system that controls how processes interact with files, each other and network ports. Processes, files, memory, network interfaces and so on are labeled and there is a policy defining the interaction that is administratively set and fixed.
SELinux governs labeling and type enforcement. For a mythical service “foo”, the executable file on the disk might have the label foo_exec_t. The startup scripts for foo might have the label foo_config_t. The log files for foo might have the label foo_log_t. The data for foo might have the label foo_data_t. When foo is running, the process in memory might have the label foo_t.
Type enforcement is the rule set that says that when a process running in the foo_t context tries to access a file on the filesystem with the label foo_data_t, that access is allowed. When the process with the label foo_t tries to write to a log file with the label foo_log_t, that would be allowed as well. Any other access, unless explicitly allowed by the policy, is denied. If the foo process, running in the foo_t context, tries to access, for instance, the directory /home/bar, with the label user_home_dir_t, the policy will stop that access even if the file permissions are wide open. SELinux labels are stored as extended attributes on the filesystem, or in memory.
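To connect the labels above to a container, a standard Kubernetes Pod specification can request the SELinux type its processes run under. The sketch below reuses the mythical foo_t type from the example; it assumes the node's SELinux policy defines that type and that VoltStack honors this field. Names and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: foo                       # hypothetical
spec:
  securityContext:
    seLinuxOptions:
      type: foo_t                 # process label; must exist in the node's policy
  containers:
  - name: foo
    image: example.com/foo:1.0    # hypothetical image
```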
Container runtime sandbox
Many enterprises like the isolation capabilities provided by VMs but want to achieve the increased utilization provided by containers. VoltStack offers a container runtime sandbox that provides a VM-like isolation boundary with a container-like resource footprint to ensure higher utilization compared to running VMs. It intercepts system calls, provides visibility into the system calls made by application containers and provides the ability to whitelist system calls to ensure the shared kernel is protected against errant/malicious containers.
Container Vulnerability Scanning
VoltStack container security lets enterprises “shift security left” by scanning containers at the CI/CD layer for known vulnerabilities in application and shared libraries. It also enables enterprises to identify corporate policy violations by scanning containers for secrets, passwords, and blacklisted third-party libraries.
The following How-to guides are examples of setting resource requests and limits per container:
- How to specify requests and limits per container
- How to set defaults and max limits per container for administrators