Docker and Containerization
Motivation
HelixML leverages Docker to provide a consistent, isolated execution environment for its machine learning models. This section explains how HelixML uses Docker, covering container creation, container management, and GPU runners.
Dockerfile
The Dockerfile defines the base image for HelixML containers.
The Dockerfile uses the nvidia/cuda image as its base, specifically the 11.4.2-cudnn8-devel-ubuntu20.04 variant. This image provides the CUDA and cuDNN libraries needed to run GPU-accelerated machine learning models.
The Dockerfile also installs Python, the pip package manager, and the setuptools package. It then copies the project source code into the image and installs the project dependencies with pip install -r requirements.txt.
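Based on the description above, the Dockerfile might look roughly like the following sketch. This is an illustration, not the actual file: the package names, the /app working directory, and the apt-based install steps are assumptions.

```dockerfile
# Base image with CUDA 11.4.2 and cuDNN 8 on Ubuntu 20.04
FROM nvidia/cuda:11.4.2-cudnn8-devel-ubuntu20.04

# Install Python, pip, and setuptools
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/* && \
    pip3 install --no-cache-dir setuptools

# Copy the project source and install its dependencies
WORKDIR /app
COPY . .
RUN pip3 install --no-cache-dir -r requirements.txt
```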
Building a Docker Image
To build a Docker image from the Dockerfile, use the following command:
docker build -t helixml .
This command builds the image and tags it helixml.
Running a Container
To run a HelixML container, use the following command:
docker run -it --rm helixml bash
This command starts an interactive shell session in a container built from the helixml image. The -it flags enable interactive mode and allocate a pseudo-TTY; the --rm flag removes the container when it exits.
Mounting Volumes
For persistent storage and access to local files, you can mount volumes into a container. For example, to mount the current directory to /app inside the container, use the following command:
docker run -it --rm -v $(pwd):/app helixml bash
Running with GPUs
HelixML containers can leverage GPUs for accelerated model training and inference. To run a container with GPU access, use the following command:
docker run -it --rm --gpus all helixml bash
The --gpus all flag requests access to all GPUs on the host system. Note that this requires the NVIDIA Container Toolkit to be installed on the host.
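To confirm that the container can actually see the GPUs, a quick check is to run nvidia-smi inside it instead of a shell (this assumes an NVIDIA driver and the NVIDIA Container Toolkit are present on the host):

```shell
# Print the GPUs visible inside the container; the container exits afterwards
docker run --rm --gpus all helixml nvidia-smi

# Restrict the container to a single GPU (device index 0) instead of all of them
docker run --rm --gpus '"device=0"' helixml nvidia-smi
```

If nvidia-smi fails inside the container but works on the host, the container runtime is usually the missing piece.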
Attaching GPU Runners
HelixML uses a custom Docker image called helixml-gpu-runner for managing GPU resources and running training jobs. This image is also built from the nvidia/cuda image and includes the libraries and tools needed for GPU-accelerated training.
The helixml-gpu-runner image exposes ports for the GPU runner services. You can run a container based on this image with the following command:
docker run -it --rm -p 8888:8888 -p 5000:5000 helixml-gpu-runner bash
This command maps container ports 8888 and 5000 to the same ports on the host. These ports are used by the GPU runner services.
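For a long-running runner service it is often more practical to start the container detached rather than with an interactive shell. A sketch (the container name gpu-runner is an arbitrary choice, and whether the runner needs --gpus is an assumption based on its purpose):

```shell
# Start the GPU runner in the background, with GPU access and the service ports published
docker run -d --name gpu-runner --gpus all \
  -p 8888:8888 -p 5000:5000 helixml-gpu-runner

# Follow the service logs; Ctrl-C stops following without stopping the container
docker logs -f gpu-runner

# Stop and remove the container when it is no longer needed
docker stop gpu-runner && docker rm gpu-runner
```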
Conclusion
Docker and containerization are powerful tools for managing and deploying HelixML models. This approach ensures a consistent execution environment, enables easy scaling, and makes GPU resources available for faster training and inference. With the basic concepts outlined in this section, developers can build, run, and manage HelixML containers effectively.