Docker and Containerization

Motivation

HelixML uses Docker to provide a consistent, isolated execution environment for its machine learning models. This section explains how HelixML uses Docker, covering image building, container management, and the use of GPU runners.

Dockerfile

The Dockerfile at the root of the repository defines the image used for HelixML containers.

The Dockerfile uses the nvidia/cuda image as its base, specifically the 11.4.2-cudnn8-devel-ubuntu20.04 variant. This image provides the necessary CUDA and cuDNN libraries for running GPU-accelerated machine learning models.

The Dockerfile also installs Python, the pip package manager, and the setuptools package. It then copies the project source code into the image and installs the project's dependencies with pip install -r requirements.txt.
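
A minimal sketch of what such a Dockerfile might look like, based on the description above (the exact contents may differ from the repository's version):

FROM nvidia/cuda:11.4.2-cudnn8-devel-ubuntu20.04

# Install Python and packaging tools
RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    pip3 install --upgrade setuptools

# Copy the project source and install its dependencies
WORKDIR /app
COPY . /app
RUN pip3 install -r requirements.txt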

Building a Docker Image

To build a Docker image from the Dockerfile, use the following command:

docker build -t helixml .

This command builds the image with the tag helixml.
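
In practice it is often useful to tag the image with an explicit version as well, so that builds can be told apart later (0.1 here is just an example version):

docker build -t helixml:0.1 .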

Running a Container

To run a HelixML container, use the following command:

docker run -it --rm helixml bash

This command starts an interactive shell session in a container created from the helixml image. The -i and -t flags (combined as -it) keep STDIN open and allocate a pseudo-TTY, and the --rm flag removes the container when it exits.
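
From the shell you can confirm that the Python environment described above is present, for example:

python3 --version
pip3 list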

Mounting Volumes

For persistent storage and access to local files, you can mount host directories into a container as volumes. For example, to mount the current working directory at /app inside the container, use the following command:

docker run -it --rm -v $(pwd):/app helixml bash
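
Files written under /app inside the container now persist in the current directory on the host. For example, assuming a hypothetical train.py script at the project root, a training run could read and write data through the mounted volume:

docker run -it --rm -v $(pwd):/app helixml python3 /app/train.py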

Running with GPUs

HelixML containers can leverage GPUs for accelerated model training and inference. To run a container with GPU access, use the following command:

docker run -it --rm --gpus all helixml bash

The --gpus all flag requests access to all GPUs available on the host. Note that this flag requires the NVIDIA Container Toolkit to be installed on the host system.
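
You can also expose only a subset of devices, and then verify GPU visibility from inside the container with nvidia-smi:

docker run -it --rm --gpus '"device=0"' helixml bash
nvidia-smi

Here --gpus '"device=0"' restricts the container to the first GPU, and nvidia-smi, run inside the container's shell, lists the devices the container can actually see.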

Attaching GPU Runners

HelixML uses a custom Docker image called helixml-gpu-runner for managing GPU resources and running training jobs. This image is built from the nvidia/cuda image and includes the necessary libraries and tools for GPU-accelerated training.

The helixml-gpu-runner image exposes ports for accessing the GPU runner services. You can use the following command to run a container based on this image:

docker run -it --rm -p 8888:8888 -p 5000:5000 helixml-gpu-runner bash

This command maps container ports 8888 and 5000 to the same ports on the host. The GPU runner services listen on these ports.
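
If the runner itself needs GPU access, combine the port mappings with the --gpus flag. It can also be convenient to run the runner detached rather than through an interactive shell; the sketch below assumes the image's default command starts the runner services:

docker run -d --rm --gpus all -p 8888:8888 -p 5000:5000 helixml-gpu-runner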

Conclusion

Docker and containerization give HelixML a consistent, reproducible execution environment, make scaling straightforward, and put GPU resources to work for faster training and inference. With the basics outlined in this section, developers can effectively build, run, and manage HelixML containers.