This guide walks through configuring Docker in the development environment for the moj-analytical-services/splink_demos project, step by step.

Prerequisites

Before diving into Docker configuration, ensure that Docker is installed on your machine. Follow the official Docker installation guide for your specific operating system.
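
A quick way to confirm the prerequisite is to check that the docker command is on your PATH. The helper below is a minimal sketch; check_cmd is a hypothetical name introduced here for illustration, not part of the project:

```shell
# Report whether a command is available on PATH.
# check_cmd is a hypothetical helper, not part of splink_demos.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# Check for the Docker CLI and print a hint if it is absent.
check_cmd docker || echo "Install Docker before continuing."

# Once the CLI is present, these standard commands give more detail:
#   docker --version   # prints the client version
#   docker info        # fails if the Docker daemon is not running
```

Finding the CLI does not prove the daemon is running, which is why docker info is worth running as a second check.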

Steps to Set Up the Local Environment

  1. Clone the Repository

    First, clone the splink_demos repository to your local machine:

    git clone <repository_url>
    cd splink_demos
    
  2. Set Up a Virtual Environment

    Create a Python virtual environment to keep the project's dependencies isolated from your system Python. Run the following commands:

    python3 -m venv venv
    source venv/bin/activate
    
  3. Install Python Dependencies

    Use pip to install all required Python packages defined in the requirements.txt file:

    pip3 install -r requirements.txt
    

    This installs packages such as jupyterlab, pyspark, and splink.

  4. Add Jupyter Kernel

    To use Jupyter notebooks with the newly created virtual environment, add a corresponding Jupyter kernel:

    python -m ipykernel install --user --name=splink_demos
    
  5. Run Jupyter Lab

    After adding the kernel, you can start Jupyter Lab:

    jupyter lab
    

    This command launches the Jupyter Lab interface in your web browser, allowing you to work with the notebooks.
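
Steps 3 and 4 only behave as intended while the virtual environment from step 2 is active: the ipykernel install command registers whichever interpreter runs it, so running it outside the environment would point the kernel at your system Python. The guard below is a minimal sketch of how to check this first; require_venv is a hypothetical helper name, and it relies on the VIRTUAL_ENV variable that the activate script sets.

```shell
# Succeeds only when a virtual environment is active in the current shell.
# `source venv/bin/activate` sets VIRTUAL_ENV; `deactivate` unsets it.
# require_venv is a hypothetical helper, not part of splink_demos.
require_venv() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "No virtual environment is active; run: source venv/bin/activate" >&2
    return 1
  fi
  echo "Using virtual environment at $VIRTUAL_ENV"
}

# Intended use, guarding steps 3 and 4 (left commented out here):
#   require_venv && pip3 install -r requirements.txt
#   require_venv && python -m ipykernel install --user --name=splink_demos
#   jupyter kernelspec list   # the splink_demos kernel should be listed
```

If the kernel was already registered from the wrong interpreter, re-running step 4 with the environment active should overwrite the existing splink_demos kernel spec.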

Running Docker Containers

To run the project in Docker, either write a Dockerfile for it or use an existing container image if one is available. The steps below take the first approach:

  1. Create a Dockerfile

    Create a file named Dockerfile in the root directory of your project. The basic example below sets up the necessary environment:

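    # Use a base image with Python
    FROM python:3.10
    
    # pyspark needs a Java runtime, which the python image does not include
    # (assumes a Debian-based image that provides default-jre-headless)
    RUN apt-get update \
        && apt-get install -y --no-install-recommends default-jre-headless \
        && rm -rf /var/lib/apt/lists/*
    
    # Set the working directory
    WORKDIR /app
    
    # Copy the requirements file first so the dependency layer is cached
    COPY requirements.txt .
    
    # Install dependencies
    RUN pip install --no-cache-dir -r requirements.txt
    
    # Copy the rest of your application code
    COPY . .
    
    # Start Jupyter Lab, listening on all interfaces so the host can reach it
    CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]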
    
  2. Build the Docker Image

    After creating your Dockerfile, build the Docker image using the following command:

    docker build -t splink_demos_dev .
    
  3. Run the Docker Container

    To run the Docker container, execute the following command:

    docker run -p 8888:8888 splink_demos_dev
    

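    This command publishes the container's port 8888 on port 8888 of your host, so you can reach Jupyter Lab in your browser at http://localhost:8888. By default, Jupyter prints a URL containing an access token in the container's output; use that URL to log in.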
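
The COPY . . instruction in the Dockerfile copies the entire build context into the image, which would include the venv directory created during the local setup. The sketch below writes a .dockerignore file to keep such files out of the build. The helper name write_dockerignore and the exact list of patterns are assumptions for illustration; adjust them to the repository's actual contents.

```shell
# Write a .dockerignore into the given directory (default: current directory)
# so that `COPY . .` skips local-only files.
# write_dockerignore and its pattern list are illustrative assumptions.
write_dockerignore() {
  cat > "${1:-.}/.dockerignore" <<'EOF'
venv
.git
**/__pycache__
**/.ipynb_checkpoints
EOF
}

# Intended use from the repository root (left commented out here):
#   write_dockerignore .
#   docker build -t splink_demos_dev .
#
# Optional variation on step 3: --rm removes the container on exit, and
# -v bind-mounts the current directory over /app so that notebook edits
# are saved on the host rather than inside the container.
#   docker run --rm -p 8888:8888 -v "$(pwd)":/app splink_demos_dev
```

Without the bind mount, changes made to notebooks inside the container are lost when the container is removed.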

Conclusion

With the steps above, you should now have a Dockerized development environment for the moj-analytical-services/splink_demos project. Because the dependencies are installed inside the image, everyone who builds it gets the same environment for development and testing.

For more advanced configuration, see the official Docker documentation.