This guide walks through configuring Docker for the development environment of the moj-analytical-services/splink_demos project.
Prerequisites
Before diving into Docker configuration, ensure that Docker is installed on your machine. Follow the official Docker installation guide for your specific operating system.
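A quick way to confirm the prerequisite is to check that the docker CLI is on your PATH. The snippet below is a minimal sketch; the `docker_status` variable is only an illustrative name, not something the project defines.

```shell
# Check whether the docker CLI can be found on PATH and report its version.
if command -v docker >/dev/null 2>&1; then
    docker --version
    docker_status="found"
else
    echo "docker not found on PATH; install it before continuing" >&2
    docker_status="missing"
fi
echo "docker CLI: $docker_status"
```

Finding the CLI does not guarantee that the Docker daemon is running, so a later `docker build` can still fail if the service has not been started.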
Steps to Configure Docker
Clone the Repository
First, clone the splink_demos repository to your local machine:
git clone <repository_url>
cd splink_demos
Set Up a Virtual Environment
It is recommended to create a Python virtual environment to manage dependencies. Run the following commands:
python3 -m venv venv
source venv/bin/activate
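To confirm that activation worked, you can inspect the `VIRTUAL_ENV` variable, which the venv activate script sets. This is a small sketch; `venv_status` is an illustrative name.

```shell
# VIRTUAL_ENV is exported by venv's activate script; if it is empty, no environment is active.
if [ -n "${VIRTUAL_ENV:-}" ]; then
    venv_status="active"
    echo "virtual environment: $VIRTUAL_ENV"
    # With the environment active, python3 should resolve to a path inside it
    command -v python3 || echo "python3 not found on PATH"
else
    venv_status="inactive"
    echo "no virtual environment active; run: source venv/bin/activate"
fi
```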
Install Python Dependencies
Use pip to install the Python packages listed in the requirements.txt file:
pip3 install -r requirements.txt
This will include dependencies such as jupyterlab, pyspark, and splink.
Add Jupyter Kernel
To use Jupyter notebooks with the newly created virtual environment, add a corresponding Jupyter kernel:
python -m ipykernel install --user --name=splink_demos
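To check that the kernel was registered, you can look for its name in the output of `jupyter kernelspec list`. The sketch below assumes the kernel name `splink_demos` from the command above; `kernel_status` is an illustrative name.

```shell
# Look for the splink_demos kernel among the installed kernelspecs.
if command -v jupyter >/dev/null 2>&1; then
    if jupyter kernelspec list | grep -q "splink_demos"; then
        kernel_status="registered"
    else
        kernel_status="not registered"
    fi
else
    kernel_status="jupyter not found"
fi
echo "splink_demos kernel: $kernel_status"
```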
Run Jupyter Lab
After adding the kernel, you can start Jupyter Lab:
jupyter lab
This command launches the Jupyter Lab interface in your web browser, allowing you to work with the notebooks.
Running Docker Containers
To run the project in Docker, either write a Dockerfile for it or use a pre-built container image if one is available. The example below takes the Dockerfile route:
Create a Dockerfile
Create a Dockerfile in the root directory of your project. The basic example below sets up the necessary environment:
# Use a base image with Python
FROM python:3.10
# Set the working directory
WORKDIR /app
# Copy the requirements file
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of your application code
COPY . .
# Command to run your app (if applicable)
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root"]
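Because the earlier steps create a venv directory in the project root, `COPY . .` would copy it into the image unless it is excluded. One option is a .dockerignore file next to the Dockerfile. The sketch below writes one; the list of patterns is an assumption about what you want to leave out, and the command overwrites any existing .dockerignore.

```shell
# Write a .dockerignore so that `COPY . .` skips local-only files.
# The patterns are illustrative; adjust them to your checkout.
cat > .dockerignore <<'EOF'
venv
.git
**/__pycache__
**/.ipynb_checkpoints
EOF
cat .dockerignore
```

Leaving these directories out keeps the build context smaller and stops the host's virtual environment from ending up inside the image.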
Build the Docker Image
After creating your Dockerfile, build the Docker image using the following command:
docker build -t splink_demos_dev .
Run the Docker Container
To run the Docker container, execute the following command:
docker run -p 8888:8888 splink_demos_dev
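This command maps port 8888 in the container to port 8888 on your host, so you can open Jupyter Lab in your browser at http://localhost:8888. Jupyter Lab asks for a login token by default; the startup output printed by the container includes a URL with the token.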
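Notebooks edited inside the container are lost when the container is removed unless the project directory is mounted from the host. The sketch below wraps the run command in a hypothetical helper, `run_dev_container`, which bind-mounts the current directory over /app (the WORKDIR used in the example Dockerfile) and removes the container on exit. `HOST_PORT` is an assumed override for cases where port 8888 is already taken on the host.

```shell
# Hypothetical helper: run the splink_demos_dev image with the current
# directory mounted at /app. HOST_PORT (assumed name) changes only the
# host side of the port mapping; the container still listens on 8888.
run_dev_container() {
    docker run --rm \
        -p "${HOST_PORT:-8888}:8888" \
        -v "$(pwd)":/app \
        splink_demos_dev
}
```

Calling `run_dev_container` from the repository root starts Jupyter Lab as before. `HOST_PORT=8889 run_dev_container` would serve it at http://localhost:8889 instead. On Linux hosts, files created from inside the container may be owned by root, because the example image runs Jupyter as root.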
Conclusion
With the steps above, you should now have a Dockerized development environment for the moj-analytical-services/splink_demos project. Installing dependencies inside the image keeps them isolated from the host and gives everyone the same environment for development and testing.
For further exploration and advanced configurations, please refer to the official Docker documentation.