This guide walks experienced developers through setting up and running the moj-analytical-services/splink_demos project: checking for Java, cloning the repository, creating a virtual environment, installing dependencies, and launching the demo notebooks.

Prerequisites

Before building the project, ensure the following prerequisites are met:

  1. Java Installation: Splink requires Java to be installed in order to run pyspark. Check if Java is installed by running the following command in your terminal:
    java -version
    
    If the command does not print version details, Java is either not installed or not on your PATH. In that case, download the appropriate version of Java for your operating system.
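
The same check can be scripted, which is handy when several machines need to be verified. The following is a minimal sketch, assuming Python 3.7 or later is available; the helper name java_version_output is hypothetical and is not part of Splink or pyspark.

```python
import shutil
import subprocess


def java_version_output():
    """Return the text printed by `java -version`, or None if Java is not on PATH."""
    java_path = shutil.which("java")
    if java_path is None:
        return None
    # `java -version` writes its report to stderr rather than stdout,
    # so capture both streams and combine them.
    result = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
        check=False,
    )
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    report = java_version_output()
    if report is None:
        print("Java was not found on PATH; install it before running pyspark.")
    else:
        print(report)
```

The helper only reports what it finds. It does not judge whether the installed Java version is suitable for the pyspark release you end up installing.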

Cloning the Repository

Begin by cloning the splink_demos repository to your local machine. Execute the following command:

git clone <repository-url>

Replace <repository-url> with the actual URL of the repository.

Setting Up the Virtual Environment

Once the repository is cloned, navigate into the project directory, then create and activate a virtual environment using the following commands:

cd splink_demos
python3 -m venv venv
source venv/bin/activate

The source command activates the virtual environment for the current shell session, so that packages installed afterwards go into the venv directory rather than the system-wide Python installation.
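
To confirm that activation worked, you can ask the interpreter itself. This is a small sketch that relies on the standard library's sys.prefix and sys.base_prefix, which differ inside an environment created with venv; the function name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtual_environment())
```

Run it with the python3 on your PATH after activation. If it reports False, the environment is probably not active in the current shell.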

Installing Required Dependencies

After activating the virtual environment, install the necessary packages by running:

pip3 install -r requirements.txt

This command installs every dependency listed in requirements.txt, including pyspark.
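
A quick way to verify the installation is to look up the installed version of a package from inside the virtual environment. The sketch below assumes Python 3.8 or later, where importlib.metadata is part of the standard library; the helper name installed_version is hypothetical, and pyspark is checked only because this guide names it as a dependency.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # pyspark is the one dependency this guide mentions by name.
    version = installed_version("pyspark")
    if version is None:
        print("pyspark is not installed in this environment")
    else:
        print("pyspark", version)
```

The same function can be called with any other package name from requirements.txt.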

Running the Demo Notebooks

The project contains interactive notebooks that provide demonstrations and tutorials for the Splink record linking library. You can open them with Jupyter Notebook or JupyterLab. If Jupyter is not installed, add it to your virtual environment by executing:

pip3 install notebook

Then launch Jupyter Notebook from the project directory:

jupyter notebook

In the Jupyter interface, open the tutorials directory and select a notebook, such as 00_Tutorial_Introduction.ipynb, to begin the tutorials.
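
If you prefer to see which notebooks are available before opening the browser, a short script can list them. This is an illustrative sketch that assumes it is run from the root of the cloned repository and that the notebooks sit in a directory named tutorials, as described above; the function name list_notebooks is hypothetical.

```python
from pathlib import Path


def list_notebooks(directory="tutorials"):
    """Return the notebook file names in `directory`, sorted by name."""
    folder = Path(directory)
    if not folder.is_dir():
        # A missing directory yields an empty list instead of an error.
        return []
    return sorted(path.name for path in folder.glob("*.ipynb"))


if __name__ == "__main__":
    for name in list_notebooks():
        print(name)
```

Because the notebook file names start with a number, sorting by name lists them in their intended reading order.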

Conclusion

With these steps complete, you have a working splink_demos environment: Java available for pyspark, the dependencies installed in an isolated virtual environment, and Jupyter ready to run. From here, work through the interactive notebooks to become familiar with the Splink record linking library.

Source:

  • README.md
  • tutorials/00_Tutorial_Introduction.ipynb
  • tutorials/01_Prerequisites.ipynb