This guide explains, step by step, how to build and start the moj-analytical-services/splink_demos project. It is written for experienced developers; follow the steps in order.
Prerequisites
Before building the project, ensure the following prerequisites are met:
- Java Installation:
Splink requires Java to be installed in order to run pyspark. Check whether Java is installed by running the following command in your terminal:
java -version
If you do not see details of your Java installation, download the appropriate version of Java for your operating system.
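The same Java check can be scripted, which is handy when setting up several machines. The following is a minimal sketch, not part of the splink_demos project: the function name java_version_banner is hypothetical, and it assumes the java executable would be found on PATH if Java is installed.

```python
import shutil
import subprocess


def java_version_banner():
    """Return the text printed by `java -version`, or None if Java is not found.

    Hypothetical helper for illustration; not part of splink_demos.
    """
    java = shutil.which("java")  # Look for the executable on PATH.
    if java is None:
        return None
    try:
        # `java -version` writes its banner to stderr rather than stdout.
        result = subprocess.run([java, "-version"], capture_output=True, text=True)
    except OSError:
        return None
    banner = (result.stderr or result.stdout).strip()
    return banner or None


if __name__ == "__main__":
    banner = java_version_banner()
    print(banner if banner else "Java not found on PATH")
```

A return value of None corresponds to the case described above where no Java installation details are shown.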
Cloning the Repository
Begin by cloning the splink_demos repository to your local machine. Execute the following command:
git clone <repository-url>
Replace <repository-url> with the actual URL of the repository.
Setting Up the Virtual Environment
Once the repository is cloned, navigate into the project directory and create a virtual environment using the following commands:
cd splink_demos
python3 -m venv venv
source venv/bin/activate
The source command activates the virtual environment, so that packages installed afterwards are kept inside this environment rather than in the system-wide Python installation.
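To confirm that the activation took effect, you can ask the interpreter itself. This is a small sketch that assumes the environment was created with python3 -m venv as above; the function name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment.

    Inside an environment created by `python3 -m venv`, sys.prefix points at
    the environment directory while sys.base_prefix points at the base
    installation, so the two differ.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run `source venv/bin/activate` first.")
```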
Installing Required Dependencies
After activating the virtual environment, install the necessary packages by running:
pip3 install -r requirements.txt
This command installs all dependencies required for the project, including pyspark.
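A quick way to check that the installation succeeded is to test whether pyspark can be located by the interpreter. The sketch below uses only the standard library; the helper name is_installed is hypothetical, and pyspark is the only dependency name taken from this guide.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # pyspark is the dependency named in this guide; add others as needed.
    for name in ["pyspark"]:
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

If pyspark is reported as missing, make sure the virtual environment is active and rerun the pip3 install command.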
Running the Demo Notebooks
The project contains interactive notebooks that provide demonstrations and tutorials for using the Splink record linking library. To start working with these notebooks, you can use Jupyter Notebook or JupyterLab. If you don’t have Jupyter installed, you can add it to your virtual environment by executing:
pip3 install notebook
Then, you can launch Jupyter Notebook with:
jupyter notebook
Once inside the Jupyter interface, you can navigate to the tutorials directory and open the desired notebook, such as 00_Tutorial_Introduction.ipynb, to continue with the tutorials.
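To see which notebooks are available before opening Jupyter, you can list them from the repository root. This is an illustrative sketch: the function name list_notebooks is hypothetical, and it assumes the notebooks sit directly inside the tutorials directory.

```python
from pathlib import Path


def list_notebooks(directory="tutorials"):
    """Return the sorted file names of all .ipynb files directly in `directory`."""
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(path.name for path in folder.glob("*.ipynb"))


if __name__ == "__main__":
    # Run from the root of the cloned repository.
    for name in list_notebooks():
        print(name)
```

Because the tutorial file names start with a number, the sorted order matches the intended reading order.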
Conclusion
By following these steps, you will be able to build and start the splink_demos project. You can now explore the interactive notebooks and familiarize yourself with the functionality of the Splink record linking library.
Source:
- README.md
- tutorials/00_Tutorial_Introduction.ipynb
- tutorials/01_Prerequisites.ipynb