{
“cells”: [
{
“cell_type”: “markdown”,
“id”: “26e50a28”,
“metadata”: {},
“source”: [
“# Introductory tutorial\n”,
“\n”,
“This is the introduction to a seven-part tutorial which demonstrates how to de-duplicate a small dataset using simple settings.\n”,
“\n”,
“The aim of the tutorial is to demonstrate core Splink functionality succinctly, rather than to document all configuration options comprehensively.\n”,
“\n”,
“The seven parts are:\n”,
“\n”,
“- 1. Data prep pre-requisites\n”,
“\n”,
“- 2. Exploratory analysis\n”,
“\n”,
“- 3. Choosing blocking rules to optimise runtimes\n”,
“\n”,
“- 4. Estimating model parameters\n”,
“\n”,
“- 5. Predicting results\n”,
“\n”,
“- 6. Visualising predictions\n”,
“\n”,
“- 7. Quality assurance\n”,
“\n”,
“\n”,
“Throughout the tutorial, we use the duckdb backend, which is the recommended option for smaller datasets of up to around 1 million records on a normal laptop.\n”,
“\n”,
“You can find these tutorial notebooks in the splink_demos repo, and you can run them live in your web browser by clicking the following link:\n”,
“\n”
]
},
{
“cell_type”: “markdown”,
“id”: “33c575ca”,
“metadata”: {},
“source”: [
“## End-to-end demos\n”,
“\n”,
“After completing the tutorial, you may find it useful to look at the example notebooks, which show a range of Splink use cases from start to finish.”
]
}
],
“metadata”: {
“kernelspec”: {
“display_name”: “Python 3 (ipykernel)”,
“language”: “python”,
“name”: “python3”
},
“language_info”: {
“codemirror_mode”: {
“name”: “ipython”,
“version”: 3
},
“file_extension”: “.py”,
“mimetype”: “text/x-python”,
“name”: “python”,
“nbconvert_exporter”: “python”,
“pygments_lexer”: “ipython3”,
“version”: “3.9.2”
},
“vscode”: {
“interpreter”: {
“hash”: “3b53fa520a31e303a9636a08ff10a3bbc14893ee50cb37445791fa59628fc75b”
}
}
},
“nbformat”: 4,
“nbformat_minor”: 5
}