This guide provides an overview of the workflows and best practices for the splink_demos project, which is hosted on GitHub. The project is designed to be simple, consistent, and repeatable, following the KISS principle. The main dependencies are pyspark, splink, pyarrow, scikit-learn, and duckdb.
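Before running the demos, it can help to confirm that these packages are installed. The following is a minimal sketch using only the Python standard library. The function name installed_versions is illustrative and not part of the project, and the distribution names are assumed to match the dependency names listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the dependency list above
# (assumed to match the names used by the package installer).
DEPENDENCIES = ["pyspark", "splink", "pyarrow", "scikit-learn", "duckdb"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per dependency so missing packages are easy to spot.
for name, ver in installed_versions(DEPENDENCIES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Any package reported as "not installed" would need to be installed before the notebooks that depend on it can run.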
Project Setup
To get started with the project, follow these steps:
- Download and unzip the project.
- Add products to the project’s installs directory.
- Run init.sh (for Unix) or init.bat (for Windows) to install the project.
The project template structure includes:
- docs/: project documentation and screenshots.
- notebooks/: Jupyter notebooks for demos and examples.
- scripts/: utility scripts and code snippets.
- data/: sample datasets for demos and examples.
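The layout above can be checked programmatically after setup. This is a small sketch, not part of the project: the helper name missing_dirs is hypothetical, and it only assumes the four top-level directories named in the template structure.

```python
from pathlib import Path

# Top-level directories named in the template structure above.
EXPECTED_DIRS = ("docs", "notebooks", "scripts", "data")


def missing_dirs(project_root):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling missing_dirs(".") from the project root returns an empty list when all four directories are present.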
Jupyter Notebooks and Interactive Widgets
The splink_demos project makes extensive use of Jupyter Notebooks and JupyterLab for interactive data exploration and analysis. The ipywidgets library is used to create interactive widgets, which facilitate user input and customization.
Here’s an example of using ipywidgets to create a simple slider:
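import ipywidgets as widgets
from IPython.display import display

# Float slider from 0.1 to 10.0 in steps of 0.1, starting at 7.5.
slider = widgets.FloatSlider(value=7.5, min=0.1, max=10.0, step=0.1)
# Render the widget in the notebook output cell.
display(slider)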
Testing and Continuous Integration
The project uses pytest for testing and nbmake, a pytest plugin, to execute Jupyter notebooks as part of the continuous integration (CI) process. The CI workflow is triggered on every push to the main branch and ensures that the code and examples run as expected.
To run the tests and execute the notebooks locally, use the following commands:
pytest
pytest --nbmake notebooks/
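The two local steps can also be wrapped in a small Python script, which may be convenient as a pre-push check. The sketch below is an assumption about how one might do this, not the project's CI configuration. The names ci_commands and run_all are hypothetical, the notebooks directory is taken from the template structure, and nbmake is enabled through pytest's --nbmake flag.

```python
import subprocess
import sys


def ci_commands(notebook_dir="notebooks"):
    """Build the two commands: the unit tests, then the notebooks via nbmake."""
    return [
        [sys.executable, "-m", "pytest"],
        [sys.executable, "-m", "pytest", "--nbmake", notebook_dir],
    ]


def run_all(commands):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in commands:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            return completed.returncode
    return 0
```

Calling run_all(ci_commands()) from the project root would run both steps and stop at the first failure, which roughly mirrors how a CI job fails fast.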
GitOps Workflow
The splink_demos project follows the GitOps workflow for managing infrastructure and deployments. This approach involves using Git as the single source of truth for both code and infrastructure configuration.
The project includes an example GitOps workflow using Argo CD and Linkerd. This workflow demonstrates how to securely generate and manage Linkerd’s mTLS private keys and certificates using Sealed Secrets and cert-manager. It also shows how to integrate the auto proxy injection feature into the workflow.
To learn more about the GitOps workflow with Linkerd and Argo CD, refer to the official Linkerd documentation.
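The idea of Git as the single source of truth can be pictured as a comparison between the state declared in the repository and the state actually running. The sketch below is a conceptual illustration only and is not how Argo CD is implemented. The function name plan_sync and the resource names and specs are hypothetical.

```python
def plan_sync(desired, live):
    """Compare desired state (from Git) with live state and list needed changes."""
    shared = desired.keys() & live.keys()
    return {
        "create": sorted(desired.keys() - live.keys()),
        "update": sorted(name for name in shared if desired[name] != live[name]),
        "delete": sorted(live.keys() - desired.keys()),
    }


# Hypothetical resource names and specs, for illustration only.
desired = {"web": {"replicas": 3}, "api": {"replicas": 2}}
live = {"web": {"replicas": 1}, "worker": {"replicas": 1}}

print(plan_sync(desired, live))
# {'create': ['api'], 'update': ['web'], 'delete': ['worker']}
```

A GitOps tool repeats a comparison of this kind and applies the resulting changes, so the running system converges on what is committed to Git.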
Browser Testing and WebdriverIO
The project uses WebdriverIO for browser testing and integration tests. This makes it quick and easy to sanity-check UI changes in a fast-moving development environment.
Here’s an example of using WebdriverIO to open a webpage and take a screenshot:
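const { remote } = require('webdriverio');

// Connection options for a WebDriver server listening on port 4444.
const opts = {
  path: '/wd/hub',
  port: 4444,
  capabilities: {
    browserName: 'chrome'
  }
};

(async () => {
  let browser;
  try {
    browser = await remote(opts);
    // Placeholder URL; replace with the page under test.
    await browser.url('https://example.com');
    const url = await browser.getUrl();
    console.log(`Current URL is ${url}`);
    // Write a PNG screenshot to a placeholder file path.
    await browser.saveScreenshot('./screenshot.png');
    console.log('Screenshot taken');
  } catch (err) {
    console.error(err);
  } finally {
    // Close the session only if it was opened successfully.
    if (browser) await browser.deleteSession();
  }
})();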
Best Practices
- Keep it simple: focus on explicit, clear, well-defined, bounded, understandable, and introspectable behavior.
- Minimize resource requirements: Linkerd should impose as minimal a performance and resource cost as possible.
- Just work: Linkerd should not break existing applications, nor should it require complex configuration to get started or to do something simple.
- Use GitOps workflow for managing infrastructure and deployments.
- Use browser testing and WebdriverIO for integration tests and sanity-checking UI changes.
Resources
- Create demo project templates with one script | Opensource.com
- Workshop recap: Running Linkerd in Production | Linkerd
- 7 things I learned from starting an open source project | Opensource.com
- Design Principles | Linkerd
- Using GitOps with Linkerd with Argo CD | Linkerd
- Browser Testing from Scratch: Building Quick and Easy Integration Tests with WebdriverIO and SauceLabs | Linkerd
- Building a Firecracker-Powered Course Platform To Learn Docker and Kubernetes | iximiuz.com
- Adopting GitOps and the Cloud in a Regulated Industry | HashiCorp
- Provenance Information for sigstore-scaffolding-ctlog-verifyfulcio Images | Chainguard Academy
- Do you prefer a live demo to be perfect or broken? | Opensource.com
- Provenance Information for sigstore-scaffolding-tuf-createsecret Images | Chainguard Academy