Parsers and Postprocessors
Parsers and postprocessors ingest data from various sources, parse it into a structured format, and post-process it for representation within the Knowledge Graph (KG).
Data Sources and Parsers
The current data sources and their corresponding parsers include:
YAML Files: Used for defining the KG schema and other configuration settings.
- yaml.Parser: Parses YAML files.
- Example:
    schema:
      - name: Person
        properties:
          - name: name
            type: string
          - name: age
            type: integer
CSV Files: Used for importing data into the KG.
- csv.Parser: Parses CSV files.
- Example:
    name,age
    John Doe,30
    Jane Doe,25
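As a rough illustration of what a CSV parser of this kind might do, the sketch below reads the sample data with Python's standard `csv` module. The helper name `parse_csv` is hypothetical and is not the project's `csv.Parser`, whose actual interface is not shown here.

```python
import csv
import io


def parse_csv(text: str) -> list[dict[str, str]]:
    """Hypothetical helper: parse CSV text into a list of row dictionaries."""
    # DictReader takes the first row as the header and uses it as the keys
    # of every subsequent row.
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


rows = parse_csv("name,age\nJohn Doe,30\nJane Doe,25\n")
print(rows)
```

Every value comes back as a string (`age` is `"30"`, not `30`), which is one reason a later type-conversion step is useful.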
JSON Files: Used for importing data into the KG.
- json.Parser: Parses JSON files.
- Example:
    [
      { "name": "John Doe", "age": 30 },
      { "name": "Jane Doe", "age": 25 }
    ]
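A minimal sketch of the equivalent step for JSON, assuming the input is always an array of objects as in the example. `parse_json_records` is an illustrative name, not part of the project's `json.Parser`.

```python
import json


def parse_json_records(text: str) -> list[dict]:
    """Hypothetical helper: parse a JSON array of objects into a list of dicts."""
    data = json.loads(text)
    # Reject anything that is not a list of objects so that later stages
    # can rely on a uniform record shape.
    if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
        raise ValueError("expected a JSON array of objects")
    return data


records = parse_json_records(
    '[{"name": "John Doe", "age": 30}, {"name": "Jane Doe", "age": 25}]'
)
print(records)
```

Unlike the CSV case, JSON preserves the numeric type of `age`, so the two sources may need different amounts of type conversion.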
SQL Databases: Used for importing data from relational databases into the KG.
- sql.Parser: Parses SQL queries and extracts data from databases.
- Example:
    SELECT name, age FROM users;
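The extraction half of this step can be sketched with Python's built-in `sqlite3` module. The choice of SQLite, the in-memory `users` table, and the helper `fetch_rows` are all assumptions made for illustration; the sketch only runs the query and does not parse it.

```python
import sqlite3


def fetch_rows(conn: sqlite3.Connection, query: str) -> list[dict]:
    """Hypothetical helper: run a query and return each row as a dictionary."""
    # sqlite3.Row exposes column names, so each row converts cleanly to a dict.
    conn.row_factory = sqlite3.Row
    cursor = conn.execute(query)
    return [dict(row) for row in cursor.fetchall()]


# Build a throwaway in-memory database that matches the example query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("John Doe", 30), ("Jane Doe", 25)],
)

sql_rows = fetch_rows(conn, "SELECT name, age FROM users;")
print(sql_rows)
```

Returning dictionaries keeps the output shape consistent with the CSV and JSON cases, so the same postprocessors could in principle be applied to all three.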
Postprocessors
Postprocessors are responsible for transforming parsed data into a format suitable for storage and querying within the KG. They perform various operations, including:
- Type Conversion: Converts data types to match the KG schema.
- Entity Resolution: Maps external entities to their corresponding KG entities.
- Relationship Extraction: Extracts relationships between entities based on the parsed data.
- Data Cleaning: Removes inconsistencies, duplicates, and errors from the data.
- Data Enrichment: Adds additional information to the KG, such as timestamps or derived properties.
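Three of these operations (type conversion, data cleaning, and data enrichment) can be sketched as small functions over parsed records. Everything here is an assumption for illustration: the `PERSON_SCHEMA` mapping, the function names, and the `ingested_at` property are hypothetical and do not come from `postprocessors.py`.

```python
from datetime import datetime, timezone

# Hypothetical schema: property name -> Python type, mirroring the Person example.
PERSON_SCHEMA = {"name": str, "age": int}


def convert_types(record: dict, schema: dict) -> dict:
    """Type conversion: cast each known property to the type the schema declares."""
    return {key: schema[key](value) for key, value in record.items() if key in schema}


def clean(records: list[dict]) -> list[dict]:
    """Data cleaning: strip whitespace from strings and drop exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        normalized = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        # Assumes property values are hashable (strings, numbers).
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(normalized)
    return cleaned


def enrich(record: dict, now: datetime) -> dict:
    """Data enrichment: add an ingestion timestamp without mutating the input."""
    return {**record, "ingested_at": now.isoformat()}


raw = [
    {"name": " John Doe", "age": "30"},
    {"name": "John Doe", "age": "30"},  # duplicate once whitespace is stripped
    {"name": "Jane Doe", "age": "25"},
]
now = datetime(2024, 1, 1, tzinfo=timezone.utc)  # fixed timestamp for a repeatable example
processed = [enrich(convert_types(r, PERSON_SCHEMA), now) for r in clean(raw)]
print(processed)
```

Cleaning runs before type conversion here so that duplicates differing only in whitespace collapse to one record; a real pipeline might order the steps differently.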
Example: Processing a CSV File
- Parsing: The csv.Parser reads the CSV file and parses its contents into a list of dictionaries.
- Postprocessing: The postprocessor.EntityResolver maps the parsed data to existing entities in the KG. For example, it might identify “John Doe” as a Person entity with the ID P123.
- KG Integration: The postprocessed data is then integrated into the KG, creating new relationships and updating existing ones.
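The three steps above can be sketched end to end with an in-memory dictionary standing in for the KG. This is a simplified illustration under stated assumptions: the `kg` dictionary, the name-based matching rule, and the ID scheme for new entities are hypothetical, and the sketch handles entities only, not relationships.

```python
import csv
import io
import itertools
from typing import Optional

# Hypothetical stand-in for the KG: entity ID -> properties.
kg = {"P123": {"type": "Person", "name": "John Doe"}}
# Hypothetical ID scheme for entities that do not exist yet.
_new_ids = itertools.count(124)


def resolve_entity(record: dict, kg: dict) -> Optional[str]:
    """Entity resolution: return the ID of an existing Person with the same name."""
    for entity_id, props in kg.items():
        if props.get("type") == "Person" and props.get("name") == record["name"]:
            return entity_id
    return None


def integrate(record: dict, kg: dict) -> str:
    """KG integration: update the matching entity or create a new one."""
    entity_id = resolve_entity(record, kg)
    if entity_id is None:
        entity_id = f"P{next(_new_ids)}"
        kg[entity_id] = {"type": "Person"}
    kg[entity_id].update(record)
    return entity_id


# Step 1: parse the CSV into a list of dictionaries.
csv_text = "name,age\nJohn Doe,30\nJane Doe,25\n"
parsed = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]

# Steps 2 and 3: resolve each record against the KG, then integrate it.
ids = [integrate({"name": r["name"], "age": int(r["age"])}, kg) for r in parsed]
print(ids)
print(kg)
```

Exact name matching is the simplest possible resolution rule; it would merge distinct people who share a name, so a real resolver would likely compare more properties.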
Implementation Details
- parsers.py: Defines the parsers for the supported data formats.
- postprocessors.py: Defines the postprocessors for data transformation and integration.
- kg_client.py: Provides an interface for interacting with the KG.
- utils.py: Contains utility functions used by the parsers and postprocessors.