Parsers and Postprocessors

The parsers and postprocessors ingest data from several sources, parse it into a structured format, and post-process it so that it matches the schema and entities of the Knowledge Graph (KG).

Data Sources and Parsers

The current data sources and their corresponding parsers include:

  1. YAML Files: Used for defining the KG schema and other configuration settings.

    • yaml.Parser: Parses YAML files.
    • Example:
    schema:
      - name: Person
        properties:
          - name: name
            type: string
          - name: age
            type: integer
              
  2. CSV Files: Used for importing data into the KG.

    • csv.Parser: Parses CSV files.
    • Example:
    name,age
    John Doe,30
    Jane Doe,25
              
  3. JSON Files: Used for importing data into the KG.

    • json.Parser: Parses JSON files.
    • Example:
    [
      {
        "name": "John Doe",
        "age": 30
      },
      {
        "name": "Jane Doe",
        "age": 25
      }
    ]
              
  4. SQL Databases: Used for importing data from relational databases into the KG.

    • sql.Parser: Parses SQL queries and extracts data from databases.
    • Example:
    SELECT name, age FROM users;
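
The parsers listed above all reduce their input to the same shape: a list of records. The following is a minimal sketch of that idea using only the Python standard library. The function names (parse_csv, parse_json, parse_sql) are hypothetical and are not the project's csv.Parser, json.Parser, or sql.Parser interfaces, and an in-memory SQLite database stands in for a real relational source. YAML is left out because parsing it needs a third-party library.

```python
import csv
import io
import json
import sqlite3


def parse_csv(text):
    """Parse CSV text with a header row into a list of dicts (all values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    return records


def parse_sql(connection, query):
    """Run a query and return each row as a dict keyed by column name."""
    cursor = connection.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


csv_records = parse_csv("name,age\nJohn Doe,30\nJane Doe,25\n")
json_records = parse_json(
    '[{"name": "John Doe", "age": 30}, {"name": "Jane Doe", "age": 25}]'
)

# In-memory SQLite database used here only as a stand-in for a real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [("John Doe", 30), ("Jane Doe", 25)]
)
sql_records = parse_sql(conn, "SELECT name, age FROM users;")
conn.close()
```

In this sketch the CSV records carry age as a string, while the JSON and SQL records carry it as an integer. That difference is what the Type Conversion postprocessing step has to reconcile.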
              

Postprocessors

Postprocessors transform parsed data into a form suitable for storage and querying within the KG. They perform the following operations:

  1. Type Conversion: Converts data types to match the KG schema.
  2. Entity Resolution: Maps external entities to their corresponding KG entities.
  3. Relationship Extraction: Extracts relationships between entities based on the parsed data.
  4. Data Cleaning: Removes inconsistencies, duplicates, and errors from the data.
  5. Data Enrichment: Adds additional information to the KG, such as timestamps or derived properties.
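
As an illustration, three of the operations above (type conversion, data cleaning, and data enrichment) could be sketched as small functions over parsed records. Everything here is an assumption made for the example: PERSON_SCHEMA, convert_types, clean, enrich, and the ingested_at property are hypothetical names, not part of postprocessors.py.

```python
from datetime import datetime, timezone

# Hypothetical schema: property name -> Python type, mirroring the Person example.
PERSON_SCHEMA = {"name": str, "age": int}


def convert_types(record, schema):
    """Type conversion: cast schema properties to their types; other keys are dropped."""
    return {key: schema[key](value) for key, value in record.items() if key in schema}


def clean(records):
    """Data cleaning: trim string values and drop exact duplicates, keeping the first."""
    seen = set()
    cleaned = []
    for record in records:
        trimmed = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        fingerprint = tuple(sorted(trimmed.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(trimmed)
    return cleaned


def enrich(record, now=None):
    """Data enrichment: attach an ingestion timestamp (UTC, ISO 8601)."""
    now = now or datetime.now(timezone.utc)
    return {**record, "ingested_at": now.isoformat()}


raw = [
    {"name": " John Doe ", "age": "30"},
    {"name": "John Doe", "age": "30"},  # duplicate once whitespace is trimmed
    {"name": "Jane Doe", "age": "25"},
]
fixed_time = datetime(2024, 1, 1, tzinfo=timezone.utc)  # fixed so the result is repeatable
processed = [enrich(convert_types(r, PERSON_SCHEMA), fixed_time) for r in clean(raw)]
```

Cleaning runs before conversion in this sketch so that duplicates are detected on trimmed values. A real pipeline might order or combine these steps differently.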

Example: Processing a CSV File

  1. Parsing: The csv.Parser reads the CSV file and parses its contents into a list of dictionaries.
  2. Postprocessing: The postprocessor.EntityResolver maps the parsed data to existing entities in the KG. For example, it might identify “John Doe” as a Person entity with the ID P123.
  3. KG Integration: The postprocessed data is then integrated into the KG, creating new relationships and updating existing ones.
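
The three steps above can be sketched end to end as follows. This is a toy illustration under stated assumptions: the KG is a plain dictionary, entity resolution is an exact name match, new IDs come from a simple counter, and relationships are not modeled. The names resolve_entity and integrate are hypothetical and do not reflect the actual postprocessor.EntityResolver or kg_client.py interfaces.

```python
import csv
import io
import itertools

# Toy in-memory stand-in for the KG: entity ID -> properties (hypothetical data).
kg = {"P123": {"type": "Person", "name": "John Doe", "age": 29}}

# Placeholder ID scheme for new entities, for illustration only.
new_ids = (f"P{n}" for n in itertools.count(124))


def parse_csv(text):
    """Step 1: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def resolve_entity(record, graph):
    """Step 2: return the ID of an existing Person with the same name, or None."""
    for entity_id, props in graph.items():
        if props["type"] == "Person" and props["name"] == record["name"]:
            return entity_id
    return None


def integrate(records, graph):
    """Step 3: update matched entities and create new ones for unmatched records."""
    for record in records:
        props = {"type": "Person", "name": record["name"], "age": int(record["age"])}
        entity_id = resolve_entity(record, graph)
        if entity_id is None:
            entity_id = next(new_ids)
        graph[entity_id] = props
    return graph


integrate(parse_csv("name,age\nJohn Doe,30\nJane Doe,25\n"), kg)
```

After this runs, the existing entity P123 has its age updated from the CSV row, and the unmatched "Jane Doe" row becomes a new entity under the next counter ID.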

Implementation Details

  • parsers.py: Defines the various parsers for different data formats.
  • postprocessors.py: Defines the various postprocessors for data transformation and integration.
  • kg_client.py: Provides an interface for interacting with the KG.
  • utils.py: Contains utility functions used by the parsers and postprocessors.
