Database Schema and Management

The HelixML database schema stores the data that the HelixML project works with. It covers the system's main components: users, models, datasets, experiments, and experiment results.

Components:

  • Users: Stores user information, including username, password, email, and role.
  • Models: Stores information about machine learning models, including the model name, type, hyperparameters, and the dataset used for training. Performance metrics are recorded in Results, not on the model itself.
  • Datasets: Stores information about datasets used in the project, including the dataset name, description, location, and size.
  • Experiments: Stores information about machine learning experiments, including the model used, the dataset used, and the hyperparameters used. The outcome of each experiment is stored in Results.
  • Results: Stores results from machine learning experiments, including metrics such as accuracy, precision, recall, and F1-score.

Relationships:

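  • Users create and manage Models, Datasets, and Experiments (the created_by column in each table).
  • Each Model is trained on one Dataset (training_data_id).
  • Each Experiment is performed with one specific Model and one Dataset (model_id, dataset_id).
  • Results are generated from Experiments (experiment_id).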

Example Schema:

-- Tables are created in dependency order, so every foreign key
-- references a table that already exists.

CREATE TABLE Users (
  id INT PRIMARY KEY,
  username VARCHAR(255) NOT NULL,
  password VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL,
  role VARCHAR(255) NOT NULL
);

CREATE TABLE Datasets (
  id INT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  description TEXT,
  location VARCHAR(255) NOT NULL,
  size BIGINT NOT NULL,
  created_by INT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (created_by) REFERENCES Users(id)
);

-- Models references Datasets, so it must come after Datasets.
CREATE TABLE Models (
  id INT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  type VARCHAR(255) NOT NULL,
  hyperparameters JSON NOT NULL,
  training_data_id INT NOT NULL,
  created_by INT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (training_data_id) REFERENCES Datasets(id),
  FOREIGN KEY (created_by) REFERENCES Users(id)
);

CREATE TABLE Experiments (
  id INT PRIMARY KEY,
  model_id INT NOT NULL,
  dataset_id INT NOT NULL,
  hyperparameters JSON NOT NULL,
  created_by INT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (model_id) REFERENCES Models(id),
  FOREIGN KEY (dataset_id) REFERENCES Datasets(id),
  FOREIGN KEY (created_by) REFERENCES Users(id)
);

-- Note: PRECISION is a reserved word in some engines (for example MySQL);
-- quote the column name there.
CREATE TABLE Results (
  id INT PRIMARY KEY,
  experiment_id INT NOT NULL,
  accuracy FLOAT NOT NULL,
  precision FLOAT NOT NULL,
  recall FLOAT NOT NULL,
  f1_score FLOAT NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (experiment_id) REFERENCES Experiments(id)
);
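
The foreign keys in the schema imply an insertion order: a row can only be added once the rows it references exist. The following is a minimal sketch of that chain, not the project's actual loading code. It assumes SQLite (via Python's standard sqlite3 module) as a stand-in, because the source does not name the database engine; the tables are trimmed to the columns needed here, JSON is stored as text, and all sample values are invented.

```python
import json
import sqlite3

# Assumption: in-memory SQLite stands in for the unspecified production engine.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Trimmed copy of the example schema (illustrative subset of columns).
conn.executescript("""
CREATE TABLE Users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE Datasets (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  created_by INTEGER NOT NULL REFERENCES Users(id));
CREATE TABLE Models (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  hyperparameters TEXT NOT NULL,
  training_data_id INTEGER NOT NULL REFERENCES Datasets(id),
  created_by INTEGER NOT NULL REFERENCES Users(id));
CREATE TABLE Experiments (
  id INTEGER PRIMARY KEY,
  model_id INTEGER NOT NULL REFERENCES Models(id),
  dataset_id INTEGER NOT NULL REFERENCES Datasets(id),
  hyperparameters TEXT NOT NULL,
  created_by INTEGER NOT NULL REFERENCES Users(id));
CREATE TABLE Results (
  id INTEGER PRIMARY KEY,
  experiment_id INTEGER NOT NULL REFERENCES Experiments(id),
  accuracy REAL NOT NULL,
  f1_score REAL NOT NULL);
""")

# Insert parent rows before the rows that reference them.
params = json.dumps({"learning_rate": 0.01})  # hypothetical hyperparameters
conn.execute("INSERT INTO Users VALUES (1, 'alice')")
conn.execute("INSERT INTO Datasets VALUES (1, 'demo-set', 1)")
conn.execute("INSERT INTO Models VALUES (1, 'demo-model', ?, 1, 1)", (params,))
conn.execute("INSERT INTO Experiments VALUES (1, 1, 1, ?, 1)", (params,))
conn.execute("INSERT INTO Results VALUES (1, 1, 0.91, 0.89)")
conn.commit()

# Count the rows in each table to confirm the whole chain was stored.
row_counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("Users", "Datasets", "Models", "Experiments", "Results")
}
print(row_counts)
```

Reversing any two of the INSERT statements (for example, adding the Result before its Experiment) would fail with an integrity error while foreign key enforcement is on.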
          

Management:

The database is managed through SQL queries together with a dedicated database management tool. Between them, users can interact with the database, query data, and perform data analysis.
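
As an illustration of the kind of query this enables, the sketch below ranks experiments by F1-score by joining Results back to Experiments, Models, and Datasets. It is an example under stated assumptions rather than a HelixML utility: SQLite is used as a stand-in engine, the tables are cut down to the columns the query needs, and the function name top_experiments and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite with a trimmed copy of the schema and invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Models (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Datasets (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Experiments (
  id INTEGER PRIMARY KEY,
  model_id INTEGER NOT NULL REFERENCES Models(id),
  dataset_id INTEGER NOT NULL REFERENCES Datasets(id));
CREATE TABLE Results (
  id INTEGER PRIMARY KEY,
  experiment_id INTEGER NOT NULL REFERENCES Experiments(id),
  f1_score REAL NOT NULL);
INSERT INTO Models VALUES (1, 'baseline'), (2, 'tuned');
INSERT INTO Datasets VALUES (1, 'demo-set');
INSERT INTO Experiments VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO Results VALUES (1, 1, 0.71), (2, 2, 0.84);
""")


def top_experiments(conn, limit=5):
    """Return (experiment id, model name, dataset name, F1) rows, best F1 first."""
    query = """
        SELECT e.id, m.name, d.name, r.f1_score
        FROM Results AS r
        JOIN Experiments AS e ON e.id = r.experiment_id
        JOIN Models AS m ON m.id = e.model_id
        JOIN Datasets AS d ON d.id = e.dataset_id
        ORDER BY r.f1_score DESC
        LIMIT ?
    """
    # The limit is passed as a bound parameter, never formatted into the SQL string.
    return conn.execute(query, (limit,)).fetchall()


for row in top_experiments(conn):
    print(row)
```

The same join shape works for the other metrics in Results; only the column in the SELECT and ORDER BY clauses changes.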

Data Integrity:

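Data integrity is enforced through constraints. Primary keys make every row uniquely identifiable, foreign keys prevent rows from referencing records that do not exist, and column types together with NOT NULL rules reject missing or malformed values. Together these prevent data corruption and keep the data consistent.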
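
To make the effect of these constraints concrete, the sketch below attempts one valid insert and three invalid ones against a cut-down Results table. It assumes SQLite as a stand-in engine (the source does not name one), and the helper try_insert is hypothetical. SQLite is lenient about column types, so the sketch demonstrates primary key, foreign key, and NOT NULL enforcement only.

```python
import sqlite3

# Assumption: in-memory SQLite with a minimal Experiments/Results pair.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE Experiments (id INTEGER PRIMARY KEY);
CREATE TABLE Results (
  id INTEGER PRIMARY KEY,
  experiment_id INTEGER NOT NULL REFERENCES Experiments(id),
  accuracy REAL NOT NULL);
INSERT INTO Experiments VALUES (1);
""")


def try_insert(sql, params):
    """Run one insert in its own transaction and report whether it was accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


insert = "INSERT INTO Results (id, experiment_id, accuracy) VALUES (?, ?, ?)"
outcomes = [
    try_insert(insert, (1, 1, 0.9)),    # valid row
    try_insert(insert, (1, 1, 0.8)),    # duplicate primary key
    try_insert(insert, (2, 99, 0.8)),   # experiment 99 does not exist
    try_insert(insert, (3, 1, None)),   # accuracy must not be NULL
]
for outcome in outcomes:
    print(outcome)
```

Only the first insert is stored; the other three are rolled back, so the table never holds a row that violates the schema.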

Security:

The database is protected with strong passwords, access control mechanisms, and regular security audits. This helps to prevent unauthorized access to the data and ensure its confidentiality.
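
The source does not say how the password column in Users is populated. One common practice, shown here purely as an assumption and not as HelixML's documented behavior, is to store a salted hash instead of the password itself. The sketch uses PBKDF2 from Python's standard library; the function names, the storage format, and the iteration count are all illustrative choices.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune it for your hardware and policy


def hash_password(password: str) -> str:
    """Return a self-describing string: scheme$iterations$salt$digest."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _scheme, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


stored = hash_password("correct horse battery staple")
print(len(stored))                                               # fits in VARCHAR(255)
print(verify_password("correct horse battery staple", stored))  # True
print(verify_password("wrong password", stored))                # False
```

Because the salt and iteration count travel with the digest, the work factor can be raised later without invalidating existing rows.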