Loader - docker/genai-stack

The GenAI Stack, provided by Docker, is a set of tools and components designed to simplify AI/ML integration for developers. The Loader is the part of this stack responsible for importing recent Stack Overflow data for selected tags into a Knowledge Graph (KG), embedding the questions and answers, and storing the embeddings in a vector index. The Loader's main options, with an example of each:

  1. Choosing tags: Users can specify the Stack Overflow tags to import data for. For example, to import data for the machine-learning and deep-learning tags (Stack Overflow tags are lowercase and hyphenated), run the Loader with:
./loader --tags "machine-learning,deep-learning"
  2. Running imports: The Loader is run with the ./loader command. To see import progress and statistics about the data in the database, add the --verbose flag:
./loader --verbose
  3. Embedding questions and answers: The Loader uses Sentence Transformers for embedding questions and answers. The specific model can be set in the Loader's configuration file. For example, to use the 'all-MiniLM-L6-v2' model, add the following to the configuration file:
embedding:
  model: "all-MiniLM-L6-v2"
  4. Storing data in a vector index: The Loader uses Langchain to store the embedded questions and answers in a vector index. The index to use can be set in the Loader's configuration file. For example, to use an index named 'ai-ml', add the following to the configuration file:
vector_index:
  index: "ai-ml"
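Steps 3 and 4 can be illustrated with a minimal, standard-library-only sketch of what the vector index does at query time: each question is mapped to a vector, and similar questions are retrieved by cosine similarity. The three-dimensional vectors below are toy stand-ins for real Sentence Transformers embeddings (all-MiniLM-L6-v2 produces 384-dimensional vectors), and the function names are illustrative, not part of the Loader's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embedded questions. In the real Loader, these
# vectors come from a Sentence Transformers model and are stored in
# a vector index rather than a plain dict.
index = {
    "How do I train a CNN?": [0.9, 0.1, 0.0],
    "What is gradient descent?": [0.1, 0.9, 0.1],
    "How do I parse JSON in Python?": [0.0, 0.1, 0.9],
}

def nearest(query_vec, index):
    """Return the stored question most similar to the query vector."""
    return max(index, key=lambda q: cosine_similarity(index[q], query_vec))

print(nearest([0.8, 0.2, 0.1], index))  # → How do I train a CNN?
```

A production vector index (such as Neo4j's, used via Langchain) performs the same nearest-neighbour lookup, but with approximate search structures so it scales far beyond a linear scan.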

The Loader is built on several key technologies and dependencies, including Docker, Docker Compose, Python, Neo4j, OpenAI, Boto3, FastAPI, Torch, and Sentence Transformers. It adheres to Buildpack API version 0.8 and can work with various stacks, such as 'io.buildpacks.stacks.jammy'. The Loader's source code is available on GitHub.
