This document outlines the production scaling strategies for the docker/genai-stack project. It focuses on code examples and technical configurations, serving as a practical guide to achieving scalability in production.
Overview
Scaling the docker/genai-stack in a production environment involves efficient resource management, container orchestration, and configuring services to handle increased loads. The following section describes individual services, Docker configurations, and deployment strategies to facilitate production scaling.
Docker Compose Configuration
The primary configuration for scaling the application in production is managed through the docker-compose.yml file. Each service can be replicated and monitored, ensuring load balancing and redundancy.
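For orientation, here is a minimal sketch of how such a file is organized; the image names and replica counts below are placeholders, and the authoritative definitions live in the project's docker-compose.yml:
services:
  database:
    image: neo4j:5.11
    networks:
      - net
  api:
    image: genai-stack-api   # placeholder; in practice this service is built from the repository
    networks:
      - net
    deploy:
      replicas: 2            # replica counts are honored when deployed to Docker Swarm
networks:
  net: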
Service Configuration
The key services are configured with scalability in mind. The settings most relevant to production scaling are described below:
- LLM Services: The llm and llm-gpu services can be configured to run on multiple instances, especially under GPU profiles for handling intensive computations, as shown below.
llm-gpu:
  <<: *llm
  profiles: ["linux-gpu"]
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
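Because llm-gpu is gated behind a Compose profile, it starts only when that profile is activated, for example with docker compose --profile linux-gpu up.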
Database Scaling
For databases, replication can be configured to ensure fault tolerance and high availability. The Neo4j instance can benefit from clustering (a Neo4j Enterprise Edition feature) for read-write scaling. Adjust the configuration to the deployment requirements, and use volume persistence so data survives container restarts.
database:
  user: neo4j:neo4j
  image: neo4j:5.11
  ports:
    - 7687:7687
    - 7474:7474
  volumes:
    - $PWD/data:/data
  environment:
    - NEO4J_AUTH=${NEO4J_USERNAME-neo4j}/${NEO4J_PASSWORD-password}
    - NEO4J_PLUGINS=["apoc"]
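For production persistence, a named volume is often preferable to the host bind mount shown above, since Docker then manages the data location. A minimal sketch; the volume name neo4j_data is illustrative:
database:
  image: neo4j:5.11
  volumes:
    - neo4j_data:/data

volumes:
  neo4j_data: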
Load Balancing Container Instances
Using Docker Swarm or Kubernetes is essential for managing multiple instances of services such as api, bot, and pdf_bot. Configure scaling and load balancing settings similar to the following example:
api:
  deploy:
    replicas: 3
  healthcheck:
    test: ["CMD-SHELL", "wget --no-verbose --tries=1 http://localhost:8504/ || exit 1"]
    interval: 5s
    timeout: 3s
    retries: 5
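Under Docker Swarm, the same deploy block can also control how replicas are rolled out and restarted. The following sketch uses standard Swarm deploy keys; the values are illustrative:
api:
  deploy:
    replicas: 3
    update_config:
      parallelism: 1      # update one replica at a time
      delay: 10s          # pause between update batches
    restart_policy:
      condition: on-failure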
Multi-Container Docker Networking
Utilize the defined network for inter-service communication. All services should be attached to the same network so they can reach each other, with service discovery handled by Docker's built-in DNS resolution.
networks:
  net:
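Each service joins the network by listing it under its own networks key, and Docker's built-in DNS then resolves service names such as the database host in neo4j://database:7687. A sketch:
services:
  api:
    networks:
      - net
  database:
    networks:
      - net

networks:
  net:
    driver: bridge   # use an overlay driver for multi-node Swarm clusters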
Environment Variable Management
Dynamic environment variables allow for easy adaptation across deployment environments such as staging and production. Use .env files or Docker secrets for sensitive information.
environment:
  - NEO4J_URI=${NEO4J_URI-neo4j://database:7687}
  - OPENAI_API_KEY=${OPENAI_API_KEY}
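For sensitive values such as the OpenAI key, Docker secrets avoid exposing credentials as plain environment variables. A minimal sketch, assuming the secret is stored in a local file (the path is illustrative):
services:
  api:
    secrets:
      - openai_api_key        # mounted at /run/secrets/openai_api_key inside the container

secrets:
  openai_api_key:
    file: ./secrets/openai_api_key.txt   # illustrative path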
Deployment Strategies
System Resource Allocation
Resource constraints such as memory and CPU limits can be defined in the docker-compose.yml file, allowing resources to be managed effectively without overwhelming the host machine.
deploy:
  resources:
    limits:
      cpus: '1.0'
      memory: 512M
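Alongside limits, reservations can guarantee a baseline share of resources for a service; the values below are illustrative and should be tuned to the workload:
deploy:
  resources:
    limits:
      cpus: '1.0'
      memory: 512M
    reservations:
      cpus: '0.25'
      memory: 256M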
Horizontal Scaling with Docker Swarm
To scale horizontally in a production environment, consider orchestrating services with Docker Swarm. Swarm does not auto-scale on its own, but a service can be scaled manually with the following command:
docker service scale <service_name>=<number_of_replicas>
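In Swarm mode the stack is first deployed from the Compose file, after which the scale command above adjusts replica counts at runtime. The stack name genai-stack below is illustrative:
docker stack deploy -c docker-compose.yml genai-stack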
Monitoring and Health Checks
Integrate monitoring solutions such as Prometheus and Grafana for insight into application health and performance. Health checks in the Docker configuration mark failing containers as unhealthy; under Docker Swarm, unhealthy tasks are replaced automatically.
healthcheck:
  test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider localhost:7474 || exit 1"]
  interval: 15s
  timeout: 30s
  retries: 10
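On the monitoring side, a Prometheus scrape configuration could target the stack's services. The job name and target below are assumptions; the services must actually expose a metrics endpoint for scraping to succeed:
# prometheus.yml (illustrative)
scrape_configs:
  - job_name: 'genai-stack'        # assumed job name
    scrape_interval: 15s
    static_configs:
      - targets: ['api:8504']      # assumes the api service exposes metrics on port 8504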
Conclusion
By following the configurations and strategies outlined in this document, developers can effectively scale the docker/genai-stack in a production environment. Ensure that testing precedes deployment to maintain reliability and performance under load.
Source: Docker documentation and related scaling resources.