Production monitoring is critical for maintaining the performance and reliability of the project. Below are the steps and key components used to monitor the project in a production environment.

Instrumentation and Metrics

  1. Prometheus Configuration

    Prometheus is utilized for collecting metrics. The configuration is defined in the prometheus.yml file. Here’s an example of the relevant configuration for scraping application metrics:

    scrape_configs:
      - job_name: 'gitlab-app'
        static_configs:
          - targets: ['localhost:8080']
    

    With this configuration, Prometheus scrapes the application at localhost:8080. Because no metrics_path is set, it requests the default path, /metrics.
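
    As a quick sanity check, the scrape configuration can be parsed to list the URLs Prometheus would request. The sketch below is illustrative only: the scrape_urls helper and the inline copy of the configuration are assumptions for this example, and it only handles static_configs with the default scheme (http) and metrics path (/metrics).

```ruby
require 'yaml'

# Hypothetical inline copy of the scrape configuration.
SCRAPE_CONFIG = <<~YAML
  scrape_configs:
    - job_name: 'gitlab-app'
      static_configs:
        - targets: ['localhost:8080']
YAML

# Returns a hash of job name => scrape URLs. Falls back to the Prometheus
# defaults (scheme http, metrics_path /metrics) when a job does not set them.
def scrape_urls(yaml_text)
  config = YAML.safe_load(yaml_text)
  config.fetch('scrape_configs', []).to_h do |job|
    scheme = job.fetch('scheme', 'http')
    path = job.fetch('metrics_path', '/metrics')
    targets = job.fetch('static_configs', []).flat_map { |sc| sc.fetch('targets', []) }
    [job.fetch('job_name'), targets.map { |target| "#{scheme}://#{target}#{path}" }]
  end
end

scrape_urls(SCRAPE_CONFIG).each do |job, urls|
  puts "#{job}: #{urls.join(', ')}"
end
```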

  2. Metrics Collection

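    Metrics are exposed via an HTTP endpoint, which the prometheus-client gem can serve at /metrics through its Prometheus::Middleware::Exporter Rack middleware. The code snippet below illustrates how to register a counter and increment it on every request within a Ruby on Rails application: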

    require 'prometheus/client'
    
    # Labels must be declared when the counter is created. A constant keeps
    # the counter reachable from inside the controller class.
    HTTP_REQUESTS_TOTAL = Prometheus::Client::Counter.new(
      :http_requests_total,
      docstring: 'A counter of HTTP requests made.',
      labels: [:method, :path]
    )
    
    Prometheus::Client.registry.register(HTTP_REQUESTS_TOTAL)
    
    class ApplicationController < ActionController::Base
      def process_action(method_name, *args)
        # Raw paths can produce many label values; consider a route name instead
        HTTP_REQUESTS_TOTAL.increment(labels: { method: request.method, path: request.path })
        super
      end
    end
    

    The above code ensures that every incoming HTTP request increments the http_requests_total counter.
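
    To make the scraped output concrete, the sketch below renders a labeled counter in the Prometheus text exposition format using only the Ruby standard library. SketchCounter is a hypothetical class written for this illustration; in the application the prometheus-client gem produces this output, and label value escaping is omitted here for brevity.

```ruby
# Minimal in-memory counter that renders itself in the Prometheus text
# exposition format. Illustrative only; not the prometheus-client gem.
class SketchCounter
  def initialize(name, docstring)
    @name = name
    @docstring = docstring
    @values = Hash.new(0.0) # one running total per label set
  end

  # Adds one to the series identified by the given labels.
  def increment(**labels)
    @values[labels] += 1
  end

  # Returns HELP and TYPE metadata followed by one sample line per label set.
  def render
    lines = ["# HELP #{@name} #{@docstring}", "# TYPE #{@name} counter"]
    @values.each do |labels, value|
      label_text = labels.map { |key, val| %(#{key}="#{val}") }.join(',')
      lines << "#{@name}{#{label_text}} #{value}"
    end
    "#{lines.join("\n")}\n"
  end
end

counter = SketchCounter.new('http_requests_total', 'A counter of HTTP requests made.')
counter.increment(method: 'GET', path: '/projects')
counter.increment(method: 'GET', path: '/projects')
counter.increment(method: 'POST', path: '/projects')
puts counter.render
```

    Each distinct combination of label values becomes its own time series, which is why high-cardinality labels such as raw request paths deserve caution.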

Log Aggregation

  1. Structured Logging

    Structured logging facilitates log parsing and querying. The following Ruby code demonstrates how to implement structured logging using the Logger class from Ruby's standard library:

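    require 'json'
    require 'logger'
    
    class CustomLogger
      def initialize
        @logger = Logger.new(STDOUT)
        # Emit only the message so each line is a standalone JSON document
        @logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }
      end
    
      # The response is passed in explicitly because it is not otherwise in scope
      def log_request(request, response)
        @logger.info(
          {
            time: Time.now,
            method: request.method,
            path: request.path,
            status: response.status
          }.to_json
        )
      end
    end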
    

    This implementation logs requests in JSON format, allowing easier integration with log management systems.
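
    The benefit shows up when the logs are queried. The following sketch writes a few JSON log lines to an in-memory buffer and then filters them by status code. The sample requests and the server_errors helper are hypothetical, chosen only to illustrate the idea.

```ruby
require 'json'
require 'logger'
require 'stringio'

# Writes each entry as one JSON line and returns the resulting log text.
def write_json_log(entries)
  buffer = StringIO.new
  logger = Logger.new(buffer)
  # Emit only the message so each line is a standalone JSON document.
  logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }
  entries.each { |entry| logger.info(entry.to_json) }
  buffer.string
end

# Parses every line back into a hash and keeps responses with status >= 500.
def server_errors(log_text)
  log_text.each_line.map { |line| JSON.parse(line) }.select { |entry| entry['status'] >= 500 }
end

# Hypothetical sample requests.
SAMPLE_REQUESTS = [
  { method: 'GET', path: '/projects', status: 200 },
  { method: 'POST', path: '/projects', status: 500 },
  { method: 'GET', path: '/health', status: 200 }
].freeze

server_errors(write_json_log(SAMPLE_REQUESTS)).each do |entry|
  puts "#{entry['method']} #{entry['path']} -> #{entry['status']}"
end
```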

  2. Centralized Logging Setup

    For centralized log aggregation, ship the logs to a backend such as Elasticsearch or a hosted logging platform. The following example demonstrates how to configure a Fluentd input that tails the application's log files:

    <source>
      @type tail
      path /var/log/gitlab/*.log
      pos_file /var/log/gitlab/fluentd.pos
      tag gitlab.*
      <parse>
        @type json
      </parse>
    </source>
    

    With this setup, Fluentd tails the specified log files and parses each new line as JSON. Forwarding the parsed records to the central log management system additionally requires a <match> section with an output plugin, which is not shown here.
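
    The role of the position file can be illustrated with a simplified model: remember the byte offset already read, and on the next pass read only what was appended after it. The sketch below is not Fluentd's implementation. The read_new_records helper is hypothetical, and it assumes every line in the file is a completely written JSON document.

```ruby
require 'json'
require 'tempfile'

# Reads the lines appended since the stored offset and returns
# [parsed_records, new_offset]. Assumes each line is fully written JSON.
def read_new_records(log_path, offset)
  File.open(log_path) do |file|
    file.seek(offset)
    records = file.each_line.map { |line| JSON.parse(line) }
    [records, file.pos]
  end
end

# Demonstration against a temporary file standing in for an application log.
log = Tempfile.new(['app', '.log'])
log.puts({ path: '/projects', status: 200 }.to_json)
log.flush

first_batch, offset = read_new_records(log.path, 0)
puts "first pass: #{first_batch.length} record(s), offset #{offset}"

log.puts({ path: '/health', status: 200 }.to_json)
log.flush

second_batch, offset = read_new_records(log.path, offset)
puts "second pass: #{second_batch.length} record(s), offset #{offset}"
log.close!
```

    Persisting the offset between runs is what lets a tailing agent resume after a restart without re-sending lines it has already shipped.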

Alerting

  1. Alert Rules in Prometheus

    Alerting rules are defined in a separate rules file that prometheus.yml references through its rule_files setting. Here is an example of an alert rule that triggers when the error rate exceeds a threshold:

    groups:
      - name: application_alerts
        rules:
          - alert: HighErrorRate
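            # Fraction of requests answered with a 5xx status; assumes the counter carries a status label
            expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.1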
            for: 10m
            labels:
              severity: critical
            annotations:
              summary: "High error rate detected"
              description: "The error rate has been above 10% for the last 10 minutes."
    

    This rule helps to quickly identify and respond to high error rates within the application.
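
    The 10% threshold is a ratio of failed requests to all requests, not a raw per-second rate. The sketch below works through that arithmetic on hypothetical counter samples. It is a simplification: the real rate() function in Prometheus also extrapolates to the window boundaries and handles counter resets.

```ruby
# Each sample is [unix_timestamp_seconds, counter_value]; values are hypothetical.

# Average per-second increase of a counter between two samples.
def per_second_rate(first, last)
  (last[1] - first[1]).to_f / (last[0] - first[0])
end

# Fraction of requests that failed over the sampled window.
def error_ratio(error_samples, total_samples)
  per_second_rate(*error_samples) / per_second_rate(*total_samples)
end

ERROR_SAMPLES = [[0, 40], [300, 100]].freeze     # 60 errors in 5 minutes
TOTAL_SAMPLES = [[0, 1000], [300, 1400]].freeze  # 400 requests in 5 minutes

ratio = error_ratio(ERROR_SAMPLES, TOTAL_SAMPLES)
puts format('errors per second: %.2f', per_second_rate(*ERROR_SAMPLES))
puts format('error ratio: %.2f, above threshold: %s', ratio, ratio > 0.1)
```

    With these samples the error counter grows by 0.2 per second while the ratio is 0.15, which shows why a threshold expressed as a percentage has to be compared against the ratio rather than against the raw rate.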

  2. Integrating with Alerting Systems

    Alerts can be routed to tools like PagerDuty or Slack. The following example illustrates how to configure Alertmanager to send alerts to Slack:

    receivers:
      - name: 'slack-notifications'
        slack_configs:
          - api_url: '<YOUR_SLACK_WEBHOOK_URL>'
            channel: '#alerts'
            text: '{{ .CommonAnnotations.summary }}: {{ .CommonAnnotations.description }}'
    

    Once a route in the Alertmanager configuration points to this receiver, firing alerts are posted to the specified Slack channel.
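
    Before wiring the webhook into Alertmanager, it can help to confirm that the webhook URL accepts messages at all. The sketch below builds a plain-text test message for a Slack incoming webhook. The helper names are hypothetical, the message is a hand-written test and not the payload Alertmanager itself sends, and the HTTP call is defined but not invoked because it needs a real webhook URL.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds the JSON body for a Slack incoming webhook test message.
def slack_test_payload(summary, description)
  JSON.generate(text: "#{summary}: #{description}")
end

# Posts the payload and returns the Net::HTTPResponse.
# Not called in this sketch; supply a real webhook URL to use it.
def post_to_slack(webhook_url, payload)
  Net::HTTP.post(URI(webhook_url), payload, 'Content-Type' => 'application/json')
end

payload = slack_test_payload('High error rate detected', 'Test message from the monitoring setup.')
puts payload
```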

Health Checks

  1. Application Health Endpoint

    Implementing health checks is essential for monitoring the application’s availability. Below is an example of a health check endpoint in Rails:

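    # config/routes.rb
    Rails.application.routes.draw do
      # Rack endpoint that answers 200 OK without going through a controller
      get '/health', to: ->(_env) { [200, { 'content-type' => 'text/plain' }, ['OK']] }
    end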
    

    This endpoint can be monitored by external tools to ensure that the application is running properly.
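
    A health endpoint is more informative when it also verifies the dependencies the application needs. The sketch below is a Rack-compatible callable that runs a set of checks and returns 503 if any of them fails. The check names and their always-true bodies are placeholders, not part of the project.

```ruby
require 'json'

# Hypothetical dependency checks; each callable returns true when healthy.
HEALTH_CHECKS = {
  database: -> { true },
  cache: -> { true }
}.freeze

# Runs every check and builds a Rack response triplet:
# 200 when all checks pass, 503 when any check fails or raises.
def health_response(checks)
  results = checks.transform_values do |check|
    begin
      check.call ? 'ok' : 'failed'
    rescue StandardError
      'failed'
    end
  end
  status = results.values.all?('ok') ? 200 : 503
  [status, { 'content-type' => 'application/json' }, [JSON.generate(results)]]
end

# Rack-compatible endpoint that could be mounted at /health.
HEALTH_APP = ->(_env) { health_response(HEALTH_CHECKS) }

status, _headers, body = HEALTH_APP.call({})
puts "#{status} #{body.first}"
```

    Returning a non-2xx status on failure matters because most external monitors and HTTP probes decide health from the status code alone.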

  2. Kubernetes Probes

    For applications running on Kubernetes, liveness and readiness probes should be defined. An excerpt from the pod spec of a Kubernetes Deployment might look like this:

    spec:
      containers:
        - name: gitlab-app
          image: gitlab/gitlab-app:latest
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 5
    

    With this setup, Kubernetes restarts a container whose liveness probe keeps failing and stops routing Service traffic to a pod whose readiness probe fails, until the probe succeeds again.
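
    The probe settings determine how quickly a failure is noticed. As a rough estimate, a probe is considered failed after failureThreshold consecutive failures, and Kubernetes defaults failureThreshold to 3 when it is not set. The sketch below applies that to the periods used above; the helper name is hypothetical, and the result ignores probe timeouts and scheduling jitter.

```ruby
# Kubernetes uses a failureThreshold of 3 when the probe does not set one.
DEFAULT_FAILURE_THRESHOLD = 3

# Approximate seconds of continuous failure before a probe is considered failed.
def seconds_to_detect_failure(period_seconds:, failure_threshold: DEFAULT_FAILURE_THRESHOLD)
  period_seconds * failure_threshold
end

liveness = seconds_to_detect_failure(period_seconds: 10)
readiness = seconds_to_detect_failure(period_seconds: 5)
puts "liveness failure detected after roughly #{liveness}s"
puts "readiness failure detected after roughly #{readiness}s"
```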

These steps outline the key components and configurations for production monitoring, ensuring high performance, reliability, and responsiveness to issues that arise in the application environment.

Source: Internal project documentation.