Monitoring and Logging for gitlab-org/...

Production monitoring is critical for maintaining the performance and reliability of the project. Below are the steps and key components used to monitor the project in a production environment.

Instrumentation and Metrics

Prometheus Configuration

Prometheus collects the metrics. Its configuration lives in the prometheus.yml file; the relevant section for scraping application metrics looks like this:
```
scrape_configs:
  - job_name: 'gitlab-app'
    static_configs:
      - targets: ['localhost:8080']
```
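This configuration tells Prometheus to scrape the application listening on localhost:8080. Because no metrics_path is set, Prometheus requests the default /metrics path at the global scrape_interval.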
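Each scrape returns samples in Prometheus's plain-text exposition format, one sample per line. As a rough illustration of what Prometheus reads from the target, the hypothetical `parse_exposition` helper below turns such a response into a hash keyed by series. It is a simplified sketch, not Prometheus's own parser: it ignores optional timestamps and assumes label values contain no spaces.

```ruby
# Hypothetical helper: parse Prometheus text exposition output into a hash
# keyed by series (metric name plus label set). Skips blank lines and
# "# HELP" / "# TYPE" comments; assumes label values contain no spaces.
def parse_exposition(text)
  text.each_line.with_object({}) do |line, series|
    line = line.strip
    next if line.empty? || line.start_with?('#')

    # A sample line is "<series> <value> [<timestamp>]"; the timestamp is ignored.
    name, value, _timestamp = line.split(/\s+/)
    series[name] = Float(value)
  end
end

sample = <<~METRICS
  # HELP http_requests_total A counter of HTTP requests made.
  # TYPE http_requests_total counter
  http_requests_total{method="GET",path="/",status="200"} 42
  http_requests_total{method="GET",path="/",status="500"} 3
METRICS

parse_exposition(sample).each { |series, value| puts "#{series} => #{value}" }
```

Each distinct label set is a separate series, which is why the two lines above produce two entries even though they share a metric name.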

Metrics Collection

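Metrics are exposed via an HTTP endpoint that Prometheus scrapes. The snippet below uses the prometheus-client gem to define a request counter and increment it from a Ruby on Rails controller. Serving the registry over HTTP is handled separately, for example by adding the gem's Prometheus::Middleware::Exporter Rack middleware, which responds on /metrics by default: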

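```
require 'prometheus/client'

# Register the counter once in the default registry. Recent versions of the
# prometheus-client gem require label names to be declared up front.
HTTP_REQUESTS_TOTAL = Prometheus::Client::Counter.new(
  :http_requests_total,
  docstring: 'A counter of HTTP requests made.',
  labels: [:method, :path, :status]
)
Prometheus::Client.registry.register(HTTP_REQUESTS_TOTAL)

class ApplicationController < ActionController::Base
  # Count the request after the action runs so the response status is known.
  def process_action(*args)
    super
    count_request(response.status)
  rescue StandardError => e
    # Unhandled exceptions are mapped to a status (usually 500) by Rails later on.
    count_request(ActionDispatch::ExceptionWrapper.status_code_for_exception(e.class.name))
    raise
  end

  private

  def count_request(status)
    HTTP_REQUESTS_TOTAL.increment(labels: { method: request.method, path: request.path, status: status.to_s })
  end
end
```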

With this in place, every request that reaches a controller action increments the http_requests_total counter.
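To make the effect of labels concrete, the sketch below uses a hypothetical in-memory `SketchCounter` class (not part of prometheus-client) to render a labelled counter in the text exposition format, roughly as the metrics endpoint would present it. Label values are not escaped, which a real client library does handle.

```ruby
# Hypothetical in-memory counter, used only to show what a labelled counter
# looks like when scraped; in the application the prometheus-client gem keeps
# these values. Label values are not escaped here.
class SketchCounter
  def initialize(name, docstring)
    @name = name
    @docstring = docstring
    @values = Hash.new(0.0) # one time series per distinct label set
  end

  def increment(labels: {})
    @values[labels] += 1
  end

  # Render the counter in the Prometheus text exposition format.
  def to_exposition
    lines = ["# HELP #{@name} #{@docstring}", "# TYPE #{@name} counter"]
    @values.each do |labels, value|
      label_str = labels.map { |key, val| %(#{key}="#{val}") }.join(',')
      lines << "#{@name}{#{label_str}} #{value}"
    end
    lines.join("\n") + "\n"
  end
end

counter = SketchCounter.new(:http_requests_total, 'A counter of HTTP requests made.')
2.times { counter.increment(labels: { method: 'GET', path: '/', status: '200' }) }
counter.increment(labels: { method: 'GET', path: '/users/1', status: '200' })
puts counter.to_exposition
```

Note that /users/1 produced its own series. Labelling by the raw request path creates one series per distinct URL, so a route template or controller/action name is usually a safer label value in production.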

Log Aggregation

Structured Logging

Structured logging facilitates log parsing and querying. The following Ruby code demonstrates how to implement structured logging using the logger gem:

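```
require 'json'
require 'logger'
require 'time'

class CustomLogger
  def initialize(io = $stdout)
    @logger = Logger.new(io)
    # Emit only the message so each line is a bare JSON object
    # (the default formatter prepends severity and a timestamp).
    @logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }
  end

  # Logs one request/response pair as a single JSON line.
  def log_request(request, response)
    @logger.info(
      {
        time: Time.now.utc.iso8601,
        method: request.method,
        path: request.path,
        status: response.status
      }.to_json
    )
  end
end
```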

This implementation logs requests in JSON format, allowing easier integration with log management systems.
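A variation on the same idea is to put the JSON encoding into the Logger's formatter, so that every log call, not just request logging, produces one JSON object per line. The sketch below assumes a hypothetical `JSON_LOG_FORMATTER` with illustrative field names (`time`, `severity`, `message`); it writes to a StringIO so the output can be inspected.

```ruby
require 'json'
require 'logger'
require 'stringio'
require 'time'

# Hypothetical formatter: every log call becomes one JSON object per line.
# Hash messages are merged into the top level; anything else goes under "message".
JSON_LOG_FORMATTER = proc do |severity, time, progname, msg|
  entry = { time: time.utc.iso8601(3), severity: severity, progname: progname }
  entry.merge!(msg.is_a?(Hash) ? msg : { message: msg.to_s })
  "#{entry.compact.to_json}\n" # compact drops a nil progname
end

io = StringIO.new
logger = Logger.new(io)
logger.formatter = JSON_LOG_FORMATTER

logger.info({ method: 'GET', path: '/health', status: 200 })
logger.warn('cache miss')

puts io.string
```

Because each line is a complete JSON document, a collector can parse the file line by line without custom regular expressions.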

Centralized Logging Setup

For centralized log aggregation, run a log store such as Elasticsearch or a hosted logging platform, and ship logs to it with a collector such as Fluentd. The following example configures a Fluentd input that tails the application's log files:
```
<source>
  @type tail
  path /var/log/gitlab/*.log
  pos_file /var/log/gitlab/fluentd.pos
  tag gitlab.*
  <parse>
    @type json
  </parse>
</source>
```
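With this setup, Fluentd tails the matching log files, parses each line as JSON, and records its read position in the pos_file so it can resume after a restart. A matching output section (for example `<match gitlab.**>`, not shown here) then forwards the events to the central log management system.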
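The role of pos_file is easiest to see in a small model of tailing. The hypothetical `read_new_json_lines` helper below is a simplified sketch, not Fluentd's implementation: it resumes from a saved byte offset, parses only complete lines, and returns the new offset to persist. Log rotation and truncation, which in_tail handles, are ignored.

```ruby
require 'json'
require 'tempfile'

# Hypothetical helper approximating tail + pos_file: resume reading at a saved
# byte offset, parse each complete line as JSON, and return the offset to
# persist for next time. A trailing line without "\n" is left for the next call.
# Log rotation and truncation are ignored here.
def read_new_json_lines(path, offset)
  events = []
  File.open(path, 'r:UTF-8') do |file|
    file.seek(offset)
    file.each_line do |line|
      break unless line.end_with?("\n") # incomplete write: try again later
      events << JSON.parse(line)
      offset += line.bytesize
    end
  end
  [events, offset]
end

log = Tempfile.new(['production', '.log'])
log.write(%({"method":"GET","status":200}\n))
log.flush
events, pos = read_new_json_lines(log.path, 0)

# A second event arrives, followed by a line that is still being written.
log.write(%({"method":"POST","status":500}\n{"method":"GE))
log.flush
more, pos = read_new_json_lines(log.path, pos)

p events, more, pos
```

Persisting the returned offset is what lets a restarted collector avoid both re-sending and skipping events.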

Alerting

Alert Rules in Prometheus

Alerting rules are not placed in prometheus.yml itself; they live in separate rule files that prometheus.yml loads through its rule_files setting. Here is an example of an alert rule that triggers when the error rate exceeds a threshold:

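```
# A rule file loaded via rule_files in prometheus.yml
groups:
  - name: application_alerts
    rules:
      - alert: HighErrorRate
        # Fraction of requests answered with a 5xx status over the last 5 minutes.
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "More than 10% of requests have returned 5xx responses for at least 10 minutes."
```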

This rule helps to quickly identify and respond to high error rates within the application.
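Expressing the threshold as a fraction of all requests (10%) requires dividing the error rate by the total request rate. The sketch below works through that arithmetic with a hypothetical `error_ratio` helper and made-up counter samples; it ignores counter resets and the extrapolation that PromQL's rate() performs, so it only approximates what Prometheus computes.

```ruby
# Worked example, assuming two samples of each counter taken `window` seconds
# apart: per-second rates are delta / window, and the error ratio is the 5xx
# rate over the total rate. Counter resets and rate() extrapolation are ignored.
def error_ratio(errors_before:, errors_after:, total_before:, total_after:, window: 300)
  error_rate = (errors_after - errors_before).fdiv(window)
  total_rate = (total_after - total_before).fdiv(window)
  return 0.0 if total_rate.zero?

  error_rate / total_rate
end

# Hypothetical samples: 60 errors out of 400 requests in 5 minutes.
ratio = error_ratio(errors_before: 100, errors_after: 160, total_before: 10_000, total_after: 10_400)
puts format('error ratio: %.2f', ratio)
puts 'condition true; the alert stays pending until it has held for 10m' if ratio > 0.1
```

The window cancels out, so the ratio is simply 60 / 400 = 0.15. A bare `rate(...) > 0.1`, by contrast, compares against 0.1 errors per second regardless of how much traffic the application serves.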

Integrating with Alerting Systems

Alerts can be routed to tools like PagerDuty or Slack. The following example illustrates how to configure Alertmanager to send alerts to Slack:
```
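# Every alert is routed to the Slack receiver unless a sub-route overrides it.
route:
  receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: '<YOUR_SLACK_WEBHOOK_URL>'
        channel: '#alerts'
        text: '{{ .CommonAnnotations.summary }}: {{ .CommonAnnotations.description }}'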
```
Alertmanager then posts each notification to the specified Slack channel, after a short grouping delay (group_wait, 30 seconds by default).
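The Slack text uses `.CommonAnnotations`, which holds only the annotations shared, with identical values, by every alert in a notification group. The sketch below models that rule with a hypothetical `common_annotations` helper and made-up alerts; it illustrates the semantics and is not Alertmanager's code.

```ruby
# Sketch of how .CommonAnnotations is derived for a grouped notification:
# only annotation pairs present, with the same value, on every alert in the
# group are kept. The alerts below are hypothetical.
def common_annotations(alerts)
  all = alerts.map { |alert| alert[:annotations] }
  return {} if all.empty?

  all.reduce { |common, annotations| common.select { |key, value| annotations[key] == value } }
end

alerts = [
  { labels: { alertname: 'HighErrorRate', instance: 'web-1' },
    annotations: { summary: 'High error rate detected', description: 'web-1 error ratio is 0.15' } },
  { labels: { alertname: 'HighErrorRate', instance: 'web-2' },
    annotations: { summary: 'High error rate detected', description: 'web-2 error ratio is 0.12' } }
]

p common_annotations(alerts)
```

In the rule above the description is static, so it survives grouping. A description that embeds per-alert values would differ between alerts and drop out of CommonAnnotations, leaving the Slack message without it.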

Health Checks

Application Health Endpoint

Implementing health checks is essential for monitoring the application’s availability. Below is an example of a health check endpoint in Rails:
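```
# config/routes.rb
Rails.application.routes.draw do
  # Lightweight check: responds 200 "OK" without touching the database.
  get '/health', to: ->(_env) { [200, { 'content-type' => 'text/plain' }, ['OK']] }
end
```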
This endpoint can be monitored by external tools to ensure that the application is running properly.
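The /health endpoint only shows that the process can answer HTTP. A readiness check often also verifies dependencies. The sketch below is a hypothetical `ReadinessCheck` class written as a plain Rack-style callable (`call(env)` returning `[status, headers, body]`); the check names and lambdas are stand-ins for real database or cache pings.

```ruby
# Hypothetical readiness endpoint: runs named dependency checks and returns
# 503 listing the failures, or 200 when all pass. Rack-style callable.
class ReadinessCheck
  def initialize(checks)
    @checks = checks # name => callable returning true when healthy
  end

  def call(_env)
    failures = @checks.reject { |_name, check| safe_call(check) }.keys
    if failures.empty?
      [200, { 'content-type' => 'text/plain' }, ['OK']]
    else
      [503, { 'content-type' => 'text/plain' }, ["unavailable: #{failures.join(', ')}"]]
    end
  end

  private

  # Treat an exception raised by a check as a failed check, not a crash.
  def safe_call(check)
    check.call
  rescue StandardError
    false
  end
end

ready = ReadinessCheck.new({ database: -> { true }, cache: -> { raise 'timeout' } })
status, _headers, body = ready.call({})
puts "#{status} #{body.first}"
```

Such an object could be mounted in routes with `to:`. Keeping dependency checks out of the liveness endpoint avoids restarting healthy containers just because a shared dependency is briefly down.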

Kubernetes Probes

For applications running on Kubernetes, define liveness and readiness probes. The container section of a Deployment's pod template might look like this:

```
spec:
  containers:
    - name: gitlab-app
      image: gitlab/gitlab-app:latest
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 5
```

With these probes, the kubelet restarts the container when the liveness probe keeps failing, and Kubernetes stops routing Service traffic to the pod while the readiness probe fails.
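How quickly the probes react follows from periodSeconds and failureThreshold (which defaults to 3 when omitted, as in the spec above). The hypothetical `detection_window` helper below works through that arithmetic for a container that was healthy and then starts failing; it ignores timeoutSeconds and scheduling jitter, so the figures are approximate.

```ruby
# Rough worked example: the kubelet acts after failure_threshold consecutive
# failed probes. Depending on where in the probe period the failure begins,
# that takes between (failure_threshold - 1) and failure_threshold periods.
# timeoutSeconds and jitter are ignored.
def detection_window(period_seconds:, failure_threshold: 3)
  [(failure_threshold - 1) * period_seconds, failure_threshold * period_seconds]
end

{ liveness: 10, readiness: 5 }.each do |probe, period|
  min, max = detection_window(period_seconds: period)
  puts "#{probe}: acts roughly #{min}-#{max}s after the app starts failing"
end
```

With the values above, a hung process is restarted roughly 20 to 30 seconds after it stops answering, while the pod is taken out of Service endpoints within about 10 to 15 seconds. initialDelaySeconds only delays the first probe after startup.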

Together, these components cover metrics, logs, alerting, and health checks, so that performance regressions and failures in the production environment are detected and surfaced quickly.

Source: Internal project documentation.