Production Monitoring in Screenly Anthias

Overview

Monitoring the Anthias application in production combines a few complementary checks. This document outlines how to inspect the logs of the application containers, check system uptime, and confirm that the application is still alive using a watchdog file.

Container Monitoring with Docker

To monitor the Docker containers running Anthias, read the logs of the individual services. The following command streams the logs of the server container, which show application-level events and errors. The -f flag keeps following new output until you interrupt it with Ctrl+C:

$ docker logs -f screenly-anthias-server-1

Replace screenly-anthias-server-1 with the name of another container to read the logs of other components, such as nginx, viewer, or celery. Run docker ps to list the names of the running containers on your system.
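When logs are checked from a script (for example, a cron job that looks for recent errors), it can help to wrap the same command in Python. The sketch below is an illustration, not part of Anthias: build_logs_command, fetch_logs, and find_errors are hypothetical helpers, and the "ERROR" keyword is only an assumption about what the services write to their logs. It uses the docker logs --tail and --since options to bound how much output is read instead of following with -f.

```python
import re
import subprocess
from typing import List, Optional


def build_logs_command(container: str, tail: int = 200,
                       since: Optional[str] = None) -> List[str]:
    """Build a non-following `docker logs` invocation for one container."""
    cmd = ["docker", "logs", "--tail", str(tail)]
    if since is not None:
        # Relative ("10m") or absolute timestamps, as accepted by --since.
        cmd += ["--since", since]
    cmd.append(container)
    return cmd


def fetch_logs(container: str, tail: int = 200,
               since: Optional[str] = None) -> List[str]:
    """Run `docker logs` and return its output lines."""
    result = subprocess.run(
        build_logs_command(container, tail, since),
        stdout=subprocess.PIPE,
        # docker logs replays the container's stderr on stderr; merge it in.
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def find_errors(lines: List[str], pattern: str = r"\bERROR\b") -> List[str]:
    """Return only the lines matching the (assumed) error pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]
```

For example, find_errors(fetch_logs("screenly-anthias-server-1", since="10m")) would return the matching lines from the last ten minutes. Splitting command construction from execution keeps the argument handling testable without a Docker daemon.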

Monitoring Application Uptime

System uptime is a useful availability signal: if it resets unexpectedly, the device has rebooted. The following Python function returns the system uptime in seconds:

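def get_uptime():
    """Return the system uptime in seconds (Linux only)."""
    with open('/proc/uptime', 'r') as f:
        # The first field is the number of seconds since boot.
        uptime_seconds = float(f.readline().split()[0])
    return uptime_seconds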

This function reads the first field of /proc/uptime, which reports how many seconds the system has been running since boot. Note that this is the uptime of the host, not of the Anthias containers or processes.
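Raw seconds are awkward to read in a dashboard or a log line. The following is a small sketch, assuming you want a human-readable string and a simple reboot heuristic. parse_proc_uptime, format_uptime, and recently_rebooted are hypothetical helpers, and the 600-second threshold is an arbitrary example value. Parsing is kept separate from reading the file so it can be exercised on any operating system.

```python
def parse_proc_uptime(text: str) -> float:
    """Return the first field of /proc/uptime content: seconds since boot."""
    return float(text.split()[0])


def format_uptime(seconds: float) -> str:
    """Render a duration as 'Xd Xh Xm Xs', dropping fractional seconds."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


def recently_rebooted(seconds: float, threshold: float = 600.0) -> bool:
    """Heuristic: treat an uptime below `threshold` seconds as a recent reboot."""
    return seconds < threshold
```

On a device, format_uptime(get_uptime()) yields a log-friendly string such as "1d 1h 1m 1s" for an uptime of 90061 seconds.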

Watchdog Implementation

The watchdog mechanism signals that the Anthias application is still running by regularly updating a “watchdog” file, which an external monitor (a watchdog device, according to the function's docstring) can check. The Python implementation is shown here:

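from os import path, utime

WATCHDOG_PATH = '/tmp/screenly.watchdog'  # example path; use your installation's value

def watchdog():
    """Notify the watchdog file to be used with the watchdog-device."""
    if not path.isfile(WATCHDOG_PATH):
        open(WATCHDOG_PATH, 'w').close()  # create an empty file
    else:
        utime(WATCHDOG_PATH, None)  # set access/modification times to now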

This function creates the WATCHDOG_PATH file if it does not exist and otherwise sets its access and modification times to the current time. When the running application calls it at regular intervals, the file's modification time keeps advancing; if it stops advancing, the application has likely hung or exited.
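That leaves two questions open: what calls watchdog() periodically, and what reads the file. The sketch below covers both sides under stated assumptions. heartbeat_loop, watchdog_age, and is_stale are hypothetical names, and the 30-second interval and 120-second limit are example values. The touch function is passed in, so the loop works with the watchdog() function above or any other callable. The checker only illustrates the idea of a staleness test; in a real deployment that role may be played by the watchdog device mentioned in the docstring.

```python
import os
import time
from typing import Callable, Optional


def heartbeat_loop(touch: Callable[[], None], interval: float = 30.0,
                   iterations: Optional[int] = None) -> None:
    """Call `touch` every `interval` seconds; loop forever if iterations is None."""
    count = 0
    while iterations is None or count < iterations:
        touch()
        count += 1
        # Skip the final sleep when a fixed number of iterations is requested.
        if iterations is None or count < iterations:
            time.sleep(interval)


def watchdog_age(path: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds since `path` was last modified, or None if it does not exist."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return None
    return (time.time() if now is None else now) - mtime


def is_stale(path: str, max_age: float = 120.0,
             now: Optional[float] = None) -> bool:
    """True if the watchdog file is missing or older than `max_age` seconds."""
    age = watchdog_age(path, now)
    return age is None or age > max_age
```

Treating a missing file as stale is a deliberate choice: a monitor cannot distinguish "never started" from "crashed before the first heartbeat", and both warrant attention.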

Testing Watchdog Functionality

Automated tests should verify that the watchdog behaves correctly. Example unit tests in Python might look like this:

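import os
from time import sleep

# ViewerTestCase is assumed to be the project's test base class, with self.u
# set to the module that defines watchdog() and WATCHDOG_PATH.
class TestWatchdog(ViewerTestCase):
    def test_watchdog_should_create_file_if_not_exists(self):
        # Start from a clean state: remove the file if it is present.
        try:
            os.remove(self.u.WATCHDOG_PATH)
        except OSError:
            pass
        self.u.watchdog()
        self.assertTrue(os.path.exists(self.u.WATCHDOG_PATH))

    def test_watchdog_should_update_mtime(self):
        self.u.watchdog()
        mtime = os.path.getmtime(self.u.WATCHDOG_PATH)

        # Pause briefly so the second call gets a later timestamp; this
        # assumes the filesystem records sub-second modification times.
        sleep(0.01)

        self.u.watchdog()
        mtime2 = os.path.getmtime(self.u.WATCHDOG_PATH)
        self.assertGreater(mtime2, mtime)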

These tests check that watchdog() creates the watchdog file if it does not exist and that a later call advances the file's modification time.
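The tests above depend on the project's ViewerTestCase and on a real pause between calls. A self-contained variant, sketched here as an alternative rather than the project's own test suite, passes the file path explicitly (a hypothetical watchdog(path) signature) and backdates the modification time with os.utime instead of sleeping. That way the comparison does not depend on the filesystem's timestamp resolution.

```python
import os
import tempfile
import unittest


def watchdog(path: str) -> None:
    """Same logic as watchdog() above, with the path passed in (hypothetical signature)."""
    if not os.path.isfile(path):
        open(path, "w").close()
    else:
        os.utime(path, None)


class TestWatchdogStandalone(unittest.TestCase):
    def setUp(self):
        # Each test gets its own temporary directory, removed in tearDown.
        self.tmp = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmp.name, "screenly.watchdog")

    def tearDown(self):
        self.tmp.cleanup()

    def test_creates_file_if_missing(self):
        self.assertFalse(os.path.exists(self.path))
        watchdog(self.path)
        self.assertTrue(os.path.isfile(self.path))

    def test_updates_mtime(self):
        watchdog(self.path)
        # Backdate the file by an hour instead of sleeping between calls.
        past = os.path.getmtime(self.path) - 3600
        os.utime(self.path, (past, past))
        watchdog(self.path)
        self.assertGreater(os.path.getmtime(self.path), past)
```

Saved to a test module, these can be run with python -m unittest.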

Summary of Monitoring Steps

  1. Access Container Logs: Use Docker logs to monitor application activity and debug issues.
  2. Check Uptime: Use the get_uptime() function to track how long the host system has been running and to spot unexpected reboots.
  3. Implement and Test Watchdog: Update a watchdog file at regular intervals so that an external monitor can detect when the application stops running, and cover this behavior with unit tests.
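The uptime and watchdog steps can be combined into a single status snapshot, for example to print from a cron job or hand to an external monitor. This is a hypothetical sketch: health_snapshot and its field names are illustrative, the reader functions are injected so the logic can run off-device, and the 120-second limit is an example value.

```python
import time
from typing import Callable, Dict, Optional


def health_snapshot(read_uptime: Callable[[], float],
                    watchdog_mtime: Callable[[], Optional[float]],
                    max_watchdog_age: float = 120.0,
                    now: Optional[float] = None) -> Dict[str, object]:
    """Summarize host uptime and watchdog freshness in one dict."""
    current = time.time() if now is None else now
    mtime = watchdog_mtime()  # None means the watchdog file is missing
    age = None if mtime is None else current - mtime
    return {
        "uptime_seconds": read_uptime(),
        "watchdog_age_seconds": age,
        "watchdog_ok": age is not None and age <= max_watchdog_age,
    }
```

On a device, read_uptime could be get_uptime, and watchdog_mtime a small lambda returning os.path.getmtime(WATCHDOG_PATH) when the file exists and None otherwise.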

Together, these checks cover application logs, host uptime, and application liveness for an Anthias deployment in production.
