(metrics) add prometheus metrics endpoint (#329)

* (backend) add base django-prometheus metrics

* (backend) add prometheus metrics, add the message by status metric

* (backend) Add more metrics on attachment + add 0 by default for statuses

* 🚨(backend) ignore import-outside-toplevel for collector

* 🚨(backend) add all docstring + don't access private properties

* 🔒️(backend) add basic auth to /metrics route

* 🚸(backend) use labels instead of hard-coding statuses in metric

* 📝(backend) document prometheus

* 🎨(backend) lint

* (backend) add first tests for prometheus

* 🚚(backend) move prometheus endpoint to api/<ver>/prometheus/metrics, change middleware name

* (backend) add attachment tests for prometheus metrics

* 🐛(backend) actually allow disabling prometheus

* 🎨(backend) improve logic in mw

* 🎨(backend) small review fixes

* 🚨(backend) lint

* 🐛(backend) Remove sha256 from factory

* 🎨(backend) nitpicks + linter fixes

* 🐛(poetry) remove duplicate prometheus package in dev deps

* 🐛(poetry) add missing dep

* 🐛(prom) make tests more reliable, disable by default + misc fixes

* 📝(prom) fix docstring in tests

---------

Co-authored-by: Stanislas Bruhiere <stanislas@bruhiere.fr>
Sylvain Zimmer
2025-09-05 12:22:54 +02:00
committed by GitHub
parent 647abfb2eb
commit d64679af5c
13 changed files with 535 additions and 14 deletions

@@ -206,13 +206,12 @@ The application uses a new environment file structure with `.defaults` and `.loc
| `LOGGING_LEVEL_LOGGERS_APP` | `INFO` | Application logger level | Optional |
| `LOGGING_LEVEL_HANDLERS_CONSOLE` | `INFO` | Console handler level | Optional |
## API Configuration
### Prometheus
| Variable | Default | Description | Required |
|----------|---------|-------------|----------|
| `API_USERS_LIST_LIMIT` | `5` | Default limit for user list API | Optional |
| `API_USERS_LIST_THROTTLE_RATE_SUSTAINED` | `180/hour` | Sustained throttle rate | Optional |
| `API_USERS_LIST_THROTTLE_RATE_BURST` | `30/minute` | Burst throttle rate | Optional |
| `ENABLE_PROMETHEUS` | `False` | Enable Prometheus monitoring | Optional |
| `PROMETHEUS_API_KEY` | None | Bearer token required to access metrics. If unset, the endpoint is public. Set this in production. | Optional |
### OpenAPI Schema

docs/prometheus.md

@@ -0,0 +1,49 @@
# Exported Prometheus Metrics
This service exposes the following custom Prometheus metrics via the `api/v1.0/prometheus/metrics` endpoint, in addition to all metrics exported via [django-prometheus](https://github.com/django-commons/django-prometheus).
Custom metrics are collected from the database using Django ORM and are available when the application is running.
## Metrics
### Message Status Counts
This metric is exported with labels corresponding to each possible message delivery status.
**Metric:**
```
message_status_count{status="<status>"}
```
**Example:**
- `message_status_count{status="retry"}`
- `message_status_count{status="sent"}`
**Description:**
Number of messages with the given delivery status. If no messages exist for a status, the value is `0`.
---
### Attachment Count
**Metric:**
```
draft_attachment_count
```
**Description:**
Total number of attachments in the database.
---
### Attachments Total Size
**Metric:**
```
draft_attachments_total_size_bytes
```
**Description:**
Total size (in bytes) of all attachments, summed over the `blob.size` field.
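When `PROMETHEUS_API_KEY` is set, a scraper has to send it as a Bearer token. The following is a minimal sketch using only the Python standard library. The host, port, and key are placeholder assumptions, and `build_metrics_request` is a hypothetical helper that is not part of this repository.

```python
from typing import Optional
from urllib.request import Request

# Path documented above; the host used below is a placeholder assumption.
METRICS_PATH = "/api/v1.0/prometheus/metrics"


def build_metrics_request(base_url: str, api_key: Optional[str]) -> Request:
    """Build a GET request for the metrics endpoint (hypothetical helper)."""
    headers = {}
    if api_key:
        # The auth middleware compares the header against "Bearer <key>".
        headers["Authorization"] = f"Bearer {api_key}"
    return Request(base_url.rstrip("/") + METRICS_PATH, headers=headers)


request = build_metrics_request("http://localhost:8000", "ExamplePrometheusApiKey")
# Pass `request` to urllib.request.urlopen() to fetch the metrics text.
```

In a real deployment the same header would typically be configured in the Prometheus scrape job rather than in application code.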

@@ -17,6 +17,10 @@ LOGGING_LEVEL_HANDLERS_CONSOLE=INFO
LOGGING_LEVEL_LOGGERS_ROOT=INFO
LOGGING_LEVEL_LOGGERS_APP=INFO
# Prometheus
ENABLE_PROMETHEUS=0
PROMETHEUS_API_KEY=ExamplePrometheusApiKey
# Python
PYTHONPATH=/app

@@ -2,6 +2,7 @@
import hashlib
import logging
import secrets
from django.conf import settings
from django.contrib.auth import get_user_model
@@ -50,7 +51,7 @@ class MTAJWTAuthentication(authentication.BaseAuthentication):
# Validate email hash if there's a body
if request.body:
body_hash = hashlib.sha256(request.body).hexdigest()
if body_hash != payload["body_hash"]:
if not secrets.compare_digest(body_hash, payload["body_hash"]):
raise jwt.InvalidTokenError("Invalid email hash")
service_account = User()

@@ -12,8 +12,17 @@ class CoreConfig(AppConfig):
verbose_name = _("messages core application")
def ready(self):
"""Register signal handlers when the app is ready."""
# Import signal handlers to register them
"""Register signal handlers and prometheus collector when the app is ready."""
# pylint: disable=unused-import, import-outside-toplevel
from django.conf import settings
if settings.ENABLE_PROMETHEUS:
from prometheus_client.core import REGISTRY
from .metrics import CustomDBPrometheusMetricsCollector
REGISTRY.register(CustomDBPrometheusMetricsCollector())
# Import signal handlers to register them
# pylint: disable=unused-import, import-outside-toplevel
import core.signals # noqa

@@ -215,3 +215,44 @@ class LabelFactory(factory.django.DjangoModelFactory):
if isinstance(extracted, (list, tuple)):
for thread in extracted:
self.threads.add(thread)
class BlobFactory(factory.django.DjangoModelFactory):
"""A factory to create random blobs for testing purposes."""
class Meta:
model = models.Blob
raw_content = factory.LazyAttribute(lambda o: b"Blob content")
content_type = factory.LazyAttribute(lambda o: "application/octet-stream")
size = factory.LazyAttribute(lambda o: len(o.raw_content))
mailbox = factory.SubFactory(MailboxFactory)
class AttachmentFactory(factory.django.DjangoModelFactory):
"""A factory to create random attachments for testing purposes."""
class Meta:
model = models.Attachment
mailbox = factory.SubFactory(MailboxFactory)
name = factory.Sequence(lambda n: f"attachment{n}.txt")
blob_size = 1500
@factory.lazy_attribute
def blob(self):
"""Create a blob with specified size for the attachment."""
raw_content = b"x" * self.blob_size
return BlobFactory(
mailbox=self.mailbox, size=self.blob_size, raw_content=raw_content
)
@classmethod
def _adjust_kwargs(cls, **kwargs):
"""
Adjust the keyword arguments before passing them to the model.
"""
# Remove blob_size from kwargs before passing to model
kwargs = dict(kwargs)
kwargs.pop("blob_size", None)
return kwargs

@@ -0,0 +1,87 @@
"""
Custom Prometheus metrics collector for the messages core application.
This module defines a collector that exposes database-related metrics
(such as message counts by status, attachment counts, and total attachment size)
to Prometheus via the /metrics endpoint.
"""
from django.apps import apps
from django.db import models
from prometheus_client.core import GaugeMetricFamily
from .enums import MessageDeliveryStatusChoices
from .models import Attachment, MessageRecipient
class CustomDBPrometheusMetricsCollector:
"""
Prometheus collector for custom database metrics.
"""
def get_messages_with_status(self):
"""
Yields a GaugeMetricFamily for each possible message delivery status,
with the count of messages for that status. If no messages exist for a status,
the count is 0.
"""
messages_statuses_count = MessageRecipient.objects.values(
"delivery_status"
).annotate(count=models.Count("id"))
status_count_map = {
row["delivery_status"]: row["count"] for row in messages_statuses_count
}
gauge = GaugeMetricFamily(
"message_status_count",
"Number of messages by delivery status",
labels=["status"],
)
for status in MessageDeliveryStatusChoices:
label = status.label
count = status_count_map.get(status.value, 0)
gauge.add_metric([label], count)
yield gauge
def get_draft_attachments_count(self):
"""
Yields a GaugeMetricFamily with the total number of draft attachments.
"""
attachments_count = Attachment.objects.count()
yield GaugeMetricFamily(
"draft_attachment_count",
"Number of draft attachments",
value=attachments_count,
)
def get_draft_attachments_total_size(self):
"""
Yields a GaugeMetricFamily with the total size (in bytes) of all draft attachments.
"""
total_size = (
Attachment.objects.aggregate(models.Sum("blob__size"))["blob__size__sum"]
or 0
)
yield GaugeMetricFamily(
"draft_attachments_total_size_bytes",
"Total size of all draft attachments in bytes",
value=total_size,
)
def collect(self):
"""
Entrypoint for Prometheus metric collection.
Yields all custom metrics if Django apps are ready and the 'core' app is installed.
This ensures that we only collect metrics when the application is in a valid state,
e.g. not during migrations.
"""
# Only run if apps are ready and the 'core' app is installed
if not apps.ready or not apps.is_installed("core"):
return
yield from self.get_messages_with_status()
yield from self.get_draft_attachments_count()
yield from self.get_draft_attachments_total_size()
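The zero-filling step in `get_messages_with_status` can be illustrated without Django or `prometheus_client`. This is a minimal sketch, assuming a hypothetical `DeliveryStatus` enum as a stand-in for `MessageDeliveryStatusChoices` and plain dictionaries in place of the queryset rows; it is not the collector itself.

```python
from enum import Enum


class DeliveryStatus(Enum):
    """Hypothetical stand-in for MessageDeliveryStatusChoices."""

    SENT = 1
    FAILED = 2
    RETRY = 3


def zero_filled_counts(rows):
    """Map every status name to its count, defaulting to 0 when absent."""
    found = {row["delivery_status"]: row["count"] for row in rows}
    return {
        status.name.lower(): found.get(status.value, 0)
        for status in DeliveryStatus
    }


# Rows shaped like the output of .values("delivery_status").annotate(count=...)
rows = [{"delivery_status": 1, "count": 4}, {"delivery_status": 3, "count": 2}]
counts = zero_filled_counts(rows)
```

Iterating over the enum rather than over the query result is what guarantees that a status with no rows still produces a sample with value `0`.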

@@ -0,0 +1,26 @@
"""Custom Django middlewares"""
from secrets import compare_digest
from django.conf import settings
from django.http import HttpResponse
class PrometheusAuthMiddleware:
"""
Middleware to enforce authentication via Bearer token for Prometheus metrics endpoint.
"""
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
if request.path.startswith(f"/api/{settings.API_VERSION}/prometheus"):
if settings.PROMETHEUS_API_KEY:
if not compare_digest(
request.headers.get("Authorization") or "",
f"Bearer {settings.PROMETHEUS_API_KEY}",
):
return HttpResponse("Unauthorized", status=401)
return self.get_response(request)
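The token check in the middleware can be sketched as a standalone function that is easy to unit test. `is_authorized` is a hypothetical helper, not part of this commit. It differs from the middleware in one respect, which is an assumption about desired behavior: it compares bytes rather than `str`, because `compare_digest` raises `TypeError` when given a non-ASCII `str`.

```python
from secrets import compare_digest
from typing import Optional


def is_authorized(authorization_header: Optional[str], api_key: Optional[str]) -> bool:
    """Return True when the request may access the metrics endpoint."""
    if not api_key:
        # Mirrors the middleware: no key configured means no check.
        return True
    expected = f"Bearer {api_key}".encode("utf-8")
    provided = (authorization_header or "").encode("utf-8")
    # Constant-time comparison; encoding to bytes avoids the TypeError
    # that compare_digest raises for non-ASCII str arguments.
    return compare_digest(provided, expected)
```

With this shape, a header containing unexpected characters is rejected like any other wrong token instead of raising an exception.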

@@ -0,0 +1,259 @@
"""Tests for the Prometheus metrics endpoint."""
# pylint: disable=redefined-outer-name, unused-argument
import sys
from importlib import import_module, reload
from django.test import override_settings
from django.urls import clear_url_caches, reverse
import pytest
from prometheus_client.parser import text_string_to_metric_families
from core.enums import MessageDeliveryStatusChoices
from core.factories import AttachmentFactory, MessageRecipientFactory
@pytest.fixture
def url():
"""
Fixture to return the URL for the Prometheus metrics endpoint.
Returns:
str: The URL for the Prometheus metrics endpoint.
"""
return reverse("prometheus-django-metrics")
def response_to_metrics_dict(response, with_label=None):
"""Convert a response to a dictionary of metrics"""
d = {}
for family in text_string_to_metric_families(response.content.decode("utf-8")):
for sample in family.samples:
if with_label:
d[(sample.name, sample.labels.get(with_label))] = sample.value
else:
d[sample.name] = sample.value
return d
class TestPrometheusMetrics:
"""
Test suite for the Prometheus metrics endpoint.
This class contains tests to verify authentication, message status metrics,
attachment count metrics, and attachment size metrics as reported by the
Prometheus /metrics endpoint.
"""
@pytest.fixture(autouse=True)
def configure_settings(self):
"""Run before each test"""
self.reload_urls()
def reload_urls(self):
"""Reload the Django URL router"""
clear_url_caches()
if "messages.urls" in sys.modules:
reload(sys.modules["messages.urls"])
else:
import_module("messages.urls")
@pytest.mark.django_db
def test_metrics_endpoint_requires_auth(self, api_client, settings, url):
"""
Test that the metrics endpoint requires authentication.
Asserts that requests without or with invalid authentication are rejected (401),
and requests with the correct API key are accepted (200).
"""
# Test without authentication
response = api_client.get(url)
assert response.status_code == 401
# Test with invalid authentication
response = api_client.get(url, HTTP_AUTHORIZATION="Bearer invalid_token")
assert response.status_code == 401
# Test with authentication
response = api_client.get(
url, HTTP_AUTHORIZATION=f"Bearer {settings.PROMETHEUS_API_KEY}"
)
assert response.status_code == 200
@override_settings(ENABLE_PROMETHEUS=False)
@pytest.mark.django_db
def test_metrics_endpoint_prometheus_disabled(self, api_client, settings, url):
"""
Test that the metrics endpoint is disabled when ENABLE_PROMETHEUS is False.
Asserts that requests are rejected (404).
"""
self.reload_urls()
# Test with authentication
response = api_client.get(
url, HTTP_AUTHORIZATION=f"Bearer {settings.PROMETHEUS_API_KEY}"
)
assert response.status_code == 404
@pytest.mark.django_db
def test_get_messages_with_status_count_zero(self, api_client, settings, url):
"""
Test that message status metrics are zero when there are no messages.
Asserts that all message status counts are reported as zero.
"""
response = api_client.get(
url, HTTP_AUTHORIZATION=f"Bearer {settings.PROMETHEUS_API_KEY}"
)
metrics = response_to_metrics_dict(response, with_label="status")
assert (
metrics[("message_status_count", MessageDeliveryStatusChoices.SENT.label)]
== 0
)
assert (
metrics[
("message_status_count", MessageDeliveryStatusChoices.INTERNAL.label)
]
== 0
)
assert (
metrics[("message_status_count", MessageDeliveryStatusChoices.FAILED.label)]
== 0
)
assert (
metrics[("message_status_count", MessageDeliveryStatusChoices.RETRY.label)]
== 0
)
@pytest.mark.django_db
def test_get_messages_with_status_count(self, api_client, settings, url):
"""
Test that message status metrics reflect the correct count for each status.
Asserts that the metrics endpoint reports the correct count for each
MessageDeliveryStatusChoices value.
"""
statuses_to_count = {
MessageDeliveryStatusChoices.SENT: 1,
MessageDeliveryStatusChoices.INTERNAL: 2,
MessageDeliveryStatusChoices.FAILED: 3,
MessageDeliveryStatusChoices.RETRY: 4,
}
for status, count in statuses_to_count.items():
MessageRecipientFactory.create_batch(size=count, delivery_status=status)
response = api_client.get(
url, HTTP_AUTHORIZATION=f"Bearer {settings.PROMETHEUS_API_KEY}"
)
metrics = response_to_metrics_dict(response, with_label="status")
for status, count in statuses_to_count.items():
assert metrics[("message_status_count", status.label)] == count
@pytest.mark.django_db
def test_get_attachments_count_zero(self, api_client, settings, url):
"""
Test that the attachment count metric is zero when there are no attachments.
Asserts that the 'draft_attachment_count' metric is reported as zero.
"""
response = api_client.get(
url, HTTP_AUTHORIZATION=f"Bearer {settings.PROMETHEUS_API_KEY}"
)
metrics = response_to_metrics_dict(response)
assert metrics[("draft_attachment_count")] == 0
@pytest.mark.parametrize("attachment_count", [0, 1, 10])
@pytest.mark.django_db
def test_get_attachments_count(self, api_client, settings, url, attachment_count):
"""
Test that the attachment count metric matches the number of created attachments.
Args:
attachment_count (int): The number of attachments to create.
Asserts that the 'draft_attachment_count' metric equals the number of created attachments.
"""
AttachmentFactory.create_batch(size=attachment_count)
response = api_client.get(
url, HTTP_AUTHORIZATION=f"Bearer {settings.PROMETHEUS_API_KEY}"
)
metrics = response_to_metrics_dict(response)
assert metrics[("draft_attachment_count")] == attachment_count
@pytest.mark.django_db
def test_get_attachments_size_no_attachment(self, api_client, settings, url):
"""
Test that the total attachment size metric is zero when there are no attachments.
Asserts that the 'draft_attachments_total_size_bytes' metric is reported as zero.
"""
response = api_client.get(
url, HTTP_AUTHORIZATION=f"Bearer {settings.PROMETHEUS_API_KEY}"
)
metrics = response_to_metrics_dict(response)
assert metrics[("draft_attachments_total_size_bytes")] == 0
@pytest.mark.parametrize("blob_size", [0, 150, 1000])
@pytest.mark.django_db
def test_get_attachments_size_one_attachment(
self, api_client, settings, url, blob_size
):
"""
Test that the total attachment size metric matches the size of a single attachment.
Args:
blob_size (int): The size of the blob to create.
Asserts that the 'draft_attachments_total_size_bytes' metric equals the blob size.
"""
AttachmentFactory(blob_size=blob_size)
response = api_client.get(
url, HTTP_AUTHORIZATION=f"Bearer {settings.PROMETHEUS_API_KEY}"
)
metrics = response_to_metrics_dict(response)
assert metrics[("draft_attachments_total_size_bytes")] == blob_size
@pytest.mark.parametrize("blob_sizes", [[0, 0], [0, 150, 1000], [1, 2, 3, 4, 5]])
@pytest.mark.django_db
def test_get_attachments_size_multiple_attachments(
self, api_client, settings, url, blob_sizes
):
"""
Test that the total attachment size metric matches the sum of multiple attachments.
Args:
blob_sizes (list): List of blob sizes to create.
Asserts that the 'draft_attachments_total_size_bytes' metric equals the sum of blob sizes.
"""
for blob_size in blob_sizes:
AttachmentFactory(blob_size=blob_size)
response = api_client.get(
url, HTTP_AUTHORIZATION=f"Bearer {settings.PROMETHEUS_API_KEY}"
)
metrics = response_to_metrics_dict(response)
assert metrics[("draft_attachments_total_size_bytes")] == sum(blob_sizes)

@@ -607,6 +607,14 @@ class Base(Configuration):
environ_prefix=None,
)
ENABLE_PROMETHEUS = values.BooleanValue(
default=False, environ_name="ENABLE_PROMETHEUS", environ_prefix=None
)
PROMETHEUS_API_KEY = values.Value(
None, environ_name="PROMETHEUS_API_KEY", environ_prefix=None
)
# AI
AI_API_KEY = values.Value(None, environ_name="AI_API_KEY", environ_prefix=None)
AI_BASE_URL = values.Value(None, environ_name="AI_BASE_URL", environ_prefix=None)
@@ -658,12 +666,6 @@ class Base(Configuration):
},
}
API_USERS_LIST_LIMIT = values.PositiveIntegerValue(
default=5,
environ_name="API_USERS_LIST_LIMIT",
environ_prefix=None,
)
# External services
# Settings related to the interoperability with external services
# that messages is able to use
@@ -676,6 +678,18 @@ class Base(Configuration):
"api_url": "/api/v1.0",
}
# pylint: disable=invalid-name
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
if self.ENABLE_PROMETHEUS:
self.INSTALLED_APPS += ["django_prometheus"]
self.MIDDLEWARE = [
"core.middlewares.PrometheusAuthMiddleware",
"django_prometheus.middleware.PrometheusBeforeMiddleware",
*self.MIDDLEWARE,
"django_prometheus.middleware.PrometheusAfterMiddleware",
]
# pylint: disable=invalid-name
@property
def ENVIRONMENT(self):
@@ -769,6 +783,9 @@ class Development(Base):
SESSION_COOKIE_NAME = "st_messages_sessionid"
ENABLE_PROMETHEUS = True
PROMETHEUS_API_KEY = "local_api_key"
USE_SWAGGER = True
SESSION_CACHE_ALIAS = "session"
CACHES = {
@@ -832,6 +849,9 @@ class Test(Base):
SCHEMA_CUSTOM_ATTRIBUTES_USER = {}
SCHEMA_CUSTOM_ATTRIBUTES_MAILDOMAIN = {}
ENABLE_PROMETHEUS = True
PROMETHEUS_API_KEY = "test_api_key"
# pylint: disable=invalid-name
def __init__(self):
super().__init__()
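The middleware wiring done in `Base.__init__` earlier in this file can be sketched as a pure function. `wrap_with_prometheus` is a hypothetical name, and the two entries in `existing` are placeholder assumptions rather than this project's actual middleware list.

```python
def wrap_with_prometheus(middleware):
    """Return a new list with the Prometheus entries placed around `middleware`."""
    return [
        "core.middlewares.PrometheusAuthMiddleware",
        "django_prometheus.middleware.PrometheusBeforeMiddleware",
        *middleware,
        "django_prometheus.middleware.PrometheusAfterMiddleware",
    ]


# Placeholder middleware list, for illustration only.
existing = [
    "django.middleware.security.SecurityMiddleware",
    "django.middleware.common.CommonMiddleware",
]
wrapped = wrap_with_prometheus(existing)
```

Django runs the first entry in `MIDDLEWARE` outermost, so placing the auth middleware first lets it reject an unauthorized request before any other middleware handles it.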

@@ -52,3 +52,12 @@ if settings.USE_SWAGGER or settings.DEBUG:
name="redoc-schema",
),
]
if settings.ENABLE_PROMETHEUS:
urlpatterns += [
path(
f"api/{settings.API_VERSION}/prometheus/",
include("django_prometheus.urls"),
name="prometheus-metrics",
),
]

@@ -961,6 +961,22 @@ files = [
[package.dependencies]
Django = ">=2.2"
[[package]]
name = "django-prometheus"
version = "2.4.1"
description = "Django middlewares to monitor your application with Prometheus.io."
optional = false
python-versions = "*"
groups = ["main"]
files = [
{file = "django_prometheus-2.4.1-py2.py3-none-any.whl", hash = "sha256:7fe5af7f7c9ad9cd8a429fe0f3f1bf651f0e244f77162147869eab7ec09cc5e7"},
{file = "django_prometheus-2.4.1.tar.gz", hash = "sha256:073628243d2a6de6a8a8c20e5b512872dfb85d66e1b60b28bcf1eca0155dad95"},
]
[package.dependencies]
Django = ">=4.2,<6.0"
prometheus-client = ">=0.7"
[[package]]
name = "django-redis"
version = "5.4.0"
@@ -3563,4 +3579,4 @@ dev = ["django-extensions", "drf-spectacular-sidecar", "flower", "pip-audit", "p
[metadata]
lock-version = "2.1"
python-versions = ">=3.13,<4.0"
content-hash = "ced1bc89461f209d7494200a3588b66759efc44593d3be4c6a5cb22ea707e140"
content-hash = "9a1d9a65ec7eb98c920df96ada8ea6b918eaf7c195ae8c29e36f446006e4072a"

@@ -36,6 +36,7 @@ dependencies = [
"django-fernet-encrypted-fields==0.3.0",
"django-filter==24.3",
"django-parler==2.3",
"django-prometheus==2.4.1",
"django-redis==5.4.0",
"django-storages==1.14.6",
"django-timezone-field==7.1",