Configuring Scalable Alertmanager

Configure Cortex Alertmanager with authenticated APIs and a new user interface

Mar 30 · 4 min read · Nahum Wild, Nick Parker

Until now, we've primarily focused our effort on the Opstrace CLI tooling, so we are excited today to talk about new functionality in the browser-based UI. We have added support for configuring the Cortex Scalable Alertmanager from both the CLI and the UI.

If you're familiar with Prometheus, then you're familiar with Alertmanager. It works great, despite some rough edges. But you probably also know that it works with a single standalone Prometheus instance and that its endpoints are not secured by default. This can be a fine configuration until you want to start scaling up or protecting it. Additionally, there is little-to-no tooling to make Alertmanager easy to work with.

To solve these problems, we use the Cortex Scalable Alertmanager, which provides HTTP endpoints for configuring Alertmanager and alert rules per tenant. While the new Opstrace endpoints are mostly just a pass-through to these Cortex APIs, they provide authentication and TLS by default and are a great starting point to build on. We are also excited to announce the beginnings of a UI for managing all of this.

While making these developments available, we are also laying the groundwork for future improvements, such as enhanced validation feedback to the user. Stay tuned to our blog for more, and read on to see what we have today.

Usage Overview

Via the Browser UI

The Alertmanager Configuration page is available in the navigation sidebar under the associated tenant:

[Figure: Alertmanager configuration UI example]

Here you can paste in your existing Prometheus Alertmanager configuration file and publish it to Cortex. We perform some basic validation checks in the browser and disable the publish button when we know the configuration would be rejected. The UI doesn't support template files yet, but that support is coming.

For more information, including limitations, see the full User Guide.

Via the HTTP API

Opstrace also exposes HTTP endpoints for the Alertmanager and Ruler configuration APIs served by Cortex. These endpoints are authenticated and use TLS by default. Each request sent to Opstrace must include the tenant auth token, and Opstrace routes the request to the Cortex tenant specified in that token.

$ echo '
alertmanager_config: |
  route:
    receiver: default-receiver
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
    group_by: [cluster, alertname]
  receivers:
    - name: default-receiver
' | curl -H "Authorization: Bearer $(cat tenant-api-token-dev)" --data-binary @- https://me.opstrace.io/api/v1/alerts

For more information, see the full User Guide.
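
For automation that does not shell out to curl, the same request can be assembled in code. The following is a minimal sketch, not part of Opstrace: the helper names (`wrapAlertmanagerConfig`, `buildAlertsRequest`, `publishAlertmanagerConfig`, `SendFn`) are hypothetical, and only the `/api/v1/alerts` path, the `Authorization: Bearer` header, and the `alertmanager_config` key are taken from the curl example above. It assumes the Alertmanager YAML you pass in starts at column 0.

```typescript
// Hypothetical helpers mirroring the curl example; not an Opstrace client library.

interface AlertsRequest {
  url: string;
  method: "POST";
  headers: { Authorization: string };
  body: string;
}

// Minimal fetch-like signature, so the sketch does not depend on a global fetch.
type SendFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Wrap an Alertmanager YAML document in the envelope used by the curl example:
// a top-level `alertmanager_config` key holding the config as a block scalar.
// Assumes the input YAML is not indented on its first line.
function wrapAlertmanagerConfig(alertmanagerYaml: string): string {
  const indented = alertmanagerYaml
    .trimEnd()
    .split("\n")
    .map((line) => (line.length > 0 ? "  " + line : line))
    .join("\n");
  return "alertmanager_config: |\n" + indented + "\n";
}

// Build the request without sending it, which keeps this part easy to test.
function buildAlertsRequest(
  baseUrl: string,
  tenantToken: string,
  alertmanagerYaml: string
): AlertsRequest {
  return {
    url: baseUrl.replace(/\/+$/, "") + "/api/v1/alerts",
    method: "POST",
    // trim() drops the trailing newline a token file usually carries
    headers: { Authorization: "Bearer " + tenantToken.trim() },
    body: wrapAlertmanagerConfig(alertmanagerYaml),
  };
}

// Send the request and surface a non-2xx response as an error.
async function publishAlertmanagerConfig(req: AlertsRequest, send: SendFn): Promise<void> {
  const res = await send(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error("publish failed: HTTP " + res.status + " " + (await res.text()));
  }
}
```

In a runtime that provides a global `fetch` (a browser, or Node 18 and later), passing `fetch` as the `send` argument should be enough. Separating request construction from sending is a design choice made here so a CI job can log or check the request before it goes out.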

Implementation Overview

For delivering configuration updates to underlying Cortex Alertmanager APIs, Opstrace deploys an API frontend. This frontend serves requests on two ports, one serving internal requests from the UI via Hasura Actions and the other serving public requests to /api/v1/*.

[Figure: API configuration workflow]

Before the Opstrace React application submits a configuration through this Hasura Actions interface, it runs a validation check and only then enables the publish button. To achieve this, we've created an extensive set of Yup validations. These cover pretty much all aspects of the Alertmanager configuration schema except for XOR-style constraints, where only one of several fields may be set, such as those in http_config or the global SMTP configuration.
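
To illustrate the kind of XOR-style constraint that is not covered yet, here is a minimal sketch in plain TypeScript. It deliberately does not use Yup and is not the Opstrace validation code. It assumes the mutual-exclusion rules Alertmanager applied to http_config at the time of writing (at most one of `basic_auth`, `bearer_token`, and `bearer_token_file`, and at most one of the basic auth `password` and `password_file`); the `HttpConfig` type and the helper names are hypothetical.

```typescript
// Hypothetical types and helpers; field names follow Alertmanager's http_config.

interface BasicAuth {
  username: string;
  password?: string;
  password_file?: string;
}

interface HttpConfig {
  basic_auth?: BasicAuth;
  bearer_token?: string;
  bearer_token_file?: string;
}

// Return the subset of `keys` that is actually set on `obj`.
function presentKeys<T extends object>(obj: T, keys: (keyof T)[]): (keyof T)[] {
  return keys.filter((k) => obj[k] !== undefined && obj[k] !== null);
}

// Collect human-readable errors instead of throwing, so a UI could show
// all problems at once and keep the publish button disabled.
function validateHttpConfig(cfg: HttpConfig): string[] {
  const errors: string[] = [];

  const auth = presentKeys(cfg, ["basic_auth", "bearer_token", "bearer_token_file"]);
  if (auth.length > 1) {
    errors.push("at most one of " + auth.join(", ") + " may be configured");
  }

  if (cfg.basic_auth) {
    const secrets = presentKeys(cfg.basic_auth, ["password", "password_file"]);
    if (secrets.length > 1) {
      errors.push("at most one of basic_auth password, password_file may be configured");
    }
  }

  return errors;
}
```

A check of this shape has to look at several sibling fields at once, which is what sets it apart from the per-field rules that make up most of the schema.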

Outlook

This work is the first step to provide a better user experience around creating and updating Alertmanager configuration. This initial foray is intentionally straightforward and basic—pasting your YAML into a text area field in the browser and saving it—as the initial focus has been on getting all the parts of the process set up and working.

The direct HTTP API likewise is kept simple. It mainly supports cases where direct access to the underlying Cortex configuration APIs is preferable or when the alert configuration is applied via CI/CD or similar automation.

We plan to continue investment in the UI-based approach. We’ve started working on validation and intend to continue that work. We will eventually move away from YAML as the primary interface. Instead, we will present a smart form-based solution where the YAML configuration is dynamically built just-in-time when publishing to Cortex. Doing it this way will allow us to have global definitions for templates, receivers, routes, etc., that multiple tenants can then reference. For example, you would only need to change a route’s email address once for the whole installation, rather than hand-editing the YAML for each tenant that uses that address.
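
Since this is a plan rather than a shipped feature, the following is only a sketch of the idea under our own assumptions: every type and function name here (`Receiver`, `TenantAlerting`, `buildTenantConfig`) is hypothetical and does not describe an existing Opstrace data model. It shows shared receiver definitions being resolved into a per-tenant configuration object at build time; serializing that object to YAML at publish time is left out.

```typescript
// Hypothetical data model for shared definitions; not an existing Opstrace API.

interface Receiver {
  name: string;
  email_configs?: { to: string }[];
}

interface TenantAlerting {
  tenant: string;
  defaultReceiver: string; // name of a shared receiver
  receiverRefs: string[]; // names of all shared receivers this tenant uses
}

interface TenantConfig {
  route: { receiver: string };
  receivers: Receiver[];
}

// Resolve a tenant's references against the shared definitions.
function buildTenantConfig(
  sharedReceivers: Record<string, Receiver>,
  t: TenantAlerting
): TenantConfig {
  if (t.receiverRefs.indexOf(t.defaultReceiver) === -1) {
    throw new Error("tenant " + t.tenant + ": default receiver is not in receiverRefs");
  }
  const receivers = t.receiverRefs.map((ref) => {
    const r = sharedReceivers[ref];
    if (r === undefined) {
      throw new Error("tenant " + t.tenant + ": unknown receiver " + ref);
    }
    return r;
  });
  return { route: { receiver: t.defaultReceiver }, receivers };
}

const sharedReceivers: Record<string, Receiver> = {
  oncall: { name: "oncall", email_configs: [{ to: "oncall@example.com" }] },
};

const tenantSettings: TenantAlerting[] = [
  { tenant: "dev", defaultReceiver: "oncall", receiverRefs: ["oncall"] },
  { tenant: "prod", defaultReceiver: "oncall", receiverRefs: ["oncall"] },
];

// One edit to the shared definition...
sharedReceivers["oncall"] = { name: "oncall", email_configs: [{ to: "sre@example.com" }] };

// ...is picked up by every tenant the next time its config is built.
const builtConfigs = tenantSettings.map((t) => buildTenantConfig(sharedReceivers, t));
```

Because each tenant stores only references, the email address lives in a single place and no tenant's YAML has to be edited by hand.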

Nahum & Nick
