Since our initial release in December and our Hacker News launch in February, we've talked to many people. In that time, we have refined how we talk about what Opstrace is by looking for a clear analogy for our strategy. Recently, the light bulb went on: we're bundling open source projects just like a Linux distribution does. While the analogy is not perfect, we think it's helpful.
Linux distributions package, maintain, and simplify open source software to make it usable by a wide range of users. They do the hard work of building, testing, and patching software into one coherent, upgradable, and maintainable system. Some of them choose defaults and build tools in an opinionated way to deliver value for specific use cases. For example, Ubuntu takes a base like Debian and makes it more user-friendly with installers and management UIs. CoreOS (RIP) bundled a heavily tested mainline kernel for running containers and released at a fast pace, so you could confidently stay up to date.
Opstrace is the Open Source Observability Distribution. Like Linux distributions, we simplify the packaging, installation, and maintenance of open source observability projects that would otherwise form a highly complex stack for you to operate. For example, we’re building tools to improve the critical alert creation/management workflow and to test upgrades so you can confidently, and regularly, upgrade versions of projects to stay up to date with security patches and features. We believe that teams big and small need and deserve a better open source experience. Our open source distribution is a place everyone can come to participate in that vision.
To achieve this, we started with the basics—with nothing but our installer and your cloud credentials, you can install a standalone observability platform in GCP or AWS. Out of the box, the system has features that you don’t get from the underlying open source projects (and that people rarely bother to build for themselves): API authentication and TLS. The intent here is that you can expose it over the Internet just like you would do with a SaaS provider. (Of course, you don’t need to do that.)
Observability provides critical insight into your systems and applications, so it can be pretty scary to upgrade that platform, especially when it consists of multiple fast-moving open source projects that may or may not work well together. The unfortunate reality is that most teams don’t upgrade it until it’s absolutely necessary. In the spirit of CoreOS, where upgradability was a primary feature of the distribution, we test upgrades in CI under various load thresholds. This model of testing and releasing increases confidence for users so they can keep software up to date, safely using the latest technologies as soon as possible while keeping the system reliable.
The open source observability distribution must go further to bridge the gap with—and eventually surpass—SaaS vendors. We need to raise the bar when it comes to providing an out-of-the-box user experience that competes with the likes of Datadog or Splunk.
One fundamental problem to solve is the end-to-end onboarding experience: once Grafana/Prometheus/Cortex/Loki/etc. have been deployed, what’s next? Good observability requires good collection. The scope for data collection is vast, covering everything from Kubernetes/ECS clusters, raw machines, serverless functions, containers, and cloud services like RDS to your application code itself. Anybody who has had to configure Prometheus relabel configs and Fluentd plugins to achieve their goals understands the problem. Simplifying the collection story is necessary to make the open source distribution effective. We’ll share more about this soon.
UIs for alert management are also important. Listing existing rules and showing what has fired is table stakes; we need to go beyond that. We need higher-level tools to make SLOs, Error Budgets, and overall "better" alerting available to all engineers, Prometheus experts and non-experts alike.
There are many more problems to solve; as with great open source communities (like Kubernetes!), we can solve them all, better, together.
We are deeply committed to open source software. Building together with a community allows us all to create a better platform than any one team or company can build by itself. It can be far more feature-rich, cost-effective, and optimized by, for example, eventually running entirely on preemptible/spot instances and/or having supported releases for infrastructure like AWS Graviton or on-prem data centers. This kind of long-term focus goes beyond what single companies are motivated or have the time to do. It is what communities do best.
Companies often have a fraught relationship with open source software. The tension between the value of a community and the need for revenue is challenging to navigate at best and violates trust at worst. We feel that we can navigate it successfully by acknowledging and planning for this tension from the beginning.
So why should you trust Opstrace?
First, we are committed to building all features in the open distribution. We are not an open core company. We build in the open the kind of software that is typically guarded behind a paywall. For example, as mentioned earlier, we’ve made security open and enabled by default. Our UI is being built in the open, too, and it will implement features that other vendors charge for.
Second, to generate revenue, we will offer a subscription plan where we will operate the platform for companies in their own cloud account with 24x7 customer support, reducing the operational burden even more by extending your existing team’s capability. We aim to be as forthcoming and transparent as possible in this regard to earn your trust. If you have any questions whatsoever, please do contact us (see below).
By borrowing the term “distribution,” we hope to provide a familiar mental model to most engineers and help clarify what we’re doing at Opstrace. Linux distributions simplify complex project landscapes so that users can accomplish what they want. This is our mission: make open source observability available to everybody, not just those with the time and resources to build it themselves.
What other characteristics of a distribution do you think are important for an observability platform?
Seb & The Opstrace Team