# A Case for Simple Monitoring

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1666370434515/ldXpWUfWF.png align="left")

**Observability** is a very attractive market for operators and investors ([Datanami](https://www.datanami.com/2021/03/04/whos-winning-in-the-17b-aiops-and-observability-market/) estimates a $17B market 💰) and it's expanding at an extremely fast rate. 

As organisations of all sizes shift from _in-house hosting of monolithic applications_ to more modular architectures on _some sort of cloud_, they become more and more reliant on complex infrastructures to serve their _“always on”_ applications. 

These applications and the infrastructures hosting them run, in countless configurations, on a very delicate balance of parts. To serve these apps properly, very complex systems need to work like clockwork. The slightest hiccup of a container, a function or even a silly S3 bucket misfiring, and suddenly latency becomes sub-par… an alarm goes off, someone is not going to be happy 😡.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1666367702953/CXhHLKw_I.png align="left")

**Monitoring** and its _hotter brother_ Observability allow trained specialists to keep these systems stable and under control. Thanks to the combined power of **logs, events and metrics**, SREs, DevOps and Troubleshooters can _investigate, resolve and prevent issues_. The lucky ones even have powerful AI tirelessly working to ensure proper provisioning for the sake of the user experience.
This is all well and good, and we are lucky to have found a way to tackle such a complex issue but, in my opinion, the solutions offered by observability providers are actually _becoming part of the problem_.

> [Some say that to be able to store and analyse logs and metrics you will have to spend more than it takes to actually run the application.](https://twitter.com/ElanHasson/status/1512408672147673090?s=20&t=4P_8fE09epabriqsE3AlfQ)   

Sure, you could reduce the retention period, aggregate data, lower the resolution or use more efficient storage, but these solutions are not always an option and to me they sound more like a compromise.
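To make the "compromise" concrete, here is a minimal sketch of what lowering the resolution looks like. The `downsample` helper and the sample values are hypothetical, made up for illustration; they do not come from any particular observability product.

```python
from collections import defaultdict
from statistics import mean


def downsample(samples, bucket_seconds=60):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of the bucket it falls into.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical latency samples: (seconds, milliseconds).
raw = [(0, 100), (10, 120), (20, 110), (60, 300), (70, 100)]
coarse = downsample(raw, bucket_seconds=60)
print(coarse)
```

Five samples become two, which is cheaper to store, but the 300 ms spike in the second minute is now reported as a 200 ms average. That lost detail is the price of the saving.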

Anyway, it is not only a _cost problem_. Yes, **observability is expensive**: data is growing like crazy and you need it in real time, so you have to pay for the tools. What really bothers me is the way these solutions are structured and presented.

Everyone is positioning themselves as the _“undisputed leader”_ of the space _(ah, those analyst reports must cost a fortune)_ and everyone offers the ultimate end-to-end solution, everyone is full-stack, everyone has that special angle - _how boring..._ 🥱

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1666368260267/At6Lgy6wQ.png align="left")

The reality is that Observability is practically a commodity at this point. Do you know how many providers offer to store and query logs and metrics?<br>
I don’t know… _there are too many to count!_ 🧮

In the 2010s I worked with the author of [syslog-ng](https://en.wikipedia.org/wiki/Syslog-ng) (hell of a 🧠 Balázs) and I thought I had seen it all in relation to logs - _I was wrong_.

The point is, even if they provide a valuable service, observability suppliers are promoting complexity to differentiate themselves, not only to bring value. For sure they help us all chase incidents and fix misconfigurations, but they also tend to lock in their customers by “unifying” all data in shiny (black-box) data stores, by imposing proprietary formats and by making it hard, _if not impossible_, to “port” data and schemas somewhere else.

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1666368630854/udeJ8RTj0.png align="left")

I don't even want to start talking about those _“light”_ agents and their proprietary ingestion pipelines 🥴. _Do you know what happens to your data once ingested? How many times is it replicated? Where does it end up? If and how is it protected? What if you want (or legally need) to remove a single log from the index?_

You’d better not ask these questions, unless you want to spend the next 5 hours retrieving a single log file… using the command line _(sorry, no UI for this)_ from the datastore’s index of your $100K/year minimum, full-stack leader 🎖️

![image.png](https://cdn.hashnode.com/res/hashnode/image/upload/v1666369675685/HheoLr3Bi.png align="left")

Instead of adding layers of useful complexity _(AI, automation, contextualisation, etc.)_, why don’t we try to produce solutions that **give customers options and control** in terms of what resources to use and how?

A former boss of mine used to tell customers at the end of the project delivery meeting: “*observability isn’t really complicated until you have achieved it*”. Basically, to control complexity you need complex controls.

Well, I disagree. I see a lot of ways we can dial down the complexity and the costs of observability while retaining the benefits. For one, you can adopt simpler solutions; they exist. **Nope**, you won’t find them in fancy quadrants… _but that doesn’t mean they are garbage_.

Simpler does not mean dumb. When simple works, _we call that sophistication_ 😉.

Not every solution requires you to install new, dedicated agents; maybe you can use the ones already in place. That means less time spent installing and maintaining an element that, although necessary, is a potential data-loss/leakage liability.<br>

_Simpler data capture? Good_. 

There are solutions that use data stores that are actually open and accessible, like databases… actually, our datastore is a DB 🙃: we use [ClickHouse](https://clickhouse.com/) and it’s awesome!<br>

_Now I can query AND actually interact with the datastore? Great._
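Here is a rough sketch of what "query AND interact" can mean when the datastore is just a database. To keep it runnable anywhere, it uses Python's built-in `sqlite3` as a stand-in; it is not our actual setup, and ClickHouse has its own SQL dialect and handles deletes differently. The `logs` table and its rows are made up for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for an open,
# SQL-accessible log store. Table name and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, ts TEXT, level TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO logs (ts, level, message) VALUES (?, ?, ?)",
    [
        ("2022-10-21T10:00:00Z", "INFO", "service started"),
        ("2022-10-21T10:00:05Z", "ERROR", "login failed for jane@example.com"),
        ("2022-10-21T10:00:09Z", "WARN", "latency above threshold"),
    ],
)

# Query: plain SQL, no proprietary query language in between.
errors = conn.execute(
    "SELECT id, message FROM logs WHERE level = 'ERROR'"
).fetchall()

# Interact: remove one specific log line (for example, one holding personal data).
conn.execute("DELETE FROM logs WHERE id = ?", (errors[0][0],))
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(errors, remaining)
```

The point is not the specific engine: when the store speaks SQL and you hold the keys, finding and removing a single record is a short statement rather than a support ticket.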

### K.I.S.S.

Simpler for us means _open, flexible, efficient and affordable_, and we believe it to be the right approach. We also believe in performance, and you would be surprised how robust and performant simple can be.

_Let me know what you think!_ 





