LogSense Blog

See everything, even before it happens.

How to Improve Observability with Metrics, Traces & Logs

Jul 29, 2019 3:23:36 PM | Przemek Maciołek, PhD

Development teams need less effort to optimize and fix systems that have a high level of observability. By "observability" we mean that information is available in a form that lets developers and analysts easily identify problems and their causes, which speeds up troubleshooting.

A system that has only a disorganized collection of log files may theoretically contain all the information needed to fix a problem, but that information is not very useful without manual work, and it is hard to locate the source of an error or of poor performance. Improving observability makes it possible to locate problems, determine exactly what caused them, and make the necessary fixes.

Three pillars of observability

Observability consists of three pillars: metrics, traces, and logs. Drawing conclusions from any one of these pillars alone is difficult. Observability means bringing the information from all three together in a coordinated way to find bugs and bottlenecks. Each pillar contributes something different:

  • Metrics are measures of the system's performance. Examples include response time, transactions per second, memory usage, and uptime. The information they provide is general but important. Any investigation needs to start by identifying an issue at the macro level. Is performance slower than it should be? Are users unable to log in at times? Metrics let the software team determine the existence and severity of problems and set priorities.
  • Traces record the path that execution takes through the code. Tracing can take many forms depending on the tools used: enabling it may or may not require modifying the code, and it may cover only the most significant events or record them all. Unlike logging, tracing is systematic and requires little effort from developers. It can generate a large volume of data, but enabling it selectively (e.g., per source file) makes the output more manageable and useful.
  • Logs contain the most detailed information but are the hardest to manage. Most of their content comes from logging statements in the code. Developers usually write human-readable messages, such as "no match found for key", so logs consist predominantly of unstructured data. The volume can be vast, especially when many instances of a service run concurrently, and making sense of that unstructured data is one of the biggest challenges.
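
To make the three pillars concrete, here is a minimal Python sketch in which a single request produces all three signals. It uses only the standard library. The in-memory `metrics` and `spans` stores and the `handle_lookup` function are hypothetical names made up for illustration, not part of any particular product or tracing library.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("demo")

# Hypothetical in-memory stores; a real system would export these elsewhere.
metrics = {"requests": 0, "total_seconds": 0.0}
spans = []


@contextmanager
def span(trace_id, name):
    """Trace: record how long a named block of work took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "seconds": time.perf_counter() - start,
        })


def handle_lookup(table, key):
    trace_id = uuid.uuid4().hex  # shared by the span and the log line
    with span(trace_id, "handle_lookup"):
        value = table.get(key)
        if value is None:
            # Log: the detailed event, emitted as JSON rather than free text.
            log.info(json.dumps(
                {"trace_id": trace_id, "event": "no_match", "key": key}))
    # Metric: coarse counters that describe overall behavior.
    metrics["requests"] += 1
    metrics["total_seconds"] += spans[-1]["seconds"]
    return value
```

Writing the log line as JSON with a `trace_id` field is a design choice in this sketch: it keeps the message structured and lets it be matched to the span later.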

The challenges to observability

Cloud environments commonly run thousands of instances. They appear and disappear as the workload demands. This means the metrics are complex and the amount of trace and log data is vast. There is a lot of redundancy, and problems that appear only occasionally represent a tiny fraction of the available data. It's the proverbial needle in a haystack.

For many teams, logs are the hardest part to deal with. The amount of log data is overwhelming, and identifying the parts associated with a specific event can be difficult. As the amount of unstructured data grows, the difficulty grows even faster.

Analysts can extract information with searches using regular expressions or specialized search tools, but this is tedious work. LogSense turns unstructured logs into useful information with automated pattern discovery. It converts low-observability log data into structured information, which points more directly at the source of a problem.
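
As a rough illustration of what pattern discovery means, the Python sketch below masks the variable parts of each log line (quoted strings, hex values, numbers) and counts the resulting templates. This is a deliberately simplified stand-in, not a description of how LogSense works internally. The function names and sample lines are made up for the example.

```python
import re
from collections import Counter

# Variable-looking tokens, replaced in this order.
_QUOTED = re.compile(r"'[^']*'")
_HEX = re.compile(r"\b0x[0-9a-fA-F]+\b")
_NUM = re.compile(r"\b\d+(?:\.\d+)?\b")


def to_template(line):
    """Replace variable parts of a log line with placeholders."""
    line = _QUOTED.sub("<str>", line)
    line = _HEX.sub("<hex>", line)
    line = _NUM.sub("<num>", line)
    return line


def discover_patterns(lines):
    """Count how often each template occurs."""
    return Counter(to_template(line) for line in lines)


sample = [
    "no match found for key 'user:42'",
    "no match found for key 'user:97'",
    "request 1001 took 35 ms",
    "request 1002 took 1250 ms",
]
patterns = discover_patterns(sample)
for template, count in patterns.most_common():
    print(count, template)
```

Four distinct lines collapse into two templates, which is the kind of structure that makes a large log easier to search and compare.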

To further eliminate irrelevant data, logs need to be associated with traces. A typical testing or production environment has many logs, but most of them are irrelevant to a particular analysis. LogSense supports matching logs with traces, thus eliminating a large amount of data which isn't useful for the issue at hand.
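
One simple way to picture this matching, assuming every log record and every trace carries a shared `trace_id` field (an assumption of this sketch, as are the field names and sample data), is to keep only the log records that belong to the traces under investigation:

```python
def logs_for_slow_traces(traces, logs, threshold_ms):
    """Return only the log records that belong to slow traces."""
    slow_ids = {
        t["trace_id"] for t in traces if t["duration_ms"] >= threshold_ms
    }
    return [record for record in logs if record.get("trace_id") in slow_ids]


# Hypothetical sample data.
traces = [
    {"trace_id": "a1", "duration_ms": 40},
    {"trace_id": "b2", "duration_ms": 1250},
]
logs = [
    {"trace_id": "a1", "message": "cache hit"},
    {"trace_id": "b2", "message": "no match found for key"},
    {"trace_id": "b2", "message": "falling back to full scan"},
    {"message": "heartbeat"},  # no trace_id, so never selected
]

relevant = logs_for_slow_traces(traces, logs, threshold_ms=1000)
```

Here only the two records tied to trace `b2` remain, and everything else drops out of the analysis.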

The value of observability to a development team

The analysis of a problem starts with identifying it at a high level. The next steps are to localize it in the code and then to find out exactly why it's happening. For example, a metric might reveal that a certain type of query is consistently slow. A trace will show the code path which is running inefficiently, and perhaps even the exact function where there's a long delay. The hardest part is determining exactly why there's a bottleneck in that function and what changes are necessary.
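
The first two steps of that drill-down can be sketched in a few lines of Python. The data layout is hypothetical: latency samples are `(query_type, milliseconds)` pairs, and each span names its parent. The sketch also assumes span names are unique within a trace. Self time (a span's duration minus its children's) is used so that a parent is not blamed for time spent in the functions it calls.

```python
from collections import defaultdict
from statistics import mean


def slowest_query_type(samples):
    """Metric step: find the query type with the highest mean latency."""
    by_type = defaultdict(list)
    for query_type, ms in samples:
        by_type[query_type].append(ms)
    return max(by_type, key=lambda q: mean(by_type[q]))


def self_times(spans):
    """Trace step: time spent in each span, excluding its direct children."""
    child_total = defaultdict(float)
    for s in spans:
        if s["parent"] is not None:
            child_total[s["parent"]] += s["duration_ms"]
    return {s["name"]: s["duration_ms"] - child_total[s["name"]] for s in spans}


# Hypothetical sample data.
samples = [("search", 120), ("search", 140),
           ("report", 900), ("report", 1100), ("login", 40)]
report_trace = [
    {"name": "handle_query", "parent": None, "duration_ms": 900},
    {"name": "parse", "parent": "handle_query", "duration_ms": 20},
    {"name": "fetch_rows", "parent": "handle_query", "duration_ms": 850},
    {"name": "serialize", "parent": "handle_query", "duration_ms": 25},
]

worst_type = slowest_query_type(samples)
times = self_times(report_trace)
hot_spot = max(times, key=times.get)
```

With this data the metric points at the `report` queries and the trace points at `fetch_rows`. Why that function is slow is the question the logs then have to answer.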

Sometimes just reading the code will give the necessary insight, but often more information is needed. The log files contain the answer, but not in a form that is easy to find. A solution like LogSense helps developers in two ways: automated pattern discovery puts the information into a more useful form, and machine learning helps to pinpoint the source of the inefficiency.

When the system offers high observability to developers and analysts, they can find and fix performance problems and bugs more quickly. The code works better, and users complain less. Overall, life is better.

High observability means having not just a mountain of data, but high-quality information on any ongoing issues. LogSense provides intelligent processing of the vast amounts of diagnostic data, making it more usable and thus making life better for our developer, analyst, and DevOps customers. As software deployments scale up to even higher usage volumes, this kind of processing becomes increasingly important for software maintenance.

To learn more about how LogSense boosts overall observability, let's talk. We'd love to give you a demo or set up a free trial so you can see for yourself.





Topics: Observability
