TL;DR: The author argues that free-form logging is of little use and expensive to run. They also argue that structured logging is less effective than tracing, mainly because timelines and causality are hard to infer from it.
I find the arguments very plausible.
In fact, I very rarely use logs produced by several services, because most of the time they just confuse me. The only time I rely heavily on logs is when troubleshooting a single service and looking at its stdout (or kubectl logs).
However, I have very little experience with tracing (I've used it in my hobby projects, but they obviously never represent the reality of complex distributed systems).
Do you have real-world experience with tracing in larger systems? Care to share your take on the topic?
Some thoughts from me, coming from a different domain (more embedded):
In my opinion, log levels certainly make sense, but how they should be used may vary wildly depending on what you're doing. We run our software in different environments:
And it's run by different sets of people:
Each combination of where the software runs and who runs it comes with different requirements.
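To make the "it depends on where and who" point concrete, here is a minimal sketch using Python's standard logging module that picks a log level from the (environment, audience) pair. The environment and audience names, the logger name "device", and the INFO fallback are all hypothetical choices for illustration, not anything from the setup described above.

```python
import logging

# Hypothetical (environment, audience) -> level table; the names are
# illustrative assumptions, not taken from any real deployment.
LEVELS = {
    ("lab", "developers"): logging.DEBUG,
    ("field", "developers"): logging.INFO,
    ("field", "operators"): logging.WARNING,
}


def configure_logging(environment: str, audience: str) -> logging.Logger:
    """Return a logger whose level depends on where it runs and who reads it."""
    # Fall back to INFO for combinations the table does not cover (an assumption).
    level = LEVELS.get((environment, audience), logging.INFO)
    logger = logging.getLogger("device")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:
        # Attach a single stderr handler only once, so repeated calls don't duplicate output.
        logger.addHandler(logging.StreamHandler())
    return logger
```

For example, configure_logging("field", "operators") yields a logger that drops info() calls but still emits warning() calls, while the same code running in the lab for developers would show debug() output too.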
I get that logging is hard: often you get messages with the wrong log level, or you're missing a message at a crucial point, etc. But tracing is not better in every way; the two should complement each other.
Thanks for sharing your insights.
Thinking out loud here...
In my experience with traditional logging and distributed systems, timestamps and request IDs do store the information required to partially reconstruct a timeline:
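As a minimal sketch of that partial reconstruction, assuming JSON-lines logs with hypothetical fields ts, service, request_id and msg, you can group entries by request ID and sort each group by timestamp. Note what this does and does not give you: an ordering per request (and only as reliable as the clocks on each host), but no indication of which service called which.

```python
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical structured log lines from three services; field names are assumptions.
RAW_LOGS = """\
{"ts": "2024-05-01T10:00:00.350+00:00", "service": "billing", "request_id": "r1", "msg": "charge ok"}
{"ts": "2024-05-01T10:00:00.100+00:00", "service": "gateway", "request_id": "r1", "msg": "received"}
{"ts": "2024-05-01T10:00:00.200+00:00", "service": "gateway", "request_id": "r2", "msg": "received"}
{"ts": "2024-05-01T10:00:00.250+00:00", "service": "inventory", "request_id": "r1", "msg": "reserved"}
"""


def timelines(raw: str) -> dict[str, list[tuple[datetime, str, str]]]:
    """Group log entries by request ID and sort each group by timestamp."""
    grouped: dict[str, list[tuple[datetime, str, str]]] = defaultdict(list)
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        ts = datetime.fromisoformat(entry["ts"])
        grouped[entry["request_id"]].append((ts, entry["service"], entry["msg"]))
    # Sorting tuples orders by timestamp first; it says nothing about causality.
    return {rid: sorted(events) for rid, events in grouped.items()}
```

For request r1 this yields gateway, then inventory, then billing. Whether billing was called by the gateway or by inventory cannot be read off this list.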
For example, consider the following relatively simple diagram.
Reconstructing the causality and fork/join relations between the execution nodes is almost impossible with traditional logs, whereas a tracing solution turns this into a nice visual with all the spans and sub-spans.
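What makes that visual possible is that each span records its parent span's ID. The sketch below is a stdlib-only illustration of that parent/child idea, not the data model or API of any particular tracing tool; the span names, IDs and millisecond timings are invented. A fork shows up as sibling spans with overlapping intervals, and a join can be inferred when a later sibling starts only after the earlier ones have ended.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    end_ms: int


# Hypothetical trace: "checkout" forks into inventory and pricing (concurrent),
# then calls billing after both have finished. All values are illustrative.
SPANS = [
    Span("a", None, "checkout", 0, 120),
    Span("b", "a", "inventory", 5, 60),
    Span("c", "a", "pricing", 5, 40),
    Span("d", "a", "billing", 65, 110),
    Span("e", "b", "db.query", 10, 50),
]


def render(spans: list[Span]) -> list[str]:
    """Rebuild the span tree from parent IDs and return indented lines."""
    children: dict[Optional[str], list[Span]] = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    lines: list[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        # Siblings are ordered by start time; ties keep their input order.
        for s in sorted(children.get(parent, []), key=lambda x: x.start_ms):
            lines.append(f"{'  ' * depth}{s.name} [{s.start_ms}-{s.end_ms} ms]")
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines
```

Here render(SPANS) puts checkout at the root, with inventory (and its nested db.query), pricing and billing indented beneath it. The overlapping 5-60 and 5-40 ms intervals expose the fork, and billing starting at 65 ms, after both have ended, suggests the join.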
That said, logs do shine when things go wrong, when you start your investigation using a stack trace in the logs as a clue. A stack trace is something I'm not sure a tracing solution would be able to provide.
Yes! You nailed it 💯
Logs are indispensable for troubleshooting (and potentially nothing else), while tracers are great for, well, tracing data and requests through the system and analysing their mutations.