from Hacker News

Grepping logs is terrible

by _5csa on 5/6/15, 6:15 AM with 102 comments

by onion2k on 5/6/15, 7:51 AM
Binary logs are opaque! Just as much as text logs.
I don't agree with the second assertion there. Text logs are only opaque as far as the format is concerned, but not so much as far as the content goes. Using the example in the article;
```
    127.0.0.1 - - [04/May/2015:16:02:53 +0200] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0"
```
You can read a lot of information without knowing the format, the application that generated it, or even which file it was in - you know it's something to do with localhost, you know when it happened, you know the protocol, from which you can infer the "304" means Not Modified, and you know it came from a Mozilla agent. That's a lot more information than you could get from a binary log without any tools.
That isn't necessarily an argument against binary logging, but the notion that text log files are opaque in the same way as binary logs isn't really true.
by ghshephard on 5/6/15, 7:41 AM
If your logs aren't text, and it's a small system, I'm not going to look at them. Therefore they don't exist. That's one reason why people don't like binary logs - they are effectively useless.
On the flip side, if the system is huge - then we can use tools like splunk.
grep/tail/awk are the first three tools I use on any system - if you create logs that I can't manipulate with those three tools, then you haven't created logs for your system that I can use.
by moonshinefe on 5/6/15, 7:34 AM
Yes, grepping logs is terrible if "you have 100Gb of logs a day". I'm not sure why the author is thinking his use case is anything near the norm or why he's shocked in most use cases people prefer text files.
I'm also not getting why he just doesn't use scripts to parse the logs and insert them into a database at that point. Why use some ad-hoc logging binary format if you're doing complex queries that SQL would be better suited for anyway, on proven db systems?
Maybe I'm missing something.
by dsr_ on 5/6/15, 10:39 AM
Change for the sake of change is anti-engineering. It is anti-productive. Your changes must be improvements, and they must not cost more than they save or generate in a reasonable period of time.
Many organizations have a fully functional, well-debugged logging infrastructure. The basic design happened years ago, was implemented years ago, and was expected to be useful basically forever. Growth was planned for. Ongoing expenses expected to be small.
That's what happens when you build reliable systems on technologies that are as well understood as bricks and mortar. You get multiple independent implementations which are generally interoperable. You get robustness. And you get cost-efficiency, because any changes you decide to make can be incremental.
Where are the rsyslogd and syslog-ng competitors to systemd's journald? Where is the interoperability? Where is the smooth, useful upgrade mechanism?
Short term solutions are generally non-optimal in the long term. Using AWS, Google Compute and other instant-service cloud mechanisms trades money, security and control for speed of deployment. An efficient mature company may well wish to trade in the opposite direction: reducing operating costs by planning, understanding growth and making investments instead of paying rent.
Forcing a major incompatible change in basic infrastructure rather than offering it as an option to people who want to take advantage of it is an anti-pattern.
by blueskin_ on 5/6/15, 7:40 AM
People don't want it because it's binary, not because you can't grep it.
* you need to use a new proprietary tool to interact with them
* all scripts relating to logs are now broken
* binary logs are easy to corrupt, e.g. if they didn't get closed properly.
>You can have a binary index and text logs too! / You can. But what's the point?
The point is having human-readable logs without having to use a proprietary piece of crap to read them. A binary index would actually be a perfect solution - if you're worried about the extra space readable logs take, just .gz/.bz2 them; on decent hardware, the performance penalty for reading is almost nonexistent.
If you generate 100GB/day, you should be feeding them into logstash and using elasticsearch to go through them (or use splunk if $money > $sense), not keeping them as files. Grepping logs can't do all the stuff the author wants anyway, but existing tools can, that are compatible with rsyslog, meaning there is no need for the monstrosity that is systemd.
by datenwolf on 5/6/15, 8:38 AM
```
    > Embedded systems don't have the resources!
    > ...
    > I'd still use a binary log storage, because
    > I find that more efficient to write and parse,
    > but the indexing part is useless in this case.
```
This is yet again a case of a programmer completely misjudging how an actual implementation will perform in the real world.
When I wrote the logging system for this thing http://optores.com/index.php/products/1-1310nm-mhz-fdml-lase... I first fell for the very same misjudgement: "This is running on a small, embedded processor: Binary will probably be much more efficient and simpler."
So I actually did first implement a binary logging system. Not only logging, but also the code to retrieve and display the logs via the front panel user interface. And the performance was absolutely terrible. Also the code to manage the binary structure in the round robin staging area, working in concert with the storage dump became an absolute mess; mind you the whole thing is thread safe, so this also means that logging can cause inter thread synchronization on a device that puts hard realtime demands on some threads.
Eventually I came to the conclusion to go back and try a simple, text only log dumper with some text pattern matching for the log retrieval. Result: The text based logging system code is only about 35% of the binary logging code and it's about 10 times faster because it doesn't spend all these CPU cycles structuring the binary. And even that text pattern matching is faster than walking the binary structure.
Like so often... premature optimization.
by alephnil on 5/6/15, 7:36 AM
I guess that much of the resistance against the binary logs of systemd is the unfamiliarity and to some extent lack of well known tools for dealing with them. Sysadmins that have years of experience with traditional Unix tools now suddenly have to start almost from scratch when it comes to everyday tools for examining the system. Not only that, programmers are also most familiar with text based formats, and libraries for handling these formats have to become more available in the most popular programming languages and become familiar for programmers that develop tools for analysing systems. Until that happens, sysadmins feel that they are set back by the introduction of binary logs, even if binary logs are technically superior.
by leni536 on 5/6/15, 7:42 AM
I don't have experience with binary logs. I think the fragility of binary logs is not baseless though. AFAIK there was (is?) a problem in systemd's journal where a local corruption of the log could cause a global unavailability of the logged data.
People like text logs because local corruptions remain local. Some lines could be gibberish, but that's all. I'm not suggesting that this couldn't be done with binary logs, but you have to carefully design your binary logging format to keep this property.
Otherwise I agree with the author that we shouldn't be afraid of binary formats in general, we need much more general formats and tools though (grep, less equivalents).
I'm not fond of "human readable" tree formats like XML or JSON either. bencode could be equally "human readable" as an utf-8 text if one has a less equivalent for bencode.
by tatterdemalion on 5/6/15, 7:42 AM
This applies more generally than just to logs. I love Unix, but "everything is text" is not actually great. It's better that Unix utils output arbitrary ASCII than that they output arbitrary binary data, but it's obvious why people don't do serious IPC 'the Unix way.' Imagine if instead of exchanging JSON, or ProtoBufs, or whatever, your programs all exchanged text you had to regex into some sort of adhoc structure. So why do we manage our logs and our pipelines that way? There's no actual reason that the terminal couldn't interpret structured data into text for us so that, in the world of intercommunicating processes on the other side of the TTY, everything is well-structured, semantically comprehensible data.
by bigbugbag on 5/6/15, 9:29 AM
The title is misleading, I was expecting to discover a better way of dealing with logs in the general case. Instead I got served an attempt of the author to generalize its way as if his quite specific use case could apply to the outside world.
Reading this was a waste of my time.
Being a universal open format text is a better format than binary, unless you don't care about being able to read your data in the future. There's already enough issue with filesystems and storage media, no need to add more complexity to the issue.
by halayli on 5/6/15, 7:52 AM
logs should be in text. The last thing you want is to find out that your binary format cannot be decoded due to a bug in the logging or because file got corrupted. Not to mention that you won't be able to integrate with a lot of log systems like Splunk and friends.
On the other hand, if you have logs, you need to store them in a centralized place and have an aging policy, etc... Grepping is definitely not the answer. Systems like Splunk exist for a reason.
by agjmills on 5/6/15, 8:49 AM
The greatest thing that I've found recently was fluentd and elasticsearch - we have fluentd on all of our nodes that aggregate logs to a central fluentd search which dumps all of the data into elastic search, then we use kibana as a graphical frontend to elasticsearch
It took a while to get developers to use it, but now it's indispensable - particularly when someone asks me 'what happened to the 1000 emails I sent last month'
I now know, as previously, the data would have been logrotated
by jeady on 5/6/15, 8:06 AM
I think the author is conflating several problems here. There are several ways logs can be used, and efficiency is a scale. For example, if I receive a bug report, I like to be able to locate the textual logs from when the incident occurred and actually just sit and read what was happening at the time. On the other hand, if I'm doing higher-level analysis such as what features do users use most, clearly it's more efficient to have some sort of structure format because you're interested in the logs in aggregate. The author makes it sound like they're advocating optimizing for the aggregate use case at the expense of other use cases. I think that the declaration that textual logs are terrible is an oversimplification of the considerations in play.
Also, if the author has a 5-node cluster producing 100Gbs of logs a day, the logs may also be too verbose or poorly organized. I work on a system that produces 100s of Gbs of logs a day but with proper organization they're perfectly manageable.
I think that a more nuanced solution is to log things that are useful to manual examination in text form, but high-frequency events that are not particularly useful could reasonably be logged elsewhere (e.g. a database or binary log that is asynchronously fed into a database).
In conclusion, as is frequently the case with engineering, I think the author oversimplifies the problem here and tries to present a one-size-fits-all solution instead of taking a more pragmatic solution. Textual logs are useful when meant for human consumption (debugging) and when they can be organized such that the logs of interest at any time are limited in size, and some other binary-based format is useful for aggregate higher-level analysis.
by henrik_w on 5/6/15, 8:05 AM
One solution to the problem of too much logging data can be what I call "session-based logging" (also known as tracing). You can enable logging on a single session (e.g. a phone call), and for that call you get a lot of logging data, much more than a typical logging system.
This obviously only works when you are trouble shooting a specific issue, not when you need to investigate something that happened in the past (where the logging for the session wasn't enabled). However, it has proven to be an excellent tool for troubleshooting issues in the system.
I have used session-based logging both when I worked at Ericsson (the AXE system), and at Symsoft (the Nobill system), and both were excellent. However, I get a feeling that they are not in widespread use (may be wrong on that though), so that's why I wrote a description of them: http://henrikwarne.com/2014/01/21/session-based-logging/
by hxn on 5/6/15, 8:06 AM
Text logs let me do all the things I want to do.
Grep them, tail them, copy and paste, search, transform them, look at them in less, open them in any editor. I love two write little bash oneliners that answer questions about logs. I can use these onliners everywhere anytime.
I dont have any of the efficiency problems the author talks about.
by AceJohnny2 on 5/6/15, 7:35 AM
The author's use of logs is sophisticated and proactive. Sadly, most Linux installations I've dealt with are lazy and reactive, where logs are kept around "just in case" for future forensics (hah!).
by webhat on 5/6/15, 11:31 AM
I think binary logging is the wrong word to use. As far as I can tell it's not binary he means, but database logging. Storing things in a database sounds far less scary than binary.
At best it's a NUL separated database structure where the fields are not compressed, which IS greppable just use \x00 in your regexp. At worst he might mean BER, which is an ASN.1 data encoding structure.
http://en.wikipedia.org/wiki/X.690#BER_encoding
by pdkl95 on 5/6/15, 1:13 PM
So some people want a log format that is more structured than plain text lines. That is going to require some sort of specialized tool. So if a dependency is allowable (instead of leaving the log in a format that is already readable by ~everything), why can't the specialized tool generate an efficient index?
A traditional log with a parallel index would be completely backwards compatible, the query tool should work the same way, and you could even treat the index file as a rebuildable cache which can be useful. The interface presented by a specialized tool doesn't have to depend on any specific storage method.
Really, this recent fad of trying to remove old formats in the believe the old format was somehow preventing any new format from working in parallel reminds me of JWZ's recommendations[1] on mbox "summary files" over the complexity of an actual database. Sometimes you can get the features you want without sacrificing performance or compatibility.
[1] http://www.jwz.org/doc/mailsum.html
by regularfry on 5/6/15, 8:38 AM
This is all well and good if you want to, and can, spend time up front figuring out how to parse each and every log line format which might appear in syslog so you can drop it in your structured store.
The alternative is to leave everything unstructured, and understand the formats minimally and lazily. Laziness is a virtue, right?
by zimbatm on 5/6/15, 9:51 AM
What binary logging solution is the author using if he's not using the systemd journal ?
by erikb on 5/6/15, 10:22 AM
Look at a first year computer science student. He will already put prints in his programs and if he is smart and has a bigger assignment he might already start to write other programs to parse that output. You can't beat that, because it is nearly impossible for a newbie to even know that there might be a problem with text logging and that binary logging might be a solution. In fact he might not even know that what he does is called logging. But he is already doing it!
So even if binary logging is way better (I can't say, not enough experience) you simply can't beat text logging, because text logging is natural. It just happens.
print("Hello World!")
by babuskov on 5/6/15, 8:45 AM
If you need to grep logs on regular basis, you're doing it wrong.
Store important data in the database so that you can query it efficiently.
Keep logs for random searches when something unexpected happens. I log gigabytes per day, but only grep maybe once-twice a year.
by 616c on 5/6/15, 8:14 AM
On a slightly unrelated note, as a largely amateur Linux user: have people made systems that instead of grepping for info, use machine learning do detect normal patterns of a log file (like what type of events, similar, at different intervals) and report the anomalous output via email or report to an admin?
I was thinking this would be a cool area of research for me to try programming again, but it seems so daunting I am not sure where to start.
by michipili on 5/10/15, 2:07 PM
Of course grepping log is terrible! Grep is a generic tool, why shouldn't it be defeated by specialised tools?
http://unix-workstation.blogspot.de/2015/05/of-course-greppi...