by _5csa on 5/6/15, 6:15 AM with 102 comments
by onion2k on 5/6/15, 7:51 AM
I don't agree with the second assertion there. Text logs are only opaque as far as the format is concerned, not so much as far as the content goes. Using the example in the article:
127.0.0.1 - - [04/May/2015:16:02:53 +0200] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0"
You can read a lot of information without knowing the format, the application that generated it, or even which file it was in: you know it's something to do with localhost, you know when it happened, you know the protocol, from which you can infer that the "304" means Not Modified, and you know it came from a Mozilla agent. That's a lot more information than you could get from a binary log without any tools.
That isn't necessarily an argument against binary logging, but the notion that text log files are opaque in the same way as binary logs isn't really true.
by ghshephard on 5/6/15, 7:41 AM
On the flip side, if the system is huge, we can use tools like Splunk.
grep/tail/awk are the first three tools I use on any system - if you create logs that I can't manipulate with those three tools, then you haven't created logs for your system that I can use.
by moonshinefe on 5/6/15, 7:34 AM
I'm also not getting why he doesn't just use scripts to parse the logs and insert them into a database at that point. Why use some ad-hoc binary logging format if you're doing complex queries that SQL on a proven database system would be better suited for anyway?
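Something like this is all it takes; a rough sketch in Python (the regex, file names, and schema are placeholders for whatever the real logs look like):

import re
import sqlite3

# Made-up Apache-style access log and a throwaway schema.
LINE = re.compile(r'(\S+) \S+ \S+ \[(.*?)\] "(.*?)" (\d{3}) (\S+)')

db = sqlite3.connect("access.db")
db.execute("CREATE TABLE IF NOT EXISTS access "
           "(host TEXT, ts TEXT, request TEXT, status INTEGER, size TEXT)")

with open("access.log") as f:
    matches = (LINE.match(line) for line in f)
    db.executemany("INSERT INTO access VALUES (?,?,?,?,?)",
                   (m.groups() for m in matches if m))
db.commit()

# Complex questions are now plain SQL:
for status, count in db.execute("SELECT status, COUNT(*) FROM access GROUP BY status"):
    print(status, count)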
Maybe I'm missing something.
by dsr_ on 5/6/15, 10:39 AM
Many organizations have a fully functional, well-debugged logging infrastructure. The basic design happened years ago, was implemented years ago, and was expected to be useful basically forever. Growth was planned for. Ongoing expenses expected to be small.
That's what happens when you build reliable systems on technologies that are as well understood as bricks and mortar. You get multiple independent implementations which are generally interoperable. You get robustness. And you get cost-efficiency, because any changes you decide to make can be incremental.
Where are the rsyslogd and syslog-ng competitors to systemd's journald? Where is the interoperability? Where is the smooth, useful upgrade mechanism?
Short-term solutions are generally non-optimal in the long term. Using AWS, Google Compute and other instant-service cloud mechanisms trades money, security and control for speed of deployment. An efficient, mature company may well wish to trade in the opposite direction: reducing operating costs by planning, understanding growth and making investments instead of paying rent.
Forcing a major incompatible change in basic infrastructure rather than offering it as an option to people who want to take advantage of it is an anti-pattern.
by blueskin_ on 5/6/15, 7:40 AM
* you need to use a new proprietary tool to interact with them
* all scripts relating to logs are now broken
* binary logs are easy to corrupt, e.g. if they didn't get closed properly.
> You can have a binary index and text logs too! / You can. But what's the point?
The point is having human-readable logs without having to use a proprietary piece of crap to read them. A binary index would actually be a perfect solution - if you're worried about the extra space readable logs take, just .gz/.bz2 them; on decent hardware, the performance penalty for reading is almost nonexistent.
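And compressed text logs stay scriptable: zgrep/zcat work on them directly, and from code it's one extra call; e.g. in Python (file name invented):

import gzip

with gzip.open("access.log.1.gz", "rt", errors="replace") as f:
    errors = [line for line in f if " 500 " in line]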
If you generate 100GB/day, you should be feeding the logs into logstash and using elasticsearch to go through them (or use Splunk if $money > $sense), not keeping them as plain files. Grepping can't do all the stuff the author wants anyway, but existing tools that are compatible with rsyslog can, meaning there is no need for the monstrosity that is systemd.
by datenwolf on 5/6/15, 8:38 AM
> Embedded systems don't have the resources!
> ...
> I'd still use a binary log storage, because I find that more efficient to write and parse, but the indexing part is useless in this case.
This is yet again a case of a programmer completely misjudging how an actual implementation will perform in the real world. When I wrote the logging system for this thing http://optores.com/index.php/products/1-1310nm-mhz-fdml-lase... I first fell for the very same misjudgement: "This is running on a small, embedded processor: binary will probably be much more efficient and simpler."
So I actually did implement a binary logging system first. Not only the logging, but also the code to retrieve and display the logs via the front panel user interface. The performance was absolutely terrible. The code to manage the binary structure in the round-robin staging area, working in concert with the storage dump, also became an absolute mess; mind you, the whole thing is thread-safe, so logging can cause inter-thread synchronization on a device that puts hard real-time demands on some threads.
Eventually I decided to go back and try a simple, text-only log dumper with some text pattern matching for log retrieval. Result: the text-based logging code is only about 35% of the size of the binary logging code, and it's about 10 times faster because it doesn't spend all those CPU cycles structuring the binary. Even the text pattern matching is faster than walking the binary structure.
Like so often... premature optimization.
by alephnil on 5/6/15, 7:36 AM
by leni536 on 5/6/15, 7:42 AM
People like text logs because local corruption remains local. Some lines could be gibberish, but that's all. I'm not suggesting this couldn't be done with binary logs, but you have to design your binary logging format carefully to keep this property.
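For what it's worth, the standard way to get that property is a sync marker plus a length and checksum on every record, so a reader can skip damaged bytes and resynchronize at the next marker. A rough Python sketch (the record format is invented for illustration):

import struct
import zlib

MAGIC = b"\xfeLOG"  # sync marker at the start of every record

def write_record(f, payload):
    # [marker][length][crc32][payload]
    f.write(MAGIC + struct.pack("<II", len(payload), zlib.crc32(payload)) + payload)

def read_records(data):
    pos = 0
    while True:
        pos = data.find(MAGIC, pos)  # resynchronize after any corruption
        if pos < 0:
            return
        try:
            length, crc = struct.unpack_from("<II", data, pos + 4)
            payload = data[pos + 12 : pos + 12 + length]
            if len(payload) == length and zlib.crc32(payload) == crc:
                yield payload
                pos += 12 + length
                continue
        except struct.error:
            pass
        pos += 1  # damaged record: slide past this marker and keep scanning

A corrupted stretch costs you the records it overlaps, nothing more.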
Otherwise I agree with the author that we shouldn't be afraid of binary formats in general, we need much more general formats and tools though (grep, less equivalents).
I'm not fond of "human readable" tree formats like XML or JSON either. bencode could be just as "human readable" as UTF-8 text if one had a less equivalent for it.
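To make that concrete, a toy bencode reader really is small; printing its output is most of a "less for bencode" (Python, no error handling, purely illustrative):

def bdecode(data, i=0):
    # Returns (value, next_offset) for ints, strings, lists and dicts.
    c = data[i:i+1]
    if c == b"i":
        end = data.index(b"e", i)
        return int(data[i+1:end]), end + 1
    if c in (b"l", b"d"):
        items, i = [], i + 1
        while data[i:i+1] != b"e":
            value, i = bdecode(data, i)
            items.append(value)
        if c == b"l":
            return items, i + 1
        return {items[k]: items[k+1] for k in range(0, len(items), 2)}, i + 1
    colon = data.index(b":", i)
    n = int(data[i:colon])
    return data[colon+1:colon+1+n], colon + 1 + n

print(bdecode(b"d3:foo3:bar4:spaml1:a1:bee")[0])
# {b'foo': b'bar', b'spam': [b'a', b'b']}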
by tatterdemalion on 5/6/15, 7:42 AM
by bigbugbag on 5/6/15, 9:29 AM
Reading this was a waste of my time.
Being a universal open format, text is a better format than binary, unless you don't care about being able to read your data in the future. There are already enough issues with filesystems and storage media; no need to add more complexity.
by halayli on 5/6/15, 7:52 AM
On the other hand, if you have logs, you need to store them in a centralized place, have an aging policy, and so on. Grepping is definitely not the answer. Systems like Splunk exist for a reason.
by agjmills on 5/6/15, 8:49 AM
It took a while to get developers to use it, but now it's indispensable - particularly when someone asks me 'what happened to the 1000 emails I sent last month'
Now I can actually find out; previously, the data would have been rotated away by logrotate.
by jeady on 5/6/15, 8:06 AM
Also, if the author has a 5-node cluster producing 100GB of logs a day, the logs may be too verbose or poorly organized. I work on a system that produces hundreds of gigabytes of logs a day, but with proper organization they're perfectly manageable.
I think that a more nuanced solution is to log things that are useful to manual examination in text form, but high-frequency events that are not particularly useful could reasonably be logged elsewhere (e.g. a database or binary log that is asynchronously fed into a database).
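For the high-frequency side, the usual shape is to hand records to a background thread that feeds the store so the hot path never blocks; sketched here with Python's stdlib QueueHandler/QueueListener (the StreamHandler is a stand-in for a real database-backed handler):

import logging
import queue
from logging.handlers import QueueHandler, QueueListener

q = queue.Queue(-1)
sink = logging.StreamHandler()  # stand-in for a handler that writes to a database
listener = QueueListener(q, sink)
listener.start()

log = logging.getLogger("metrics")
log.setLevel(logging.INFO)
log.addHandler(QueueHandler(q))

log.info("cache_miss key=%s", "user:42")  # the hot path only enqueues
listener.stop()  # drains the queue on shutdown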
In conclusion, as is frequently the case in engineering, I think the author oversimplifies the problem and tries to present a one-size-fits-all solution instead of taking a more pragmatic approach. Textual logs are useful when they're meant for human consumption (debugging) and can be organized so that the logs of interest at any time are limited in size; some binary-based format is useful for aggregate, higher-level analysis.
by henrik_w on 5/6/15, 8:05 AM
This obviously only works when you are troubleshooting a specific issue, not when you need to investigate something that happened in the past (where logging for the session wasn't enabled). However, it has proven to be an excellent tool for troubleshooting issues in the system.
I have used session-based logging both when I worked at Ericsson (the AXE system) and at Symsoft (the Nobill system), and both worked very well. However, I get the feeling they are not in widespread use (I may be wrong about that), so I wrote a description of them: http://henrikwarne.com/2014/01/21/session-based-logging/
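The core mechanism is small. This is not the AXE or Nobill implementation, just the flavor of it using Python's stdlib logging (session ids and names invented): tag every record with a session id and let a filter decide, per session, how verbose to be.

import logging

TRACED = {"sess-1234"}  # sessions someone has flagged for full tracing

class SessionFilter(logging.Filter):
    def filter(self, record):
        # Everything for traced sessions, WARNING and up for the rest.
        session = getattr(record, "session", None)
        return session in TRACED or record.levelno >= logging.WARNING

handler = logging.StreamHandler()
handler.addFilter(SessionFilter())
handler.setFormatter(logging.Formatter("%(session)s %(levelname)s %(message)s"))

log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.DEBUG)  # generate everything; the filter decides

log.debug("entering checkout", extra={"session": "sess-1234"})  # emitted
log.debug("entering checkout", extra={"session": "sess-9999"})  # dropped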
by hxn on 5/6/15, 8:06 AM
Grep them, tail them, copy and paste, search, transform them, look at them in less, open them in any editor. I love to write little bash one-liners that answer questions about logs, and I can use these one-liners everywhere, anytime.
I don't have any of the efficiency problems the author talks about.
by AceJohnny2 on 5/6/15, 7:35 AM
by webhat on 5/6/15, 11:31 AM
At best it's a NUL-separated database structure where the fields are not compressed, which IS greppable; just use \x00 in your regexp. At worst he might mean BER, which is an ASN.1 data encoding.
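E.g. assuming a NUL-separated export (the sample blob is invented):

import re

blob = b"MESSAGE=disk full\x00PRIORITY=3\x00_PID=4242\x00"

print(re.findall(rb"[^\x00]+", blob))                  # split records on NUL
print(re.search(rb"PRIORITY=[^\x00]*", blob).group())  # b'PRIORITY=3'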
by pdkl95 on 5/6/15, 1:13 PM
A traditional log with a parallel index would be completely backwards compatible, the query tool should work the same way, and you could even treat the index file as a rebuildable cache, which can be useful. The interface presented by a specialized tool doesn't have to depend on any specific storage method.
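A sketch of the index-as-rebuildable-cache idea in Python (here the index is just line offsets; a real one might map timestamps or keys to offsets):

import json

def build_index(log_path, index_path):
    # Byte offset of every line: pure cache, rebuildable from the text log.
    offsets, pos = [], 0
    with open(log_path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    with open(index_path, "w") as f:
        json.dump(offsets, f)

def read_line(log_path, index_path, n):
    # Random access into the plain-text log via the index.
    with open(index_path) as f:
        offsets = json.load(f)
    with open(log_path, "rb") as f:
        f.seek(offsets[n])
        return f.readline()

Delete the index and nothing is lost; grep still works on the log itself.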
Really, this recent fad of trying to remove old formats, in the belief that the old format was somehow preventing any new format from working in parallel, reminds me of JWZ's recommendations[1] on mbox "summary files" over the complexity of an actual database. Sometimes you can get the features you want without sacrificing performance or compatibility.
by regularfry on 5/6/15, 8:38 AM
The alternative is to leave everything unstructured, and understand the formats minimally and lazily. Laziness is a virtue, right?
by zimbatm on 5/6/15, 9:51 AM
by erikb on 5/6/15, 10:22 AM
So even if binary logging is way better (I can't say, not enough experience) you simply can't beat text logging, because text logging is natural. It just happens.
print("Hello World!")
by babuskov on 5/6/15, 8:45 AM
Store important data in the database so that you can query it efficiently.
Keep logs for random searches when something unexpected happens. I log gigabytes per day, but only grep them maybe once or twice a year.
by 616c on 5/6/15, 8:14 AM
I was thinking this would be a cool area of research for me to try programming again, but it seems so daunting I am not sure where to start.
by michipili on 5/10/15, 2:07 PM
http://unix-workstation.blogspot.de/2015/05/of-course-greppi...