by ath0 on 9/30/22, 10:08 AM with 103 comments
by twunde on 9/30/22, 1:48 PM
Github project for CLP: https://github.com/y-scope/clp
The interesting part about the article isn't that structured data is easier to compress and store, it's that there's a relatively new way to efficiently transform unstructured logs to structured data. For those shipping unstructured logs to an observability backend, this could be a way to save significant money.
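To make the "unstructured to structured" idea concrete, here is a rough sketch of splitting a log line into a static template plus its variable values. This is not CLP's actual algorithm or API. The rule that any token containing a digit is a variable, and the names `encode`, `decode`, and `PLACEHOLDER`, are assumptions made up for illustration.

```python
import re

# Assumption for this sketch: any whitespace-delimited token that
# contains a digit is treated as a variable value.
VAR_TOKEN = re.compile(r"\S*\d\S*")

# Hypothetical placeholder: a control character unlikely to occur in logs.
PLACEHOLDER = "\x11"


def encode(line):
    """Split a log line into (template, variables)."""
    variables = VAR_TOKEN.findall(line)
    template = VAR_TOKEN.sub(PLACEHOLDER, line)
    return template, variables


def decode(template, variables):
    """Rebuild the original line by filling placeholders in order."""
    values = iter(variables)
    return re.sub(re.escape(PLACEHOLDER), lambda _match: next(values), template)


line_a = "Task 42 finished in 3.5s on host-7"
line_b = "Task 43 finished in 1.2s on host-9"

template_a, vars_a = encode(line_a)
template_b, vars_b = encode(line_b)

# Both lines share one template, so it only needs to be stored once;
# only the short variable lists differ per line.
print(template_a == template_b)
print(vars_a, vars_b)
print(decode(template_a, vars_a) == line_a)
```

The saving comes from repetition: the template is stored once and each log line shrinks to a template reference plus a few values, which also makes the values searchable as fields. A line that already contains the placeholder character would break this sketch.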
by hobs on 9/30/22, 12:51 PM
by SkeuomorphicBee on 9/30/22, 1:47 PM
Apparently the Uber site noticed I'm not in the USA and automatically redirects to a localized version, which doesn't exist. If their web-development capabilities are any indication I'll skip their development tips.
by taftster on 9/30/22, 5:15 PM
This is a very complicated and sophisticated architecture that leverages the JVM to the hilt. The "big data" architecture that Java and the JVM ecosystem present is really something to be admired, and it can definitely move big data.
I know that competition to this architecture must exist in other frameworks or platforms. But what exactly would replace the HDFS, Spark, Yarn configuration described by the article? Are there equivalents of this stack in other non-JVM deployments, or to other big data projects, like Storm, Hive, Flink, Cassandra?
And granted, Hadoop is somewhat "old" at this point. But I think it (and Google's original map-reduce paper) significantly moved the needle in terms of architecture. Hadoop's Map-Reduce might be dated, but HDFS is still being used very successfully in big data centers. Has the cloud and/or Kubernetes completely replaced the described style of architecture at this point?
Honest questions above, interested in other thoughts.
by otikik on 9/30/22, 4:07 PM
I clicked, found out, and was disappointed that this wasn't about wood.
Maybe I should start that woodworking career change already.
by bcjordan on 9/30/22, 12:51 PM
I'm curious, are there any managed services / simple to use setups to take advantage of something like this for massive log storage and search? (Most hosted log aggregators I've looked at charge by the raw text GB processed)
by shrubble on 9/30/22, 5:44 PM
Compressing logs has been a thing since the mid-1990s.
Minimizing writes to disk, or setting up a way to coalesce the writes, has also been around for as long as we have had disk drives. If you don't have enough RAM on your system to buffer the writes so that more of the writes get turned into sequential writes, your disk performance will suffer - this too has been known since the 1990s.
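A minimal sketch of the write coalescing described above, assuming an in-memory buffer that is handed to the underlying file in one call once it passes a size threshold. The class name `CoalescingWriter` and the `max_buffered_chars` parameter are hypothetical, and a real setup would usually lean on the OS page cache or the buffering built into the file object.

```python
import io


class CoalescingWriter:
    """Collects small writes in RAM and emits them as one larger write."""

    def __init__(self, sink, max_buffered_chars=64 * 1024):
        self.sink = sink  # any object with a write(str) method
        self.max_buffered_chars = max_buffered_chars
        self._chunks = []
        self._size = 0
        self.flush_count = 0  # how many writes actually reached the sink

    def write(self, line):
        self._chunks.append(line)
        self._size += len(line)
        if self._size >= self.max_buffered_chars:
            self.flush()

    def flush(self):
        if not self._chunks:
            return
        # One sequential write instead of many small ones.
        self.sink.write("".join(self._chunks))
        self._chunks.clear()
        self._size = 0
        self.flush_count += 1


sink = io.StringIO()  # stands in for a log file on disk
writer = CoalescingWriter(sink, max_buffered_chars=10)
writer.write("abcd\n")  # buffered, nothing reaches the sink yet
writer.write("efgh\n")  # threshold reached, both lines go out together
writer.write("x\n")     # buffered again
writer.flush()          # explicit flush, e.g. at shutdown
```

The trade-off is the usual one: anything still sitting in the buffer is lost if the process dies before the next flush.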
by prionassembly on 9/30/22, 1:12 PM
by kazinator on 9/30/22, 3:12 PM
by qualudeheart on 9/30/22, 11:42 PM
by tomgs on 9/30/22, 3:43 PM
There is another way to tackle the problem for most normal, back-end applications: Dynamic Logging[0].
Instead of adding a large amount of logs during development (and then having to deal with compressing and transforming them later), one can instead choose to only add the logs required at runtime.
This is a workflow shift, and as such should be handled with care. But for the majority of logs used for troubleshooting, it's actually a saner approach: Don't make a priori assumptions about what you might need in production, then try and "massage" the right parts out of it when the problem rears its head.
Instead, when facing an issue, add logs where and when you need them to almost "surgically" only get the bits you want. This way, logging cost reduction happens naturally - because you're never writing many of the logs to begin with.
Note: we're not talking about removing logs needed for compliance, forensics or other regulatory reasons here, of course. We're talking about those logs that are used by developers to better understand what's going on inside the application: the "print this variable" or "show this user's state" or "show me which path the execution took" type logs, the ones you look at once and then forget about (while their cost keeps piling up).
We call this workflow "Dynamic Logging", and have a fully-featured version of the product available for use at the website with up to 3 live instances.
On a personal - albeit obviously biased - note, I was an SRE before I joined the company, and saw an early demo of the product. I remember uttering a very verbal f-word during the demonstration, and thinking that I want me one of these nice little IDE thingies this company makes. It's a different way to think about logging - I'll give you that - but it makes a world of sense to me.
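For readers who want a feel for the workflow without any particular product, here is a rough approximation using only Python's standard `logging` module. It is not the tool described above: it can only switch existing debug statements on and off at runtime, whereas the comment talks about adding new log lines to a running application. The logger name `checkout` and the `process` function are made up for the example.

```python
import logging


class ListHandler(logging.Handler):
    """Keeps emitted messages in a list so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


logger = logging.getLogger("checkout")  # hypothetical logger name
logger.propagate = False                # keep the example self-contained
handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.WARNING)        # normal operation: debug logs are off


def process(order_id):
    # Troubleshooting log that costs almost nothing while disabled.
    logger.debug("processing order %s", order_id)
    return order_id


process(1)                        # suppressed, nothing is written
logger.setLevel(logging.DEBUG)    # issue reported: turn detail on at runtime
process(2)                        # now recorded
logger.setLevel(logging.WARNING)  # investigation done: turn it off again
process(3)                        # suppressed again

print(handler.records)
```

In a long-running service the `setLevel` calls would typically be triggered by something external, such as a config reload or an admin endpoint, rather than written inline like this.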
by jbergens on 9/30/22, 2:33 PM
Sounds interesting, now I want to read up on CLP. Not that we have much log text to worry about.
by xani_ on 9/30/22, 3:25 PM
by dathinab on 9/30/22, 11:58 PM
Says the person who at work just added structured logging to our new product.