by GaiusCoffee on 4/27/15, 5:11 AM with 109 comments
How do you guys log application events in such a way that extracting information from them is easy, but the size of the logs stays manageable?
by thaumaturgy on 4/27/15, 5:29 PM
Application logging has been a solved problem for decades now. syslog or direct-to-disk in a reasonable format, let logrotate do the job it's faithfully done for years and let the gzipped old files get picked up by the offsite backups that you're surely running, and use the standard collection of tools for mining text files: grep, cut, tail, etc.
I'm a little weirded out that "my logs are too big" is still a thing, and that the most common answer to this is "glue even more complexity together".
by thedevopsguy on 4/27/15, 9:09 AM
The most common logging formats I've come across in production environments are:
1. log4j (Java) or NLog (.NET)
2. JSON
3. syslog
Tools that I've used to search, visualize and analyse log data have been:
1. Elasticsearch, Logstash and Kibana (ELK) stack
2. Splunk (commercial)
3. Logscape (commercial)
With the database approach, changes to the fields representing your data are expensive because you are locked in by the schema, and the schema will never fully capture your understanding of the data. With the tools mentioned above you have the option to extract ad-hoc fields at runtime.
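To illustrate, here's a minimal Python sketch (the log fields are made up): JSON lines carry their own keys, so pulling out a new field is a read-time decision rather than a schema migration.

    import json

    # Two hypothetical JSON log lines; fields can differ per event with no schema change.
    lines = [
        '{"ts": "2015-04-27T05:11:00Z", "level": "ERROR", "event": "login_failed", "user_id": 42}',
        '{"ts": "2015-04-27T05:12:00Z", "level": "INFO", "event": "signup", "plan": "free"}',
    ]

    # Extract an ad-hoc field at read time; new fields never require a migration.
    for line in lines:
        entry = json.loads(line)
        if entry.get("level") == "ERROR":
            print(entry["ts"], entry["event"], entry.get("user_id"))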
Hope this helps.
by neilh23 on 4/27/15, 8:09 AM
by MichaelGG on 4/27/15, 8:39 AM
Just as an example of how awesome Elasticsearch is, you can trivially segment your storage tiers (say, SSD versus HDD) and then easily move older data to other storage, with a single command.
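To make that concrete, a hedged sketch in Python using requests: assuming the SSD and HDD nodes are tagged with a custom box_type attribute (the exact config key depends on your ES version), shifting an older index off the fast tier is a single settings update. The index name here is made up.

    import requests

    # Assumes nodes are started with a custom attribute, e.g. box_type: hot (SSD)
    # or box_type: warm (HDD). Moving an old index to the slow tier is one call:
    resp = requests.put(
        "http://localhost:9200/logs-2015.03/_settings",
        json={"index.routing.allocation.require.box_type": "warm"},
    )
    print(resp.json())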
They have a log-specific handler called Logstash, and a dashboard system called Kibana (which is sorta neat, but the UI seemed a bit laggy in my brief experience). Apparently some folks use Logstash/Elasticsearch to record millions and millions of events per day and ES does a great job.
If you want hosted, check out Stackify. I'm totally blown away with the product (no affiliation other than being a new user). You can send log info to them and they'll sort it all out, similar to Splunk, but not ridiculously priced and no dealing with terrible sales teams. But it gets better - they offer all sorts of ways to define app-specific data and metrics, so you can get KPIs and dashboards just adding a line or two of code here and there. It's a lot easier than running your own system, and it looks like it can make ops a ton easier.
Another hosted service is SumoLogic. I only used them for logging, but it seemed to work well enough.
by eloycoto on 4/27/15, 9:32 AM
I used Graphite and now I'm using InfluxDB; alongside that, Kibana + Logstash + ES.
With statsd and InfluxDB you can record every event in a database; it's pretty easy, and there are statsd libraries for a number of languages. I measure all the events in my products: response timings, database queries, logins, sign-ups, calls; everything goes to statsd.
Logs are good for debugging, but if you want to measure all the events in your platform, statsd + InfluxDB + Grafana are your best friends, and your managers will be happy with that ;-)
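For example, a minimal sketch with the Python statsd client (metric names are made up; statsd listens on UDP 8125 by default):

    import time
    import statsd  # pip install statsd

    client = statsd.StatsClient("localhost", 8125)

    client.incr("myapp.logins")          # count an event
    client.timing("myapp.db.query", 32)  # report a duration in milliseconds

    # Or time a block of code directly:
    with client.timer("myapp.response_time"):
        time.sleep(0.05)                 # stand-in for real work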
A few weeks ago I gave a talk about this; you can see the slides, a few examples, and a Docker deploy here:
http://acalustra.com/statsd-talk-at-python-vigo-meetup.html
Regards ;-)
by chupy on 4/27/15, 10:25 AM
Logstash + Graylog / Elasticsearch - mostly for monitoring application error logs and easy ad-hoc querying and debugging.
statsd + graphite + Nagios/PagerDuty - general monitoring/alerting and performance stats.
ZeroMQ (in the process of changing now to Kafka) + Storm and Redis for real-time event analytics dashboards. We also write the data to HDFS and run batch jobs over it for more in-depth processing.
We also have a legacy SQL Server in which we save events/logs; it's still maintained, so maybe this could help you. Just FYI, we analyse more than 500 million records/day and we had to do some optimisations there:
- If the database allows it, partition the table by date.
- Create different tables for different applications and/or different events.
- Use one table per day, which at the start of the next day gets merged into a monthly table in a separate read-only database.
- Create daily summary tables which are used for analytics.
- If you actually need to query all the data, use a UNION over the monthly tables or the summary tables.
- I know it's a given, but if you have large amounts of data, batch your writes and then use bulk inserts (see the sketch below).
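As a rough illustration of that last point, a small self-contained Python sketch (sqlite3 so it runs anywhere; table and column names are made up) of buffering events and flushing them as bulk inserts:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (ts TEXT, log_type TEXT, message TEXT)")

    batch = []

    def flush():
        # One executemany call instead of thousands of individual INSERT statements.
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", batch)
        conn.commit()
        del batch[:]

    def log_event(ts, log_type, message, batch_size=500):
        batch.append((ts, log_type, message))
        if len(batch) >= batch_size:
            flush()

    for i in range(1200):
        log_event("2015-04-27T05:11:00Z", "pageview", "event %d" % i)
    flush()  # write out the remainder
    print(conn.execute("SELECT COUNT(*) FROM logs").fetchone())  # (1200,)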
I suggest you take a couple of steps back, think hard about exactly how you want to access and query the data, and work out what the best tool for you is in the long run.
by Someone on 4/27/15, 6:12 AM
If log entries take up too much disk space, switching to a different system will not help; you will have to do something with the data. You can either archive old years (export in some way, compress, put in cold storage) or throw them away, either partially or fully (do you need to keep debug logging around forever?). Using partitions can help here, as it makes it faster to drop older data (http://www.postgresql.org/docs/current/interactive/ddl-parti...)
You may also consider compressing some fields inside the database (did you normalize logType and userAgent or are they strings? Can your database compress descriptions?), but that may affect logging performance (that's a _may_: there's extra work to do, but less data to write).
If, on the other hand, indexes take up too much space or querying gets too slow, consider using a partial index (http://en.m.wikipedia.org/wiki/Partial_index). You won't be able to efficiently query older data, but if you do that only rarely, that may be sufficient.
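For the partial-index idea, a hedged sketch via psycopg2 (connection string, table and column names are all placeholders): only the recent, frequently queried slice gets indexed.

    import psycopg2  # pip install psycopg2

    conn = psycopg2.connect("dbname=app user=app")
    with conn, conn.cursor() as cur:
        # Index only the hot slice of the log table; older rows stay unindexed.
        cur.execute("""
            CREATE INDEX recent_errors_idx
                ON event_log (created_at, log_type)
             WHERE created_at > '2015-01-01' AND log_type = 'ERROR'
        """)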
by k1w1 on 4/27/15, 1:40 PM
If you want to use your logs for troubleshooting (e.g. ad-hoc queries to find error messages) or ad-hoc analytics, BigQuery is ideal. Hundreds of gigabytes can be searched or analyzed in 5-6 seconds per query.
Fluentd can be used to collect log data and send to BigQuery.
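For a sense of what querying looks like, a minimal sketch with the google-cloud-bigquery Python client (project, dataset, table and column names are all made up):

    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # uses your default GCP credentials and project

    query = """
        SELECT timestamp, message
          FROM `my_project.app_logs.events`
         WHERE level = 'ERROR' AND DATE(timestamp) = DATE '2015-04-27'
         ORDER BY timestamp DESC
         LIMIT 100
    """
    for row in client.query(query).result():
        print(row.timestamp, row.message)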
by vindmi on 4/27/15, 8:05 AM
Custom NLog renderer which implements SysLog protocol and NLog target which pushes logs to RabbitMQ.
by webjunkie on 4/27/15, 7:42 AM
by therealkay on 4/27/15, 7:39 AM
It's similar in spirit to elasticsearch + logstash + kibana, but more integrated.
Disclaimer: I work on it, so I'm not going to say which is better, just giving another pointer.
by bra-ket on 4/27/15, 3:50 PM
2) hbase+phoenix: http://phoenix.apache.org/
3) opentsdb: http://opentsdb.net/
by dorfsmay on 4/27/15, 3:16 PM
• open source solutions require a lot of work
• commercial solutions get very expensive very quickly
If you can narrow down how much log data you want to keep, the commercial solutions are amazing, but as you need (or think you need) to keep the data longer and longer, they become prohibitively expensive.
The next time I have to tackle this issue, specifically keeping logs forever, I will give the Hadoop stores (HBase, Impala, etc.) a try. Hadoop solutions work really well for very large sets of write-once data, which is what logs are.
by Sir_Cmpwn on 4/27/15, 7:22 AM
Just use grep to query recent logs, zgrep if you have to dig a little.
by myrryr on 4/27/15, 7:09 AM
After a week, it goes out of cache. After a month, we no longer keep multiple copies around. After 3 months, we gather stats from it and push it to some tar.xz files, which we store. So it's out of the database.
We can still do processing runs over it, and do... but it is no longer indexed, so they take longer.
After 3 years, the files are deleted.
by fscof on 4/27/15, 3:25 PM
by ccleve on 4/27/15, 3:35 PM
The important feature for us is S3 archiving. They'll keep your logs online for a certain period of time, and then copy the old ones to S3. You don't have to get rid of anything, and you're still able to keep costs under control.
by sirtopas on 4/27/15, 8:20 AM
It works well for us: a nice, accessible UI if you need it, a solid database behind it, and RSS/email alerts too. We've got thousands of entries in there and even on the old SQL 2005 box we use, it seems to work just fine.
by nightTrevors on 4/27/15, 11:51 AM
by lucb1e on 4/27/15, 9:17 AM
Like others here said, extract what you want to keep (unique visitors per day or so) and throw the rest out after a few weeks.
by michaelmcmillan on 4/27/15, 7:07 AM
Winston is a node/io.js library, but I guess you could find something equivalent in any other stack. The Pushbullet part is really useful.
Edit: I run a pretty small site however (http://littlist.no). I don't think I would enable the Pushbullet part if I had several hundred thousand visitors per day.
by RBerenguel on 4/27/15, 7:55 AM
by mkhpalm on 4/27/15, 7:05 AM
by buro9 on 4/27/15, 9:22 AM
Why are you keeping all of the logs? Are you doing anything with them?
Are the old logs relevant at all? If your program structure has changed, then anything logged before that point isn't even applicable.
My advice: if what you have is working, and has only failed because of the volume of data, apply a retention policy and delete data older than some point in time.
An example: Nuke all data older than 1 month for starters, and if you find that you really don't use even that much (perhaps you only need 7 days to provide customer support and debug new releases) then be more aggressive and store less.
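A hedged sketch of such a retention job (Python + Postgres; table and column names are made up). Deleting in modest batches keeps the job from holding long locks:

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")
    conn.autocommit = True
    with conn.cursor() as cur:
        while True:
            # Remove up to 10k expired rows per pass, commit, repeat until done.
            cur.execute("""
                DELETE FROM event_log
                 WHERE id IN (SELECT id FROM event_log
                               WHERE created_at < now() - interval '30 days'
                               LIMIT 10000)
            """)
            if cur.rowcount == 0:
                break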
by youknowjack on 5/1/15, 3:01 PM
Blog: http://engineering.indeed.com/blog/2012/11/logrepo-enabling-...
Talk: http://engineering.indeed.com/talks/logrepo-enabling-data-dr...
tl;dr: a human-readable log format that uses a sortable UID and arbitrary types/fields, captured via a log4j syslog-ng adapter, and aggregated to a central server for manual access and processing
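This isn't Indeed's exact format, just a Python sketch of the idea: a timestamp-prefixed UID sorts chronologically as plain text, and the rest of the line is free-form type/field pairs.

    import time
    import uuid

    def sortable_uid():
        # Fixed-width hex milliseconds, then randomness for uniqueness;
        # lexicographic order equals chronological order.
        return "%012x%s" % (int(time.time() * 1000), uuid.uuid4().hex[:12])

    # Arbitrary event type and fields, still grep-able and human-readable.
    print("%s type=jobsearch query=%s page=%d" % (sortable_uid(), "devops", 2))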
by YorickPeterse on 4/27/15, 11:08 AM
by troels on 4/27/15, 8:39 AM
We have web server logs going 30 days back, on disk, managed by logrotate. Then we have error logging in Sentry. For user level events, we track in Analytics, but we also have our own database-backed event logging for certain events. Currently this is in the same db as everything else, but we have deliberately factored the tables such that there are no key constraints/joins across these tables and the rest of the schema, which means it should be trivial to shard it out in its own db in time.
by KaiserPro on 4/27/15, 7:12 AM
For performance metrics we use graphite/statsd. This allows us to log hits/access times for many things, all without state-handling code inside the app.
It also allows us to get rid of a lot of logs after only a few days, as we're not doing silly things like shipping verbose logs for processing.
However, in your use case this might not be appropriate. As other people have mentioned, truncating the tables and shipping them out to cold storage is a good idea if you really need three years of full-resolution data.
by OhHeyItsE on 4/27/15, 4:25 PM
by fasfawefaw on 4/27/15, 7:06 PM
That's what I would do.
> However, after three and a half years of continued use, the table is now way too large.
Yeah, that's what happens...
There are many ways to handle this issue. The simplest is to start archiving your records ( i.e. dumping your old records into archival tables ).
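A hedged sketch of the archival-table approach (Python + Postgres, names made up): move old rows into an archive table in one transaction, so the hot table stays small but nothing is lost.

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS event_log_archive (LIKE event_log INCLUDING ALL)")
        cur.execute("""
            WITH moved AS (
                DELETE FROM event_log
                 WHERE created_at < now() - interval '1 year'
             RETURNING *
            )
            INSERT INTO event_log_archive SELECT * FROM moved
        """)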
Do you have access to a DBA or a data team? They should be able to help you out with this if you have special requirements.
by halayli on 4/27/15, 7:48 AM
With splunk, you just output your logs in this format:
<timestamp> key1=value key2=value key3=value
install splunk agent on your machines, and splunk takes care of everything from there. You can search, filter, graph, create alerts etc...
Splunk indexer allows you to age your logs, and keeps the newer ones in hot buckets for fast access.
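A minimal Python sketch of emitting that key=value format (field names are made up):

    import datetime
    import logging

    logging.basicConfig(format="%(message)s", level=logging.INFO)
    log = logging.getLogger("app")

    def kv_event(**fields):
        # "<timestamp> key1=value key2=value ..." so Splunk (or plain grep)
        # can pull fields out without any custom parsing configuration.
        ts = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
        pairs = " ".join("%s=%s" % (k, v) for k, v in sorted(fields.items()))
        log.info("%s %s", ts, pairs)

    kv_event(event="login", user_id=42, status="ok", duration_ms=37)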
by edsiper2 on 4/27/15, 2:28 PM
Here is a good presentation about Fluentd's design and general capabilities:
https://www.youtube.com/watch?v=sIVGsQgMHIo
note: it's good to mention that Fluentd has more than 300 plugins to interact with different sources and outputs.
by znq on 4/27/15, 2:16 PM
Disclosure: I'm one of the co-founders. We have a couple of other related tools in the pipeline, but the BF remote logger was the first we built, mostly to solve our own needs at Mobile Jazz.
by imperialWicket on 4/27/15, 12:59 PM
I agree with many comments that this isn't ideal, but setting up weekly/monthly partitions might buy you plenty of time to think through and implement an alternative solution.
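A hedged sketch of monthly partitions using PostgreSQL's declarative partitioning (available in 10+; older versions use table inheritance plus triggers), with made-up names. Dropping a whole month later becomes a cheap DROP TABLE instead of a huge DELETE.

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE event_log (
                id         bigserial,
                created_at timestamptz NOT NULL,
                payload    jsonb
            ) PARTITION BY RANGE (created_at)
        """)
        # One partition per month; inserts are routed automatically.
        cur.execute("""
            CREATE TABLE event_log_2015_04 PARTITION OF event_log
                FOR VALUES FROM ('2015-04-01') TO ('2015-05-01')
        """)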
by jmickey on 4/27/15, 9:41 AM
We use them for all our apps and have not seen any issues so far. It can be a bit tricky to set up, but once the logging works, it's hassle free from then on. Pricing is also very affordable.
by thejosh on 4/27/15, 8:09 AM
Also NewRelic if you want to spend the money (or get it through Amazon/Rackspace for free).
by abhimskywalker on 4/27/15, 11:41 AM
This is very convenient for decently complex querying and analysis at great speeds.
by perbu on 4/27/15, 7:20 AM
This has the benefit of making logging more or less asynchronous. You still need to handle the logs coming out of this, of course.
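One way to get that in Python is the stdlib QueueHandler/QueueListener pair (3.2+): the app only pushes records onto an in-memory queue, and a background thread does the slow I/O. The syslog sink below is just an assumption; it could equally be a file or the network.

    import logging
    import logging.handlers
    import queue

    # App threads enqueue records (cheap); a listener thread does the real I/O.
    q = queue.Queue(-1)
    logging.getLogger().addHandler(logging.handlers.QueueHandler(q))
    logging.getLogger().setLevel(logging.INFO)

    sink = logging.handlers.SysLogHandler(address="/dev/log")  # Linux socket path
    listener = logging.handlers.QueueListener(q, sink)
    listener.start()

    logging.info("user %s logged in", "alice")
    listener.stop()  # flush on shutdown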
by gtrubetskoy on 4/27/15, 3:15 PM
by brandonjlutz on 4/27/15, 8:46 PM
If you go this route, use a capped collection. I generally don't care about my old logs anyway.
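A minimal pymongo sketch of a capped collection (names and size are made up); it behaves like a fixed-size circular buffer, so the oldest log documents are overwritten automatically.

    from pymongo import DESCENDING, MongoClient  # pip install pymongo

    db = MongoClient()["myapp"]

    if "logs" not in db.list_collection_names():
        db.create_collection("logs", capped=True, size=512 * 1024 * 1024)  # ~512 MB cap

    db.logs.insert_one({"level": "ERROR", "msg": "something broke", "user_id": 42})
    for doc in db.logs.find().sort("$natural", DESCENDING).limit(10):
        print(doc)  # most recent entries first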
by true_religion on 4/27/15, 3:10 PM
If I had a lot of logging to do though, I'd use elasticsearch since that's what I run for my main DB. It handles sharding beautifully.
by xyby on 4/27/15, 8:19 AM
by matrix on 4/27/15, 3:03 PM
by afshinmeh on 4/27/15, 7:28 AM
For instance, we use a log file for HTTP access logs, but store all errors and warnings in MongoDB. However, we clean out the log storage every month.
We use NodeJS and MongoDB in www.floatalk.com
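If you'd rather not clean up by hand every month, MongoDB's TTL indexes can expire old documents automatically; a hedged pymongo sketch (collection and field names are made up):

    import datetime
    from pymongo import MongoClient

    db = MongoClient()["myapp"]

    # A background task deletes documents once created_at is older than
    # expireAfterSeconds, so no monthly cleanup job is needed.
    db.errors.create_index("created_at", expireAfterSeconds=30 * 24 * 3600)

    db.errors.insert_one({
        "created_at": datetime.datetime.utcnow(),
        "level": "warning",
        "msg": "disk space at 85%",
    })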
by jakozaur on 4/27/15, 5:51 PM
Works in cloud. Easy to setup and very scalable.
Free tier: 500 MB/day, 7 day retention
Disclosure: I work there.
by buf on 4/27/15, 8:17 PM
As it grows into a seemingly useable feature, I might move it to GA or Mixpanel.
When it gets to be large and stable, it goes into syslog.
by polskibus on 4/27/15, 6:55 AM
by nargella on 4/27/15, 12:10 PM
At least this is what we're moving to at work.
by lmm on 4/27/15, 8:54 AM
by TheSandyWalsh on 4/27/15, 12:11 PM
by blooberr on 4/27/15, 3:22 PM
Very easy to setup.
by ninjakeyboard on 4/27/15, 12:16 PM
by enedil on 4/27/15, 1:19 PM
by jtfairbank on 4/27/15, 5:21 AM
by ratheeshkr on 5/4/15, 11:04 AM
by SFjulie1 on 4/27/15, 12:35 PM
You never fill a table with uncapped, infinitely growing records (capped = collections with an upper limit).
At best you use rotating collections (like a circular buffer). But if you have success, the log flow will grow faster than your number of customers (coupling), thus it grows more than linearly. So the upper limit will grow too.
Tools have algorithmic complexity for retrieving, inserting and deleting. There is no tool that can be O(log n) for all cases and still be ACID.
The big-data fraud is about letting businesses handle ever-growing sets of data that yield diminishing returns on OPEX.
In software theory, the more data you have, the more resources you need; the cost is a growing function of the size of your data, a size that grows faster than your customer count and keeps growing over time.
The more customers you have, and the longer you keep them, the more they cost you. In business terms, that is stupid.
Storing ALL your logs is like being a living being that refuses to poo. It is not healthy.
Solutions lie in sampling, or in reducing data after an amount of time with schemes like round-robin databases.
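A trivial Python sketch of the sampling idea (the rate is arbitrary): keep a small fraction of noisy, low-value events at write time and the volume problem mostly disappears.

    import logging
    import random

    logging.basicConfig(level=logging.DEBUG)
    log = logging.getLogger("app")

    def sampled_debug(msg, *args, rate=0.01):
        # Keep roughly 1% of debug-level events; important events are logged elsewhere.
        if random.random() < rate:
            log.debug(msg + " (sampled, rate=%s)" % rate, *args)

    for i in range(10000):
        sampled_debug("cache miss for key %s", i)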