by GaiusCoffee on 4/27/15, 5:11 AM with 109 comments
How do you guys log application events in such a way that extracting information from them is easy, but the size of the logs stays manageable?
by thaumaturgy on 4/27/15, 5:29 PM
Application logging has been a solved problem for decades now. syslog or direct-to-disk in a reasonable format, let logrotate do the job it's faithfully done for years and let the gzipped old files get picked up by the offsite backups that you're surely running, and use the standard collection of tools for mining text files: grep, cut, tail, etc.
I'm a little weirded out that "my logs are too big" is still a thing, and that the most common answer to this is "glue even more complexity together".
by thedevopsguy on 4/27/15, 9:09 AM
The most common logging formats I've come across in production environments are:
1. log4j (Java) or NLog (.NET)
2. JSON
3. syslog
Tools that I've used to search, visualize and analyse log data have been:
1. Elasticsearch, Logstash and Kibana (ELK) stack
2. Splunk (commercial)
3. Logscape (commercial)
With the database approach, changes to the fields representing your data are expensive because you are locked in by the schema, and the schema will never fully capture your understanding of the data. With the tools mentioned above you have the option to extract ad-hoc fields at runtime.
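To illustrate, here's a minimal Python sketch (the log fields are made up): JSON lines carry their own keys, so pulling out a new field is a read-time decision rather than a schema migration.

    import json

    # Two hypothetical JSON log lines; fields can differ per event with no schema change.
    lines = [
        '{"ts": "2015-04-27T05:11:00Z", "level": "ERROR", "event": "login_failed", "user_id": 42}',
        '{"ts": "2015-04-27T05:12:00Z", "level": "INFO", "event": "signup", "plan": "free"}',
    ]

    # Extract an ad-hoc field at read time; new fields never require a migration.
    for line in lines:
        entry = json.loads(line)
        if entry.get("level") == "ERROR":
            print(entry["ts"], entry["event"], entry.get("user_id"))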
Hope this helps.
by neilh23 on 4/27/15, 8:09 AM
by MichaelGG on 4/27/15, 8:39 AM
Just as an example of how awesome Elasticsearch is, you can trivially segment your storage tiers (say, SSD versus HDD) and then easily move older data to other storage, with a single command.
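To make that concrete, a hedged sketch in Python using requests: assuming the SSD and HDD nodes are tagged with a custom box_type attribute (the exact config key depends on your ES version), shifting an older index off the fast tier is a single settings update. The index name here is made up.

    import requests

    # Assumes nodes are started with a custom attribute, e.g. box_type: hot (SSD)
    # or box_type: warm (HDD). Moving an old index to the slow tier is one call:
    resp = requests.put(
        "http://localhost:9200/logs-2015.03/_settings",
        json={"index.routing.allocation.require.box_type": "warm"},
    )
    print(resp.json())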
They have a log-specific handler called Logstash, and a dashboard system called Kibana (which is sorta neat, but the UI seemed a bit laggy in my brief experience). Apparently some folks use Logstash/Elasticsearch to record millions and millions of events per day and ES does a great job.
If you want hosted, check out Stackify. I'm totally blown away with the product (no affiliation other than being a new user). You can send log info to them and they'll sort it all out, similar to Splunk, but not ridiculously priced and no dealing with terrible sales teams. But it gets better - they offer all sorts of ways to define app-specific data and metrics, so you can get KPIs and dashboards just adding a line or two of code here and there. It's a lot easier than running your own system, and it looks like it can make ops a ton easier.
Another hosted service is SumoLogic. I only used them for logging, but it seemed to work well enough.
by eloycoto on 4/27/15, 9:32 AM
I used Graphite and now I'm using InfluxDB; alongside that, Kibana + Logstash + ES.
With statsd and InfluxDB you can record every event in a database; it's pretty easy, and there are statsd libraries for a number of languages. I measure all the events in my products: response timings, database queries, logins, sign-ups, calls; everything goes to statsd.
Logs are good for debugging, but if you want to measure all the events in your platform, statsd + InfluxDB + Grafana are your best friends, and your managers will be happy with that ;-)
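For example, a minimal sketch with the Python statsd client (metric names are made up; statsd listens on UDP 8125 by default):

    import time
    import statsd  # pip install statsd

    client = statsd.StatsClient("localhost", 8125)

    client.incr("myapp.logins")          # count an event
    client.timing("myapp.db.query", 32)  # report a duration in milliseconds

    # Or time a block of code directly:
    with client.timer("myapp.response_time"):
        time.sleep(0.05)                 # stand-in for real work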
A few weeks ago I gave a talk about this; you can see the slides, a few examples, and a Docker deploy here:
http://acalustra.com/statsd-talk-at-python-vigo-meetup.html
Regards ;-)
by chupy on 4/27/15, 10:25 AM
Logstash + Graylog / Elasticsearch - mostly for monitoring application error logs and easy ad-hoc querying and debugging.
statsd + graphite + Nagios/PagerDuty - general monitoring/alerting and performance stats.
ZeroMQ (in the process of changing now to Kafka) + Storm and Redis for real-time event analytics dashboards. We also write the data to HDFS and run batch jobs over it for more in-depth processing.
We also have a legacy SQL Server in which we save events/logs; it's still maintained, so maybe this could help you. Just FYI, we analyse more than 500 million records/day and we had to do some optimisations there:
- If the database allows it, partition the table by date.
- Create different tables for different applications and/or different events.
- Use one table per day, which at the start of the next day gets merged into a monthly table in a separate read-only database.
- Create daily summary tables which are used for analytics.
- If you actually need to query all the data, use a UNION over the monthly tables or the summary tables.
- I know it's a given, but if you have large amounts of data, batch your writes and then use bulk inserts (see the sketch below).
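As a rough illustration of that last point, a small self-contained Python sketch (sqlite3 so it runs anywhere; table and column names are made up) of buffering events and flushing them as bulk inserts:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (ts TEXT, log_type TEXT, message TEXT)")

    batch = []

    def flush():
        # One executemany call instead of thousands of individual INSERT statements.
        conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", batch)
        conn.commit()
        del batch[:]

    def log_event(ts, log_type, message, batch_size=500):
        batch.append((ts, log_type, message))
        if len(batch) >= batch_size:
            flush()

    for i in range(1200):
        log_event("2015-04-27T05:11:00Z", "pageview", "event %d" % i)
    flush()  # write out the remainder
    print(conn.execute("SELECT COUNT(*) FROM logs").fetchone())  # (1200,)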
I suggest you take a couple of steps back, think hard about exactly how you want to access and query the data, and work out what the best tool for you is in the long run.
by Someone on 4/27/15, 6:12 AM
If log entries take up too much disk space, switching to a different system will not help; you will have to do something with the data. You can either archive old years (export in some way, compress, put in cold storage) or throw them away, either partially or fully (do you need to keep debug logging around forever?). Using partitions can help here, as it makes it faster to drop older data (http://www.postgresql.org/docs/current/interactive/ddl-parti...)
You may also consider compressing some fields inside the database (did you normalize logType and userAgent or are they strings? Can your database compress descriptions?), but that may affect logging performance (that's a _may_: there's extra work to do, but less data to write).
If, on the other hand, indexes take up too much space or querying gets too slow, consider using a partial index (http://en.m.wikipedia.org/wiki/Partial_index). You won't be able to efficiently query older data, but if you do that only rarely, that may be sufficient.
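For the partial-index idea, a hedged sketch via psycopg2 (connection string, table and column names are all placeholders): only the recent, frequently queried slice gets indexed.

    import psycopg2  # pip install psycopg2

    conn = psycopg2.connect("dbname=app user=app")
    with conn, conn.cursor() as cur:
        # Index only the hot slice of the log table; older rows stay unindexed.
        cur.execute("""
            CREATE INDEX recent_errors_idx
                ON event_log (created_at, log_type)
             WHERE created_at > '2015-01-01' AND log_type = 'ERROR'
        """)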
by k1w1 on 4/27/15, 1:40 PM
If you want to use your logs for troubleshooting (e.g. ad-hoc queries to find error messages) or ad-hoc analytics, BigQuery is ideal. Hundreds of gigabytes can be searched or analyzed in 5-6 seconds per query.
Fluentd can be used to collect log data and send to BigQuery.
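For a sense of what querying looks like, a minimal sketch with the google-cloud-bigquery Python client (project, dataset, table and column names are all made up):

    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # uses your default GCP credentials and project

    query = """
        SELECT timestamp, message
          FROM `my_project.app_logs.events`
         WHERE level = 'ERROR' AND DATE(timestamp) = DATE '2015-04-27'
         ORDER BY timestamp DESC
         LIMIT 100
    """
    for row in client.query(query).result():
        print(row.timestamp, row.message)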
by vindmi on 4/27/15, 8:05 AM
Custom NLog renderer which implements SysLog protocol and NLog target which pushes logs to RabbitMQ.
by webjunkie on 4/27/15, 7:42 AM
by therealkay on 4/27/15, 7:39 AM
It's similar in spirit to elasticsearch + logstash + kibana, but more integrated.
Disclaimer: I work on it, so I'm not going to say which is better, just giving another pointer.
by bra-ket on 4/27/15, 3:50 PM
2) hbase+phoenix: http://phoenix.apache.org/
3) opentsdb: http://opentsdb.net/
by dorfsmay on 4/27/15, 3:16 PM
• open source solutions require a lot of work
• commercial solutions get very expensive very quickly
If you can narrow down how much log data you want to keep, the commercial solutions are amazing, but as you need (or think you need) to keep the data longer and longer, they become prohibitively expensive.
The next time I have to tackle this issue, specifically keeping logs forever, I will give the Hadoop stores (HBase, Impala, etc.) a try. Hadoop solutions work really well for very large sets of write-once data, which is what logs are.
by Sir_Cmpwn on 4/27/15, 7:22 AM
Just use grep to query recent logs, zgrep if you have to dig a little.
by myrryr on 4/27/15, 7:09 AM
After a week, it goes out of cache. After a month, we no longer keep multiple copies around. After 3 months, we gather stats from it and push it to some tar.xz files, which we store. So it's out of the database.
We can still do processing runs over it, and do... but it is no longer indexed, so they take longer.
After 3 years, the files are deleted.
by fscof on 4/27/15, 3:25 PM
by ccleve on 4/27/15, 3:35 PM
The important feature for us is S3 archiving. They'll keep your logs online for a certain period of time, and then copy the old ones to S3. You don't have to get rid of anything, and you're still able to keep costs under control.
by sirtopas on 4/27/15, 8:20 AM
It works well for us: a nice, accessible UI if you need it, a solid database behind it, and RSS/email alerts too. We've got thousands of entries in there and even on the old SQL 2005 box we use, it seems to work just fine.
by nightTrevors on 4/27/15, 11:51 AM
by lucb1e on 4/27/15, 9:17 AM
Like others here said, extract what you want to keep (unique visitors per day or so) and throw the rest out after a few weeks.
by michaelmcmillan on 4/27/15, 7:07 AM
Winston is a node/io.js library, but I guess you could find something equivalent in any other stack. The Pushbullet part is really useful.
Edit: I run a pretty small site however (http://littlist.no). I don't think I would enable the Pushbullet part if I had several hundred thousand visitors per day.
by RBerenguel on 4/27/15, 7:55 AM
by mkhpalm on 4/27/15, 7:05 AM
by buro9 on 4/27/15, 9:22 AM
Why are you keeping all of the logs? Are you doing anything with them?
Are the old logs relevant at all? If your program structure has changed, then anything logged before that point isn't even applicable.
My advice: if what you have is working, and has only failed because of the volume of data, apply a retention policy and delete data older than some point in time.
An example: Nuke all data older than 1 month for starters, and if you find that you really don't use even that much (perhaps you only need 7 days to provide customer support and debug new releases) then be more aggressive and store less.
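A hedged sketch of such a retention job (Python + Postgres; table and column names are made up). Deleting in modest batches keeps the job from holding long locks:

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")
    conn.autocommit = True
    with conn.cursor() as cur:
        while True:
            # Remove up to 10k expired rows per pass, commit, repeat until done.
            cur.execute("""
                DELETE FROM event_log
                 WHERE id IN (SELECT id FROM event_log
                               WHERE created_at < now() - interval '30 days'
                               LIMIT 10000)
            """)
            if cur.rowcount == 0:
                break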
by youknowjack on 5/1/15, 3:01 PM
Blog: http://engineering.indeed.com/blog/2012/11/logrepo-enabling-...
Talk: http://engineering.indeed.com/talks/logrepo-enabling-data-dr...
tl;dr: a human-readable log format that uses a sortable UID and arbitrary types/fields, captured via a log4j syslog-ng adapter, and aggregated to a central server for manual access and processing
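This isn't Indeed's exact format, just a Python sketch of the idea: a timestamp-prefixed UID sorts chronologically as plain text, and the rest of the line is free-form type/field pairs.

    import time
    import uuid

    def sortable_uid():
        # Fixed-width hex milliseconds, then randomness for uniqueness;
        # lexicographic order equals chronological order.
        return "%012x%s" % (int(time.time() * 1000), uuid.uuid4().hex[:12])

    # Arbitrary event type and fields, still grep-able and human-readable.
    print("%s type=jobsearch query=%s page=%d" % (sortable_uid(), "devops", 2))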
by YorickPeterse on 4/27/15, 11:08 AM
by troels on 4/27/15, 8:39 AM
We have web server logs going 30 days back, on disk, managed by logrotate. Then we have error logging in Sentry. For user level events, we track in Analytics, but we also have our own database-backed event logging for certain events. Currently this is in the same db as everything else, but we have deliberately factored the tables such that there are no key constraints/joins across these tables and the rest of the schema, which means it should be trivial to shard it out in its own db in time.
by KaiserPro on 4/27/15, 7:12 AM
For performance metrics we use graphite/statsd. This allows us to log hits/access times for many things, all without state-handling code inside the app.
It also allows us to get rid of a lot of logs after only a few days, as we're not doing silly things like shipping verbose logs for processing.
However, in your use case this might not be appropriate. As other people have mentioned, truncating the tables and shipping them out to cold storage is a good idea if you really need three years of full-resolution data.
by OhHeyItsE on 4/27/15, 4:25 PM
by fasfawefaw on 4/27/15, 7:06 PM
That's what I would do.
> However, after three and a half years of continued use, the table is now way too large.
Yeah, that's what happens...
There are many ways to handle this issue. The simplest is to start archiving your records ( i.e. dumping your old records into archival tables ).
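A hedged sketch of the archival-table approach (Python + Postgres, names made up): move old rows into an archive table in one transaction, so the hot table stays small but nothing is lost.

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS event_log_archive (LIKE event_log INCLUDING ALL)")
        cur.execute("""
            WITH moved AS (
                DELETE FROM event_log
                 WHERE created_at < now() - interval '1 year'
             RETURNING *
            )
            INSERT INTO event_log_archive SELECT * FROM moved
        """)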
Do you have access to a DBA or a data team? They should be able to help you out with this if you have special requirements.
by halayli on 4/27/15, 7:48 AM
With splunk, you just output your logs in this format:
<timestamp> key1=value key2=value key3=value
install splunk agent on your machines, and splunk takes care of everything from there. You can search, filter, graph, create alerts etc...
Splunk indexer allows you to age your logs, and keeps the newer ones in hot buckets for fast access.
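A minimal Python sketch of emitting that key=value format (field names are made up):

    import datetime
    import logging

    logging.basicConfig(format="%(message)s", level=logging.INFO)
    log = logging.getLogger("app")

    def kv_event(**fields):
        # "<timestamp> key1=value key2=value ..." so Splunk (or plain grep)
        # can pull fields out without any custom parsing configuration.
        ts = datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
        pairs = " ".join("%s=%s" % (k, v) for k, v in sorted(fields.items()))
        log.info("%s %s", ts, pairs)

    kv_event(event="login", user_id=42, status="ok", duration_ms=37)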
by edsiper2 on 4/27/15, 2:28 PM
Here is a good presentation about Fluentd's design and general capabilities:
https://www.youtube.com/watch?v=sIVGsQgMHIo
note: it's good to mention that Fluentd has more than 300 plugins to interact with different sources and outputs.
by znq on 4/27/15, 2:16 PM
Disclosure: I'm one of the co-founders. We have a couple of other related tools in the pipeline, but the BF remote logger was the first we built, mostly to solve our own needs at Mobile Jazz.
by imperialWicket on 4/27/15, 12:59 PM
I agree with many comments that this isn't ideal, but setting up weekly/monthly partitions might buy you plenty of time to think through and implement an alternative solution.
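A hedged sketch of monthly partitions using PostgreSQL's declarative partitioning (available in 10+; older versions use table inheritance plus triggers), with made-up names. Dropping a whole month later becomes a cheap DROP TABLE instead of a huge DELETE.

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE event_log (
                id         bigserial,
                created_at timestamptz NOT NULL,
                payload    jsonb
            ) PARTITION BY RANGE (created_at)
        """)
        # One partition per month; inserts are routed automatically.
        cur.execute("""
            CREATE TABLE event_log_2015_04 PARTITION OF event_log
                FOR VALUES FROM ('2015-04-01') TO ('2015-05-01')
        """)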
by jmickey on 4/27/15, 9:41 AM
We use them for all our apps and have not seen any issues so far. It can be a bit tricky to set up, but once the logging works, it's hassle free from then on. Pricing is also very affordable.
by thejosh on 4/27/15, 8:09 AM
Also NewRelic if you want to spend the money (or get it through Amazon/Rackspace for free).
by abhimskywalker on 4/27/15, 11:41 AM
This is very convenient for decently complex querying and analysis at great speeds.
by perbu on 4/27/15, 7:20 AM
This has the benefit of making logging more or less asynchronous. You still need to handle the logs coming out of this, of course.
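One way to get that in Python is the stdlib QueueHandler/QueueListener pair (3.2+): the app only pushes records onto an in-memory queue, and a background thread does the slow I/O. The syslog sink below is just an assumption; it could equally be a file or the network.

    import logging
    import logging.handlers
    import queue

    # App threads enqueue records (cheap); a listener thread does the real I/O.
    q = queue.Queue(-1)
    logging.getLogger().addHandler(logging.handlers.QueueHandler(q))
    logging.getLogger().setLevel(logging.INFO)

    sink = logging.handlers.SysLogHandler(address="/dev/log")  # Linux socket path
    listener = logging.handlers.QueueListener(q, sink)
    listener.start()

    logging.info("user %s logged in", "alice")
    listener.stop()  # flush on shutdown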
by gtrubetskoy on 4/27/15, 3:15 PM
by brandonjlutz on 4/27/15, 8:46 PM
If you go this route, use a capped collection. I generally don't care about my old logs anyway.
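A minimal pymongo sketch of a capped collection (names and size are made up); it behaves like a fixed-size circular buffer, so the oldest log documents are overwritten automatically.

    from pymongo import DESCENDING, MongoClient  # pip install pymongo

    db = MongoClient()["myapp"]

    if "logs" not in db.list_collection_names():
        db.create_collection("logs", capped=True, size=512 * 1024 * 1024)  # ~512 MB cap

    db.logs.insert_one({"level": "ERROR", "msg": "something broke", "user_id": 42})
    for doc in db.logs.find().sort("$natural", DESCENDING).limit(10):
        print(doc)  # most recent entries first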
by true_religion on 4/27/15, 3:10 PM
If I had a lot of logging to do though, I'd use elasticsearch since that's what I run for my main DB. It handles sharding beautifully.
by xyby on 4/27/15, 8:19 AM
by matrix on 4/27/15, 3:03 PM
by afshinmeh on 4/27/15, 7:28 AM
For instance, we use a log file for HTTP access logs, but store all errors and warnings in MongoDB. However, we clean out the log storage every month.
We use NodeJS and MongoDB in www.floatalk.com
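If you'd rather not clean up by hand every month, MongoDB's TTL indexes can expire old documents automatically; a hedged pymongo sketch (collection and field names are made up):

    import datetime
    from pymongo import MongoClient

    db = MongoClient()["myapp"]

    # A background task deletes documents once created_at is older than
    # expireAfterSeconds, so no monthly cleanup job is needed.
    db.errors.create_index("created_at", expireAfterSeconds=30 * 24 * 3600)

    db.errors.insert_one({
        "created_at": datetime.datetime.utcnow(),
        "level": "warning",
        "msg": "disk space at 85%",
    })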
by jakozaur on 4/27/15, 5:51 PM
Works in cloud. Easy to setup and very scalable.
Free tier: 500 MB/day, 7 day retention
Disclosure: I work there.
by buf on 4/27/15, 8:17 PM
As it grows into a seemingly useable feature, I might move it to GA or Mixpanel.
When it gets to be large and stable, it goes into syslog.
by polskibus on 4/27/15, 6:55 AM
by nargella on 4/27/15, 12:10 PM
At least this is what we're moving to at work.
by lmm on 4/27/15, 8:54 AM
by TheSandyWalsh on 4/27/15, 12:11 PM
by blooberr on 4/27/15, 3:22 PM
Very easy to setup.
by ninjakeyboard on 4/27/15, 12:16 PM
by enedil on 4/27/15, 1:19 PM
by jtfairbank on 4/27/15, 5:21 AM
by ratheeshkr on 5/4/15, 11:04 AM
by SFjulie1 on 4/27/15, 12:35 PM
You never fill a table with uncapped, infinitely growing records (capped = collections with an upper limit).
At best you use rotating collections (like a circular buffer). But if you have success, the log flow will grow faster than your number of customers (coupling), thus it grows more than linearly. So the upper limit will grow too.
Tools have algorithmic complexity for retrieving, inserting and deleting. There is no tool that can be O(log n) for all cases and still be ACID.
The big-data fraud is about letting businesses handle ever-growing sets of data that yield diminishing returns on OPEX.
In software theory, the more data you have, the more resources you need; the cost is a growing function of the size of your data, a size that grows faster than your customer count and keeps growing over time.
The more customers you have, and the longer you keep them, the more they cost you. In business terms, that is stupid.
Storing ALL your logs is like being a living being that refuses to poo. It is not healthy.
Solutions lie in sampling, or in reducing data after an amount of time with schemes like round-robin databases.
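A trivial Python sketch of the sampling idea (the rate is arbitrary): keep a small fraction of noisy, low-value events at write time and the volume problem mostly disappears.

    import logging
    import random

    logging.basicConfig(level=logging.DEBUG)
    log = logging.getLogger("app")

    def sampled_debug(msg, *args, rate=0.01):
        # Keep roughly 1% of debug-level events; important events are logged elsewhere.
        if random.random() < rate:
            log.debug(msg + " (sampled, rate=%s)" % rate, *args)

    for i in range(10000):
        sampled_debug("cache miss for key %s", i)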