From: Lew
Newsgroups: comp.lang.java.programmer
Subject: Re: analysis of java application logs
Date: Mon, 23 May 2011 09:11:22 -0400

Ulrich Scholz wrote:
> I'm looking for an approach to the problem of analyzing application
> log files.
>
> I need to analyse Java log files from applications (i.e., not logs of
> web servers). These logs contain Java exceptions, thread dumps, and
> free-form log4j messages issued by log statements inserted by
> programmers during development. Right now, these man-made log entries
> do not have any specific format.
>
> What I'm looking for is a tool and/or strategy that supports in
> lexing/parsing, tagging, and analysing the log entries. Because there
> is only little defined syntax and grammar - and because you might not
> know what you are looking for - the task requires the quick issuing of
> queries against the log data base. Some sort of visualization would be
> nice, too.
>
> Pointers to existing tools and approaches as well as appropriate
> tools/algorithms to develop the required system would be welcome.

It helps if you have a logging strategy that mandates a consistent logging format, specific information in particular positions or marked by particular markup, logging levels and other such so that your analysis tool isn't faced with a completely open-ended input.

What you describe requires a general text-analysis approach, as you indicate that you can make no guarantees about the format. Based on that, your best tool is "less" or equivalent text-file reader.

What is a tool supposed to do, read your mind?

It's really hard to extract information from a garbage can where people just randomly dumped whatever they individually felt like dumping without regard for operational needs.

You can't build a skyscraper on a bad foundation, and you can't build a good log analysis off a crappy log. Fix the logging system, then the analysis problem will be tractable.

-- 
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
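
To illustrate the point about a mandated format with specific information in particular positions, here is a minimal Java sketch. The layout (timestamp, bracketed level, component, " - ", message), the class name LogLineFormat, and the sample values are all assumptions made up for this example; they are not anything the original poster's applications or log4j prescribe. It only shows that once every entry follows one layout, a single regular expression can split it into fields, while a free-form line simply fails to parse.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineFormat {
    // Assumed layout (hypothetical): <timestamp> [<LEVEL>] <component> - <message>
    // The timestamp and component must not contain whitespace.
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) \\[(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\\] (\\S+) - (.*)$");

    /** One parsed entry; each field comes from a fixed position in the line. */
    public static final class Entry {
        public final String timestamp;
        public final String level;
        public final String component;
        public final String message;

        Entry(String timestamp, String level, String component, String message) {
            this.timestamp = timestamp;
            this.level = level;
            this.component = component;
            this.message = message;
        }
    }

    /** Builds a line in the assumed layout. */
    public static String format(String timestamp, String level,
                                String component, String message) {
        return timestamp + " [" + level + "] " + component + " - " + message;
    }

    /** Returns the parsed entry, or empty if the line does not follow the layout. */
    public static Optional<Entry> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Entry(m.group(1), m.group(2), m.group(3), m.group(4)));
    }

    public static void main(String[] args) {
        String line = format("2011-05-23T09:11:22", "ERROR",
                             "OrderService", "payment failed id=42");
        // A conforming line splits cleanly into fields.
        parse(line).ifPresent(e ->
            System.out.println(e.level + " from " + e.component + ": " + e.message));
        // A free-form line gives the parser nothing to hold on to.
        System.out.println(parse("something went wrong!!!").isPresent());
    }
}
```

Multi-line entries such as stack traces and thread dumps would not match this single-line pattern; a real format would need its own rule for continuation lines.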