From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: Convert AWK regex to Python
Date: Mon, 16 May 2011 13:36:01 +0200

J wrote:

> Hello Peter, Angelico,
>
> Ok lets see, My aim is to filter out several fields from a log file and
> write them to a new log file. The current log file, as I mentioned
> previously, has thousands of lines like this:- 2011-05-16 09:46:22,361
> [Thread-4847133] PDU D CC_SMS_SERVICE_51408_656-ServerThread- VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX
> - 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004
> Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) )
>
>
> All the lines in the log file are similar and they all have the same
> length (same amount of fields). Most of the fields are separated by
> spaces except for couple of them which I am processing with AWK (removing
> " evaluate each line in the log file and break them down into fields which I
> can call individually and write them to a new log file (for example
> selecting only fields 1, 2 and 3).
>
> I hope this is clearer now

Not much :(

It doesn't really matter whether there are 100, 1000, or a million lines in
the file; the important information is the structure of the file. You may be
able to get away with a quick and dirty script consisting of just a few
regular expressions, e. g.

import re

filename = ...

def get_service(line):
    return re.compile(r"[(](\w+)").search(line).group(1)

def get_command(line):
    return re.compile(r"
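
A self-contained sketch of the same one-regex-per-field approach, written against the single sample line quoted above, might look like the following. The patterns, the choice of fields, and the names SAMPLE, extract_fields and filter_log are assumptions made for illustration; they are not taken from the original post, and a real log may need different patterns.

```python
import re

# Hypothetical patterns, derived only from the one sample line in the quote.
# Compiling them once at module level avoids recompiling for every line.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}),\d+")
DIRECTION = re.compile(r" - (\w+) - [(]")   # e.g. "OUT" just before the PDU
COMMAND = re.compile(r"[(](\w+):")          # first "(name:" group, e.g. submit_resp
STATUS = re.compile(r"Status: (\d+)")

# The sample line from the quoted message, joined back into one string.
SAMPLE = (
    "2011-05-16 09:46:22,361 [Thread-4847133] PDU D "
    "CC_SMS_SERVICE_51408_656-ServerThread- "
    "VASPSessionThread-7ee35fb0-7e87-11e0-a2da-00238bce423b-TRX "
    "- 2011-05-16 09:46:22 - OUT - (submit_resp: (pdu: L: 53 ID: 80000004 "
    "Status: 0 SN: 25866) 98053090-7f90-11e0-a2da-00238bce423b (opt: ) )"
)


def extract_fields(line):
    """Return (date, time, direction, command, status), or None on no match."""
    ts = TIMESTAMP.search(line)
    direction = DIRECTION.search(line)
    command = COMMAND.search(line)
    status = STATUS.search(line)
    if not (ts and direction and command and status):
        return None  # skip lines that do not have the assumed structure
    return (ts.group(1), ts.group(2), direction.group(1),
            command.group(1), status.group(1))


def filter_log(lines):
    """Yield one space-separated output line per input line that matches."""
    for line in lines:
        fields = extract_fields(line)
        if fields is not None:
            yield " ".join(fields)
```

To process a file, one could open the input and output files and write each string produced by filter_log(infile) followed by a newline. Returning None for non-matching lines, instead of calling .group() on a failed search, keeps a single malformed line from aborting the whole run.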