From: Piet van Oostrum
Newsgroups: comp.lang.python
Subject: Re: Re for Apache log file format
Date: Wed, 09 Oct 2013 13:33:14 -0400

Sam Giraffe writes:

> Hi,
>
> I am trying to split up the re pattern for Apache log file format and seem to be having some
> trouble in getting Python to understand multi-line pattern:
>
> #!/usr/bin/python
>
> import re
>
> #this is a single line
> string = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" 302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"'
>
> #trying to break up the pattern match for easy to read code
> pattern = re.compile(r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
>                      r'(?P\-)\s+'
>                      r'(?P\-)\s+'
>                      r'(?P\[(.*?)\])\s+'
>                      r'(?P\"(.*?)\")\s+'
>                      r'(?P\d{3})\s+'
>                      r'(?P\d+)\s+'
>                      r'(?P\"\")\s+'
>                      r'(?P\((.*?)\))')
>
> match = re.search(pattern, string)
>
> if match:
>     print match.group('ip')
> else:
>     print 'not found'
>
> The python interpreter is skipping to the 'match = re.search' and then the 'if' statement right
> after it looks at the ip group, instead of moving on to the next group and so on.

Although you have written the regexp as a sequence of lines, in reality it is a single string: Python concatenates adjacent string literals when it compiles your code. Therefore pdb executes the whole re.compile(...) statement in a single step and does not go into its "parts", which really are not parts.
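Because the pieces are glued into one string before re.compile() ever sees them, neither pdb nor re will tell you which piece is the culprit. A minimal sketch of one way to find it, assuming you keep the pieces in a list: match ever-longer prefixes of the pattern against the line and report the first piece at which matching stops. PIECES and first_failing_piece are hypothetical names, and the group names are placeholders, since the original ones did not survive in the quoted code.

```python
import re

# Adjacent string literals are joined at compile time, so these are equal.
assert (r'\d+' r'\s+') == r'\d+\s+'

LINE = ('192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" '
        '302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"')

# The pieces of the original pattern, kept separate so each can be tested.
# Group names are hypothetical placeholders for the lost originals.
PIECES = [
    r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+',
    r'(?P<ident>\-)\s+',
    r'(?P<user>\-)\s+',
    r'(?P<time>\[(.*?)\])\s+',
    r'(?P<request>\"(.*?)\")\s+',
    r'(?P<status>\d{3})\s+',
    r'(?P<size>\d+)\s+',
    r'(?P<referrer>\"\")\s+',
    r'(?P<agent>\((.*?)\))',
]


def first_failing_piece(pieces, text):
    """Return the index of the first piece whose addition makes the match
    fail, or None if the complete pattern matches at the start of text."""
    for i in range(1, len(pieces) + 1):
        if re.match(''.join(pieces[:i]), text) is None:
            return i - 1
    return None


idx = first_failing_piece(PIECES, LINE)
if idx is None:
    print('whole pattern matches')
else:
    print('fails at piece', idx, PIECES[idx])
```

On the sample line this should stop at index 7, the piece with \"\". If that piece is replaced by r'(?P<referrer>\".*?\")\s+', the same check then stops at the last piece, the one with \((.*?)\).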
>
> mybox:~ user$ python -m pdb /Users/user/Documents/Python/apache.py
> > /Users/user/Documents/Python/apache.py(3)<module>()
> -> import re
> (Pdb) n
> > /Users/user/Documents/Python/apache.py(5)<module>()
> -> string = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" 302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"'
> (Pdb) n
> > /Users/user/Documents/Python/apache.py(7)<module>()
> -> pattern = re.compile(r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
> (Pdb) n
> > /Users/user/Documents/Python/apache.py(17)<module>()
> -> match = re.search(pattern, string)
> (Pdb)

Also, as Andreas has noted, the r'(?P\"\")\s+' part is wrong: it only matches an empty pair of quotes, "", while your log line has "-" in that position. It should probably be

r'(?P\".*?\")\s+'

And the r'(?P\((.*?)\))') part will also not match, as there is text outside the parentheses (the whole user agent is enclosed in double quotes). It should probably also be r'(?P\".*?\")') or something like it.

-- 
Piet van Oostrum
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
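Putting the two corrections above together gives something like the following sketch. It is written for Python 3 (print() with a single argument also works in Python 2), the group names are hypothetical stand-ins for the lost originals, and, as in the original pattern, the brackets and quotes are kept inside the groups.

```python
import re

# Sample line from the original post (one Apache log line).
LINE = ('192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" '
        '302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"')

# Group names are hypothetical; the names in the original post were lost.
PATTERN = re.compile(
    r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
    r'(?P<ident>-)\s+'
    r'(?P<user>-)\s+'
    r'(?P<time>\[.*?\])\s+'
    r'(?P<request>".*?")\s+'
    r'(?P<status>\d{3})\s+'
    r'(?P<size>\d+)\s+'
    r'(?P<referrer>".*?")\s+'  # was "", which only matches an empty pair of quotes
    r'(?P<agent>".*?")'        # was \(...\), which cannot match the text around the parentheses
)

match = PATTERN.search(LINE)
if match:
    print(match.group('ip'))
    print(match.group('agent'))
else:
    print('not found')
```

On the sample line this prints 192.168.122.3 followed by the quoted user agent. Keeping the quoted fields non-greedy (.*?) makes each one stop at its own closing quote instead of swallowing the fields that follow it.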