From: Neil Cerutti
Newsgroups: comp.lang.python
Subject: Re: Re for Apache log file format
Date: 8 Oct 2013 12:50:22 GMT
Organization: Norwich University

On 2013-10-08, Sam Giraffe wrote:
>
> Hi,
>
> I am trying to split up the re pattern for the Apache log file format and
> seem to be having some trouble getting Python to understand a multi-line
> pattern:
>
> #!/usr/bin/python
>
> import re
>
> #this is a single line
> string = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" 302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"'
>
> #trying to break up the pattern match for easy to read code
> pattern = re.compile(r'(?P\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
>                      r'(?P\-)\s+'
>                      r'(?P\-)\s+'
>                      r'(?P\[(.*?)\])\s+'
>                      r'(?P\"(.*?)\")\s+'
>                      r'(?P\d{3})\s+'
>                      r'(?P\d+)\s+'
>                      r'(?P\"\")\s+'
>                      r'(?P\((.*?)\))')

I recommend using the re.VERBOSE flag when explicating an re. It'll make
your life incrementally easier.

pattern = re.compile(
    r"""(?P\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+
        (?P\-)\s+
        (?P\-)\s+
        (?P\[(.*?)\])\s+ # You can even insert comments.
        (?P\"(.*?)\")\s+
        (?P\d{3})\s+
        (?P\d+)\s+
        (?P\"\")\s+
        (?P\((.*?)\))""", re.VERBOSE)

-- 
Neil Cerutti
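For reference, here is a minimal sketch of a complete re.VERBOSE pattern that does match the sample line from the question, assuming the standard Apache "combined" log layout (host, ident, user, time, request, status, size, referrer, user agent). Splitting the pattern across lines was never the problem: Python joins adjacent raw string literals into a single pattern. What keeps the quoted pattern from matching is that `(?P\"\")` requires a literal `""` where the log has `"-"`, and the last group expects `(` where the User-Agent field begins with a double quote. The group names in the quoted posts are not recoverable, so the names below (`host`, `ident`, `user`, `time`, `request`, `status`, `size`, `referrer`, `agent`) are hypothetical choices for illustration.

```python
import re

# Sample line from the question (assumed Apache "combined" log format).
LINE = ('192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" '
        '302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"')

# Group names are hypothetical; the originals were lost from the post.
# In VERBOSE mode, unescaped whitespace is ignored and # starts a comment.
LOG_PATTERN = re.compile(r"""
    (?P<host>\d{1,3}(?:\.\d{1,3}){3})\s+   # client IPv4 address
    (?P<ident>\S+)\s+                      # identd name, usually a dash
    (?P<user>\S+)\s+                       # authenticated user, usually a dash
    \[(?P<time>[^\]]*)\]\s+                # timestamp inside square brackets
    "(?P<request>[^"]*)"\s+                # request line, e.g. GET / HTTP/1.0
    (?P<status>\d{3})\s+                   # HTTP status code
    (?P<size>\d+|-)\s+                     # response size in bytes, or a dash
    "(?P<referrer>[^"]*)"\s+               # Referer header, a dash if absent
    "(?P<agent>[^"]*)"                     # User-Agent header
    """, re.VERBOSE)

match = LOG_PATTERN.match(LINE)
if match:
    # Print each named field on its own line.
    for name, value in match.groupdict().items():
        print(f"{name:>8}: {value}")
else:
    print("no match")
```

Using `[^"]*` rather than `.*?` for the quoted fields keeps each group from running past its own closing quote, which makes a mismatch fail at the field that is actually wrong.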