From: Sam Giraffe
To: python-list@python.org
Newsgroups: comp.lang.python
Date: Mon, 7 Oct 2013 23:33:31 -0700
Subject: Re for Apache log file format

Hi,

I am trying to split up the re pattern for the Apache log file format across several lines, and I am having trouble getting Python to understand the multi-line pattern:

#!/usr/bin/python

import re

#this is a single line
string = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" 302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"'

#trying to break up the pattern match for easy to read code
pattern = re.compile(r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
                     r'(?P<ident>\-)\s+'
                     r'(?P<username>\-)\s+'
                     r'(?P<TZ>\[(.*?)\])\s+'
                     r'(?P<url>\"(.*?)\")\s+'
                     r'(?P<httpcode>\d{3})\s+'
                     r'(?P<size>\d+)\s+'
                     r'(?P<referrer>\"\")\s+'
                     r'(?P<agent>\((.*?)\))')

match = re.search(pattern, string)

if match:
    print match.group('ip')
else:
    print 'not found'

The Python interpreter skips straight to the 'match = re.search' line and then the 'if' statement right after it looks at the <ip> group, instead of moving on to <ident> and so on.
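
One way to check whether the separate raw strings really end up in one pattern is to inspect the compiled object. The sketch below relies on Python joining adjacent string literals into a single string before re.compile() is called, which would also be consistent with the debugger treating the whole call as one step. It uses only the first two pieces of the pattern, and the names `joined` and `names` are introduced here purely for illustration. It is written with print() so it also runs on Python 3.

```python
import re

# Adjacent string literals are joined into one string before re.compile()
# runs, so the call below receives a single pattern.
pattern = re.compile(r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
                     r'(?P<ident>\-)\s+')

# The joined pattern text and the named groups it defines.
joined = pattern.pattern
names = sorted(pattern.groupindex)

print(joined)
print(names)
```

If every group name shows up in `names`, the later pieces were not skipped; they are simply part of the same statement as the first piece.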

mybox:~ user$ python -m pdb /Users/user/Documents/Python/apache.py
> /Users/user/Documents/Python/apache.py(3)<module>()
-> import re
(Pdb) n
> /Users/user/Documents/Python/apache.py(5)<module>()
-> string = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" 302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"'
(Pdb) n
> /Users/user/Documents/Python/apache.py(7)<module>()
-> pattern = re.compile(r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
(Pdb) n
> /Users/user/Documents/Python/apache.py(17)<module>()
-> match = re.search(pattern, string)
(Pdb)
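
Since pdb steps over statements and cannot step inside a regular expression, a different way to see how far the pattern gets is to add the pieces one at a time and report the first piece at which the search stops matching. This is only a sketch: the helper `first_failing_piece` and the variable names `line` and `pieces` are hypothetical, and the pieces themselves are copied unchanged from the script above. It uses print() so it also runs on Python 3.

```python
import re

line = ('192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" '
        '302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"')

# The same pieces as in the script, kept in a list so they can be
# joined one at a time.
pieces = [
    r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+',
    r'(?P<ident>\-)\s+',
    r'(?P<username>\-)\s+',
    r'(?P<TZ>\[(.*?)\])\s+',
    r'(?P<url>\"(.*?)\")\s+',
    r'(?P<httpcode>\d{3})\s+',
    r'(?P<size>\d+)\s+',
    r'(?P<referrer>\"\")\s+',
    r'(?P<agent>\((.*?)\))',
]


def first_failing_piece(pieces, text):
    """Return the index of the first piece whose addition breaks the match, or None."""
    for count in range(1, len(pieces) + 1):
        # Search with only the first `count` pieces joined together.
        if re.search(''.join(pieces[:count]), text) is None:
            return count - 1
    return None


failing = first_failing_piece(pieces, line)
if failing is None:
    print('all pieces match')
else:
    print('match stops at piece %d: %s' % (failing, pieces[failing]))
```

The reported piece is the place to compare against the sample log line by hand.
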
Thank you.
