From: MRAB
Newsgroups: comp.lang.python
Subject: Re: Python re to extract useful information from each line
Date: Wed, 29 Apr 2015 22:28:08 +0100
To: python-list@python.org
References: <220dafbc-25f0-48a7-b37a-c8a77a6f2ffa@googlegroups.com>

On 2015-04-29 22:22, Emile van Sebille wrote:
> On 4/29/2015 1:49 PM, Kashif Rana wrote:
>> pol_elements = re.compile('id\s(?P.+?)(?:\sname\s(?P.+?))?\sfrom\s(?P.+?)\sto\s(?P.+?)\s{2}(?P[^\s]+?)\s(?P[^\s]+?)\s(?P[^\s]+?)(?:(?P\snat)\s(?P[^\s]+?)(?P\sdip-id\s[^\s]+?)?)?\s(?P[^\s]+?)(?:\sschedule\s(?P[^\s]+?))?(?P\slog)?$'
>> )
>
> ... and that's why we avoid regular expressions... it makes my head hurt
> just looking at that line noise.
>
It might just be easier to split it into a list of fields and then pick
out the ones you want:

fields = re.findall(r'"[^"]+"|\S+', line)
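
A rough sketch of how that might look in practice. The sample line and the value_after() helper are made up for illustration (the real input isn't shown in this post), so treat the keywords and field layout as assumptions:

```python
import re

# Hypothetical input line, invented for this example.
line = ('set policy id 1000 name "Allow Web" from "Untrust" to "Trust"  '
        '"Any" "web server" "HTTP" permit log')

# Each field is either a double-quoted string (which may contain spaces)
# or a run of non-whitespace characters.
fields = re.findall(r'"[^"]+"|\S+', line)


def value_after(fields, keyword):
    """Return the field following `keyword` with surrounding quotes removed,
    or None if the keyword is absent or is the last field.
    (Hypothetical helper, not part of the re module.)"""
    try:
        return fields[fields.index(keyword) + 1].strip('"')
    except (ValueError, IndexError):
        return None


policy_id = value_after(fields, 'id')
name = value_after(fields, 'name')           # optional keyword
source = value_after(fields, 'from')
dest = value_after(fields, 'to')
schedule = value_after(fields, 'schedule')   # None when not present
has_log = fields[-1] == 'log'

print(policy_id, name, source, dest, schedule, has_log)
```

Because quoted values keep their quotes in the list, a value such as "from" can't be mistaken for the bare keyword from, and optional parts (name, schedule, log) are just a matter of checking whether the keyword is there.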