From: Stefan Behnel
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: a little parsing challenge ☺
Date: Mon, 18 Jul 2011 09:24:21 +0200

Rouslan Korneychuk, 18.07.2011 09:09:
> I don't know why, but I just had to try it (even though I don't usually use
> Perl and had to look up a lot of stuff). I came up with this:
>
> /(?|
> (\()(?&matched)([\}\]”›»】〉》」』]|$) |
> (\{)(?&matched)([\)\]”›»】〉》」』]|$) |
> (\[)(?&matched)([\)\}”›»】〉》」』]|$) |
> (“)(?&matched)([\)\}\]›»】〉》」』]|$) |
> (‹)(?&matched)([\)\}\]”»】〉》」』]|$) |
> («)(?&matched)([\)\}\]”›】〉》」』]|$) |
> (【)(?&matched)([\)\}\]”›»〉》」』]|$) |
> (〈)(?&matched)([\)\}\]”›»】》」』]|$) |
> (《)(?&matched)([\)\}\]”›»】〉」』]|$) |
> (「)(?&matched)([\)\}\]”›»】〉》』]|$) |
> (『)(?&matched)([\)\}\]”›»】〉》」]|$))
> (?(DEFINE)(?<matched>(?:
> \((?&matched)\) |
> \{(?&matched)\} |
> \[(?&matched)\] |
> “(?&matched)” |
> ‹(?&matched)› |
> «(?&matched)» |
> 【(?&matched)】 |
> 〈(?&matched)〉 |
> 《(?&matched)》 |
> 「(?&matched)」 |
> 『(?&matched)』 |
> [^\(\{\[“‹«【〈《「『\)\}\]”›»】〉》」』]++)*+))
> /sx;
>
> If the pattern matches, there is a mismatched bracket. $1 is set to the
> mismatched opening bracket. $-[1] is its location. $2 is the mismatched
> closing bracket or '' if the bracket was never closed. $-[2] is set to the
> location of the closing bracket or the end of the string if the bracket
> wasn't closed.
>
>
> I didn't write all that manually; it was generated with this:
>
> my @open = ('\(','\{','\[','“','‹','«','【','〈','《','「','『');
> my @close = ('\)','\}','\]','”','›','»','】','〉','》','」','』');
>
> '(?|'.join('|',map
> {'('.$open[$_].')(?&matched)(['.join('',@close[0..($_-1),($_+1)..$#close]).']|$)'}
> (0 .. $#open)).')(?(DEFINE)(?<matched>(?:'.join('|',map
> {$open[$_].'(?&matched)'.$close[$_]} (0 ..
> $#open)).'|[^'.join('',@open,@close).']++)*+))'

That's solid Perl. Both the code generator and the generated code are
unreadable. Well done!

Stefan
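
For comparison, the check described in the quoted text (find the mismatched opening bracket and where it goes wrong) can be sketched with a plain stack in Python. This is only an illustration and not code from the thread: the name find_mismatch and its return convention are assumptions made here, and unlike the quoted pattern it also reports a closing bracket that has no opener at all.

```python
# Bracket pairs copied from the quoted Perl @open / @close arrays.
OPEN = "({[“‹«【〈《「『"
CLOSE = ")}]”›»】〉》」』"
PAIRS = dict(zip(CLOSE, OPEN))  # closing bracket -> expected opener


def find_mismatch(text):
    """Return (open_pos, close_pos) for the first mismatch, or None.

    Hypothetical helper for illustration only.
    open_pos  -- index of the mismatched opening bracket
                 (None for a stray closing bracket)
    close_pos -- index of the offending closing bracket, or
                 len(text) if the bracket was never closed
    """
    stack = []  # indices of brackets that are currently open
    for pos, ch in enumerate(text):
        if ch in OPEN:
            stack.append(pos)
        elif ch in PAIRS:
            if not stack:
                # Closing bracket with nothing open.
                return (None, pos)
            open_pos = stack.pop()
            if text[open_pos] != PAIRS[ch]:
                # Wrong kind of closing bracket for the innermost opener.
                return (open_pos, pos)
    if stack:
        # Innermost bracket that was never closed.
        return (stack[-1], len(text))
    return None
```

With this sketch, find_mismatch("([)") returns (1, 2), and find_mismatch("(abc") returns (0, 4), mirroring the "end of the string" convention described for $-[2] above.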