From: Thomas 'PointedEars' Lahn
Newsgroups: comp.lang.python
Subject: Re: a little parsing challenge ☺
Date: Mon, 18 Jul 2011 23:59:06 +0200
Message-ID: <2075498.BysXYHRu7a@PointedEars.de>

Billy Mays wrote:

>> I gave it a shot.  It doesn't do any of the Unicode delims, because let's
>> face it, Unicode is for goobers.
>
> Uh, okay...
>
> Your script also misses the requirement of outputting the index or row
> and column of the first mismatched bracket.

Thanks to Python's expressiveness, this can be easily remedied (see below).

I also do not follow Billy's comment about Unicode.  Unicode, and the fact
that Python supports it *natively*, cannot be appreciated enough in a
globalized world.

However, I have learned a lot about being pythonic from his posting (take
those generator expressions, for example!), and the idea of looking at the
top of a stack for reference is a really good one.  Thank you, Billy!

Here is my improvement of his code, which should fill the mentioned gaps.
I have also reversed the order in the report line, as I think it is more
natural this way.  I have tested the code superficially with a directory
containing a single text file.
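Before the full script, here is a minimal sketch of the stack-top idea on
its own, working on a string instead of a directory tree.  It is written
for Python 3, not the Python 2 of the script, and the names
`first_mismatch`, `PAIRS` and `OPENERS` are made up for this illustration.
It stops at the first closing bracket that does not match the top of the
stack.  Reporting an opening bracket that is never closed is an extra
assumption of the sketch, not something the requirement above spells out.

```python
# Hypothetical helper names; only a subset of the bracket pairs is listed.
PAIRS = {'}': '{', ')': '(', ']': '[', '”': '“', '»': '«', '」': '「'}
OPENERS = set(PAIRS.values())


def first_mismatch(text):
    """Return (char, line, column) of the first bad bracket, or None.

    Lines and columns are 1-based.
    """
    stack = []  # (char, line, column) of every opener still waiting for its closer
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, c in enumerate(line, start=1):
            if c in OPENERS:
                stack.append((c, line_no, col))
            elif c in PAIRS:
                # A closer is fine only if the top of the stack is its opener.
                if stack and stack[-1][0] == PAIRS[c]:
                    stack.pop()
                else:
                    return (c, line_no, col)
    # Assumption: an opener left on the stack counts as a mismatch, too;
    # the innermost one still open is reported.
    return stack[-1] if stack else None


if __name__ == '__main__':
    for sample in ('f(a[1]) { “ok” }', 'a(b]\nc)', 'x\n  (y'):
        bad = first_mismatch(sample)
        print('good' if bad is None else "bad '%s' at %s:%s" % bad)
```

Keeping the position next to each opener on the stack is what makes it
possible to report unclosed brackets at all; a stack of bare characters
can only say that something is left over.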
Watch for word-wrap:

# encoding: utf-8
'''
Created on 2011-07-18

@author: Thomas 'PointedEars' Lahn, based on an idea of
Billy Mays <81282ed9a88799d21e77957df2d84bd6514d9af6@myhashismyemail.com>
in
'''
import sys, os

# maps each closing bracket to its opening counterpart
pairs = {u'}': u'{', u')': u'(', u']': u'[',
         u'”': u'“', u'›': u'‹', u'»': u'«',
         u'】': u'【', u'〉': u'〈', u'》': u'《',
         u'」': u'「', u'』': u'『'}

# every character that is a bracket, opening or closing
valid = set(v for pair in pairs.items() for v in pair)

if __name__ == '__main__':
    for dirpath, dirnames, filenames in os.walk(sys.argv[1]):
        for name in filenames:
            # the sentinel keeps stack[-1] defined for an empty stack
            stack = [' ']

            # you can use chardet etc. instead
            encoding = 'utf-8'

            with open(os.path.join(dirpath, name), 'r') as f:
                reported = False

                # only the bracket characters, with their 0-based position
                chars = ((c, line_no, col)
                         for line_no, line in enumerate(f)
                         for col, c in enumerate(line.decode(encoding))
                         if c in valid)

                for c, line_no, col in chars:
                    if c in pairs:
                        # closing bracket: must match the top of the stack
                        if stack[-1] == pairs[c]:
                            stack.pop()
                        else:
                            # remember only the first mismatch, 1-based
                            if not reported:
                                first_bad = (c, line_no + 1, col + 1)
                                reported = True
                    else:
                        # opening bracket
                        stack.append(c)

            print '%s: %s' % (name, ("good" if len(stack) == 1 else
                                     "bad '%s' at %s:%s" % first_bad))

-- 
PointedEars

Bitte keine Kopien per E-Mail. / Please do not Cc: me.