Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!news.dougwise.org!nntpfeed.proxad.net!proxad.net!feeder2-2.proxad.net!newsfeed.arcor.de!newsspool1.arcor-online.net!news.arcor.de.POSTED!not-for-mail
Content-Type: text/plain; charset="UTF-8"
Message-ID: <2075498.BysXYHRu7a@PointedEars.de>
From: Thomas 'PointedEars' Lahn <PointedEars@web.de>
Reply-To: Thomas 'PointedEars' Lahn <usenet@PointedEars.de>
Organization: PointedEars Software (PES)
Date: Mon, 18 Jul 2011 23:59:06 +0200
User-Agent: KNode/4.4.11
Content-Transfer-Encoding: 8Bit
Subject: Re: a little parsing challenge ☺
Newsgroups: comp.lang.python
References: <36037253-086b-4467-a1db-9492d3772e78@r5g2000prf.googlegroups.com> <j01ph6$knt$1@speranza.aioe.org> <mailman.1223.1311012663.1164.python-list@python.org>
Followup-To: comp.lang.python
MIME-Version: 1.0
Lines: 72
NNTP-Posting-Date: 18 Jul 2011 23:59:06 CEST
X-Complaints-To: usenet-abuse@arcor.de
Xref: x330-a1.tempe.blueboxinc.net comp.lang.python:9837

Ian Kelly wrote:

> Billy Mays wrote:
>> I gave it a shot.  It doesn't do any of the Unicode delims, because let's
>> face it, Unicode is for goobers.
> 
> Uh, okay...
> 
> Your script also misses the requirement of outputting the index or row
> and column of the first mismatched bracket.

Thanks to Python's expressiveness, this can be easily remedied (see below).  

I also do not follow Billy's comment about Unicode.  Unicode and the fact 
that Python supports it *natively* cannot be appreciated enough in a 
globalized world.
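
To illustrate why the native support matters for this task, here is a 
minimal sketch (the names raw, text, byte_col and char_col are made up for 
the illustration).  It assumes UTF-8 input and shows that a column counted 
in bytes differs from one counted in characters as soon as a non-ASCII 
delimiter occurs earlier in the line:

```python
# -*- coding: utf-8 -*-
# Sketch only: column of the closing quote, counted in bytes vs. characters.
raw = u'a “b”'.encode('utf-8')   # bytes, as read from a file in binary mode
text = raw.decode('utf-8')       # characters, after decoding

# 1-based column of the closing quotation mark in each representation
byte_col = raw.index(u'”'.encode('utf-8')) + 1
char_col = text.index(u'”') + 1

print('%d bytes, %d characters' % (len(raw), len(text)))
print('byte column %d, character column %d' % (byte_col, char_col))
```

Each of the two quotation marks takes three bytes in UTF-8, so only the 
decoded line gives the column that a reader would see in an editor.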

However, I have learned a lot about being pythonic from his posting (take 
those generator expressions, for example!), and the idea of looking at the 
top of a stack for reference is a really good one.  Thank you, Billy!

Here is my improvement of his code, which should fill the mentioned gaps.
I have also reversed the order in the report line as I think it is more 
natural this way.  I have tested the code superficially with a directory 
containing a single text file.  Watch for word-wrap:

# encoding: utf-8
'''
Created on 2011-07-18

@author: Thomas 'PointedEars' Lahn <PointedEars@web.de>, based on an idea of
Billy Mays <81282ed9a88799d21e77957df2d84bd6514d9af6@myhashismyemail.com>
in <news:j01ph6$knt$1@speranza.aioe.org> 
'''
import sys, os

pairs = {u'}': u'{', u')': u'(', u']': u'[',
         u'”': u'“', u'›': u'‹', u'»': u'«',
         u'】': u'【', u'〉': u'〈', u'》': u'《',
         u'」': u'「', u'』': u'『'}
valid = set(v for pair in pairs.items() for v in pair)

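if __name__ == '__main__':
    for dirpath, dirnames, filenames in os.walk(sys.argv[1]):
        for name in filenames:
            # entries are (opening bracket, line, column), so that a
            # bracket that is never closed can be reported as well
            stack = []
            first_bad = None

            # you can use chardet etc. instead
            encoding = 'utf-8'

            # binary mode: each line is decoded explicitly below
            with open(os.path.join(dirpath, name), 'rb') as f:
                chars = ((c, line_no, col)
                         for line_no, line in enumerate(f, 1)
                         for col, c in enumerate(line.decode(encoding), 1)
                         if c in valid)
                for c, line_no, col in chars:
                    if c not in pairs:
                        stack.append((c, line_no, col))
                    elif stack and stack[-1][0] == pairs[c]:
                        stack.pop()
                    else:
                        # stop here; later brackets must not hide this one
                        first_bad = (c, line_no, col)
                        break

            # no mismatch found, but an opening bracket was never closed
            if first_bad is None and stack:
                first_bad = stack[0]

            print('%s: %s' % (name, "good" if first_bad is None
                              else "bad '%s' at %s:%s" % first_bad))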
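
The check itself does not depend on files or directories, so it can be 
tried on plain strings.  The following is only a sketch under stated 
assumptions: first_mismatch, PAIRS and OPENERS are names invented for this 
example, the table is a shortened subset of the delimiters, lines are split 
on '\n' only, and positions are 1-based.

```python
# -*- coding: utf-8 -*-
# Sketch only: hypothetical helper, not part of the script in this post.
PAIRS = {u'}': u'{', u')': u'(', u']': u'[',
         u'”': u'“', u'»': u'«', u'」': u'「'}
OPENERS = set(PAIRS.values())


def first_mismatch(text):
    """Return (char, line, column) of the first offending bracket,
    or None if every bracket in text is properly nested."""
    stack = []  # entries: (opening bracket, line, column)
    for line_no, line in enumerate(text.split(u'\n'), 1):
        for col, c in enumerate(line, 1):
            if c in OPENERS:
                stack.append((c, line_no, col))
            elif c in PAIRS:
                # a closing bracket must match the top of the stack
                if stack and stack[-1][0] == PAIRS[c]:
                    stack.pop()
                else:
                    return (c, line_no, col)
    # nothing mismatched; report the earliest bracket left open, if any
    return stack[0] if stack else None


print(first_mismatch(u'(a [b] {c})'))
print(first_mismatch(u'(]\n)'))
```

The second call is the interesting case: the ')' on line 2 would empty the 
stack again, so a version that keeps scanning after the stray ']' and then 
only looks at the stack size would call the input good.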

-- 
PointedEars

Bitte keine Kopien per E-Mail. / Please do not Cc: me.