Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #9705

Re: a little parsing challenge ☺

Message-ID <2396442.Iyju66GlRV@PointedEars.de> (permalink)
From Thomas 'PointedEars' Lahn <PointedEars@web.de>
Organization PointedEars Software (PES)
Date 2011-07-17 16:15 +0200
Subject Re: a little parsing challenge ☺
Newsgroups comp.lang.python
References <36037253-086b-4467-a1db-9492d3772e78@r5g2000prf.googlegroups.com> <mailman.1166.1310902484.1164.python-list@python.org>
Followup-To comp.lang.python

Followups directed to: comp.lang.python

Show all headers | View raw


Chris Angelico wrote:

> On Sun, Jul 17, 2011 at 5:47 PM, Xah Lee <xahlee@gmail.com> wrote:
>> the problem is to write a script that can check a dir of text files
>> (and all subdirs) and reports if a file has any mismatched matching
>> brackets.
> 
> I wonder will it be possible to code the whole thing as a single
> regular expression... I'm pretty sure it could be done as a sed or awk
> script, but I'm insufficiently expert in either to do the job.

Did you notice the excessive crosspost?  Please do not feed the troll.

In the classical sense is not possible, as classical regular expressions 
have no concept of recursion.  Indeed, matching brackets are a textbook 
example for a non-regular¹ context-free language L = {a^n b^n; n > 0} that 
can only be recognized by a pushdown automaton (PDA).  (Finite automata 
"cannot count".)

However, it is possible (and done) to use classical regular expressions or 
non-recursive Regular Expressions (note the different character case) to 
match tokens more efficiently with a PDA implementation.  This is commonly 
called a parser.  (Programming languages tend to be specified in terms of a 
context-free grammar – they tend to be context-free languages –, which is 
why a parser is a part of a compiler or interpreter.  See for example 
Python.²)

It is possible, with Perl-compatible Regular Expressions (PCRE), provided 
that you have enough memory, to use such an extended Regular Expression (not 
to be confused with EREs³)⁴:

  \((([^()]*|(?R))*)\)

However, even Python 3.2 does not support those expressions (although it 
supports some other PCRE patterns, like named subexpressions)⁵, neither do 
standard and forked versions of sed(1) (BREs, EREs, using an NFA) nor awk 
(EREs, using a DFA or NFA).  [That is not to say it would not be possible 
with Python, or sed or awk (both of which are off-topic here), but that more 
than a Regular Expression would be required.]

On this subject, I recommend reading the preview chapters of the second and 
third editions, respectively, of Jeffrey E. F. Friedl's "Mastering Regular 
Expressions", which are available online for free at O'Reilly.com⁶.

HTH.

______
¹ because it can be proved that the pumping lemma for regular languages
  does not apply to it; see also
  <http://en.wikipedia.org/wiki/Chomsky_hierarchy> pp.
² <http://docs.python.org/reference/>
³ <http://en.wikipedia.org/wiki/Regular_expression>
⁴ Cf. <http://php.net/manual/en/regexp.reference.recursive.php>
⁵ <http://docs.python.org/library/re.html>
⁶ <http://oreilly.com/catalog/regex/chapter/ch04.html>,
  <http://oreilly.com/catalog/9780596528126/preview#preview>
-- 
PointedEars

Bitte keine Kopien per E-Mail. / Please do not Cc: me.

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-17 00:47 -0700
  Re: a little parsing challenge ☺ Raymond Hettinger <python@rcn.com> - 2011-07-17 02:48 -0700
    Re: a little parsing challenge ☺ Robert Klemme <shortcutter@googlemail.com> - 2011-07-17 15:20 +0200
      Re: a little parsing challenge ☺ mhenn <michihenn@hotmail.com> - 2011-07-17 15:55 +0200
        Re: a little parsing challenge ☺ Robert Klemme <shortcutter@googlemail.com> - 2011-07-17 18:01 +0200
          Re: a little parsing challenge ☺ Robert Klemme <shortcutter@googlemail.com> - 2011-07-17 18:54 +0200
    Re: a little parsing challenge ☺ Thomas Boell <tboell@domain.invalid> - 2011-07-17 17:49 +0200
      Re: a little parsing challenge ☺ Raymond Hettinger <python@rcn.com> - 2011-07-17 12:16 -0700
    Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-18 07:39 -0700
      Re: a little parsing challenge ☺ Robert Klemme <shortcutter@googlemail.com> - 2011-07-20 08:23 +0200
      Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-20 03:31 -0700
        Re: a little parsing challenge ☺ "Uri Guttman" <uri@StemSystems.com> - 2011-07-20 12:31 -0400
          Re: a little parsing challenge ☺ rusi <rustompmody@gmail.com> - 2011-07-20 10:30 -0700
          Re: a little parsing challenge ☺ merlyn@stonehenge.com (Randal L. Schwartz) - 2011-07-20 12:06 -0700
            Re: a little parsing challenge ☺ Jason Earl <jearl@notengoamigos.org> - 2011-07-20 14:57 -0600
    Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 09:54 -0700
      Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-19 20:07 +0200
        Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-21 05:58 -0700
          Re: a little parsing challenge ☺ Ian Kelly <ian.g.kelly@gmail.com> - 2011-07-21 08:26 -0600
            Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-21 08:36 -0700
              Re: a little parsing challenge ☺ python@bdurham.com - 2011-07-21 12:43 -0400
                Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-21 11:53 -0700
                Re: a little parsing challenge ☺ Terry Reedy <tjreedy@udel.edu> - 2011-07-21 18:37 -0400
          Re: a little parsing challenge ☺ John O'Hagan <research@johnohagan.com> - 2011-07-25 15:57 +1000
      Re: a little parsing challenge ☺ Ian Kelly <ian.g.kelly@gmail.com> - 2011-07-19 12:08 -0600
  Re: a little parsing challenge ☺ Chris Angelico <rosuav@gmail.com> - 2011-07-17 21:34 +1000
    Re: a little parsing challenge ☺ rusi <rustompmody@gmail.com> - 2011-07-17 04:52 -0700
    Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-17 16:15 +0200
      Re: a little parsing challenge ☺ Raymond Hettinger <python@rcn.com> - 2011-07-17 12:18 -0700
        Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-17 22:16 +0200
          Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-17 22:57 +0200
      Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-17 23:43 +0200
      Re: a little parsing challenge ☺ Rouslan Korneychuk <rouslank@msn.com> - 2011-07-18 03:09 -0400
        Re: a little parsing challenge ☺ Stefan Behnel <stefan_ml@behnel.de> - 2011-07-18 09:24 +0200
          Re: a little parsing challenge ☺ Rouslan Korneychuk <rouslank@msn.com> - 2011-07-18 04:04 -0400
        Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-18 18:46 +0200
          Re: a little parsing challenge ☺ Rouslan Korneychuk <rouslank@msn.com> - 2011-07-18 14:14 -0400
        Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-21 06:23 -0700
          Re: a little parsing challenge ☺ Rouslan Korneychuk <rouslank@msn.com> - 2011-07-21 17:54 -0400
  Re: a little parsing challenge ☺ gene heskett <gheskett@wdtv.com> - 2011-07-17 10:26 -0400
  Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-17 08:31 -0700
    Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 10:49 -0700
      Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-19 20:14 +0200
        Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-21 05:29 -0700
          Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-21 15:21 +0200
      Re: a little parsing challenge ☺ Thomas Jollans <t@jollybox.de> - 2011-07-19 20:17 +0200
  Re: a little parsing challenge ☺ rantingrick <rantingrick@gmail.com> - 2011-07-17 18:52 -0700
  Re: a little parsing challenge ☺ Billy Mays <81282ed9a88799d21e77957df2d84bd6514d9af6@myhashismyemail.com> - 2011-07-18 13:12 -0400
    Re: a little parsing challenge ☺ Ian Kelly <ian.g.kelly@gmail.com> - 2011-07-18 12:10 -0600
      Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-18 23:59 +0200
        Re: a little parsing challenge ☺ Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2011-07-19 08:09 +0200
        Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 10:32 -0700
    Re: a little parsing challenge ☺ Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-19 09:56 +1000
      Re: a little parsing challenge ☺ Billy Mays <noway@nohow.com> - 2011-07-18 22:07 -0400
        Re: a little parsing challenge ☺ rusi <rustompmody@gmail.com> - 2011-07-18 19:50 -0700
          Re: a little parsing challenge ☺ Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-19 13:11 +1000
            Re: a little parsing challenge ☺ rusi <rustompmody@gmail.com> - 2011-07-18 21:59 -0700
              Re: a little parsing challenge ☺ Chris Angelico <rosuav@gmail.com> - 2011-07-19 15:36 +1000
        Re: a little parsing challenge ☺ MRAB <python@mrabarnett.plus.com> - 2011-07-19 04:08 +0100
        Re: a little parsing challenge ☺ Benjamin Kaplan <benjamin.kaplan@case.edu> - 2011-07-18 20:54 -0700
        Re: a little parsing challenge ☺ Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-19 14:30 +1000
        Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 01:58 -0700
    Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 10:14 -0700
      Re: a little parsing challenge ☺ Billy Mays <81282ed9a88799d21e77957df2d84bd6514d9af6@myhashismyemail.com> - 2011-07-19 13:33 -0400
        Re: a little parsing challenge ☺ Xah Lee <xahlee@gmail.com> - 2011-07-19 11:12 -0700
          Re: a little parsing challenge ☺ Terry Reedy <tjreedy@udel.edu> - 2011-07-19 15:09 -0400
            Re: a little parsing challenge ☺ jmfauth <wxjmfauth@gmail.com> - 2011-07-19 23:29 -0700
              Re: a little parsing challenge ☺ Ian Kelly <ian.g.kelly@gmail.com> - 2011-07-20 01:29 -0600
                Re: a little parsing challenge ☺ jmfauth <wxjmfauth@gmail.com> - 2011-07-20 00:54 -0700
                Re: a little parsing challenge ☺ Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-20 18:18 +1000
  Re: a little parsing challenge ? sln@netherlands.com - 2011-07-18 12:34 -0700
  Re: a little parsing challenge ☺ Mark Tarver <dr.mtarver@gmail.com> - 2011-07-19 22:43 -0700

csiph-web