Re: Regular expressions

Newsgroups comp.lang.python
Date 2015-11-06 11:52 -0800
References (10 earlier) <3f3l3bpm478nbnsec6c9tr6rre0aontkq1@4ax.com> <5e15df62-00b1-4746-83f8-c0821514d20b@googlegroups.com> <563abdda$0$1614$c3e8da3$5496439d@news.astraweb.com> <2fd7d161-b1cb-4274-b8dc-0157916413f1@googlegroups.com> <n1f39d$o44$1@dont-email.me>
Message-ID <a379f9ca-dd27-412c-a005-bfef9b9e6abc@googlegroups.com> (permalink)
Subject Re: Regular expressions
From rurpy@yahoo.com


On 11/05/2015 01:18 AM, Christian Gollwitzer wrote:
> Am 05.11.15 um 06:59 schrieb rurpy:
>>> Can you call yourself a well-rounded programmer without at least
>>> a basic understanding of some regex library? Well, probably not.
>>> But that's part of the problem with regexes. They have, to some
>>> degree, driven out potentially better -- or at least differently
>>> bad -- pattern matching solutions, such as (E)BNF grammars,
>>> SNOBOL pattern matching, or lowly globbing patterns. Or even
>>> alternative idioms, like Hypercard's "chunking" idioms.
>> 
>> Hmm, very good point.  I wonder why all those "potentially better" 
>> solutions have not been more widely adopted?  A conspiracy by a 
>> secret regex cabal?
> 
> I'm mostly on the pro-side of the regex discussion, but this IS a
> valid point. regexes are not always a good way to express a pattern,
> even if the pattern is regular. The point is, that you can't build
> them up easily piece-by-piece. Say, you want a regex like "first an
> international phone number, then a name, then a second phone number"
> - you will have to *repeat* the pattern for phone number twice. In
> more complex cases this can become a nightmare, like the monster that
> was mentioned before to validate an email.
> 
> A better alternative, then, is PEG for example. You can easily write
> [...]

That is the solution adopted by Perl 6. I have always thought lexing
and parsing solutions were a weak spot in the Python ecosystem, and
I was about to write that I would love to see a PEG parser for
Python when I saw this:

http://fdik.org/pyPEG/

Unfortunately it suffers from the same problem that Pyparsing, Ply
and the rest suffer from: they use Python syntax to express the
parsing rules rather than a dedicated problem-specific syntax
such as the one you used to illustrate PEG parsing:

> pattern <- phone_number name phone_number
> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
> name <- [[:alpha:]]+
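
For comparison, here is a minimal sketch of how the three quoted rules
might be assembled from named fragments with Python's re module, so
that the phone-number pattern is written only once. The names
PHONE_NUMBER, NAME, PATTERN and parse are my own for illustration, the
whitespace between the parts is an assumption (the grammar does not
say), and [A-Za-z] is an ASCII-only stand-in for [[:alpha:]], which
re does not support.

```python
import re

# phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
PHONE_NUMBER = r"\+[0-9]+(?:-[0-9]+)*"

# name <- [[:alpha:]]+   (ASCII-only approximation; an assumption)
NAME = r"[A-Za-z]+"

# pattern <- phone_number name phone_number
# The \s+ separators are an assumption; the quoted grammar is silent on them.
PATTERN = re.compile(
    rf"(?P<first>{PHONE_NUMBER})\s+(?P<name>{NAME})\s+(?P<second>{PHONE_NUMBER})"
)


def parse(text):
    """Return the three named parts as a dict, or None if text does not match."""
    m = PATTERN.fullmatch(text)
    return m.groupdict() if m else None
```

This keeps the regex syntax for the leaves and uses plain string
interpolation for the composition. It avoids the repetition, though it
gives none of the recursion a real PEG provides.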

Some here have complained about the excessive brevity of regexes, but
I much prefer using problem-specific syntax like "(a*)" to having to
express a pattern in Python with something like

star = RegexMatchAny()
a_group = RegexGroup('a' + star)
...

and I don't want to have to do something similar with PEG (or Ply
or Pyparsing) to formulate their rules.

>[...]
> As a 12 year old, not knowing anything about pattern recognition, but
> thinking I was the king, as is usual for boys in that age, I sat down
> and manually constructed a recursive descent parser in a BASIC like
> language. It had 1000 lines and took me a few weeks to get it
> correct. Finally the solution was accepted as working, but my
> participation was rejected because the solutions lacked
> documentation. 16 years later I used the problem for a course on
> string processing (that's what the PDF is for), and asked the
> students to solve it using regexes. My own solution consists of 67
> characters, and it took me 5 minutes to write it down.
> 
> Admittedly, this problem is constructed, but solving similar tasks by
> regexes is still something that I need to do on a daily basis, when I
> get data from other scientists in odd formats and I need to
> preprocess them. I know people who use a spreadsheet and copy/paste
> millions of datapoints manually because they lack the knowledge of
> using such tools.

I think in many cases those most hostile to regexes are also
those who use them (or need to use them) the least. While my use
of regexes is limited to fairly simple ones, they are complicated
enough that I'm sure it would take orders of magnitude longer
to get the same effect in plain Python.