
Regular expressions

Started by	Seymore4Head <Seymore4Head@Hotmail.invalid>
First post	2015-11-02 20:09 -0500
Last post	2015-11-03 22:15 +0000
Articles	20 on this page of 106 — 30 participants


  Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 20:09 -0500
    Re: Regular expressions MRAB <python@mrabarnett.plus.com> - 2015-11-03 01:19 +0000
      Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 22:17 -0500
    Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-02 20:42 -0600
      Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 22:17 -0500
        Re: Regular expressions Joel Goldstick <joel.goldstick@gmail.com> - 2015-11-02 22:58 -0500
          Re: Regular expressions rurpy@yahoo.com - 2015-11-02 20:23 -0800
            Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 21:38 -0700
              Re: Regular expressions rurpy@yahoo.com - 2015-11-03 16:33 -0800
                Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-03 19:04 -0700
                  Re: Regular expressions Dan Sommers <dan@tombstonezero.net> - 2015-11-04 02:55 +0000
                    Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:23 +1100
                      Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-03 20:47 -0700
                        Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-04 13:27 +0000
                      Re: Regular expressions Nobody <nobody@nowhere.invalid> - 2015-11-04 05:05 +0000
                      Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-04 09:57 +0100
                        Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:28 +1100
                          Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 20:48 -0600
                          Re: Regular expressions Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 14:03 +1100
                          Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-05 09:33 +0100
                            Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 23:05 +1100
                              Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-05 08:00 -0600
                          Re: Regular expressions Albert van der Horst <albert@spenarnc.xs4all.nl> - 2015-11-05 13:39 +0000
                      Re: Regular expressions Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-04 08:00 -0500
                      Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-04 08:13 -0700
                        Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:00 -0500
                          Re: Regular expressions rurpy@yahoo.com - 2015-11-04 16:24 -0800
                            Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:24 +1100
                              Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:59 -0800
                                Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-05 09:18 +0100
                                  Re: Regular expressions rurpy@yahoo.com - 2015-11-06 11:52 -0800
                                    Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-06 21:36 +0100
                                      Re: Regular expressions Larry Martell <larry.martell@gmail.com> - 2015-11-06 15:42 -0500
                            Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:34 +1100
                              Re: Regular expressions rurpy@yahoo.com - 2015-11-04 22:27 -0800
                      Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 09:42 -0600
                        Re: Regular expressions Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-11-05 20:55 +1300
                          Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 19:06 +1100
                      What does “grep” stand for? (was: Regular expressions) Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 05:24 +1100
                        Re: What does “grep” stand for? Christian Gollwitzer <auriocus@gmx.de> - 2015-11-04 20:38 +0100
                          Re: What does “grep” stand for? Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:42 +1100
                            Re: What does “grep” stand for? Christian Gollwitzer <auriocus@gmx.de> - 2015-11-05 08:32 +0100
                              Re: What does “grep” stand for? Chris Angelico <rosuav@gmail.com> - 2015-11-05 19:00 +1100
                          Re: What does “grep” stand for? Random832 <random832@fastmail.com> - 2015-11-05 10:19 -0500
                            Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-05 18:29 +0000
                              Re: What does “grep” stand for? Random832 <random832@fastmail.com> - 2015-11-05 14:56 -0500
                                Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-05 20:19 +0000
                                  Re: What does “grep” stand for? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-05 20:18 -0500
                                    Re: What does “grep” stand for? Larry Hudson <orgnut@yahoo.com> - 2015-11-05 19:36 -0800
                                      Re: What does “grep” stand for? Dan Sommers <dan@tombstonezero.net> - 2015-11-06 05:31 +0000
                                      Re: What does “grep” stand for? William Ray Wing <wrw@mac.com> - 2015-11-06 08:25 -0500
                                        Re: What does “grep” stand for? Larry Hudson <orgnut@yahoo.com> - 2015-11-06 19:21 -0800
                                    Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-06 14:15 +0000
                                      Re: What does “grep” stand for? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-06 20:03 -0500
                      Re: What does “grep” stand for? (was: Regular expressions) Tim Chase <python.list@tim.thechases.com> - 2015-11-04 13:05 -0600
                      Re: Regular expressions Terry Reedy <tjreedy@udel.edu> - 2015-11-04 18:08 -0500
                        Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:29 -0500
                Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 21:12 -0600
                Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-04 14:26 +1100
                Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:48 +1100
                  Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-04 08:21 +0100
                    Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 19:47 +1100
                      Re: Regular expressions rurpy@yahoo.com - 2015-11-04 06:43 -0800
                  Re: Regular expressions rurpy@yahoo.com - 2015-11-04 06:38 -0800
                    Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 01:52 +1100
                      Re: Regular expressions rurpy@yahoo.com - 2015-11-04 16:13 -0800
                        Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:33 +1100
                          Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:42 -0800
                        Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:26 +1100
                          Re: Regular expressions Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 14:07 +1100
                          Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:54 -0800
                        Re: Regular expressions Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2015-11-05 10:14 +0100
                  Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:02 -0500
                    Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 11:54 +1100
                      Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-05 10:07 -0500
                        Re: Regular expressions rurpy@yahoo.com - 2015-11-06 12:46 -0800
            Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-03 18:15 +1100
              Re: Regular expressions Nick Sarbicki <nick.a.sarbicki@gmail.com> - 2015-11-03 08:43 +0000
              Re: Regular expressions rurpy@yahoo.com - 2015-11-03 16:22 -0800
        Re: Regular expressions Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-03 12:38 +0000
        Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 05:53 -0600
        Re: Regular expressions Joel Goldstick <joel.goldstick@gmail.com> - 2015-11-03 10:34 -0500
          Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-03 11:10 -0500
            Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-04 03:20 +1100
              Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:35 +1100
                Re: Regular expressions Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2015-11-04 12:41 +0100
      Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-03 14:56 +0000
    Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 20:51 -0700
      Re: Regular expressions rurpy@yahoo.com - 2015-11-02 20:23 -0800
        Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 21:33 -0700
        Re: Regular expressions Robin Koch <robin.koch@t-online.de> - 2015-11-03 23:58 +0100
    Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 10:25 +0100
    Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 05:50 -0600
    Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 15:00 +0100
      Re: Regular expressions Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-11-03 17:12 +0200
        Irregular last line in a text file, was Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 16:35 +0100
          Re: Irregular last line in a text file, was Re: Regular expressions Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-11-03 18:42 +0200
        Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 10:56 -0600
          Re: Irregular last line in a text file, was Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:39 +1100
            Re: Irregular last line in a text file, was Re: Regular expressions Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2015-11-04 10:07 +0000
            Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 09:33 -0600
        Re: Irregular last line in a text file, was Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 18:44 +0100
        Re: Irregular last line in a text file, was Re: Regular expressions Ian Kelly <ian.g.kelly@gmail.com> - 2015-11-03 11:33 -0700
        Re: Irregular last line in a text file, was Re: Regular expressions Ian Kelly <ian.g.kelly@gmail.com> - 2015-11-03 11:39 -0700
        Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 13:45 -0600
          Re: Irregular last line in a text file, was Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-03 22:15 +0000


#98297

From	Steven D'Aprano <steve@pearwood.info>
Date	2015-11-05 23:05 +1100
Message-ID	<563b45fa$0$1593$c3e8da3$5496439d@news.astraweb.com>
In reply to	#98291

On Thu, 5 Nov 2015 07:33 pm, Peter Otten wrote:

> Steven D'Aprano wrote:
> 
>> On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:
>> 
>>> I tried Tim's example
>>> 
>>> $ seq 5 | grep '1*'
>>> 1
>>> 2
>>> 3
>>> 4
>>> 5
>>> $
>> 
>> I don't understand this. What on earth is grep matching? How does "4"
>> match "1*"?
> 
> Look for zero or more "1".

Doh!

Oh the shame, I knew that. Somehow I tangled myself in a knot, thinking that
it had to be 1 *followed by* zero or more characters. But of course it's
not a glob, it's a regex.
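
A minimal Python sketch of the glob-versus-regex difference, assuming `re.search` as a stand-in for grep and `fnmatch` as a stand-in for shell globbing (neither is what grep itself runs):

```python
import fnmatch
import re

# The same five lines that `seq 5` prints.
lines = [str(n) for n in range(1, 6)]

# Regex reading: "1*" is zero or more "1" characters, so it can
# match the empty string and succeeds on every line.
regex_hits = [line for line in lines if re.search(r"1*", line)]

# Glob reading: "1*" is a literal "1" followed by anything.
glob_hits = [line for line in lines if fnmatch.fnmatch(line, "1*")]

print(regex_hits)
print(glob_hits)
```

Only the glob reading limits the result to the line that starts with "1".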

-- 
Steven


#98300

From	Tim Chase <python.list@tim.thechases.com>
Date	2015-11-05 08:00 -0600
Message-ID	<mailman.52.1446732098.16136.python-list@python.org>
In reply to	#98297

On 2015-11-05 23:05, Steven D'Aprano wrote:
> Oh the shame, I knew that. Somehow I tangled myself in a knot,
> thinking that it had to be 1 *followed by* zero or more characters.
> But of course it's not a glob, it's a regex.

But that's a good reminder of fnmatch/glob modules too.  Sometimes
all you need is to express a simple glob, in which case using a
regexp can cloud the clarity.

The overarching principle is to go for clarity & simplicity, rather
than favoring built-ins/glob/regex/parser modules all the time.

Want to test for presence in a string?  Just use the builtin "a in b"
test.  At the beginning/end?  Use .startswith()/.endswith() for
clarity.  Need to check if a string is purely
digits/alpha/alphanumerics/etc?  Use the
string .is{alnum,alpha,decimal,digit,identifier,lower,numeric,printable,space,title,upper}
methods on the string.

For simple wild-carding, use the fnmatch module to do simple
globbing.

For more complex pattern matching, you've got regexps.

Finally, for occasions when you're searching for repeated/nested
structures, using an add-on module like pyparsing will give you
clearer code.
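
The simpler tools listed above might look like this in practice. This is only a sketch; the file name `report_2015.txt` is a made-up example, not something from the thread:

```python
import fnmatch

name = "report_2015.txt"  # hypothetical file name for illustration

# Presence anywhere in the string: the builtin "in" test.
has_report = "report" in name

# Presence at the beginning or end: startswith()/endswith().
starts_ok = name.startswith("report_")
ends_ok = name.endswith(".txt")

# Character-class checks: the str.is*() methods.
year_is_digits = name[7:11].isdigit()

# Simple wild-carding: fnmatch, no regex needed.
glob_ok = fnmatch.fnmatch(name, "report_*.txt")

print(has_report, starts_ok, ends_ok, year_is_digits, glob_ok)
```

None of these checks needs the re module, which is the point of reaching for them first.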

Oh, and with regexps, people should be less afraid of verbose
multi-line strings with commenting

  import re

  r = re.compile(r"""
    ^                       # start of the string
    (?P<year>\d{4})         # capture 4 digits
    -                       # a literal dash
    (?P<month>\d{1,2})      # capture 1-2 digits
    -                       # another literal dash
    (?P<day>\d{1,2})        # capture 1-2 digits
    _                       # a literal underscore
    (?P<accountnum>         # capture the account-number
      [A-Z]{1,3}               # 1-3 letters
      \d+                      # followed by 1+ digits
      )
    \.txt                   # the extension of the file (ignored)
    $                       # the end of the string
    """, re.VERBOSE)

They are a LOT easier to come back to if you haven't touched the code
for a year.
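
As a usage sketch, the same verbose pattern can be applied to a file name and the named groups read back as a dict. The file names below are invented for illustration:

```python
import re

r = re.compile(r"""
    ^                       # start of the string
    (?P<year>\d{4})         # capture 4 digits
    -                       # a literal dash
    (?P<month>\d{1,2})      # capture 1-2 digits
    -                       # another literal dash
    (?P<day>\d{1,2})        # capture 1-2 digits
    _                       # a literal underscore
    (?P<accountnum>         # capture the account-number
      [A-Z]{1,3}            # 1-3 letters
      \d+                   # followed by 1+ digits
      )
    \.txt                   # the extension of the file (ignored)
    $                       # the end of the string
    """, re.VERBOSE)

# Hypothetical file names, one that fits the pattern and one that does not.
m = r.match("2015-11-05_ABC123.txt")
fields = m.groupdict() if m else {}
no_match = r.match("notes.txt")

print(fields)
print(no_match)
```

The named groups mean the calling code never has to count parentheses to find the account number.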

-tkc


#98299

From	Albert van der Horst <albert@spenarnc.xs4all.nl>
Date	2015-11-05 13:39 +0000
Message-ID	<563b5c28$0$23757$e4fe514c@news.xs4all.nl>
In reply to	#98265

Steven D'Aprano <steve@pearwood.info> writes:

>On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:

>> I tried Tim's example
>>
>> $ seq 5 | grep '1*'
>> 1
>> 2
>> 3
>> 4
>> 5
>> $

>I don't understand this. What on earth is grep matching? How does "4"
>match "1*"?


>> which surprised me because I remembered that there usually weren't any
>> matching lines when I invoked grep instead of egrep by mistake. So I tried
>> another one
>>
>> $ seq 5 | grep '[1-3]+'
>> $
>>
>> and then headed for the man page. Apparently there is a subset called
>> "basic regular expressions":
>>
>> """
>>   Basic vs Extended Regular Expressions
>>        In basic regular expressions the meta-characters ?, +, {, |, (,
>>        and ) lose their special meaning; instead use  the  backslashed
>>        versions \?, \+, \{, \|, \(, and \).
>> """

>None of this appears relevant, as the metacharacter * is not listed. So
>what's going on?

* is so fundamental that it never loses its special meaning.
Same for [ .

* means zero or more of the preceding char.
This makes + superfluous (a mere convenience) as
    [1-3]+
can be expressed as
    [1-3][1-3]*

Note that [1-3]* matches the empty string. This happens a lot.
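
A small Python sketch of that equivalence. Note the assumption: Python's re module treats + as a quantifier without a backslash, unlike grep's basic regular expressions, so this illustrates the rewrite itself rather than grep's behaviour:

```python
import re

# The same five lines that `seq 5` prints.
lines = [str(n) for n in range(1, 6)]

plus = re.compile(r"[1-3]+")        # one or more of 1-3
star = re.compile(r"[1-3][1-3]*")   # the same thing, spelled with * only

plus_hits = [line for line in lines if plus.search(line)]
star_hits = [line for line in lines if star.search(line)]

# [1-3]* on its own still "matches" a line with no such digit,
# because zero repetitions is an acceptable match.
empty = re.search(r"[1-3]*", "4")

print(plus_hits, star_hits)
print(repr(empty.group()), empty.span())
```

Both spellings select the same lines, while the bare `[1-3]*` succeeds on "4" with an empty match.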

Groetjes Albert




>--
>Steven
-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst


#98225

From	Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date	2015-11-04 08:00 -0500
Message-ID	<mailman.16.1446642069.16136.python-list@python.org>
In reply to	#98203

On Wed, 04 Nov 2015 14:23:04 +1100, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> declaimed the following:

>
>I don't even know what grep stands for. 
>
	As I recall, something like: General(ized) Regular Expression Parser
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/


#98231

From	Michael Torrie <torriem@gmail.com>
Date	2015-11-04 08:13 -0700
Message-ID	<mailman.19.1446650043.16136.python-list@python.org>
In reply to	#98203

On 11/04/2015 01:57 AM, Peter Otten wrote:
> and then headed for the man page. Apparently there is a subset
> called "basic regular expressions":
>
> """
>   Basic vs Extended Regular Expressions
>        In basic regular expressions the meta-characters ?, +, {, |, (,
>        and ) lose their special meaning; instead use  the  backslashed
>        versions \?, \+, \{, \|, \(, and \).
> """

Good catch. I think this must have been what my brain was thinking when
I commented about grep and regular expressions earlier. I checked the
man page but didn't read down far enough.

I was still technically wrong though.

It's neat to learn so much on these tangents that the python list goes
on frequently. Hope the OP is still lurking, reading all these comments,
though I suspect he's not.


#98251

From	Seymore4Head <Seymore4Head@Hotmail.invalid>
Date	2015-11-04 18:00 -0500
Message-ID	<3f3l3bpm478nbnsec6c9tr6rre0aontkq1@4ax.com>
In reply to	#98231

On Wed, 04 Nov 2015 08:13:51 -0700, Michael Torrie <torriem@gmail.com>
wrote:

>On 11/04/2015 01:57 AM, Peter Otten wrote:
>> and then headed for the man page. Apparently there is a subset
>> called "basic regular expressions":
>>
>> """
>>   Basic vs Extended Regular Expressions
>>        In basic regular expressions the meta-characters ?, +, {, |, (,
>>        and ) lose their special meaning; instead use  the  backslashed
>>        versions \?, \+, \{, \|, \(, and \).
>> """
>
>Good catch. I think this must have been what my brain was thinking when
>I commented about grep and regular expressions earlier. I checked the
>man page but didn't read down far enough.
>
>I was still technically wrong though.
>
>It's neat to learn so much on these tangents that the python list goes
>on frequently. Hope the OP is still lurking, reading all these comments,
>though I suspect he's not.
>
I am still here, but I have to admit I am not picking up too much.


#98259

From	rurpy@yahoo.com
Date	2015-11-04 16:24 -0800
Message-ID	<5e15df62-00b1-4746-83f8-c0821514d20b@googlegroups.com>
In reply to	#98251

On Wednesday, November 4, 2015 at 4:05:06 PM UTC-7, Seymore4Head wrote:
>[...]
> I am still here, but I have to admit I am not picking up too much.

The "take away" I recommend is: the folks here are often way 
overly negative regarding regular expressions, so don't
ignore them, but take their advice with a BIG grain of salt and
continue learning about and using regexes.

You will find they are an indispensable tool, not just in Python 
programming but in many aspects of computer use.


#98264

From	Steven D'Aprano <steve@pearwood.info>
Date	2015-11-05 13:24 +1100
Message-ID	<563abdda$0$1614$c3e8da3$5496439d@news.astraweb.com>
In reply to	#98259

On Thu, 5 Nov 2015 11:24 am, rurpy@yahoo.com wrote:

> You will find they are an indispensable tool, not just in Python
> programming but in many aspects of computer use.

You will find them a useful tool, but not indispensable by any means.

Hint:

- How many languages make arithmetic a built-in part of the language? Almost
all of them. I don't know of any language that doesn't let you express
something like "1 + 1" using built-in functions or syntax. Arithmetic is
much closer to indispensable.

- How many languages make regular expressions a built-in part of the
language? Almost none of them. There's Perl, obviously, and its
predecessors sed and awk, and probably a few others, but most languages
relegate regular expressions to a library.

- How many useful programs can be written with regexes? Clearly there are
many. Some of them would even be quite difficult without regexes. (In
effect, you would have to invent your own pattern-matching code.)

- How many useful programs can be written without regexes? Clearly there are
also many. Every time you write a Python program and fail to import re,
you've written one.

Can you call yourself a well-rounded programmer without at least a basic
understanding of some regex library? Well, probably not. But that's part of
the problem with regexes. They have, to some degree, driven out potentially
better -- or at least differently bad -- pattern matching solutions, such
as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
even alternative idioms, like Hypercard's "chunking" idioms.

When all you have is a hammer, everything looks like a nail.

-- 
Steven


#98281

From	rurpy@yahoo.com
Date	2015-11-04 21:59 -0800
Message-ID	<2fd7d161-b1cb-4274-b8dc-0157916413f1@googlegroups.com>
In reply to	#98264

On 11/04/2015 07:24 PM, Steven D'Aprano wrote:
> On Thu, 5 Nov 2015 11:24 am, wrote:
>
>> You will find they are an indispensable tool, not just in Python
>> programming but in many aspects of computer use.
>
> You will find them a useful tool, but not indispensable by any means.
>
> Hint:
>
> - How many languages make arithmetic a built-in part of the language? Almost
> all of them. I don't know of any language that doesn't let you express
> something like "1 + 1" using built-in functions or syntax. Arithmetic is
> much closer to indispensable.

By my count there are 2377.  That's counting rpn languages where it is
1 1 +.  If you don't count them it is 2250.

> - How many languages make regular expressions a built-in part of the
> language? Almost none of them. There's Perl, obviously, and its
> predecessors sed and awk, and probably a few others, but most languages
> relegate regular expressions to a library.

Yes, like python relegates io to a library.  
Clearly useful but not indispensable, after all who *really* needs 
anything beyond print() and input().  And that stuff in math like sin()
and exp().  How many programs use that geeky trig stuff?  Definitely not 
indispensable.  In fact, now that you pointed it out to me, clearly all
that stdlib stuff is dispensable, all one really needs to write 
"real programmer" programs is just core python.  Who the hell needs "sys"!

> - How many useful programs can be written with regexes? Clearly there are
> many. Some of them would even be quite difficult without regexes. (In
> effect, you would have to invent your own pattern-matching code.)

Lucky for me then that there are regexes.

> - How many useful programs can be written without regexes? Clearly there are
> also many. Every time you write a Python program and fail to import re,
> you've written one.

By golly, you're right.  Not every program I write uses regexes.
Who would have thought?!  However, you failed to establish that 
the programs I write without re are useful.

> Can you call yourself a well-rounded programmer without at least a basic
> understanding of some regex library? Well, probably not. But that's part of
> the problem with regexes. They have, to some degree, driven out potentially
> better -- or at least differently bad -- pattern matching solutions, such
> as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
> even alternative idioms, like Hypercard's "chunking" idioms.

Hmm, very good point.  I wonder why all those "potentially better" 
solutions have not been more widely adopted?  A conspiracy by a 
secret regex cabal? 

> When all you have is a hammer, everything looks like a nail.

Lucky for us then, that we have more than just hammers!

Sorry for the flippant response (well, not really) but I find your 
arguments pedantic beyond the point of absurdity.  For me, regular 
expressions are indispensable in that if they were not available in 
Python I would not use Python.  The same is true of a number of other 
stdlib modules.  I don't give a rat's ass whether they are in a 
"library" that has to be explicitly requested with import or a 
"library" that is automatically loaded at startup.


#98290

From	Christian Gollwitzer <auriocus@gmx.de>
Date	2015-11-05 09:18 +0100
Message-ID	<n1f39d$o44$1@dont-email.me>
In reply to	#98281

Am 05.11.15 um 06:59 schrieb rurpy@yahoo.com:
>> Can you call yourself a well-rounded programmer without at least a basic
>> understanding of some regex library? Well, probably not. But that's part of
>> the problem with regexes. They have, to some degree, driven out potentially
>> better -- or at least differently bad -- pattern matching solutions, such
>> as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
>> even alternative idioms, like Hypercard's "chunking" idioms.
>
> Hmm, very good point.  I wonder why all those "potentially better"
> solutions have not been more widely adopted?  A conspiracy by a
> secret regex cabal?

I'm mostly on the pro-side of the regex discussion, but this IS a valid 
point. regexes are not always a good way to express a pattern, even if 
the pattern is regular. The point is, that you can't build them up 
easily piece-by-piece. Say, you want a regex like "first an 
international phone number, then a name, then a second phone number" - 
you will have to *repeat* the pattern for phone number twice. In more 
complex cases this can become a nightmare, like the monster that was 
mentioned before to validate an email.

A better alternative, then, is PEG for example. You can easily write

pattern <- phone_number name phone_number
phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
name <-  [[:alpha:]]+

or something similar using a PEG parser. It has almost the same 
quantifiers as a Regex, is much more readable, runs in linear time over 
all inputs and can parse languages with approximately the same 
complexity as the Knuth style parsers (LR(k) etc.), but without 
ambiguity. I'm really astonished that PEG parsing is not better 
supported in the world of computing, instead most people choose to stick 
to the lexer+parser combination.
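
For comparison, one common workaround on the regex side in Python is to compose sub-pattern strings, so the phone-number pattern is written once and reused. This is only a sketch under assumptions: the names `PHONE` and `NAME`, the whitespace separators, and the sample input are invented here, and `[A-Za-z]` stands in for `[[:alpha:]]`, which Python's re module does not support:

```python
import re

# Sub-patterns written once, mirroring the PEG rules above.
PHONE = r"\+[0-9]+(?:-[0-9]+)*"   # '+' digits, then optional '-digits' groups
NAME = r"[A-Za-z]+"               # ASCII-only approximation of [[:alpha:]]+

# Assumption: fields are separated by whitespace (the PEG sketch has none).
pattern = re.compile(
    rf"(?P<first>{PHONE})\s+(?P<name>{NAME})\s+(?P<second>{PHONE})"
)

m = pattern.fullmatch("+49-30-1234 Alice +1-555-0100")  # made-up sample input
parts = m.groupdict() if m else {}
print(parts)
```

This avoids typing the phone pattern twice, though the composition is plain string pasting; unlike a PEG, nothing checks the pieces as grammar rules.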

Finally, an anecdote from my "early" life of computing. In 1990, when I 
was 12 years old, I participated in an annual competition of computer 
science for high school students. I was learning how to program without 
formal training, and solved one problem where a grammar was depicted as 
a flowchart and the task was to write a parser for it, to check the 
validity of input strings. The grammar is depicted here (problem 1):

http://www.auriocus.de/StringKurs/RegEx/uebungen1.pdf

As a 12 year old, not knowing anything about pattern recognition, but 
thinking I was the king, as is usual for boys of that age, I sat down 
and manually constructed a recursive descent parser in a BASIC like 
language. It had 1000 lines and took me a few weeks to get it correct. 
Finally the solution was accepted as working, but my participation was 
rejected because the solutions lacked documentation. 16 years later I 
used the problem for a course on string processing (that's what the PDF 
is for), and asked the students to solve it using regexes. My own 
solution consists of 67 characters, and it took me 5 minutes to write it 
down.

Admittedly, this problem is constructed, but solving similar tasks by 
regexes is still something that I need to do on a daily basis, when I 
get data from other scientists in odd formats and I need to preprocess 
them. I know people who use a spreadsheet and copy/paste millions of 
datapoints manually because they lack the knowledge of using such tools.

	Christian


#98364

From	rurpy@yahoo.com
Date	2015-11-06 11:52 -0800
Message-ID	<a379f9ca-dd27-412c-a005-bfef9b9e6abc@googlegroups.com>
In reply to	#98290

On 11/05/2015 01:18 AM, Christian Gollwitzer wrote:
> Am 05.11.15 um 06:59 schrieb rurpy:
>>> Can you call yourself a well-rounded programmer without at least
>>> a basic understanding of some regex library? Well, probably not.
>>> But that's part of the problem with regexes. They have, to some
>>> degree, driven out potentially better -- or at least differently
>>> bad -- pattern matching solutions, such as (E)BNF grammars,
>>> SNOBOL pattern matching, or lowly globbing patterns. Or even
>>> alternative idioms, like Hypercard's "chunking" idioms.
>> 
>> Hmm, very good point.  I wonder why all those "potentially better" 
>> solutions have not been more widely adopted?  A conspiracy by a 
>> secret regex cabal?
> 
> I'm mostly on the pro-side of the regex discussion, but this IS a
> valid point. regexes are not always a good way to express a pattern,
> even if the pattern is regular. The point is, that you can't build
> them up easily piece-by-piece. Say, you want a regex like "first an
> international phone number, then a name, then a second phone number"
> - you will have to *repeat* the pattern for phone number twice. In
> more complex cases this can become a nightmare, like the monster that
> was mentioned before to validate an email.
> 
> A better alternative, then, is PEG for example. You can easily write
> [...]

That is the solution adopted by Perl 6. I have always thought lexing
and parsing solutions for Python were a weak spot in the Python eco-
system and I was about to write that I would love to see a PEG parser
for python when I saw this:

http://fdik.org/pyPEG/

Unfortunately it suffers from the same problem that Pyparsing, Ply
and the rest suffer from: they use Python syntax to express the
parsing rules rather than using a dedicated problem-specific syntax
such as you used to illustrate peg parsing:

> pattern <- phone_number name phone_number
> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
> name <-  [[:alpha:]]+

Some here have complained about excessive brevity of regexs but I
much prefer using problem-specific syntax like "(a*)" to having to
express a pattern using python with something like

star = RegexMatchAny()
a_group = RegexGroup('a' + star)
...

and I don't want to have to do something similar with PEG (or Ply
or Pyparsing) to formulate their rules.

>[...]
> As a 12 year old, not knowing anything about pattern recognition, but
> thinking I was the king, as is usual for boys of that age, I sat down
> and manually constructed a recursive descent parser in a BASIC like
> language. It had 1000 lines and took me a few weeks to get it
> correct. Finally the solution was accepted as working, but my
> participation was rejected because the solutions lacked
> documentation. 16 years later I used the problem for a course on
> string processing (that's what the PDF is for), and asked the
> students to solve it using regexes. My own solution consists of 67
> characters, and it took me 5 minutes to write it down.
> 
> Admittedly, this problem is constructed, but solving similar tasks by
> regexes is still something that I need to do on a daily basis, when I
> get data from other scientists in odd formats and I need to
> preprocess them. I know people who use a spreadsheet and copy/paste
> millions of datapoints manually because they lack the knowledge of
> using such tools.

I think in many cases those most hostile to regexes are also
those who use them (or need to use them) the least. While my use
of regexes is limited to fairly simple ones, they are complicated
enough that I'm sure it would take orders of magnitude longer
to get the same effect in python.


#98369

From	Christian Gollwitzer <auriocus@gmx.de>
Date	2015-11-06 21:36 +0100
Message-ID	<n1j2s6$1qm$1@dont-email.me>
In reply to	#98364

Am 06.11.15 um 20:52 schrieb rurpy@yahoo.com:
> I have always thought lexing
> and parsing solutions for Python were a weak spot in the Python eco-
> system and I was about to write that I would love to see a PEG parser
> for python when I saw this:
>
> http://fdik.org/pyPEG/
>
> Unfortunately it suffers from the same problem that Pyparsing, Ply
> and the rest suffer from: they use Python syntax to express the
> parsing rules rather than using a dedicated problem-specific syntax
> such as you used to illustrate peg parsing:
>
>> pattern <- phone_number name phone_number
>> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
>> name <-  [[:alpha:]]+

That is actually real syntax of a parser generator used by me for 
another language (Tcl). A calculator example using this package can be 
found here: http://wiki.tcl.tk/39011
(actually it is a retargetable compiler in a few lines - very impressive)

And exactly as you say, it is working well exactly because it doesn't 
try to abuse function composition in the frontend to construct the parser.

Looking through the parser generators listed at 
http://bford.info/packrat/ it seems that waxeye could be interesting 
http://waxeye.org/manual.html#_using_waxeye - however I'm not sure the 
Python backend works with Python 3, maybe there will be unicode issues. 
Another bonus would be a compilable backend, like Cython or similar. The 
pt package mentioned above can generate a C module with an 
interface for Tcl. Compiled parsers are approximately 100x faster. I 
would expect a similar speedup for Python parsers.

> Some here have complained about excessive brevity of regexs but I
> much prefer using problem-specific syntax like "(a*)" to having to
> express a pattern using python with something like
>
> star = RegexMatchAny()
> a_group = RegexGroup('a' + star)
> ...

Yeah that is nonsense. Mechanical verbosity never leads to clarity (XML 
anyone?)

> I think in many cases those most hostile to regexes are also
> those who use them (or need to use them) the least. While my use
> of regexes is limited to fairly simple ones, they are complicated
> enough that I'm sure it would take orders of magnitude longer
> to get the same effect in python.

That's also my impression. The "two problems" quote was already lame 
the first time. If you are satisfied with simple string functions, then 
either you do not have problems where you need regexps/other formal 
parsing tools, or you are very masochistic.

	Christian


#98370

From	Larry Martell <larry.martell@gmail.com>
Date	2015-11-06 15:42 -0500
Message-ID	<mailman.97.1446842610.16136.python-list@python.org>
In reply to	#98369

On Fri, Nov 6, 2015 at 3:36 PM, Christian Gollwitzer <auriocus@gmx.de> wrote:
> Am 06.11.15 um 20:52 schrieb rurpy@yahoo.com:
>>
>> I have always thought lexing
>> and parsing solutions for Python were a weak spot in the Python eco-
>> system and I was about to write that I would love to see a PEG parser
>> for python when I saw this:
>>
>> http://fdik.org/pyPEG/
>>
>> Unfortunately it suffers from the same problem that Pyparsing, Ply
>> and the rest suffer from: they use Python syntax to express the
>> parsing rules rather than using a dedicated problem-specific syntax
>> such as you used to illustrate peg parsing:
>>
>>> pattern <- phone_number name phone_number
>>> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
>>> name <-  [[:alpha:]]+
>
> That is actually real syntax of a parser generator used by me for another
> language (Tcl). A calculator example using this package can be found here:
> http://wiki.tcl.tk/39011
> (actually it is a retargetable compiler in a few lines - very impressive)

Ah, Tcl - I wrote many a Tcl script back in the 80s to login to BBSs.


#98267

From	Chris Angelico <rosuav@gmail.com>
Date	2015-11-05 11:34 +1100
Message-ID	<mailman.39.1446691567.16136.python-list@python.org>
In reply to	#98259

On Thu, Nov 5, 2015 at 11:24 AM, rurpy--- via Python-list
<python-list@python.org> wrote:
> On Wednesday, November 4, 2015 at 4:05:06 PM UTC-7, Seymore4Head wrote:
>>[...]
>> I am still here, but I have to admit I am not picking up too much.
>
> The "take away" I recommend is: the folks here are often way
> overly negative regarding regular expressions and that you not
> ignore them, but take them with a BIG grain of salt and continue
> learning about and using regexs.
>
> You will find they are an indispensable tool, not just in Python
> programming but in many aspects of computer use.

The "take away" that I recommend is: Rurpy loves to argue in favour of
regular expressions, but as you can see from the other posts, there
are alternatives, which are often FAR superior.

ChrisA


#98282

From	rurpy@yahoo.com
Date	2015-11-04 22:27 -0800
Message-ID	<7ffc8ea8-6445-4c59-b3b6-611edfbf4f62@googlegroups.com>
In reply to	#98267

On Wednesday, November 4, 2015 at 7:46:24 PM UTC-7, Chris Angelico wrote:
> On Thu, Nov 5, 2015 at 11:24 AM, rurpy wrote:

> The "take away" that I recommend is: Rurpy loves to argue in favour of
> regular expressions,

No, I don't love it, I quite dislike it.

> but as you can see from the other posts, there
> are alternatives, which are often FAR superior.

No, not FAR superior, just preferable and just in the simple cases,
regexes generally being better in anything beyond simple.


#98233

From	Tim Chase <python.list@tim.thechases.com>
Date	2015-11-04 09:42 -0600
Message-ID	<mailman.21.1446652375.16136.python-list@python.org>
In reply to	#98203

On 2015-11-04 09:57, Peter Otten wrote:
> Well, I didn't know that grep uses regular expressions by default.

It doesn't help that grep(1) comes in multiple flavors:

grep:  should use BRE (Basic REs)
fgrep:  same as "grep -F"; uses fixed strings, no REs
egrep:  same as "grep -E"; uses ERE (Extended REs)
grep -P: a GNUism to use PCREs (Perl Compatible REs)

there's also an "rgrep" which is just "grep -r" which I find kinda
silly/redundant. Though frankly I feel the same way about fgrep/egrep
since they just activate a command-line switch.
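
To make the fixed-string versus regex distinction concrete, here is a rough Python sketch. The function name grep and its fixed parameter are invented for this example, and Python's re syntax is its own dialect rather than BRE or ERE, so this only mimics the idea behind "grep -F", not the real tool's behaviour.

```python
import re


def grep(pattern, lines, fixed=False):
    """Return the lines that contain a match for pattern.

    With fixed=True the pattern is treated as a literal string,
    roughly in the spirit of 'grep -F' (an analogy, not an emulation).
    """
    if fixed:
        # Escape metacharacters so that '.' and friends match only themselves.
        pattern = re.escape(pattern)
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]
```

With lines = ["a.c", "abc"], grep("a.c", lines) returns both lines because "." matches any character, while grep("a.c", lines, fixed=True) returns only "a.c".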

You get even crazier when you start adding zgrep/zegrep/zfgrep.

-tkc


#98287

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2015-11-05 20:55 +1300
Message-ID	<da0gcbFs13sU1@mid.individual.net>
In reply to	#98233

Tim Chase wrote:

> You get even crazier when you start adding zgrep/zegrep/zfgrep.

It's fitting somehow that we should need an RE
to describe all the possible names of the grep
command.
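
As a sketch of what such an RE might look like in Python, covering only the names mentioned in this thread (grep, fgrep, egrep, rgrep, zgrep, zegrep, zfgrep); the names GREP_NAMES and is_grep_name are made up for the example:

```python
import re

# Optional 'z' followed by an optional 'e' or 'f', or else a lone 'r', then 'grep'.
# This covers only the command names listed in this thread; other variants
# that may exist on a given system are deliberately not considered.
GREP_NAMES = re.compile(r"(?:z?[ef]?|r)grep")


def is_grep_name(name):
    """Return True if name is one of the grep variants named in the thread."""
    return GREP_NAMES.fullmatch(name) is not None
```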

-- 
Greg


#98289

From	Chris Angelico <rosuav@gmail.com>
Date	2015-11-05 19:06 +1100
Message-ID	<mailman.47.1446710789.16136.python-list@python.org>
In reply to	#98287

On Thu, Nov 5, 2015 at 6:55 PM, Gregory Ewing
<greg.ewing@canterbury.ac.nz> wrote:
> Tim Chase wrote:
>
>> You get even crazier when you start adding zgrep/zegrep/zfgrep.
>
>
> It's fitting somehow that we should need an RE
> to describe all the possible names of the grep
> command.

Regex engine golf: Find the shortest regex that matches the names of
all GNU commands which accept regular expressions, and no other
commands!

ChrisA


#98236 — What does “grep” stand for? (was: Regular expressions)

From	Ben Finney <ben+python@benfinney.id.au>
Date	2015-11-05 05:24 +1100
Subject	What does “grep” stand for? (was: Regular expressions)
Message-ID	<mailman.23.1446661467.16136.python-list@python.org>
In reply to	#98203

Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:

> On Wednesday 04 November 2015 13:55, Dan Sommers wrote:
>
> > Its very name indicates that its default mode most certainly is
> > regular expressions.
>
> I don't even know what grep stands for.

“grep” stands for ‘g/RE/p’.

The name is a mnemonic for a compound command in ‘ed’ [0], a text editor
that pre-dates extravagant luxuries like “presenting a full screen of
text at one time”.

In an ‘ed’ session, the user is obliged to keep mental track of the
current line in the text buffer, and even what that text contains during
the session.

Single-letter commands, with various terse parameters such as the range
of lines or some text to insert, are issued at a command prompt one
after another.

For these reasons, the manual page describes ‘ed’ as a “line-oriented
text editor”. Everything is done by specifying lines, blindly, to
commands which then operate on those lines.

The name of the ‘vi’ editor means “visual interface (to a text editor)”,
to proudly declare the innovation of a full screen of text that updates
content during the editing session. That was not available for users of
‘ed’.

A very common command to issue, then, is “actually show me the line of
text I just specified”; the ‘p’ (for “print”) command.

Another very common command is “find the text matching this pattern and
perform these commands on it”, which is ‘g’ (for “global”). The ‘g’
command addresses text matching a regular expression pattern, delimited
by slashes ‘/’.

So, for users with feeble human brains incapable of remembering
perfectly the entire content of the text while it changes and therefore
not always knowing exactly which lines they wanted to operate on without
seeing them all the time, a very frequent combination command is:

g/RE/p

meaning “find every line in the buffer that matches the regular
expression pattern ‘RE’, and do nothing to those lines except print them
to standard output” (with no address range given, ‘g’ covers the whole
buffer).
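
The same idea can be sketched in Python, as a loose analogy only: ed_global and p are hypothetical names, the buffer is a plain list of strings, and Python's re dialect differs from the basic regular expressions that ‘ed’ uses.

```python
import re


def ed_global(regex, buffer, command):
    """Apply command(line_number, line) to every buffer line matching regex.

    Loosely modelled on ed's 'g/RE/command'; line numbers start at 1.
    """
    pattern = re.compile(regex)
    return [
        command(number, line)
        for number, line in enumerate(buffer, start=1)
        if pattern.search(line)
    ]


def p(number, line):
    """Stand-in for ed's 'p' command: hand the line back unchanged."""
    return line


if __name__ == "__main__":
    text_buffer = ["import re", "x = 1", "import sys"]
    # Rough equivalent of g/^import/p
    for output_line in ed_global(r"^import", text_buffer, p):
        print(output_line)
```

Passing a different command function plays the role of the other single-letter commands that can follow ‘g/RE/’.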

Wikipedia has useful pages on both ‘grep’ and ‘ed’
<URL:https://en.wikipedia.org/wiki/Grep>
<URL:https://en.wikipedia.org/wiki/Ed_%28text_editor%29>.

You can see a full specification of how the ‘ed’ interface is to behave
as part of the “Open Group Base Specifications Issue 7”, which is the
specification for Unix.

<URL:http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html>

See the manual for GNU ed which includes an example session to
appreciate just how far things have come.

<URL:https://www.gnu.org/software/ed/manual/ed_manual.html#Introduction-to-line-editing>

Of course, if you yearn for the days of minimalist purity, nothing beats
Ed, man! !man ed

[0] The standard text editor.
<URL:https://www.gnu.org/fun/jokes/ed-msg.txt>

--
\ “If you can't annoy somebody there is little point in writing.” |
`\ —Kingsley Amis |
_o__) |
Ben Finney


#98240 — Re: What does “grep” stand for?

From	Christian Gollwitzer <auriocus@gmx.de>
Date	2015-11-04 20:38 +0100
Subject	Re: What does “grep” stand for?
Message-ID	<n1dmn7$fnq$1@dont-email.me>
In reply to	#98236

Am 04.11.15 um 19:24 schrieb Ben Finney:
> The name is a mnemonic for a compound command in ‘ed’ [0], a text editor
> that pre-dates extravagant luxuries like “presenting a full screen of
> text at one time”.
>
>  [... lots of fun facts ...]

Here is another fun fact: The convincing UI of ed was actually so widely 
applied, that even Microsoft included a similar editor into MSDOS, 
called EDLIN. EDLIN, of course, was a bastardized version of ed that 
could do much less and also lacked regular expressions. Needless to say 
that the mighty "VIsual" editor was out 5 years before MSDOS shipped 
EDLIN as the only editor...

In contrast to ed, the stream editor "sed" is used multiple times every 
day in a typical Unix session inside shell scripts to perform automated 
text processing tasks, including regex replacement.
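
For readers following along in Python, a regex replacement in the style of sed's s/RE/replacement/g can be sketched with re.sub. The helper name sed_substitute is invented here, and sed's regex dialect and replacement syntax differ from Python's in the details, so take this as an analogy.

```python
import re


def sed_substitute(pattern, replacement, lines):
    """Replace every match of pattern in each line, like a global 's' command."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, line) for line in lines]
```

For example, sed_substitute(r"colou?r", "hue", ["color and colour"]) gives ["hue and hue"].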

	Christian
